High Level Assembly

High Level Assembly (HLA)
Developer(s) Randall Hyde
Stable release
2.16 / July 6, 2011
Repository sourceforge.net/projects/hlav1
Written in Assembly language
Operating system Windows, Linux, FreeBSD, macOS
Platform IA-32
Available in English
Type Assembler
License Public domain
Website plantation-productions.com/Webster

High Level Assembly (HLA) is a language developed by Randall Hyde that allows the use of higher-level language constructs to aid both beginners and advanced assembly developers. It fully supports advanced data types and object-oriented programming. It uses a syntax loosely based on several high-level programming languages (HLLs), such as Pascal, Ada, Modula-2, and C++, to allow the creation of readable assembly language programs, and to allow HLL programmers to learn HLA as fast as possible.

Origins and goals

HLA was originally conceived as a tool to teach assembly language programming at the college and university level. The goal is to leverage students' existing programming knowledge so that they get up to speed in assembly language as fast as possible. Most students taking an assembly language programming course have already been introduced to high-level control flow structures, such as IF, WHILE, FOR, etc. HLA allows students to apply that knowledge to assembly language coding early in their course, so they can master other prerequisite subjects in assembly before learning how to code the low-level forms of these control structures. The book The Art of Assembly Language Programming by Randall Hyde uses HLA for this purpose.[1]

High vs. low-level assembler

The HLA v2.x assembler supports the same low-level machine instructions as a regular low-level assembler. The difference is that high-level assemblers also support high-level-language-like statements, such as IF, WHILE, and so on, and fancier data declaration directives, such as structures (records), unions, and even classes.

Some examples of high-level assemblers are HLA, Microsoft Macro Assembler (MASM), and Turbo Assembler (TASM), all on the Intel x86 processor family.

Unlike most other assembler tools, the HLA compiler includes a Standard Library with thousands of functions, procedures, and macros that can be used to create full applications with the ease of a high-level language. While assembly language libraries are not new, a language that includes a large standardized library encourages programmers to use the library code rather than simply writing their own library functions.

HLA supports all the same low-level machine instructions as other x86 assemblers. Furthermore, HLA's high-level control structures are based on the ones found in MASM and TASM, whose HLL-like features predated the arrival of HLA by several years. In HLA, low-level assembly code can be written as easily as with any other assembler by simply ignoring the HLL-control constructs. In contrast to HLLs like Pascal, C, and C++, HLA doesn't require inline asm statements. HLA's HLL-like features are intended as a learning aid for beginning assembly programmers, smoothing the learning curve on the assumption that they will stop using those statements once they master the low-level instruction set. In practice, many experienced programmers continue to use HLL-like statements in HLA, MASM, and TASM long after mastering the low-level instruction set, usually to improve readability.

It is also possible to write high-level programs using HLA, avoiding much of the tedium of low-level assembly language programming. Some assembly language programmers reject HLA out of hand[citation needed] because it allows programmers to do this. However, supporting both high-level and low-level programming gives any language an expanded range of applicability.

Distinguishing features

Two HLA features set it apart from other x86 assemblers: its powerful macro system (compile-time language) and the HLA Standard Library.

Macro system

HLA's compile-time language allows extending the language with ease, even creating small domain-specific languages to help easily solve common programming problems. The macro stdout.put is a good example of a sophisticated macro that can simplify programming. Consider the following invocation of that macro:

stdout.put( "I=", i, " s=", s, " u=", u, " r=", r:10:2, nl );

The stdout.put macro processes each of its arguments to determine the argument's type and then calls an appropriate procedure in the HLA Standard Library to handle the output of each operand.

Most assemblers provide some sort of macro ability. The advantage that HLA offers over other assemblers is that it can process macro arguments like r:10:2 using HLA's extensive compile-time string functions, and HLA's macro facilities can infer the types of variables and use that information to direct macro expansion.
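The type-directed expansion described above can be modeled outside HLA. The following Python sketch is only an illustration under stated assumptions, not HLA's actual implementation: the symbol table SYMBOL_TYPES, the routine names (put_int32, put_real64, and so on), and the function expand_put are all hypothetical. It shows how an argument such as r:10:2 might be split into an operand plus formatting fields, and how the operand's declared type might select the routine to call.

```python
# Hypothetical model of type-directed macro expansion; not HLA's real code.

# Assumed declarations: the type of each variable named in the macro call.
SYMBOL_TYPES = {"i": "int32", "s": "string", "u": "uns32", "r": "real64"}

# Hypothetical output routines, one per operand type.
ROUTINES = {
    "int32": "put_int32",
    "uns32": "put_uns32",
    "string": "put_string",
    "real64": "put_real64",
}


def expand_put(args):
    """Return the calls a put-style macro might emit for its arguments."""
    calls = []
    for arg in args:
        if arg == "nl":
            # The newline constant gets its own routine.
            calls.append("put_newline()")
        elif arg.startswith('"'):
            # String literals are passed through unchanged.
            calls.append(f"put_string({arg})")
        else:
            # Split "name:width:decimals" into the operand and its format fields.
            name, *fmt = arg.split(":")
            routine = ROUTINES[SYMBOL_TYPES[name]]
            calls.append(f"{routine}({', '.join([name, *fmt])})")
    return calls
```

Under these assumptions, expand_put(['"I="', "i", "r:10:2", "nl"]) yields one call per operand, with the width and decimal fields of r:10:2 forwarded as extra arguments to the real-number routine.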

HLA's macro language provides a special Context-Free macro facility. This feature allows easily writing macros that span other sections of code via a starting and terminating macro pair (along with optional intermediate macro invocations that are only available between the start–terminate macros). For example, one can write a fully recursive-nestable SWITCH–CASE–DEFAULT–ENDSWITCH statement using this macro facility.

Because of the context-free design of HLA's macro facilities, these switch..case..default..endswitch statements can be nested, and the code emitted for the nested statements will not conflict with that of the enclosing statements.
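One way to picture why nested statements do not collide is to give every open statement its own label namespace. The Python sketch below is a toy model under that assumption and says nothing about how HLA implements the feature internally; the class SwitchEmitter and its label format are hypothetical. It also models the rule that an intermediate macro is only valid between its start and terminate macros.

```python
import itertools


class SwitchEmitter:
    """Toy model: each switch gets a unique id used in all of its labels."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._open = []   # stack of ids for switches that are not yet closed
        self.lines = []   # emitted pseudo-assembly labels

    def switch(self):
        # Opening a switch pushes a fresh id, so inner labels stay distinct.
        self._open.append(next(self._ids))

    def case(self, value):
        if not self._open:
            raise SyntaxError("case is only valid inside switch..endswitch")
        # The label belongs to the innermost open switch.
        self.lines.append(f"sw{self._open[-1]}_case_{value}:")

    def endswitch(self):
        if not self._open:
            raise SyntaxError("endswitch without a matching switch")
        self.lines.append(f"sw{self._open.pop()}_end:")
```

In this model, a case with the same value in an inner and an outer switch produces two different labels, because each label carries the id of the switch that was innermost when it was emitted.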

Compile-Time Language

The HLA macro system is actually a subset of a larger feature known as the HLA Compile-Time Language (CTL). The HLA CTL is an interpreted language that is available in an HLA program source file. An interpreter executes HLA CTL statements while an HLA source file is being compiled, hence the name compile-time language.

The HLA CTL includes many control statements such as #IF, #WHILE, #FOR, and #PRINT, as well as an assignment statement. One can also create compile-time variables and constants (including structured data types such as records and unions). The HLA CTL also provides hundreds of built-in functions (including a very rich set of string and pattern-matching functions). The HLA CTL allows programmers to create CTL programs that scan and parse strings, allowing those programmers to create embedded domain-specific languages (EDSLs, also termed mini-languages). The stdout.put macro appearing earlier is an example of such an EDSL. The put macro (in the stdout namespace, hence the name stdout.put) parses its macro parameter list and emits the code that will print its operands.

Standard library

The HLA Standard Library is an extensive set of pre-made routines and macros (like the stdout.put macro described above) that make life easier for programmers, saving them from starting from scratch every time they write a new application. Perhaps just as important, the HLA Standard Library allows programmers to write portable applications that run under Windows or Linux with nothing more than recompiling the source code. Like the C standard library for the programming language C, the HLA Standard Library allows users to abstract away low-level operating system (OS) calls, so the same set of OS application programming interfaces (APIs) can serve for all operating systems that HLA supports. Although assembly language allows a program to make any OS call directly, programs that confine themselves to the HLA Standard Library API set are easy to port between operating systems.

The HLA Standard Library provides thousands of functions, procedures, and macros. The list changes over time; as of mid-2010, for HLA v2.12, it included functions in a wide range of categories.

Design

The HLA v2.x language system is a command-line driven tool that consists of several components, including a shell program (e.g., hla.exe under Windows), the HLA language compiler (e.g., hlaparse.exe), a low-level translator (e.g., the HLABE, or HLA Back Engine), a linker (link.exe under Windows, ld under Linux), and other tools such as a resource compiler for Windows. Versions before 2.0 relied on an external assembler back end; versions 2.x and later of HLA use the built-in HLABE as the back-end object code formatter.

The HLA shell application processes command line parameters and routes appropriate files to each of the programs that make up the HLA system. It accepts as input .hla files (HLA source files), .asm files (source files for MASM, TASM, FASM, NASM, or Gas assemblers), .obj files for input to the linker, and .rc files (for use by a resource compiler).

Source code translation

Originally, the HLA v1.x tool compiled its source code into an intermediate source file that a back-end assembler such as MASM, TASM, flat assembler (FASM), Netwide Assembler (NASM), or GNU Assembler (Gas) would translate into the low-level object code file. As of HLA v2.0, HLA included its own HLA Back Engine (HLABE) that provided the low-level object code translation. However, via various command-line parameters, HLA v2.x still has the ability to translate an HLA source file into a source file that is compatible with one of these other assemblers.

HLA Back Engine

The HLA Back Engine (HLABE) is a compiler back end that translates an internal intermediate language into low-level Portable Executable (PE), Common Object File Format (COFF), Executable and Linkable Format (ELF), or Mach-O object code. An HLABE program mostly consists of data (byte) emission statements, 32-bit relocatable address statements, x86 control-transfer instructions, and various directives. In addition to translating the byte and relocatable address statements into the low-level object code format, HLABE also handles branch-displacement optimization (picking the shortest possible form of a branch instruction).
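Branch-displacement optimization is commonly done by starting with the shortest encoding for every branch and growing only those that cannot reach their target. The Python sketch below illustrates that general technique; it is not HLABE's actual algorithm, and the function name relax and the item format are hypothetical. It assumes the two x86 unconditional jump encodings in 32-bit mode: a 2-byte jmp rel8, whose signed 8-bit displacement is measured from the end of the instruction, and a 5-byte jmp rel32.

```python
SHORT, NEAR = 2, 5  # byte sizes of x86 `jmp rel8` and 32-bit `jmp rel32`


def relax(items):
    """Choose a size for every jump.

    items is a list of ("bytes", n), ("label", name) or ("jmp", target).
    Returns the chosen jump sizes in program order.
    """
    # Optimistically assume every jump fits in the short form.
    sizes = {i: SHORT for i, item in enumerate(items) if item[0] == "jmp"}
    while True:
        # Lay out the program using the current size guesses.
        offset, start, labels = 0, {}, {}
        for i, (kind, arg) in enumerate(items):
            start[i] = offset
            if kind == "bytes":
                offset += arg
            elif kind == "label":
                labels[arg] = offset
            else:
                offset += sizes[i]
        # Grow each short jump whose displacement overflows a signed byte.
        grown = False
        for i, size in sizes.items():
            if size == SHORT:
                disp = labels[items[i][1]] - (start[i] + SHORT)
                if not -128 <= disp <= 127:
                    sizes[i] = NEAR
                    grown = True
        if not grown:
            # Fixed point reached: no remaining short jump is out of range.
            return [sizes[i] for i in sorted(sizes)]
```

The loop repeats because growing one jump moves every later label, which can push another, previously valid short jump out of range. Jumps only ever grow, so the loop terminates.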

Although the HLABE is incorporated into the HLA v2.x compiler, it is actually a separate product. It is public domain and open source (hosted on SourceForge.net).

Notes

  1. "The Art of Assembly Language Programming". Archived from the original on 2018-03-29. Retrieved 2010-02-12.
