Low-level programming language

A low-level programming language is a programming language that provides little or no abstraction from a computer's instruction set architecture, memory or underlying physical hardware; commands or functions in the language are structurally similar to a processor's instructions. These languages provide the programmer with full control over program memory and the underlying machine code instructions. Because of the low level of abstraction (hence the term "low-level") between the language and machine language, low-level languages are sometimes described as being "close to the hardware". Programs written in low-level languages tend to be relatively non-portable, due to being optimized for a certain type of system architecture. [1] [2] [3] [4]

Depending on the language, low-level code is converted to machine code with or without a compiler or interpreter; second-generation programming languages [5] [6], for example, are translated by an assembler. A program written in a low-level language can be made to run very quickly, with a small memory footprint. Such programs often end up architecture-dependent or operating-system-dependent because they use low-level APIs. [1]

Machine code

Front panel of a PDP-8/E minicomputer. The row of switches at the bottom can be used to toggle in a machine language program.

Machine code is the form in which code that can be directly executed is stored on a computer. It consists of machine language instructions, stored in memory, that perform operations such as moving values in and out of memory locations, arithmetic and Boolean logic, and testing values and, based on the test, either executing the next instruction in memory or executing an instruction at another location.
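The operations listed above can be illustrated with a small interpreter for an invented toy machine. This is only a sketch: the opcodes, the two-byte instruction format, the single accumulator and the names `run` and `sum_down_from` are assumptions made for this example and do not correspond to any real instruction set.

```c
#include <stdint.h>

/* Hypothetical opcodes for an invented toy machine (not a real ISA). */
enum { OP_HALT, OP_LOAD, OP_ADD, OP_SUB, OP_STORE, OP_JNZ };

/* Executes the program stored in mem, starting at address 0.
 * Each instruction is two bytes: an opcode and a memory address. */
void run(uint8_t mem[256])
{
    uint8_t pc = 0;   /* program counter: address of the next instruction */
    uint8_t acc = 0;  /* accumulator: the machine's only register */

    for (;;) {
        uint8_t op = mem[pc];
        uint8_t addr = mem[(uint8_t)(pc + 1)];
        pc = (uint8_t)(pc + 2);              /* default: fall through to the next instruction */

        switch (op) {
        case OP_LOAD:  acc = mem[addr]; break;                  /* move a value in from memory */
        case OP_ADD:   acc = (uint8_t)(acc + mem[addr]); break; /* arithmetic */
        case OP_SUB:   acc = (uint8_t)(acc - mem[addr]); break;
        case OP_STORE: mem[addr] = acc; break;                  /* move a value out to memory */
        case OP_JNZ:   if (acc != 0) pc = addr; break;          /* test, then branch or continue */
        default:       return;                                  /* OP_HALT or unknown opcode */
        }
    }
}

/* Loads a toy program that computes n + (n-1) + ... + 1 and runs it.
 * Expects n >= 1; the result wraps modulo 256. */
uint8_t sum_down_from(uint8_t n)
{
    uint8_t mem[256] = {
        OP_LOAD, 101, OP_ADD, 100, OP_STORE, 101,  /* total += counter */
        OP_LOAD, 100, OP_SUB, 102, OP_STORE, 100,  /* counter -= 1     */
        OP_JNZ, 0,                                 /* loop while counter != 0 */
        OP_HALT, 0
    };
    mem[100] = n;  /* counter */
    mem[101] = 0;  /* total   */
    mem[102] = 1;  /* constant one */
    run(mem);
    return mem[101];
}
```

Program and data share the same 256-byte memory here, which mirrors the point that machine code is itself just binary data stored in memory.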

Machine code is usually stored in memory as binary data. Programmers almost never write programs directly in machine code; instead, they write code in assembly language or higher-level programming languages. [1]

Although few programs are written in machine language, programmers often become adept at reading it through working with core dumps or debugging from the front panel.

Example of a function in hexadecimal representation of x86-64 machine code to calculate the nth Fibonacci number, with each line corresponding to one instruction:

    89 f8
    85 ff
    74 26
    83 ff 02
    76 1c
    89 f9
    ba 01 00 00 00
    be 01 00 00 00
    8d 04 16
    83 f9 03
    74 0d
    89 d6
    ff c9
    89 c2
    eb f0
    b8 01 00 00 00
    c3

Assembly language

Second-generation languages provide one abstraction level on top of the machine code. In the early days of coding on computers like TX-0 and PDP-1, the first thing MIT hackers did was to write assemblers. [7] Assembly language has little semantics or formal specification, being only a mapping of human-readable symbols, including symbolic addresses, to opcodes, addresses, numeric constants, strings and so on. Typically, one machine instruction is represented as one line of assembly code, commonly called a mnemonic. [8] Assemblers produce object files that can link with other object files or be loaded on their own.
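The mapping from mnemonics to opcodes can be sketched as a table lookup. The example below is a minimal illustration for an invented toy machine: the mnemonic names, opcode values, the "mnemonic operand" line format and the function names `lookup_opcode` and `assemble_line` are all assumptions, and a real assembler would also resolve symbolic addresses, handle directives and emit object files.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mnemonic table for an invented toy machine (not a real ISA). */
struct mnemonic {
    const char *name;
    unsigned char opcode;
};

static const struct mnemonic table[] = {
    {"halt", 0x00}, {"load", 0x01}, {"add", 0x02}, {"store", 0x03}, {"jnz", 0x04},
};

/* Returns the opcode for a mnemonic, or -1 if the mnemonic is unknown. */
int lookup_opcode(const char *name)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return table[i].opcode;
        }
    }
    return -1;
}

/* Translates one line of the form "mnemonic operand" into two bytes:
 * the opcode followed by a numeric operand in the range 0..255.
 * Returns 0 on success and -1 on a malformed line or unknown mnemonic. */
int assemble_line(const char *line, unsigned char out[2])
{
    char name[16];
    unsigned operand;

    if (sscanf(line, "%15s %u", name, &operand) != 2 || operand > 255) {
        return -1;
    }
    int opcode = lookup_opcode(name);
    if (opcode < 0) {
        return -1;
    }
    out[0] = (unsigned char)opcode;
    out[1] = (unsigned char)operand;
    return 0;
}
```

Each input line produces exactly one encoded instruction, which reflects the one-line-per-instruction correspondence described above.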

Most assemblers provide macros to generate common sequences of instructions.

Example: The same Fibonacci number calculator as above, but in x86-64 assembly language using Intel syntax:

    fib:
        mov rax, rdi              ; The argument is stored in rdi, put it into rax
        test rdi, rdi             ; Is the argument zero?
        je .return_from_fib       ; Yes - return 0, which is already in rax
        cmp rdi, 2                ; No - compare the argument to 2
        jbe .return_1_from_fib    ; If it is less than or equal to 2, return 1
        mov rcx, rdi              ; Otherwise, put it in rcx, for use as a counter
        mov rdx, 1                ; The first previous number starts out as 1, put it in rdx
        mov rsi, 1                ; The second previous number also starts out as 1, put it in rsi
    .fib_loop:
        lea rax, [rsi + rdx]      ; Put the sum of the previous two numbers into rax
        cmp rcx, 3                ; Is the counter down to 3?
        je .return_from_fib       ; Yes - rax contains the result
        mov rsi, rdx              ; No - make the first previous number the second previous number
        dec rcx                   ; Decrement the counter
        mov rdx, rax              ; Make the current number the first previous number
        jmp .fib_loop             ; Keep going
    .return_1_from_fib:
        mov rax, 1                ; Set the return value to 1
    .return_from_fib:
        ret                       ; Return

In this code example, the registers of the x86-64 processor are named and manipulated directly. The function loads its 64-bit argument from rdi, in accordance with the System V application binary interface (ABI) for x86-64, and performs its calculation by manipulating values in the rax, rcx, rdx, and rsi registers until it finishes and returns. Note that this assembly language has no concept of returning a value. The result is left in the rax register, again in accordance with the System V ABI; the ret instruction simply removes the top 64-bit element from the stack and causes the next instruction to be fetched from that location (usually the instruction immediately after the one that called this function). x86-64 assembly language imposes no standard for passing values to a function or returning values from a function (and in fact has no concept of a function); those conventions are defined by an ABI, such as the System V ABI, for a particular instruction set.

Compare this with the same function in C:

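    unsigned int fib(unsigned int n)
    {
        if (!n) {
            return 0;
        } else if (n <= 2) {
            return 1;
        } else {
            unsigned int f_nminus2, f_nminus1, f_n;
            /* n doubles as the loop counter, as rcx does in the assembly version */
            for (f_nminus2 = f_nminus1 = 1, f_n = 0; ; --n) {
                f_n = f_nminus2 + f_nminus1;
                if (n <= 3) {
                    return f_n;
                }
                f_nminus2 = f_nminus1;
                f_nminus1 = f_n;
            }
        }
    }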

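This code is similar in structure to the assembly language example, but it differs significantly in abstraction. The parameter n and the local variables f_nminus2, f_nminus1, and f_n do not name any register or memory location, and the return statement specifies the value to return without dictating how it is returned; the C compiler decides both for the target architecture.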

These abstractions make the C code compilable without modification on any architecture for which a C compiler has been written, whereas the assembly language code above will only run on processors using the x86-64 architecture.

C programming language

C has variously been described as low-level and high-level. [9] Although C is traditionally considered high-level, its level of abstraction from the hardware is far lower than that of many subsequently developed languages, particularly interpreted languages. The direct interface that C provides between the programmer and hardware memory allocation and management makes C the lowest-level of the 10 most popular languages currently in use.

C is architecture independent: the same C code may, in most cases, be compiled (by different machine-specific compilers) for use on a wide range of machine platforms. In many respects (including directory operations and memory allocation), C provides “an interface to system-dependent objects that is itself relatively system independent”. [10] This feature is considered “high-level” in comparison with platform-specific assembly languages.
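Memory allocation is one place where this system-independent interface is visible. The sketch below uses only the standard library functions malloc and memcpy; the helper name `copy_ints` is an assumption made for this example. How the memory is actually obtained from the operating system differs between platforms, but the calling code does not change.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of the first n ints of src,
 * or NULL if the allocation fails. The caller must free() the result. */
int *copy_ints(const int *src, size_t n)
{
    /* Guard against overflow in the size computation. */
    if (n > SIZE_MAX / sizeof(int)) {
        return NULL;
    }

    /* sizeof *dst lets the compiler supply the platform's size of int. */
    int *dst = malloc(n * sizeof *dst);
    if (dst == NULL) {
        return NULL;
    }
    memcpy(dst, src, n * sizeof *dst);
    return dst;
}
```

Nothing in the function names a page size, a system call or an address range, yet the programmer still decides exactly when memory is acquired and released.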

Low-level programming in high-level languages

During the late 1960s and 1970s, high-level languages that included some degree of access to low-level programming functions, such as PL/S, BLISS, BCPL, extended ALGOL and NEWP (for Burroughs large systems/Unisys Clearpath MCP systems), and C, were introduced. One method for this is inline assembly, in which assembly code is embedded in a high-level language that supports this feature. Some of these languages also allow architecture-dependent compiler optimization directives to adjust the way a compiler uses the target processor architecture.

The following block of C, taken from the GCC documentation, demonstrates inline assembly: it copies src to dst and adds 1 to dst. The code shows how a generally high-level language like C interacts with its lower-level counterpart, assembly. Although these facilities may not make C a natively low-level language, they expose that interaction directly. [11]

    int src = 1;
    int dst;

    asm ("mov %1, %0\n\t"
         "add $1, %0"
         : "=r" (dst)
         : "r" (src));

    printf("%d\n", dst);
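The snippet above assembles only for x86 targets with a GCC-compatible compiler. One possible way to keep the surrounding program portable is to guard the inline assembly with predefined macros and fall back to plain C elsewhere. This is a sketch under those assumptions: the function name `add_one` is hypothetical, and the macro test covers only GCC and Clang on 32-bit and 64-bit x86.

```c
/* Returns src + 1, using inline assembly where it is known to be available. */
int add_one(int src)
{
    int dst;

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* AT&T operand order: copy src into dst, then add 1 to dst.
     * __asm__ is used instead of asm so the code also builds in strict ISO modes. */
    __asm__ ("mov %1, %0\n\t"
             "add $1, %0"
             : "=r" (dst)   /* output operand: any general-purpose register */
             : "r" (src)    /* input operand: any general-purpose register */
             : "cc");       /* the add instruction modifies the flags */
#else
    /* Portable fallback for other compilers and architectures. */
    dst = src + 1;
#endif

    return dst;
}
```

The architecture-dependent part stays confined to one function, so callers see an ordinary C interface on every platform.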

References

  1. "3.1: Structure of low-level programs". Workforce LibreTexts. 2021-03-05. Retrieved 2023-04-03.
  2. "What is a Low Level Language?". GeeksforGeeks. 2023-11-19. Retrieved 2024-04-27.
  3. "Low Level Language? What You Need to Know | Lenovo US". www.lenovo.com. Retrieved 2024-04-27.
  4. "Low-level languages - Classifying programming languages and translators - AQA - GCSE Computer Science Revision - AQA". BBC Bitesize. Retrieved 2024-04-27.
  5. "Generation of Programming Languages". GeeksforGeeks. 2017-10-22. Retrieved 2024-04-27.
  6. "What is a Generation Languages?". www.computerhope.com. Retrieved 2024-04-27.
  7. Levy, Steven (1994). Hackers: Heroes of the Computer Revolution. Penguin Books. p. 32. ISBN 0-14-100051-1.
  8. "Machine Language/Assembly Language/High Level Language". www.cs.mtsu.edu. Archived from the original on 2024-12-14. Retrieved 2024-04-27.
  9. Jindal, G.; Khurana, P.; Goel, T. (January 2013). "Comparative study of C, Objective C, C++ programming language". International Journal of Advanced Trends in Computer Science and Engineering. 2 (1): 203.
  10. Kernighan, B.; Ritchie, D. (1988). The C Programming Language, 2nd Edition. p. 163.
  11. "Extended Asm (Using the GNU Compiler Collection (GCC))". gcc.gnu.org. Retrieved 2024-04-27.

Bibliography