High-level programming language

In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language elements, be easier to use, or may automate (or even hide entirely) significant areas of computing systems (e.g. memory management), making the process of developing a program simpler and more understandable than when using a lower-level language. The amount of abstraction provided defines how "high-level" a programming language is.[1]

History

In the 1960s, a high-level programming language using a compiler was commonly called an autocode.[2] Examples of autocodes are COBOL and Fortran.[3]

The first high-level programming language designed for computers was Plankalkül, created by Konrad Zuse.[4] However, it was not implemented in his time, and his original contributions were largely isolated from other developments due to World War II, aside from the language's influence on Heinz Rutishauser's "Superplan" language and, to some degree, on ALGOL.

The first significantly widespread high-level language was Fortran, a machine-independent development of IBM's earlier Autocode systems. The ALGOL family, with ALGOL 58 defined in 1958 and ALGOL 60 defined in 1960 by committees of European and American computer scientists, introduced recursion as well as nested functions under lexical scope. ALGOL 60 was also the first language with a clear distinction between value and name parameters and their corresponding semantics.[5] ALGOL also introduced several structured programming concepts, such as the while-do and if-then-else constructs, and its syntax was the first to be described in formal notation – Backus–Naur form (BNF). During roughly the same period, COBOL introduced records (also called structs) and Lisp introduced a fully general lambda abstraction in a programming language for the first time.

Features

"High-level language" refers to the higher level of abstraction from machine language. Rather than dealing with registers, memory addresses, and call stacks, high-level languages deal with variables, arrays, objects, complex arithmetic or Boolean expressions, subroutines and functions, loops, threads, locks, and other abstract computer science concepts, with a focus on usability over optimal program efficiency. Unlike low-level assembly languages, high-level languages have few, if any, language elements that translate directly into a machine's native opcodes. Other features, such as string handling routines, object-oriented language features, and file input/output, may also be present. One thing to note about high-level programming languages is that these languages allow the programmer to be detached and separated from the machine. That is, unlike low-level languages like assembly or machine language, high-level programming can amplify the programmer's instructions and trigger a lot of data movements in the background without their knowledge. The responsibility and power of executing instructions have been handed over to the machine from the programmer.

Abstraction penalty

High-level languages intend to provide features that standardize common tasks, permit rich debugging, and maintain architectural agnosticism, while low-level languages often produce more efficient code through optimization for a specific system architecture. The abstraction penalty is the cost that high-level programming techniques pay when they cannot exploit low-level architectural resources to optimize performance or use particular hardware. High-level programming exhibits features such as more generic data structures and operations, run-time interpretation, and intermediate code files, which often result in the execution of far more operations than necessary, higher memory consumption, and larger binary program size.[6][7][8] For this reason, code which needs to run particularly quickly and efficiently may require the use of a lower-level language, even if a higher-level language would make the coding easier. In many cases, critical portions of a program otherwise written in a high-level language can be hand-coded in assembly language, leading to a much faster, more efficient, or simply more reliably functioning optimised program.
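
Part of this penalty can be observed from within a high-level language itself. The following Python sketch (illustrative only; exact numbers vary by implementation and version) contrasts a generic list of boxed integers with a packed array of machine integers, and an interpreted loop with a built-in routine whose inner loop runs as native code:

    # Rough illustration of abstraction overhead in CPython.
    import array
    import sys
    import timeit

    n = 100_000
    boxed = list(range(n))               # every element is a full Python object
    packed = array.array("q", range(n))  # 64-bit integers stored contiguously

    print("list container:", sys.getsizeof(boxed), "bytes (plus one object per element)")
    print("packed array:  ", sys.getsizeof(packed), "bytes in total")

    def manual_sum(values):
        total = 0
        for v in values:                 # each iteration is dispatched by the interpreter
            total += v
        return total

    loop_time = timeit.timeit(lambda: manual_sum(boxed), number=100)
    builtin_time = timeit.timeit(lambda: sum(boxed), number=100)
    print(f"interpreted loop: {loop_time:.3f} s, built-in sum: {builtin_time:.3f} s")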

However, with the growing complexity of modern microprocessor architectures, well-designed compilers for high-level languages frequently produce code comparable in efficiency to what most low-level programmers can produce by hand, and the higher abstraction may allow for more powerful techniques providing better overall results than their low-level counterparts in particular settings.[9] High-level languages are designed independently of a specific computing system architecture. This facilitates executing a program written in such a language on any computing system that provides a compatible interpreter or just-in-time (JIT) compiler for it. High-level languages can also be improved as their designers develop new features. In other cases, new high-level languages evolve from one or more others with the goal of aggregating the most popular constructs with new or improved features. An example of this is Scala, which maintains backward compatibility with Java, meaning that programs and libraries written in Java will continue to be usable even if a programming shop switches to Scala; this makes the transition easier and the lifespan of such high-level code effectively indefinite. In contrast, low-level programs rarely survive beyond the system architecture for which they were written without major revision. This is the engineering trade-off for the abstraction penalty.

Relative meaning

Examples of high-level programming languages in active use today include Python, JavaScript, Visual Basic, Delphi, Perl, PHP, ECMAScript, Ruby, C#, Java and many others.

The terms high-level and low-level are inherently relative. When the C language emerged in the 1970s, it and similar languages were most often considered "high-level", as they supported concepts such as expression evaluation, parameterised recursive functions, and data types and structures, while assembly language was considered "low-level". Today, many programmers might refer to C as low-level, as it lacks a large runtime system (no garbage collection, etc.), basically supports only scalar operations, and provides direct memory addressing; it therefore readily blends with assembly language and the machine level of CPUs and microcontrollers. Also, in the introductory chapter of The C Programming Language (second edition), Kernighan and Ritchie describe C as a relatively "low level" language.[10]

Assembly language may itself be regarded as a higher-level (though often still one-to-one, if used without macros) representation of machine code, as it supports concepts such as constants and (limited) expressions, sometimes even variables, procedures, and data structures. Machine code, in its turn, is inherently at a slightly higher level than the microcode or micro-operations used internally in many processors.[11]

Execution modes

There are three general modes of execution for modern high-level languages:

Interpreted
When code written in a language is interpreted, its syntax is read and then executed directly, with no compilation stage. A program called an interpreter reads each program statement, following the program flow, then decides what to do, and does it. A hybrid of an interpreter and a compiler will compile the statement into machine code and execute that; the machine code is then discarded, to be interpreted anew if the line is executed again. Interpreters are commonly the simplest implementations of the behavior of a language, compared to the other two variants listed here. A minimal sketch of this read-and-execute cycle appears after this list.
Compiled
When code written in a language is compiled, its syntax is transformed into an executable form before running. There are two types of compilation:
Machine code generation
Some compilers compile source code directly into machine code. This is the original mode of compilation, and languages that are directly and completely transformed to machine-native code in this way may be called truly compiled languages. See assembly language.
Intermediate representations
When code written in a language is compiled to an intermediate representation, that representation can be optimized or saved for later execution without the need to re-read the source file. When the intermediate representation is saved, it may be in a form such as bytecode. The intermediate representation must then be interpreted or further compiled to execute it. Virtual machines that execute bytecode directly or transform it further into machine code have blurred the once clear distinction between intermediate representations and truly compiled languages.
Source-to-source translated or transcompiled
Code written in a language may be translated into terms of a lower-level language for which native code compilers are already common. JavaScript and the C language are common targets for such translators. See CoffeeScript, Chicken Scheme, and Eiffel as examples. Specifically, the generated C and C++ code can be seen (as generated from the Eiffel language when using the EiffelStudio IDE) in the EIFGENs directory of any compiled Eiffel project. In Eiffel, the translation process is referred to as transcompiling or transcompiled, and the Eiffel compiler as a transcompiler or source-to-source compiler.
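
The following is the minimal interpreter sketch referred to above, written in Python with a made-up token syntax purely for illustration. It reads each token of a toy program in order, decides what it means, and performs it immediately, with no compilation stage:

    # A toy interpreter for a tiny stack-based language (hypothetical syntax),
    # shown only to illustrate the read-decide-execute cycle.
    def interpret(program: str) -> list:
        stack = []
        for token in program.split():      # read each statement in program order
            if token.isdigit():
                stack.append(int(token))   # a literal: push it
            elif token == "add":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)        # an operation: perform it immediately
            elif token == "mul":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            else:
                raise ValueError("unknown token: " + token)
        return stack

    print(interpret("2 3 add 4 mul"))      # [(2 + 3) * 4] -> [20]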

Note that languages are not strictly interpreted languages or compiled languages. Rather, implementations of language behavior use interpreting or compiling. For example, ALGOL 60 and Fortran have both been interpreted (even though they were more typically compiled). Similarly, Java shows the difficulty of trying to apply these labels to languages, rather than to implementations; Java is compiled to bytecode, which is then executed by either interpreting it (in a Java virtual machine (JVM)) or compiling it (typically with a just-in-time compiler such as HotSpot, again in a JVM). Moreover, compiling, transcompiling, and interpreting are not strictly limited to a description of the compiler artifact (binary executable or IL assembly).
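
CPython, the reference implementation of Python, is another such hybrid: source code is compiled to bytecode, which the interpreter's virtual machine then executes. The standard library's dis module makes this intermediate representation visible:

    # Inspect the bytecode that CPython compiles a function to before
    # its virtual machine interprets it.
    import dis

    def add_one(x):
        return x + 1

    dis.dis(add_one)  # prints opcodes such as LOAD_FAST and RETURN_VALUE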

High-level language computer architecture

Alternatively, it is possible for a high-level language to be directly implemented by a computer – the computer directly executes the HLL code. This is known as a high-level language computer architecture – the computer architecture itself is designed to be targeted by a specific high-level language. The Burroughs large systems were target machines for ALGOL 60, for example.[12]

Related Research Articles

Computer programming or coding is the composition of sequences of instructions, called programs, that computers can follow to perform tasks. It involves designing and implementing algorithms, step-by-step specifications of procedures, by writing code in one or more programming languages. Programmers typically use high-level programming languages that are more easily intelligible to humans than machine code, which is directly executed by the central processing unit. Proficient programming usually requires expertise in several different subjects, including knowledge of the application domain, details of programming languages and generic code libraries, specialized algorithms, and formal logic.

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

A programming language is a system of notation for writing computer programs.

In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpreter generally uses one of the following strategies for program execution:

  1. Parse the source code and perform its behavior directly;
  2. Translate source code into some efficient intermediate representation or object code and immediately execute that;
  3. Explicitly execute stored precompiled bytecode made by a compiler and matched with the interpreter's virtual machine.

Bytecode is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references that encode the result of compiler parsing and performing semantic analysis of things like type, scope, and nesting depths of program objects.

A compiled language is a programming language whose implementations are typically compilers, and not interpreters.

In computer science, imperative programming is a programming paradigm of software that uses statements that change a program's state. In much the same way that the imperative mood in natural languages expresses commands, an imperative program consists of commands for the computer to perform. Imperative programming focuses on describing how a program operates step by step, rather than on high-level descriptions of its expected results.

The Burroughs Large Systems Group produced a family of large 48-bit mainframes using stack machine instruction sets with dense syllables. The first machine in the family was the B5000 in 1961, which was optimized for compiling ALGOL 60 programs extremely well, using single-pass compilers. The B5000 evolved into the B5500 and the B5700. Subsequent major redesigns include the B6500/B6700 line and its successors, as well as the separate B8500 line.

The history of programming languages spans from documentation of early mechanical computers to modern tools for software development. Early programming languages were highly specialized, relying on mathematical notation and similarly obscure syntax. Throughout the 20th century, research in compiler theory led to the creation of high-level programming languages, which use a more accessible syntax to communicate instructions.

In computer science, automatic programming is a type of computer programming in which some mechanism generates a computer program to allow human programmers to write the code at a higher abstraction level.

An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" IR must be accurate – capable of representing the source code without loss of information – and independent of any particular source or target language. An IR may take one of several forms: an in-memory data structure, or a special tuple- or stack-based code readable by the program. In the latter case it is also called an intermediate language.

Autocode is the name of a family of "simplified coding systems", later called programming languages, devised in the 1950s and 1960s for a series of digital computers at the Universities of Manchester, Cambridge and London. Autocode was a generic term; the autocodes for different machines were not necessarily closely related as are, for example, the different versions of the single language Fortran.

Programming languages are used for controlling the behavior of a machine. Like natural languages, programming languages follow rules for syntax and semantics.

In computer programming, a programming language implementation is a system for executing computer programs. There are two general approaches to programming language implementation: interpretation and compilation.

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

Programming languages have been classified into several programming language generations. Historically, this classification was used to indicate increasing power of programming styles. Later writers have somewhat redefined the meanings as distinctions previously seen as important became less significant to current practice.

A third-generation programming language (3GL) is a high-level computer programming language that tends to be more machine-independent and programmer-friendly than the machine code of the first generation and the assembly languages of the second generation, while having a less specific focus than the fourth and fifth generations. Examples of common and historical third-generation programming languages are ALGOL, BASIC, C, COBOL, Fortran, Java, and Pascal.

Java bytecode is the instruction set of the Java virtual machine (JVM), crucial for executing programs written in the Java language and other JVM-compatible languages. Each bytecode operation in the JVM is represented by a single byte, hence the name "bytecode", making it a compact form of instruction. This intermediate form enables Java programs to be platform-independent, as they are compiled not to native machine code but to a universally executable format across different JVM implementations.

A high-level language computer architecture (HLLCA) is a computer architecture designed to be targeted by a specific high-level programming language (HLL), rather than the architecture being dictated by hardware considerations. It is accordingly also termed language-directed computer design, coined in McKeeman (1967) and primarily used in the 1960s and 1970s. HLLCAs were popular in the 1960s and 1970s, but largely disappeared in the 1980s. This followed the dramatic failure of the Intel 432 (1981) and the emergence of optimizing compilers and reduced instruction set computer (RISC) architectures and RISC-like complex instruction set computer (CISC) architectures, and the later development of just-in-time compilation (JIT) for HLLs. A detailed survey and critique can be found in Ditzel & Patterson (1980).

References

  1. "HThreads - RD Glossary". Archived from the original on 26 August 2007.
  2. London, Keith (1968). "4, Programming". Introduction to Computers. London: Faber and Faber Limited. p. 184. ISBN 0571085938. "The 'high' level programming languages are often called autocodes and the processor program, a compiler."
  3. London, Keith (1968). "4, Programming". Introduction to Computers. London: Faber and Faber Limited. p. 186. ISBN 0571085938. "Two high level programming languages which can be used here as examples to illustrate the structure and purpose of autocodes are COBOL (Common Business Oriented Language) and FORTRAN (Formular Translation)."
  4. Giloi, Wolfgang K. (1997). "Konrad Zuse's Plankalkül: The First High-Level "non von Neumann" Programming Language". IEEE Annals of the History of Computing, vol. 19, no. 2, pp. 17–24, April–June 1997. (abstract)
  5. It lacked a notion of reference parameters, however, which could be a problem in some situations. Several successors, including ALGOL W, ALGOL 68, Simula, Pascal, Modula, and Ada, therefore included reference parameters (the related C-language family instead allowed addresses as value parameters).
  6. Surana P (2006). "Meta-Compilation of Language Abstractions" (PDF). Archived (PDF) from the original on 17 February 2015. Retrieved 17 March 2008.
  7. Kuketayev. "The Data Abstraction Penalty (DAP) Benchmark for Small Objects in Java". Archived from the original on 11 January 2009. Retrieved 17 March 2008.
  8. Chatzigeorgiou; Stephanides (2002). "Evaluating Performance and Power Of Object-Oriented Vs. Procedural Programming Languages". In Blieberger; Strohmeier (eds.). Proceedings - 7th International Conference on Reliable Software Technologies - Ada-Europe'2002. Springer. p. 367.
  9. Manuel Carro; José F. Morales; Henk L. Muller; G. Puebla; M. Hermenegildo (2006). "High-level languages for small devices: a case study" (PDF). Proceedings of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems. ACM.
  10. Kernighan, Brian W.; Ritchie, Dennis M. (1988). The C Programming Language (2nd ed.). Prentice Hall. ISBN 9780131103627. Archived from the original on 13 December 2018.
  11. Hyde, Randall (2010). The Art of Assembly Language (2nd ed.). San Francisco: No Starch Press. ISBN 9781593273019. OCLC 635507601.
  12. Chu, Yaohan (1975). "Concepts of High-Level Language Computer Architecture". High-Level Language Computer Architecture. Elsevier. pp. 1–14. doi:10.1016/b978-0-12-174150-1.50007-0. ISBN 9780121741501.