High-level language computer architecture


A high-level language computer architecture (HLLCA) is a computer architecture designed to be targeted by a specific high-level programming language (HLL), rather than being dictated by hardware considerations. It is accordingly also termed language-directed computer design, a term coined in McKeeman (1967) and used primarily in the 1960s and 1970s. HLLCAs were popular in that period but largely disappeared in the 1980s, following the dramatic failure of the Intel 432 (1981), the emergence of optimizing compilers, reduced instruction set computer (RISC) architectures, and RISC-like complex instruction set computer (CISC) architectures, and the later development of just-in-time compilation (JIT) for HLLs. A detailed survey and critique can be found in Ditzel & Patterson (1980).


HLLCAs date almost to the beginning of HLLs, in the Burroughs large systems (1961), which were designed for ALGOL 60 (1960), one of the first HLLs. The best known HLLCAs may be the Lisp machines of the 1970s and 1980s, for the language Lisp (1958). At present the most popular HLLCAs are Java processors, for the language Java (1995), and these are a qualified success, being used for certain applications. A recent architecture in this vein is the Heterogeneous System Architecture (2012), whose HSA Intermediate Layer (HSAIL) provides instruction set support for HLL features such as exceptions and virtual functions; this uses JIT to ensure performance.

Definition

There is a wide variety of systems under this heading. The most extreme example is a Directly Executed Language (DEL), where the instruction set architecture (ISA) of the computer equals the instructions of the HLL, and the source code is directly executable with minimal processing. In extreme cases, the only compiling needed is tokenizing the source code and feeding the tokens directly to the processor; this is found in stack-oriented programming languages running on a stack machine. For more conventional languages, the HLL statements are grouped into an instruction plus arguments, and infix order is transformed to prefix or postfix order. DELs are typically only hypothetical, though they were advocated in the 1970s. [1]
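
As a rough illustration of the two steps described above, the following Python sketch reorders infix tokens into postfix and then feeds the postfix tokens one at a time to a toy stack machine. The names `to_postfix`, `execute`, `OPS` and `PRECEDENCE` are hypothetical and chosen only for this example; the sketch does not model any actual DEL proposal.

```python
import operator

# Hypothetical operator table for a toy arithmetic language (an assumption
# made for illustration, not taken from any real DEL design).
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(tokens):
    """Reorder infix tokens into postfix order (shunting-yard algorithm)."""
    output, pending = [], []
    for tok in tokens:
        if tok in OPS:
            # Emit pending operators of equal or higher precedence first,
            # which makes all four operators left-associative.
            while (pending and pending[-1] in OPS
                   and PRECEDENCE[pending[-1]] >= PRECEDENCE[tok]):
                output.append(pending.pop())
            pending.append(tok)
        elif tok == "(":
            pending.append(tok)
        elif tok == ")":
            while pending[-1] != "(":
                output.append(pending.pop())
            pending.pop()  # discard the matching "("
        else:
            output.append(tok)  # operand goes straight to the output
    while pending:
        output.append(pending.pop())
    return output


def execute(postfix_tokens):
    """Feed postfix tokens directly to a toy stack machine."""
    stack = []
    for tok in postfix_tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))
    return stack.pop()


if __name__ == "__main__":
    program = to_postfix("( 2 + 3 ) * 4".split())
    print(program, execute(program))
```

In a stack-oriented source language the first step disappears, since the programmer already writes the tokens in postfix order.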

In less extreme examples, the source code is first parsed to bytecode, which is then the machine code that is passed to the processor. In these cases, the system typically lacks an assembler, as the compiler is deemed sufficient, though in some cases (such as Java), assemblers are used to produce legal bytecode which would not be output by the compiler. This approach was found in the Pascal MicroEngine (1979), and is currently used by Java processors.
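
A minimal sketch of this arrangement, assuming a toy stack bytecode with the invented opcodes `PUSH`, `ADD`, `SUB` and `MUL`: Python's standard `ast` module stands in for the compiler front end, and the `run` function stands in for the processor that executes the bytecode as its machine code. None of these opcode names correspond to actual p-code or Java bytecode.

```python
import ast

# Hypothetical opcode names for a toy p-code-like stack machine.
BINARY_OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_expr(source):
    """Compile an integer arithmetic expression to (opcode, operand) pairs."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPCODES:
            # Post-order walk: operands first, then the operator.
            emit(node.left)
            emit(node.right)
            code.append((BINARY_OPCODES[type(node.op)], None))
        else:
            raise SyntaxError("unsupported construct: " + ast.dump(node))

    emit(ast.parse(source, mode="eval").body)
    return code


def run(code):
    """Execute the bytecode; this plays the role of the processor."""
    stack = []
    for opcode, operand in code:
        if opcode == "PUSH":
            stack.append(operand)
            continue
        right = stack.pop()
        left = stack.pop()
        if opcode == "ADD":
            stack.append(left + right)
        elif opcode == "SUB":
            stack.append(left - right)
        elif opcode == "MUL":
            stack.append(left * right)
        else:
            raise ValueError("illegal opcode: " + opcode)
    return stack.pop()


if __name__ == "__main__":
    bytecode = compile_expr("1 + 2 * 3")
    print(bytecode)
    print(run(bytecode))
```

Because `run` accepts any list of pairs, a hand-written instruction sequence can be executed as well, which loosely mirrors the role of a bytecode assembler mentioned above.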

More loosely, an HLLCA may simply be a general-purpose computer architecture with some features specifically to support a given HLL or several HLLs. This was found in Lisp machines from the 1970s onward, which augmented general-purpose processors with operations specifically designed to support Lisp.

Examples

The Burroughs Large Systems (1961) were the first HLLCAs, designed to support ALGOL 60 (1960), one of the earliest HLLs. This was referred to at the time as "language-directed design." The Burroughs Medium Systems (1966) were designed to support COBOL for business applications. The Burroughs Small Systems (mid-1970s, designed from the late 1960s) were designed to support multiple HLLs by means of a writable control store. These were all mainframes.

The Wang 2200 series (1973) was designed with a BASIC interpreter in microcode.

The Pascal MicroEngine (1979) was designed for the UCSD Pascal form of Pascal, and used p-code (Pascal compiler bytecode) as its machine code. This was influential on the later development of Java and Java machines.

Lisp machines (1970s and 1980s) were a well-known and influential group of HLLCAs.

The Intel iAPX 432 (1981) was designed to support Ada. This was Intel's first 32-bit processor design, and was intended to be Intel's main processor family for the 1980s, but failed commercially.

Rekursiv (mid-1980s) was a minor system, designed to support object-oriented programming and the Lingo programming language in hardware, and supported recursion at the instruction set level, hence the name.

A number of processors and coprocessors intended to implement Prolog more directly were designed in the late 1980s and early 1990s, including the Berkeley VLSI-PLM, its successor (the PLUM), and a related microcode implementation. There were also a number of simulated designs that were not produced as hardware, such as those described in "A VHDL-based methodology for designing a Prolog processor" and "A Prolog coprocessor for superconductors". Like Lisp, Prolog's basic model of computation is radically different from standard imperative designs, and computer scientists and electrical engineers were eager to escape the bottlenecks caused by emulating their underlying models.

Niklaus Wirth's Lilith project included a custom CPU geared toward the Modula-2 language. [2]

The INMOS Transputer was designed to support concurrent programming, using occam.

The AT&T Hobbit processor, stemming from a design called CRISP (C-language Reduced Instruction Set Processor), was optimized to run C code.

In the late 1990s, there were plans by Sun Microsystems and other companies to build CPUs that directly (or closely) implemented the stack-based Java virtual machine. As a result, several Java processors have been built and used.

Ericsson developed ECOMP, a processor designed to run Erlang. [3] It was never commercially produced.

The HSA Intermediate Layer (HSAIL) of the Heterogeneous System Architecture (2012) provides a virtual instruction set to abstract away from the underlying ISAs, has support for HLL features such as exceptions and virtual functions, and includes debugging support.

Implementation

HLLCAs are frequently implemented via a stack machine (as in the Burroughs Large Systems and Intel 432), and frequently implement the HLL via microcode in the processor (as in the Burroughs Small Systems and Pascal MicroEngine). Tagged architectures are frequently used to support types (as in the Burroughs Large Systems and Lisp machines). More radical examples use a non-von Neumann architecture, though these are typically only hypothetical proposals, not actual implementations.
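
The idea behind a tagged architecture can be sketched in software as follows. Each memory word carries a tag alongside its value, and every operation checks the tags before proceeding, raising an exception where real hardware would trap. The tag values and the names `Word`, `TagTrap`, `add` and `load` are assumptions made for this illustration and do not reproduce the encoding of the Burroughs machines or of any Lisp machine.

```python
from dataclasses import dataclass

# Hypothetical tag values; real tagged machines define their own encodings.
TAG_INT = 0
TAG_POINTER = 1
TAG_CODE = 2


class TagTrap(Exception):
    """Raised where tagged hardware would trap on a type violation."""


@dataclass(frozen=True)
class Word:
    tag: int
    value: int


def add(a, b):
    """Integer add that refuses operands whose tag is not TAG_INT."""
    if a.tag != TAG_INT or b.tag != TAG_INT:
        raise TagTrap("ADD requires two integer words")
    return Word(TAG_INT, a.value + b.value)


def load(memory, address):
    """Memory load that refuses to dereference a non-pointer word."""
    if address.tag != TAG_POINTER:
        raise TagTrap("LOAD requires a pointer word")
    return memory[address.value]


if __name__ == "__main__":
    memory = [Word(TAG_INT, 40), Word(TAG_INT, 2)]
    total = add(load(memory, Word(TAG_POINTER, 0)),
                load(memory, Word(TAG_POINTER, 1)))
    print(total)
    try:
        add(total, Word(TAG_POINTER, 0))  # arithmetic on a pointer
    except TagTrap as trap:
        print("trap:", trap)
```

In hardware the check happens in parallel with the operation, so well-typed code pays little or nothing for it; the Python version only shows the logic.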

Application

Some HLLCAs have been particularly popular as developer machines (workstations), due to fast compiles and low-level control of the system with a high-level language. Pascal MicroEngine and Lisp machines are good examples of this.

HLLCAs have often been advocated when an HLL has a radically different model of computation than imperative programming (which is a relatively good match for typical processors), notably for functional programming (Lisp) and logic programming (Prolog).

Motivation

A detailed list of putative advantages is given in Ditzel & Patterson (1980).

HLLCAs are intuitively appealing, as the computer can in principle be customized for a language, allowing optimal support for the language, and simplifying compiler writing. It can further natively support multiple languages by simply changing the microcode. Key advantages are to developers: fast compilation and detailed symbolic debugging from the machine.

A further advantage is that a language implementation can be updated by updating the microcode (firmware), without requiring recompilation of an entire system. This is analogous to updating an interpreter for an interpreted language.

An advantage that has been reappearing since 2000 is safety or security. Mainstream IT has largely moved to languages with type and/or memory safety for most applications. [citation needed] The software these languages depend on, from operating systems to virtual machines, relies on native code with no such protection, and many vulnerabilities have been found in such code. One solution is to use a processor custom-built to execute a safe high-level language, or at least to understand types. Protections at the processor word level make an attacker's job more difficult than on low-level machines that see no distinction between scalar data, arrays, pointers, or code. Academics are also developing languages with similar properties that might integrate with high-level processors in the future. An example of both of these trends is the SAFE project. [4] Compare language-based systems, where the software (especially the operating system) is based around a safe, high-level language, though the hardware need not be: the "trusted base" may still be in a lower-level language.

Disadvantages

A detailed critique is given in Ditzel & Patterson (1980).

The simplest reason for the lack of success of HLLCAs is that from 1980 optimizing compilers resulted in much faster code and were easier to develop than implementing a language in microcode. Many compiler optimizations require complex analysis and rearrangement of the code, so the machine code is very different from the original source code. These optimizations are either impossible or impractical to implement in microcode, due to the complexity and the overhead. Analogous performance problems have a long history with interpreted languages (dating to Lisp (1958)), only being resolved adequately for practical use by just-in-time compilation, pioneered in Self and commercialized in the HotSpot Java virtual machine (1999).
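
To illustrate how even a simple optimization makes the emitted code diverge from the source, the following sketch applies constant folding to a toy stack code. The `(opcode, operand)` instruction format and the opcodes `PUSH`, `LOAD`, `ADD`, `SUB` and `MUL` are hypothetical; production compilers perform far more elaborate analyses than this peephole pass.

```python
import operator

# Hypothetical foldable opcodes for a toy stack code.
FOLDABLE = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def fold_constants(code):
    """Rewrite PUSH a, PUSH b, OP into a single PUSH of the computed result."""
    out = []
    for instr in code:
        opcode = instr[0]
        if (opcode in FOLDABLE and len(out) >= 2
                and out[-1][0] == "PUSH" and out[-2][0] == "PUSH"):
            # Both operands are known at compile time, so compute them now.
            right = out.pop()[1]
            left = out.pop()[1]
            out.append(("PUSH", FOLDABLE[opcode](left, right)))
        else:
            out.append(instr)
    return out


if __name__ == "__main__":
    # Stack code for the source expression x * (2 + 3).
    original = [("LOAD", "x"), ("PUSH", 2), ("PUSH", 3),
                ("ADD", None), ("MUL", None)]
    print(fold_constants(original))
```

After folding, the addition written in the source no longer exists in the instruction stream, so there is no longer a one-to-one correspondence between source operations and executed instructions.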

The fundamental problem is that HLLCAs only simplify the code generation step of compilers, which is typically a relatively small part of compilation, and a questionable use of computing power (transistors and microcode). At the minimum tokenization is needed, and typically syntactic analysis and basic semantic checks (unbound variables) will still be performed – so there is no benefit to the front end – and optimization requires ahead-of-time analysis – so there is no benefit to the middle end.

A deeper problem, still an active area of development as of 2014, [5] is that providing HLL debugging information from machine code is quite difficult, basically because of the overhead of debugging information, and more subtly because compilation (particularly optimization) makes determining the original source for a machine instruction quite involved. Thus the debugging information provided as an essential part of HLLCAs either severely limits implementation or adds significant overhead in ordinary use.

Further, HLLCAs are typically optimized for one language, supporting other languages more poorly. Similar issues arise in multi-language virtual machines, notably the Java virtual machine (designed for Java) and the .NET Common Language Runtime (designed for C#), where other languages are second-class citizens, and often must hew closely to the main language in semantics. For this reason lower-level ISAs allow multiple languages to be well-supported, given compiler support. However, a similar issue arises even for many apparently language-neutral processors, which are well-supported by the language C, and where transpiling to C (rather than directly targeting the hardware) yields efficient programs and simple compilers.

The advantages of HLLCAs can alternatively be achieved in HLL computer systems (language-based systems), primarily via compilers or interpreters: the system is still written in an HLL, but there is a trusted base in software running on a lower-level architecture. This has been the approach followed since circa 1980: for example, a Java system where the runtime environment itself is written in C, but the operating system and applications are written in Java.

Alternatives

Since the 1980s, the focus of research and implementation in general-purpose computer architectures has primarily been on RISC-like architectures: typically internally register-rich load–store architectures with rather stable, non-language-specific ISAs, featuring pipelining and, more recently, multicore systems. Language support has focused on compilers and their runtimes, and interpreters and their virtual machines (particularly JIT'ing ones), with little direct hardware support. For example, the current Objective-C runtime for iOS implements tagged pointers, which it uses for type-checking and garbage collection, despite the hardware not being a tagged architecture.
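
A software tagged pointer can be sketched as below. The encoding is an assumption made for illustration (a 64-bit word whose lowest bit marks an immediate integer, with aligned addresses leaving that bit clear) and is not the actual layout used by the Objective-C runtime; the names `box_int`, `is_int` and `unbox_int` are likewise hypothetical.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1
TAG_BITS = 1
INT_TAG = 1  # assumed encoding: low bit set means "immediate integer"
PAYLOAD_BITS = WORD_BITS - TAG_BITS


def box_int(n):
    """Pack a small signed integer into a word with the tag bit set."""
    limit = 1 << (PAYLOAD_BITS - 1)
    if not -limit <= n < limit:
        raise OverflowError("integer does not fit in the tagged payload")
    return ((n << TAG_BITS) | INT_TAG) & WORD_MASK


def is_int(word):
    """Check the tag bit; aligned addresses have a clear low bit."""
    return (word & 1) == INT_TAG


def unbox_int(word):
    """Recover the signed integer stored in a tagged word."""
    if not is_int(word):
        raise TypeError("word holds an address, not an immediate integer")
    payload = word >> TAG_BITS
    # Sign-extend the payload from PAYLOAD_BITS bits.
    if payload >= 1 << (PAYLOAD_BITS - 1):
        payload -= 1 << PAYLOAD_BITS
    return payload


if __name__ == "__main__":
    word = box_int(21)
    print(hex(word), is_int(word), unbox_int(word))
    print(is_int(0x1000))  # an aligned address is not an immediate integer
```

The type check here costs an explicit mask and compare in software, which is the work that a tagged architecture would perform in hardware.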

In computer architecture, the RISC approach has instead proven very popular and successful; it is the opposite of HLLCAs, emphasizing a very simple instruction set architecture. However, the speed advantages of RISC computers in the 1980s were primarily due to early adoption of on-chip cache and room for large registers, rather than intrinsic advantages of RISC. [citation needed]

References

  1. See Yaohan Chu references.
  2. "Pascal for Small Machines – History of Lilith". Pascal.hansotten.com. 28 September 2010. Archived from the original on 20 March 2012. Retrieved 12 November 2011.
  3. "ECOMP - an Erlang Processor". Archived from the original on 2021-04-24. Retrieved 2022-12-01.
  4. "SAFE Project". Archived from the original on 2019-10-22. Retrieved 2022-07-09.
  5. See LLVM and the Clang compiler.
