Glasgow Haskell Compiler

Last updated

The Glasgow Haskell Compiler
Original author(s) Kevin Hammond
Developer(s) The Glasgow Haskell Team [1]
Initial releaseDecember 1992 (1992-12) [2]
Stable release
9.0.1  OOjs UI icon edit-ltr-progressive.svg / 4 February 2021;7 days ago (4 February 2021) [3]
Repository OOjs UI icon edit-ltr-progressive.svg
Written in Haskell and C
Operating system Linux, OS X 10.7 Lion and later, iOS, Windows 2000 and later, FreeBSD, Solaris 10 and later
Platform x86, x86-64, ARM
Available inEnglish
Type Compiler
License New BSD License
Website www.haskell.org/ghc/

The Glasgow Haskell Compiler (GHC) is an open-source native code compiler for the functional programming language Haskell. [4] It provides a cross-platform environment for the writing and testing of Haskell code and it supports numerous extensions, libraries, and optimisations that streamline the process of generating and executing code. GHC is the most commonly used Haskell compiler. [5] The lead developers are Simon Peyton Jones and Simon Marlow.

Contents

History

GHC originally started in 1989 as a prototype, written in LML (Lazy ML) by Kevin Hammond at the University of Glasgow. Later that year, the prototype was completely rewritten in Haskell, except for its parser, by Cordelia Hall, Will Partain, and Simon Peyton Jones. Its first beta release was on 1 April 1991 and subsequent releases added a strictness analyzer as well as language extensions such as monadic I/O, mutable arrays, unboxed data types, concurrent and parallel programming models (such as software transactional memory and data parallelism) and a profiler. [2]

Peyton Jones, as well as Marlow, later moved to Microsoft Research in Cambridge, England, where they continued to be primarily responsible for developing GHC. GHC also contains code from more than three hundred other contributors. [1] Since 2009, third-party contributions to GHC have been funded by the Industrial Haskell Group. [6]

Architecture

GHC itself is written in Haskell, [7] but the runtime system for Haskell, essential to run programs, is written in C and C--.

GHC's front end—incorporating the lexer, parser and typechecker—is designed to preserve as much information about the source language as possible until after type inference is complete, toward the goal of providing clear error messages to users. [2] After type checking, the Haskell code is desugared into a typed intermediate language known as "Core" (based on System F, extended with let and case expressions). Recently, Core was extended to support generalized algebraic datatypes in its type system, and is now based on an extension to System F known as System FC. [8]

In the tradition of type-directed compilation, GHC's simplifier, or "middle end", where most of the optimizations implemented in GHC are performed, is structured as a series of source-to-source transformations on Core code. The analyses and transformations performed in this compiler stage include demand analysis (a generalization of strictness analysis), application of user-defined rewrite rules (including a set of rules included in GHC's standard libraries that performs foldr/build fusion), unfolding (called "inlining" in more traditional compilers), let-floating, an analysis that determines which function arguments can be unboxed, constructed product result analysis, specialization of overloaded functions, as well as a set of simpler local transformations such as constant folding and beta reduction. [9]

The back end of the compiler transforms Core code into an internal representation of C--, via an intermediate language STG (short for "Spineless Tagless G-machine"). [10] The C-- code can then take one of three routes: it is either printed as C code for compilation with GCC, converted directly into native machine code (the traditional "code generation" phase), or converted to LLVM virtual machine code for compilation with LLVM. In all three cases, the resultant native code is finally linked against the GHC runtime system to produce an executable.

Language

GHC complies with the language standards, both Haskell 98 [11] and Haskell 2010. [12] It also supports many optional extensions to the Haskell standard: for example, the software transactional memory (STM) library, which allows for Composable Memory Transactions.

Extensions to Haskell

A number of extensions to Haskell have been proposed. These extensions provide features not described in the language specification, or they redefine existing constructs. As such, each extension may not be supported by all Haskell implementations. There is an ongoing effort [13] to describe extensions and select those which will be included in future versions of the language specification.

The extensions [14] supported by the Glasgow Haskell Compiler include:

Type system extensions

An expressive static type system is one of the major defining features of Haskell. Accordingly, much of the work in extending the language has been directed towards types and type classes.

The Glasgow Haskell Compiler supports an extended type system based on the theoretical System FC. [8] Major extensions to the type system include:

Extensions relating to type classes include:

Portability

Versions of GHC are available for several platforms, including Windows and most varieties of Unix (such as Linux, FreeBSD, OpenBSD, and macOS). [17] GHC has also been ported to several different processor architectures. [17]

See also

Related Research Articles

ML is a general-purpose functional programming language. ML is statically-scoped. It is known for its use of the polymorphic Hindley–Milner type system, which automatically assigns the types of most expressions without requiring explicit type annotations, and ensures type safety – there is a formal proof that a well-typed ML program does not cause runtime type errors. ML provides pattern matching for function arguments, garbage collection, imperative programming, call-by-value and currying. It is used heavily in programming language research and is one of the few languages to be completely specified and verified using formal semantics. Its types and pattern matching make it well-suited and commonly used to operate on other formal languages, such as in compiler writing, automated theorem proving, and formal verification.

Generic programming is a style of computer programming in which algorithms are written in terms of types to-be-specified-later that are then instantiated when needed for specific types provided as parameters. This approach, pioneered by the ML programming language in 1973, permits writing common functions or types that differ only in the set of types on which they operate when used, thus reducing duplication. Such software entities are known as generics in Ada, C#, Delphi, Eiffel, F#, Java, Nim, Python, Rust, Swift, TypeScript and Visual Basic .NET. They are known as parametric polymorphism in ML, Scala, Julia, and Haskell ; templates in C++ and D; and parameterized types in the influential 1994 book Design Patterns.

Template metaprogramming (TMP) is a metaprogramming technique in which templates are used by a compiler to generate temporary source code, which is merged by the compiler with the rest of the source code and then compiled. The output of these templates include compile-time constants, data structures, and complete functions. The use of templates can be thought of as compile-time polymorphism. The technique is used by a number of languages, the best-known being C++, but also Curl, D, and XL.

Programming languages can be grouped by the number and types of paradigms supported.

In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be a match." The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations of a pattern within a token sequence, to output some component of the matched pattern, and to substitute the matching pattern with some other token sequence.

In computer programming, especially functional programming and type theory, an algebraic data type is a kind of composite type, i.e., a type formed by combining other types.

Template Haskell is an experimental language extension to the Haskell programming language implemented in the Glasgow Haskell Compiler. In early incarnations it was also known as Template Meta-Haskell.

In functional programming, a monad is an abstraction that allows structuring programs generically. Supporting languages may use monads to abstract away boilerplate code needed by the program logic. Monads achieve this by providing their own data type, which represents a specific form of computation, along with one procedure to wrap values of any basic type within the monad and another to compose functions that output monadic values.

C-- is a C-like programming language. Its creators, functional programming researchers Simon Peyton Jones and Norman Ramsey, designed it to be generated mainly by compilers for very high-level languages rather than written by human programmers. Unlike many other intermediate languages, its representation is plain ASCII text, not bytecode or another binary format.

Simon Peyton Jones British computer scientist

Simon Peyton Jones is a British computer scientist who researches the implementation and applications of functional programming languages, particularly lazy functional programming.

In computer science, a type class is a type system construct that supports ad hoc polymorphism. This is achieved by adding constraints to type variables in parametrically polymorphic types. Such a constraint typically involves a type class T and a type variable a, and means that a can only be instantiated to a type whose members support the overloaded operations associated with T.

In functional programming, a generalized algebraic data type is a generalization of parametric algebraic data types.

This article describes the features in Haskell.

Haskell is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Developed to be suitable for teaching, research and industrial application, Haskell has pioneered a number of advanced programming language features such as type classes, which enable type-safe operator overloading. Haskell's main implementation is the Glasgow Haskell Compiler (GHC). It is named after logician Haskell Curry.

The following outline is provided as an overview of and topical guide to C++:

Yesod is a free and open-source web framework based on Haskell for productive development of type-safe, REST model based, high performance web applications, developed by Michael Snoyman et al.

In computer science, a type family associates data types with other data types, using a type-level function defined by an open-ended collection of valid instances of input types and the corresponding output types.

Ur also called Ur/Web is a Free and Open source functional programming language specific for web development, created by Adam Chlipala at the Massachusetts Institute of Technology that from a single program produces server code, browser client code and SQL code specific for the chosen database backend.

PureScript

PureScript is a strongly-typed, purely-functional programming language that compiles to JavaScript. It can be used to develop web applications, server side apps, and also desktop applications with use of Electron. Its syntax is mostly comparable to that of Haskell. In addition, it introduces row polymorphism and extensible records. Also, contrary to Haskell, PureScript adheres to a strict evaluation strategy.

References

  1. 1 2 "The GHC Team". Haskell.org. Retrieved 1 September 2016.
  2. 1 2 3 Hudak, P.; Hughes, J.; Peyton Jones, S.; Wadler, P. (June 2007). "A History of Haskell: Being Lazy With Class" (PDF). Proc. Third ACM SIGPLAN History of Programming Languages Conference (HOPL-III). Retrieved 1 September 2016.
  3. "Download — The Glasgow Haskell Compiler". Haskell.org.
  4. "The Glorious Glasgow Haskell Compilation System User's Guide". Haskell.org. Retrieved 27 July 2014.
  5. "2017 state of Haskell survey results". taylor.fausak.me. 15 November 2017. Retrieved 11 December 2017.
  6. "Industrial Haskell Group". Haskell.org. 2014. Retrieved 1 September 2016.
  7. "GHC Commentary: The Compiler". Haskell.org. 23 March 2016. Archived from the original on 23 March 2016. Retrieved 26 May 2016.
  8. 1 2 Sulzmann, M.; Chakravarty, M. M. T.; Peyton Jones, S.; Donnelly, K. (January 2007). "System F with Type Equality Coercions". Proc. ACM Workshop on Types in Language Design and Implementation (TLDI).
  9. Peyton Jones, S. (April 1996). "Compiling Haskell by program transformation: a report from the trenches". Proc. European Symposium on Programming (ESOP).
  10. Peyton Jones, S. (April 1992). "Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine, Version 2.5". Journal of Functional Programming. 2 (2): 127–202. doi:10.1017/S0956796800000319.
  11. "Haskell 98 Language and Libraries: The Revised Report". Haskell.org. Retrieved 28 January 2007.
  12. "Haskell 2010 Language Report". Haskell.org. Retrieved 30 August 2012.
  13. "Welcome to Haskell' (Haskell Prime)". Haskell.org. Retrieved 26 May 2016.
  14. "GHC Language Features". Haskell.org. Retrieved 25 May 2016.
  15. Coutts, D.; Leshchinskiy, R.; Stewart, D. (April 2007). "Stream Fusion: From Lists to Streams to Nothing at All". Proc. ACM SIGPLAN International Conference on Functional Programming (ICFP). Archived from the original on 23 September 2007.
  16. Mitchell, Neil; Fletcher, Shayne (3 May 2020). "Record Dot Syntax". ghc-proposals. GitHub . Retrieved 30 June 2020.
  17. 1 2 Platforms at gitlab.haskell.org