ML (programming language)

Last updated
ML
Paradigm Multi-paradigm: functional, imperative
Designed by Robin Milner and others at the University of Edinburgh
First appeared1973;48 years ago (1973)
Typing discipline Inferred, static, strong
Dialects
OCaml, Standard ML, F#
Influenced by
ISWIM
Influenced
Clojure, Coq, Cyclone, C++, Elm, F#, F*, Haskell, Idris, Kotlin, Miranda, Nemerle, OCaml, Opa, Erlang, Rust, Scala, Standard ML

ML (Meta Language) is a general-purpose functional programming language. It is known for its use of the polymorphic Hindley–Milner type system, which automatically assigns the types of most expressions without requiring explicit type annotations, and ensures type safety there is a formal proof that a well-typed ML program does not cause runtime type errors. [1] ML provides pattern matching for function arguments, garbage collection, imperative programming, call-by-value and currying. It is used heavily in programming language research and is one of the few languages to be completely specified and verified using formal semantics. Its types and pattern matching make it well-suited and commonly used to operate on other formal languages, such as in compiler writing, automated theorem proving, and formal verification.

Contents

Overview

Features of ML include a call-by-value evaluation strategy, first-class functions, automatic memory management through garbage collection, parametric polymorphism, static typing, type inference, algebraic data types, pattern matching, and exception handling. ML uses static scoping rules.[ citation needed ]

ML can be referred to as an impure functional language, because although it encourages functional programming, it does allow side-effects (like languages such as Lisp, but unlike a purely functional language such as Haskell). Like most programming languages, ML uses eager evaluation, meaning that all subexpressions are always evaluated, though lazy evaluation can be achieved through the use of closures. Thus one can create and use infinite streams as in Haskell, but their expression is indirect.

ML's strengths are mostly applied in language design and manipulation (compilers, analyzers, theorem provers), but it is a general-purpose language also used in bioinformatics and financial systems.

ML was developed by Robin Milner and others in the early 1970s at the University of Edinburgh, [2] and its syntax is inspired by ISWIM. Historically, ML was conceived to develop proof tactics in the LCF theorem prover (whose language, pplambda, a combination of the first-order predicate calculus and the simply-typed polymorphic lambda calculus, had ML as its metalanguage).

Today there are several languages in the ML family; the three most prominent are Standard ML (SML), OCaml and F#. Ideas from ML have influenced numerous other languages, like Haskell, Cyclone, Nemerle, [3] ATS, and Elm. [4]

Examples

The following examples use the syntax of Standard ML. Other ML dialects such as OCaml and F# differ in small ways.

Factorial

The factorial function expressed as pure ML:

funfac(0:int):int=1|fac(n:int):int=n*fac(n-1)

This describes the factorial as a recursive function, with a single terminating base case. It is similar to the descriptions of factorials found in mathematics textbooks. Much of ML code is similar to mathematics in facility and syntax.

Part of the definition shown is optional, and describes the types of this function. The notation E : t can be read as expression E has type t. For instance, the argument n is assigned type integer (int), and fac (n : int), the result of applying fac to the integer n, also has type integer. The function fac as a whole then has type function from integer to integer (int -> int), that is, fac accepts an integer as an argument and returns an integer result. Thanks to type inference, the type annotations can be omitted and will be derived by the compiler. Rewritten without the type annotations, the example looks like:

funfac0=1|facn=n*fac(n-1)

The function also relies on pattern matching, an important part of ML programming. Note that parameters of a function are not necessarily in parentheses but separated by spaces. When the function's argument is 0 (zero) it will return the integer 1 (one). For all other cases the second line is tried. This is the recursion, and executes the function again until the base case is reached.

This implementation of the factorial function is not guaranteed to terminate, since a negative argument causes an infinite descending chain of recursive calls. A more robust implementation would check for a nonnegative argument before recursing, as follows:

funfactn=letfunfac0=1|facn=n*fac(n-1)inif(n<0)thenraiseDomainelsefacnend

The problematic case (when n is negative) demonstrates a use of ML's exception system.

The function can be improved further by writing its inner loop as a tail call, such that the call stack need not grow in proportion to the number of function calls. This is achieved by adding an extra, accumulator, parameter to the inner function. At last, we arrive at

funfactn=letfunfac0acc=acc|facnacc=fac(n-1)(n*acc)inif(n<0)thenraiseDomainelsefacn1end

List reverse

The following function reverses the elements in a list. More precisely, it returns a new list whose elements are in reverse order compared to the given list.

funreverse[]=[]|reverse(x::xs)=(reversexs)@[x]

This implementation of reverse, while correct and clear, is inefficient, requiring quadratic time for execution. The function can be rewritten to execute in linear time:

fun'areversexs:'alist=List.foldl(op::)[]xs

Notably, this function is an example of parametric polymorphism. That is, it can consume lists whose elements have any type, and return lists of the same type.

Modules

Modules are ML's system for structuring large projects and libraries. A module consists of a signature file and one or more structure files. The signature file specifies the API to be implemented (like a C header file, or Java interface file). The structure implements the signature (like a C source file or Java class file). For example, the following define an Arithmetic signature and an implementation of it using Rational numbers:

signatureARITH=sigtypetvalzero:tvalsucc:t->tvalsum:t*t->tend
structureRational:ARITH=structdatatypet=Ratofint*intvalzero=Rat(0,1)funsucc(Rat(a,b))=Rat(a+b,b)funsum(Rat(a,b),Rat(c,d))=Rat(a*d+c*b,b*d)end

These are imported into the interpreter by the 'use' command. Interaction with the implementation is only allowed via the signature functions, for example it is not possible to create a 'Rat' data object directly via this code. The 'structure' block hides all the implementation detail from outside.

ML's standard libraries are implemented as modules in this way.

See also

Related Research Articles

In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that map values to other values, rather than a sequence of imperative statements which update the running state of the program.

OCaml is a general-purpose, multi-paradigm programming language which extends the Caml dialect of ML with object-oriented features. OCaml was created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez, and others.

Miranda (programming language)

Miranda is a lazy, purely functional programming language designed by David Turner as a successor to his earlier programming languages SASL and KRC, using some concepts from ML and Hope. It was produced by Research Software Ltd. of England and was the first purely functional language to be commercially supported.

Standard ML (SML) is a general-purpose modular functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of theorem provers.

Generic programming is a style of computer programming in which algorithms are written in terms of types to-be-specified-later that are then instantiated when needed for specific types provided as parameters. This approach, pioneered by the ML programming language in 1973, permits writing common functions or types that differ only in the set of types on which they operate when used, thus reducing duplication. Such software entities are known as generics in Ada, C#, Delphi, Eiffel, F#, Java, Nim, Python, Rust, Swift, TypeScript and Visual Basic .NET. They are known as parametric polymorphism in ML, Scala, Julia, and Haskell ; templates in C++ and D; and parameterized types in the influential 1994 book Design Patterns.

Clean (programming language)

Clean is a general-purpose purely functional computer programming language. For much of the language's active development history it was called Concurrent Clean, but this was dropped at some point. Clean is being developed by a group of researchers from the Radboud University in Nijmegen since 1987.

F Sharp (programming language) Microsoft programming language

F# is a functional-first, general purpose, strongly typed, multi-paradigm programming language that encompasses functional, imperative, and object-oriented programming methods. F# is most often used as a cross-platform Common Language Infrastructure (CLI) language on .NET, but it can also generate JavaScript and graphics processing unit (GPU) code.

In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be a match." The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations of a pattern within a token sequence, to output some component of the matched pattern, and to substitute the matching pattern with some other token sequence.

Type inference refers to the automatic detection of the type of an expression in a formal language. These include programming languages and mathematical type systems, but also natural languages in some branches of computer science and linguistics.

In computer programming, especially functional programming and type theory, an algebraic data type is a kind of composite type, i.e., a type formed by combining other types.

In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as the value itself, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leaves. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

In computer science, corecursion is a type of operation that is dual to recursion. Whereas recursion works analytically, starting on data further from a base case and breaking it down into smaller data and repeating until one reaches a base case, corecursion works synthetically, starting from a base case and building it up, iteratively producing data further removed from a base case. Put simply, corecursive algorithms use the data that they themselves produce, bit by bit, as they become available, and needed, to produce further bits of data. A similar but distinct concept is generative recursion which may lack a definite "direction" inherent in corecursion and recursion.

Caml

Caml is a multi-paradigm, general-purpose programming language which is a dialect of the ML programming language family. Caml was developed in France at INRIA and ENS.

In computer science, a type class is a type system construct that supports ad hoc polymorphism. This is achieved by adding constraints to type variables in parametrically polymorphic types. Such a constraint typically involves a type class T and a type variable a, and means that a can only be instantiated to a type whose members support the overloaded operations associated with T.

In functional programming, a generalized algebraic data type is a generalization of parametric algebraic data types.

In computer programming, an anonymous function is a function definition that is not bound to an identifier. Anonymous functions are often arguments being passed to higher-order functions, or used for constructing the result of a higher-order function that needs to return a function. If the function is only used once, or a limited number of times, an anonymous function may be syntactically lighter than using a named function. Anonymous functions are ubiquitous in functional programming languages and other languages with first-class functions, where they fulfill the same role for the function type as literals do for other data types.

In functional programming, filter is a higher-order function that processes a data structure in some order to produce a new data structure containing exactly those elements of the original data structure for which a given predicate returns the boolean value true.

In computer science, polymorphic recursion refers to a recursive parametrically polymorphic function where the type parameter changes with each recursive invocation made, instead of staying constant. Type inference for polymorphic recursion is equivalent to semi-unification and therefore undecidable and requires the use of a semi-algorithm or programmer supplied type annotations.

This article describes the features in Haskell.

Haskell is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial application, Haskell has pioneered a number of advanced programming language features such as type classes, which enable type-safe operator overloading. Haskell's main implementation is the Glasgow Haskell Compiler (GHC). It is named after logician Haskell Curry.

References

  1. Robin Milner. A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17(3):348–375, 1978.
  2. Gordon, Michael J. C. (1996). "From LCF to HOL: a short history" . Retrieved 2007-10-11.
  3. Programming language for "special forces" of developers, Russian Software Development Network: Nemerle Project Team, retrieved January 24, 2021
  4. Tate, Bruce A.; Daoud, Fred; Dees, Ian; Moffitt, Jack (2014). "3. Elm". Seven More Languages in Seven Weeks (Book version: P1.0-November 2014 ed.). The Pragmatic Programmers, LLC. pp. 97, 101. ISBN   978-1-941222-15-7. On page 101, Elm creator Evan Czaplicki says: 'I tend to say "Elm is an ML-family language" to get at the shared heritage of all these languages.' ["these languages" is referring to Haskell, OCaml, SML, and F#.]
  5. "OCaml is an industrial strength programming language supporting functional, imperative and object-oriented styles". Retrieved on January 2, 2018.

Further reading