Paradigm | Multi-paradigm: functional, imperative, modular [1] |
---|---|
Family | ML |
First appeared | 1983[2] |
Stable release | Standard ML '97 [2] / 1997 |
Typing discipline | Inferred, static, strong |
Filename extensions | .sml |
Website | smlfamily |
Major implementations | SML/NJ, MLton, Poly/ML |
Dialects | Alice, Concurrent ML, Dependent ML |
Influenced by | ML, Hope, Pascal |
Influenced | Elm, F#, F*, Haskell, OCaml, Python,[3] Rust,[4] Scala |
Standard ML (SML) is a general-purpose, high-level, modular, functional programming language with compile-time type checking and type inference. It is popular for writing compilers, for programming language research, and for developing theorem provers.
Standard ML is a modern dialect of ML, the language used in the Logic for Computable Functions (LCF) theorem-proving project. It is distinctive among widely used languages in that it has a formal specification, given as typing rules and operational semantics in The Definition of Standard ML. [5]
Standard ML is a functional programming language with some impure features. Programs written in Standard ML consist of expressions in contrast to statements or commands, although some expressions of type unit are only evaluated for their side-effects.
As in all functional languages, a key feature of Standard ML is the function, which is used for abstraction. The factorial function can be expressed as follows:
fun factorial n =
    if n = 0 then 1 else n * factorial (n - 1)
An SML compiler must infer the static type val factorial : int -> int without user-supplied type annotations. It has to deduce that n is only used with integer expressions, and must therefore itself be an integer, and that all terminal expressions are integer expressions.
The same function can be expressed with clausal function definitions where the if-then-else conditional is replaced with templates of the factorial function evaluated for specific values:
fun factorial 0 = 1
  | factorial n = n * factorial (n - 1)
or iteratively:
fun factorial n = let val i = ref n and acc = ref 1 in
    while !i > 0 do (acc := !acc * !i; i := !i - 1); !acc
end
or as a lambda function:
val rec factorial = fn 0 => 1 | n => n * factorial (n - 1)
Here, the keyword val introduces a binding of an identifier to a value, fn introduces an anonymous function, and rec allows the definition to be self-referential.
The encapsulation of an invariant-preserving tail-recursive tight loop with one or more accumulator parameters within an invariant-free outer function, as seen here, is a common idiom in Standard ML.
Using a local function, it can be rewritten in a more efficient tail-recursive style:
local
    fun loop (0, acc) = acc
      | loop (m, acc) = loop (m - 1, m * acc)
in
    fun factorial n = loop (n, 1)
end
A type synonym is defined with the keyword type. Here is a type synonym for points on a plane, and functions computing the distances between two points, and the area of a triangle with the given corners as per Heron's formula. (These definitions will be used in subsequent examples.)
type loc = real * real

fun square (x : real) = x * x

fun dist (x, y) (x', y') =
    Math.sqrt (square (x' - x) + square (y' - y))

fun heron (a, b, c) = let
    val x = dist a b
    val y = dist b c
    val z = dist a c
    val s = (x + y + z) / 2.0
    in
        Math.sqrt (s * (s - x) * (s - y) * (s - z))
    end
Standard ML provides strong support for algebraic datatypes (ADTs). A datatype can be thought of as a disjoint union of tuples (or a "sum of products"). Datatypes are easy to define and easy to use, largely because of pattern matching and the pattern-exhaustiveness and pattern-redundancy checking that most Standard ML implementations perform.
In object-oriented programming languages, a disjoint union can be expressed as class hierarchies. However, in contrast to class hierarchies, ADTs are closed. Thus, the extensibility of ADTs is orthogonal to the extensibility of class hierarchies. Class hierarchies can be extended with new subclasses which implement the same interface, while the functions of ADTs can be extended for the fixed set of constructors. See expression problem.
A datatype is defined with the keyword datatype, as in:
datatype shape
    = Circle   of loc * real      (* center and radius *)
    | Square   of loc * real      (* upper-left corner and side length; axis-aligned *)
    | Triangle of loc * loc * loc (* corners *)
Note that a type synonym cannot be recursive; datatypes are necessary to define recursive constructors. (This is not at issue in this example.)
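As a minimal sketch of the recursive case, the datatype below refers to itself in one of its constructors, which a type synonym could not do. The names tree, Leaf, Node and size are hypothetical and are not used elsewhere in this article.

```sml
(* A hypothetical binary tree of integers; the datatype mentions itself. *)
datatype tree = Leaf | Node of tree * int * tree

(* Count the Node constructors by structural recursion. *)
fun size Leaf = 0
  | size (Node (l, _, r)) = size l + 1 + size r

val t = Node (Node (Leaf, 1, Leaf), 2, Leaf)
val n = size t  (* two Node constructors *)
```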
Patterns are matched in the order in which they are defined. C programmers can use tagged unions, dispatching on tag values, to do what ML does with datatypes and pattern matching. Nevertheless, while a C program decorated with appropriate checks will, in a sense, be as robust as the corresponding ML program, those checks will of necessity be dynamic; ML's static checks provide strong guarantees about the correctness of the program at compile time.
Function arguments can be defined as patterns as follows:
fun area (Circle (_, r)) = Math.pi * square r
  | area (Square (_, s)) = square s
  | area (Triangle p) = heron p (* see above *)
The so-called "clausal form" of function definition, where arguments are defined as patterns, is merely syntactic sugar for a case expression:
fun area shape = case shape of
    Circle (_, r) => Math.pi * square r
  | Square (_, s) => square s
  | Triangle p => heron p
Pattern-exhaustiveness checking will make sure that each constructor of the datatype is matched by at least one pattern.
The following pattern is not exhaustive:
fun center (Circle (c, _)) = c
  | center (Square ((x, y), s)) = (x + s / 2.0, y + s / 2.0)
There is no pattern for the Triangle case in the center function. The compiler will issue a warning that the case expression is not exhaustive, and if a Triangle is passed to this function at runtime, exception Match will be raised.
The pattern in the second clause of the following (meaningless) function is redundant:
fun f (Circle ((x, y), r)) = x + y
  | f (Circle _) = 1.0
  | f _ = 0.0
Any value that would match the pattern in the second clause would also match the pattern in the first clause, so the second clause is unreachable. Therefore, this definition as a whole exhibits redundancy, and causes a compile-time warning.
The following function definition is exhaustive and not redundant:
val hasCorners = fn (Circle _) => false | _ => true
If control gets past the first pattern (Circle), we know the shape must be either a Square or a Triangle. In either of those cases, we know the shape has corners, so we can return true without discerning the actual shape.
Functions can consume functions as arguments:
fun map f (x, y) = (f x, f y)
Functions can produce functions as return values:
fun constant k = (fn _ => k)
Functions can also both consume and produce functions:
fun compose (f, g) = (fn x => f (g x))
The function List.map from the basis library is one of the most commonly used higher-order functions in Standard ML:
fun map _ [] = []
  | map f (x :: xs) = f x :: map f xs
A more efficient implementation with tail-recursive List.foldl:
fun map f = List.rev o List.foldl (fn (x, acc) => f x :: acc) []
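The following sketch shows how such higher-order functions might be combined in practice. It repeats constant and compose so that it stands alone; the helper names inc, double and incThenDouble are hypothetical and chosen only for illustration.

```sml
fun constant k = (fn _ => k)
fun compose (f, g) = (fn x => f (g x))

(* Two hypothetical first-order functions to combine. *)
val inc = fn x => x + 1
val double = fn x => x * 2

(* compose (f, g) applies g first, then f. *)
val incThenDouble = compose (double, inc)

val a = incThenDouble 3                   (* double (inc 3) *)
val b = constant 7 "ignored"              (* the argument is discarded *)
val c = List.map incThenDouble [1, 2, 3]  (* applied element-wise *)
```

Here constant 7 is applied directly rather than bound with val, since a val binding of that application would not be generalized under the value restriction.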
Exceptions are raised with the keyword raise and handled with the pattern matching handle construct. The exception system can implement non-local exit; this optimization technique is suitable for functions like the following.
local
    exception Zero;
    val p = fn (0, _) => raise Zero | (a, b) => a * b
in
    fun prod xs = List.foldl p 1 xs handle Zero => 0
end
When exception Zero is raised, control leaves the function List.foldl altogether. Consider the alternative: the value 0 would be returned, it would be multiplied by the next integer in the list, the resulting value (inevitably 0) would be returned, and so on. The raising of the exception allows control to skip over the entire chain of frames and avoid the associated computation. Note the use of the underscore (_) as a wildcard pattern.
The same optimization can be obtained with a tail call.
local
    fun p a (0 :: _) = 0
      | p a (x :: xs) = p (a * x) xs
      | p a [] = a
in
    val prod = p 1
end
Standard ML's advanced module system allows programs to be decomposed into hierarchically organized structures of logically related type and value definitions. Modules provide not only namespace control but also abstraction, in the sense that they allow the definition of abstract data types. Three main syntactic constructs comprise the module system: signatures, structures and functors.
A signature is an interface, usually thought of as a type for a structure; it specifies the names of all entities provided by the structure, the arity of each type component, the type of each value component, and the signature of each substructure. The definitions of type components are optional; type components whose definitions are hidden are abstract types.
For example, the signature for a queue may be:
signature QUEUE = sig
    type 'a queue
    exception QueueError;
    val empty     : 'a queue
    val isEmpty   : 'a queue -> bool
    val singleton : 'a -> 'a queue
    val fromList  : 'a list -> 'a queue
    val insert    : 'a * 'a queue -> 'a queue
    val peek      : 'a queue -> 'a
    val remove    : 'a queue -> 'a * 'a queue
end
This signature describes a module that provides a polymorphic type 'a queue, exception QueueError, and values that define basic operations on queues.
A structure is a module; it consists of a collection of types, exceptions, values and structures (called substructures) packaged together into a logical unit.
A queue structure can be implemented as follows:
structure TwoListQueue :> QUEUE = struct
    type 'a queue = 'a list * 'a list

    exception QueueError;

    val empty = ([], [])

    fun isEmpty ([], []) = true
      | isEmpty _ = false

    fun singleton a = ([], [a])

    fun fromList a = ([], a)

    fun insert (a, ([], [])) = singleton a
      | insert (a, (ins, outs)) = (a :: ins, outs)

    fun peek (_, []) = raise QueueError
      | peek (ins, outs) = List.hd outs

    fun remove (_, []) = raise QueueError
      | remove (ins, [a]) = (a, ([], List.rev ins))
      | remove (ins, a :: outs) = (a, (ins, outs))
end
This definition declares that structure TwoListQueue implements signature QUEUE. Furthermore, the opaque ascription denoted by :> states that any types which are not defined in the signature (i.e. type 'a queue) should be abstract, meaning that the definition of a queue as a pair of lists is not visible outside the module. The structure implements all of the definitions in the signature.
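The effect of opaque ascription can be illustrated with a much smaller module. This is only a sketch: the signature COUNTER and structure Counter are hypothetical names introduced here, not part of the queue example.

```sml
signature COUNTER = sig
    type t
    val zero  : t
    val next  : t -> t
    val toInt : t -> int
end

(* Opaque ascription (:>): outside the structure, t is abstract. *)
structure Counter :> COUNTER = struct
    type t = int
    val zero = 0
    fun next n = n + 1
    fun toInt n = n
end

(* Clients can only go through the operations named in the signature. *)
val two = Counter.toInt (Counter.next (Counter.next Counter.zero))

(* The following would be rejected by the type checker, because
 * Counter.t is not known to be int outside the structure:
 *
 *   val bad = Counter.zero + 1
 *)
```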
The types and values in a structure can be accessed with "dot notation":
val q : string TwoListQueue.queue = TwoListQueue.empty
val q' = TwoListQueue.insert (Real.toString Math.pi, q)
A functor is a function from structures to structures; that is, a functor accepts one or more arguments, which are usually structures of a given signature, and produces a structure as its result. Functors are used to implement generic data structures and algorithms.
One popular algorithm [6] for breadth-first search of trees makes use of queues. Here is a version of that algorithm parameterized over an abstract queue structure:
(* after Okasaki, ICFP, 2000 *)
functor BFS (Q: QUEUE) = struct
    datatype 'a tree = E | T of 'a * 'a tree * 'a tree

    local
        fun bfsQ q = if Q.isEmpty q then [] else search (Q.remove q)
        and search (E, q) = bfsQ q
          | search (T (x, l, r), q) = x :: bfsQ (insert (insert q l) r)
        and insert q a = Q.insert (a, q)
    in
        fun bfs t = bfsQ (Q.singleton t)
    end
end

structure QueueBFS = BFS (TwoListQueue)
Within functor BFS, the representation of the queue is not visible. More concretely, there is no way to select the first list in the two-list queue, if that is indeed the representation being used. This data abstraction mechanism makes the breadth-first search truly agnostic to the queue's implementation. This is in general desirable; in this case, the queue structure can safely maintain any logical invariants on which its correctness depends behind the bulletproof wall of abstraction.
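To make the implementation-agnostic point concrete, the sketch below applies the same functor to a different queue. ListQueue is a hypothetical, deliberately naive single-list implementation (its insert is linear-time) that is not part of the original example; the signature and functor are repeated so the block stands alone.

```sml
signature QUEUE = sig
    type 'a queue
    exception QueueError;
    val empty     : 'a queue
    val isEmpty   : 'a queue -> bool
    val singleton : 'a -> 'a queue
    val fromList  : 'a list -> 'a queue
    val insert    : 'a * 'a queue -> 'a queue
    val peek      : 'a queue -> 'a
    val remove    : 'a queue -> 'a * 'a queue
end

(* Hypothetical alternative: a queue kept as one list, front first. *)
structure ListQueue :> QUEUE = struct
    type 'a queue = 'a list
    exception QueueError;
    val empty = []
    fun isEmpty q = List.null q
    fun singleton a = [a]
    fun fromList xs = xs
    fun insert (a, q) = q @ [a]
    fun peek [] = raise QueueError
      | peek (a :: _) = a
    fun remove [] = raise QueueError
      | remove (a :: q) = (a, q)
end

functor BFS (Q: QUEUE) = struct
    datatype 'a tree = E | T of 'a * 'a tree * 'a tree

    local
        fun bfsQ q = if Q.isEmpty q then [] else search (Q.remove q)
        and search (E, q) = bfsQ q
          | search (T (x, l, r), q) = x :: bfsQ (insert (insert q l) r)
        and insert q a = Q.insert (a, q)
    in
        fun bfs t = bfsQ (Q.singleton t)
    end
end

structure B = BFS (ListQueue)

(* Root 1 with children 2 and 3; node 3 has left child 4. *)
val t = B.T (1, B.T (2, B.E, B.E), B.T (3, B.T (4, B.E, B.E), B.E))
val order = B.bfs t  (* visits level by level: [1, 2, 3, 4] *)
```

The functor body is unchanged; only the structure passed to it differs.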
Snippets of SML code are most easily studied by entering them into an interactive top-level.
The following is a "Hello, World!" program:
hello.sml |
---|
print "Hello, world!\n"; |

bash |
---|
$ mlton hello.sml |
$ ./hello |
Hello, world! |
Insertion sort for int list (ascending) can be expressed concisely as follows:
fun insert (x, []) = [x]
  | insert (x, h :: t) = sort x (h, t)
and sort x (h, t) = if x < h then [x, h] @ t else h :: insert (x, t)
val insertionsort = List.foldl insert []
Here, the classic mergesort algorithm is implemented in three functions: split, merge and mergesort. Also note the absence of types, with the exception of the syntax op :: and [] which signify lists. This code will sort lists of any type, so long as a consistent ordering function cmp is defined. Using Hindley–Milner type inference, the types of all variables can be inferred, even complicated types such as that of the function cmp.
Split
fun split is implemented with a stateful closure which alternates between true and false, ignoring the input:
fun alternator {} = let val state = ref true
    in fn a => !state before state := not (!state) end

(* Split a list into near-halves which will either be the same length,
 * or the first will have one more element than the other.
 * Runs in O(n) time, where n = |xs|.
 *)
fun split xs = List.partition (alternator {}) xs
Merge
Merge uses a local function loop for efficiency. The inner loop is defined in terms of cases: when both lists are non-empty (x :: xs) and when one list is empty ([]).
This function merges two sorted lists into one sorted list. Note how the accumulator acc is built backwards, then reversed before being returned. This is a common technique, since 'a list is represented as a linked list; this technique requires more clock time, but the asymptotics are not worse.
(* Merge two ordered lists using the order cmp.
 * Pre: each list must already be ordered per cmp.
 * Runs in O(n) time, where n = |xs| + |ys|.
 *)
fun merge cmp (xs, []) = xs
  | merge cmp (xs, y :: ys) = let
    fun loop (a, acc) (xs, []) = List.revAppend (a :: acc, xs)
      | loop (a, acc) (xs, y :: ys) =
        if cmp (a, y)
        then loop (y, a :: acc) (ys, xs)
        else loop (a, y :: acc) (xs, ys)
    in
        loop (y, []) (ys, xs)
    end
Mergesort
The main function:
fun ap f (x, y) = (f x, f y)

(* Sort a list according to the given ordering operation cmp.
 * Runs in O(n log n) time, where n = |xs|.
 *)
fun mergesort cmp [] = []
  | mergesort cmp [x] = [x]
  | mergesort cmp xs = (merge cmp o ap (mergesort cmp) o split) xs
Quicksort can be expressed as follows. fun part is a closure that consumes an order operator op <<.
infix <<

fun quicksort (op <<) = let
    fun part p = List.partition (fn x => x << p)
    fun sort [] = []
      | sort (p :: xs) = join p (part p xs)
    and join p (l, r) = sort l @ p :: sort r
    in
        sort
    end
Note the relative ease with which a small expression language can be defined and processed:
exception TyErr;

datatype ty = IntTy | BoolTy

fun unify (IntTy, IntTy) = IntTy
  | unify (BoolTy, BoolTy) = BoolTy
  | unify (_, _) = raise TyErr

datatype exp
    = True
    | False
    | Int of int
    | Not of exp
    | Add of exp * exp
    | If  of exp * exp * exp

fun infer True = BoolTy
  | infer False = BoolTy
  | infer (Int _) = IntTy
  | infer (Not e) = (assert e BoolTy; BoolTy)
  | infer (Add (a, b)) = (assert a IntTy; assert b IntTy; IntTy)
  | infer (If (e, t, f)) = (assert e BoolTy; unify (infer t, infer f))
and assert e t = unify (infer e, t)

fun eval True = True
  | eval False = False
  | eval (Int n) = Int n
  | eval (Not e) = if eval e = True then False else True
  | eval (Add (a, b)) = (case (eval a, eval b) of (Int x, Int y) => Int (x + y))
  | eval (If (e, t, f)) = eval (if eval e = True then t else f)

fun run e = (infer e; SOME (eval e)) handle TyErr => NONE
Example usage on well-typed and ill-typed expressions:
val SOME (Int 3) = run (Add (Int 1, Int 2)) (* well-typed *)
val NONE = run (If (Not (Int 1), True, False)) (* ill-typed *)
The IntInf module provides arbitrary-precision integer arithmetic. Moreover, integer literals may be used as arbitrary-precision integers without the programmer having to do anything.
The following program implements an arbitrary-precision factorial function:
fact.sml |
---|
fun fact n : IntInf.int = if n = 0 then 1 else n * fact (n - 1); |
fun printLine str = TextIO.output (TextIO.stdOut, str ^ "\n"); |
val () = printLine (IntInf.toString (fact 120)); |

bash |
---|
$ mlton fact.sml |
$ ./fact |
6689502913449127057588118054090372586752746333138029810295671352301633557244962989366874165271984981308157637893214090552534408589408121859898481114389650005964960521256960000000000000000000000000000 |
Curried functions have many applications, such as eliminating redundant code. For example, a module may require functions of type a -> b, but it is more convenient to write functions of type a * c -> b where there is a fixed relationship between the objects of type a and c. A function of type c -> (a * c -> b) -> a -> b can factor out this commonality. This is an example of the adapter pattern.[citation needed]
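A function of that shape might look like the following sketch. The names adapt, scale, toMillimetres and toCentimetres are hypothetical and serve only to illustrate the type c -> (a * c -> b) -> a -> b described above.

```sml
(* adapt : 'c -> ('a * 'c -> 'b) -> 'a -> 'b
 * Fixes the second component of a pair-taking function. *)
fun adapt c f a = f (a, c)

(* Hypothetical two-argument function: scale a length by a unit factor. *)
fun scale (x : real, factor : real) = x * factor

(* Supplying the factor once yields functions of type real -> real. *)
val toMillimetres = adapt 1000.0 scale
val toCentimetres = adapt 100.0 scale

val mm = toMillimetres 1.5
val cm = toCentimetres 1.5
```

Because adapt is curried, each partial application produces a function of the simpler type that the consuming module is assumed to require.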
In this example, fun d computes the numerical derivative of a given function f at point x:
- fun d delta f x = (f (x + delta) - f (x - delta)) / (2.0 * delta)
val d = fn : real -> (real -> real) -> real -> real
The type of fun d indicates that it maps a "float" onto a function with the type (real -> real) -> real -> real. This allows us to partially apply arguments, known as currying. In this case, function d can be specialised by partially applying it with the argument delta. A good choice for delta when using this algorithm is the cube root of the machine epsilon.[citation needed]
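One way such a delta could be computed is sketched below. The helper machineEpsilon is hypothetical: it estimates the epsilon of the default real type by repeated halving rather than relying on any library constant, and the result depends on the implementation's floating-point format.

```sml
(* Estimate machine epsilon: the smallest power of two eps for which
 * 1.0 + eps is still distinguishable from 1.0. *)
fun machineEpsilon () = let
    fun loop eps = if 1.0 + eps / 2.0 > 1.0 then loop (eps / 2.0) else eps
    in
        loop 1.0
    end

val eps = machineEpsilon ()
val delta = Math.pow (eps, 1.0 / 3.0)  (* cube root of epsilon *)

(* Central-difference derivative, as in the surrounding text. *)
fun d delta f x = (f (x + delta) - f (x - delta)) / (2.0 * delta)

(* Derivative of x^3 - x - 1 at x = 3; the exact value is 26. *)
val slope = d delta (fn x => x * x * x - x - 1.0) 3.0
```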
- val d' = d 1E~8;
val d' = fn : (real -> real) -> real -> real
The inferred type indicates that d' expects a function with the type real -> real as its first argument. We can compute an approximation to the derivative of $f(x) = x^3 - x - 1$ at $x = 3$. The correct answer is $f'(3) = 27 - 1 = 26$.
- d' (fn x => x * x * x - x - 1.0) 3.0;
val it = 25.9999996644 : real
The Basis Library [7] has been standardized and ships with most implementations. It provides modules for trees, arrays, and other data structures, and input/output and system interfaces.
For numerical computing, a Matrix module exists (but is currently broken), https://www.cs.cmu.edu/afs/cs/project/pscico/pscico/src/matrix/README.html.
For graphics, cairo-sml is an open source interface to the Cairo graphics library. For machine learning, a library for graphical models exists.
Implementations of Standard ML include the following:
Standard
Derivative
Research
All of these implementations are open-source and freely available. Most are implemented themselves in Standard ML. There are no longer any commercial implementations; Harlequin, now defunct, once produced a commercial IDE and compiler called MLWorks which passed on to Xanalys and was later open-sourced after it was acquired by Ravenbrook Limited on April 26, 2013.
The IT University of Copenhagen's entire enterprise architecture is implemented in around 100,000 lines of SML, including staff records, payroll, course administration and feedback, student project management, and web-based self-service interfaces. [8]
The proof assistants HOL4, Isabelle, LEGO, and Twelf are written in Standard ML. It is also used by compiler writers and integrated circuit designers such as ARM. [9]