OCaml

Paradigm Multi-paradigm: functional, imperative, object-oriented
Family ML
Designed by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez
Developer INRIA
First appeared 1996
Stable release 4.09.0 / September 18, 2019 [1]
Typing discipline Inferred, static, strong, structural
Implementation language OCaml, C
Platform IA-32, x86-64, Power, SPARC, ARM 32-64
OS Cross-platform: Unix, macOS, Windows
License LGPLv2.1
Filename extensions .ml, .mli
Website ocaml.org
Influenced by
Caml, Standard ML, Pascal
Influenced
ATS, Coq, Elm, F#, F*, Haxe, Opa, Reason, Rust, Scala

OCaml (/oʊˈkæməl/ oh-KAM-əl; formerly Objective Caml) is the main implementation of the Caml programming language, created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez, and others. It extends Caml with object-oriented features and is a member of the ML family.


The OCaml toolchain includes an interactive top-level interpreter, a bytecode compiler, an optimizing native code compiler, a reversible debugger, and a package manager (OPAM). It has a large standard library, which makes it useful for many of the same applications as Python and Perl, and has robust modular and object-oriented programming constructs that make it applicable for large-scale software engineering.

The acronym CAML originally stood for Categorical Abstract Machine Language, but OCaml omits this abstract machine. [2] OCaml is a free and open-source software project managed and principally maintained by the French Institute for Research in Computer Science and Automation (INRIA). In the early 2000s, elements from OCaml were adopted by many languages, notably F# and Scala.

Philosophy

ML-derived languages are best known for their static type systems and type-inferring compilers. OCaml unifies functional, imperative, and object-oriented programming under an ML-like type system. Thus, programmers need not be highly familiar with the pure functional language paradigm to use OCaml.

By requiring the programmer to work within the constraints of its static type system, OCaml eliminates many of the type-related runtime problems associated with dynamically typed languages. Also, OCaml's type-inferring compiler greatly reduces the need for the manual type annotations that are required in most statically typed languages. For example, the data types of variables and the signatures of functions usually need not be declared explicitly, as they must be in languages like Java and C#, because they can be inferred from the operators and other functions that are applied to the variables and other values in the code. Effective use of OCaml's type system can require some sophistication on the part of a programmer, but this discipline is rewarded with reliable, high-performance software.
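As a small illustration of this inference, the sketch below defines three functions without any type annotations. The names average, greet, and pair_map are hypothetical and chosen only for this example; the comments give the types the compiler infers.

```ocaml
(* The float operators +. and /. force both arguments to be floats. *)
let average a b = (a +. b) /. 2.0
(* inferred: float -> float -> float *)

(* The string concatenation operator ^ forces name to be a string. *)
let greet name = "Hello, " ^ name
(* inferred: string -> string *)

(* Nothing constrains the element types, so the function is polymorphic. *)
let pair_map f (x, y) = (f x, f y)
(* inferred: ('a -> 'b) -> 'a * 'a -> 'b * 'b *)
```

Entering these definitions at the top-level prints the inferred signatures, which makes it easy to check them against the comments.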

OCaml is perhaps most distinguished from other languages with origins in academia by its emphasis on performance. Its static type system prevents runtime type mismatches and thus obviates runtime type and safety checks that burden the performance of dynamically typed languages, while still guaranteeing runtime safety, except when array bounds checking is turned off or when some type-unsafe features like serialization are used. These are rare enough that avoiding them is quite possible in practice.

Aside from type-checking overhead, functional programming languages are, in general, challenging to compile to efficient machine language code, due to issues such as the funarg problem. Along with standard loop, register, and instruction optimizations, OCaml's optimizing compiler employs static program analysis methods to optimize value boxing and closure allocation, helping to maximize the performance of the resulting code even if it makes extensive use of functional programming constructs.

Xavier Leroy has stated that "OCaml delivers at least 50% of the performance of a decent C compiler", [3] although a direct comparison is impossible. Some functions in the OCaml standard library are implemented with faster algorithms than equivalent functions in the standard libraries of other languages. For example, the implementation of set union in the OCaml standard library in theory is asymptotically faster than the equivalent function in the standard libraries of imperative languages (e.g., C++, Java) because the OCaml implementation exploits the immutability of sets to reuse parts of input sets in the output (see persistent data structure).
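The immutability of sets mentioned above can be observed with the standard Set functor. The following is a minimal sketch; IntSet and the bindings a, b, and u are names introduced here for illustration. It shows only that union leaves its inputs unchanged, not how the library shares structure internally.

```ocaml
(* An integer set module built with the standard library's Set.Make functor. *)
module IntSet = Set.Make (struct
  type t = int
  let compare = compare
end)

let a = IntSet.of_list [1; 2; 3]
let b = IntSet.of_list [3; 4]

(* union returns a new set; a and b are not modified. *)
let u = IntSet.union a b

let () =
  (* elements returns the members in increasing order *)
  List.iter (Printf.printf "%d ") (IntSet.elements u);
  print_newline ()
```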

Features

OCaml features: a static type system, type inference, parametric polymorphism, tail recursion, pattern matching, first class lexical closures, functors (parametric modules), exception handling, and incremental generational automatic garbage collection.
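Several of these features can be seen together in a short sketch. The module names ORDERED, MakeMax, IntMax, and StrMax and the function safe_max are hypothetical, introduced only for this example; it combines a functor, pattern matching, a closure passed to a fold, and exception handling.

```ocaml
(* Signature of the functor argument: a type with a comparison function. *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* A functor (parametric module) producing a max_of function for any ORDERED type. *)
module MakeMax (O : ORDERED) = struct
  let max_of = function
    | [] -> raise Not_found                       (* no maximum of an empty list *)
    | x :: rest ->
        List.fold_left (fun m y -> if O.compare y m > 0 then y else m) x rest
end

module IntMax = MakeMax (struct type t = int let compare = compare end)
module StrMax = MakeMax (String)

(* Exception handling turns the failure case into an option value. *)
let safe_max xs = try Some (IntMax.max_of xs) with Not_found -> None
```

Here IntMax.max_of [3; 9; 2] evaluates to 9, and safe_max [] evaluates to None.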

OCaml is notable for extending ML-style type inference to an object system in a general-purpose language. This permits structural subtyping, where object types are compatible if their method signatures are compatible, regardless of their declared inheritance (an unusual feature in statically typed languages).
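A minimal sketch of structural subtyping follows. The functions describe, rectangle, and square and the type shape are names made up for this illustration. Neither object declares any relationship to the other; they are accepted wherever their methods fit.

```ocaml
(* Accepts any object that has name and area methods.
   Inferred type: < area : float; name : string; .. > -> string *)
let describe shape = Printf.sprintf "%s with area %.1f" shape#name shape#area

let rectangle w h = object
  method name = "rectangle"
  method area = w *. h
end

(* square has an extra method, side, that rectangle lacks. *)
let square s = object
  method name = "square"
  method area = s *. s
  method side = s
end

(* A closed object type listing only the methods both objects share. *)
type shape = < name : string; area : float >

(* The explicit coercion :> forgets the extra side method of square. *)
let shapes : shape list = [ rectangle 2.0 3.0; (square 2.0 :> shape) ]

let total_area = List.fold_left (fun acc s -> acc +. s#area) 0.0 shapes
```

The function describe can be applied directly to either object without a coercion, because its inferred parameter type is open (the trailing .. in the type). The coercion is needed only to store objects of different types in one list.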

A foreign function interface for linking to C primitives is provided, including language support for efficient numerical arrays in formats compatible with both C and Fortran. OCaml also supports creating libraries of OCaml functions that can be linked to a main program in C, so that an OCaml library can be distributed to C programmers who have no knowledge or installation of OCaml.

The OCaml distribution contains:

The native code compiler is available for many platforms, including Unix, Microsoft Windows, and Apple macOS. Portability is achieved through native code generation support for major architectures: IA-32, x86-64 (AMD64), Power, SPARC, ARM, and ARM64. [4]

OCaml bytecode and native code programs can be written in a multithreaded style, with preemptive context switching. However, because the garbage collector of the INRIA OCaml system (which is the only currently available full implementation of the language) is not designed for concurrency, symmetric multiprocessing is unsupported. [5] OCaml threads in the same process execute by time sharing only. There are, however, several libraries for distributed computing, such as Functory and ocamlnet/Plasma.

Development environment

Since 2011, many new tools and libraries have been contributed to the OCaml development environment:

Code examples

Snippets of OCaml code are most easily studied by entering them into the top-level. This is an interactive OCaml session that prints the inferred types of resulting or defined expressions. The OCaml top-level is started by running the ocaml program:

$ ocaml
        Objective Caml version 3.09.0

#

Code can then be entered at the "#" prompt. For example, to calculate 1+2*3:

# 1 + 2 * 3;;
- : int = 7

OCaml infers the type of the expression to be "int" (a machine-precision integer) and gives the result "7".

Hello World

The following program "hello.ml":

print_endline "Hello World!"

can be compiled into a bytecode executable:

$ ocamlc hello.ml -o hello

or compiled into an optimized native-code executable:

$ ocamlopt hello.ml -o hello

and executed:

$ ./hello
Hello World!
$

Summing a list of integers

Lists are one of the fundamental datatypes in OCaml. The following code example defines a recursive function sum that accepts one argument xs. (Note the keyword rec.) The function recursively iterates over a given list and returns the sum of its integer elements. The match expression has similarities to C's switch statement, though it is far more general.

let rec sum xs =
  match xs with
  | [] -> 0                   (* yield 0 if xs has the form [] *)
  | x :: xs' -> x + sum xs'   (* recursive call if xs has the form x::xs' for suitable x and xs' *)
;;
# sum [1;2;3;4;5];;
- : int = 15

Another way is to use the standard fold function that works with lists.

let sum xs = List.fold_left (fun acc x -> acc + x) 0 xs;;
# sum [1;2;3;4;5];;
- : int = 15

Since the anonymous function is simply the application of the + operator, this can be shortened to:

let sum xs = List.fold_left (+) 0 xs

Furthermore, one can omit the list argument by making use of a partial application:

let sum = List.fold_left (+) 0

Quicksort

OCaml lends itself to concisely expressing recursive algorithms. The following code example implements an algorithm similar to quicksort that sorts a list in increasing order.

let rec qsort = function
  | [] -> []
  | pivot :: rest ->
      let is_less x = x < pivot in
      let left, right = List.partition is_less rest in
      qsort left @ [pivot] @ qsort right

Birthday problem

The following program calculates the smallest number of people in a room for whom the probability of completely unique birthdays is less than 50% (the birthday problem, where for 1 person the probability is 365/365 (or 100%), for 2 it is 364/365, for 3 it is 364/365 × 363/365, etc.) (answer = 23).

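let year_size = 365.

let rec birthday_paradox prob people =
  (* probability that people + 1 persons all have distinct birthdays *)
  let prob' = (year_size -. float people) /. year_size *. prob in
  if prob' < 0.5 then
    Printf.printf "answer = %d\n" (people + 1)
  else
    birthday_paradox prob' (people + 1)
;;

birthday_paradox 1.0 1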
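The same product can also be written as a function that returns the probability instead of printing the answer. This is a sketch with hypothetical names (prob_all_distinct and its helper go), assuming a 365-day year as in the program above.

```ocaml
(* Probability that n people all have distinct birthdays:
   (365/365) * (364/365) * ... * ((365 - n + 1)/365) *)
let prob_all_distinct n =
  let rec go k acc =
    if k >= n then acc
    else go (k + 1) (acc *. float_of_int (365 - k) /. 365.0)
  in
  go 0 1.0
```

With this definition, prob_all_distinct 22 is still above 0.5 while prob_all_distinct 23 is below it, which agrees with the answer of 23.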

Church numerals

The following code defines a Church encoding of natural numbers, with successor (succ) and addition (add). A Church numeral n is a higher-order function that accepts a function f and a value x and applies f to x exactly n times. To convert a Church numeral from a functional value to a string, we pass it a function that prepends the string "S" to its input and the constant string "0".

let zero f x = x
let succ n f x = f (n f x)
let one = succ zero
let two = succ (succ zero)
let add n1 n2 f x = n1 f (n2 f x)
let to_string n = n (fun k -> "S" ^ k) "0"
let _ = to_string (add (succ two) two)
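As a further sketch, a Church numeral can be converted to a machine integer by passing it the increment function and 0. The functions to_int and mul below are additions made for this illustration and are not part of the example above; two and three are written with explicit f and x parameters so that they remain polymorphic.

```ocaml
let zero _f x = x
let succ n f x = f (n f x)
let add n1 n2 f x = n1 f (n2 f x)

(* Multiplication: apply (n2 f) n1 times. *)
let mul n1 n2 f x = n1 (n2 f) x

(* Count the applications of f by using integer increment. *)
let to_int n = n (fun k -> k + 1) 0

let two f x = succ (succ zero) f x
let three f x = succ two f x

let () =
  Printf.printf "3 + 2 = %d, 3 * 2 = %d\n"
    (to_int (add three two)) (to_int (mul three two))
(* prints "3 + 2 = 5, 3 * 2 = 6" *)
```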

Arbitrary-precision factorial function (libraries)

A variety of libraries are directly accessible from OCaml. For example, OCaml has a built-in library for arbitrary-precision arithmetic. As the factorial function grows very rapidly, it quickly overflows machine-precision numbers (typically 32 or 64 bits). Thus, factorial is a suitable candidate for arbitrary-precision arithmetic.
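To make the overflow concrete, the following sketch computes a factorial on native integers and reports when the result no longer fits. The function fact_checked is a hypothetical helper written for this illustration; it relies only on the standard max_int constant.

```ocaml
(* Factorial on native ints; returns None as soon as the next
   multiplication would exceed max_int. *)
let fact_checked n =
  let rec go acc k =
    if k > n then Some acc
    else if acc > max_int / k then None   (* acc * k would overflow *)
    else go (acc * k) (k + 1)
  in
  go 1 1
```

For example, fact_checked 10 evaluates to Some 3628800, whereas fact_checked 120 evaluates to None on both 32-bit and 64-bit systems.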

In OCaml, the Num module (now superseded by the ZArith module) provides arbitrary-precision arithmetic and can be loaded into a running top-level using:

# #use "topfind";;
# #require "num";;
# open Num;;

The factorial function may then be written using the arbitrary-precision numeric operators =/, */ and -/ :

# let rec fact n =
    if n =/ Int 0 then Int 1 else n */ fact (n -/ Int 1);;
val fact : Num.num -> Num.num = <fun>

This function can compute much larger factorials, such as 120!:

# string_of_num (fact (Int 120));;
- : string = "6689502913449127057588118054090372586752746333138029810295671352301633557244962989366874165271984981308157637893214090552534408589408121859898481114389650005964960521256960000000000000000000000000000"

Triangle (graphics)

The following program renders a rotating triangle in 2D using OpenGL:

let () =
  ignore (Glut.init Sys.argv);
  Glut.initDisplayMode ~double_buffer:true ();
  ignore (Glut.createWindow ~title:"OpenGL Demo");
  let angle t = 10. *. t *. t in
  let render () =
    GlClear.clear [ `color ];
    GlMat.load_identity ();
    GlMat.rotate ~angle:(angle (Sys.time ())) ~z:1. ();
    GlDraw.begins `triangles;
    List.iter GlDraw.vertex2 [-1., -1.; 0., 1.; 1., -1.];
    GlDraw.ends ();
    Glut.swapBuffers () in
  GlMat.mode `modelview;
  Glut.displayFunc ~cb:render;
  Glut.idleFunc ~cb:(Some Glut.postRedisplay);
  Glut.mainLoop ()

The LablGL bindings to OpenGL are required. The program may then be compiled to bytecode with:

  $ ocamlc -I +lablGL lablglut.cma lablgl.cma simple.ml -o simple

or to native code with:

  $ ocamlopt -I +lablGL lablglut.cmxa lablgl.cmxa simple.ml -o simple

or, more simply, using the ocamlfind build command:

  $ ocamlfind opt simple.ml -package lablgl.glut -linkpkg -o simple

and run:

  $ ./simple

Far more sophisticated, high-performance 2D and 3D graphical programs can be developed in OCaml. Thanks to the use of OpenGL and OCaml, the resulting programs can be cross-platform, compiling without any changes on many major platforms.

Fibonacci sequence

The following code computes the n-th Fibonacci number for a given input n. It uses tail recursion and pattern matching.

let fib n =
  let rec fib_aux m a b =
    match m with
    | 0 -> a
    | _ -> fib_aux (m - 1) b (a + b)
  in fib_aux n 0 1

Higher-order functions

Functions may take functions as input and return functions as result. For example, applying twice to a function f yields a function that applies f two times to its argument.

let twice (f : 'a -> 'a) = fun (x : 'a) -> f (f x);;
let inc (x : int) : int = x + 1;;
let add2 = twice inc;;
let inc_str (x : string) : string = x ^ " " ^ x;;
let add_str = twice(inc_str);;
# add2 98;;
- : int = 100
# add_str "Test";;
- : string = "Test Test Test Test"

The function twice uses a type variable 'a to indicate that it can be applied to any function f mapping from a type 'a to itself, rather than only to int->int functions. In particular, twice can even be applied to itself.

# let fourtimes f = (twice twice) f;;
val fourtimes : ('a -> 'a) -> 'a -> 'a = <fun>
# let add4 = fourtimes inc;;
val add4 : int -> int = <fun>
# add4 98;;
- : int = 102
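The idea generalizes from two applications to any number. The sketch below uses two hypothetical helpers, iterate and compose, which are not part of the example above; iterate n f applies f exactly n times, so iterate 2 behaves like twice.

```ocaml
(* iterate n f x applies f to x exactly n times (n <= 0 leaves x unchanged). *)
let rec iterate n f x =
  if n <= 0 then x else iterate (n - 1) f (f x)

(* Function composition: compose f g applies g first, then f. *)
let compose f g x = f (g x)

let inc x = x + 1

(* Two ways of building a function that adds 4. *)
let add4 = iterate 4 inc
let add4' = compose (iterate 2 inc) (iterate 2 inc)
```

Both add4 98 and add4' 98 evaluate to 102, matching the result of fourtimes inc.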

Derived languages

MetaOCaml

MetaOCaml [6] is a multi-stage programming extension of OCaml enabling incremental compiling of new machine code during runtime. Under some circumstances, significant speedups are possible using multistage programming, because more detailed information about the data to process is available at runtime than at the regular compile time, so the incremental compiler can optimize away many cases of condition checking, etc.

As an example: if at compile time it is known that some power function x->x^n is needed often, but the value of n is known only at runtime, a two-stage power function can be used in MetaOCaml:

(* even and sqr are helper functions that are not shown here *)
let rec power n x =
  if n = 0
  then .<1>.
  else
    if even n
    then sqr (power (n/2) x)
    else .<.~x * .~(power (n - 1) x)>.

As soon as n is known at runtime, a specialized and very fast power function can be created:

.<fun x -> .~(power 5 .<x>.)>.

The result is:

fun x_1 -> (x_1 * let y_3 = let y_2 = (x_1 * 1) in (y_2 * y_2) in (y_3 * y_3))

The new function is automatically compiled.
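Standard OCaml has no staging brackets, but a rough analogue of the idea can be sketched with closures. This is not MetaOCaml and generates no new machine code; it is an illustration under that assumption, and it assumes n is non-negative. All decisions that depend on n are made once, when power n is evaluated, and the returned closure performs only multiplications.

```ocaml
(* Returns a function specialised to the exponent n (assumes n >= 0).
   The tests on n run while the closure is built, not when it is applied. *)
let rec power n : int -> int =
  if n = 0 then fun _ -> 1
  else if n mod 2 = 0 then
    let half = power (n / 2) in
    fun x -> let y = half x in y * y
  else
    let rest = power (n - 1) in
    fun x -> x * rest x

(* Built once, then reusable for many arguments. *)
let pow5 = power 5
```

For example, pow5 2 evaluates to 32. A staged MetaOCaml version goes further, since it produces code that the compiler can then optimize as a whole.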

Other derived languages

Software written in OCaml

Users

Several dozen companies use OCaml to some degree. [12] Notable examples include:



References

  1. "Releases – OCaml". ocaml.org.
  2. "A History of OCaml". Retrieved 24 December 2016.
  3. Linux Weekly News.
  4. "ocaml/asmcomp at trunk · ocaml/ocaml · GitHub". GitHub. Retrieved 2 May 2015.
  5. "Archives of the Caml mailing list > Message from Xavier Leroy". Retrieved 2 May 2015.
  6. oleg-at-okmij.org. "BER MetaOCaml". okmij.org.
  7. "Messenger.com Now 50% Converted to Reason · Reason". reasonml.github.io. Retrieved 2018-02-27.
  8. "Flow: A Static Type Checker for JavaScript". Flow.
  9. "Infer static analyzer". Infer.
  10. "GitHub - facebook/pyre-check: Performant type-checking for python". 9 February 2019 via GitHub.
  11. "WebAssembly specification, reference interpreter, and test suite.: WebAssembly/spec". 10 February 2019 via GitHub.
  12. "Companies using OCaml". OCaml.org. Retrieved 17 August 2014.
  13. "BuckleScript: The 1.0 release has arrived! | Tech at Bloomberg". Tech at Bloomberg. 8 September 2016. Retrieved 21 May 2017.
  14. Yaron Minsky (1 November 2011). "OCaml for the Masses". Retrieved 2 May 2015.