Paradigms | multi-paradigm: functional, imperative, object-oriented, concurrent, modular
---|---
Family | ML: Caml: OCaml: Dependent ML
Designed by | Hongwei Xi
Developer | Boston University
First appeared | 2006
Stable release | ATS2-0.4.2 [1] / November 14, 2020
Typing discipline | static, dependent
License | GPLv3
Filename extensions | .sats, .dats, .hats
Website | www
Influenced by | Dependent ML, ML, OCaml, C++
In computing, ATS (Applied Type System) is a multi-paradigm, general-purpose, high-level, functional programming language. It is a dialect of the programming language ML, designed by Hongwei Xi to unify computer programming with formal specification. ATS has support for combining theorem proving with practical programming through the use of advanced type systems. [2] A past version of The Computer Language Benchmarks Game has demonstrated that the performance of ATS is comparable to that of the languages C and C++. [3] By using theorem proving and strict type checking, the compiler can detect and prove that its implemented functions are not susceptible to bugs such as division by zero, memory leaks, buffer overflow, and other forms of memory corruption by verifying pointer arithmetic and reference counting before the program compiles. Also, by using the integrated theorem-proving system of ATS (ATS/LF), the programmer may make use of static constructs that are intertwined with the operative code to prove that a function conforms to its specification.
ATS consists of a static component and a dynamic component. The static component is used for handling types, whereas the dynamic component is used for programs. While ATS primarily relies on a call-by-value functional language at its core, it possesses the ability to accommodate diverse programming paradigms, such as functional, imperative, object-oriented, concurrent, and modular.
According to the author, ATS was inspired by Martin-Löf's constructive type theory, which was originally developed for the purpose of establishing a foundation for mathematics. Xi designed ATS “in an attempt to combine specification and implementation into a single programming language.” [4]
ATS is derived mostly from the languages ML and OCaml. An earlier language, Dependent ML, by the same author has been incorporated into ATS.
The first implementation, ATS/Proto (ATS0), was written in OCaml and released in 2006. This prototype edition of ATS is no longer maintained. A year later, ATS/Geizella, the first implementation of ATS1, was released. This version, also written in OCaml, is no longer actively used. [5]
The second version of ATS1, ATS/Anairiats, released in 2008, was a major milestone in the development of the language, as it was able to bootstrap itself. This version was written almost completely in ATS1. The current version, ATS/Postiats (ATS2), was released in 2013. Like its predecessor, it is written almost entirely in ATS1. The most recently released version is ATS2-0.4.2. [5]
As of 2024, ATS is used mostly for research; fewer than 200 GitHub repositories contain code written in ATS. This is far fewer than for other functional languages, such as OCaml and Standard ML, which have over 16,000 and 3,000 repositories, respectively. This is likely due to the steep learning curve associated with ATS, which stems from the language's use of dependent type-checking and template instance resolution. These features usually require the use of explicit quantifiers, which demand further learning. [6]
As of 2024, ATS/Xanadu (ATS3) is being developed actively in ATS2, with the hope of reducing the learning required through two main improvements.
With these improvements, Xi hopes for ATS to become much more accessible and easier to learn. The main goal of ATS3 is to transform ATS from a language mainly used for research, into one strong enough for large-scale industrial software development. [5]
The main focus of ATS is to support formal verification via automated theorem proving, combined with practical programming. [2] Theorem proving can show, for example, that an implemented function produces no memory leaks. It can also prevent other bugs that might otherwise be found only during testing. It incorporates a system similar to those of proof assistants, which usually aim only to verify mathematical proofs; ATS, however, uses this ability to prove that the implementations of its functions operate correctly and produce the expected output.
As a simple example, in a function using division, the programmer may prove that the divisor will never equal zero, preventing a division by zero error. Suppose the divisor X is computed as 5 times the length of list A. One can prove that, when A is non-empty, X is non-zero, since X is the product of two non-zero numbers (5 and the length of A). A more practical example is proving, through reference counting, that the retain count on an allocated block of memory is counted correctly for each pointer. Then one knows, and can quite literally prove, that the object will not be deallocated prematurely and that memory leaks will not occur.
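As an illustration (a minimal sketch, not taken from the cited sources), the divisor below carries the static constraint d > 0, so it is positive and hence non-zero; any call whose divisor cannot be proven positive is rejected at compile time:

#include "share/atspre_staload.hats"

(* illustrative only: the guard {d:int | d > 0} is checked statically *)
fun divide {n:int} {d:int | d > 0}
  (x: int n, d: int d): int = g0ofg1(x) / g0ofg1(d)

implement main0 () =
{
  val q = divide (10, 5)      // accepted: the constraint 5 > 0 holds trivially
  // val b = divide (10, 0)   // rejected by the type checker
  val () = println! ("q = ", q)
}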
The benefit of the ATS system is that since all theorem proving occurs strictly within the compiler, it has no effect on the speed of the executable program. ATS code is often harder to compile than standard C code, but once it compiles, it is certain that it is running correctly to the degree specified by the proofs (assuming the compiler and runtime system are correct).
In ATS, proofs are separate from implementation, so it is possible to implement a function without proving it, if desired.
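For example, here is a minimal sketch (not from the article) of a factorial implemented without any proofs; it compiles and runs, but nothing ties it to a formal specification:

// an unproven factorial: ordinary type checking only, no dataprop, no termination metric
fun fact_plain (n: int): int =
  if n > 0 then n * fact_plain (n - 1) else 1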
According to the author, ATS's efficiency [7] is largely due to the way that data is represented in the language and tail-call optimizations (which are generally important for the efficiency of functional languages). Data can be stored in a flat or unboxed representation rather than a boxed representation.
dataprop expresses predicates as algebraic types.
Predicates in pseudo‑code somewhat similar to ATS source (see below for valid ATS source):
FACT (n, r) iff fact (n) = r
MUL (n, m, prod) iff n * m = prod

FACT (n, r) =
    FACT (0, 1)
  | FACT (n, r) iff FACT (n-1, r1) and MUL (n, r1, r)   // for n > 0
// expresses fact (n) = r iff r = n * r1 and r1 = fact (n-1)
In ATS code:
dataprop FACT (int, int) =
  | FACTbas (0, 1)                  // basic case: FACT(0, 1)
  | {n:int | n > 0} {r,r1:int}      // inductive case
    FACTind (n, r) of (FACT (n-1, r1), MUL (n, r1, r))
where FACT (int, int) is a proof type
A non-tail-recursive factorial with proposition, or "theorem", proving through the dataprop construct.
The evaluation of fact1(n-1) returns a pair (proof_n_minus_1 | result_of_n_minus_1), which is used in the calculation of fact1(n). The proofs express the predicates of the proposition.
[FACT (n, r)] implies [fact (n) = r]
[MUL (n, m, prod)] implies [n * m = prod]

FACT (0, 1)
FACT (n, r) iff FACT (n-1, r1) and MUL (n, r1, r)   for all n > 0
To remember:
{...} universal quantification
[...] existential quantification
(... | ...) (proof | value)
@(...) flat tuple or variadic function parameters tuple
.<...>. termination metric [8]
#include"share/atspre_staload.hats"datapropFACT(int,int)=|FACTbas(0,1)of()//basiccase|{n:nat}{r:int}//inductivecaseFACTind(n+1,(n+1)*r)of(FACT(n,r))(* note that int(x) , also int x, is the monovalued type of the int x value. The function signature below says: forall n:nat, exists r:int where fact( num: int(n)) returns (FACT (n, r) | int(r)) *)funfact{n:nat}.<n>.(n:int(n)):[r:int](FACT(n,r)|int(r))=(ifcase|n>0=>((FACTind(pf1)|n*r1))where{val(pf1|r1)=fact(n-1)}|_(*else*)=>(FACTbas()|1))
implement main0 (argc, argv) =
{
  val () = if (argc != 2) then prerrln! ("Usage: ", argv[0], " <integer>")
  val () = assert (argc >= 2)
  val n0 = g0string2int (argv[1])
  val n0 = g1ofg0 (n0)
  val () = assert (n0 >= 0)
  val (_(*pf*) | res) = fact (n0)
  val ((*void*)) = println! ("fact(", n0, ") = ", res)
}
This can all be added to a single file and compiled as follows. Compiling should work with various back-end C compilers, e.g., GNU Compiler Collection (gcc). Garbage collection is not used unless explicitly enabled with -D_ATS_GCATS. [9]
$ patscc fact1.dats -o fact1
$ ./fact1 4
This compiles and gives the expected result, fact(4) = 24.
val x : @(int, char) = @(15, 'c')      // x.0 = 15 ; x.1 = 'c'
val @(a, b) = x                        // pattern matching binding, a = 15, b = 'c'
val x = @{first=15, second='c'}        // x.first = 15
val @{first=a, second=b} = x           // a = 15, b = 'c'
val @{second=b, ...} = x               // with omission, b = 'c'
val x : '(int, char) = '(15, 'c')      // x.0 = 15 ; x.1 = 'c'
val '(a, b) = x                        // a = 15, b = 'c'
val x = '{first=15, second='c'}        // x.first = 15
val '{first=a, second=b} = x           // a = 15, b = 'c'
val '{second=b, ...} = x               // b = 'c'
| as separator; some functions return the result value wrapped together with an evaluation of predicates:

val (predicate_proofs | values) = myfunct params
{...} universal quantification
[...] existential quantification
(...) parenthetical expression or tuple
(... | ...) (proofs | values)
.<...>. termination metric
@(...) flat tuple or variadic function parameters tuple (see example's printf)
@[byte][BUFLEN] type of an array of BUFLEN values of type byte [10]
@[byte][BUFLEN]() array instance
@[byte][BUFLEN](0) array initialized to 0
sortdef nat = {a:int | a >= 0}         // from prelude: ∀ a ∈ int ...
typedef String = [a:nat] string (a)    // [..]: ∃ a ∈ nat ...
// {..}: ∀ a, b ∈ type ...
fun{a,b:type} swap_type_type (xy: @(a, b)): @(b, a) = (xy.1, xy.0)
T @ L asserts that there is a view of type T at location L.

fun{a:t@ype} ptr_get0 {l:addr} (pf: a @ l | p: ptr l): @(a @ l | a)
fun{a:t@ype} ptr_set0 {l:addr} (pf: a? @ l | p: ptr l, x: a): @(a @ l | void)

ptr_get0 (T) is ∀ l : addr . (T @ l | ptr (l)) -> (T @ l | T) // see manual, section 7.1. Safe Memory Access through Pointers [12]
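A minimal usage sketch (not from the article, and assuming ptr_get0 as declared above is available): the at-view of a stack variable is borrowed, threaded through the call, and finally returned so the variable can be used directly again.

implement main0 () =
{
  var x: int = 10
  val p = addr@ x                        // pointer to x
  prval pf = view@ x                     // take the proof: int @ x
  val (pf | v) = ptr_get0<int> (pf | p)  // read through the pointer
  prval () = view@ x := pf               // give the view back to x
  val () = println! ("x = ", v)          // prints: x = 10
}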
viewdef array_v (a: viewt@ype, n: int, l: addr) = @[a][n] @ l
as in case+, val+, type+, viewtype+, ...
staload "foo.sats" // foo.sats is loaded and then opened into the current namespace staload F = "foo.sats" // to use identifiers qualified as $F.bar dynload "foo.dats" // loaded dynamically at run-time
Dataviews are often declared to encode recursively defined relations on linear resources. [13]
dataview array_v (a: viewt@ype+, int, addr) =
  | {l:addr} array_v_none (a, 0, l)
  | {n:nat} {l:addr}
    array_v_some (a, n+1, l) of (a @ l, array_v (a, n, l + sizeof a))
Datatypes [14]
datatype workday = Mon | Tue | Wed | Thu | Fri
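For instance, a minimal sketch (not from the article): case+ demands an exhaustive match, so leaving out any of the five constructors below is a compile-time error.

// exhaustive match over the workday datatype declared above
fun is_early_week (d: workday): bool =
  case+ d of
  | Mon () => true
  | Tue () => true
  | Wed () => false
  | Thu () => false
  | Fri () => false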
lists
datatype list0 (a:t@ype) = list0_cons (a) of (a, list0 a) | list0_nil (a)
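A minimal sketch (not from the article) of consuming list0 with exhaustive pattern matching; list0_len is a name chosen here for illustration:

// case+ over the two constructors of the list0 type declared above
fun{a:t@ype} list0_len (xs: list0 a): int =
  case+ xs of
  | list0_nil () => 0
  | list0_cons (_, rest) => 1 + list0_len (rest)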
A dataviewtype is similar to a datatype, but it is linear. With a dataviewtype, the programmer is allowed to explicitly and safely free (deallocate) the memory used for storing the constructors associated with the dataviewtype. [15]
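A minimal sketch (not from the article): a linear integer list declared as a dataviewtype; the ~ in each pattern frees the matched node, so summing the list also deallocates it, and forgetting to consume the list would be a type error rather than a silent leak.

dataviewtype intlist_vt =
  | intlist_vt_nil of ()
  | intlist_vt_cons of (int, intlist_vt)

// consuming (and freeing) the linear list while summing its elements
fun intlist_vt_sum (xs: intlist_vt): int =
  case+ xs of
  | ~intlist_vt_nil () => 0
  | ~intlist_vt_cons (x, rest) => x + intlist_vt_sum (rest)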
local variables
var res: int with pf_res = 1 // introduces pf_res as an alias of view @ (res)
on stack array allocation:
#define BUFLEN 10

var !p_buf with pf_buf = @[byte][BUFLEN](0)   // pf_buf = @[byte][BUFLEN](0) @ p_buf
See val and var declarations [17]