ATS (programming language)

Last updated
ATS
The ATS Logo.svg
Paradigms multi-paradigm: functional, imperative, object-oriented, concurrent, modular
Family ML: Caml: OCaml: Dependent ML
Designed by Hongwei Xi
Developer Boston University
First appeared2006;18 years ago (2006)
Stable release
ATS2-0.4.2 [1] / November 14, 2020;3 years ago (2020-11-14)
Typing discipline static, dependent
License GPLv3
Filename extensions .sats, .dats, .hats
Website www.ats-lang.org
Influenced by
Dependent ML, ML, OCaml, C++

In computing, ATS (Applied Type System) is a programming language designed by Hongwei Xi to unify computer programming with formal specification. ATS has support for combining theorem proving with practical programming through the use of advanced type systems. [2] A past version of The Computer Language Benchmarks Game has demonstrated that the performance of ATS is comparable to that of the languages C and C++. [3] By using theorem proving and strict type checking, the compiler can detect and prove that its implemented functions are not susceptible to bugs such as division by zero, memory leaks, buffer overflow, and other forms of memory corruption by verifying pointer arithmetic and reference counting before the program compiles. Also, by using the integrated theorem-proving system of ATS (ATS/LF), the programmer may make use of static constructs that are intertwined with the operative code to prove that a function conforms to its specification.

Contents

ATS consists of a static component and a dynamic component. The static component is used for handling types, whereas the dynamic component is used for programs. While ATS primarily relies on a call-by-value functional language at its core, it possesses the ability to accommodate diverse programming paradigms, such as functional, imperative, object-oriented, concurrent, and modular.

History

According to the author, ATS was inspired by Martin-Löf's constructive type theory, which was originally developed for the purpose of establishing a foundation for mathematics. Xi designed ATS “in an attempt to combine specification and implementation into a single programming language.” [4]

ATS is derived mostly from the languages ML and OCaml. An earlier language, Dependent ML, by the same author has been incorporated into ATS.

The first implementation, ATS/Proto (ATS0), was written in OCaml and was released in 2006. This was the pre-first edition of ATS and is no longer maintained. A year later, ATS/Geizella, the first implementation of ATS1, was released. This version was also written in OCaml and is no longer used actively. [5]

The second version of ATS1, ATS/Anairiats, released in 2008, was a major milestone in the development of the language, as the language was able to bootstrap itself. This version was written almost completely in ATS1. The current version, ATS/Postiats (ATS2) was released in 2013. Like its predecessor, this version is also almost entirely written in ATS1. The most recently released version is ATS2-0.4.2. [5]

Future

As of 2024, ATS is used mostly for research; less than 200 GitHub repositories contain code written in ATS. This is far less than other functional languages, such as OCaml and Standard ML, which have over 16,000 and 3,000 repositories, respectively. This is likely due to the steep learning associated with ATS, which is present because of the language's use of dependent type-checking and template instance resolution. These features usually require the use of explicit quantifiers, which demand further learning. [6]

As of 2024, ATS/Xanadu (ATS3) is being developed actively in ATS2, with the hope of reducing the learning needed by two main improvements:

With these improvements, Xi hopes for ATS to become much more accessible and easier to learn. The main goal of ATS3 is to transform ATS from a language mainly used for research, into one strong enough for large-scale industrial software development. [5]

Theorem proving

The main focus of ATS is to support formal verification via automated theorem proving, combined with practical programming. [2] Theorem proving can prove, for example, that an implemented function produces no memory leaks. It can also prevent other bugs that might otherwise be found only during testing. It incorporates a system similar to those of proof assistants which usually only aim to verify mathematical proofs—except ATS uses this ability to prove that the implementations of its functions operate correctly, and produce the expected output.

As a simple example, in a function using division, the programmer may prove that the divisor will never equal zero, preventing a division by zero error. Let's say, the divisor 'X' was computed as 5 times the length of list 'A'. One can prove, that in the case of a non-empty list, 'X' is non-zero, since 'X' is the product of two non-zero numbers (5 and the length of 'A'). A more practical example would be proving through reference counting that the retain count on an allocated block of memory is being counted correctly for each pointer. Then one can know, and quite literally prove, that the object will not be deallocated prematurely, and that memory leaks will not occur.

The benefit of the ATS system is that since all theorem proving occurs strictly within the compiler, it has no effect on the speed of the executable program. ATS code is often harder to compile than standard C code, but once it compiles, it is certain that it is running correctly to the degree specified by the proofs (assuming the compiler and runtime system are correct).

In ATS proofs are separate from implementation, so it is possible to implement a function without proving it, if desired.

Data representation

According to the author, ATS's efficiency [7] is largely due to the way that data is represented in the language and tail-call optimizations (which are generally important for the efficiency of functional languages). Data can be stored in a flat or unboxed representation rather than a boxed representation.

Theorem proving: An introductory case

Propositions

dataprop expresses predicates as algebraic types.

Predicates in pseudo‑code somewhat similar to ATS source (see below for valid ATS source):

 FACT(n, r)         iff    fact(n) = r  MUL(n, m, prod)    iff    n * m = prod     FACT(n, r) =         FACT(0, 1)       | FACT(n, r) iff FACT(n-1, r1) and MUL(n, r1, r)   // for n > 0    // expresses fact(n) = r  iff  r = n * r1  and  r1 = fact(n-1)

In ATS code:

datapropFACT(int,int)=|FACTbas(0,1)//basiccase:FACT(0,1)|{n:int|n>0}{r,r1:int}//inductivecaseFACTind(n,r)of(FACT(n-1,r1),MUL(n,r1,r))

where FACT (int, int) is a proof type

Example

Non tail-recursive factorial with proposition or "Theorem" proving through the construction dataprop.

The evaluation of fact1(n-1) returns a pair (proof_n_minus_1 | result_of_n_minus_1) which is used in the calculation of fact1(n). The proofs express the predicates of the proposition.

Part 1 (algorithm and propositions)

 [FACT (n, r)] implies [fact (n) = r]  [MUL (n, m, prod)] implies [n * m = prod]
 FACT (0, 1)  FACT (n, r) iff FACT (n-1, r1) and MUL (n, r1, r) forall n > 0

To remember:

{...} universal quantification [...] existential quantification (... | ...)   (proof | value) @(...) flat tuple or variadic function parameters tuple .<...>. termination metric [8] 
#include"share/atspre_staload.hats"datapropFACT(int,int)=|FACTbas(0,1)of()//basiccase|{n:nat}{r:int}//inductivecaseFACTind(n+1,(n+1)*r)of(FACT(n,r))(* note that int(x) , also int x, is the monovalued type of the int x value. The function signature below says: forall n:nat, exists r:int where fact( num: int(n)) returns (FACT (n, r) | int(r)) *)funfact{n:nat}.<n>.(n:int(n)):[r:int](FACT(n,r)|int(r))=(ifcase|n>0=>((FACTind(pf1)|n*r1))where{val(pf1|r1)=fact(n-1)}|_(*else*)=>(FACTbas()|1))

Part 2 (routines and test)

implementmain0(argc,argv)={val()=if(argc!=2)thenprerrln!("Usage: ",argv[0]," <integer>")val()=assert(argc>=2)valn0=g0string2int(argv[1])valn0=g1ofg0(n0)val()=assert(n0>=0)val(_(*pf*)|res)=fact(n0)val((*void*))=println!("fact(",n0,") = ",res)}

This can all be added to a single file and compiled as follows. Compiling should work with various back end C compilers, e.g., GNU Compiler Collection (gcc). Garbage collection is not used unless explicitly stated with -D_ATS_GCATS ) [9]

$ patsccfact1.dats-ofact1 $ ./fact14

compiles and gives the expected result

Features

Basic types

Tuples and records

prefix @ or none means direct, flat or unboxed allocation
valx:@(int,char)=@(15,'c')//x.0=15;x.1='c'val@(a,b)=x//patternmatchingbinding,a=15,b='c'valx=@{first=15,second='c'}//x.first=15val@{first=a,second=b}=x//a=15,b='c'val@{second=b,...}=x//withomission,b='c'
prefix ' means indirect or boxed allocation
valx:'(int,char)='(15,'c')//x.0=15;x.1='c'val'(a,b)=x//a=15,b='c'valx='{first=15,second='c'}//x.first=15val'{first=a,second=b}=x//a=15,b='c'val'{second=b,...}=x//b='c'
special

With '|' as separator, some functions return wrapped the result value with an evaluation of predicates

val ( predicate_proofs | values) = myfunct params

Common

{...} universal quantification [...] existential quantification (...) parenthetical expression or tuple  (... | ...)     (proofs | values)
.<...>. termination metric  @(...)          flat tuple or variadic function parameters tuple (see example's printf)  @[byte][BUFLEN]     type of an array of BUFLEN values of type byte [10]  @[byte][BUFLEN]()   array instance @[byte][BUFLEN](0)  array initialized to 0

Dictionary

sort:domain
sortdef nat = {a: int | a >= 0 }     // from prelude: ∀ a ∈ int ...
typedef String = [a:nat] string(a)   // [..]: ∃ a ∈ nat ...
type (as sort)
generic sort for elements with the length of a pointer word, to be used in type parameterized polymorphic functions. Also "boxed types" [11]
// {..}: ∀ a,b ∈ type ... fun {a,b:type} swap_type_type (xy: @(a, b)): @(b, a) = (xy.1, xy.0)
t@ype
linear version of previous type with abstracted length. Also unboxed types. [11]
viewtype
a domain class like type with a view (memory association)
viewt@ype
linear version of viewtype with abstracted length. It supersets viewtype
view
relation of a Type and a memory location. The infix @ is its most common constructor
T @ L asserts that there is a view of type T at location L
fun{a:t@ype}ptr_get0{l:addr}(pf:a@l|p:ptrl):@(a@l|a)fun{a:t@ype}ptr_set0{l:addr}(pf:a?@l|p:ptrl,x:a):@(a@l|void)
the type of ptr_get0 (T) is ∀ l : addr . ( T @ l | ptr( l ) ) -> ( T @ l | T) // see manual, section 7.1. Safe Memory Access through Pointers [12]
viewdef array_v (a:viewt@ype, n:int, l: addr) = @[a][n] @ l
T?
possibly uninitialized type

pattern matching exhaustivity

as in case+, val+, type+, viewtype+, ...

modules

 staload "foo.sats" // foo.sats is loaded and then opened into the current namespace   staload F = "foo.sats" // to use identifiers qualified as $F.bar   dynload "foo.dats" // loaded dynamically at run-time 

dataview

Dataviews are often declared to encode recursively defined relations on linear resources. [13]

dataview array_v (a: viewt@ype+, int, addr) =   | {l: addr} array_v_none (a, 0, l)   | {n: nat} {l: addr}     array_v_some (a, n+1, l)     of (a @ l, array_v (a, n, l+sizeof a))

datatype / dataviewtype

Datatypes [14]

datatype workday = Mon | Tue | Wed | Thu | Fri

lists

datatype list0 (a:t@ype) = list0_cons (a) of (a, list0 a) | list0_nil (a)

dataviewtype

A dataviewtype is similar to a datatype, but it is linear. With a dataviewtype, the programmer is allowed to explicitly free (or deallocate) in a safe manner the memory used for storing constructors associated with the dataviewtype. [15]

variables

local variables

var res: int with pf_res = 1   // introduces pf_res as an alias of view @ (res)

on stack array allocation:

#define BUFLEN 10 var !p_buf with pf_buf = @[byte][BUFLEN](0)    // pf_buf = @[byte][BUFLEN](0) @ p_buf [16] 

See val and var declarations [17]

Related Research Articles

<span class="mw-page-title-main">Kolmogorov complexity</span> Measure of algorithmic complexity

In algorithmic information theory, the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program that produces the object as output. It is a measure of the computational resources needed to specify the object, and is also known as algorithmic complexity, Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, or algorithmic entropy. It is named after Andrey Kolmogorov, who first published on the subject in 1963 and is a generalization of classical information theory.

ML is a functional programming language. It is known for its use of the polymorphic Hindley–Milner type system, which automatically assigns the data types of most expressions without requiring explicit type annotations, and ensures type safety; there is a formal proof that a well-typed ML program does not cause runtime type errors. ML provides pattern matching for function arguments, garbage collection, imperative programming, call-by-value and currying. While a general-purpose programming language, ML is used heavily in programming language research and is one of the few languages to be completely specified and verified using formal semantics. Its types and pattern matching make it well-suited and commonly used to operate on other formal languages, such as in compiler writing, automated theorem proving, and formal verification.

OCaml is a general-purpose, high-level, multi-paradigm programming language which extends the Caml dialect of ML with object-oriented features. OCaml was created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez, and others.

Standard ML (SML) is a general-purpose, modular, functional programming language with compile-time type checking and type inference. It is popular for writing compilers, for programming language research, and for developing theorem provers.

Generic programming is a style of computer programming in which algorithms are written in terms of data types to-be-specified-later that are then instantiated when needed for specific types provided as parameters. This approach, pioneered by the ML programming language in 1973, permits writing common functions or types that differ only in the set of types on which they operate when used, thus reducing duplicate code.

In vector calculus, Green's theorem relates a line integral around a simple closed curve C to a double integral over the plane region D bounded by C. It is the two-dimensional special case of Stokes' theorem.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computer science, a union is a value that may have any of several representations or formats within the same position in memory; that consists of a variable that may hold such a data structure. Some programming languages support special data types, called union types, to describe such values and variables. In other words, a union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g., "float or long integer". In contrast with a record, which could be defined to contain both a float and an integer; in a union, there is only one value at any given time.

In the C++ programming language, a reference is a simple reference datatype that is less powerful but safer than the pointer type inherited from C. The name C++ reference may cause confusion, as in computer science a reference is a general concept datatype, with pointers and C++ references being specific reference datatype implementations. The definition of a reference in C++ is such that it does not need to exist. It can be implemented as a new name for an existing object.

In some programming languages, const is a type qualifier, which indicates that the data is read-only. While this can be used to declare constants, const in the C family of languages differs from similar constructs in other languages in that it is part of the type, and thus has complicated behavior when combined with pointers, references, composite data types, and type-checking. In other languages, the data is not in a single memory location, but copied at compile time for each use. Languages which use it include C, C++, D, JavaScript, Julia, and Rust.

In computer science, object composition and object aggregation are closely related ways to combine objects or data types into more complex ones. In conversation the distinction between composition and aggregation is often ignored. Common kinds of compositions are objects used in object-oriented programming, tagged unions, sets, sequences, and various graph structures. Object compositions relate to, but are not the same as, data structures.

In computer programming languages, a recursive data type is a data type for values that may contain other values of the same type. Data of recursive types are usually viewed as directed graphs.

sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of char-sized units. Consequently, the construct sizeof (char) is guaranteed to be 1. The actual number of bits of type char is specified by the preprocessor macro CHAR_BIT, defined in the standard include file limits.h. On most modern computing platforms this is eight bits. The result of sizeof has an unsigned integer type that is usually denoted by size_t.

In computer programming, an enumerated type is a data type consisting of a set of named values called elements, members, enumeral, or enumerators of the type. The enumerator names are usually identifiers that behave as constants in the language. An enumerated type can be seen as a degenerate tagged union of unit type. A variable that has been declared as having an enumerated type can be assigned any of the enumerators as a value. In other words, an enumerated type has values that are different from each other, and that can be compared and assigned, but are not specified by the programmer as having any particular concrete representation in the computer's memory; compilers and interpreters can represent them arbitrarily.

In mathematics, Fejér's theorem, named after Hungarian mathematician Lipót Fejér, states the following:

In computer science, a type punning is any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language.

In computer programming, a variable-length array (VLA), also called variable-sized or runtime-sized, is an array data structure which length is determined at runtime, instead of at compile time. In the language C, the VLA is said to have a variably modified data type that depends on a value.

In number theory, the parity problem refers to a limitation in sieve theory that prevents sieves from giving good estimates in many kinds of prime-counting problems. The problem was identified and named by Atle Selberg in 1949. Beginning around 1996, John Friedlander and Henryk Iwaniec developed some parity-sensitive sieves that make the parity problem less of an obstacle.

<span class="mw-page-title-main">Dafny</span> Programming language

Dafny is an imperative and functional compiled language that compiles to other programming languages, such as C#, Java, JavaScript, Go and Python. It supports formal specification through preconditions, postconditions, loop invariants, loop variants, termination specifications and read/write framing specifications. The language combines ideas from the functional and imperative paradigms; it includes support for object-oriented programming. Features include generic classes, dynamic allocation, inductive datatypes and a variation of separation logic known as implicit dynamic frames for reasoning about side effects. Dafny was created by Rustan Leino at Microsoft Research after his previous work on developing ESC/Modula-3, ESC/Java, and Spec#.

References

  1. Xi, Hongwei (14 November 2020). "[ats-lang-users] ATS2-0.4.2 released". ats-lang-users. Retrieved 17 November 2020.
  2. 1 2 "Combining Programming with Theorem Proving" (PDF). Archived from the original (PDF) on 2014-11-29. Retrieved 2014-11-18.
  3. ATS benchmarks | Computer Language Benchmarks Game (web archive)
  4. "Introduction to Programming in ATS". ats-lang.github.io. Retrieved 2024-02-23.
  5. 1 2 3 "ATS-PL-SYS". www.cs.bu.edu. Retrieved 2024-02-23.
  6. 1 2 Xi, Hongwei (2024-02-17). "githwxi/ATS-Xanadu". GitHub. Retrieved 2024-02-23.
  7. Discussion about the language's efficiency (Language Shootout: ATS is the new top gunslinger. Beats C++.)
  8. "Termination metrics". Archived from the original on 2016-10-18. Retrieved 2017-05-20.
  9. Compilation - Garbage collection Archived August 4, 2009, at the Wayback Machine
  10. type of an array Archived September 4, 2011, at the Wayback Machine types like @[T][I]
  11. 1 2 "Introduction to Dependent types". Archived from the original on 2016-03-12. Retrieved 2016-02-13.
  12. Manual, section 7.1. Safe Memory Access through Pointers [ permanent dead link ] (outdated)
  13. Dataview construct Archived April 13, 2010, at the Wayback Machine
  14. Datatype construct Archived April 14, 2010, at the Wayback Machine
  15. Dataviewtype construct
  16. Manual - 7.3 Memory allocation on stack Archived August 9, 2014, at the Wayback Machine (outdated)
  17. Val and Var declarations Archived August 9, 2014, at the Wayback Machine (outdated)