Paradigm | array, functional
---|---
Family | ML
Designed by | Troels Henriksen, Cosmin Oancea, Martin Elsman
Developer | University of Copenhagen [1]
First appeared | 2014
Typing discipline | inferred, static, strong, Hindley–Milner, uniqueness, dependent
OS | cross-platform
License | ISC
Website | futhark-lang.org
Influenced by | APL, Haskell, NESL, Standard ML
Futhark is a multi-paradigm, high-level, functional, data-parallel, array programming language. It is a dialect of ML, originally developed at the UCPH Department of Computer Science (DIKU) as part of the HIPERFIT project. [2] It focuses on enabling data-parallel programs written in a functional style to be executed with high performance on massively parallel hardware, especially graphics processing units (GPUs). Futhark is strongly inspired by NESL, and its implementation uses a variant of the flattening transformation, but it imposes constraints on how parallelism can be expressed in order to enable more aggressive compiler optimisations. In particular, irregular nested data parallelism is not supported. [3] It is free and open-source software released under the ISC license.
Futhark is a language in the ML family, with an indentation-insensitive syntax derived from OCaml, Standard ML, and Haskell. The type system is based on a Hindley–Milner type system with a variety of extensions, such as uniqueness types and size-dependent types. Futhark is not intended as a general-purpose programming language for writing full applications, but is instead focused on writing compute kernels (not always the same as a GPU kernel) which are then invoked from applications written in conventional languages. [4]
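As an illustration of this kernel-oriented style (a minimal sketch not taken from the article; the name `scale` and its behaviour are made up for the example), a Futhark program exposes functions to the host application through entry points, declared with `entry` rather than `def`:

```futhark
-- Sketch of a compute kernel exposed to host code. `entry` (rather than
-- `def`) marks a function as callable from the code the compiler generates
-- for the host language. The name `scale` is an illustrative example.
entry scale [n] (alpha: f64) (xs: [n]f64) : [n]f64 =
  map (\x -> alpha * x) xs
```

Compiling such a file with one of the compiler backends (for example `futhark c` or `futhark opencl`) produces code whose entry points can then be invoked from an application written in a conventional language.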
Futhark is named after the first six letters of the Runic alphabet. [5]: 2
The following program computes the dot product of two vectors containing double-precision numbers.
```futhark
def dotprod xs ys = f64.sum (map2 (*) xs ys)
```
It can also be equivalently written with explicit type annotations as follows.
```futhark
def dotprod [n] (xs: [n]f64) (ys: [n]f64) : f64 = f64.sum (map2 (*) xs ys)
```
This makes the size-dependent types explicit: this function can only be invoked with two arrays of the same size, and the type checker will reject any program where this cannot be statically determined.
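For example (an illustrative pair of calls not present in the original text; the names `ok` and `bad` are made up), a call with arrays of matching static size type-checks, while a statically visible size mismatch is rejected:

```futhark
-- Accepted: both array literals have the statically known size 3.
def ok : f64 = dotprod [1.0, 2.0, 3.0] [4.0, 5.0, 6.0]   -- evaluates to 32.0

-- Rejected at compile time: the sizes 3 and 2 cannot be unified.
-- def bad : f64 = dotprod [1.0, 2.0, 3.0] [4.0, 5.0]
```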
The following program performs matrix multiplication, using the definition of dot product above.
```futhark
def matmul [n][m][p] (A: [n][m]f64) (B: [m][p]f64) : [n][p]f64 =
  map (\A_row -> map (\B_col -> dotprod A_row B_col) (transpose B)) A
```
This shows how the types enforce that the function is only invoked with matrices of compatible size. Also, it is an example of nested data parallelism.
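As a sketch of how it might be used (an illustrative invocation not present in the original; the name `example` is made up), multiplying a 2×3 matrix by a 3×2 matrix yields a 2×2 result:

```futhark
-- Illustrative use of matmul; the expected result is [[4.0, 5.0], [10.0, 11.0]].
def example : [2][2]f64 =
  matmul [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
         [[1.0, 0.0],
          [0.0, 1.0],
          [1.0, 1.0]]
```

The outer map runs over the rows of A and the inner map over the columns of B, so the computation of every output element can, in principle, proceed in parallel.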