Swift (parallel scripting language)

Swift
Paradigms: Dataflow, distributed, grid, concurrent, scientific workflow, scripting
Developers: University of Chicago, Argonne National Laboratory
First appeared: 2007
Stable release: 0.96.2 / August 5, 2015
Typing discipline: Strong
Platform: Cross-platform (Java)
OS: Cross-platform (Java)
License: Apache 2.0
Website: swift-lang.org
Influenced by: C syntax, functional programming
Influenced: Cuneiform

Swift [1] is an implicitly parallel programming language for writing scripts that distribute program execution across distributed computing resources, [2] including clusters, clouds, grids, and supercomputers. Swift implementations are open-source software under the Apache License, version 2.0.

Language features

A Swift script [3] describes strongly typed data, application components, invocations of application components, and the dataflow relationships between those invocations. Given sufficient computing resources, program statements run in parallel automatically unless a data dependency exists between them. The design of the language guarantees that the results of a computation are deterministic, even though the order in which statements execute may vary. A special file data type is built into Swift; it allows command-line programs to be integrated into a program as typed functions, so programmers can treat command-line programs and files in the same way as regular functions and variables. A concept of mapping [4] is used to store and exchange complex data structures using a file system structure of files and directories.
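A minimal sketch of such a script is shown below; the sort command-line program and the file names are illustrative assumptions, not part of the Swift distribution.

    type file;

    // An app declaration wraps a command-line program as a typed function;
    // @unsorted and @sorted expand to the file names bound to those variables.
    app (file sorted) sortfile (file unsorted)
    {
      sort "-n" @unsorted stdout=@sorted;
    }

    // Mapped file variables bind Swift data to concrete files on disk.
    file input <"data.txt">;
    file output <"data_sorted.txt">;

    // This call runs once its input is available; calls with no data
    // dependency on one another execute in parallel automatically.
    output = sortfile(input);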

Rapid dispatch of parallel tasks to a wide range of resources is implemented through a mechanism called Coasters task dispatch. [5] A Message Passing Interface (MPI) based implementation of the language, Swift/T, [6] supports very high task execution rates (e.g., 3000 tasks per second) [7] on large clusters and supercomputers.
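As an illustration of the many-task pattern that these dispatch mechanisms target, the sketch below (assuming a hypothetical simulate program and output naming scheme) uses a foreach loop to generate many independent invocations that the runtime may distribute across available nodes:

    type file;

    app (file out) simulate (int seed)
    {
      simulate "--seed" seed stdout=@out;
    }

    // The iterations are mutually independent, so the runtime is free to
    // dispatch all of the resulting tasks concurrently.
    foreach i in [1:1000] {
      file out <single_file_mapper; file=strcat("sim_", i, ".out")>;
      out = simulate(i);
    }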

Areas of application

Application examples are described in the literature [7] and in case studies on the official site. [8]

Related Research Articles

Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large.

Parallel computing: Programming paradigm in which many processes are executed simultaneously

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

High-performance computing: Computing with supercomputers and clusters

High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems.

In computer programming, dataflow programming is a programming paradigm that models a program as a directed graph of the data flowing between operations, thus implementing dataflow principles and architecture. Dataflow programming languages share some features of functional languages, and were generally developed in order to bring some functional concepts to a language more suitable for numeric processing. Some authors use the term datastream instead of dataflow to avoid confusion with dataflow computing or dataflow architecture, based on an indeterministic machine paradigm. Dataflow programming was pioneered by Jack Dennis and his graduate students at MIT in the 1960s.

Utility computing, or the computer utility, is a service provisioning model in which a service provider makes computing resources and infrastructure management available to the customer as needed, and charges them for specific usage rather than a flat rate. Like other types of on-demand computing, the utility model seeks to maximize the efficient use of resources and/or minimize associated costs. Utility is the packaging of system resources, such as computation, storage and services, as a metered service. This model has the advantage of a low or no initial cost to acquire computer resources; instead, resources are essentially rented.

ProActive Parallel Suite is open-source software for enterprise workload orchestration, part of the OW2 community. A workflow model allows a set of executables or scripts, written in any language, to be defined along with their dependencies, so ProActive Parallel Suite can schedule and orchestrate executions while optimising the use of computational resources.

The Base One Foundation Component Library (BFC) is a rapid application development toolkit for building secure, fault-tolerant, database applications on Windows and ASP.NET. In conjunction with Microsoft's Visual Studio integrated development environment, BFC provides a general-purpose web application framework for working with databases from Microsoft, Oracle, IBM, Sybase, and MySQL, running under Windows, Linux/Unix, or IBM iSeries or z/OS. BFC includes facilities for distributed computing, batch processing, queuing, and database command scripting, and these run under Windows or Linux with Wine.

Computer cluster: Set of computers configured in a distributed computing system

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.

Many-task computing (MTC) in computational science is an approach to parallel computing that aims to bridge the gap between two computing paradigms: high-throughput computing (HTC) and high-performance computing (HPC).

In computing, algorithmic skeletons, or parallelism patterns, are a high-level parallel programming model for parallel and distributed computing.

Data-intensive computing is a class of parallel computing applications which use a data parallel approach to process large volumes of data, typically terabytes or petabytes in size, commonly referred to as big data. Computing applications which devote most of their execution time to computational requirements are deemed compute-intensive, whereas computing applications which require large volumes of data and devote most of their processing time to I/O and manipulation of data are deemed data-intensive.

HPCC: High-performance computer cluster

HPCC, also known as DAS, is an open source, data-intensive computing system platform developed by LexisNexis Risk Solutions. The HPCC platform incorporates a software architecture implemented on commodity computing clusters to provide high-performance, data-parallel processing for applications utilizing big data. The HPCC platform includes system configurations to support both parallel batch data processing (Thor) and high-performance online query applications using indexed data files (Roxie). The HPCC platform also includes a data-centric declarative programming language for parallel data processing called ECL.

Data-centric programming language defines a category of programming languages where the primary function is the management and manipulation of data. A data-centric programming language includes built-in processing primitives for accessing data stored in sets, tables, lists, and other data structures and databases, and for specific manipulation and transformation of data required by a programming application. Data-centric programming languages are typically declarative and often dataflow-oriented, and define the processing result desired; the specific processing steps required to perform the processing are left to the language compiler. The SQL relational database language is an example of a declarative, data-centric language. Declarative, data-centric programming languages are ideal for data-intensive computing applications.

Quasi-opportunistic supercomputing: Computational paradigm for supercomputing

Quasi-opportunistic supercomputing is a computational paradigm for supercomputing on a large number of geographically dispersed computers. Quasi-opportunistic supercomputing aims to provide a higher quality of service than opportunistic resource sharing.

In computer science, a pilot job is a type of multilevel scheduling, in which a resource is acquired by an application so that the application can schedule work into that resource directly, rather than going through a local job scheduler, which might lead to queue waits for each work unit. This term comes from the Condor High-Throughput Computing System, in which Condor GlideIns provide this functionality. Other examples of pilot jobs are: the BigJob implemented in SAGA, Swift Coasters as part of the Swift parallel scripting system, the Falkon lightweight task execution framework, and HTCaaS.

Massively parallel is the term for using a large number of computer processors to simultaneously perform a set of coordinated computations in parallel. GPUs are massively parallel architectures with tens of thousands of threads.

Message passing in computer clusters: Aspect of computer clusters

Message passing is an inherent element of all computer clusters. All computer clusters, ranging from homemade Beowulfs to some of the fastest supercomputers in the world, rely on message passing to coordinate the activities of the many nodes they encompass. Message passing in computer clusters built with commodity servers and switches is used by virtually every internet service.

Apache Spark: Open-source data analytics cluster computing framework

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.

Computation offloading is the transfer of resource-intensive computational tasks to a separate processor, such as a hardware accelerator, or an external platform, such as a cluster, grid, or cloud. Offloading to a coprocessor can be used to accelerate applications including image rendering and mathematical calculations. Offloading computing to an external platform over a network can provide computing power and overcome hardware limitations of a device, such as limited computational power, storage, and energy.

In the high-performance computing environment, a burst buffer is a fast intermediate storage layer positioned between the front-end computing processes and the back-end storage systems. It bridges the performance gap between the processing speed of the compute nodes and the input/output (I/O) bandwidth of the storage systems. Burst buffers are often built from arrays of high-performance storage devices, such as NVRAM and SSDs. They typically offer one to two orders of magnitude higher I/O bandwidth than the back-end storage systems.

References

  1. "Swift Home Page". swift-lang.org. Retrieved 2014-06-02.
  2. Wilde, Michael; Hategan, Mihael; Wozniak, Justin M.; Clifford, Ben; Katz, Daniel S.; Foster, Ian (2011). "Swift: A language for distributed parallel scripting" (PDF). Parallel Computing. 37 (9): 633–652. CiteSeerX 10.1.1.658.8990. doi:10.1016/j.parco.2011.05.005. Archived from the original (PDF) on 2014-06-06.
  3. Reference manual, chapter 2
  4. Reference manual, chapter 3
  5. Hategan, Mihael; Wozniak, Justin; Maheshwari, Ketan (2011). "Coasters: uniform resource provisioning and access for scientific computing on clouds and grids" (PDF). Proceedings Utility and Cloud Computing.
  6. Wozniak, Justin M.; Armstrong, Timothy G.; Wilde, Michael; Katz, Daniel S.; Lusk, Ewing; Foster, Ian T. (2013). "Swift/T: Large-scale Application Composition via Distributed-memory Dataflow Processing". Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on. IEEE. pp. 95–102.
  7. Wilde, Michael; Foster, Ian; Iskra, Kamil; Beckman, Pete; Zhang, Zhao; Espinosa, Allan; Hategan, Mihael; Clifford, Ben; Raicu, Ioan (2009). "Parallel Scripting for Applications at the Petascale and Beyond" (PDF). Computer. 42 (11): 50–60. doi:10.1109/mc.2009.365. S2CID 5271623. Archived from the original (PDF) on 2014-07-12.
  8. Case studies on the official site