The Message Passing Interface (MPI) is a standardized and portable message-passing system designed to function on parallel computing architectures. [1] The MPI standard defines the syntax and semantics of library routines that are useful to a wide range of users writing portable message-passing programs in C, C++, and Fortran. There are several open-source MPI implementations, which fostered the development of a parallel software industry and encouraged the development of portable and scalable large-scale parallel applications.
The message passing interface effort began in the summer of 1991 when a small group of researchers started discussions at a mountain retreat in Austria. Out of that discussion came a Workshop on Standards for Message Passing in a Distributed Memory Environment, held on April 29–30, 1992 in Williamsburg, Virginia. [2] Attendees at Williamsburg discussed the basic features essential to a standard message-passing interface and established a working group to continue the standardization process. Jack Dongarra, Tony Hey, and David W. Walker put forward a preliminary draft proposal, "MPI1", in November 1992. That same month, a meeting of the MPI working group in Minneapolis decided to place the standardization process on a more formal footing. The MPI working group met every 6 weeks throughout the first 9 months of 1993. The draft MPI standard was presented at the Supercomputing '93 conference in November 1993. [3] After a period of public comments, which resulted in some changes in MPI, version 1.0 of MPI was released in June 1994. These meetings and the email discussion together constituted the MPI Forum, membership of which has been open to all members of the high-performance-computing community.
The MPI effort involved about 80 people from 40 organizations, mainly in the United States and Europe. Most of the major vendors of concurrent computers were involved in the MPI effort, collaborating with researchers from universities, government laboratories, and industry.
MPI provides parallel hardware vendors with a clearly defined base set of routines that can be efficiently implemented. As a result, hardware vendors can build upon this collection of standard low-level routines to create higher-level routines for the distributed-memory communication environment supplied with their parallel machines. MPI provides a simple-to-use portable interface for the basic user, yet one powerful enough to allow programmers to use the high-performance message passing operations available on advanced machines.
In an effort to create a universal standard for message passing, researchers did not base it on a single system; instead, they incorporated the most useful features of several systems, including those designed by IBM, Intel, nCUBE, PVM, Express, P4 and PARMACS. The message-passing paradigm is attractive because of its wide portability: it can be used in communication for distributed-memory and shared-memory multiprocessors, networks of workstations, and combinations of these elements. The paradigm applies in multiple settings, independent of network speed or memory architecture.
Support for MPI meetings came in part from DARPA and from the U.S. National Science Foundation (NSF) under grant ASC-9310330, NSF Science and Technology Center Cooperative agreement number CCR-8809615, and from the European Commission through Esprit Project P6643. The University of Tennessee also made financial contributions to the MPI Forum.
MPI is a communication protocol for programming [4] parallel computers. Both point-to-point and collective communication are supported. MPI "is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation." [5] MPI's goals are high performance, scalability, and portability. MPI remains the dominant model used in high-performance computing today. [6]
MPI is not sanctioned by any major standards body; nevertheless, it has become a de facto standard for communication among processes that model a parallel program running on a distributed memory system. Actual distributed memory supercomputers such as computer clusters often run such programs.
The principal MPI-1 model has no shared memory concept, and MPI-2 has only a limited distributed shared memory concept. Nonetheless, MPI programs are regularly run on shared memory computers, and both MPICH and Open MPI can use shared memory for message transfer if it is available. [7] [8] Designing programs around the MPI model (in contrast to explicit shared memory models) has advantages when running on NUMA architectures, since MPI encourages memory locality. Explicit shared memory programming was introduced in MPI-3. [9] [10] [11]
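For illustration, a sketch of this MPI-3 shared-memory mechanism, in which the processes on one node obtain direct load/store access to a common window (sizes and variable names are illustrative, and error handling is omitted):

/* Group the processes that can share physical memory (one sub-communicator per node) */
MPI_Comm node_comm;
MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);

/* Allocate a window backed by shared memory; every process on the node contributes a segment */
double *local;
MPI_Win win;
MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double), MPI_INFO_NULL,
                        node_comm, &local, &win);

/* Obtain a direct pointer into the segment contributed by node rank 0 */
MPI_Aint size;
int disp_unit;
double *rank0_mem;
MPI_Win_shared_query(win, 0, &size, &disp_unit, &rank0_mem);

MPI_Win_fence(0, win);   /* synchronize before and after direct loads/stores */
/* ... ordinary reads and writes through local and rank0_mem ... */
MPI_Win_fence(0, win);

MPI_Win_free(&win);
MPI_Comm_free(&node_comm);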
Although MPI belongs in layers 5 and higher of the OSI Reference Model, implementations may cover most layers, with sockets and Transmission Control Protocol (TCP) used in the transport layer.
Most MPI implementations consist of a specific set of routines directly callable from C, C++, Fortran (i.e., an API) and any language able to interface with such libraries, including C#, Java or Python. The advantages of MPI over older message passing libraries are portability (because MPI has been implemented for almost every distributed memory architecture) and speed (because each implementation is in principle optimized for the hardware on which it runs).
MPI uses Language Independent Specifications (LIS) for calls and language bindings. The first MPI standard specified ANSI C and Fortran-77 bindings together with the LIS. The draft was presented at Supercomputing 1994 (November 1994) [12] and finalized soon thereafter. About 128 functions constitute the MPI-1.3 standard which was released as the final end of the MPI-1 series in 2008. [13]
At present, the standard has several versions: version 1.3 (commonly abbreviated MPI-1), which emphasizes message passing and has a static runtime environment; MPI-2.2 (MPI-2), which includes new features such as parallel I/O, dynamic process management and remote memory operations; [14] and MPI-3.1 (MPI-3), which includes extensions to the collective operations with non-blocking versions and extensions to the one-sided operations. [15] MPI-2's LIS specifies over 500 functions and provides language bindings for ISO C, ISO C++, and Fortran 90. Object interoperability was also added to allow easier mixed-language message passing programming. A side-effect of standardizing MPI-2, completed in 1996, was clarifying the MPI-1 standard, creating MPI-1.2.
MPI-2 is mostly a superset of MPI-1, although some functions have been deprecated. MPI-1.3 programs still work under MPI implementations compliant with the MPI-2 standard.
MPI-3 includes new Fortran 2008 bindings, while it removes deprecated C++ bindings as well as many deprecated routines and MPI objects.
MPI is often compared with Parallel Virtual Machine (PVM), which is a popular distributed environment and message passing system developed in 1989, and which was one of the systems that motivated the need for standard parallel message passing. Threaded shared memory programming models (such as Pthreads and OpenMP) and message passing programming (MPI/PVM) can be considered complementary and have been used together on occasion in, for example, servers with multiple large shared-memory nodes.
The MPI interface is meant to provide essential virtual topology, synchronization, and communication functionality between a set of processes (that have been mapped to nodes/servers/computer instances) in a language-independent way, with language-specific syntax (bindings), plus a few language-specific features. MPI programs always work with processes, but programmers commonly refer to the processes as processors. Typically, for maximum performance, each CPU (or core in a multi-core machine) will be assigned just a single process. This assignment happens at runtime through the agent that starts the MPI program, normally called mpirun or mpiexec.
MPI library functions include, but are not limited to, point-to-point rendezvous-type send/receive operations, choosing between a Cartesian or graph-like logical process topology, exchanging data between process pairs (send/receive operations), combining partial results of computations (gather and reduce operations), and synchronizing nodes (barrier operation), as well as obtaining network-related information such as the number of processes in the computing session, the current processor identity that a process is mapped to, neighboring processes accessible in a logical topology, and so on. Point-to-point operations come in synchronous, asynchronous, buffered, and ready forms, to allow both relatively stronger and weaker semantics for the synchronization aspects of a rendezvous-send. In asynchronous mode, most implementations allow many operations to be outstanding (in progress) at the same time.
MPI-1 and MPI-2 both enable implementations that overlap communication and computation, but practice and theory differ. MPI also specifies thread-safe interfaces, which have cohesion and coupling strategies that help avoid hidden state within the interface. It is relatively easy to write multithreaded point-to-point MPI code, and some implementations support such code. Multithreaded collective communication is best accomplished with multiple copies of communicators, as described below.
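For illustration, a program that combines MPI with threads typically requests a thread-support level at start-up and then checks what the implementation actually provides; a minimal sketch (inside main):

int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE) {
    /* The library cannot handle concurrent MPI calls from several threads;
       fall back to funneling all MPI calls through a single thread. */
}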
MPI provides several features. The following concepts provide context for all of those abilities and help the programmer to decide what functionality to use in their application programs. Four of MPI's eight basic concepts are unique to MPI-2.
Communicator objects connect groups of processes in the MPI session. Each communicator gives each contained process an independent identifier and arranges its contained processes in an ordered topology. MPI also has explicit groups, but these are mainly good for organizing and reorganizing groups of processes before another communicator is made. MPI understands single-group intracommunicator operations and bilateral intercommunicator communication. In MPI-1, single-group operations are most prevalent. Bilateral operations mostly appear in MPI-2, where they include collective communication and dynamic process management.
Communicators can be partitioned using several MPI commands. These commands include MPI_COMM_SPLIT, where each process joins one of several colored sub-communicators by declaring itself to have that color.
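For illustration, a sketch that splits MPI_COMM_WORLD into even- and odd-ranked sub-communicators, using the rank parity as the color (names are illustrative):

int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

/* Processes with the same color land in the same sub-communicator;
   the key (here the world rank) orders the ranks inside it. */
int color = world_rank % 2;
MPI_Comm sub_comm;
MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sub_comm);

int sub_rank;
MPI_Comm_rank(sub_comm, &sub_rank);
MPI_Comm_free(&sub_comm);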
A number of important MPI functions involve communication between two specific processes. A popular example is MPI_Send, which allows one specified process to send a message to a second specified process. Point-to-point operations, as these are called, are particularly useful in patterned or irregular communication, for example, a data-parallel architecture in which each processor routinely swaps regions of data with specific other processors between calculation steps, or a master–slave architecture in which the master sends new task data to a slave whenever the prior task is completed.
MPI-1 specifies mechanisms for both blocking and non-blocking point-to-point communication, as well as the so-called 'ready-send' mechanism, whereby a send request can be made only when the matching receive request has already been made.
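For illustration, a sketch of the non-blocking variants, in which MPI_Isend and MPI_Irecv return immediately and completion is enforced later with MPI_Waitall (ranks, tags, and buffer sizes are illustrative):

double sendbuf[100], recvbuf[100];
MPI_Request reqs[2];
int my_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

if (my_rank == 0) {
    /* Post the send and the receive without blocking, then overlap other work */
    MPI_Isend(sendbuf, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recvbuf, 100, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD, &reqs[1]);
    /* ... computation that does not touch the buffers ... */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
} else if (my_rank == 1) {
    MPI_Recv(recvbuf, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(sendbuf, 100, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
}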
Collective functions involve communication among all processes in a process group (which can mean the entire process pool or a program-defined subset). A typical function is the MPI_Bcast call (short for "broadcast"). This function takes data from one node and sends it to all processes in the process group. A reverse operation is the MPI_Reduce call, which takes data from all processes in a group, performs an operation (such as summing), and stores the results on one node. MPI_Reduce is often useful at the start or end of a large distributed calculation, where each processor operates on a part of the data and then combines it into a result.

Other operations perform more sophisticated tasks, such as MPI_Alltoall, which rearranges n items of data such that the nth node gets the nth item of data from each.
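For illustration, a sketch of the two basic collectives, in which rank 0 broadcasts a problem size and then gathers a global sum (values are illustrative):

int my_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

/* Rank 0 chooses a value; after MPI_Bcast every process holds the same value */
int n = (my_rank == 0) ? 1000 : 0;
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

/* Each process contributes a partial result; MPI_Reduce sums them onto rank 0 */
double partial = (double)my_rank;   /* stand-in for a real partial result */
double total = 0.0;
MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);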
Many MPI functions require that you specify the type of data which is sent between processes. This is because MPI aims to support heterogeneous environments where types might be represented differently on the different nodes [16] (for example they might be running different CPU architectures that have different endianness), in which case MPI implementations can perform data conversion. [16] Since the C language does not allow a type itself to be passed as a parameter, MPI predefines the constants MPI_INT, MPI_CHAR, MPI_DOUBLE to correspond with int, char, double, etc.
Here is an example in C that passes arrays of ints from all processes to one. The one receiving process is called the "root" process, and it can be any designated process but normally it will be process 0. All the processes ask to send their arrays to the root with MPI_Gather, which is equivalent to having each process (including the root itself) call MPI_Send and the root make the corresponding number of ordered MPI_Recv calls to assemble all of these arrays into a larger one: [17]
int send_array[100];
int root = 0; /* or whatever */
int num_procs, *recv_array;
MPI_Comm_size(comm, &num_procs);
recv_array = malloc(num_procs * sizeof(send_array));
MPI_Gather(send_array, sizeof(send_array) / sizeof(*send_array), MPI_INT,
           recv_array, sizeof(send_array) / sizeof(*send_array), MPI_INT,
           root, comm);
However, you may instead wish to send data as one block as opposed to 100 ints. To do this, define a "contiguous block" derived data type:
MPI_Datatype newtype;
MPI_Type_contiguous(100, MPI_INT, &newtype);
MPI_Type_commit(&newtype);
MPI_Gather(array, 1, newtype, receive_array, 1, newtype, root, comm);
For passing a class or a data structure, MPI_Type_create_struct creates an MPI derived data type from MPI predefined data types, as follows:
int MPI_Type_create_struct(int count,
                           int *blocklen,
                           MPI_Aint *disp,
                           MPI_Datatype *type,
                           MPI_Datatype *newtype)
where:

count is the number of blocks, and specifies the length (in elements) of the arrays blocklen, disp, and type,
blocklen contains the number of elements in each block,
disp contains the byte displacement of each block,
type contains the type of the elements in each block,
newtype (an output) contains the new derived type created by this function.

The disp (displacements) array is needed for data structure alignment, since the compiler may pad the variables in a class or data structure. The safest way to find the distance between different fields is by obtaining their addresses in memory. This is done with MPI_Get_address, which is normally the same as C's & operator, but that might not be true when dealing with memory segmentation. [18]
Passing a data structure as one block is significantly faster than passing one item at a time, especially if the operation is to be repeated. This is because fixed-size blocks do not require serialization during transfer. [19]
Given the following data structures:
struct A { int f; short p; };
struct B { struct A a; int pp, vp; };
Here's the C code for building an MPI-derived data type:
static const int blocklen[] = { 1, 1, 1, 1 };
static const MPI_Aint disp[] = {
    offsetof(struct B, a) + offsetof(struct A, f),
    offsetof(struct B, a) + offsetof(struct A, p),
    offsetof(struct B, pp),
    offsetof(struct B, vp)
};
static MPI_Datatype type[] = { MPI_INT, MPI_SHORT, MPI_INT, MPI_INT };
MPI_Datatype newtype;
MPI_Type_create_struct(sizeof(type) / sizeof(*type), blocklen, disp, type, &newtype);
MPI_Type_commit(&newtype);
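Alternatively, a sketch of computing the same displacements at run time with MPI_Get_address, as described above, using a sample variable (MPI-3.1 also provides MPI_Aint_diff for the subtraction):

struct B sample;
MPI_Aint base, addr[4];
MPI_Get_address(&sample, &base);
MPI_Get_address(&sample.a.f, &addr[0]);
MPI_Get_address(&sample.a.p, &addr[1]);
MPI_Get_address(&sample.pp,  &addr[2]);
MPI_Get_address(&sample.vp,  &addr[3]);

MPI_Aint disp_rt[4];   /* run-time displacements, equivalent to the offsetof version */
for (int i = 0; i < 4; i++)
    disp_rt[i] = addr[i] - base;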
MPI-2 defines three one-sided communications operations, MPI_Put, MPI_Get, and MPI_Accumulate, being a write to remote memory, a read from remote memory, and a reduction operation on the same memory across a number of tasks, respectively. Also defined are three different methods to synchronize this communication (global, pairwise, and remote locks), as the specification does not guarantee that these operations have taken place until a synchronization point.
These types of call can often be useful for algorithms in which synchronization would be inconvenient (e.g. distributed matrix multiplication), or where it is desirable for tasks to be able to balance their load while other processors are operating on data.
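For illustration, a sketch of a one-sided update with fence synchronization, in which rank 0 writes directly into memory exposed by rank 1 (the values are illustrative, and my_rank is assumed to hold the caller's rank):

/* Every process exposes one double through the window */
double local_value = 0.0;
MPI_Win win;
MPI_Win_create(&local_value, sizeof(double), sizeof(double),
               MPI_INFO_NULL, MPI_COMM_WORLD, &win);

MPI_Win_fence(0, win);                 /* open an access epoch on all processes */
if (my_rank == 0) {
    double update = 3.14;
    /* Write one double at displacement 0 in rank 1's window */
    MPI_Put(&update, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
}
MPI_Win_fence(0, win);                 /* the put is guaranteed complete only after this fence */

MPI_Win_free(&win);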
The key aspect is "the ability of an MPI process to participate in the creation of new MPI processes or to establish communication with MPI processes that have been started separately." The MPI-2 specification describes three main interfaces by which MPI processes can dynamically establish communications: MPI_Comm_spawn, MPI_Comm_accept/MPI_Comm_connect, and MPI_Comm_join. The MPI_Comm_spawn interface allows an MPI process to spawn a number of instances of the named MPI process. The newly spawned set of MPI processes forms a new MPI_COMM_WORLD intracommunicator but can communicate with the parent through the intercommunicator the function returns. MPI_Comm_spawn_multiple is an alternate interface that allows the different instances spawned to be different binaries with different arguments. [20]
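For illustration, a sketch of the parent side of a spawn, assuming a hypothetical worker executable named "worker":

/* Launch four instances of the (hypothetical) program "worker" */
MPI_Comm intercomm;
MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
               0 /* root */, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
/* The children obtain the same intercommunicator by calling MPI_Comm_get_parent(),
   after which both sides can exchange point-to-point or collective messages. */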
The parallel I/O feature is sometimes called MPI-IO, [21] and refers to a set of functions designed to abstract I/O management on distributed systems to MPI, and allow files to be easily accessed in a patterned way using the existing derived datatype functionality.
The little research that has been done on this feature indicates that it may not be trivial to get high performance gains by using MPI-IO. For example, an implementation of sparse matrix-vector multiplications using the MPI I/O library shows only a minor performance gain overall, but these results are inconclusive. [22] It was not until the idea of collective I/O [23] was implemented in MPI-IO that MPI-IO started to reach widespread adoption. Collective I/O substantially boosts applications' I/O bandwidth by having processes collectively transform small and noncontiguous I/O operations into large and contiguous ones, thereby reducing locking and disk-seek overhead. Due to its vast performance benefits, MPI-IO also became the underlying I/O layer for many state-of-the-art I/O libraries, such as HDF5 and Parallel NetCDF. Its popularity also triggered research on collective I/O optimizations, such as layout-aware I/O [24] and cross-file aggregation. [25] [26]
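For illustration, a sketch of a collective write with MPI-IO, in which each process writes one block at a rank-determined offset of a shared file (the file name, block size, and my_rank are illustrative assumptions):

MPI_File fh;
MPI_File_open(MPI_COMM_WORLD, "output.dat",
              MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

int data[100];   /* this rank's block, assumed to be filled in already */
MPI_Offset offset = (MPI_Offset)my_rank * 100 * sizeof(int);

/* The _all variant is collective: the implementation can merge the per-process
   requests into large, contiguous file accesses. */
MPI_File_write_at_all(fh, offset, data, 100, MPI_INT, MPI_STATUS_IGNORE);

MPI_File_close(&fh);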
Many other efforts are derivatives of MPICH, LAM, and other works, including, but not limited to, commercial implementations from HPE, Intel, Microsoft, and NEC.
While the specifications mandate a C and Fortran interface, the language used to implement MPI is not constrained to match the language or languages it seeks to support at runtime. Most implementations combine C, C++ and assembly language, and target C, C++, and Fortran programmers. Bindings are available for many other languages, including Perl, Python, R, Ruby, Java, and CL (see #Language bindings).
The ABIs of MPI implementations are roughly split between MPICH and Open MPI derivatives, so that a library from one family works as a drop-in replacement for one from the same family, but direct replacement across families is impossible. The French CEA maintains a wrapper interface to facilitate such switches. [27]
MPI hardware research focuses on implementing MPI directly in hardware, for example via processor-in-memory, building MPI operations into the microcircuitry of the RAM chips in each node. By implication, this approach is independent of language, operating system, and CPU, but cannot be readily updated or removed.
Another approach has been to add hardware acceleration to one or more parts of the operation, including hardware processing of MPI queues and using RDMA to directly transfer data between memory and the network interface controller without CPU or OS kernel intervention.
mpicc (and similarly mpic++, mpif90, etc.) is a program that wraps over an existing compiler to set the necessary command-line flags when compiling code that uses MPI. Typically, it adds a few flags that enable the code to be compiled and linked against the MPI library. [28]
Bindings are libraries that extend MPI support to other languages by wrapping an existing MPI implementation such as MPICH or Open MPI.
The two managed Common Language Infrastructure .NET implementations are Pure Mpi.NET [29] and MPI.NET, [30] a research effort at Indiana University licensed under a BSD-style license. It is compatible with Mono, and can make full use of underlying low-latency MPI network fabrics.
Although Java does not have an official MPI binding, several groups have attempted to bridge the two, with different degrees of success and compatibility. One of the first attempts was Bryan Carpenter's mpiJava, [31] essentially a set of Java Native Interface (JNI) wrappers to a local C MPI library, resulting in a hybrid implementation with limited portability, which also has to be compiled against the specific MPI library being used.
However, this original project also defined the mpiJava API [32] (a de facto MPI API for Java that closely followed the equivalent C++ bindings) which other subsequent Java MPI projects adopted. One less-used API is MPJ API, which was designed to be more object-oriented and closer to Sun Microsystems' coding conventions. [33] Beyond the API, Java MPI libraries can be either dependent on a local MPI library, or implement the message passing functions in Java, while some like P2P-MPI also provide peer-to-peer functionality and allow mixed-platform operation.
Some of the most challenging parts of Java/MPI arise from Java characteristics such as the lack of explicit pointers and the linear memory address space for its objects, which make transferring multidimensional arrays and complex objects inefficient. Workarounds usually involve transferring one line at a time and/or performing explicit de-serialization and casting at both the sending and receiving ends, simulating C or Fortran-like arrays by the use of a one-dimensional array, and pointers to primitive types by the use of single-element arrays, thus resulting in programming styles quite far from Java conventions.
Another Java message passing system is MPJ Express. [34] Recent versions can be executed in cluster and multicore configurations. In the cluster configuration, it can execute parallel Java applications on clusters and clouds. Here Java sockets or specialized I/O interconnects like Myrinet can support messaging between MPJ Express processes. It can also utilize native C implementation of MPI using its native device. In the multicore configuration, a parallel Java application is executed on multicore processors. In this mode, MPJ Express processes are represented by Java threads.
There are a few academic implementations of MPI using MATLAB. MATLAB has its own parallel extension library implemented using MPI and PVM.
The OCamlMPI module [36] implements a large subset of MPI functions and is in active use in scientific computing. An 11,000-line OCaml program was "MPI-ified" using the module, with an additional 500 lines of code and slight restructuring, and ran with excellent results on up to 170 nodes in a supercomputer. [37]
PARI/GP can be built [38] to use MPI as its multi-thread engine, allowing parallel PARI and GP programs to run unmodified on MPI clusters.
Actively maintained MPI wrappers for Python include: mpi4py, [39] numba-mpi [40] and numba-jax. [41]
Discontinued developments include: pyMPI, pypar, [42] MYMPI [43] and the MPI submodule in ScientificPython.
R bindings of MPI include Rmpi [44] and pbdMPI, [45] where Rmpi focuses on manager-workers parallelism while pbdMPI focuses on SPMD parallelism. Both implementations fully support Open MPI or MPICH2.
Here is a "Hello, World!" program in MPI written in C. In this example, we send a "hello" message to each processor, manipulate it trivially, return the results to the main process, and print the messages.
/* "Hello World" MPI Test Program*/#include<assert.h>#include<stdio.h>#include<string.h>#include<mpi.h>intmain(intargc,char**argv){charbuf[256];intmy_rank,num_procs;/* Initialize the infrastructure necessary for communication */MPI_Init(&argc,&argv);/* Identify this process */MPI_Comm_rank(MPI_COMM_WORLD,&my_rank);/* Find out how many total processes are active */MPI_Comm_size(MPI_COMM_WORLD,&num_procs);/* Until this point, all programs have been doing exactly the same. Here, we check the rank to distinguish the roles of the programs */if(my_rank==0){intother_rank;printf("We have %i processes.\n",num_procs);/* Send messages to all other processes */for(other_rank=1;other_rank<num_procs;other_rank++){sprintf(buf,"Hello %i!",other_rank);MPI_Send(buf,256,MPI_CHAR,other_rank,0,MPI_COMM_WORLD);}/* Receive messages from all other processes */for(other_rank=1;other_rank<num_procs;other_rank++){MPI_Recv(buf,256,MPI_CHAR,other_rank,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);printf("%s\n",buf);}}else{/* Receive message from process #0 */MPI_Recv(buf,256,MPI_CHAR,0,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);assert(memcmp(buf,"Hello ",6)==0);/* Send message to process #0 */sprintf(buf,"Process %i reporting for duty.",my_rank);MPI_Send(buf,256,MPI_CHAR,0,0,MPI_COMM_WORLD);}/* Tear down the communication infrastructure */MPI_Finalize();return0;}
When run with 4 processes, it should produce the following output: [46]
$ mpicc example.c && mpiexec -n 4 ./a.out
We have 4 processes.
Process 1 reporting for duty.
Process 2 reporting for duty.
Process 3 reporting for duty.
Here, mpiexec is a command used to execute the example program with 4 processes, each of which is an independent instance of the program at run time and assigned ranks (i.e. numeric IDs) 0, 1, 2, and 3. The name mpiexec is recommended by the MPI standard, although some implementations provide a similar command under the name mpirun. MPI_COMM_WORLD is the communicator that consists of all the processes.
A single program, multiple data (SPMD) programming model is thereby facilitated, but not required; many MPI implementations allow multiple, different, executables to be started in the same MPI job. Each process has its own rank, the total number of processes in the world, and the ability to communicate between them either with point-to-point (send/receive) communication, or by collective communication among the group. It is enough for MPI to provide an SPMD-style program with MPI_COMM_WORLD, its own rank, and the size of the world to allow algorithms to decide what to do. In more realistic situations, I/O is more carefully managed than in this example. MPI does not stipulate how standard I/O (stdin, stdout, stderr) should work on a given system. It generally works as expected on the rank-0 process, and some implementations also capture and funnel the output from other processes.
MPI uses the notion of process rather than processor. Program copies are mapped to processors by the MPI runtime. In that sense, the parallel machine can map to one physical processor, or to N processors, where N is the number of available processors, or even something in between. For maximum parallel speedup, more physical processors are used. This example adjusts its behavior to the size of the world N, so it also seeks to scale to the runtime configuration without compilation for each size variation, although runtime decisions might vary depending on the absolute amount of concurrency available.
Adoption of MPI-1.2 has been universal, particularly in cluster computing, but acceptance of MPI-2.1 has been more limited. Issues include:
Some aspects of the MPI's future appear solid; others less so. The MPI Forum reconvened in 2007 to clarify some MPI-2 issues and explore developments for a possible MPI-3, which resulted in versions MPI-3.0 (September 2012) and MPI-3.1 (June 2015).
Architectures are changing, with greater internal concurrency (multi-core), better fine-grained concurrency control (threading, affinity), and more levels of memory hierarchy. Multithreaded programs can take advantage of these developments more easily than single-threaded applications. This has already yielded separate, complementary standards for symmetric multiprocessing, namely OpenMP. MPI-2 defines how standard-conforming implementations should deal with multithreaded issues, but does not require that implementations be multithreaded, or even thread-safe. MPI-3 adds the ability to use shared-memory parallelism within a node. Implementations of MPI such as Adaptive MPI, Hybrid MPI, Fine-Grained MPI, MPC and others offer extensions to the MPI standard that address different challenges in MPI.
Astrophysicist Jonathan Dursi wrote an opinion piece calling MPI obsolescent, pointing to newer technologies like the Chapel language, Unified Parallel C, Hadoop, Spark and Flink. [47] At the same time, nearly all of the projects in the Exascale Computing Project build explicitly on MPI; MPI has been shown to scale to the largest machines as of the early 2020s and is widely considered to stay relevant for a long time to come.