Partitioned global address space

Last updated November 01, 2024

In computer science, partitioned global address space (PGAS) is a parallel programming model. PGAS is typified by communication operations on a global memory address space abstraction that is logically partitioned, with a portion local to each process, thread, or processing element.^[1]^[2] The novelty of PGAS is that each portion of the shared memory space may have an affinity for a particular process, which lets programs exploit locality of reference to improve performance. A PGAS memory model is featured in various parallel programming languages and libraries, including Coarray Fortran, Unified Parallel C, Split-C, Fortress, Chapel, X10, UPC++, Coarray C++, Global Arrays, DASH and SHMEM. The PGAS paradigm has been an integrated part of the Fortran language since Fortran 2008, which standardized coarrays.
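
The idea of a logically partitioned address space with per-process affinity can be sketched in a few lines of Python. This is a single-process toy model, not a real PGAS runtime: the class name `PartitionedArray`, the block distribution, and the `locate`, `read` and `write` helpers are all assumptions made for illustration and do not correspond to the API of any language or library named above.

```python
class PartitionedArray:
    """Toy global array of n elements, block-distributed over nprocs ranks."""

    def __init__(self, n, nprocs):
        self.n = n
        self.nprocs = nprocs
        self.block = -(-n // nprocs)  # ceiling division: elements per rank
        # One list per rank stands in for that rank's local memory.
        self.parts = [
            [0] * max(0, min(self.block, n - r * self.block))
            for r in range(nprocs)
        ]

    def locate(self, i):
        """Map a global index to (owning rank, offset in that rank's partition)."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i // self.block, i % self.block

    def read(self, me, i):
        """Return (value, is_local) for global index i as seen from rank me."""
        rank, off = self.locate(i)
        return self.parts[rank][off], rank == me

    def write(self, me, i, value):
        """Store value at global index i; return True if the store was local."""
        rank, off = self.locate(i)
        self.parts[rank][off] = value
        return rank == me


ga = PartitionedArray(n=10, nprocs=4)          # partitions of 3, 3, 3, 1 elements
owners = [ga.locate(i)[0] for i in range(10)]  # affinity of each global index
local_store = ga.write(me=1, i=4, value=42)    # index 4 lives on rank 1
value, local_load = ga.read(me=0, i=4)         # same element, seen from rank 0
print(owners, local_store, value, local_load)
```

Every rank can name every element through the same global index, but each element has exactly one owner. In a real system the local case is typically an ordinary memory access while the remote case involves communication, which is why exposing the owner to the programmer matters for performance.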

The various languages and libraries offering a PGAS memory model differ widely in other details, such as the base programming language and the mechanisms used to express parallelism. Many PGAS systems combine the advantages of an SPMD programming style for distributed memory systems (as employed by MPI) with the data referencing semantics of shared memory systems. In contrast to message passing, PGAS programming models frequently offer one-sided communication operations such as Remote Memory Access (RMA), whereby one processing element may directly access memory with affinity to a different (potentially remote) process, without explicit semantic involvement by the passive target process. PGAS can offer more efficiency and scalability than traditional shared-memory approaches with a flat address space, because hardware-specific data locality can be explicitly exposed in the semantic partitioning of the address space.
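
The contrast between one-sided access and message passing can be illustrated with a small simulation. The sketch below uses Python threads as stand-ins for SPMD ranks and plain lists as their memory segments; the names `put`, `get`, `segments` and `spmd_main` are hypothetical and chosen only for this example. Real RMA runs over a network and has memory-consistency rules that this toy ignores.

```python
import threading

NPROCS = 4
segments = [[None] for _ in range(NPROCS)]  # one-slot memory segment per rank
barrier = threading.Barrier(NPROCS)
results = [None] * NPROCS


def put(target, offset, value):
    """One-sided write: the initiator stores directly into the target's segment."""
    segments[target][offset] = value


def get(target, offset):
    """One-sided read: the initiator loads directly from the target's segment."""
    return segments[target][offset]


def spmd_main(me):
    """Every rank runs this same function (SPMD); only the rank id differs."""
    right = (me + 1) % NPROCS
    put(right, 0, me)        # the target rank executes no matching receive
    barrier.wait()           # synchronization is separate from data movement
    results[me] = get(me, 0) # read the slot with affinity to this rank


threads = [threading.Thread(target=spmd_main, args=(r,)) for r in range(NPROCS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [3, 0, 1, 2]
```

Each rank ends up holding the id of its left neighbor even though no rank ever posted a receive. With two-sided message passing the same ring shift would need a matching send and receive on every rank.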

A variant of the PGAS paradigm, asynchronous partitioned global address space (APGAS), augments the programming model with facilities for both local and remote asynchronous task creation.^[3] Two programming languages that use this model are Chapel and X10.
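
Asynchronous task creation at a chosen location can be sketched with Python's standard `concurrent.futures` module. The `Place` class and its `async_at` method below are hypothetical names invented for this illustration, and one single-threaded executor stands in for each place. The sketch mimics the overall shape of the model (spawn work where the data lives, then wait for it) and does not reproduce the syntax or semantics of Chapel or X10.

```python
from concurrent.futures import ThreadPoolExecutor


class Place:
    """Toy stand-in for a locality domain that owns some data and runs tasks."""

    def __init__(self, ident, data):
        self.ident = ident
        self.data = data                     # data with affinity to this place
        self._pool = ThreadPoolExecutor(max_workers=1)

    def async_at(self, fn, *args):
        """Start fn(place, *args) asynchronously at this place; return a future."""
        return self._pool.submit(fn, self, *args)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def local_sum(place):
    """Runs at the place that owns the data, so it touches only local memory."""
    return sum(place.data)


values = list(range(12))
places = [Place(p, values[p * 3:(p + 1) * 3]) for p in range(4)]

# Spawn one task per place without blocking, then wait for all of them.
futures = [pl.async_at(local_sum) for pl in places]
partials = [f.result() for f in futures]
total = sum(partials)

for pl in places:
    pl.shutdown()
print(partials, total)  # [3, 12, 21, 30] 66
```

Moving the computation to the data, rather than fetching every element to one rank, is the design choice this style is meant to make convenient.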

Examples

Coarray Fortran,^[4]^[5] an integrated part of the language as of Fortran 2008^[6]
Unified Parallel C,^[7]^[8]^[9] an explicitly parallel SPMD dialect of the ISO C programming language
Chapel,^[10] a parallel language originally developed by Cray under the DARPA HPCS project
UPC++,^[11] a C++ template library that provides PGAS communication operations designed to support high-performance computing on exascale supercomputers, including Remote Memory Access (RMA) and Remote Procedure Call (RPC)
Coarray C++,^[12] a C++ library developed by Cray, providing a close analog to Fortran coarray functionality
Global Arrays,^[13] a library supporting parallel scientific computing on distributed arrays
DASH,^[14] a C++ template library for distributed data structures with support for hierarchical locality
SHMEM, a family of parallel programming libraries providing one-sided communication interfaces for distributed-memory systems
X10,^[15] a parallel language developed by IBM under the DARPA HPCS project
Fortress, a parallel language developed by Sun Microsystems under the DARPA HPCS project
Titanium,^[16]^[17] an explicitly parallel dialect of Java developed at UC Berkeley to support scientific high-performance computing on large-scale multiprocessors
Split-C,^[18] a parallel extension of the C programming language that supports efficient access to a global address space

See also

The Adapteva Epiphany architecture is a manycore network-on-a-chip processor with scratchpad memory addressable between cores.

External links

An Introduction to the Partitioned Global Address Space Model
Programming in the Partitioned Global Address Space Model Archived 2010-06-12 at the Wayback Machine (2003)
GASNet Communication System - provides a software infrastructure for PGAS languages over high-performance networks^[19]

References

[1] Almasi, George. "PGAS (Partitioned Global Address Space) Languages", Encyclopedia of Parallel Computing, Springer, (2011): 1539-1545. https://doi.org/10.1007/978-0-387-09766-4_210
[2] Cristian Coarfă; Yuri Dotsenko; John Mellor-Crummey, "An Evaluation of Global Address Space Languages: Co-Array Fortran and Unified Parallel C"
[3] Tim Stitt, "An Introduction to the Partitioned Global Address Space (PGAS) Programming Model"
[4] Numrich, R.W., Reid, J., Co-array Fortran for parallel programming. ACM SIGPLAN Fortran Forum 17(2), 1–31 (1998).
[5] J. Reid: Coarrays in the Next Fortran Standard. SIGPLAN Fortran Forum 29(2), 10–27 (July 2010).
[6] GCC wiki, Coarray support in gfortran as specified in the Fortran 2008 standard
[7] W. Chen, D. Bonachea, J. Duell, P. Husbands, C. Iancu, K. Yelick. A Performance Analysis of the Berkeley UPC Compiler. 17th Annual International Conference on Supercomputing (ICS), 2003. https://doi.org/10.1145/782814.782825
[8] Tarek El-Ghazawi, William Carlson, Thomas Sterling, and Katherine Yelick. UPC: distributed shared memory programming. John Wiley & Sons, 2005.
[9] UPC Consortium, UPC Language and Library Specifications, v1.3, Lawrence Berkeley National Lab Tech Report LBNL-6623E, Nov 2013. https://doi.org/10.2172/1134233
[10] Bradford L. Chamberlain, Chapel, Programming Models for Parallel Computing, edited by Pavan Balaji, MIT Press, November 2015.
[11] John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed. "UPC++: A High-Performance Communication Framework for Asynchronous Computation", In 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), May 20–24, 2019. https://doi.org/10.25344/S4V88H
[12] T. A. Johnson: Coarray C++. Proceedings of the 7th International Conference on PGAS Programming Models. pp. 54–66. PGAS’13 (2013).
[13] Nieplocha, Jaroslaw; Harrison, Robert J.; Littlefield, Richard J. (1996). Global arrays: A nonuniform memory access programming model for high-performance computers. The Journal of Supercomputing. 10 (2): 169–189.
[14] K. Furlinger, C. Glass, A. Knupfer, J. Tao, D. Hunich, et al. DASH: Data Structures and Algorithms with Support for Hierarchical Locality. Euro-Par Parallel Processing Workshops (2014).
[15] P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, et al. X10: an object-oriented approach to nonuniform cluster computing. Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’05) (2005).
[16] Katherine Yelick, Paul Hilfinger, Susan Graham, Dan Bonachea, Jimmy Su, Amir Kamil, Kaushik Datta, Phillip Colella, and Tong Wen, "Parallel Languages and Compilers: Perspective from the Titanium Experience", The International Journal Of High Performance Computing Applications, August 1, 2007, 21(3):266-290
[17] Katherine Yelick, Susan Graham, Paul Hilfinger, Dan Bonachea, Jimmy Su, Amir Kamil, Kaushik Datta, Phillip Colella, Tong Wen, "Titanium", Encyclopedia of Parallel Computing, edited by David Padua, (Springer: 2011) Pages: 2049-2055
[18] Culler, D. E., Dusseau, A., Goldstein, S. C., Krishnamurthy, A., Lumetta, S., Von Eicken, T., & Yelick, K. Parallel programming in Split-C. In Supercomputing'93: Proceedings of the 1993 ACM/IEEE conference on Supercomputing (pp. 262-273). IEEE.
[19] Bonachea D, Hargrove P. GASNet-EX: A High-Performance, Portable Communication Library for Exascale. Proceedings of Languages and Compilers for Parallel Computing (LCPC'18). Oct 2018. https://doi.org/10.25344/S4QP4W

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
