MVAPICH

Stable release	3.0 / February 16, 2024
Operating system	Unix, Linux
Type	Library
License	New BSD License (free software)
Website	mvapich.cse.ohio-state.edu, mug.mvapich.cse.ohio-state.edu

Last updated May 18, 2024

MVAPICH, also known as MVAPICH2, is a BSD-licensed implementation of the MPI standard developed by Ohio State University.^[1]^[2] MVAPICH comes in a number of flavors^[3]:

MVAPICH2 (renamed MVAPICH as of the 3.0 release^[4]), with support for InfiniBand, iWARP, RoCE, and Intel Omni-Path
MVAPICH2-X, with support for PGAS and OpenSHMEM
MVAPICH2-GDR, with support for InfiniBand and NVIDIA and AMD GPUs^[5]
MVAPICH-PLUS, which merges and extends MVAPICH2-X and MVAPICH2-GDR; its releases began in 2023/24 with the 3.0 series
MVAPICH2-MIC, with support for InfiniBand and Intel MIC
MVAPICH2-Virt, with support for InfiniBand and SR-IOV
MVAPICH2-EA, which is energy-aware and supports InfiniBand, iWARP, and RoCE

Related Research Articles

InfiniBand (IB) is a computer networking communications standard used in high-performance computing that features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also used as either a direct or switched interconnect between servers and storage systems, as well as an interconnect between storage systems. It is designed to be scalable and uses a switched fabric network topology. Between 2014 and June 2016, it was the most commonly used interconnect in the TOP500 list of supercomputers.

MPICH, formerly known as MPICH2, is a freely available, portable implementation of MPI, a standard for message passing in distributed-memory applications used in parallel computing. MPICH is free and open-source software with some public domain components that were developed by a US governmental organisation, and is available for most flavours of Unix-like OS.

In computing, remote direct memory access (RDMA) is a direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters.

Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. The name Lustre is a portmanteau word derived from Linux and cluster. Lustre file system software is available under the GNU General Public License and provides high performance file systems for computer clusters ranging in size from small workgroup clusters to large-scale, multi-site systems. Since June 2005, Lustre has consistently been used by at least half of the top ten, and more than 60 of the top 100 fastest supercomputers in the world, including the world's No. 1 ranked TOP500 supercomputer in November 2022, Frontier, as well as previous top supercomputers such as Fugaku, Titan and Sequoia.

Compute Unified Device Architecture (CUDA) is a proprietary parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for accelerated general-purpose processing, an approach called general-purpose computing on GPUs (GPGPU). The CUDA API is an extension of the C programming language that adds the ability to specify thread-level parallelism in C and to specify GPU device-specific operations (like moving data between the CPU and the GPU). CUDA is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels. In addition to drivers and runtime kernels, the CUDA platform includes compilers, libraries and developer tools to help programmers accelerate their applications.

The Virtual Interface Architecture (VIA) is an abstract model of a user-level zero-copy network, and is the basis for InfiniBand, iWARP and RoCE. Created by Microsoft, Intel, and Compaq, the original VIA sought to standardize the interface for high-performance network technologies known as System Area Networks.

OpenFabrics Alliance (organization)

The OpenFabrics Alliance is a non-profit organization that promotes remote direct memory access (RDMA) switched fabric technologies for server and storage connectivity. These high-speed data-transport technologies are used in high-performance computing facilities, in research and various industries.

OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies programming languages for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.

Video Decode and Presentation API for Unix (VDPAU) is a royalty-free application programming interface (API), as well as its implementation as a free and open-source library distributed under the MIT License. VDPAU is also supported by Nvidia.

Brutus is the central high-performance cluster of ETH Zurich. It was introduced to the public in May 2008. A new computing cluster called EULER was announced and opened to the public in May 2014.

NVM Express (NVMe) or Non-Volatile Memory Host Controller Interface Specification (NVMHCIS) is an open, logical-device interface specification for accessing a computer's non-volatile storage media usually attached via the PCI Express bus. The initial NVM stands for non-volatile memory, which is often NAND flash memory that comes in several physical form factors, including solid-state drives (SSDs), PCIe add-in cards, and M.2 cards, the successor to mSATA cards. NVM Express, as a logical-device interface, has been designed to capitalize on the low latency and internal parallelism of solid-state storage devices.

In computing, Linux-IO (LIO) Target is an open-source implementation of the SCSI target that has become the standard one included in the Linux kernel. Internally, LIO does not initiate sessions, but instead provides one or more Logical Unit Numbers (LUNs), waits for SCSI commands from a SCSI initiator, and performs required input/output data transfers. LIO supports common storage fabrics, including FCoE, Fibre Channel, IEEE 1394, iSCSI, iSCSI Extensions for RDMA (iSER), SCSI RDMA Protocol (SRP) and USB. It is included in most Linux distributions; native support for LIO in QEMU/KVM, libvirt, and OpenStack makes LIO also a storage option for cloud deployments.

RDMA over Converged Ethernet (RoCE) or InfiniBand over Ethernet (IBoE) is a network protocol which allows remote direct memory access (RDMA) over an Ethernet network. It does this by encapsulating an InfiniBand (IB) transport packet over Ethernet. There are multiple RoCE versions. RoCE v1 is an Ethernet link layer protocol and hence allows communication between any two hosts in the same Ethernet broadcast domain. RoCE v2 is an internet layer protocol which means that RoCE v2 packets can be routed. Although the RoCE protocol benefits from the characteristics of a converged Ethernet network, the protocol can also be used on a traditional or non-converged Ethernet network.
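The difference between the two RoCE versions can be sketched by building the outer headers by hand. This is a minimal illustration, not a working RoCE stack: the InfiniBand packet is treated as opaque bytes, the IPv4 and UDP checksums are left at zero, and the function names, MAC addresses, IP addresses and source port are hypothetical. The two constants are the EtherType registered for RoCE v1 (0x8915) and the UDP destination port assigned to RoCE v2 (4791).

```python
import struct

ROCE_V1_ETHERTYPE = 0x8915  # EtherType used by RoCE v1
IPV4_ETHERTYPE = 0x0800     # EtherType for IPv4
ROCE_V2_UDP_PORT = 4791     # UDP destination port used by RoCE v2


def ethernet_header(dst_mac: bytes, src_mac: bytes, ethertype: int) -> bytes:
    """Pack a 14-byte Ethernet II header (no VLAN tag)."""
    return struct.pack("!6s6sH", dst_mac, src_mac, ethertype)


def roce_v1_frame(dst_mac: bytes, src_mac: bytes, ib_packet: bytes) -> bytes:
    """RoCE v1: the InfiniBand packet sits directly on Ethernet.

    With no IP header, the frame cannot leave its broadcast domain.
    """
    return ethernet_header(dst_mac, src_mac, ROCE_V1_ETHERTYPE) + ib_packet


def roce_v2_frame(dst_mac: bytes, src_mac: bytes, src_ip: bytes, dst_ip: bytes,
                  src_port: int, ib_transport: bytes) -> bytes:
    """RoCE v2: the InfiniBand transport packet is carried in UDP over IPv4.

    The IP header is what makes the packet routable. Checksums are left
    at 0 in this sketch and would have to be filled in for real traffic.
    """
    udp_len = 8 + len(ib_transport)
    udp = struct.pack("!HHHH", src_port, ROCE_V2_UDP_PORT, udp_len, 0)
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,          # version 4, header length 5 words
        0,             # DSCP/ECN
        20 + udp_len,  # total length
        0, 0,          # identification, flags/fragment offset
        64,            # TTL
        17,            # protocol: UDP
        0,             # header checksum (placeholder)
        src_ip, dst_ip,
    )
    return ethernet_header(dst_mac, src_mac, IPV4_ETHERTYPE) + ip + udp + ib_transport


# Hypothetical addresses and payload, for illustration only.
DST = bytes.fromhex("020000000002")
SRC = bytes.fromhex("020000000001")
payload = b"\x00" * 12

v1 = roce_v1_frame(DST, SRC, payload)
v2 = roce_v2_frame(DST, SRC, bytes([10, 0, 0, 1]), bytes([10, 0, 1, 1]), 49152, payload)
```

In the v1 frame the only addressing is the pair of MAC addresses, while the v2 frame carries IP addresses that a router can forward on.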

Mellanox Technologies Ltd. was an Israeli-American multinational supplier of computer networking products based on InfiniBand and Ethernet technology. Mellanox offered adapters, switches, software, cables and silicon for markets including high-performance computing, data centers, cloud computing, computer data storage and financial services.

High Efficiency Video Coding implementations and products covers the implementations and products of High Efficiency Video Coding (HEVC).

Omni-Path Architecture (OPA) is a high-performance communication architecture developed by Intel. It aims for low communication latency, low power consumption and a high throughput. It directly competes with InfiniBand. Intel planned to develop technology based on this architecture for exascale computing. The current owner of Omni-Path is Cornelis Networks.

The Nvidia DGX represents a series of servers and workstations designed by Nvidia, primarily geared towards enhancing deep learning applications through the use of general-purpose computing on graphics processing units (GPGPU). These systems typically come in a rackmount format featuring high-performance x86 server CPUs on the motherboard.

Singularity is a free and open-source computer program that performs operating-system-level virtualization also known as containerization.

Compute Express Link (CXL) is an open standard for high-speed, high-capacity central processing unit (CPU)-to-device and CPU-to-memory connections, designed for high-performance data center computers. CXL is built on the serial PCI Express (PCIe) physical and electrical interface and includes a PCIe-based block input/output protocol (CXL.io) and new cache-coherent protocols for accessing system memory (CXL.cache) and device memory (CXL.mem). The serial communication and pooling capabilities allow CXL memory to overcome performance and socket packaging limitations of common DIMM memory when implementing high storage capacities.

Taiwania 3 is the third and newest supercomputer in Taiwan's Taiwania series. It is housed at the National Center for High-performance Computing of NARLabs. It has 900 nodes and 50,400 cores in total; each node contains two Intel Xeon Platinum 8280 2.4 GHz CPUs. The system runs CentOS 7.8 (x86_64) as its operating system and uses the Slurm Workload Manager to schedule jobs. Nodes are connected by an InfiniBand HDR100 100 Gbit/s interconnect. Main memory capacity is 192 GB, and the full computing capability is 2.7 PFLOPS. Taiwania 3 is open to the public: scientists and others can use it for specific research after obtaining permission from the National Center for High-performance Computing. It entered operation in November 2020, ahead of schedule, because of demand related to COVID-19. It ranked 227th on the TOP500 list of June 2021 and 80th on the Green500 list. It was manufactured by Quanta Computer, Taiwan Fixed Network, and ASUS Cloud.
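The node and core figures quoted for Taiwania 3 can be cross-checked with a line of arithmetic. The sketch below assumes 28 cores per Xeon Platinum 8280, a figure that comes from the processor's specification rather than from the text above; the variable names are illustrative.

```python
NODES = 900          # from the text
CPUS_PER_NODE = 2    # from the text
CORES_PER_CPU = 28   # assumption: Xeon Platinum 8280 core count, not stated in the text

total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
print(total_cores)  # 50400, matching the quoted total
```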

References

[1] "MVAPICH Home". Retrieved 2021-04-14.
[2] "MVAPICH at NVIDIA Developer". May 2012. Retrieved 2016-03-28.
[3] "MVAPICH :: Overview". Retrieved 2024-05-17.
[4] "MVAPICH 3.0 release changelog". Retrieved 2024-05-17.
[5] "MVAPICH2-GDR 2.3.7 Changelog". Retrieved 2024-05-17.


This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.



