Math Kernel Library

Intel oneAPI Math Kernel Library
Developer(s) Intel
Initial release November 1994; 29 years ago (1994-11)
Stable release 2024.2 / June 14, 2024; 2 months ago (2024-06-14) [1]
Written in C/C++, DPC++, Fortran
Operating system Microsoft Windows, Linux
Platform CPU [2], GPU

Type Library and framework
License freeware under ISSL [3] [4]
Website www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html

Intel oneAPI Math Kernel Library (Intel oneMKL), formerly known as Intel Math Kernel Library, is a library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math. [5] [6]

The library supports x86 CPUs and Intel GPUs [2] and is available for Windows and Linux operating systems. [5] [6] [7]

Intel oneAPI Math Kernel Library is not to be confused with oneMKL Interfaces, an open-source wrapper library that lets DPC++ applications call oneMKL routines that can be offloaded to hardware architectures from multiple vendors, selected at run time. [8]

History and licensing

Intel launched the library in November 1994 under the name Intel BLAS Library. [9] In 1996 it was renamed Intel Math Kernel Library, a name it kept until April 2020, when it became part of the oneAPI initiative to support multiple hardware architectures and took its current name, Intel oneAPI Math Kernel Library.

The library is available as part of the oneAPI toolkits and as a standalone download, free of charge under the terms of the Intel Simplified Software License, [3] which allows redistribution. [10] Commercial support for Intel oneMKL is available when it is purchased as part of the oneAPI Base Toolkit.

Following Apple’s transition away from x86 CPUs, the last Intel oneMKL release available for macOS is version 2023.2.2, which is scheduled for removal by the end of 2024.

Performance and vendor lock-in

MKL, like other code generated by the Intel C++ Compiler and the Intel DPC++ Compiler, improves performance with a technique called function multi-versioning: a function is compiled or written for many of the x86 instruction set extensions, and at run time a "master function" uses the CPUID instruction to select the version most appropriate for the current CPU. However, whenever the master function detects a non-Intel CPU, it almost always chooses the most basic (and slowest) version, regardless of which instruction sets the CPU claims to support. This has earned the routine the nickname "cripple AMD" since 2009. [11] As of 2020, Intel's MKL remains the numeric library installed by default with many pre-compiled mathematical applications on Windows (such as NumPy and SymPy). [12] [13] Although it relies on the MKL, MATLAB implemented a workaround starting with Release 2020a that ensures full AVX2 support by the MKL on non-Intel (AMD) CPUs as well. [14]
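The dispatch behaviour described above can be modelled in a few lines. The following is a simplified pure-Python sketch, not Intel's code: it does not execute CPUID, and the `Cpu` class, the kernel names, and both dispatcher functions are hypothetical names invented for illustration. It contrasts selection based only on reported instruction set extensions with selection that first checks the vendor string.

```python
from dataclasses import dataclass

# Hypothetical kernel table, ordered fastest first. Each entry names the
# instruction set extension the kernel needs (None = baseline, always usable).
KERNELS = [("avx2", "dot_avx2"), ("sse2", "dot_sse2"), (None, "dot_generic")]


@dataclass(frozen=True)
class Cpu:
    """Stand-in for what a real dispatcher would read via CPUID."""
    vendor: str           # vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    features: frozenset   # extensions the CPU reports supporting


def dispatch_by_features(cpu):
    """Pick the fastest kernel whose required extension the CPU reports."""
    for required, name in KERNELS:
        if required is None or required in cpu.features:
            return name


def dispatch_vendor_gated(cpu):
    """Assumed model of a vendor-gated dispatcher: feature flags are only
    consulted for one vendor; every other CPU gets the baseline kernel."""
    if cpu.vendor != "GenuineIntel":
        return KERNELS[-1][1]
    return dispatch_by_features(cpu)


intel = Cpu("GenuineIntel", frozenset({"sse2", "avx2"}))
amd = Cpu("AuthenticAMD", frozenset({"sse2", "avx2"}))

print(dispatch_by_features(amd))    # feature-based choice
print(dispatch_vendor_gated(amd))   # vendor-gated choice
print(dispatch_vendor_gated(intel))
```

In this model the two CPUs report identical extensions, so any difference in the selected kernel comes from the vendor check alone. The real master function is described as "almost always" falling back, so this all-or-nothing gate is a simplification.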

Details

Functional categories

Intel oneMKL has the following functional categories: [15]

oneMKL formerly included Deep Neural Network functions, but they were removed in version 2020 and spun off into the open-source Intel oneAPI Deep Neural Network Library. [16]

Related Research Articles

Single instruction, multiple data: Type of parallel processing

Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

SSE2 is one of the Intel SIMD processor supplementary instruction sets introduced by Intel with the initial version of the Pentium 4 in 2000. SSE2 instructions allow the use of XMM (SIMD) registers on x86 instruction set architecture processors. These registers can load up to 128 bits of data and perform instructions, such as vector addition and multiplication, simultaneously.

LAPACK: Software library for numerical linear algebra

LAPACK is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition. LAPACK was originally written in FORTRAN 77, but moved to Fortran 90 in version 3.2 (2008). The routines handle both real and complex matrices in both single and double precision. LAPACK relies on an underlying BLAS implementation to provide efficient and portable computational building blocks for its routines.

Basic Linear Algebra Subprograms (BLAS) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. They are the de facto standard low-level routines for linear algebra libraries; the routines have bindings for both C and Fortran. Although the BLAS specification is general, BLAS implementations are often optimized for speed on a particular machine, so using them can bring substantial performance benefits. BLAS implementations will take advantage of special floating point hardware such as vector registers or SIMD instructions.

AMD Core Math Library (ACML) is an end-of-life software development library released by AMD, replaced by many open source libraries, including AMD libm 4.0. This library provides mathematical routines optimized for AMD processors.

VTune Profiler is a performance analysis tool for x86-based machines running Linux or Microsoft Windows operating systems. Many features work on both Intel and AMD hardware, but the advanced hardware-based sampling features require an Intel-manufactured CPU.

Automatically Tuned Linear Algebra Software (ATLAS) is a software library for linear algebra. It provides a mature open source implementation of BLAS APIs for C and FORTRAN 77.

Intel oneAPI DPC++/C++ Compiler and Intel C++ Compiler Classic are Intel’s C, C++, SYCL, and Data Parallel C++ (DPC++) compilers for Intel processor-based systems, available for Windows, Linux, and macOS operating systems.

Intel Fortran Compiler, as part of Intel OneAPI HPC toolkit, is a group of Fortran compilers from Intel for Windows, macOS, and Linux.

Advanced Vector Extensions are SIMD extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD). They were proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge microarchitecture shipping in Q1 2011 and later by AMD with the Bulldozer microarchitecture shipping in Q4 2011. AVX provides new features, new instructions, and a new coding scheme.

Intel Parallel Studio XE was a software development product developed by Intel that facilitated native code development on Windows, macOS and Linux in C++ and Fortran for parallel computing. Parallel programming enables software programs to take advantage of multi-core processors from Intel and other processor vendors.

IT++ is a C++ library of classes and functions for linear algebra, numerical optimization, signal processing, communications, and statistics. It is being developed by researchers in these areas and is widely used by researchers, both in the communications industry and universities. The IT++ library originates from the former Department of Information Theory at the Chalmers University of Technology, Gothenburg, Sweden.

In scientific computing, GotoBLAS and GotoBLAS2 are open source implementations of the BLAS API with many hand-crafted optimizations for specific processor types. GotoBLAS was developed by Kazushige Goto at the Texas Advanced Computing Center. As of 2003, it was used in seven of the world's ten fastest supercomputers.

OpenBLAS is an open-source implementation of the BLAS and LAPACK APIs with many hand-crafted optimizations for specific processor types. It is developed at the Lab of Parallel Software and Computational Science, ISCAS.

ROCm: Parallel computing platform: GPGPU libraries and application programming interface

ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), heterogeneous computing. It offers several programming models: HIP, OpenMP, and OpenCL.

In scientific computing, BLIS is an open-source framework for implementing a superset of BLAS functionality for specific processor types. It was awarded the J. H. Wilkinson Prize for Numerical Software in 2023. It exposes that functionality through two traditional application programming interfaces (APIs): the BLAS interface and the CBLAS interface. BLIS also includes two APIs native to the framework: a typed (BLAS-like) API and an object API. These native interfaces provide access to BLAS-like functionality that is not supported by, but closely related to, operations found in the BLAS. The framework is developed and supported by the Science of High-Performance Computing (SHPC) group of the Oden Institute for Computational Engineering and Sciences at The University of Texas at Austin and the Matthews Research Group at Southern Methodist University.

oneAPI (compute acceleration): Open standard for parallel computing

oneAPI is an open standard, adopted by Intel, for a unified application programming interface (API) intended to be used across different computing accelerator (coprocessor) architectures, including GPUs, AI accelerators and field-programmable gate arrays. It is intended to eliminate the need for developers to maintain separate code bases, multiple programming languages, tools, and workflows for each architecture.

Agner Fog is a Danish evolutionary anthropologist and computer scientist. He is currently an associate professor of computer science at the Technical University of Denmark (DTU), and has been present at DTU since 1995. He is best known for coining the term "Regality Theory" and for writing extensive optimization manuals for machines running the x86 architecture.

References

  1. "Intel® Math Kernel Library Release Notes and New Features". software.intel.com.
  2. "Intel® oneAPI Math Kernel Library (oneMKL)". Intel® Software.
  3. "Intel Simplified Software License".
  4. "OneMKL — oneAPI Specification 1.1-rev-1 documentation".
  5. "Intel Math Kernel Library".
  6. "Intel Math Kernel Library (MKL)".
  7. "MKL - Intel Math Kernel Library". 23 April 2012.
  8. "oneapi-src/oneMKL". oneAPI-SRC. 19 March 2021. oneMKL interfaces are an open-source implementation of the oneMKL Data Parallel C++ (DPC++) interface according to the oneMKL specification. It works with multiple devices (backends) using device-specific libraries underneath.
  9. "Intel Math Kernel Library, Reference Manual, Version Information" (PDF). c. 2004. p. ii. Retrieved July 25, 2024.
  10. "Intel Math Kernel Library Licensing FAQ".
  11. Agner Fog. "Agner's CPU blog - Intel's "cripple AMD" function".
  12. "Comment chain in: r/matlab - How-To force Matlab to use a fast codepath on AMD Ryzen/TR CPUs - up to 250% performance gains". reddit. Retrieved 2020-06-06.
  13. "High-Performance Computing Center Stuttgart - Knowledge Base - Libraries(Hawk)". Retrieved 2020-06-06.
  14. "Crippled No Longer: Matlab Now Runs on AMD CPUs at Full Speed - ExtremeTech". www.extremetech.com. Retrieved 2020-10-29.
  15. admin (2019-11-14). "Developer Reference for Intel® Math Kernel Library - C". software.intel.com. Retrieved 2019-11-27.
  16. "Transitioning from Intel MKL-DNN to oneDNN". Intel. Retrieved 25 July 2024.