MLIR (software)

MLIR
Developer(s): LLVM Developer Group
Written in: C++
Operating system: Cross-platform
Type: Compiler
Website: mlir.llvm.org

MLIR is a unifying software framework for compiler development. [1] It is designed to target a variety of computing platforms, such as GPUs, DPUs, TPUs, FPGAs, AI ASICs, and quantum computing systems (QPUs). [2]

MLIR is a sub-project of the LLVM Compiler Infrastructure project and aims to build a "reusable and extensible compiler infrastructure (...) and aid in connecting existing compilers together." [3] [4] [5]

Name

The name of the project stands for Multi-Level Intermediate Representation, where "multi-level" refers to the possibility of defining multiple dialects and progressively converting towards machine code. This capability enables MLIR to retain information at a higher level of abstraction and to perform more accurate analyses and transformations, which would otherwise have to operate on lower-level representations. [6]

Dialects

Operations are the core element around which dialects are built. They are identified by a name – which must be unique within the dialect they belong to – and have optional operands, results, attributes and regions. Operands and results adhere to static single assignment (SSA) form, and each result also has an associated type. Attributes represent compile-time knowledge (e.g., constant values). Regions consist of a list of blocks, each of which may have input arguments and contain a list of operations. [7] Although dialects are designed around SSA form, PHI nodes are not part of the design; they are replaced by the input arguments of blocks, in combination with the operands of control-flow operations. [8]
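
For example, a value that would be merged through a PHI node in LLVM-IR is instead passed to the target block as an argument of the branch operation. The following minimal sketch uses the cf (control flow) dialect; the function name select is illustrative only:

func.func @select(%cond: i1, %a: i32, %b: i32) -> i32 {
  // Both branch targets are the same block; the chosen value arrives
  // through the block argument %value instead of a PHI node.
  cf.cond_br %cond, ^bb1(%a : i32), ^bb1(%b : i32)
^bb1(%value: i32):
  func.return %value : i32
}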

The general syntax for an operation is the following:

%res:2 = "mydialect.morph"(%input#3) ({
  ^bb0(%arg0: !mydialect<"custom_type"> loc("mysource.cc":10:8)):
    // nested operations
}) {some.attribute = true, other_attribute = 1.5}
    : (!mydialect<"custom_type">) -> (!mydialect<"other_type">, !mydialect<"other_type">)
    loc(callsite("foo" at "mysource.cc":10:8))

The example shows an operation named morph that belongs to the mydialect dialect, takes one input operand and produces two results. The input operand has an associated type named custom_type and both results have type other_type, with both types again belonging to the mydialect dialect. The operation also has two associated attributes – named some.attribute and other_attribute – and a region containing one block. Finally, the loc keyword attaches source locations for debugging purposes. [9]

The syntax of operations, types and attributes can also be customized according to user preferences by implementing proper parsing and printing functions within the operation definition. [10]
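
For instance, the arith.addf operation – used in the examples of the next section – can be written in the generic form, which is always available, or in its shorter custom form:

// Generic form, available for any operation:
%sum = "arith.addf"(%lhs, %rhs) : (f32, f32) -> f32

// Equivalent custom form, enabled by the dialect's parsing and printing functions:
%sum = arith.addf %lhs, %rhs : f32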

Core dialects

The MLIR dialect ecosystem is open and extensible, meaning that end-users are free to create new dialects capturing the semantics they need. Still, the MLIR codebase already makes various dialects available to end-users. Each of them addresses a specific aspect that often manifests within intermediate representations, and does so in a self-contained way. For example, the arith dialect holds simple mathematical operations on integer and floating-point values, while the memref dialect holds the operations for memory management. [11]

The following code defines a function that takes two floating-point matrices and computes the element-wise sum of their values:

func.func @matrix_add(%arg0: memref<10x20xf32>, %arg1: memref<10x20xf32>) -> memref<10x20xf32> {
  %result = memref.alloc() : memref<10x20xf32>

  affine.for %i = 0 to 10 {
    affine.for %j = 0 to 20 {
      %lhs = memref.load %arg0[%i, %j] : memref<10x20xf32>
      %rhs = memref.load %arg1[%i, %j] : memref<10x20xf32>
      %sum = arith.addf %lhs, %rhs : f32
      memref.store %sum, %result[%i, %j] : memref<10x20xf32>
    }
  }

  func.return %result : memref<10x20xf32>
}

Different dialects may be used to achieve the same result, and each of them may imply a different level of abstraction. In this example, the affine dialect has been chosen to reuse existing analyses and optimizations for polyhedral compilation. [11]

One relevant core dialect is LLVM. Its purpose is to provide a one-to-one mapping of LLVM-IR – the intermediate representation used by LLVM – in order to enable the reuse of all its middle-end and backend transformations, including machine code generation. [12]
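
The following minimal sketch shows what operations of this dialect look like; the function name add is illustrative only:

llvm.func @add(%arg0: i32, %arg1: i32) -> i32 {
  // llvm.add mirrors the LLVM-IR add instruction one-to-one.
  %0 = llvm.add %arg0, %arg1 : i32
  llvm.return %0 : i32
}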

Operation definition specification

The operations of a dialect can be defined using the C++ language, but also in a more convenient and robust way by using the Operation Definition Specification (ODS). [13] By using TableGen, the C++ code for declarations and definitions can then be automatically generated. [14]

The autogenerated code can include parsing and printing methods – which are based on a simple string describing the structure of the desired textual representation – together with all the boilerplate code for accessing fields and performing common actions such as verification of the semantics of each operation, canonicalization and folding. [15]

The same declaration mechanism can also be used for types and attributes, which are the other two categories of elements constituting a dialect. [15]

The following example illustrates how to specify the assembly format of an operation expecting a variadic number of operands and producing zero results. The textual representation consists of the optional list of attributes, followed by the optional list of operands, a colon, and the types of the operands. [13]

let assemblyFormat = "attr-dict ($operands^ `:` type($operands))?";
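
Assuming a hypothetical operation named mydialect.sink that adopts this format, its textual representation would look as follows:

// With one attribute and two operands:
mydialect.sink {some.attribute = true} %0, %1 : i32, f32

// With no operands, the whole optional group is omitted:
mydialect.sink {some.attribute = true}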

Transformations

Transformations can always be performed directly on the IR, without having to rely on built-in coordination mechanisms. However, to ease both implementation and maintenance, MLIR provides an infrastructure for IR rewriting that is composed of different rewrite drivers. Each driver receives a set of objects named patterns, each of which has its own internal logic to match operations with certain properties. When an operation is matched, the rewrite process is performed and the IR is modified according to the logic within the pattern. [16]
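
As a sketch of the effect of a single pattern, consider the existing fold of additions with a zero operand:

// Before rewriting: the pattern matches additions with a zero operand.
%zero = arith.constant 0 : i32
%sum = arith.addi %x, %zero : i32

// After rewriting: uses of %sum have been replaced with %x, leaving the
// constant and the addition dead so that they can be removed.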

Dialect Conversion Driver

This driver operates according to the legality of existing operations: it receives a set of rules determining which operations have to be considered illegal and expects the patterns to match and convert them into legal ones. The logic behind those rules can be arbitrarily complex: it may be based just on the dialect to which the operations belong, but it can also inspect more specific properties such as attributes or nested operations. [17]

As the name suggests, this driver is typically used for converting the operations of a dialect into operations belonging to a different one. In this scenario, the whole source dialect is marked as illegal, the destination one as legal, and patterns for the source dialect's operations are provided. The dialect conversion framework also provides support for type conversion, which has to be performed on operands and results to map them to the type system of the destination dialect. [17]

MLIR allows multiple conversion paths to be taken. Considering the example about the sum of matrices, a possible lowering strategy is to generate for-loops belonging to the scf dialect, obtaining code to be executed on CPUs:

#map = affine_map<(d0, d1) -> (d0, d1)>
module {
  func.func @avg(%arg0: memref<10x20xf32>, %arg1: memref<10x20xf32>) -> memref<10x20xf32> {
    %alloc = memref.alloc() : memref<10x20xf32>
    %c0 = arith.constant 0 : index
    %c10 = arith.constant 10 : index
    %c1 = arith.constant 1 : index
    scf.for %arg2 = %c0 to %c10 step %c1 {
      %c0_0 = arith.constant 0 : index
      %c20 = arith.constant 20 : index
      %c1_1 = arith.constant 1 : index
      scf.for %arg3 = %c0_0 to %c20 step %c1_1 {
        %0 = memref.load %arg0[%arg2, %arg3] : memref<10x20xf32>
        %1 = memref.load %arg1[%arg2, %arg3] : memref<10x20xf32>
        %2 = arith.addf %0, %1 : f32
        memref.store %2, %alloc[%arg2, %arg3] : memref<10x20xf32>
      }
    }

    return %alloc : memref<10x20xf32>
  }
}

Another possible strategy, however, is to use the gpu dialect to generate code for GPUs:

#map = affine_map<(d0, d1) -> (d0, d1)>
module {
  func.func @avg(%arg0: memref<10x20xf32>, %arg1: memref<10x20xf32>) -> memref<10x20xf32> {
    %alloc = memref.alloc() : memref<10x20xf32>
    %c0 = arith.constant 0 : index
    %c10 = arith.constant 10 : index
    %0 = arith.subi %c10, %c0 : index
    %c1 = arith.constant 1 : index
    %c0_0 = arith.constant 0 : index
    %c20 = arith.constant 20 : index
    %1 = arith.subi %c20, %c0_0 : index
    %c1_1 = arith.constant 1 : index
    %c1_2 = arith.constant 1 : index
    gpu.launch blocks(%arg2, %arg3, %arg4) in (%arg8 = %0, %arg9 = %c1_2, %arg10 = %c1_2) threads(%arg5, %arg6, %arg7) in (%arg11 = %1, %arg12 = %c1_2, %arg13 = %c1_2) {
      %2 = arith.addi %c0, %arg2 : index
      %3 = arith.addi %c0_0, %arg5 : index
      %4 = memref.load %arg0[%2, %3] : memref<10x20xf32>
      %5 = memref.load %arg1[%2, %3] : memref<10x20xf32>
      %6 = arith.addf %4, %5 : f32
      memref.store %6, %alloc[%2, %3] : memref<10x20xf32>
      gpu.terminator
    }

    return %alloc : memref<10x20xf32>
  }
}

Greedy Pattern Rewrite Driver

The driver greedily applies the provided patterns according to their benefit, until a fixed point or the maximum number of iterations is reached. The benefit of a pattern is self-attributed; in case of ties, the relative order within the patterns list is used. [16]

Traits and Interfaces

MLIR allows existing optimizations (e.g., common subexpression elimination, loop-invariant code motion) to be applied to custom dialects by means of traits and interfaces. These two mechanisms enable transformation passes to operate on operations without knowing their actual implementation, relying only on the properties that traits or interfaces provide. [18] [19]

Traits are meant to be attached to operations without requiring any additional implementation. Their purpose is to indicate that the operation satisfies certain properties (e.g., having exactly two operands). [18] Interfaces, instead, represent a more powerful tool through which an operation can be queried about some specific aspect, whose value may change between instances of the same kind of operation. An example of an interface is the representation of memory effects: each operation that operates on memory may have such an interface attached, but the actual effects may depend on the actual operands (e.g., a function call whose arguments may be constants or references to memory). [19]
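
The following minimal sketch illustrates the distinction: the two identical additions can be merged by common subexpression elimination because arith.addi is side-effect free, a property exposed through its traits, while memref.load reports its behavior through the memory-effects interface:

// arith.addi carries no memory effects, so common subexpression
// elimination can replace all uses of %1 with %0.
%0 = arith.addi %a, %b : i32
%1 = arith.addi %a, %b : i32

// memref.load declares a "read" effect through an interface; merging
// two such loads additionally depends on intervening writes.
%2 = memref.load %m[%i] : memref<10xi32>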

Applications

The freedom in modeling intermediate representations enables MLIR to be used in a wide range of scenarios, including traditional programming languages, [20] high-level synthesis, [21] [22] quantum computing [23] and homomorphic encryption. [24] [25] [26] Machine learning applications also take advantage of built-in polyhedral compilation techniques, together with dialects targeting accelerators and other heterogeneous systems. [27] [28] [29] [30] [31]

References

1. Aho, Alfred V.; Sethi, Ravi; Ullman, Jeffrey D. (2002). Compilers: principles, techniques, and tools. Addison-Wesley series in computer science (Reprinted, with corr., [36. Druck] ed.). Reading, Mass.: Addison-Wesley. ISBN 978-0-201-10088-4.
2. "Why Mojo". docs.modular.com. Modular Inc. 2023. Retrieved 2023-08-28. "MLIR's strength is its ability to build domain specific compilers, particularly for weird domains that aren't traditional CPUs and GPUs, such as AI ASICS, quantum computing systems, FPGAs, and custom silicon."
3. "The LLVM Compiler Infrastructure". LLVM. Retrieved 2023-10-01. "The MLIR subproject is a novel approach to building reusable and extensible compiler infrastructure. MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, significantly reduce the cost of building domain specific compilers, and aid in connecting existing compilers together."
4. Lattner, Chris; Amini, Mehdi; Bondhugula, Uday; Cohen, Albert; Davis, Andy; Pienaar, Jacques; Riddle, River; Shpeisman, Tatiana; Vasilache, Nicolas; Zinenko, Oleksandr (2021). "MLIR: Scaling Compiler Infrastructure for Domain Specific Computation". 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). pp. 2–14. doi:10.1109/CGO51591.2021.9370308. ISBN 978-1-7281-8613-9.
5. Mernik, Marjan; Heering, Jan; Sloane, Anthony M. (December 2005). "When and how to develop domain-specific languages". ACM Computing Surveys. 37 (4): 316–344. doi:10.1145/1118890.1118892. ISSN 0360-0300. S2CID 207158373.
6. Seidl, Helmut; Wilhelm, Reinhard; Hack, Sebastian (2012). Compiler design: analysis and transformation. Berlin, New York: Springer. ISBN 978-3-642-17548-0.
7. "MLIR Language Reference - MLIR". mlir.llvm.org. Retrieved 2023-07-05.
8. "MLIR Rationale - MLIR". mlir.llvm.org. Retrieved 2023-07-05.
9. Amini, Mehdi; Riddle, River. "MLIR Tutorial" (PDF).
10. Stroustrup, Bjarne (2015). The C++ programming language: C++11 (4th ed.). Upper Saddle River, NJ: Addison-Wesley. ISBN 978-0-321-56384-2.
11. "Dialects - MLIR". mlir.llvm.org. Retrieved 2023-07-07.
12. "LLVM Language Reference Manual — LLVM 17.0.0git documentation". llvm.org. Retrieved 2023-07-05.
13. "Operation Definition Specification (ODS) - MLIR". mlir.llvm.org. Retrieved 2023-07-05.
14. "TableGen Overview — LLVM 17.0.0git documentation". llvm.org. Retrieved 2023-07-05.
15. "Defining Dialects - MLIR". mlir.llvm.org. Retrieved 2023-07-07.
16. "Pattern Rewriting: Generic DAG-to-DAG Rewriting - MLIR". mlir.llvm.org. Retrieved 2023-07-06.
17. "Dialect Conversion - MLIR". mlir.llvm.org. Retrieved 2023-07-06.
18. "Traits - MLIR". mlir.llvm.org. Retrieved 2023-07-05.
19. "Interfaces - MLIR". mlir.llvm.org. Retrieved 2023-07-05.
20. Moses, William S.; Chelini, Lorenzo; Zhao, Ruizhe; Zinenko, Oleksandr (2021). "Polygeist: Raising C to Polyhedral MLIR". 30th International Conference on Parallel Architectures and Compilation Techniques (PACT). pp. 45–59. doi:10.1109/PACT52795.2021.00011. ISBN 978-1-6654-4278-7.
21. Agostini, Nicolas Bohm; Curzel, Serena; Amatya, Vinay; Tan, Cheng; Minutoli, Marco; Castellana, Vito Giovanni; Manzano, Joseph; Kaeli, David; Tumeo, Antonino (2022-10-30). "An MLIR-based Compiler Flow for System-Level Design and Hardware Acceleration". Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design. Association for Computing Machinery. pp. 1–9. doi:10.1145/3508352.3549424. ISBN 978-1-4503-9217-4.
22. Zhao, Ruizhe; Cheng, Jianyi (2021). "Phism: Polyhedral High-Level Synthesis in MLIR". arXiv:2103.15103 [cs.PL].
23. McCaskey, Alexander; Nguyen, Thien (October 2021). "A MLIR Dialect for Quantum Assembly Languages". 2021 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE. pp. 255–264. arXiv:2101.11365. doi:10.1109/QCE52317.2021.00043. ISBN 978-1-6654-1691-7. S2CID 231718965.
24. Park, Sunjae; Song, Woosung; Nam, Seunghyeon; Kim, Hyeongyu; Shin, Junbum; Lee, Juneyoung (2023-06-06). "HEaaN.MLIR: An Optimizing Compiler for Fast Ring-Based Homomorphic Encryption". Proceedings of the ACM on Programming Languages. 7 (PLDI): 196–220. doi:10.1145/3591228. ISSN 2475-1421.
25. Govindarajan, Sanath; Moses, William S. "SyFER-MLIR: Integrating Fully Homomorphic Encryption Into the MLIR Compiler Framework" (PDF).
26. "HEIR: Homomorphic Encryption Intermediate Representation". GitHub. Retrieved 2023-09-05.
27. Jin, Tian; Bercea, Gheorghe-Teodor; Le, Tung D.; Chen, Tong; Su, Gong; Imai, Haruki; Negishi, Yasushi; Leu, Anh; O'Brien, Kevin; Kawachiya, Kiyokuni; Eichenberger, Alexandre E. (2020). "Compiling ONNX Neural Network Models Using MLIR". arXiv:2008.08272 [cs.PL].
28. Pienaar, Jacques (2020). MLIR in TensorFlow Ecosystem. Retrieved 2023-07-06.
29. Hu, Pengchao; Lu, Man; Wang, Lei; Jiang, Guoyue (2022). "TPU-MLIR: A Compiler For TPU Using MLIR". arXiv:2210.15016 [cs.PL].
30. Katel, Navdeep; Khandelwal, Vivek; Bondhugula, Uday (2022-03-19). "MLIR-based code generation for GPU tensor cores". Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction. ACM. pp. 117–128. doi:10.1145/3497776.3517770. ISBN 978-1-4503-9183-2. S2CID 247522110.
31. Bik, Aart; Koanantakool, Penporn; Shpeisman, Tatiana; Vasilache, Nicolas; Zheng, Bixia; Kjolstad, Fredrik (2022-12-31). "Compiler Support for Sparse Tensor Computations in MLIR". ACM Transactions on Architecture and Code Optimization. 19 (4): 1–25. arXiv:2202.04305. doi:10.1145/3544559. ISSN 1544-3566. S2CID 246680261.