Collective Tuning Initiative

Last updated February 11, 2024

The Collective Tuning Initiative is a community-driven initiative started by Grigori Fursin to develop free and open-source research tools with a unified API for collaborative characterization, optimization and co-design of computer systems. They enable sharing of benchmarks, data sets and optimization cases from the community in the Collective Optimization Database through unified web services to predict better optimizations or architecture designs (provided there is enough information collected in the repository from multiple users).^[1]^[2] Using common research-and-development tools should help to improve the quality and reproducibility of computer systems' research and development and accelerate innovation in this area. This approach helped establish Reproducibility Initiatives and Artifact Evaluation at several ACM-sponsored conferences to encourage sharing of artifacts and validation of experimental results from accepted papers.^[3]

Collective Optimization Database: an open repository to share optimization cases from the community, provide web services and plugins to analyze collected performance numbers and predict compiler optimizations to speed up applications based on statistical and machine-learning techniques
Machine learning-based program optimization predictor: web-service that suggests optimization-improving factors such as execution time, code size and compilation time, based on similarities between programs (program features)
Continuous Collective Compilation Framework: automates and distributes exploration of large optimization spaces and compiler auto-tuning across multiple users
Interactive Compilation Interface : converts production compilers into interactive research toolsets using an event-driven plugin system to avoid the development of new research compilers from scratch
Collective benchmark with multiple data sets: enables realistic benchmarking and research on iterative compilation and run-time adaptation.
Universal Adaptation Framework: Enables run-time adaptation and optimization of statically-compiled programs for heterogeneous, multi-core computer architectures.

All above tools became a part of the (Collective Knowledge framework) released in 2015.

Collective Optimization Database

The Collective Optimization Database is an open repository to enable sharing of benchmarks, data sets and optimization cases from the community, provide web services and plugins to analyze optimization data and predict program transformations or better hardware designs for multi-objective optimizations based on statistical and machine learning techniques provided there is enough information collected in the repository from multiple users.^[4]

Functionality

The Collective Optimization Database is also intended to improve the quality and reproducibility of the research on code and architecture design, characterization and optimization. It includes an online machine learning-based program optimization predictor ^[5] that can suggest profitable optimizations to improve program execution time, code size, or compilation time, based on similarities between programs. The Collective Optimization Database is an important part of the Collective Tuning Initiative^[1]^[2] which is developing open-source R&D tools for collaborative and reproducible computing systems research.

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software under the GNU General Public License. GCC is a key component of the GNU toolchain and the standard compiler for most projects related to GNU and the Linux kernel. With roughly 15 million lines of code in 2019, GCC is one of the biggest free programs in existence. It has played an important role in the growth of free software, as both a tool and an example.

An integrated development environment (IDE) is a software application that provides comprehensive facilities for software development. An IDE normally consists of at least a source-code editor, build automation tools, and a debugger. Some IDEs, such as IntelliJ IDEA, Eclipse and Lazarus contain the necessary compiler, interpreter or both; others, such as SharpDevelop and NetBeans, do not.

Wolfram Mathematica is a software system with built-in libraries for several areas of technical computing that allow machine learning, statistics, symbolic computation, data manipulation, network analysis, time series analysis, NLP, optimization, plotting functions and various types of data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other programming languages. It was conceived by Stephen Wolfram, and is developed by Wolfram Research of Champaign, Illinois. The Wolfram Language is the programming language used in Mathematica. Mathematica 1.0 was released on June 23, 1988 in Champaign, Illinois and Santa Clara, California.

In computing, just-in-time (JIT) compilation is compilation during execution of a program rather than before execution. This may consist of source code translation but is more commonly bytecode translation to machine code, which is then executed directly. A system implementing a JIT compiler typically continuously analyses the code being executed and identifies parts of the code where the speedup gained from compilation or recompilation would outweigh the overhead of compiling that code.

In computer science, program optimization, code optimization, or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be optimized so that it executes more rapidly, or to make it capable of operating with less memory storage or other resources, or draw less power.

Maven is a build automation tool used primarily for Java projects. Maven can also be used to build and manage projects written in C#, Ruby, Scala, and other languages. The Maven project is hosted by The Apache Software Foundation, where it was formerly part of the Jakarta Project.

In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it.

The Interactive Compilation Interface (ICI) is a plugin system with a high-level compiler-independent and low-level compiler-dependent API to transform production compilers into interactive research toolsets. It was developed by Grigori Fursin during the MILEPOST project. The ICI framework acts as a "middleware" interface between the compiler and the user-definable plugins. It opens up and reuses the production-quality compiler infrastructure to enable program analysis and instrumentation, fine-grain program optimizations, simple prototyping of new development and research ideas while avoiding building new compilation tools from scratch. For example, it is used in MILEPOST GCC to automate compiler and architecture design and program optimizations based on statistical analysis and machine learning, and predict profitable optimization to improve program execution time, code size and compilation time.

MILEPOST GCC is a free, community-driven, open-source, adaptive, self-tuning compiler that combines stable production-quality GCC, Interactive Compilation Interface and machine learning plugins to adapt to any given architecture and program automatically and predict profitable optimizations to improve program execution time, code size and compilation time. It is currently used and supported by academia and industry and is intended to open up research opportunities to automate compiler and architecture design and optimization.

sbt is an open-source build tool created explicitly for Scala and Java projects. It aims to streamline the procedure of constructing, compiling, testing, and packaging applications, libraries, and frameworks. sbt is highly adaptable, permitting developers to customize the build process according to their project's specific needs.

Dart is a programming language designed by Lars Bak and Kasper Lund and developed by Google. It can be used to develop web and mobile apps as well as server and desktop applications.

The following is provided as an overview of and topical guide to databases:

GraalVM is a Java Development Kit (JDK) written in Java. The open-source distribution of GraalVM is based on OpenJDK, and the enterprise distribution is based on Oracle JDK. As well as just-in-time (JIT) compilation, GraalVM can compile a Java application ahead of time. This allows for faster initialization, greater runtime performance, and decreased resource consumption, but the resulting executable can only run on the platform it was compiled for. It provides additional programming languages and execution modes. The first production-ready release, GraalVM 19.0, was distributed in May 2019. The most recent release is GraalVM for JDK 21, made available in September 2023.

Bazel is a free and open-source software tool used for the automation of building and testing software. Google uses the build tool Blaze internally and released an open-sourced port of the Blaze tool as Bazel, named as an anagram of Blaze. Bazel was first released in March 2015 and was in beta status by September 2015. Version 1.0 was released in October 2019.

The Collective Knowledge (CK) project is an open-source framework and repository to enable collaborative, reproducible and sustainable research and development of complex computational systems. CK is a small, portable, customizable and decentralized infrastructure helping researchers and practitioners:

The cTuning Foundation is a global non-profit organization developing a common methodology and open-source tools to support sustainable, collaborative and reproducible research in Computer science and organize and automate artifact evaluation and reproducibility inititiaves at machine learning and systems conferences and journals.

Grigori Fursin is a British computer scientist, president of the non-profit CTuning foundation, founding member of MLCommons, co-chair of the MLCommons Task Force on Automation and Reproducibility and founder of cKnowledge. His research group created open-source machine learning based self-optimizing compiler, MILEPOST GCC, considered to be the first in the world. At the end of the MILEPOST project he established cTuning foundation to crowdsource program optimisation and machine learning across diverse devices provided by volunteers. His foundation also developed Collective Knowledge Framework to support open research. Since 2015 Fursin leads Artifact Evaluation at several ACM and IEEE computer systems conferences. He is also a founding member of the ACM taskforce on Data, Software, and Reproducibility in Publication.

Quarkus is a Java framework tailored for deployment on Kubernetes. Key technology components surrounding it are OpenJDK HotSpot and GraalVM. The goal of Quarkus is to make Java a leading platform in Kubernetes and serverless environments while offering developers a unified reactive and imperative programming model to optimally address a wider range of distributed application architectures.

Collective Mind (CM) is collection of portable, extensible and ready-to-use automation recipes with a human-friendly interface to make it easier to compose, benchmark and optimize complex AI, ML and other applications and systems across diverse and continuously changing models, data sets, software and hardware.

References

1 2 Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. Proceedings of the GCC Summit'09, Montreal, Canada, June 2009 (link)
1 2 Rethinking code optimization for mobile and multicore, InfoWorld, July 2009 (link)
↑ Artifact Evaluation for computer systems' conferences
↑ Grigori Fursin and Olivier Temam. Collective optimization. Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2009), Paphos, Cyprus, January 2009 (link)
↑ Collective Knowledge based portal for collaborative benchmarking and optimization of emerging workloads at cknowledge.io

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[cti-1] 1 2 Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. Proceedings of the GCC Summit'09, Montreal, Canada, June 2009 (link)

[infoworld-2] 1 2 Rethinking code optimization for mobile and multicore, InfoWorld, July 2009 (link)

[3] Artifact Evaluation for computer systems' conferences

[4] Grigori Fursin and Olivier Temam. Collective optimization. Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2009), Paphos, Cyprus, January 2009 (link)

[5] Collective Knowledge based portal for collaborative benchmarking and optimization of emerging workloads at cknowledge.io

[1]

[2]

[3]

[4]

[5]