Founded | 2014 |
---|---|
Founder | Grigori Fursin |
Type | Non-profit research and development organization, Engineering organization |
Registration no. | W943003814 |
Focus | Collaborative software, Open Science, Open Source Software, Reproducibility, Computer Science, Machine learning, Artifact Evaluation, Performance tuning, Knowledge management |
Origins | Collective Tuning Initiative & MILEPOST GCC |
Area served | Worldwide |
Method | Develop open-source tools, a public repository of knowledge, and a common methodology for collaborative and reproducible experimentation |
Website | cTuning.org |
The cTuning Foundation is a global non-profit organization that develops a common methodology and open-source tools to support sustainable, collaborative and reproducible research in computer science, and that organizes and automates artifact evaluation and reproducibility initiatives at machine learning and systems conferences and journals. [1]
Grigori Fursin developed cTuning.org at the end of the MILEPOST project in 2009 to continue his research on machine-learning-based program and architecture optimization as a community effort. [10] [11]
In 2014, the cTuning Foundation was registered in France as a non-profit research and development organization. It received funding from the EU TETRACOM project and ARM to develop the Collective Knowledge Framework and to prepare a reproducible research methodology for ACM and IEEE conferences. [12]
In 2020, the cTuning Foundation joined MLCommons as a founding member to accelerate innovation in ML. [13]
In 2023, the cTuning Foundation joined a new initiative by the Autonomous Vehicle Computing Consortium and MLCommons to develop an automotive industry standard machine learning benchmark suite. [14]
Since 2024, the cTuning Foundation has supported the MLCommons Croissant metadata format to help standardize ML datasets. [15]
Current funding comes from the European Union research and development funding programme, Microsoft, and other organizations. [16]
In software engineering, profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering.
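As a minimal illustration of call profiling, the following sketch uses Python's standard-library cProfile and pstats modules to report call counts and cumulative time; the function names and workload are invented for the example.

```python
import cProfile
import pstats


def slow_sum(n):
    # Deliberately inefficient summation so the profiler has something to measure.
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    # Call the function repeatedly so call counts and cumulative times show up.
    return [slow_sum(100_000) for _ in range(20)]


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()
    # Print the five most expensive calls by cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```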
Process mining is a family of techniques used to analyze event data in order to understand and improve operational processes. Part of the fields of data science and process management, process mining is generally built on event logs in which each event contains a case id (a unique identifier for a particular process instance), an activity (a description of the event that is occurring), a timestamp, and sometimes other information such as resources, costs, and so on.
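The sketch below illustrates this minimal event-log structure (case id, activity, timestamp) and groups events into per-case traces; the field names and sample events are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Each event records a case id, an activity name and a timestamp,
# matching the minimal log structure described above.
events = [
    {"case_id": "order-1", "activity": "create order",  "timestamp": datetime(2024, 1, 5, 9, 0)},
    {"case_id": "order-1", "activity": "approve order", "timestamp": datetime(2024, 1, 5, 9, 30)},
    {"case_id": "order-2", "activity": "create order",  "timestamp": datetime(2024, 1, 5, 9, 10)},
    {"case_id": "order-1", "activity": "ship order",    "timestamp": datetime(2024, 1, 6, 14, 0)},
    {"case_id": "order-2", "activity": "cancel order",  "timestamp": datetime(2024, 1, 5, 11, 0)},
]

# Group events by case and order them by time to recover each process trace.
traces = defaultdict(list)
for event in sorted(events, key=lambda e: e["timestamp"]):
    traces[event["case_id"]].append(event["activity"])

for case_id, activities in traces.items():
    print(case_id, "->", " | ".join(activities))
```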
In computing, compiler correctness is the branch of computer science that deals with trying to show that a compiler behaves according to its language specification. Techniques include developing the compiler using formal methods and using rigorous testing on an existing compiler.
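One simple form of rigorous testing is differential testing: compile the same program at different optimization levels and check that the observable output does not change. The sketch below assumes a Unix-like system with gcc on the PATH; the test program and flags are illustrative only.

```python
import subprocess
import tempfile
import pathlib

# A tiny C program used as a test case; a mismatch in output between
# optimization levels would point at a potential compiler (or test) bug.
C_SOURCE = r"""
#include <stdio.h>
int main(void) {
    long total = 0;
    for (int i = 0; i < 1000; i++) total += i * i;
    printf("%ld\n", total);
    return 0;
}
"""

def compile_and_run(src_path, opt_level):
    exe = src_path.with_suffix(f".{opt_level.strip('-')}")
    subprocess.run(["gcc", opt_level, str(src_path), "-o", str(exe)], check=True)
    return subprocess.run([str(exe)], capture_output=True, text=True, check=True).stdout

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "test.c"
    src.write_text(C_SOURCE)
    outputs = {level: compile_and_run(src, level) for level in ("-O0", "-O2")}
    assert outputs["-O0"] == outputs["-O2"], "optimization changed observable behaviour"
    print("outputs agree:", outputs["-O0"].strip())
```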
The Interactive Compilation Interface (ICI) is a plugin system with a high-level compiler-independent and a low-level compiler-dependent API to transform production compilers into interactive research toolsets. It was developed by Grigori Fursin during the MILEPOST project. The ICI framework acts as a "middleware" interface between the compiler and user-definable plugins. It opens up and reuses production-quality compiler infrastructure to enable program analysis and instrumentation, fine-grained program optimizations, and simple prototyping of new development and research ideas, while avoiding building new compilation tools from scratch. For example, it is used in MILEPOST GCC to automate compiler and architecture design and program optimization based on statistical analysis and machine learning, and to predict profitable optimizations that improve program execution time, code size and compilation time.
Business process management (BPM) is the discipline in which people use various methods to discover, model, analyze, measure, improve, optimize, and automate business processes. Any combination of methods used to manage a company's business processes is BPM. Processes can be structured and repeatable or unstructured and variable. Though not required, enabling technologies are often used with BPM.
MILEPOST GCC is a free, community-driven, open-source, adaptive, self-tuning compiler that combines stable production-quality GCC, Interactive Compilation Interface and machine learning plugins to adapt to any given architecture and program automatically and predict profitable optimizations to improve program execution time, code size and compilation time. It is currently used and supported by academia and industry and is intended to open up research opportunities to automate compiler and architecture design and optimization.
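As a hedged sketch of the general idea (not MILEPOST's actual feature set or models), the following example trains a small classifier that maps invented static program features to the compiler flag that performed best in earlier runs, then suggests a flag for an unseen program.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data: each row is a vector of static program features
# (e.g. counts of basic blocks, branches, memory accesses). These features
# and labels are invented for illustration only.
program_features = [
    [12,  3,  40],    # small, branch-light kernel
    [250, 80, 900],   # large, branchy program
    [30,  5, 120],
    [400, 150, 2000],
]
best_flag = ["-O3", "-Os", "-O3", "-Os"]  # flag that performed best in prior runs

model = DecisionTreeClassifier().fit(program_features, best_flag)

# Predict a promising optimization level for an unseen program.
new_program = [[35, 6, 150]]
print("suggested flag:", model.predict(new_program)[0])
```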
The Collective Tuning Initiative is a community-driven initiative started by Grigori Fursin to develop free and open-source research tools with a unified API for collaborative characterization, optimization and co-design of computer systems. They enable sharing of benchmarks, data sets and optimization cases from the community in the Collective Optimization Database through unified web services to predict better optimizations or architecture designs. Using common research-and-development tools should help to improve the quality and reproducibility of computer systems' research and development and accelerate innovation in this area. This approach helped establish Reproducibility Initiatives and Artifact Evaluation at several ACM-sponsored conferences to encourage sharing of artifacts and validation of experimental results from accepted papers.
KNIME, the Konstanz Information Miner, is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and use of JDBC allows assembly of nodes blending different data sources, including preprocessing, for modeling, data analysis and visualization without, or with minimal, programming.
Automated synthesis or automatic synthesis is a set of techniques that use robotic equipment to perform chemical synthesis in an automated way. Automating processes allows for higher efficiency and product quality, although automation technology can be cost-prohibitive and there are concerns regarding overdependence and job displacement. Chemical processes were automated throughout the 19th and 20th centuries, with major developments occurring in the past thirty years as technology advanced. Tasks that are performed may include synthesis under a variety of conditions, sample preparation, purification, and extractions. Applications of automated synthesis are found on research and industrial scales in a wide variety of fields, including polymers, personal care, and radiosynthesis.
Robotic process automation (RPA) is a form of business process automation that is based on software robots (bots) or artificial intelligence (AI) agents. RPA should not be confused with artificial intelligence, as it is based on automation technology following a predefined workflow. It is sometimes referred to as software robotics.
The Collective Knowledge (CK) project is an open-source framework and repository that enables collaborative, reproducible and sustainable research and development of complex computational systems. CK is a small, portable, customizable and decentralized infrastructure helping researchers and practitioners share research artifacts as reusable components and automate, reproduce and reuse experiments.
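A rough sketch of driving CK from Python follows, assuming the ck package is installed and a repository providing a program module has been pulled; CK exposes a single access() entry point that takes and returns dictionaries, but action names and entry fields may differ between CK versions.

```python
import ck.kernel as ck

# Every CK call goes through one access() function that takes and returns a
# dictionary, which keeps the API small and portable across modules.
result = ck.access({"action": "search", "module_uoa": "program"})

if result["return"] > 0:
    # CK signals errors through the 'return' code and an 'error' message.
    raise RuntimeError(result.get("error", "unknown CK error"))

# List the names of program entries found in the local CK repositories.
for entry in result.get("lst", []):
    print(entry.get("data_uoa"))
```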
Grigori Fursin is a British computer scientist, president of the non-profit cTuning Foundation, founding member of MLCommons, and co-chair of the MLCommons Task Force on Automation and Reproducibility. His research group created MILEPOST GCC, an open-source machine-learning-based self-optimizing compiler considered to be the first of its kind. At the end of the MILEPOST project he established the cTuning Foundation to crowdsource program optimization and machine learning across diverse devices provided by volunteers. His foundation also developed the Collective Knowledge Framework and Collective Mind to support open research. Since 2015, Fursin has led Artifact Evaluation at several ACM and IEEE computer systems conferences. He is also a founding member of the ACM Task Force on Data, Software, and Reproducibility in Publication.
Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. It is the combination of automation and ML.
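The sketch below automates just one of these tasks, hyperparameter tuning, using scikit-learn's GridSearchCV on a toy dataset; full AutoML systems additionally automate preprocessing, feature engineering and model selection. The parameter grid is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Automate model tuning: exhaustively evaluate a small grid of hyperparameters
# with cross-validation and keep the best configuration.
X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50, 100], "max_depth": [2, 4, None]},
    cv=3,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
```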
MLOps or ML Ops is a paradigm that aims to deploy and maintain machine learning models in production reliably and efficiently. The term is a compound of "machine learning" and the continuous integration and delivery practice (CI/CD) of DevOps in the software field. Machine learning models are tested and developed in isolated experimental systems. When an algorithm is ready to be launched, MLOps is practiced between data scientists, DevOps engineers, and machine learning engineers to transition the algorithm to production systems. Similar to DevOps or DataOps approaches, MLOps seeks to increase automation and improve the quality of production models, while also focusing on business and regulatory requirements. While MLOps started as a set of best practices, it is slowly evolving into an independent approach to ML lifecycle management. MLOps applies to the entire lifecycle, from integrating with model generation, orchestration, and deployment, to health, diagnostics, governance, and business metrics.
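A minimal sketch of one MLOps practice is shown below: a promotion gate that only registers a candidate model when it beats the currently recorded metric. The local file registry, dataset and threshold logic are stand-ins for a real model registry, CI/CD pipeline and monitoring.

```python
import json
import pathlib
import pickle

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# The "registry" here is just a local directory holding the model artifact
# and its evaluation metric; real setups use dedicated model registries.
REGISTRY = pathlib.Path("model_registry")
REGISTRY.mkdir(exist_ok=True)
METRICS_FILE = REGISTRY / "metrics.json"

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidate = LogisticRegression(max_iter=5000).fit(X_train, y_train)
candidate_score = candidate.score(X_test, y_test)

current_score = (
    json.loads(METRICS_FILE.read_text())["accuracy"] if METRICS_FILE.exists() else 0.0
)

if candidate_score > current_score:
    # Promote: store the model artifact and its evaluation metric together.
    (REGISTRY / "model.pkl").write_bytes(pickle.dumps(candidate))
    METRICS_FILE.write_text(json.dumps({"accuracy": candidate_score}))
    print(f"promoted model with accuracy {candidate_score:.3f}")
else:
    print(f"kept existing model ({current_score:.3f} >= {candidate_score:.3f})")
```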
ModelOps, as defined by Gartner, "is focused primarily on the governance and lifecycle management of a wide range of operationalized artificial intelligence (AI) and decision models, including machine learning, knowledge graphs, rules, optimization, linguistic and agent-based models" in multi-agent systems. "ModelOps lies at the heart of any enterprise AI strategy". It orchestrates the model lifecycles of all models in production across the entire enterprise, from putting a model into production to evaluating and updating the resulting application according to a set of governance rules, including both technical and business key performance indicators (KPIs). It grants business domain experts the capability to evaluate AI models in production, independent of data scientists.
Data version control is a method of working with data sets. It is similar to the version control systems used in traditional software development, but is optimized to allow better processing of data and collaboration in the context of data analytics, research, and any other form of data analysis. Data version control may also include specific features and configurations designed to facilitate work with large data sets and data lakes.
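An illustrative sketch of one underlying idea follows: content-addressed snapshots of a data file, where each version is stored under the hash of its contents so identical data is never duplicated. Real tools such as DVC apply this idea at scale on top of Git and remote storage; the file names here are invented.

```python
import hashlib
import pathlib
import shutil

# Store each snapshot of a data file under the SHA-256 hash of its contents,
# so any version can be retrieved later by its hash.
STORE = pathlib.Path(".data_store")
STORE.mkdir(exist_ok=True)

def commit(path: pathlib.Path) -> str:
    """Snapshot a data file and return the hash identifying this version."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    target = STORE / digest
    if not target.exists():          # identical content is stored only once
        shutil.copyfile(path, target)
    return digest

def checkout(digest: str, destination: pathlib.Path) -> None:
    """Restore a previously committed version of the data."""
    shutil.copyfile(STORE / digest, destination)

data = pathlib.Path("dataset.csv")
data.write_text("id,label\n1,cat\n2,dog\n")
v1 = commit(data)

data.write_text("id,label\n1,cat\n2,dog\n3,bird\n")  # the dataset changes
v2 = commit(data)

checkout(v1, data)  # roll back to the first version by its hash
print("restored version", v1[:12])
```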
Auto-WEKA is an automated machine learning system based on Weka by Chris Thornton, Frank Hutter, Holger H. Hoos and Kevin Leyton-Brown. An extended version was published as Auto-WEKA 2.0. Auto-WEKA was named the first prominent AutoML system in a neutral comparison study.