cTuning Foundation

The cTuning Foundation
Founded: 2014
Founder: Grigori Fursin
Type: Non-profit research and development organization; engineering organization
Registration no.: W943003814
Focus: Collaborative software, open science, open-source software, reproducibility, computer science, machine learning, artifact evaluation, performance tuning, knowledge management
Origins: Collective Tuning Initiative and MILEPOST GCC
Area served: Worldwide
Method: Develop open-source tools, a public repository of knowledge, and a common methodology for collaborative and reproducible experimentation
Website: ctuning.org

The cTuning Foundation is a global non-profit organization that develops a common methodology and open-source tools to support sustainable, collaborative and reproducible research in computer science, and that organizes and automates artifact evaluation and reproducibility initiatives at machine learning and systems conferences and journals. [1]

History

Grigori Fursin launched cTuning.org at the end of the MILEPOST project in 2009 to continue his research on machine-learning-based program and architecture optimization as a community effort. [10] [11]

In 2014, the cTuning Foundation was registered in France as a non-profit research and development organization. It received funding from the EU TETRACOM project and ARM to develop the Collective Knowledge Framework and to prepare a reproducible-research methodology for ACM and IEEE conferences. [12]

In 2020, the cTuning Foundation joined MLCommons as a founding member to accelerate innovation in machine learning. [13]

In 2023, the cTuning Foundation joined a new initiative by the Autonomous Vehicle Computing Consortium (AVCC) and MLCommons to develop an automotive industry standard machine learning benchmark suite. [14]

Since 2024, the cTuning Foundation has supported the MLCommons Croissant metadata format to help standardize ML datasets. [15]

Funding

Current funding comes from the European Union's research and development funding programme, Microsoft, and other organizations. [16]

Related Research Articles

Workflow

Workflow is a generic term for orchestrated and repeatable patterns of activity, enabled by the systematic organization of resources into processes that transform materials, provide services, or process information. It can be depicted as a sequence of operations, the work of a person or group, the work of an organization of staff, or one or more simple or complex mechanisms.

In software engineering, profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering.
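
For illustration, the following is a minimal profiling run using Python's built-in cProfile module; the recursive fib workload is a hypothetical example, not something from the projects described here.

```python
# Minimal sketch: profile a toy workload with Python's built-in cProfile.
# fib() is a hypothetical example function, chosen only for illustration.
import cProfile
import pstats

def fib(n: int) -> int:
    # Deliberately naive recursion: a classic profiling target.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

profiler = cProfile.Profile()
profiler.enable()
fib(25)
profiler.disable()

# Report the functions with the largest cumulative time, i.e. the
# call-frequency and duration data described above.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```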

Process mining is a family of techniques used to analyze event data in order to understand and improve operational processes. Part of the fields of data science and process management, process mining is generally built on logs that contain case id, a unique identifier for a particular process instance; an activity, a description of the event that is occurring; a timestamp; and sometimes other information such as resources, costs, and so on.
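
As a sketch of that event-log structure, the snippet below groups invented events into per-case traces and counts directly-follows relations, a basic building block of process-discovery algorithms.

```python
# A minimal event log: each event carries a case id, an activity name,
# and a timestamp. All log contents are invented for illustration.
from collections import Counter
from datetime import datetime

log = [
    ("case-1", "register", datetime(2024, 1, 1, 9, 0)),
    ("case-1", "review",   datetime(2024, 1, 1, 9, 30)),
    ("case-1", "approve",  datetime(2024, 1, 1, 10, 0)),
    ("case-2", "register", datetime(2024, 1, 2, 9, 0)),
    ("case-2", "reject",   datetime(2024, 1, 2, 9, 45)),
]

# Group events into per-case traces, ordered by timestamp.
traces = {}
for case_id, activity, ts in sorted(log, key=lambda e: e[2]):
    traces.setdefault(case_id, []).append(activity)

# Count directly-follows relations between consecutive activities.
follows = Counter()
for trace in traces.values():
    for a, b in zip(trace, trace[1:]):
        follows[(a, b)] += 1

print(follows)  # e.g. Counter({('register', 'review'): 1, ...})
```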

In computing, compiler correctness is the branch of computer science that deals with trying to show that a compiler behaves according to its language specification. Techniques include developing the compiler using formal methods and using rigorous testing on an existing compiler.
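
One common testing flavor is differential testing. The sketch below, which assumes gcc is available on the PATH and uses an illustrative test program, compiles the same source at two optimization levels and checks that the observable output agrees.

```python
# Differential-testing sketch: any divergence between optimization levels
# signals a potential miscompilation (or undefined behavior in the source).
import subprocess
import tempfile
from pathlib import Path

SOURCE = """
#include <stdio.h>
int main(void) {
    int acc = 0;
    for (int i = 1; i <= 10; i++) acc += i * i;
    printf("%d\\n", acc);
    return 0;
}
"""

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "test.c"
    src.write_text(SOURCE)
    outputs = []
    for opt in ("-O0", "-O2"):
        exe = Path(tmp) / f"test{opt}"
        subprocess.run(["gcc", opt, str(src), "-o", str(exe)], check=True)
        result = subprocess.run([str(exe)], capture_output=True, text=True)
        outputs.append(result.stdout)
    assert outputs[0] == outputs[1], f"outputs differ: {outputs}"
    print("outputs agree:", outputs[0].strip())
```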

The Interactive Compilation Interface (ICI) is a plugin system with a high-level compiler-independent and a low-level compiler-dependent API to transform production compilers into interactive research toolsets. It was developed by Grigori Fursin during the MILEPOST project. The ICI framework acts as a "middleware" interface between the compiler and user-definable plugins. It opens up and reuses production-quality compiler infrastructure to enable program analysis and instrumentation, fine-grain program optimizations, and simple prototyping of new development and research ideas without building new compilation tools from scratch. For example, it is used in MILEPOST GCC to automate compiler and architecture design and program optimization based on statistical analysis and machine learning, and to predict profitable optimizations that improve program execution time, code size and compilation time.

MILEPOST GCC is a free, community-driven, open-source, adaptive, self-tuning compiler that combines stable production-quality GCC, Interactive Compilation Interface and machine learning plugins to adapt to any given architecture and program automatically and predict profitable optimizations to improve program execution time, code size and compilation time. It is currently used and supported by academia and industry and is intended to open up research opportunities to automate compiler and architecture design and optimization.
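
The underlying idea can be sketched as supervised learning over static program features. The snippet below uses scikit-learn with invented features, flags, and training data; the real system extracts its features inside GCC through the ICI plugin interface.

```python
# Conceptual MILEPOST-style sketch: learn a mapping from static program
# features to a profitable optimization choice. Features, flags, and
# training data below are hypothetical, chosen only for illustration.
from sklearn.tree import DecisionTreeClassifier

# Each row: [num_basic_blocks, num_branches, avg_loop_depth]
program_features = [
    [12, 4, 1.0],
    [350, 120, 3.2],
    [40, 10, 2.0],
    [500, 200, 4.1],
]
# Best flag observed for each training program (hypothetical labels).
best_flag = ["-O2", "-O3 -funroll-loops", "-O2", "-O3 -funroll-loops"]

model = DecisionTreeClassifier().fit(program_features, best_flag)

# Predict a profitable optimization for an unseen program.
print(model.predict([[420, 150, 3.8]]))  # e.g. ['-O3 -funroll-loops']
```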

The Collective Tuning Initiative is a community-driven initiative started by Grigori Fursin to develop free and open-source research tools with a unified API for collaborative characterization, optimization and co-design of computer systems. They enable sharing of benchmarks, data sets and optimization cases from the community in the Collective Optimization Database through unified web services to predict better optimizations or architecture designs. Using common research-and-development tools should help to improve the quality and reproducibility of computer systems' research and development and accelerate innovation in this area. This approach helped establish Reproducibility Initiatives and Artifact Evaluation at several ACM-sponsored conferences to encourage sharing of artifacts and validation of experimental results from accepted papers.

KNIME, the Konstanz Information Miner, is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and use of JDBC allows assembly of nodes blending different data sources, including preprocessing, for modeling, data analysis and visualization without, or with minimal, programming.

Feature engineering is a preprocessing step in supervised machine learning and statistical modeling which transforms raw data into a more effective set of inputs. Each input comprises several attributes, known as features. By providing models with relevant information, feature engineering significantly enhances their predictive accuracy and decision-making capability.
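
A minimal sketch of such a transformation, with hypothetical record fields and derived features:

```python
# Turn a raw record into a numeric feature vector. The record fields and
# derived features are illustrative assumptions, not from the article.
from datetime import datetime

raw = {"signup": datetime(2023, 5, 1), "purchases": [12.0, 30.5, 7.25]}

def make_features(record, now=datetime(2024, 5, 1)):
    purchases = record["purchases"]
    return {
        "account_age_days": (now - record["signup"]).days,  # derived from a date
        "purchase_count": len(purchases),                   # simple aggregate
        "avg_purchase": sum(purchases) / len(purchases) if purchases else 0.0,
    }

print(make_features(raw))
```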

The Collective Knowledge (CK) project is an open-source framework and repository to enable collaborative, reproducible and sustainable research and development of complex computational systems. CK is a small, portable, customizable and decentralized infrastructure supporting researchers and practitioners in this work.

Grigori Fursin

Grigori Fursin is a British computer scientist, president of the non-profit cTuning Foundation, a founding member of MLCommons, co-chair of the MLCommons Task Force on Automation and Reproducibility, and founder of cKnowledge. His research group created the open-source machine-learning-based self-optimizing compiler MILEPOST GCC, considered to be the first of its kind. At the end of the MILEPOST project he established the cTuning Foundation to crowdsource program optimisation and machine learning across diverse devices provided by volunteers. The foundation also developed the Collective Knowledge Framework to support open research. Since 2015, Fursin has led Artifact Evaluation at several ACM and IEEE computer systems conferences. He is also a founding member of the ACM taskforce on Data, Software, and Reproducibility in Publication.

Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. It is the combination of automation and ML.
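
One of the tasks AutoML systems automate is hyperparameter search. The sketch below illustrates only that narrow piece, using scikit-learn's GridSearchCV with an illustrative search space; full AutoML systems also automate preprocessing, model selection and more.

```python
# Automated hyperparameter search as a small slice of AutoML.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# The search space below is an illustrative assumption.
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 4, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)  # tries every combination with cross-validation

print(search.best_params_, search.best_score_)
```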

Neural architecture search (NAS) is a technique for automating the design of artificial neural networks (ANNs), a widely used class of models in machine learning. NAS has been used to design networks that are on par with or outperform hand-designed architectures. Methods for NAS can be categorized according to the search space, the search strategy, and the performance estimation strategy used.
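
A toy sketch of those three components: a set of hidden-layer shapes as the search space, random sampling as the search strategy, and validation accuracy as the performance estimator, with scikit-learn's MLPClassifier standing in for a real training pipeline.

```python
# Toy NAS loop over an invented search space of hidden-layer shapes.
import random
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Search space: width w repeated over depth d, e.g. (32, 32).
search_space = [(w,) * d for w in (16, 32, 64) for d in (1, 2)]
random.seed(0)

best = None
for arch in random.sample(search_space, 4):       # search strategy: random
    net = MLPClassifier(hidden_layer_sizes=arch, max_iter=300, random_state=0)
    net.fit(X_tr, y_tr)
    score = net.score(X_val, y_val)               # performance estimation
    if best is None or score > best[1]:
        best = (arch, score)

print("best architecture:", best)
```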

MLOps

MLOps or ML Ops is a paradigm that aims to deploy and maintain machine learning models in production reliably and efficiently. The word is a compound of "machine learning" and the continuous delivery practice (CI/CD) of DevOps in the software field. Machine learning models are tested and developed in isolated experimental systems; when an algorithm is ready to be launched, MLOps is practiced between data scientists, DevOps engineers, and machine learning engineers to transition the algorithm to production systems. Similar to DevOps or DataOps approaches, MLOps seeks to increase automation and improve the quality of production models, while also focusing on business and regulatory requirements. While MLOps started as a set of best practices, it is slowly evolving into an independent approach to ML lifecycle management. MLOps applies to the entire lifecycle: from integrating with model generation, orchestration, and deployment, to health, diagnostics, governance, and business metrics.

ModelOps

ModelOps, as defined by Gartner, "is focused primarily on the governance and lifecycle management of a wide range of operationalized artificial intelligence (AI) and decision models, including machine learning, knowledge graphs, rules, optimization, linguistic and agent-based models". "ModelOps lies at the heart of any enterprise AI strategy". It orchestrates the model lifecycles of all models in production across the entire enterprise, from putting a model into production, then evaluating and updating the resulting application according to a set of governance rules, including both technical and business key performance indicators (KPIs). It grants business domain experts the capability to evaluate AI models in production, independent of data scientists.

Data version control is a method of working with data sets. It is similar to the version control systems used in traditional software development, but is optimized to allow better processing of data and collaboration in the context of data analytics, research, and any other form of data analysis. Data version control may also include specific features and configurations designed to facilitate work with large data sets and data lakes.
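
A conceptual sketch of the mechanism many such tools share, content-addressed storage plus small pointer records; this is an illustrative model, not the interface of any particular tool.

```python
# Store a data file under its content hash and treat the hash as the
# version id that a small pointer file (committed to git) would record.
import hashlib
import shutil
from pathlib import Path

STORE = Path(".data_store")

def snapshot(path: Path) -> str:
    """Copy a data file into a content-addressed store; return its hash."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    STORE.mkdir(exist_ok=True)
    shutil.copy(path, STORE / digest)
    return digest

# Demo with a throwaway file standing in for a large dataset.
data = Path("dataset.csv")
data.write_text("id,value\n1,3.14\n")
print("version:", snapshot(data))
```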

Liang Zhao is a computer scientist and academic. He is an associate professor in the Department of Computer Science at Emory University.

References

  1. ACM TechTalk: "Reproducing 150 Research Papers and Testing Them in the Real World: Challenges and Solutions with Grigori Fursin". Retrieved 11 February 2021.
  2. Fursin, Grigori (June 2023). "Toward a common language to facilitate reproducible research and technology transfer: challenges and solutions". Keynote at the 1st ACM Conference on Reproducibility and Replicability. doi:10.5281/zenodo.8105339.
  3. Online catalog of automation recipes developed by MLCommons.
  4. HPCWire: "MLPerf Releases Latest Inference Results and New Storage Benchmark", September 2023.
  5. Fursin, Grigori (October 2020). "Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common interfaces". Philosophical Transactions of the Royal Society. arXiv:2011.01149. doi:10.1098/rsta.2020.0211. Retrieved 22 October 2020.
  6. Ceze, Luis (20 June 2018). ACM ReQuEST'18 front matters and report (PDF). ISBN 9781450359238.
  7. Fursin, Grigori; Childers, Bruce; Jones, Alex K.; Mosse, Daniel (June 2014). TRUST'14: Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering at PLDI'14. doi:10.1145/2618137.
  8. Fursin, Grigori; Dubach, Christophe (June 2014). "Community-driven reviewing and validation of publications". Proceedings of TRUST'14 at PLDI'14. arXiv:1406.4020. doi:10.1145/2618137.2618142.
  9. Childers, Bruce R.; Fursin, Grigori; Krishnamurthi, Shriram; Zeller, Andreas (March 2016). "Artifact evaluation for publications". Dagstuhl Perspectives Workshop 15452. doi:10.4230/DagRep.5.11.29.
  10. "World's First Intelligent, Open Source Compiler Provides Automated Advice on Software Code Optimization". IBM press release, June 2009 (link).
  11. Fursin, Grigori. "Collective Tuning Initiative: automating and accelerating development and optimization of computing systems". Proceedings of the GCC Summit'09, Montreal, Canada, June 2009 (link).
  12. "Collective Knowledge: a framework for systematic performance analysis and optimization" (article on the TETRACOM technology transfer project). HiPEACinfo, July 2015 (link).
  13. MLCommons press release: "MLCommons Launches and Unites 50+ Global Technology and Academic Leaders in AI and Machine Learning to Accelerate Innovation in ML" (link).
  14. AVCC press release: "AVCC and MLCommons Join Forces to Develop an Automotive Industry Standard Machine Learning Benchmark Suite" (link).
  15. MLCommons press release: "New Croissant Metadata Format helps Standardize ML Datasets. Support from Hugging Face, Google Dataset Search, Kaggle, and Open ML, makes datasets easily discoverable and usable" (link).
  16. cTuning Foundation partners.