Founded | 2014 |
---|---|
Founder | Grigori Fursin |
Type | Non-profit research and development organization, Engineering organization |
Registration no. | W943003814 |
Focus | Collaborative software, Open Science, Open Source Software, Reproducibility, Computer Science, Machine learning, Artifact Evaluation, Performance tuning, Knowledge management |
Origins | Collective Tuning Initiative & MILEPOST GCC |
Area served | Worldwide |
Method | Develop open-source tools, a public repository of knowledge, and a common methodology for collaborative and reproducible experimentation |
Website | cTuning.org |
The cTuning Foundation is a global non-profit organization that develops a common methodology and open-source tools to support sustainable, collaborative and reproducible research in computer science, and that organizes and automates artifact evaluation and reproducibility initiatives at machine learning and systems conferences and journals. [1]
Grigori Fursin developed cTuning.org at the end of the MILEPOST project in 2009 to continue his research on machine-learning-based program and architecture optimization as a community effort. [10] [11]
In 2014, the cTuning Foundation was registered in France as a non-profit research and development organization. It received funding from the EU TETRACOM project and ARM to develop the Collective Knowledge Framework and to prepare a reproducible research methodology for ACM and IEEE conferences. [12]
In 2020, the cTuning Foundation joined MLCommons as a founding member to accelerate innovation in ML. [13]
In 2023, the cTuning Foundation joined a new initiative by the Autonomous Vehicle Computing Consortium and MLCommons to develop an automotive industry standard machine learning benchmark suite. [14]
Since 2024, the cTuning Foundation has supported the MLCommons Croissant metadata format to help standardize ML datasets. [15]
Current funding comes from the European Union research and development funding programme, Microsoft, and other organizations. [16]
In software engineering, profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering.
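As a minimal illustration of call profiling, the following sketch uses Python's standard-library cProfile and pstats modules to report call counts and cumulative time; the function names and workload are invented for the example.

```python
import cProfile
import pstats


def slow_sum(n):
    # Deliberately inefficient summation so the profiler has something to measure.
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    # Call the function repeatedly so call counts and cumulative times show up.
    return [slow_sum(100_000) for _ in range(20)]


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()
    # Print the five most expensive calls by cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```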
Process mining is a family of techniques used to analyze event data in order to understand and improve operational processes. Part of the fields of data science and process management, process mining is generally built on event logs in which each event contains a case id (a unique identifier for a particular process instance), an activity (a description of the event that is occurring), a timestamp, and sometimes other information such as resources, costs, and so on.
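The sketch below illustrates this minimal event-log structure (case id, activity, timestamp) and groups events into per-case traces; the field names and sample events are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Each event records a case id, an activity name and a timestamp,
# matching the minimal log structure described above.
events = [
    {"case_id": "order-1", "activity": "create order",  "timestamp": datetime(2024, 1, 5, 9, 0)},
    {"case_id": "order-1", "activity": "approve order", "timestamp": datetime(2024, 1, 5, 9, 30)},
    {"case_id": "order-2", "activity": "create order",  "timestamp": datetime(2024, 1, 5, 9, 10)},
    {"case_id": "order-1", "activity": "ship order",    "timestamp": datetime(2024, 1, 6, 14, 0)},
    {"case_id": "order-2", "activity": "cancel order",  "timestamp": datetime(2024, 1, 5, 11, 0)},
]

# Group events by case and order them by time to recover each process trace.
traces = defaultdict(list)
for event in sorted(events, key=lambda e: e["timestamp"]):
    traces[event["case_id"]].append(event["activity"])

for case_id, activities in traces.items():
    print(case_id, "->", " | ".join(activities))
```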
In computing, compiler correctness is the branch of computer science that deals with trying to show that a compiler behaves according to its language specification. Techniques include developing the compiler using formal methods and using rigorous testing on an existing compiler.
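One simple form of rigorous testing is differential testing: compile the same program at different optimization levels and check that the observable output does not change. The sketch below assumes a Unix-like system with gcc on the PATH; the test program and flags are illustrative only.

```python
import subprocess
import tempfile
import pathlib

# A tiny C program used as a test case; a mismatch in output between
# optimization levels would point at a potential compiler (or test) bug.
C_SOURCE = r"""
#include <stdio.h>
int main(void) {
    long total = 0;
    for (int i = 0; i < 1000; i++) total += i * i;
    printf("%ld\n", total);
    return 0;
}
"""

def compile_and_run(src_path, opt_level):
    exe = src_path.with_suffix(f".{opt_level.strip('-')}")
    subprocess.run(["gcc", opt_level, str(src_path), "-o", str(exe)], check=True)
    return subprocess.run([str(exe)], capture_output=True, text=True, check=True).stdout

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "test.c"
    src.write_text(C_SOURCE)
    outputs = {level: compile_and_run(src, level) for level in ("-O0", "-O2")}
    assert outputs["-O0"] == outputs["-O2"], "optimization changed observable behaviour"
    print("outputs agree:", outputs["-O0"].strip())
```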
The Interactive Compilation Interface (ICI) is a plugin system with a high-level compiler-independent and a low-level compiler-dependent API to transform production compilers into interactive research toolsets. It was developed by Grigori Fursin during the MILEPOST project. The ICI framework acts as a "middleware" interface between the compiler and user-definable plugins. It opens up and reuses production-quality compiler infrastructure to enable program analysis and instrumentation, fine-grained program optimizations, and simple prototyping of new development and research ideas, while avoiding building new compilation tools from scratch. For example, it is used in MILEPOST GCC to automate compiler and architecture design and program optimization based on statistical analysis and machine learning, and to predict profitable optimizations that improve program execution time, code size and compilation time.
Business process management (BPM) is the discipline in which people use various methods to discover, model, analyze, measure, improve, optimize, and automate business processes. Any combination of methods used to manage a company's business processes is BPM. Processes can be structured and repeatable or unstructured and variable. Though not required, enabling technologies are often used with BPM.
MILEPOST GCC is a free, community-driven, open-source, adaptive, self-tuning compiler that combines stable production-quality GCC, Interactive Compilation Interface and machine learning plugins to adapt to any given architecture and program automatically and predict profitable optimizations to improve program execution time, code size and compilation time. It is currently used and supported by academia and industry and is intended to open up research opportunities to automate compiler and architecture design and optimization.
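As a hedged sketch of the general idea (not MILEPOST's actual feature set or models), the following example trains a small classifier that maps invented static program features to the compiler flag that performed best in earlier runs, then suggests a flag for an unseen program.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data: each row is a vector of static program features
# (e.g. counts of basic blocks, branches, memory accesses). These features
# and labels are invented for illustration only.
program_features = [
    [12,  3,  40],    # small, branch-light kernel
    [250, 80, 900],   # large, branchy program
    [30,  5, 120],
    [400, 150, 2000],
]
best_flag = ["-O3", "-Os", "-O3", "-Os"]  # flag that performed best in prior runs

model = DecisionTreeClassifier().fit(program_features, best_flag)

# Predict a promising optimization level for an unseen program.
new_program = [[35, 6, 150]]
print("suggested flag:", model.predict(new_program)[0])
```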
The Collective Tuning Initiative is a community-driven initiative started by Grigori Fursin to develop free and open-source research tools with a unified API for collaborative characterization, optimization and co-design of computer systems. They enable sharing of benchmarks, data sets and optimization cases from the community in the Collective Optimization Database through unified web services to predict better optimizations or architecture designs. Using common research-and-development tools should help to improve the quality and reproducibility of computer systems' research and development and accelerate innovation in this area. This approach helped establish Reproducibility Initiatives and Artifact Evaluation at several ACM-sponsored conferences to encourage sharing of artifacts and validation of experimental results from accepted papers.
KNIME, the Konstanz Information Miner, is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and use of JDBC allows assembly of nodes blending different data sources, including preprocessing, for modeling, data analysis and visualization without, or with minimal, programming.
Automated synthesis or automatic synthesis is a set of techniques that use robotic equipment to perform chemical synthesis in an automated way. Automating processes allows for higher efficiency and product quality, although automation technology can be cost-prohibitive and there are concerns regarding overdependence and job displacement. Chemical processes were automated throughout the 19th and 20th centuries, with major developments occurring in the past thirty years as technology advanced. Tasks that are performed may include synthesis under a variety of conditions, sample preparation, purification, and extractions. Applications of automated synthesis are found on research and industrial scales in a wide variety of fields, including polymers, personal care, and radiosynthesis.
Robotic process automation (RPA) is a form of business process automation that is based on software robots (bots) or artificial intelligence (AI) agents. RPA should not be confused with artificial intelligence, as it is based on automation technology following a predefined workflow. It is sometimes referred to as software robotics.
The Collective Knowledge (CK) project is an open-source framework and repository that enables collaborative, reproducible and sustainable research and development of complex computational systems. CK is a small, portable, customizable and decentralized infrastructure helping researchers and practitioners share research artifacts as reusable components and automate, reproduce and reuse experiments.
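A rough sketch of driving CK from Python follows, assuming the ck package is installed and a repository providing a program module has been pulled; CK exposes a single access() entry point that takes and returns dictionaries, but action names and entry fields may differ between CK versions.

```python
import ck.kernel as ck

# Every CK call goes through one access() function that takes and returns a
# dictionary, which keeps the API small and portable across modules.
result = ck.access({"action": "search", "module_uoa": "program"})

if result["return"] > 0:
    # CK signals errors through the 'return' code and an 'error' message.
    raise RuntimeError(result.get("error", "unknown CK error"))

# List the names of program entries found in the local CK repositories.
for entry in result.get("lst", []):
    print(entry.get("data_uoa"))
```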
Grigori Fursin is a British computer scientist, president of the non-profit cTuning Foundation, founding member of MLCommons, and co-chair of the MLCommons Task Force on Automation and Reproducibility. His research group created MILEPOST GCC, an open-source machine-learning-based self-optimizing compiler considered to be the first of its kind. At the end of the MILEPOST project he established the cTuning Foundation to crowdsource program optimization and machine learning across diverse devices provided by volunteers. His foundation also developed the Collective Knowledge Framework and Collective Mind to support open research. Since 2015, Fursin has led Artifact Evaluation at several ACM and IEEE computer systems conferences. He is also a founding member of the ACM Task Force on Data, Software, and Reproducibility in Publication.
Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. It is the combination of automation and ML.
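The sketch below automates just one of these tasks, hyperparameter tuning, using scikit-learn's GridSearchCV on a toy dataset; full AutoML systems additionally automate preprocessing, feature engineering and model selection. The parameter grid is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Automate model tuning: exhaustively evaluate a small grid of hyperparameters
# with cross-validation and keep the best configuration.
X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50, 100], "max_depth": [2, 4, None]},
    cv=3,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
```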
MLOps or ML Ops is a paradigm that aims to deploy and maintain machine learning models in production reliably and efficiently. The term is a compound of "machine learning" and the continuous integration and delivery practice (CI/CD) of DevOps in the software field. Machine learning models are tested and developed in isolated experimental systems. When an algorithm is ready to be launched, MLOps is practiced between data scientists, DevOps engineers, and machine learning engineers to transition the algorithm to production systems. Similar to DevOps or DataOps approaches, MLOps seeks to increase automation and improve the quality of production models, while also focusing on business and regulatory requirements. While MLOps started as a set of best practices, it is slowly evolving into an independent approach to ML lifecycle management. MLOps applies to the entire lifecycle, from integrating with model generation, orchestration, and deployment, to health, diagnostics, governance, and business metrics.
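A minimal sketch of one MLOps practice is shown below: a promotion gate that only registers a candidate model when it beats the currently recorded metric. The local file registry, dataset and threshold logic are stand-ins for a real model registry, CI/CD pipeline and monitoring.

```python
import json
import pathlib
import pickle

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# The "registry" here is just a local directory holding the model artifact
# and its evaluation metric; real setups use dedicated model registries.
REGISTRY = pathlib.Path("model_registry")
REGISTRY.mkdir(exist_ok=True)
METRICS_FILE = REGISTRY / "metrics.json"

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidate = LogisticRegression(max_iter=5000).fit(X_train, y_train)
candidate_score = candidate.score(X_test, y_test)

current_score = (
    json.loads(METRICS_FILE.read_text())["accuracy"] if METRICS_FILE.exists() else 0.0
)

if candidate_score > current_score:
    # Promote: store the model artifact and its evaluation metric together.
    (REGISTRY / "model.pkl").write_bytes(pickle.dumps(candidate))
    METRICS_FILE.write_text(json.dumps({"accuracy": candidate_score}))
    print(f"promoted model with accuracy {candidate_score:.3f}")
else:
    print(f"kept existing model ({current_score:.3f} >= {candidate_score:.3f})")
```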
ModelOps, as defined by Gartner, "is focused primarily on the governance and lifecycle management of a wide range of operationalized artificial intelligence (AI) and decision models, including machine learning, knowledge graphs, rules, optimization, linguistic and agent-based models" in multi-agent systems. "ModelOps lies at the heart of any enterprise AI strategy". It orchestrates the model lifecycles of all models in production across the entire enterprise, from putting a model into production to evaluating and updating the resulting application according to a set of governance rules, including both technical and business key performance indicators (KPIs). It grants business domain experts the capability to evaluate AI models in production, independent of data scientists.
Data version control is a method of working with data sets. It is similar to the version control systems used in traditional software development, but is optimized to allow better processing of data and collaboration in the context of data analytics, research, and any other form of data analysis. Data version control may also include specific features and configurations designed to facilitate work with large data sets and data lakes.
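An illustrative sketch of one underlying idea follows: content-addressed snapshots of a data file, where each version is stored under the hash of its contents so identical data is never duplicated. Real tools such as DVC apply this idea at scale on top of Git and remote storage; the file names here are invented.

```python
import hashlib
import pathlib
import shutil

# Store each snapshot of a data file under the SHA-256 hash of its contents,
# so any version can be retrieved later by its hash.
STORE = pathlib.Path(".data_store")
STORE.mkdir(exist_ok=True)

def commit(path: pathlib.Path) -> str:
    """Snapshot a data file and return the hash identifying this version."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    target = STORE / digest
    if not target.exists():          # identical content is stored only once
        shutil.copyfile(path, target)
    return digest

def checkout(digest: str, destination: pathlib.Path) -> None:
    """Restore a previously committed version of the data."""
    shutil.copyfile(STORE / digest, destination)

data = pathlib.Path("dataset.csv")
data.write_text("id,label\n1,cat\n2,dog\n")
v1 = commit(data)

data.write_text("id,label\n1,cat\n2,dog\n3,bird\n")  # the dataset changes
v2 = commit(data)

checkout(v1, data)  # roll back to the first version by its hash
print("restored version", v1[:12])
```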
Auto-WEKA is an automated machine learning system based on Weka by Chris Thornton, Frank Hutter, Holger H. Hoos and Kevin Leyton-Brown. An extended version was published as Auto-WEKA 2.0. Auto-WEKA was named the first prominent AutoML system in a neutral comparison study.