RapidMiner

Developer(s) RapidMiner
Initial release 2006; 18 years ago (2006)
Stable release 10.1 / 31 January 2023; 14 months ago (2023-01-31)
Operating system Cross-platform
Type Data science, machine learning, predictive analytics
License Professional and Enterprise Editions are Proprietary; Free Edition (10,000 rows and 1 logical processor limit) is available as AGPL
Website rapidminer.com

RapidMiner is a data science platform that analyses the collective impact of an organization's data. It was acquired by Altair Engineering in September 2022. [1]

History

RapidMiner, formerly known as YALE (Yet Another Learning Environment), was developed starting in 2001 by Ralf Klinkenberg, Ingo Mierswa, and Simon Fischer at the Artificial Intelligence Unit of the Technical University of Dortmund. [2] Starting in 2006, its development was driven by Rapid-I, a company founded by Ingo Mierswa and Ralf Klinkenberg in the same year. [3] In 2013, the company rebranded from Rapid-I to RapidMiner. [4]

Description

RapidMiner uses a client/server model with the server offered either on-premises or in public or private cloud infrastructures.

RapidMiner provides data mining and machine learning procedures including data loading and transformation (ETL), data preprocessing and visualization, predictive analytics and statistical modeling, evaluation, and deployment. It is written in the Java programming language and provides a GUI to design and execute analytical workflows. These workflows are called “Processes” in RapidMiner, and each consists of multiple “Operators”. Each operator performs a single task within the process, and the output of each operator forms the input of the next one. Alternatively, the engine can be called from other programs or used as an API, and individual functions can be called from the command line. RapidMiner provides a variety of learning schemes, models, and algorithms that can be extended using R and Python scripts. [5]
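The process-and-operator idea described above can be illustrated with a minimal Python sketch. This is not RapidMiner's API: the `Operator` class, the `run_process` function, and the three example operators are hypothetical names invented here, and the sketch only assumes that a process is an ordered chain of single-task steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Rows = List[Dict]  # a tiny stand-in for a data table


@dataclass
class Operator:
    """Hypothetical operator: a named step that performs one task."""
    name: str
    apply: Callable[[Rows], Rows]


def run_process(operators: List[Operator], data: Rows) -> Rows:
    """Run the operators in order; each output becomes the next input."""
    for op in operators:
        data = op.apply(data)
    return data


# Three illustrative single-task operators (assumed, not real RapidMiner operators).
drop_missing = Operator(
    "Drop Missing", lambda rows: [r for r in rows if r["age"] is not None]
)
add_flag = Operator(
    "Add Flag", lambda rows: [{**r, "adult": r["age"] >= 18} for r in rows]
)
keep_adults = Operator(
    "Keep Adults", lambda rows: [r for r in rows if r["adult"]]
)

rows = [{"age": 34}, {"age": None}, {"age": 12}]
result = run_process([drop_missing, add_flag, keep_adults], rows)
print(result)
```

Keeping each operator to one task means a process can be rearranged or extended by editing the list passed to `run_process`, which loosely mirrors how operators are connected in a visual workflow designer.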

RapidMiner can also use plugins available through the RapidMiner Marketplace. The RapidMiner Marketplace is a platform for developers to create data analysis algorithms and publish them to the community. [6]

The RapidMiner Studio Free Edition, which is limited to one logical processor and 10,000 data rows, is available under the AGPL license. [7]

Adoption

In 2019, Gartner placed RapidMiner in the leader quadrant of its Magic Quadrant for Data Science & Machine Learning Platforms for the sixth year in a row. [8] The report noted that RapidMiner provides deep and broad modeling capabilities for automated end-to-end model development. In the 2018 KDnuggets annual software poll, readers ranked RapidMiner among the most popular data analytics tools they use. [9] RapidMiner has received millions of downloads and has over 400,000 users; its paying customers include BMW, Intel, Cisco, GE, and Samsung. RapidMiner claims to be the market leader in data science platform software, ahead of competitors such as SAS and IBM. [10]

Development

About 50 developers worldwide participated in the development of the open-source RapidMiner, with the majority of the contributors being employees of RapidMiner. [11] The company that develops RapidMiner received $16 million in Series C funding, with participation from venture capital firms Nokia Growth Partners, Ascent Venture Partners, Longworth Venture Partners, Earlybird Venture Capital and Open-Ocean. Open-Ocean partner Michael "Monty" Widenius is one of the founders of MySQL. [12]

References

  1. Altair. "Altair Announces Completion of Acquisition of RapidMiner". www.prnewswire.com. Retrieved 2022-10-01.
  2. Guido Deutsch, “RapidMiner from Rapid-I at CeBIT 2010,” Data Mining Blog, March 18, 2010. Archived 2020-01-24 at the Wayback Machine.
  3. “Interview with RapidMiner's Ingo Mierswa, Ralf Klinkenberg,” KDnuggets, February 2010.
  4. “German Predictive Analytics Startup Rapid-I Rebrands As RapidMiner,” TechCrunch, November 4, 2013.
  5. David Norris, “RapidMiner - a potential game changer,” Bloor Research, November 13, 2013.
  6. Ajay Ohri, “Interview with Rapid-I Ingo Mierswa and Simon Fischer,” KDnuggets, August 2011.
  7. RapidMiner Embraces its Community and Open Source Culture Delivering Get-More-Open-Core Predictive Analytics, September 1, 2015.
  8. "Gartner Magic Quadrant for Data Science and Machine Learning Platforms". Gartner. Retrieved 25 October 2020.
  9. "Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018: Trends and Analysis". www.kdnuggets.com. Retrieved 2018-10-05.
  10. Ingrid Lunden, “German Predictive Analytics Startup Rapid-I Rebrands As RapidMiner, Takes $5M From Open Ocean, Earlybird To Tackle The U.S. Market,” TechCrunch, November 4, 2013.
  11. Evan Quinn, “Is Rapid-I the Hidden Giant of Analytics?,” QuinnSight Research, June 17, 2013.
  12. "Five Questions With Michael Widenius - Founder And Original Developer Of MySQL : OpenSource Release Feed". 2009-03-13. Archived from the original on 2009-03-13. Retrieved 2023-10-22.