RapidMiner

Developer(s) RapidMiner
Initial release 2006
Stable release 10.1 / 31 January 2023
Operating system Cross-platform
Type Data science, machine learning, predictive analytics
License Professional and Enterprise Editions are Proprietary; Free Edition (10,000 rows and 1 logical processor limit) is available as AGPL
Website rapidminer.com

RapidMiner is a data science platform that analyses the collective impact of an organization's data. It was acquired by Altair Engineering in September 2022. [1]

History

RapidMiner, formerly known as YALE (Yet Another Learning Environment), was developed by Ralf Klinkenberg, Ingo Mierswa, and Simon Fischer in 2001 at the Artificial Intelligence Unit of the Technical University of Dortmund. [2] Starting in 2006, its development was driven by Rapid-I, a company founded by Ingo Mierswa and Ralf Klinkenberg in the same year. [3] In 2013, the company rebranded from Rapid-I to RapidMiner. [4]

Description

RapidMiner uses a client/server model with the server offered either on-premises or in public or private cloud infrastructures.

RapidMiner provides data mining and machine learning procedures including data loading and transformation (ETL), data preprocessing and visualization, predictive analytics and statistical modeling, evaluation, and deployment. It is written in the Java programming language and provides a GUI to design and execute analytical workflows. These workflows are called “Processes” in RapidMiner, and they consist of multiple “Operators”. Each operator performs a single task within the process, and the output of each operator forms the input of the next one. Alternatively, the engine can be called from other programs or used as an API, and individual functions can be called from the command line. RapidMiner provides a variety of learning schemes, models, and algorithms that can be extended using R and Python scripts. [5]
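The process-and-operator idea can be illustrated with a minimal Python sketch. This is not RapidMiner's API: the names `drop_missing`, `min_max_scale`, and `run_process` are hypothetical and exist only to show how each operator's output becomes the input of the next.

```python
from typing import Callable, Dict, List, Optional

# Illustrative types only; these are assumptions, not RapidMiner classes.
Row = Dict[str, Optional[float]]
Operator = Callable[[List[Row]], List[Row]]


def drop_missing(rows: List[Row]) -> List[Row]:
    """Hypothetical operator: remove rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def min_max_scale(rows: List[Row]) -> List[Row]:
    """Hypothetical operator: rescale column 'x' to the range [0, 1]."""
    if not rows:
        return []
    values = [r["x"] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [{**r, "x": (r["x"] - lo) / span} for r in rows]


def run_process(operators: List[Operator], rows: List[Row]) -> List[Row]:
    """Run operators in order, feeding each output into the next operator."""
    for operator in operators:
        rows = operator(rows)
    return rows


data = [{"x": 2.0}, {"x": None}, {"x": 6.0}, {"x": 4.0}]
result = run_process([drop_missing, min_max_scale], data)
```

Here the "process" is simply an ordered list of operators, so reordering or swapping steps changes the workflow without touching the operators themselves.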

RapidMiner can also use plugins available through the RapidMiner Marketplace. The RapidMiner Marketplace is a platform for developers to create data analysis algorithms and publish them to the community. [6]

The RapidMiner Studio Free Edition, which is limited to one logical processor and 10,000 data rows, is available under the AGPL license. [7]

Adoption

In 2019, Gartner placed RapidMiner in the leader quadrant of its Magic Quadrant for Data Science and Machine Learning Platforms for the sixth year in a row. [8] The report noted that RapidMiner provides deep and broad modeling capabilities for automated end-to-end model development. In the 2018 KDnuggets annual software poll, respondents named RapidMiner among the most popular data analytics tools they use. [9] RapidMiner has received millions of downloads and has over 400,000 users; its paying customers include BMW, Intel, Cisco, GE, and Samsung. RapidMiner claims to be the market leader in data science platform software against competitors such as SAS and IBM. [10]

Development

About 50 developers worldwide participated in the development of the open-source RapidMiner, with the majority of contributors being employees of RapidMiner. [11] The company that develops RapidMiner received $16 million in Series C funding with participation from venture capital firms Nokia Growth Partners, Ascent Venture Partners, Longworth Venture Partners, Earlybird Venture Capital, and Open-Ocean. Open-Ocean partner Michael "Monty" Widenius is one of the founders of MySQL. [12]

References

  1. Altair. "Altair Announces Completion of Acquisition of RapidMiner". www.prnewswire.com. Retrieved 2022-10-01.
  2. Guido Deutsch, “RapidMiner from Rapid-I at CeBIT 2010,” Data Mining Blog, March 18, 2010. Archived 2020-01-24 at the Wayback Machine.
  3. “Interview with RapidMiner's Ingo Mierswa, Ralf Klinkenberg,” KDnuggets, February 2010.
  4. “German Predictive Analytics Startup Rapid-I Rebrands As RapidMiner,” TechCrunch, November 4, 2013.
  5. David Norris, “RapidMiner - a potential game changer,” Bloor Research, November 13, 2013.
  6. Ajay Ohri, “Interview with Rapid-I Ingo Mierswa and Simon Fischer,” KDnuggets, August 2011.
  7. “RapidMiner Embraces its Community and Open Source Culture Delivering Get-More-Open-Core Predictive Analytics,” September 1, 2015.
  8. "Gartner Magic Quadrant for Data Science and Machine Learning Platforms". Gartner. Retrieved 25 October 2020.
  9. "Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018: Trends and Analysis". www.kdnuggets.com. Retrieved 2018-10-05.
  10. Ingrid Lunden, “German Predictive Analytics Startup Rapid-I Rebrands As RapidMiner, Takes $5M From Open Ocean, Earlybird To Tackle The U.S. Market,” TechCrunch, November 4, 2013.
  11. Evan Quinn, “Is Rapid-I the Hidden Giant of Analytics?,” QuinnSight Research, June 17, 2013.
  12. "Five Questions With Michael Widenius - Founder And Original Developer Of MySQL : OpenSource Release Feed". 2009-03-13. Archived from the original on 2009-03-13. Retrieved 2023-10-22.