RapidMiner

Developer(s) RapidMiner
Initial release 2006
Stable release 10.1 / 31 January 2023
Operating system Cross-platform
Type Data science, machine learning, predictive analytics
License Professional and Enterprise Editions are proprietary; Free Edition (limited to 10,000 rows and 1 logical processor) is available under the AGPL
Website rapidminer.com

RapidMiner is a data science platform that analyses the collective impact of an organization's data. It was acquired by Altair Engineering in September 2022. [1]

History

RapidMiner, formerly known as YALE (Yet Another Learning Environment), was developed by Ralf Klinkenberg, Ingo Mierswa, and Simon Fischer in 2001 at the Artificial Intelligence Unit of the Technical University of Dortmund. [2] Starting in 2006, its development was driven by Rapid-I, a company founded by Ingo Mierswa and Ralf Klinkenberg in the same year. [3] In 2013, the company rebranded from Rapid-I to RapidMiner. [4]

Description

RapidMiner uses a client/server model with the server offered either on-premises or in public or private cloud infrastructures.

RapidMiner provides data mining and machine learning procedures including data loading and transformation (ETL), data preprocessing and visualization, predictive analytics and statistical modeling, evaluation, and deployment. It is written in the Java programming language and provides a GUI for designing and executing analytical workflows. These workflows are called “Processes” in RapidMiner, and each consists of multiple “Operators”. Each operator performs a single task within the process, and the output of each operator forms the input of the next one. Alternatively, the engine can be called from other programs or used as an API, and individual functions can be called from the command line. RapidMiner also provides a variety of learning schemes, models, and algorithms that can be extended using R and Python scripts. [5]
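The operator-chaining idea described above can be illustrated with a minimal Python sketch. This is not RapidMiner's API: the names `run_process`, `drop_missing`, and `add_ratio`, and the column names `a` and `b`, are hypothetical and chosen only for illustration. It assumes each operator is a plain function that takes a list of rows and returns a new list of rows.

```python
from typing import Any, Callable, Dict, List

# Hypothetical types for illustration; RapidMiner itself is a Java platform
# and does not expose these names.
Row = Dict[str, Any]
Operator = Callable[[List[Row]], List[Row]]


def drop_missing(rows: List[Row]) -> List[Row]:
    """Preprocessing operator: keep only rows without missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_ratio(rows: List[Row]) -> List[Row]:
    """Transformation operator: add a derived column 'ratio' = a / b."""
    return [{**r, "ratio": r["a"] / r["b"]} for r in rows]


def run_process(operators: List[Operator], data: List[Row]) -> List[Row]:
    """Run operators in order; each operator's output is the next one's input."""
    for op in operators:
        data = op(data)
    return data


if __name__ == "__main__":
    rows = [{"a": 1, "b": 2}, {"a": None, "b": 3}, {"a": 3, "b": 4}]
    print(run_process([drop_missing, add_ratio], rows))
```

Keeping each operator to a single task, as in the sketch, means steps can be reordered or swapped without touching the others, which is the property the process/operator design relies on.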

RapidMiner can also use plugins available through the RapidMiner Marketplace. The RapidMiner Marketplace is a platform for developers to create data analysis algorithms and publish them to the community. [6]

The RapidMiner Studio Free Edition, which is limited to one logical processor and 10,000 data rows, is available under the AGPL license. [7]

Adoption

The report noted that RapidMiner provides deep and broad modeling capabilities for automated end-to-end model development. In the 2018 annual software poll, KDnuggets readers voted RapidMiner one of the most popular data analytics software packages, with the poll’s respondents citing it as a tool they use. [8] RapidMiner has received millions of downloads and has over 400,000 users; its paying customers include BMW, Intel, Cisco, GE, and Samsung. RapidMiner claims to be the market leader in data science platform software, ahead of competitors such as SAS and IBM. [9]

Development

About 50 developers worldwide participated in the development of the open-source RapidMiner, with the majority of the contributors being employees of RapidMiner. [10] The company that develops RapidMiner received $16 million in Series C funding, with participation from venture capital firms Nokia Growth Partners, Ascent Venture Partners, Longworth Venture Partners, Earlybird Venture Capital, and OpenOcean. OpenOcean partner Michael "Monty" Widenius is one of the founders of MySQL. [11]

Components

The RapidMiner data science platform consists of the following main components: RapidMiner Studio, RapidMiner AI Hub, and RapidMiner Go, which can be deployed as part of the AI Hub. [12]

References

  1. Altair. "Altair Announces Completion of Acquisition of RapidMiner". www.prnewswire.com. Retrieved 2022-10-01.
  2. Guido Deutsch, “RapidMiner from Rapid-I at CeBIT 2010,” Data Mining Blog, March 18, 2010. Archived 2020-01-24 at the Wayback Machine.
  3. “Interview with RapidMiner's Ingo Mierswa, Ralf Klinkenberg,” KDnuggets, February 2010.
  4. “German Predictive Analytics Startup Rapid-I Rebrands As RapidMiner,” TechCrunch, November 4, 2013.
  5. David Norris, “RapidMiner - a potential game changer,” Bloor Research, November 13, 2013.
  6. Ajay Ohri, “Interview with Rapid-I Ingo Mierswa and Simon Fischer,” KDnuggets, August 2011.
  7. “RapidMiner Embraces its Community and Open Source Culture Delivering Get-More-Open-Core Predictive Analytics,” September 1, 2015.
  8. "Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018: Trends and Analysis". www.kdnuggets.com. Retrieved 2018-10-05.
  9. Ingrid Lunden, “German Predictive Analytics Startup Rapid-I Rebrands As RapidMiner, Takes $5M From Open Ocean, Earlybird To Tackle The U.S. Market,” TechCrunch, November 4, 2013.
  10. Evan Quinn, “Is Rapid-I the Hidden Giant of Analytics?,” QuinnSight Research, June 17, 2013.
  11. "Five Questions With Michael Widenius - Founder And Original Developer Of MySQL : OpenSource Release Feed". 2009-03-13. Archived from the original on 2009-03-13. Retrieved 2023-10-22.
  12. "Machine Learning and RapidMiner Tutorials | Altair Engineering Inc. Academy". academy.rapidminer.com. Retrieved 2024-08-12.