Original author(s) | Microsoft |
---|---|
Initial release | 2017 |
Written in | Python |
Platform | Windows, Linux |
Available in | Python |
Website | docs |
revoscalepy is a machine learning package in Python created by Microsoft. It is available as part of Machine Learning Services in Microsoft SQL Server 2017 and Machine Learning Server 9.2.0 and later. [1]
The package contains functions for creating linear model, logistic regression, random forest, decision tree and boosted decision tree, in addition to some summary functions for inspecting data. [2] Other machine learning algorithms such as neural network are provided in microsoftml, a separate package that is the Python version of MicrosoftML. [3]
revoscalepy also contains functions designed to run machine learning algorithms in different compute contexts, including SQL Server, Apache Spark, and Hadoop. [2]
In June 2021, Microsoft announced to open source the revoscalepy and RevoScaleR packages, making them freely available under the MIT License. [4]
MySQL is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter, and "SQL", the abbreviation for Structured Query Language. A relational database organizes data into one or more data tables in which data types may be related to each other; these relations help structure the data. SQL is a language programmers use to create, modify and extract data from the relational database, as well as control user access to the database. In addition to relational databases and SQL, an RDBMS like MySQL works with an operating system to implement a relational database in a computer's storage system, manages users, allows for network access and facilitates testing database integrity and creation of backups.
Db2 is a family of data management products, including database servers, developed by IBM. They initially supported the relational model, but were extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB/2, then DB2 until 2017 and finally changed to its present form.
R is a programming language and free software environment for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, data mining surveys, and studies of scholarly literature databases show substantial increases in R's popularity; as of July 2021, R ranks 12th in the TIOBE index, a measure of popularity of programming languages.
The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.
A pivot table is a table of grouped values that aggregates the individual items of a more extensive table within one or more discrete categories. This summary might include sums, averages, or other statistics, which the pivot table groups together using a chosen aggregation function applied to the grouped values.
SQL Server Integration Services (SSIS) is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data migration tasks.
Waikato Environment for Knowledge Analysis (Weka), developed at the University of Waikato, New Zealand, is free software licensed under the GNU General Public License, and the companion software to the book "Data Mining: Practical Machine Learning Tools and Techniques".
RapidMiner is a data science software platform developed by the company of the same name that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. It is used for business and commercial applications as well as for research, education, training, rapid prototyping, and application development and supports all steps of the machine learning process including data preparation, results visualization, model validation and optimization. RapidMiner is developed on an open core model.
Oracle Data Mining (ODM) is an option of Oracle Database Enterprise Edition. It contains several data mining and data analysis algorithms for classification, prediction, regression, associations, feature selection, anomaly detection, feature extraction, and specialized analytics. It provides means for the creation, management and operational deployment of data mining models inside the database environment.
Microsoft SQL Server is a relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.
An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is "embedded in the application". It is actually a broad technology category that includes
VHD and its successor VHDx are file formats representing a virtual hard disk drive (HDD). They may contain what is found on a physical HDD, such as disk partitions and a file system, which in turn can contain files and folders. They are typically used as the hard disk of a virtual machine, are built into modern versions of Windows, and are the native file format for Microsoft's hypervisor, Hyper-V.
Microsoft Azure, commonly referred to as Azure, is a cloud computing service created by Microsoft for building, testing, deploying, and managing applications and services through Microsoft-managed data centers. It provides software as a service (SaaS), platform as a service (PaaS) and infrastructure as a service (IaaS) and supports many different programming languages, tools, and frameworks, including both Microsoft-specific and third-party software and systems.
SOFA Statistics is an open-source statistical package. The name stands for Statistics Open For All. It has a graphical user interface and can connect directly to MySQL, PostgreSQL, SQLite, MS Access (mdb), and Microsoft SQL Server. Data can also be imported from CSV and Tab-Separated files or spreadsheets. The main statistical tests available are Independent and Paired t-tests, Wilcoxon signed ranks, Mann–Whitney U, Pearson's chi squared, Kruskal Wallis H, one-way ANOVA, Spearman's R, and Pearson's R. Nested tables can be produced with row and column percentages, totals, sd, mean, median, lower and upper quartiles, and sum.
RStudio is an Integrated Development Environment (IDE) for R, a programming language for statistical computing and graphics. It is available in two formats: RStudio Desktop is a regular desktop application while RStudio Server runs on a remote server and allows accessing RStudio using a web browser.
Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.
Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, file storage, and YouTube. Alongside a set of management tools, it provides a series of modular cloud services including computing, data storage, data analytics and machine learning. Registration requires a credit card or bank account details.
The history of Microsoft SQL Server begins with the first Microsoft SQL Server database product - SQL Server v1.0, a 16-bit Relational Database for the OS/2 operating system, released in 1989.
RevoScaleR is a machine learning package in R created by Microsoft. It is available as part of Machine Learning Server, Microsoft R Client, and Machine Learning Services in Microsoft SQL Server 2016.
LightGBM, short for Light Gradient Boosting Machine, is a free and open source distributed gradient boosting framework for machine learning originally developed by Microsoft. It is based on decision tree algorithms and used for ranking, classification and other machine learning tasks. The development focus is on performance and scalability.