Scikit-learn

scikit-learn
Original author(s)	David Cournapeau
Initial release	June 2007
Stable release	1.6.1 / 10 January 2025
Repository	github.com/scikit-learn/scikit-learn
Written in	Python, Cython, C and C++
Operating system	Linux, macOS, Windows
Type	Library for machine learning
License	New BSD License
Website	scikit-learn.org

Last updated April 04, 2025

scikit-learn (formerly scikits.learn and also known as sklearn) is a free and open-source machine learning library for the Python programming language.^[3] It features various classification, regression and clustering algorithms, including support-vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. scikit-learn is a fiscally sponsored project of NumFOCUS.^[4]

Overview

The scikit-learn project started as scikits.learn, a Google Summer of Code project by French data scientist David Cournapeau. The name stems from the notion that it is a "SciKit" (SciPy Toolkit), a separately developed and distributed third-party extension to SciPy.^[5] The original codebase was later rewritten by other developers. In 2010, contributors Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort and Vincent Michel, from the French Institute for Research in Computer Science and Automation in Saclay, France, took leadership of the project and released the first public version of the library on February 1, 2010.^[6] In November 2012, scikit-learn and scikit-image were described as two of the "well-maintained and popular" scikits libraries.^[7] In 2019, scikit-learn was noted as one of the most popular machine learning libraries on GitHub.^[8]

Features

Large catalogue of well-established machine learning algorithms and data pre-processing methods (i.e. feature engineering)
Utility methods for common data-science tasks, such as splitting data into train and test sets, cross-validation and grid search
Consistent interface for training and applying machine learning models (estimator.fit() and estimator.predict()), which third-party libraries can also implement
Declarative way of structuring a data science process (the Pipeline), including data pre-processing and model fitting
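
The features listed above can be combined in a few lines. The following is a minimal sketch, not a recommended workflow: the choice of the bundled iris dataset, the step names "scale" and "clf", and the candidate values for C are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small example dataset shipped with scikit-learn (illustrative choice).
X, y = load_iris(return_X_y=True)

# Utility method: split the data into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Pipeline: pre-processing and model fitting declared as named steps.
# The step names "scale" and "clf" are arbitrary labels chosen here.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Cross-validation: five accuracy scores, one per fold of the training set.
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)

# Grid search: try several values of the classifier's C parameter.
# "clf__C" addresses parameter C of the step named "clf".
search = GridSearchCV(pipeline, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# The fitted search object is itself an estimator with predict() and score().
test_accuracy = search.score(X_test, y_test)
print(cv_scores.mean(), search.best_params_, test_accuracy)
```

Because the pipeline and the grid search both expose fit() and predict(), they can be used wherever a single estimator is expected, which is the point of the consistent interface.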

Examples

Fitting a random forest classifier:

>>> from sklearn.ensemble import RandomForestClassifier
>>> classifier = RandomForestClassifier(random_state=0)
>>> X = [[1, 2, 3],  # 2 samples, 3 features
...      [11, 12, 13]]
>>> y = [0, 1]  # classes of each sample
>>> classifier.fit(X, y)
RandomForestClassifier(random_state=0)
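
Once fitted, the classifier can be applied to new data. The sketch below repeats the fitting step so that it runs on its own; the values in new_samples are hypothetical and are not part of the original example.

```python
from sklearn.ensemble import RandomForestClassifier

# Same toy data as the fitting example: 2 samples, 3 features.
X = [[1, 2, 3],
     [11, 12, 13]]
y = [0, 1]  # classes of each sample

classifier = RandomForestClassifier(random_state=0)
classifier.fit(X, y)

# Hypothetical unseen samples, each close to one of the training points.
new_samples = [[4, 5, 6],
               [14, 15, 16]]

# predict() returns one class label per sample.
predictions = classifier.predict(new_samples)

# predict_proba() returns one row per sample and one column per class.
probabilities = classifier.predict_proba(new_samples)

print(predictions)
print(probabilities.shape)
```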

Implementation

scikit-learn is largely written in Python, and uses NumPy extensively for high-performance linear algebra and array operations. Furthermore, some core algorithms are written in Cython to improve performance. Support vector machines are implemented by a Cython wrapper around LIBSVM; logistic regression and linear support vector machines by a similar wrapper around LIBLINEAR. In such cases, extending these methods with Python may not be possible.
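
From the user's side, the wrapped libraries are reached through ordinary estimator classes. The following sketch assumes the mapping given in the scikit-learn documentation (SVC built on LIBSVM, LinearSVC and the "liblinear" solver of LogisticRegression built on LIBLINEAR); the synthetic dataset and the dictionary labels are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC

# Synthetic binary classification problem (illustrative data only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# All three estimators share the same fit()/score() interface,
# regardless of which compiled library does the work underneath.
models = {
    "SVC (LIBSVM)": SVC(kernel="rbf"),
    "LinearSVC (LIBLINEAR)": LinearSVC(),
    "LogisticRegression (LIBLINEAR solver)": LogisticRegression(solver="liblinear"),
}

scores = {}
for name, model in models.items():
    model.fit(X, y)
    # Accuracy on the training data, used here only as a sanity check.
    scores[name] = model.score(X, y)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```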

scikit-learn integrates well with many other Python libraries, such as Matplotlib and Plotly for plotting, NumPy for array vectorization, pandas DataFrames and SciPy.
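
As a small illustration of this interoperability, the sketch below fits the same model on a NumPy array and on a SciPy sparse matrix. The random data and the labelling rule are assumptions made for the example.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression

# Random dense feature matrix: 100 samples, 4 features (illustrative data).
rng = np.random.default_rng(0)
X_dense = rng.normal(size=(100, 4))

# Hypothetical labelling rule: class 1 when the first two features sum to > 0.
y = (X_dense[:, 0] + X_dense[:, 1] > 0).astype(int)

# The same data stored as a SciPy compressed sparse row matrix.
X_sparse = sparse.csr_matrix(X_dense)

# Estimators accept either representation as input to fit().
dense_model = LogisticRegression().fit(X_dense, y)
sparse_model = LogisticRegression().fit(X_sparse, y)

# Predictions come back as a NumPy array with one label per sample.
predictions = dense_model.predict(X_dense)
print(type(predictions), predictions.shape)
print(sparse_model.score(X_sparse, y))
```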

Version history

scikit-learn was initially developed by David Cournapeau as a Google Summer of Code project in 2007. Later that year, Matthieu Brucher joined the project and started to use it as a part of his thesis work. In 2010, INRIA, the French Institute for Research in Computer Science and Automation, got involved and the first public release (v0.1 beta) was published in late January 2010.

August 2013. scikit-learn 0.14^[9]
July 2014. scikit-learn 0.15.0^[9]
March 2015. scikit-learn 0.16.0^[9]
November 2015. scikit-learn 0.17.0^[9]
September 2016. scikit-learn 0.18.0
July 2017. scikit-learn 0.19.0
September 2018. scikit-learn 0.20.0^[10]
May 2019. scikit-learn 0.21.0^[11]
December 2019. scikit-learn 0.22^[12]
May 2020. scikit-learn 0.23.0^[13]
January 2021. scikit-learn 0.24^[14]
September 2021. scikit-learn 1.0.0^[15]
October 2021. scikit-learn 1.0.1^[16]
December 2021. scikit-learn 1.0.2^[17]
May 2022. scikit-learn 1.1.0^[18]
May 2022. scikit-learn 1.1.1^[19]
August 2022. scikit-learn 1.1.2^[20]
October 2022. scikit-learn 1.1.3^[21]
December 2022. scikit-learn 1.2.0^[22]
January 2023. scikit-learn 1.2.1^[23]
March 2023. scikit-learn 1.2.2^[24]

Awards

2019 Inria-French Academy of Sciences-Dassault Systèmes Innovation Prize^[25]
2022 Open Science Award for Open Source Research Software^[26]

References

[1] "Release 1.6.1". 10 January 2025. Retrieved 29 January 2025.
[2] "The scikit-learn Open Source Project on Open Hub: Languages Page". Open Hub. Retrieved 14 July 2018.
[3] Fabian Pedregosa; Gaël Varoquaux; Alexandre Gramfort; Vincent Michel; Bertrand Thirion; Olivier Grisel; Mathieu Blondel; Peter Prettenhofer; Ron Weiss; Vincent Dubourg; Jake Vanderplas; Alexandre Passos; David Cournapeau; Matthieu Perrot; Édouard Duchesnay (2011). "scikit-learn: Machine Learning in Python". Journal of Machine Learning Research. 12: 2825–2830.
[4] "NumFOCUS Sponsored Projects". NumFOCUS. Retrieved 2021-10-25.
[5] Dreijer, Janto. "scikit-learn".
[6] "About us — scikit-learn 0.20.1 documentation". scikit-learn.org.
[7] Eli Bressert (2012). SciPy and NumPy: an overview for developers. O'Reilly. p. 43. ISBN 978-1-4493-6162-4.
[8] "The State of the Octoverse: machine learning". The GitHub Blog. GitHub. 2019-01-24. Retrieved 2019-10-17.
[9] "Release history — scikit-learn 0.19.dev0 documentation". scikit-learn.org. Retrieved 2017-02-27.
[10] "Release History - 0.20.0 documentation". scikit-learn. Retrieved 6 November 2018.
[11] "Release History - 0.21.0 documentation". scikit-learn. Retrieved 5 May 2019.
[12] "Release History - 0.22 documentation". scikit-learn. Retrieved 7 June 2020.
[13] "Release History - 0.23.0 documentation". scikit-learn. Retrieved 7 June 2020.
[14] "Release History - 0.24 documentation". scikit-learn. Retrieved 2021-02-08.
[15] "Release History - 1.0.0 documentation". scikit-learn.
[16] "Release History - 1.0.1 documentation". scikit-learn.
[17] "Release History - 1.0.2 documentation". scikit-learn.
[18] "Release History - 1.1.0 documentation". scikit-learn.
[19] "Release History - 1.1.1 documentation". scikit-learn.
[20] "Release History - 1.1.2 documentation". scikit-learn.
[21] "Release History - 1.1.3 documentation". scikit-learn.
[22] "Release History - 1.2.0 documentation". scikit-learn.
[23] "Release History - 1.2.1 documentation". scikit-learn.
[24] "Release History - 1.2.2 documentation". scikit-learn.
[25] "The 2019 Inria-French Academy of Sciences-Dassault Systèmes Innovation Prize: scikit-learn, a success story for machine learning free software | Inria". www.inria.fr. Retrieved 2025-03-19.
[26] Badolato, Anne-Marie (2022-02-07). "Open Science Awards for Open Source Research Software". Ouvrir la Science. Retrieved 2025-03-19.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
