Scikit-learn

Original author: David Cournapeau
Developer: Google Summer of Code project
Initial release: June 2007
Stable release: 1.8.0 [1] / 10 December 2025
Written in: Python, Cython, C and C++ [2]
Operating system: Linux, macOS, Windows
Type: Library for machine learning
License: New BSD License
Website: scikit-learn.org

scikit-learn (formerly scikits.learn and also known as sklearn) is a free and open-source machine learning library for the Python programming language. [3] It features various classification, regression and clustering algorithms including support-vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Scikit-learn is a NumFOCUS fiscally sponsored project. [4]

Overview

The scikit-learn project started as scikits.learn, a Google Summer of Code project by French data scientist David Cournapeau. The name of the project derives from its role as a "scientific toolkit for machine learning", originally developed and distributed as a third-party extension to SciPy. [5] The original codebase was later rewritten by other developers. In 2010, contributors Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort and Vincent Michel, from the French Institute for Research in Computer Science and Automation in Saclay, France, took leadership of the project and released the first public version of the library on February 1, 2010. [6]

In November 2012, scikit-learn and scikit-image were described as two of the "well-maintained and popular" scikits libraries. [7] In 2019, it was noted that scikit-learn is one of the most popular machine learning libraries on GitHub. [8] At that time, the project had over 1,400 contributors, and the documentation had received 42 million visits in 2018. [9] According to a 2022 Kaggle survey of nearly 24,000 respondents from 173 countries, scikit-learn was the most widely used machine learning framework. [10]

Features

Examples

Fitting a random forest classifier:

>>> from sklearn.ensemble import RandomForestClassifier
>>> classifier = RandomForestClassifier(random_state=0)
>>> X = [[ 1,  2,  3],  # 2 samples, 3 features
...      [11, 12, 13]]
>>> y = [0, 1]  # classes of each sample
>>> classifier.fit(X, y)
RandomForestClassifier(random_state=0)
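Clustering estimators follow the same fit-based pattern. The following is a minimal sketch, not taken from the library's documentation, that applies k-means and DBSCAN (both named in the introduction) to a small toy dataset. The data points and the `eps` and `min_samples` values are assumptions chosen by hand for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Two well-separated groups of 2-D points (toy data, chosen for illustration).
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
])

# k-means needs the number of clusters to be given up front.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
kmeans_labels = kmeans.labels_

# DBSCAN derives the clusters from point density instead; eps and
# min_samples are hand-picked for this toy data.
dbscan = DBSCAN(eps=0.5, min_samples=2).fit(X)
dbscan_labels = dbscan.labels_

print("k-means labels:", kmeans_labels)
print("DBSCAN labels: ", dbscan_labels)
```

The integer cluster labels are arbitrary identifiers, so only the grouping of samples is meaningful, not which group is called 0 or 1.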

Implementation

scikit-learn is largely written in Python, and uses NumPy extensively for high-performance linear algebra and array operations. Furthermore, some core algorithms are written in Cython to improve performance. Support vector machines are implemented by a Cython wrapper around LIBSVM; logistic regression and linear support vector machines by a similar wrapper around LIBLINEAR. In such cases, extending these methods with Python may not be possible.
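As an illustrative sketch, the wrapped solvers are reached through the ordinary estimator classes: `SVC` is backed by LIBSVM, `LinearSVC` by LIBLINEAR, and `LogisticRegression` uses LIBLINEAR when `solver="liblinear"` is requested. The toy data below is an assumption made up for this example.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC

# Tiny, linearly separable toy dataset (made up for illustration).
X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
y = [0, 0, 1, 1]

# Kernel support vector machine; the fitting is done by the LIBSVM wrapper.
svc = SVC(kernel="rbf").fit(X, y)

# Linear support vector machine; the fitting is done by the LIBLINEAR wrapper.
linear_svc = LinearSVC(random_state=0).fit(X, y)

# Logistic regression uses LIBLINEAR only when this solver is selected.
logreg = LogisticRegression(solver="liblinear").fit(X, y)

new_points = [[0.0, 0.0], [1.0, 1.0]]
for model in (svc, linear_svc, logreg):
    print(type(model).__name__, model.predict(new_points))
```

All three models expose the same `fit` and `predict` methods, so the compiled backend stays hidden behind the Python interface.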

scikit-learn integrates with many other Python libraries, such as Matplotlib and Plotly for plotting, NumPy for array vectorization, pandas for dataframes, and SciPy.

History

scikit-learn was initially developed by David Cournapeau as a Google Summer of Code project in 2007. Later that year, Matthieu Brucher joined the project and started to use it as part of his thesis work. In 2010, Inria, the French Institute for Research in Computer Science and Automation, became involved, and the first public release (v0.1 beta) was published in late January 2010.

The project released its first stable version, 1.0.0, on September 24, 2021. [11] The release was the result of over 2,100 merged pull requests, approximately 800 of which were dedicated to improving documentation. [12] Development continues to focus on bug fixes, efficiency and feature expansion.

The latest version, 1.8, was released on December 10, 2025. [11] This update introduced native Array API support, enabling the library to perform GPU computations by directly using PyTorch and CuPy arrays. This version also included bug fixes, improvements and new features, such as efficiency improvements to the fit time of linear models. [13]

Applications

Scikit-learn is widely used across industries for a variety of machine learning tasks such as classification, regression, clustering, and model selection. The following are real-world applications of the library:

Finance and Insurance

Retail and E-Commerce

Media, Marketing, and Social Platforms

Technology

Academia

Awards

References

  1. "Release 1.8.0". 10 December 2025. Retrieved 11 December 2025.
  2. "The scikit-learn Open Source Project on Open Hub: Languages Page". Open Hub. Retrieved 14 July 2018.
  3. Fabian Pedregosa; Gaël Varoquaux; Alexandre Gramfort; Vincent Michel; Bertrand Thirion; Olivier Grisel; Mathieu Blondel; Peter Prettenhofer; Ron Weiss; Vincent Dubourg; Jake Vanderplas; Alexandre Passos; David Cournapeau; Matthieu Perrot; Édouard Duchesnay (2011). "scikit-learn: Machine Learning in Python". Journal of Machine Learning Research. 12: 2825–2830. arXiv:1201.0490. Bibcode:2011JMLR...12.2825P.
  4. "NumFOCUS Sponsored Projects". NumFOCUS. Retrieved 2021-10-25.
  5. Dreijer, Janto. "scikit-learn". Archived from the original on 2020-11-07. Retrieved 2015-03-04.
  6. "About us — scikit-learn 0.20.1 documentation". scikit-learn.org.
  7. Eli Bressert (2012). SciPy and NumPy: an overview for developers. O'Reilly. p. 43. ISBN 978-1-4493-6162-4.
  8. "The State of the Octoverse: machine learning". The GitHub Blog. GitHub. 2019-01-24. Retrieved 2019-10-17.
  9. "The 2019 Inria-French Academy of Sciences-Dassault Systèmes Innovation Prize: scikit-learn, a success story for machine learning free software". www.inria.fr. Retrieved 2026-01-10.
  10. "Kaggle Machine Learning & Data Science Survey 2022". Kaggle. Retrieved 2026-01-10.
  11. "Release History for scikit-learn". pypi.org. Retrieved 2026-01-10.
  12. "Release Highlights for scikit-learn 1.0". scikit-learn.org. Retrieved 2026-01-10.
  13. "Release Highlights for scikit-learn 1.8". scikit-learn.org. Retrieved 2026-01-10.
  14. "Testimonials". scikit-learn.org. Retrieved 2025-08-06.
  15. "The 2019 Inria-French Academy of Sciences-Dassault Systèmes Innovation Prize: scikit-learn, a success story for machine learning free software | Inria". www.inria.fr. Retrieved 2025-03-19.
  16. Badolato, Anne-Marie (2022-02-07). "Open Science Awards for Open Source Research Software". Ouvrir la Science. Retrieved 2025-03-19.