David Cournapeau

David Cournapeau is a data scientist. He is the original author of the scikit-learn package, an open source machine learning library in the Python programming language. [1] [2]

Early life and education

Cournapeau graduated with an MSc in Electrical Engineering from Télécom ParisTech, Paris, in 2004, and obtained his PhD in Computer Science at Kyoto University, Japan, in the domain of speech recognition. [3]

Career

The scikit-learn project started as scikits.learn, a Google Summer of Code project by Cournapeau. After working for Silveregg, a Japanese SaaS company that delivers recommendation systems to Japanese online retailers, [3] he worked for six years at Enthought, a scientific consulting company. He joined Cogent Labs, a Japanese deep learning and AI company, in 2017. [4] He is a Machine Learning Engineering Manager at Mercari, Inc. [5]

Cournapeau has also been involved in the development of other central numerical Python libraries: NumPy and SciPy. [6] [7]

Related Research Articles

SciPy Open-source Python library for scientific computing

SciPy is a free and open-source Python library used for scientific computing and technical computing.

NumPy Python library for numerical programming

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The predecessor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors. NumPy is a NumFOCUS fiscally sponsored project.

In computer software, a general-purpose programming language (GPL) is a programming language for building software in a wide variety of application domains. Conversely, a domain-specific programming language (DSL) is used within a specific area. For example, Python is a GPL, while SQL is a DSL for querying relational databases.

Cython Programming language

Cython is a superset of the programming language Python, which allows developers to write Python code that yields performance comparable to that of C.

Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is a matrix-free method for finding the largest eigenvalues and the corresponding eigenvectors of a symmetric generalized eigenvalue problem.

scikit-learn Python library for machine learning

scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support-vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Scikit-learn is a NumFOCUS fiscally sponsored project.

Astropy Python language software

Astropy is a collection of software packages written in the Python programming language and designed for use in astronomy. It is a single, free, core package of astronomical utilities, motivated by the increasingly widespread use of Python by astronomers and intended to foster interoperability between the various existing Python astronomy packages. Astropy is included in several large Python distributions; it is part of package managers for Linux and macOS, the Anaconda Python Distribution, Enthought Canopy and Ureka.

scikit-image is an open-source image processing library for the Python programming language. It includes algorithms for segmentation, geometric transformations, color space manipulation, analysis, filtering, morphology, feature detection, and more. It is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

John D. Hunter was an American neurobiologist and the original author of Matplotlib.

In statistics, the graphical lasso is a sparse penalized maximum likelihood estimator for the concentration or precision matrix of a multivariate elliptical distribution. The original variant was formulated to solve Dempster's covariance selection problem for the multivariate Gaussian distribution when observations were limited. Subsequently, the optimization algorithms to solve this problem were improved and extended to other types of estimators and distributions.

Berkeley Institute for Data Science

The Berkeley Institute for Data Science (BIDS) is a central hub of research and education within University of California, Berkeley designed to facilitate data-intensive science and earn grants to be disseminated within the sciences. BIDS was initially funded by grants from the Gordon and Betty Moore Foundation and the Sloan Foundation as part of a three-year grant with data science institutes at New York University and the University of Washington. The objective of the three-university initiative is to bring together domain experts from the natural and social sciences, along with methodological experts from computer science, statistics, and applied mathematics. The organization has an executive director and a faculty director, Saul Perlmutter, who won the 2011 Nobel Prize in Physics. The initiative was announced at a White House Office of Science and Technology Policy event to highlight and promote advances in data-driven scientific discovery, and is a core component of the National Science Foundation's strategic plan for building national capacity in data science.

Travis Oliphant is an American data scientist and businessman. He is a co-founder of NumFOCUS, a 501(c)(3) nonprofit charity in the United States, and sits on its advisory board. He is also a founder of the technology startup Anaconda. In addition, Oliphant is the primary creator of NumPy and a founding contributor to the SciPy package in the Python programming language.

PyMC is a probabilistic programming language written in Python. It can be used for Bayesian statistical modeling and probabilistic machine learning.

Dask (software) Python library for parallel computing

Dask is an open-source Python library for parallel computing. Dask scales Python code from multi-core local machines to large distributed clusters in the cloud. Dask provides a familiar user interface by mirroring the APIs of other libraries in the PyData ecosystem, including pandas, scikit-learn and NumPy. It also exposes low-level APIs that help programmers run custom algorithms in parallel.

scikit-multiflow Machine learning library for data streams in Python

scikit-multiflow is a free and open-source machine learning library for multi-output/multi-label and stream data, written in Python.

CuPy is an open-source library for GPU-accelerated computing with the Python programming language, providing support for multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them. CuPy shares the same API set as NumPy and SciPy, allowing it to be a drop-in replacement to run NumPy/SciPy code on a GPU. CuPy supports the NVIDIA CUDA GPU platform and, starting in v9.0, the AMD ROCm GPU platform.

References

  1. "David Cournapeau - Google Scholar". scholar.google.co.in. Retrieved 2016-01-13.
  2. Fabian Pedregosa; Gaël Varoquaux; Alexandre Gramfort; Vincent Michel; Bertrand Thirion; Olivier Grisel; Mathieu Blondel; Peter Prettenhofer; Ron Weiss; Vincent Dubourg; Jake Vanderplas; Alexandre Passos; David Cournapeau (2011). "Scikit-learn: Machine Learning in Python". Journal of Machine Learning Research. 12: 2825–2830. arXiv:1201.0490. Bibcode:2011JMLR...12.2825P.
  3. "Developers team: Enthought Scientific Computing Solutions". www.enthought.com. Archived from the original on 2016-03-11. Retrieved 2016-01-13.
  4. "Research team: Cogent Labs". www.cogent.co.jp. 9 August 2018. Retrieved 2019-01-18.
  5. https://www.linkedin.com/in/cournapeau-david-76782410/
  6. Charles R Harris; K. Jarrod Millman; Stéfan J. van der Walt; et al. (16 September 2020). "Array programming with NumPy" (PDF). Nature. 585 (7825): 357–362. arXiv:2006.10256. doi:10.1038/S41586-020-2649-2. ISSN 1476-4687. PMC 7759461. PMID 32939066. Wikidata Q99413970.
  7. Pauli Virtanen; Ralf Gommers; Travis E. Oliphant; et al. (3 February 2020). "SciPy 1.0: fundamental algorithms for scientific computing in Python" (PDF). Nature Methods. 17 (3): 261–272. doi:10.1038/S41592-019-0686-2. ISSN 1548-7091. PMC 7056644. PMID 32015543. Wikidata Q84573952.