Wes McKinney

Wes McKinney is an American software developer and businessman. He is the creator and "Benevolent Dictator for Life" (BDFL) of the open-source pandas package for data analysis in the Python programming language, and has authored three editions of the reference book Python for Data Analysis. [1] [2] He co-founded the technology startup Datapad and served as its CEO, and he later worked as a software engineer at Two Sigma Investments. He founded Ursa Labs, [3] which became part of Voltron Data in 2021. [4] In 2022, it was announced that Voltron Data had raised $110 million. [5]

Early life and education

McKinney graduated from MIT with a B.S. in Mathematics in 2007. [1] In 2010, he began a PhD program in Statistics at Duke University, but went on leave in 2011. [6]

Career

From 2007 to 2010, McKinney researched global macro and credit trading strategies at AQR Capital Management. During his time at AQR Capital, he learned Python and started building what would become pandas. [1] McKinney made the pandas project public in 2009. [6]

McKinney left AQR in 2010 to start a PhD in Statistics at Duke University. He went on leave from Duke in the summer of 2011 to devote more time to developing pandas, [6] work that culminated in his writing Python for Data Analysis in 2012.

In 2012, he co-founded Lambda Foundry Inc. [7]

McKinney co-founded Datapad with Chang She in January 2013, with McKinney as CEO. Datapad developed a data visualization product, also built on the Python stack, that targeted enterprise customers. Datapad was acquired by Cloudera in September 2014. [8] [9] McKinney joined the engineering team at Cloudera following the acquisition. There he worked on an open-source project called Ibis, incubated within Cloudera Labs, which aimed to apply Python to big data problems. [10] In 2016, McKinney joined the investment fund Two Sigma Investments to work on Apache Arrow. In 2018, he launched Ursa Labs. [3] In 2023, he joined Posit (formerly RStudio) as a Principal Architect. [11]

Media coverage

McKinney has been interviewed by VentureBeat and others. [12] [13] [14] He frequently gives talks to the Python community. [15] [16]

References

  1. McKinney, Wes (2013). Python for Data Analysis (1st ed.). Sebastopol, Calif.: O'Reilly. ISBN 978-1449319793.
  2. McKinney, Wes (2017). Python for Data Analysis (2nd ed.). Sebastopol, Calif.: O'Reilly. ISBN 978-1491957660.
  3. "Announcing Ursa Labs: An innovation lab for open source data science". 19 April 2018.
  4. McKinney, Wes (2021-08-05). "Wes McKinney - Joining Forces for an Arrow-Native Future". wesmckinney.com. Retrieved 2024-02-28.
  5. Miller, Ron (2022-02-17). "Voltron Data grabs $110M to build startup based on Apache Arrow project". TechCrunch. Retrieved 2024-02-28.
  6. Kopf, Dan. "Meet the man behind the most important tool in data science", Quartz, 8 December 2017. Retrieved on 24 October 2019.
  7. "wesmckinney.com". Retrieved 26 July 2023.
  8. "Data startup DataPad gets acquired, says it will shut down on Friday". VentureBeat. 29 September 2014. Retrieved 2016-01-10.
  9. "Cloudera Bought Datapad". GigaOm. 30 September 2014. Retrieved 10 January 2016.
  10. "Ibis on Impala: Python at Scale for Data Science - Cloudera Engineering Blog". Cloudera Engineering Blog. Retrieved 2016-01-10. [W]e are excited to announce a new open source project, called Ibis, that will deliver the great Python experience and ecosystem, only at any data and node scale.
  11. "Welcome, Wes!". Posit. 2023-11-06. Retrieved 2024-01-26.
  12. "DataPad emerges to let everyone at your company create and play with charts". VentureBeat. 20 May 2014. Retrieved 2016-01-10.
  13. "Meet Quantopian's Newest Advisor: Wes McKinney". Quantopian Blog. Retrieved 2016-01-10.
  14. "Big data's 4 big Vs: It's our Data Summit highlights - Web Summit Blog". Web Summit Blog. Retrieved 2016-01-10.
  15. "LFPUG: Python in the enterprise + Pandas | Enthought Blog". blog.enthought.com. Retrieved 2016-01-10.
  16. "Big Data Conference - Wes McKinney". O'Reilly Media. Retrieved 10 January 2016.