Michael Stonebraker

Michael Stonebraker giving the 2015 Turing lecture
Born October 11, 1943 (age 80)
Alma mater Princeton University,
University of Michigan
Known for Ingres, Postgres, Vertica, Streambase, Illustra, VoltDB, SciDB
Spouse Beth
Awards IEEE John von Neumann Medal (2005)
ACM Turing Award (2014)
Scientific career
Fields Computer science
Institutions University of California, Berkeley,
University of Michigan,
Massachusetts Institute of Technology
Thesis The Reduction of Large Scale Markov Models for Random Chains
Doctoral advisor Arch Waugh Naylor
Notable students Joseph M. Hellerstein
Clifford A. Lynch [2]
Margo Seltzer [2]
Dale Skeen [3]
Marti Hearst [4]
Leilani Battle [5]
Website csail.mit.edu/user/1547

Michael Ralph Stonebraker (born October 11, 1943) [6] is a computer scientist specializing in database systems. Through a series of academic prototypes and commercial startups, Stonebraker has produced research and products that are central to many relational databases. He has also founded many database companies, including Ingres Corporation, Illustra, Paradigm4, StreamBase Systems, Tamr, Vertica and VoltDB, and served as chief technical officer of Informix. For his contributions to database research, Stonebraker received the 2014 Turing Award, often described as "the Nobel Prize for computing." [7]

Stonebraker's career can be broadly divided into two phases: his time at the University of California, Berkeley, when he focused on relational database management systems such as Ingres and Postgres, and his time, starting in 2001, at the Massachusetts Institute of Technology (MIT), where he developed more novel data management techniques such as C-Store, H-Store, SciDB and DBOS. [8] Stonebraker is currently a professor emeritus at UC Berkeley and an adjunct professor at MIT's Computer Science and Artificial Intelligence Laboratory. [9] [10] He is also known as an editor of the book Readings in Database Systems.

Life

Stonebraker grew up in Milton, New Hampshire. [11] He earned his B.S.E. in electrical engineering from Princeton University in 1965, and his M.S. and Ph.D. from the University of Michigan in 1967 and 1971 [12] respectively. His awards include the IEEE John von Neumann Medal and the first SIGMOD Edgar F. Codd Innovations Award. In 1994 he was inducted as a Fellow of the Association for Computing Machinery. [13] In 1997, he was elected a member of the National Academy of Engineering for the development and commercialization of relational and object-relational database systems. In March 2015 it was announced he won the 2014 ACM Turing Award. [7] In September 2015, he won the 2015 Commonwealth Award, chosen by council members of MassTLC. [14]

The Berkeley years (1971–2000)

Stonebraker joined the University of California, Berkeley as an assistant professor in 1971, and taught in the computer science department for twenty-nine years. It was there that he did his early pioneering work on relational databases.

Ingres

In 1973, Stonebraker and his colleague Eugene Wong started researching relational database systems after reading a series of seminal papers published by Edgar F. Codd on the relational data model. [15]

Their project, known as Ingres (Interactive Graphics and Retrieval System), [16] was one of the first systems (along with System R from IBM) to demonstrate that it was possible to build a practical and efficient implementation of the relational model. A number of key ideas from INGRES are still widely used in relational systems, including the use of B-trees, primary-copy replication, the query rewrite approach to views and integrity constraints, and the idea of rules/triggers for integrity checking in an RDBMS. Additionally, much experimental work was done that provided insights into how to build a locking system that could provide satisfactory transaction performance. [17]

By the mid-1970s, Stonebraker's team produced, using a rotating team of student programmers, a usable relational database system. At the time Ingres was considered "low end" compared to IBM's System R, as it ran on Unix-based Digital Equipment Corporation machines as opposed to the "big iron" IBM mainframes.[ citation needed ]

By the early 1980s, however, the performance and capabilities of these low-end machines were seriously threatening IBM's mainframe market, and with the threat came the ability of Ingres to become a viable, "real" product for a large number of applications. Ingres was distributed under a variation of the BSD license for a nominal fee, and soon a number of companies took advantage of this to create commercial versions of Ingres.[ citation needed ]

Among the founders of such companies was Stonebraker himself, who with fellow Berkeley professors Larry Rowe and Eugene Wong helped found Relational Technology, Inc., later called Ingres Corporation. The company was subsequently sold to Computer Associates; Ingres was re-established as an independent company in 2005 and later renamed Actian. Other startups based on Ingres include Sybase, founded by Robert Epstein, a student on the project, and Britton Lee, Inc. Sybase's code was later used as a basis for Microsoft SQL Server. [18]

Postgres

After founding Relational Technology, Stonebraker and Rowe began a "post-Ingres" effort, to address the limitations of the relational model. The new project was named POSTGRES (POST inGRES), [19] and was designed to add support for complex data types to database systems and improve end-to-end performance of data-intensive applications. Postgres provided an object relational programming model in which fields could be complex datatypes, and where users could register new types as well as scalar and aggregate functions over those types. Postgres was extensible in a number of other ways, making it easy for programmers to modify or add to the optimizer, query language, runtime, and indexing frameworks. These features improved both database programmability and performance, and made it possible to push large portions of a number of applications inside the database, including geographic information systems and time series processing. This had the effect of substantially broadening the commercial database market.[ citation needed ]

Postgres was also offered under a BSD-like license, and the code forms the basis of the free software PostgreSQL. Stonebraker also led an effort to commercialize the code, creating Illustra, which was purchased by Informix. PostgreSQL has been used as the basis for a number of other startup companies, including Aster Data Systems, EnterpriseDB, and Greenplum.[ citation needed ]

Informix acquired Illustra in 1996, and Stonebraker became Informix's CTO, a position he held until September 2000. Informix integrated Illustra's O–R mapping and DataBlades into the 7.x OnLine product, resulting in Informix Universal Server (IUS), or more generally, Version 9.[ citation needed ]

Mariposa and Cohera

After the Postgres project, Stonebraker initiated the Mariposa [20] project which became the basis of Cohera Corporation. Mariposa built a federated database over an economic model of resource trading, in which data distributed across multiple organizations could be integrated and queried from a single relational interface, governed by site-specific policies that would charge for data processing and storage. These economic policies allowed traditional ideas in query optimization to be carried out over competing sites, and also served as the basis for data storage, replication and movement within a federation.
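The economic model described above can be illustrated with a minimal sketch. It assumes a simplified setting in which each site quotes a price and an expected delay for a piece of work, and a broker picks the cheapest quote that fits the query's budget and deadline. The names `Bid` and `choose_bid`, and the selection rule itself, are hypothetical illustrations and are not taken from the Mariposa implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bid:
    """A hypothetical quote from one site for executing a piece of a query."""
    site: str
    price: float          # what the site would charge
    delay_seconds: float  # how long the site expects to take


def choose_bid(bids: List[Bid], budget: float, max_delay: float) -> Optional[Bid]:
    """Return the cheapest bid within budget and deadline, or None if none qualifies."""
    acceptable = [
        b for b in bids
        if b.price <= budget and b.delay_seconds <= max_delay
    ]
    # min() with default=None handles the case where no site qualifies.
    return min(acceptable, key=lambda b: b.price, default=None)


# Illustrative data only: three sites with different pricing policies.
example_bids = [
    Bid(site="a", price=5.0, delay_seconds=2.0),
    Bid(site="b", price=3.0, delay_seconds=10.0),
    Bid(site="c", price=4.0, delay_seconds=1.0),
]
```

With a budget of 4.5 and a deadline of 5 seconds, site "a" is too expensive and site "b" too slow, so the sketch selects site "c".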

Cohera's initial mission was to commercialize Mariposa, but the company eventually focused on a business-to-business catalog management application built on the core federated data integration engine. Cohera's intellectual property was purchased by PeopleSoft in 2001, and used as the basis of PeopleSoft's Enterprise Catalog Management. PeopleSoft was in turn purchased by Oracle Corporation in 2004.[ citation needed ]

The MIT years (2001–present)

Stonebraker became an adjunct professor at MIT in 2001, where he began another series of research projects and founded a number of companies.

Aurora and StreamBase

In the Aurora Project, Stonebraker, along with colleagues from Brandeis University, Brown University, and MIT, focused on data management for streaming data, using a new data model and query language. Unlike relational systems, which "pull" data and process it a record at a time, in Aurora, data is "pushed", arriving asynchronously from external data sources (such as stock ticks, news feeds, or sensors). The output is itself a stream of results (such as windowed averages) that are sent to users. [21]
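The push model and the windowed average mentioned above can be sketched as follows. This is a minimal illustration, assuming a count-based sliding window and a callback that receives each result; the class name `WindowedAverage` and its methods are hypothetical and do not reflect Aurora's actual operators or query language.

```python
from collections import deque
from typing import Callable, Deque, List


class WindowedAverage:
    """Push-based operator: emits the mean of the last `size` values on every arrival."""

    def __init__(self, size: int, emit: Callable[[float], None]) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        # deque(maxlen=...) silently drops the oldest value once the window is full.
        self._window: Deque[float] = deque(maxlen=size)
        self._emit = emit

    def push(self, value: float) -> None:
        # Called by the data source when a new value arrives; the operator
        # never asks ("pulls") for data itself.
        self._window.append(value)
        self._emit(sum(self._window) / len(self._window))


def run_demo() -> List[float]:
    """Feed four illustrative ticks through a window of three and collect the output stream."""
    results: List[float] = []
    operator = WindowedAverage(size=3, emit=results.append)
    for tick in [10.0, 20.0, 30.0, 40.0]:
        operator.push(tick)
    return results
```

Each call to `push` produces one output value, so the result is itself a stream: the four ticks above yield the averages 10.0, 15.0, 20.0 and 30.0.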

Stonebraker co-founded StreamBase Systems in 2003 to commercialize the technology behind Aurora.

C-Store and Vertica

In the C-Store project, started in 2005, Stonebraker, along with colleagues from Brandeis, Brown, MIT, and University of Massachusetts Boston, developed a parallel, shared-nothing column-oriented DBMS for data warehousing. By dividing and storing data in columns, C-Store is able to perform less I/O and get better compression ratios than conventional database systems that store data in rows. [22]

Stonebraker explained that the better compression arises because similar data items are stored side by side: Name,Name,Name,Name vs. Name,Address,Zip,Phone#. In 2005, Stonebraker co-founded Vertica to commercialize the technology behind C-Store. [23]
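The effect of placing similar items side by side can be shown with a small sketch. It uses run-length encoding purely as an illustrative compression scheme and a hypothetical four-row table; it is not a description of the specific encodings used by C-Store or Vertica.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def run_length_encode(values: Sequence[str]) -> List[Tuple[str, int]]:
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def row_layout(rows: Sequence[Tuple[str, ...]]) -> List[str]:
    """Row store: all fields of one record are adjacent (Name,State,Name,State,...)."""
    return [field for row in rows for field in row]


def column_layout(rows: Sequence[Tuple[str, ...]], index: int) -> List[str]:
    """Column store: all values of one attribute are adjacent (State,State,State,...)."""
    return [row[index] for row in rows]


# Illustrative data only: (name, state) records.
example_rows = [("Ann", "MA"), ("Bob", "MA"), ("Cat", "MA"), ("Dan", "NH")]

state_runs = run_length_encode(column_layout(example_rows, 1))
row_runs = run_length_encode(row_layout(example_rows))
```

In the column layout the state attribute collapses to two runs, `("MA", 3)` and `("NH", 1)`, while the interleaved row layout has no adjacent repeats and stays at eight runs, one per field.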

Morpheus and Goby

In 2006, Stonebraker started the Morpheus project, along with researchers from the University of Florida. Morpheus is a data integration system which relies on a collection of "transforms" to mediate between data sources. Each transform provides a queryable interface to a particular web site or service, and Morpheus makes it possible to search for and compose multiple transforms to provide a new service or a unified view of several services.

In 2009, Stonebraker co-founded Goby, [24] a local search company based on ideas from Morpheus, for people to explore new things to do in free time.

H-Store and VoltDB

In 2007, with researchers from Brown University, MIT, and Yale University, Stonebraker started the H-Store project. H-Store is a distributed main-memory online transaction processing (OLTP) system designed to provide very high throughput on transaction processing workloads.

In 2009, Stonebraker co-founded, and then served as an adviser to, VoltDB, a commercial startup based on ideas from the H-Store project.

SciDB

In 2008, along with David DeWitt and researchers from Brown, MIT, Portland State University, SLAC, the University of Washington, and the University of Wisconsin–Madison, Stonebraker started SciDB, [25] [26] an open-source DBMS specially designed for scientific research applications. [27]

He founded Paradigm4 with Marilyn Matz, who became CEO. Paradigm4 developed SciDB, used mostly by life sciences and financial markets. Novartis, Foundation Medicine, and the National Institutes of Health are some of the company's clients. [14] [28]

NoSQL

In 2010 and 2011, Stonebraker criticized the NoSQL movement. [29] [30] [31]

Notable students

Stonebraker trained more than 30 students. [3]

References

  1. "Michael Stonebraker - A.M. Turing Award Winner". Retrieved 2018-02-06.
  2. "Ph.D. Dissertations | EECS at UC Berkeley". www2.eecs.berkeley.edu.
  3. Michael Stonebraker at the Mathematics Genealogy Project
  4. "Nice: or what it was like to be Mike's student" (PDF).
  5. Battle, Leilani Marie (2017). "Behavior-driven optimization techniques for scalable data exploration". Massachusetts Institute of Technology. hdl:1721.1/111853 . Retrieved 2023-12-27.
  6. "Contributors". IEEE Transactions on Systems, Man, and Cybernetics (4): 562–564. Sep 1972. doi:10.1109/TSMC.1972.4309174.
  7. Conner-Simons, Adam (March 25, 2015). "Michael Stonebraker wins $1 million Turing Award". MIT News. Massachusetts Institute of Technology. Retrieved March 25, 2015.
  8. "Postgres pioneer Michael Stonebraker promises to upend the database once more". www.theregister.com. Retrieved 2023-12-27.
  9. "Michael Stonebraker". www2.eecs.berkeley.edu. Retrieved 2018-03-16.
  10. "Michael Stonebraker | MIT CSAIL". www.csail.mit.edu. Retrieved 2018-03-16.
  11. Oral History of Michael Stonebraker; 2012-08-23 Retrieved 2018-08-26.
  12. Stonebraker, Michael Ralph (1971). The Reduction of Large Scale Markov Models for Random Chains (PhD thesis). University of Michigan. OCLC   634008426. ProQuest   302585708.
  13. "Michael Ralph Stonebraker - ACM author profile page" . Retrieved 2011-07-27.
  14. Geller, Jessica. "PTC Chief Heppelman named CEO of the year by Mass. tech council." betaBoston. The Boston Globe. Sept. 16, 2015. Archived 2016-01-07 at the Wayback Machine.
  15. Codd, E. F. (1970). "A relational model of data for large shared data banks" (PDF). Communications of the ACM. 13 (6): 377–387. doi:10.1145/362384.362685. S2CID   207549016.
  16. Stonebraker, M.; Held, G.; Wong, E.; Kreps, P. (1976). "The design and implementation of INGRES". ACM Transactions on Database Systems. 1 (3): 189. CiteSeerX   10.1.1.109.957 . doi:10.1145/320473.320476. S2CID   1514658.
  17. "Relational Roots". Joseph Hellerstein. 1998. Retrieved 2009-11-24.
  18. "Motivation & DBMS Architecture Overview". Joseph Hellerstein. 1998. Retrieved 2009-11-24.
  19. Stonebraker, M.; Rowe, L. A. (1986). "The design of POSTGRES". ACM SIGMOD Record. 15 (2): 340. doi: 10.1145/16856.16888 .
  20. Stonebraker, M.; Aoki, P. M.; Litwin, W.; Pfeffer, A.; Sah, A.; Sidell, J.; Staelin, C.; Yu, A. (1996). "Mariposa: A wide-area distributed database system". The VLDB Journal the International Journal on Very Large Data Bases. 5: 48–63. CiteSeerX   10.1.1.68.5480 . doi:10.1007/s007780050015. S2CID   5062284.
  21. Abadi, D. J.; Carney, D.; Çetintemel, U.; Cherniack, M.; Convey, C.; Lee, S.; Stonebraker, M.; Tatbul, N.; Zdonik, S. (2003). "Aurora: A new model and architecture for data stream management". The VLDB Journal the International Journal on Very Large Data Bases. 12 (2): 120. CiteSeerX 10.1.1.6.1187. doi:10.1007/s00778-003-0095-z. S2CID 8101432.
  22. Babcock, Charles (February 21, 2008). "Database Pioneer Rethinks The Best Way To Organize Data". InformationWeek. (Print edition title: "Database Pioneer Rethinks How Data is Organized".)
  23. "The Vertica Analytic Database: C-Store 7 Years Later" (PDF). VLDB.org. August 28, 2012.
  24. Goby .
  25. Brown, P. G. (2010). "Overview of sciDB". Proceedings of the 2010 international conference on Management of data - SIGMOD '10. pp. 963–968. doi:10.1145/1807167.1807271. ISBN   9781450300322. S2CID   14544985.
  26. Stonebraker, M.; Brown, P.; Poliakov, A.; Raman, S. (2011). "The Architecture of SciDB". Scientific and Statistical Database Management. Lecture Notes in Computer Science. Vol. 6809. pp. 1–16. doi:10.1007/978-3-642-22351-8_1. ISBN   978-3-642-22350-1.
  27. "SciDB: Relational daddy answers Google, Hadoop, NoSQL". The Register. 2010-09-13. Retrieved 2012-01-11.
  28. Alspach, Kyle. "New Money: MassChallenge Alum Gets Dorm Room Fund Investment; Drone Co. Raises Seed Round." BostInno. Nov. 30, 2015 Archived 2016-02-07 at the Wayback Machine
  29. Stonebraker, M. (2010). "SQL databases v. NoSQL databases". Communications of the ACM. 53 (4): 10–11. doi:10.1145/1721654.1721659. S2CID   13959501.
  30. Stonebraker, M. (2011). "Stonebraker on NoSQL and enterprises". Communications of the ACM. 54 (8): 10–11. doi:10.1145/1978542.1978546. S2CID   36572502.
  31. Stonebraker, M.; Abadi, D.; Dewitt, D. J.; Madden, S.; Paulson, E.; Pavlo, A.; Rasin, A. (2010). "MapReduce and parallel DBMSs". Communications of the ACM. 53: 64–71. doi:10.1145/1629175.1629197. S2CID   61484899.