Berkeley Institute for Data Science

Established November 2013
Faculty Director
Saul Perlmutter
Executive Director
David Mongeau
Parent organization
University of California, Berkeley
Website bids.berkeley.edu

The Berkeley Institute for Data Science (BIDS) is a central hub of research and education within the University of California, Berkeley, designed to facilitate data-intensive science and to secure grants for distribution across the sciences. [1] [2] BIDS was initially funded by grants from the Gordon and Betty Moore Foundation and the Sloan Foundation as part of a three-year grant shared with data science institutes at New York University and the University of Washington. [3] [4] [5] The objective of the three-university initiative is to bring together domain experts from the natural and social sciences with methodological experts from computer science, statistics, and applied mathematics. [6] The organization is led by an executive director and by a faculty director, Saul Perlmutter, who won the 2011 Nobel Prize in Physics. [7] The initiative was announced at a White House Office of Science and Technology Policy event held to highlight and promote advances in data-driven scientific discovery, and it is a core component of the National Science Foundation's strategic plan for building national capacity in data science. [8] [9] [10]


Working groups

Six working groups are common to the three universities included in the original Moore/Sloan grant. The working groups are intended to "address the major challenges facing advances in data-intensive research" and comprise Career Paths and Alternative Metrics, Reproducibility and Open Science, Education and Training, Ethnography and Evaluation, Software Tools and Environments, and Working Spaces and Culture. Although all three universities are separate parts of one initiative, they were awarded different grant money.

Notable fellows

A primary objective of BIDS is to build a community of data science fellows and senior fellows across academic disciplines. The 23 current fellows make up the majority of the on-site personnel at the Institute, which supports a number of notable initiatives through fellow support. The following list is a subset of notable fellows to date:

Related Research Articles

Python is an interpreted high-level general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation. Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.

NumPy Numerical programming library for the Python programming language

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.

Alfred P. Sloan Foundation American philanthropic nonprofit organization

The Alfred P. Sloan Foundation is an American philanthropic nonprofit organization. It was established in 1934 by Alfred P. Sloan Jr., then-President and Chief Executive Officer of General Motors.

Computational engineering

Computational science and engineering (CSE) is a relatively new discipline that deals with the development and application of computational models and simulations, often coupled with high-performance computing, to solve complex physical problems arising in engineering analysis and design as well as natural phenomena. CSE has been described as the "third mode of discovery".

Gordon and Betty Moore Foundation

The Gordon and Betty Moore Foundation is an American foundation established by Intel co-founder Gordon E. Moore and his wife Betty I. Moore in September 2000 to support scientific discovery, environmental conservation, patient care improvements and preservation of the character of the Bay Area.

IPython Advanced interactive shell for Python

IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, that offers introspection, rich media, shell syntax, tab completion, and history. IPython provides the following features:

Cython Programming language

Cython is a programming language that aims to be a superset of the Python programming language, designed to give C-like performance with code that is written mostly in Python with optional additional C-inspired syntax.

Jennifer Tour Chayes American computer scientist and mathematician

Jennifer Tour Chayes is the University of California, Berkeley Associate Provost for the Division of Computing, Data Science, and Society and Dean of the School of Information. She was formerly a Technical Fellow and Managing Director of Microsoft Research New England in Cambridge, Massachusetts, which she founded in 2008, and Microsoft Research New York City, which she founded in 2012.

Joshua Simon Bloom is an American astrophysicist, full professor of astronomy at the University of California, Berkeley, and was the CTO and co-founder of the machine-learning company wise.io. He received a Bachelor of Arts in astronomy and astrophysics and physics from Harvard College in 1996, an M.Phil from Cambridge University in 1997, and a PhD in astronomy from the California Institute of Technology in 2002. He was a Junior Fellow of the Harvard Society of Fellows from 2002 to 2005. His astronomy research focuses on gamma-ray bursts and other astrophysical transients such as supernovae and tidal disruption events. He is the author of the book What Are Gamma-Ray Bursts?, published by Princeton University Press in 2011.

scikit-learn Machine learning library for the Python programming language

Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

Taekjip Ha is a South Korean-born American biophysicist who is currently a Bloomberg Distinguished Professor of Biophysics and Biomedical Engineering at Johns Hopkins University. He was previously the Gutgsell Professor of Physics at the University of Illinois at Urbana-Champaign, where he was the principal investigator of the Single Molecule Nanometry group. He is also a Howard Hughes Medical Institute investigator.

pandas (software) Python library for data analysis

pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The name is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals. The name is also a play on the phrase "Python data analysis". Wes McKinney started building what would become pandas at AQR Capital while he was a researcher there from 2007 to 2010.

Fernando Pérez (software developer) Colombian-American physicist and software developer

Fernando Pérez is a Colombian-American physicist, software developer, and free software advocate. He is best known as the creator of the IPython programming environment, for which he received the 2012 Free Software Award from the Free Software Foundation, and for his work on Project Jupyter, for which he received the 2017 ACM Software System Award. He is a fellow of the Python Software Foundation and a founding member of the NumFOCUS organization.

Paul Philip Hood Wilson is the Grainger Professor of Nuclear Engineering in Nuclear Engineering and Engineering Physics at the University of Wisconsin–Madison. He is a prominent nuclear energy communicator, and advocate of modern computational science practices. He is well known for leading the production of the computational nuclear engineering toolkits ALARA, Cyclus, and DAGMC. He is also the founding president of the North American Young Generation in Nuclear and is the Faculty Director of the Advanced Computing Initiative (ACI) at the University of Wisconsin–Madison.

Hitoshi Murayama

Hitoshi Murayama (村山斉) is a Japanese-born physicist with notable contributions in the fields of particle physics and cosmology. He is currently a professor at the Center for Theoretical Physics at the University of California, Berkeley, and Director of the Kavli Institute for the Physics and Mathematics of the Universe at the University of Tokyo.

Project Jupyter Nonprofit organization developing open-source software

Project Jupyter is a project and community whose goal is to "develop open-source software, open-standards, and services for interactive computing across dozens of programming languages". It was spun off from IPython in 2014 by Fernando Pérez. Project Jupyter's name is a reference to the three core programming languages supported by Jupyter, which are Julia, Python and R, and also a homage to Galileo's notebooks recording the discovery of the moons of Jupiter. Project Jupyter has developed and supported the interactive computing products Jupyter Notebook, JupyterHub, and JupyterLab.

Tamara Ann Broderick is an American computer scientist at the Massachusetts Institute of Technology. She works on machine learning and Bayesian inference.

scikit-multiflow Machine learning library for data streams in Python

scikit-multiflow is a free and open-source machine learning library for multi-output/multi-label and stream data, written in Python.

Karthik Ram

Karthik Ram is a research scientist at the Berkeley Institute for Data Science and member of the Initiative for Global Change Biology at the University of California, Berkeley. He is best known for being the co-founder of rOpenSci. Ram's work focuses on global change, data science, and open research software.

References

  1. Ungerleider, Neal (13 November 2013). "White House to Universities: We Need More Data Scientists". Fast Company. Retrieved 25 October 2015.
  2. Suthaharan, Shan (2015). Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning. Springer. p. 10. ISBN 9781489976413.
  3. "NYU Part of Initiative to Harness Potential of Data Scientists, Big Data with Support from Moore, Sloan Foundations". New York University. 12 November 2013. Retrieved 26 October 2015.
  4. "UW, Berkeley, NYU collaborate in $37.8M data science initiative". University of Washington eScience Institute. 7 November 2013. Archived from the original on 14 August 2020. Retrieved 26 October 2015.
  5. Baker, Monya (8 April 2015). "Data science: Industry allure". Nature. 520 (7546): 253–255. doi:10.1038/nj7546-253a.
  6. "Examples of Big Data Initiatives and Funding Projects". Data Sharing for Demographic Research. Eunice Kennedy Shriver National Institute of Child Health and Human Development. 2015. Retrieved 26 October 2015.
  7. "Launch of the Berkeley Institute for Data Science" (YouTube). Berkeley, California: CITRIS. 12 December 2013. Retrieved 5 November 2011.
  8. Lohr, Steve (12 November 2013). "Program Seeks to Nurture 'Data Science Culture' at Universities". New York Times. Retrieved 25 October 2015.
  9. "Data to Knowledge to Action" (PDF). Office of Science and Technology Policy. 12 November 2013. Archived (PDF) from the original on 28 January 2017. Retrieved 25 October 2015 via National Archives.
  10. Johnstone, Iain; Roberts, Fred (18 July 2014). Final Report from StatSNSF subcommittee (PDF). National Science Foundation. Archived from the original (PDF) on 5 March 2016. Retrieved 5 November 2015.
  11. Allred, Cathy (17 September 2014). "Deciding Force: What we learned from Ferguson". Daily Herald. Retrieved 5 November 2015.
  12. McMillan, Cecily; Gould-Wartofsky, Michael (17 September 2015). "Decriminalize dissent". Al Jazeera America. Retrieved 6 November 2015.
  13. "$6M for UC Berkeley and Cal Poly to Expand and Enhance Open-Source Software for Scientific Computing and Data Science". Business Wire. 7 July 2015. Retrieved 5 November 2015.
  14. Krill, Paul (14 February 2014). "IPython founder details road map for interactive computing platform". InfoWorld. Retrieved 6 November 2015.
  15. Strickland, Eliza (16 April 2014). "Google Earth Engine Brings Big Data to Environmental Activism". IEEE Spectrum. Retrieved 5 November 2015.
  16. Benderly, Beryl (13 July 2015). "Putting women at the controls at NASA". Science. Retrieved 5 November 2015.
  17. Scopatz, Anthony; Huff, Kathryn (2015). Effective Computation in Physics. O'Reilly Media. ISBN 9781491901595.
  18. Lowery, Jack (14 September 2014). "Women in Data Science: Kathryn Huff". Center for Data Science. New York University. Retrieved 6 November 2015.
  19. "Berkeley Institute for Data Science". Berkeley Institute for Data Science.
  20. "Anomaly - Precision Payments brought to healthcare". Anomaly - Precision Payments brought to healthcare.
  21. "From Bioinformatics to Natural Language Processing with Leonard Apeltsin". James Le.
  22. Apeltsin, Leonard (2021). Data Science Bookcamp: Five Python Projects. Manning Publications. ISBN 9781617296253.
  23. Bressert, Eli (2012). SciPy and NumPy: An Overview for Developers. O'Reilly Media. p. 43. ISBN 9781449361624.
  24. "scikit-image". Python Package Index. Retrieved 5 November 2015.
  25. "Laura Waller". Berkeley Institute for Data Science.