Berkeley Institute for Data Science

Established: November 2013
Faculty Director: Fernando Pérez
Parent organization: University of California, Berkeley
Website: bids.berkeley.edu

The Berkeley Institute for Data Science (BIDS) is a central hub of research and education at the University of California, Berkeley, designed to facilitate data-intensive science and to secure grants for distribution across the sciences. [1] [2] BIDS was initially funded by grants from the Gordon and Betty Moore Foundation and the Sloan Foundation as part of a three-year grant shared with data science institutes at New York University and the University of Washington. [3] [4] [5] The objective of the three-university initiative is to bring together domain experts from the natural and social sciences with methodological experts from computer science, statistics, and applied mathematics. [6] Saul Perlmutter established BIDS in 2013 and stepped down as Faculty Director in December 2023. [7] The initiative was announced at a White House Office of Science and Technology Policy event held to highlight and promote advances in data-driven scientific discovery, and it is a core component of the National Science Foundation's strategic plan for building national capacity in data science. [8] [9] [10]

Working groups

When BIDS was founded in 2013, six working groups spanned the three universities included in the original Moore/Sloan grant, which were collectively referred to as the Moore-Sloan Data Science Environments (MSDSE). [11] The aim of the MSDSE was to address the major challenges to advancing data-intensive research, including careers, education and training, tools and software, reproducibility and open science, physical and intellectual space, and data science studies. [12] The efforts of these working groups led to the founding of the Academic Data Science Alliance (ADSA) in 2019, [13] of which BIDS is a founding member. [14]

Notable fellows

A primary objective of BIDS is to build a community of data science fellows and senior fellows across academic disciplines. The 23 current fellows constitute the majority of the Institute's on-site personnel, and the Institute supports a number of notable initiatives through its fellows. The following list is a subset of notable fellows to date:

References

  1. Ungerleider, Neal (13 November 2013). "White House to Universities: We Need More Data Scientists". Fast Company. Retrieved 25 October 2015.
  2. Suthaharan, Shan (2015). Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning. Springer. p. 10. ISBN 9781489976413.
  3. "NYU Part of Initiative to Harness Potential of Data Scientists, Big Data with Support from Moore, Sloan Foundations". New York University. 12 November 2013. Retrieved 26 October 2015.
  4. "UW, Berkeley, NYU collaborate in $37.8M data science initiative". University of Washington eScience Institute. 7 November 2013. Archived from the original on 14 August 2020. Retrieved 26 October 2015.
  5. Baker, Monya (8 April 2015). "Data science: Industry allure". Nature. 520 (7546): 253–255. doi:10.1038/nj7546-253a. PMID 25859590.
  6. "Examples of Big Data Initiatives and Funding Projects". Data Sharing for Demographic Research. Eunice Kennedy Shriver National Institute of Child Health and Human Development. 2015. Retrieved 26 October 2015.
  7. "BIDS Community Celebrates Saul Perlmutter's Tenure as the Founding Faculty Director". Berkeley Institute for Data Science (BIDS) (article). Berkeley, California. 12 December 2023. Retrieved 2 October 2024.
  8. Lohr, Steve (12 November 2013). "Program Seeks to Nurture 'Data Science Culture' at Universities". New York Times. Retrieved 25 October 2015.
  9. "Data to Knowledge to Action" (PDF). Office of Science and Technology Policy. 12 November 2013. Archived (PDF) from the original on 28 January 2017. Retrieved 25 October 2015 via National Archives.
  10. Johnstone, Iain; Roberts, Fred (18 July 2014). Final Report from StatSNSF subcommittee (PDF). National Science Foundation. Archived from the original (PDF) on 5 March 2016. Retrieved 5 November 2015.
  11. "MSDSE Archive". Academic Data Science Alliance. Retrieved 2 October 2024.
  12. "MSDSE Archive - Themes". Academic Data Science Alliance. Retrieved 2 October 2024.
  13. "About ADSA". Academic Data Science Alliance. Retrieved 2 October 2022.
  14. "Founding Members". Academic Data Science Alliance. Retrieved 2 October 2022.
  15. Allred, Cathy (17 September 2014). "Deciding Force: What we learned from Ferguson". Daily Herald. Retrieved 5 November 2015.
  16. McMillan, Cecily; Gould-Wartofsky, Michael (17 September 2015). "Decriminalize dissent". Al Jazeera America. Retrieved 6 November 2015.
  17. "$6M for UC Berkeley and Cal Poly to Expand and Enhance Open-Source Software for Scientific Computing and Data Science". Business Wire. 7 July 2015. Retrieved 5 November 2015.
  18. Krill, Paul (14 February 2014). "IPython founder details road map for interactive computing platform". InfoWorld. Retrieved 6 November 2015.
  19. Strickland, Eliza (16 April 2014). "Google Earth Engine Brings Big Data to Environmental Activism". IEEE Spectrum. Retrieved 5 November 2015.
  20. Benderly, Beryl (13 July 2015). "Putting women at the controls at NASA". Science. Retrieved 5 November 2015.
  21. Scopatz, Anthony; Huff, Kathryn (2015). Effective Computation in Physics. O'Reilly Media. ISBN 9781491901595.
  22. Lowery, Jack (14 September 2014). "Women in Data Science: Kathryn Huff". Center for Data Science. New York University. Retrieved 6 November 2015.
  23. "Berkeley Institute for Data Science". Berkeley Institute for Data Science.
  24. "Anomaly - Precision Payments brought to healthcare". Anomaly - Precision Payments brought to healthcare.
  25. "From Bioinformatics to Natural Language Processing with Leonard Apeltsin". James Le.
  26. Apeltsin, Leonard (2021). Data Science Bookcamp: Five Python Projects. Manning Publications. ISBN 9781617296253.
  27. Bressert, Eli (2012). SciPy and NumPy: An Overview for Developers. O'Reilly Media. p. 43. ISBN 9781449361624.
  28. "scikit-image". Python Package Index. Retrieved 5 November 2015.
  29. "Laura Waller". Berkeley Institute for Data Science.