Jenny Bryan

Jennifer "Jenny" Bryan
Known for: R packages
Alma mater: Yale University (B.A.); University of California, Berkeley (PhD)
Website: https://jennybryan.org

Jennifer "Jenny" Bryan is a data scientist and an associate professor of statistics at the University of British Columbia, where she developed the Master of Data Science Program. Based in Vancouver, Canada, she is also a statistician and software engineer at RStudio and is known for creating open source tools that connect R to Google Sheets and Google Drive. [1] [2] [3] [4]

Education

Bryan earned her Bachelor of Arts in Economics and German literature from Yale University in 1992 and her PhD in Biostatistics from the University of California, Berkeley in 2001. [5] [6]

Career

As an associate professor of statistics at the University of British Columbia, [7] Bryan worked on biostatistics with a focus on gene expression and microarray data. Notable projects to which she has contributed include the quantification of photomotor responses in larval zebrafish, [8] the development of an assay system in the multicellular animal Caenorhabditis elegans to test genetic interactions causing synthetic lethality in somatic cells, [9] and a novel yeast-based model to search for modifier genes involved in cystic fibrosis. [10] Beyond biostatistics, Bryan has also contributed to medoids-based clustering methods. [11] Her general science contributions include a manifesto published in PLOS Computational Biology on good enough practices for scientific computing [12] and an introduction to the Git version control system [13] for research data analysis. [14] [15] [16]

Bryan's teaching activities at UBC included development of the Master of Data Science Program [17] and new materials for the STAT 545 course. [18] Under Bryan's direction, the STAT 545 course became notable as an early example of a data science course taught in a statistics program. It is also notable for teaching with modern R packages, Git, and GitHub, for sharing its teaching materials openly online, and for emphasizing practical data cleaning, exploration, and visualization skills rather than algorithms and theory. [15] As of late 2016, Bryan is on leave from her UBC position and is working at RStudio with a team led by Hadley Wickham. [3]

Bryan has had experience with S and R since 1996. [1] [7] She is known for her open source contributions in R. [19] Influential contributions include the use of Lego [20] and the concept of data rectangling [21] for explaining programming concepts, [22] [23] reproducible research, [24] and advice on project and workflow organisation. [25] [26] [27]

Bryan is well known for her work on efficient methods of working in spreadsheets, and on the connection between R and spreadsheet software such as Excel and Google Sheets. [4] She is the primary developer of googlesheets, an R package that connects R to the Google Sheets service, [28] and of googledrive, an R package for interfacing between R and Google Drive.

Bryan is known for her work in teaching, her contributions to R packages, and her involvement with the leadership committee at rOpenSci. [29] [30] She is also part of the R Foundation Forwards task force and a member of the editorial board of BMC Bioinformatics. [30] [31] Previously, she worked as an Associate at the Boston Consulting Group in Boston, MA. [6]

Personal life

Bryan lives with her husband, three children, and dog, Toby. [1] [31] [32]

References

  1. O'Briant, Kelly (8 December 2017). ".rprofile: Jenny Bryan". rOpenSci. doi:10.59350/p8h48-s7k80. Retrieved 4 February 2018.
  2. "GitHub profile of Jennifer (Jenny) Bryan". GitHub. Retrieved 4 February 2018.
  3. Machlis, Sharon (2016-11-30). "What's up with RStudio's 2 high-profile hires?". Computerworld. Retrieved 19 February 2018.
  4. Hofmann, Heike; VanderPlas, Susan (19 December 2017). "All of This Has Happened Before. All of This Will Happen Again: Data Science". Journal of Computational and Graphical Statistics. 26 (4): 775–778. doi:10.1080/10618600.2017.1385474. S2CID 126170766.
  5. Bryan, Jenny. Happy Git and GitHub for the useR. Retrieved 4 February 2018.
  6. "Jennifer Bryan homepage". Retrieved 4 February 2018.
  7. Happy Git and GitHub for the useR. Retrieved 4 February 2018.
  8. Jenkins, Jeremy L; Urban, Laszlo (2010). "Fishing for neuroactive compounds". Nature Chemical Biology. 6 (3): 172–173. doi:10.1038/nchembio.320. ISSN 1552-4469. PMID 20154663.
  9. "InCytes from MBC, December 2009". Molecular Biology of the Cell. 20 (24): 5037–5038. 2009-12-15. doi:10.1091/mbc.z09-00-0024. ISSN 1059-1524. PMC 2793281.
  10. Blondel, Marc (2012-12-27). "Flirting with CFTR modifier genes at happy hour". Genome Medicine. 4 (12): 98. doi:10.1186/gm399. ISSN 1756-994X. PMC 3580438. PMID 23270638.
  11. Van der Laan, Mark (2003). "A new partitioning around medoids algorithm". Journal of Statistical Computation and Simulation. 73 (8): 575–584. doi:10.1080/0094965031000136012. S2CID 17437463.
  12. Wilson, Greg; Bryan, Jennifer; Cranston, Karen; Kitzes, Justin; Nederbragt, Lex; Teal, Tracy K. (2017-06-22). "Good enough practices in scientific computing". PLOS Computational Biology. 13 (6): e1005510. Bibcode:2017PLSCB..13E5510W. doi:10.1371/journal.pcbi.1005510. ISSN 1553-7358. PMC 5480810. PMID 28640806.
  13. Bryan, Jenny (2018). "Excuse me, do you have a moment to talk about version control?". The American Statistician. 72: 20–27. doi:10.1080/00031305.2017.1399928. S2CID 125821034.
  14. Baumer, Benjamin S. (2018). "Lessons From Between the White Lines for Isolated Data Scientists". The American Statistician. 72 (1): 66–71. doi:10.1080/00031305.2017.1375985. S2CID 126280044.
  15. Marwick, Ben; Boettiger, Carl; Mullen, Lincoln (29 September 2017). "Packaging Data Analytical Work Reproducibly Using R (and Friends)". The American Statistician. 72 (1): 80–88. doi:10.1080/00031305.2017.1375986. S2CID 125412832.
  16. McNamara, Amelia; Horton, Nicholas J.; Baumer, Benjamin S. (19 December 2017). "Greater Data Science at Baccalaureate Institutions". Journal of Computational and Graphical Statistics. 26 (4): 781–783. arXiv:1710.08728. Bibcode:2017arXiv171008728M. doi:10.1080/10618600.2017.1386568. S2CID 88522819.
  17. Zhou, Helen (2016-02-29). "New Master of Data Science coming to UBC". The Ubyssey.
  18. Bryan, Jenny (2018). "Data wrangling, exploration, and analysis with R". Archived from the original on 24 February 2018. Retrieved 20 March 2018.
  19. Wong, Julia Carrie (2016-02-12). "Women considered better coders – but only if they hide their gender". The Guardian.
  20. Bryan, Jenny (2016). "Data Rectangling (Talk presented at PLOTCON 2016)".
  21. Boettiger, Carl (11 December 2017). "Data Rectangling with jq". Boettiger Group. Retrieved 20 March 2018.
  22. Leek, Jeff (2016-12-20). "A non-comprehensive list of awesome things other people did in 2016". Simply Stats. Retrieved 20 March 2018.
  23. "EARL Boston Revisited". Mango Business Solutions. 5 Dec 2016. Retrieved 20 March 2018.
  24. Kitzes, Justin (2018). The practice of reproducible research: case studies and lessons from the data-intensive sciences. Oakland, California: University of California Press. ISBN 9780520294752.
  25. "Project-oriented workflow". Tidyverse Blog. 2017. Retrieved 20 March 2018.
  26. Smith, David (2 January 2018). "Do you have bad R habits? Here's how to identify and fix them". Revolutions: Daily news about using open source R for big data analysis, predictive modeling, data science, and visualization since 2008. Retrieved 20 March 2018.
  27. Layton, Richard (19 November 2015). "Influences of Reproducible Reporting on Work Flow". Chance. 28 (4): 60–64. doi:10.1080/09332480.2015.1120133. S2CID 61249336.
  28. de Vries, Andrie (2 September 2015). "Using the googlesheets package to work with Google Sheets". Revolutions: Daily news about using open source R for big data analysis, predictive modeling, data science, and visualization since 2008. Retrieved 20 March 2018.
  29. "rOpenSci: Meet Our Team".
  30. "Jenny Bryan's CV" (PDF). Retrieved 4 February 2018.
  31. Middleton, Atakohu (2017-12-15). "StatsChat Jenny Bryan: "You need a huge tolerance for ambiguity"". StatsChat. Retrieved 4 February 2018.
  32. Robinson, Emily. "Does a tweet count as a citation? His name is Toby". Twitter. Retrieved 15 October 2018.