Jennifer "Jenny" Bryan | |
---|---|
Occupations | |
Known for | R packages |
Academic background | |
Alma mater | Yale University (B.A.); University of California, Berkeley (PhD) |
Website | https://jennybryan.org |
Jennifer "Jenny" Bryan is a data scientist and an associate professor of statistics at the University of British Columbia, where she developed the Master of Data Science Program. She is a statistician and software engineer from Vancouver, Canada, who works at RStudio and is known for creating open source tools that connect R to Google Sheets and Google Drive. [1] [2] [3] [4]
Bryan earned her Bachelor of Arts in Economics and German literature from Yale University in 1992 and her PhD in Biostatistics from the University of California, Berkeley in 2001. [5] [6]
As an associate professor of statistics at the University of British Columbia, [7] Bryan worked on biostatistics with a focus on gene expression and microarray data. Notable projects to which she has contributed include the quantification of photomotor responses in larval zebrafish, [8] the development of an assay system in the multicellular animal Caenorhabditis elegans to test genetic interactions causing synthetic lethality in somatic cells, [9] and a novel yeast-based model to search for modifier genes involved in cystic fibrosis. [10] Beyond biostatistics, Bryan has also contributed to medoids-based clustering methods. [11] Her general science contributions include a manifesto published in PLOS One on good practices for scientific computing [12] and an introduction to the Git version control system [13] for research data analysis. [14] [15] [16]
Bryan's teaching activities at UBC included development of the Master of Data Science Program [17] and new materials for the STAT 545 course. [18] Under Bryan's direction, the STAT 545 course became notable as an early example of a data science course taught in a statistics program. It is also notable for teaching with modern R packages, Git, and GitHub, for sharing its teaching materials openly online, and for emphasizing practical data cleaning, exploration, and visualization skills over algorithms and theory. [15] As of late 2016, Bryan is on leave from her UBC position and is working at RStudio with a team led by Hadley Wickham. [3]
Bryan has worked with S and R since 1996. [1] [7] She is known for her open source contributions in R. [19] Influential contributions include the use of Lego [20] and the concept of data rectangling [21] to explain programming concepts, [22] [23] work on reproducible research, [24] and advice on project and workflow organisation. [25] [26] [27]
Bryan is well known for her work on efficient methods of working in spreadsheets, and on the connection between R and spreadsheet software such as Excel and Google Sheets. [4] She is the primary developer of two R packages: googlesheets, which connects R to the Google Sheets service, [28] and googledrive, which provides an interface between R and Google Drive.
Bryan is known for her work in teaching, her contributions to R packages, and her involvement with the leadership committee at rOpenSci. [29] [30] She is also part of the R Foundation Forwards task force and a member of the editorial board of BMC Bioinformatics. [30] [31] Previously, she worked as an Associate at the Boston Consulting Group in Boston, MA. [6]
Bryan lives with her husband, three children, and dog, Toby. [1] [31] [32]