Tidyverse

Repository github.com/tidyverse/tidyverse
Written in R
Type Package collection
License MIT
Website www.tidyverse.org

The tidyverse is a collection of open source packages for the R programming language, introduced by Hadley Wickham [1] and his team, that "share an underlying design philosophy, grammar, and data structures" of tidy data. [2] Characteristic features of tidyverse packages include extensive use of non-standard evaluation and the encouragement of piping. [3] [4] [5]
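As a minimal sketch of these two features, the following R snippet uses dplyr (one of the tidyverse packages) on R's built-in mtcars data set. The choice of data set, the threshold, and the variable names heavy_mpg and heavy_mpg_nested are illustrative assumptions, not taken from the cited sources. Column names such as wt, cyl, and mpg are written unquoted (non-standard evaluation), and each step is handed to the next with the magrittr pipe %>%.

```r
library(dplyr)

# Mean fuel economy per cylinder count, for cars heavier than 3,000 lb
# (mtcars records weight in units of 1,000 lb).
# `wt`, `cyl`, and `mpg` are columns of mtcars, referenced without quotes
# (non-standard evaluation); `%>%` passes each result to the next call.
heavy_mpg <- mtcars %>%
  filter(wt > 3) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n()) %>%
  arrange(desc(mean_mpg))

# The same computation without the pipe, written as nested calls.
heavy_mpg_nested <- arrange(
  summarise(
    group_by(filter(mtcars, wt > 3), cyl),
    mean_mpg = mean(mpg), n = n()
  ),
  desc(mean_mpg)
)
```

Both objects hold the same summary table. The piped form reads top to bottom in the order the steps are applied, whereas the nested form has to be read from the inside out.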

As of November 2018, the tidyverse package and some of its individual packages accounted for 5 of the 10 most downloaded R packages. [6] The tidyverse is the subject of multiple books and papers. [7] [8] [9] [10] In 2019, a paper describing the ecosystem was published in the Journal of Open Source Software. [11]

Its syntax has been described as "supremely readable", [12] and some [13] have argued that the tidyverse is an effective way to introduce complete beginners to programming, since it lets students begin doing data processing tasks quickly. [14] [13] Some practitioners have also pointed out that data processing steps are more intuitive to chain together with the tidyverse than with pandas, Python's equivalent data processing package. [15] There is an active R community around the tidyverse. For example, the TidyTuesday social data project, organised by the Data Science Learning Community (DSLC), [16] releases varied real-world datasets each week so that community members can participate, share, practice, and learn to work with data. [17] Critics of the tidyverse have argued that it promotes tools that are harder to teach and learn than their built-in base R equivalents and that are too dissimilar to some programming languages. [18] [19]

More generally, the tidyverse principles encourage building a "universe" of streamlined packages, an approach that, in principle, helps alleviate dependency issues and keeps packages compatible with current and future features. [20] An example of this approach is the pharmaverse, a collection of R packages for clinical reporting in the pharmaceutical industry. [21]

Packages

The core tidyverse packages, which provide functionality to model, transform, and visualize data, include ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats. [22]

Additional packages assist the core collection. [23] Other packages based on the tidy data principles are regularly developed, such as tidytext [24] for text analysis, tidymodels [25] for machine learning, or tidyquant [26] for financial operations.
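A brief sketch of how the collection is typically loaded, assuming the tidyverse meta-package has already been installed from CRAN: attaching it makes the core packages available in one step, and tidyverse_packages() lists the members of the collection. The variable names all_pkgs and core_attached are illustrative.

```r
library(tidyverse)

# Names of the packages that belong to the tidyverse (core and additional).
all_pkgs <- tidyverse_packages()

# library(tidyverse) attaches the core packages; check two of them
# by looking for their entries on the search path.
core_attached <- c("package:dplyr", "package:ggplot2") %in% search()
```

Packages outside the core set, such as tidytext or tidymodels, are installed and attached separately with their own library() calls.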

References

  1. "Welcome to the Tidyverse". Revolutions. Retrieved 2018-11-26.
  2. "Tidyverse". www.tidyverse.org. Retrieved 2018-11-26.
  3. Bache, Stefan Milton; Wickham, Hadley (2014-11-22), magrittr: A Forward-Pipe Operator for R, retrieved 2020-04-20
  4. Wickham, Hadley. 4 Pipes | The tidyverse style guide.
  5. Wickham, Hadley (May 30, 2019). Advanced R (2nd ed.). New York: Chapman & Hall. ISBN 978-0815384571.
  6. "RDocumentation". www.rdocumentation.org. Retrieved 2018-11-26.
  7. Duggan, Jim (2018-09-07). "Input and output data analysis for system dynamics modelling using the tidyverse libraries of R". System Dynamics Review. 34 (3): 438–461. doi:10.1002/sdr.1600. hdl:10379/15029. ISSN 0883-7066. S2CID 70005357.
  8. Chang, Winston (2013). R Graphics Cookbook. O'Reilly Media, Inc. ISBN 9781449316952.
  9. Boehmke, Bradley C. (2016-11-17). Data wrangling with R. Cham. ISBN 9783319455990. OCLC 964404346.
  10. Wickham, Hadley; Grolemund, Garrett (2017). R for data science: import, tidy, transform, visualize, and model data (First ed.). Sebastopol, CA. ISBN 9781491910399. OCLC 968213225.
  11. Wickham, Hadley; Averick, Mara; Bryan, Jennifer; Chang, Winston; McGowan, Lucy D'Agostino; François, Romain; Grolemund, Garrett; Hayes, Alex; Henry, Lionel; Hester, Jim; Kuhn, Max; Pedersen, Thomas Lin; Miller, Evan; Bache, Stephan Milton; Müller, Kirill; Ooms, Jeroen; Robinson, David; Seidel, Dana Paige; Spinu, Vitalie; Takahashi, Kohske; Vaughan, Davis; Wilke, Claus; Woo, Kara; Yutani, Hiroaki (21 November 2019). "Welcome to the Tidyverse". Journal of Open Source Software. 4 (43): 1686. Bibcode:2019JOSS....4.1686W. doi:10.21105/joss.01686. S2CID 214002773.
  12. Steinmetz, Art (2024-04-10). "Outsider Data Science - The Truth About Tidy Wrappers". outsiderdata.netlify.app. Retrieved 2024-04-11.
  13. Heppler, Jason (2018-02-27). "Teaching the tidyverse to R novices". Medium. Retrieved 2023-08-24.
  14. "Teach the tidyverse to beginners". Variance Explained. 5 July 2017. Retrieved 2022-07-15.
  15. "Why pandas feels clunky when coming from R". Rasmus Bååth's Blog. Retrieved 2024-03-30.
  16. "dslc.io". dslc.io. Retrieved 2024-08-11.
  17. rfordatascience/tidytuesday, Data Science Learning Community, 2024-08-11, retrieved 2024-08-11
  18. Matloff, Norm (30 September 2019). "An opinionated view of the Tidyverse "dialect" of the R language". GitHub. Retrieved 28 October 2019.
  19. Muenchen, Bob (23 March 2017). "The Tidyverse Curse". r4stats.com.
  20. "The Power of Transitioning to a '-verse' Approach in R Package Development". www.appsilon.com. Retrieved 2024-08-11.
  21. "pharmaverse". pharmaverse.org. Retrieved 2024-08-11.
  22. "Tidyverse packages - Tidyverse". Retrieved 2018-11-26.
  23. "Tidyverse packages". www.tidyverse.org. Retrieved 2020-12-22.
  24. Silge, Julia (2023-02-01), tidytext: Text mining using tidy tools, retrieved 2023-02-03
  25. "Tidymodels". www.tidymodels.org. Retrieved 2023-02-03.
  26. "Tidy Quantitative Financial Analysis". business-science.github.io. Retrieved 2023-02-03.