R package


R packages are extensions to the R statistical programming language. R packages contain code, data, and documentation in a standardised collection format that can be installed by users of R, typically via a centralised software repository such as CRAN (the Comprehensive R Archive Network). [1] [2] The large number of packages available for R, and the ease of installing and using them, have been cited as major factors driving the widespread adoption of the language in data science. [3] [4] [5] [6]


Compared to libraries in other programming languages, R packages must conform to a relatively strict specification. [3] The Writing R Extensions manual [7] specifies a standard directory structure for R source code, data, documentation, and package metadata, which allows packages to be installed and loaded using R's built-in package management tools. [3] Packages distributed on CRAN must meet additional standards. [3] [8] According to John Chambers, whilst these requirements "impose considerable demands" on package developers, they improve the usability and long-term stability of packages for end users. [3]
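The standard layout can be sketched with a minimal, self-contained example that writes a toy source package to a temporary directory, installs it with `install.packages()` and loads it with `library()`. The package name `toypkg`, the function `square()` and all DESCRIPTION field values are hypothetical placeholders; a real package would normally also include `man/` documentation and would be built and checked with `R CMD build` and `R CMD check` before being submitted to CRAN.

```r
# Build, install and load a minimal source package entirely offline.
# "toypkg" and square() are hypothetical names used only for illustration.
src <- tempfile("src")
pkg_dir <- file.path(src, "toypkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE)

# DESCRIPTION: package metadata, written as "Field: value" lines
writeLines(c(
  "Package: toypkg",
  "Version: 0.0.1",
  "Title: A Toy Package",
  "Description: Illustrates the standard layout of an R source package.",
  "Author: A. Author",
  "Maintainer: A. Author <a.author@example.com>",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))

# NAMESPACE: declares which objects the package exports to its users
writeLines("export(square)", file.path(pkg_dir, "NAMESPACE"))

# R/: the package's R source code
writeLines("square <- function(x) x^2", file.path(pkg_dir, "R", "square.R"))

# Install into a temporary library with R's built-in tools, then load it
lib <- tempfile("lib")
dir.create(lib)
install.packages(pkg_dir, lib = lib, repos = NULL, type = "source")
library(toypkg, lib.loc = lib)

square(3)  # returns 9
```

Only `DESCRIPTION`, `NAMESPACE` and the `R/` directory are created here; the manual also describes optional directories such as `man/`, `data/`, `src/` and `tests/`.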

Repositories

Comprehensive R Archive Network (CRAN)

[Image: The Comprehensive R Archive Network (CRAN) homepage]

The Comprehensive R Archive Network (CRAN) is R's central software repository, supported by the R Foundation. [9] It contains an archive of the latest and previous versions of the R distribution, documentation, and contributed R packages. [10] It includes both source packages and pre-compiled binaries for Windows and macOS. [11] As of November 2020, more than 16,000 packages were available. [12] CRAN was created by Kurt Hornik and Friedrich Leisch in 1997, [13] [14] and its name parallels those of earlier package archives such as TeX's CTAN (released 1992) and Perl's CPAN (released 1995). [15] As of 2021, it was still maintained by Hornik and a team of volunteers. [9] The master site is located at the Vienna University of Economics and Business and is mirrored on servers around the world. [10]

[Image: Homepage for R CRAN Task Views]

The "Task Views" page (subject list) on the CRAN website [16] lists a wide range of tasks (in fields such as finance, genetics, high performance computing, machine learning, medical imaging, meta-analysis, social sciences and spatial statistics) for which R packages are available. Another way to browse CRAN packages is provided by Metacran, [17] which also maintains lists of featured, most downloaded, trending or most depended upon packages.

The number of CRAN packages has grown exponentially for many years, [18] and as of 2018 an average of 21 new or updated packages were submitted every day. [6] Each submission is reviewed manually by a small team of CRAN maintainers, many of whom, according to R core developer Peter Dalgaard, are "approaching pensionable age", which has raised concern that the system is not sustainable in the long term. [6] The growth of CRAN has also exposed limitations of its dependency management infrastructure. In particular, CRAN assumes that a dependency always refers to the latest version of a package, so new releases of CRAN packages must always be backwards compatible, [19] and CRAN packages cannot have dependencies that are not themselves on CRAN. [20] The growth has also led to concerns about declining package quality. [21]
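The backwards-compatibility constraint can be illustrated with a toy example using base R's `package_version()`. The package `somepkg` and its release history are hypothetical; the sketch only assumes, as described above, that a dependency declared with a minimum version is resolved against whichever single, latest version the repository currently offers.

```r
# A hypothetical requirement such as "Imports: somepkg (>= 1.0.0)" in a
# DESCRIPTION file sets only a lower bound on the dependency's version.
required <- package_version("1.0.0")

# Hypothetical releases of 'somepkg'; a CRAN-style index offers only the
# newest of them for installation.
releases <- package_version(c("1.0.0", "1.1.0", "2.0.0"))
offered <- max(releases)

# Every release satisfies the lower bound, so the newest one is installed
# even if it breaks code that was written against the 1.x releases.
satisfies <- releases >= required
print(offered)
print(satisfies)
```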

MRAN and Posit Package Manager

[Image: Homepage for the Microsoft R Application Network (MRAN)]

The Microsoft R Application Network (MRAN) was a mirror of CRAN maintained by Microsoft, based on the company's downstream distribution of R, Microsoft R Open (formerly Revolution R Open). [22] It also included an archive of daily CRAN snapshots, branded as the "CRAN Time Machine", which enabled MRAN users to bypass CRAN's dependency versioning limitations by installing a fixed set of R package versions via the checkpoint package. [23] [24] In January 2023, Microsoft announced that MRAN was being retired; the associated websites and repositories became unavailable in July 2023. [25]

[Image: Homepage for the Posit Package Manager]

The Posit Package Manager (formerly RStudio Package Manager) is a similar tool produced by Posit (formerly RStudio), the developer of the RStudio IDE. In addition to CRAN snapshots, it includes an archive of R packages from Bioconductor and Python packages from the Python Package Index. [26] It also distributes pre-compiled binary packages for Linux, whereas CRAN provides binaries only for Windows and macOS. [27]
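A dated snapshot is typically used by pointing R's `repos` option at it, after which `install.packages()` resolves packages against that frozen copy of CRAN. The sketch below assumes the date-based URL pattern of Posit's public package manager (`https://packagemanager.posit.co/cran/<date>`) and an arbitrary snapshot date; both are assumptions to verify against Posit's current documentation. It only sets the option and does not download anything.

```r
# Pin the CRAN repository to a dated snapshot (URL pattern is an assumption).
snapshot_date <- "2023-06-01"
snapshot_url <- paste0("https://packagemanager.posit.co/cran/", snapshot_date)
options(repos = c(CRAN = snapshot_url))

# Later installs in this session would resolve against the snapshot, e.g.
# install.packages("MASS")
getOption("repos")
```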

Other repositories

The Bioconductor project provides R packages for the analysis of genomic data. This includes object-oriented data-handling and analysis tools for data from Affymetrix, cDNA microarray, and next-generation high-throughput sequencing methods. [28]

[Image: Homepage for R-Forge]

R-Forge [29] is a central platform for the collaborative development of R packages, R-related software, and other projects. It also hosts many unpublished beta packages and development versions of CRAN packages.

R is distributed with fifteen "base packages": base, compiler, datasets, grDevices, graphics, grid, methods, parallel, splines, stats, stats4, tcltk, tools, translations, and utils. [30]

In addition, there are fifteen "recommended packages" from CRAN which are included with binary distributions of R: KernSmooth, MASS, Matrix, boot, class, cluster, codetools, foreign, lattice, mgcv, nlme, nnet, rpart, spatial, and survival. [30]
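The base and recommended packages present in a particular R installation can be listed with `installed.packages()`, whose `priority` argument filters on the `Priority` field of each package's DESCRIPTION file. A minimal sketch; note that some Linux distributions package the recommended packages separately, so the second list may be empty or incomplete on such systems.

```r
# Base packages ship with R itself; recommended packages are bundled with
# binary distributions. unique() guards against a package that is
# installed in more than one library.
base_pkgs <- unique(installed.packages(priority = "base")[, "Package"])
recommended_pkgs <- unique(installed.packages(priority = "recommended")[, "Package"])

print(sort(base_pkgs))
print(sort(recommended_pkgs))
```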

Other packages

A group of packages called the tidyverse, which can be considered a "dialect of the R language", is increasingly popular in the R ecosystem. As of 2020-06-13, Metacran [17] listed 7 of the 8 core tidyverse packages among the most downloaded R packages. The packages aim to provide a cohesive collection of functions for common data science tasks, including data import, cleaning, transformation and visualisation (notably with the ggplot2 package).

The R Infrastructure packages [31] support coding and the development of R packages; as of 2021-05-04, Metacran [17] listed 16 of them among the 25 most downloaded packages.


References

  1. Hornik, Kurt (2020-02-20). "Frequently Asked Questions on R". The Comprehensive R Archive Network, section 7.29 ("What is the difference between package and library?"). Archived from the original on 2011-07-09. Retrieved 2020-11-02.
  2. Wickham, Hadley; Bryan, Jennifer. "Introduction". R Packages (2nd ed.). Archived from the original on 2022-06-29. Retrieved 2020-11-02.
  3. Chambers, John M. (2020). "S, R, and Data Science". The R Journal. 12 (1): 462–476. doi:10.32614/RJ-2020-028. ISSN 2073-4859. Archived from the original on 2020-11-01. Retrieved 2020-11-02.
  4. Vance, Ashlee (2009-01-06). "Data Analysts Captivated by R's Power". New York Times. Archived from the original on 2021-05-02. Retrieved 2020-11-02.
  5. Tippmann, Sylvia (2014-12-29). "Programming tools: Adventures with R". Nature News. 517 (7532): 109–110. doi:10.1038/517109a. PMID 25557714.
  6. Thieme, Nick (2018). "R generation". Significance. 15 (4): 14–19. doi:10.1111/j.1740-9713.2018.01169.x. ISSN 1740-9713.
  7. "Writing R Extensions". The Comprehensive R Archive Network. Archived from the original on 2020-11-12. Retrieved 2020-11-02.
  8. "CRAN Repository Policy". The Comprehensive R Archive Network. Archived from the original on 2020-11-05. Retrieved 2020-11-02.
  9. CRAN Repository Maintainers. "CRAN Repository Policy". The Comprehensive R Archive Network. R Project. Archived from the original on 2020-11-11. Retrieved 2020-11-20.
  10. Hornik, Kurt (2020-02-20). "Frequently Asked Questions on R". The Comprehensive R Archive Network, section 2.1 ("What is CRAN?"). R Project. Archived from the original on 2011-07-09. Retrieved 2020-11-20.
  11. CRAN Repository Maintainers. "The Comprehensive R Archive Network". R Project. Archived from the original on 2019-01-23. Retrieved 2020-11-20.
  12. CRAN Repository Maintainers. "CRAN - Contributed Packages". The Comprehensive R Archive Network. CRAN. Archived from the original on 2020-11-24. Retrieved 2020-11-20.
  13. Hornik, Kurt (1997-04-23). "ANNOUNCE: CRAN". r-announce (Mailing list). Archived from the original on 2021-03-08. Retrieved 2020-11-20.
  14. Thieme, Nick (2018). "R generation". Significance. 15 (4): 14–19. doi:10.1111/j.1740-9713.2018.01169.x. ISSN 1740-9713.
  15. Fitzgerald, Brian (2016-02-09). "A Survey of Programming Language Package Systems". Some Things Are Obvious. Archived from the original on 2020-11-09. Retrieved 2021-05-04.
  16. "CRAN Task Views". cran.r-project.org. Archived from the original on 2011-07-09. Retrieved 2018-09-16.
  17. "Metacran". Archived from the original on 2021-04-20. Retrieved 2021-05-04.
  18. Asay, Matt (2016-04-21). "Exponential growth of R's open source community threatens commercial competitors". TechRepublic. Archived from the original on 2020-10-26. Retrieved 2020-11-02.
  19. Ooms, Jeroen (2013). "Possible Directions for Improving Dependency Versioning in R". The R Journal. 5 (1): 197–206. arXiv:1303.2140. doi:10.32614/RJ-2013-019. ISSN 2073-4859. S2CID 6791850. Archived from the original on 2020-09-19. Retrieved 2020-11-02.
  20. Decan, A.; Mens, T.; Claes, M.; Grosjean, P. (2016). "When GitHub Meets CRAN: An Analysis of Inter-Repository Package Dependency Problems". 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER). Vol. 1. pp. 493–504. doi:10.1109/SANER.2016.12. ISBN 978-1-5090-1855-0. S2CID 16751624. Archived from the original on 2021-01-16. Retrieved 2021-05-12.
  21. Hornik, Kurt (2012). "Are There Too Many R Packages?". Austrian Journal of Statistics. 41 (1): 59–66. doi:10.17713/ajs.v41i1.188. ISSN 1026-597X. Archived from the original on 2020-11-26. Retrieved 2020-11-02.
  22. "Welcome to MRAN". Microsoft R Application Network. Microsoft. Archived from the original on 2021-05-04. Retrieved 2021-05-04.
  23. "Reproducibility: Using Fixed CRAN Repository Snapshots". Microsoft R Application Network. Microsoft. Archived from the original on 2021-05-02. Retrieved 2021-05-04.
  24. Smith, David (2019-05-22). "MRAN snapshots, and you". Revolutions. Revolution Analytics. Archived from the original on 2021-05-04. Retrieved 2021-05-04.
  25. "Microsoft R Application Network retirement". techcommunity.microsoft.com. Retrieved 2023-11-15.
  26. Lopp, Sean (2020-12-07). "RStudio Package Manager 1.2.0 - Bioconductor & PyPI". RStudio Blog. RStudio. Archived from the original on 2021-05-04. Retrieved 2021-05-04.
  27. Lopp, Sean (2020-07-01). "Announcing Public Package Manager and v1.1.6". RStudio Blog. RStudio. Archived from the original on 2021-05-04. Retrieved 2021-05-04.
  28. Huber, W; Carey, VJ; Gentleman, R; Anders, S; Carlson, M; Carvalho, BS; Bravo, HC; Davis, S; Gatto, L; Girke, T; Gottardo, R; Hahne, F; Hansen, KD; Irizarry, RA; Lawrence, M; Love, MI; MacDonald, J; Obenchain, V; Oleś, AK; Pagès, H; Reyes, A; Shannon, P; Smyth, GK; Tenenbaum, D; Waldron, L; Morgan, M (2015). "Orchestrating high-throughput genomic analysis with Bioconductor". Nature Methods. 12 (2). Nature Publishing Group: 115–121. doi:10.1038/nmeth.3252. PMC 4509590. PMID 25633503.
  29. "R-Forge: Welcome". Archived from the original on 2018-09-14. Retrieved 2018-09-16.
  30. Hornik, Kurt (2020-02-20). "Frequently Asked Questions on R". The Comprehensive R Archive Network, section 5.1 ("Which add-on packages exist for R?"). Archived from the original on 2011-07-09. Retrieved 2020-11-02.
  31. "R infrastructure". GitHub. Archived from the original on 2021-05-19. Retrieved 2021-05-04.

Further reading