![]() R terminal | |
Paradigms | Multi-paradigm: procedural, object-oriented, functional, reflective, imperative, array [1] |
---|---|
Designed by | Ross Ihaka and Robert Gentleman |
Developer | R Core Team |
First appeared | August 1993 |
Stable release | |
Typing discipline | Dynamic |
Platform | arm64 and x86-64 |
License | GPL-2.0-or-later [3] |
Filename extensions | |
Website | r-project.org |
Influenced by | |
Influenced | Julia [7], pandas [8] |
R is a programming language for statistical computing and data visualization. It has been adopted in the fields of data mining, bioinformatics and data analysis. [9]
The core R language is augmented by a large number of extension packages, containing reusable code, documentation, and sample data.
R software is open-source and free software. It is licensed by the GNU Project and available under the GNU General Public License. [3] It is written primarily in C, Fortran, and R itself. Precompiled executables are provided for various operating systems.
As an interpreted language, R has a native command line interface. Moreover, multiple third-party graphical user interfaces are available, such as RStudio—an integrated development environment—and Jupyter—a notebook interface.
R was started by professors Ross Ihaka and Robert Gentleman as a programming language to teach introductory statistics at the University of Auckland. [10] The language was inspired by the S programming language, with most S programs able to run unaltered in R. [6] The language was also inspired by Scheme's lexical scoping, allowing for local variables. [1]
The name of the language, R, reflects both its status as a successor to the S language and the shared first letter of its authors' names, Ross and Robert. [11] In August 1993, Ihaka and Gentleman posted a binary of R on StatLib, a data archive website. [12] At the same time, they announced the posting on the s-news mailing list. [13] On December 5, 1997, R became a GNU project when version 0.60 was released. [14] On February 29, 2000, version 1.0 was released. [15]
R packages are collections of functions, documentation, and data that expand R. [16] For example, packages add report features such as RMarkdown, Quarto, [17] knitr and Sweave. Packages also add the capability to implement various statistical techniques such as linear, generalized linear and nonlinear modeling, classical statistical tests, spatial analysis, time-series analysis, and clustering. Easy package installation and use have contributed to the language's adoption in data science. [18]
Base packages are immediately available when starting R and provide the necessary syntax and commands for programming, computing, graphics production, basic arithmetic, and statistical functionality. [19]
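As a rough illustration of the paragraph above, the base packages of an installation can be listed from within R. This is a minimal sketch; the variable names `base_pkgs` and `attached` are chosen here for illustration, and the exact output depends on the R version in use.

```r
# Packages with priority "base" ship with every R installation.
base_pkgs <- rownames(installed.packages(priority = "base"))
print(sort(base_pkgs))

# Only some of them are attached to the search path when R starts.
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))
print(intersect(base_pkgs, attached))
```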
The Comprehensive R Archive Network (CRAN) was founded in 1997 by Kurt Hornik and Friedrich Leisch to host R's source code, executable files, documentation, and user-created packages. [20] Its name and scope mimic the Comprehensive TeX Archive Network and the Comprehensive Perl Archive Network. [20] CRAN originally had three mirrors and 12 contributed packages. [21] As of 16 October 2024, it has 99 mirrors [22] and 21,513 contributed packages. [23] Packages are also available on repositories R-Forge, Omegahat, and GitHub. [24] [25] [26]
The Task Views on the CRAN web site list packages in fields such as causal inference, finance, genetics, high-performance computing, machine learning, medical imaging, meta-analysis, social sciences, and spatial statistics.
The Bioconductor project provides packages for genomic data analysis, complementary DNA, microarray, and high-throughput sequencing methods.
The tidyverse package bundles several subsidiary packages that provide a common interface for tasks related to accessing and processing "tidy data", [27] data contained in a two-dimensional table with a single row for each observation and a single column for each variable. [28]
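The "tidy data" layout described above can be sketched with base R alone. The data frame `wide` and its values are made up for illustration; the point is only that the tidy version has one row per observation and one column per variable.

```r
# Hypothetical "wide" table: one column per year, so each row holds two observations.
wide <- data.frame(
  country = c("A", "B"),
  y2020   = c(10, 20),
  y2021   = c(11, 21)
)

# Tidy layout: one row per (country, year) observation, one column per variable.
tidy <- data.frame(
  country = rep(wide$country, times = 2),
  year    = rep(c(2020, 2021), each = nrow(wide)),
  value   = c(wide$y2020, wide$y2021)
)
print(tidy)
```

The tidyverse packages provide dedicated functions for this kind of reshaping; the manual construction here only shows the target shape.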
Installing a package occurs only once. For example, to install the tidyverse package: [28]
> install.packages("tidyverse")
To load the functions, data, and documentation of a package, one executes the library() function. To load tidyverse: [a]
> # Package name can be enclosed in quotes
> library("tidyverse")
> # But also the package name can be called without quotes
> library(tidyverse)
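Because installation is needed only once while loading happens in every session, scripts often guard the install step. The helper below is a hypothetical sketch (`ensure_package` is not part of R); it relies on `requireNamespace()` and `install.packages()`, and assumes a CRAN mirror is reachable if an installation is actually required.

```r
# Hypothetical helper: install a package only when it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call does not trigger an installation.
ensure_package("stats")
```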
R comes installed with a command line console. Various integrated development environments (IDEs) are also available for installation. IDEs for R include R.app [29] (OSX/macOS only), Rattle GUI, R Commander, RKWard, RStudio, and Tinn-R. [30]
General purpose IDEs that support R include Eclipse via the StatET plugin and Visual Studio via R Tools for Visual Studio.
Editors that support R include Emacs, Vim via the Nvim-R plugin, Kate, LyX via Sweave, WinEdt (website), and Jupyter (website).
Scripting languages that support R include Python (website), Perl (website), Ruby (source code), F# (website), and Julia (source code).
General purpose programming languages that support R include Java via the Rserve socket server, and .NET C# (website).
Statistical frameworks which use R in the background include Jamovi and JASP.
The R Core Team was founded in 1997 to maintain the R source code. The R Foundation for Statistical Computing was founded in April 2003 to provide financial support. The R Consortium is a Linux Foundation project to develop R infrastructure.
The R Journal is an open access, academic journal which features short to medium-length articles on the use and development of R. It includes articles on packages, programming tips, CRAN news, and foundation news.
The R community hosts many conferences and in-person meetups - see the community maintained GitHub list. These groups include:
The main R implementation is written primarily in C, Fortran, and R itself. Other implementations include:
Microsoft R Open (MRO) was an R implementation. As of 30 June 2021, Microsoft started to phase out MRO in favor of the CRAN distribution. [33]
Although R is an open-source project, some companies provide commercial support:
> print("Hello, World!")
[1] "Hello, World!"
The following examples illustrate the basic syntax of the language and use of the command-line interface. (An expanded list of standard language features can be found in the R manual, "An Introduction to R". [34] )
In R, the generally preferred assignment operator is an arrow made from two characters <-, although = can be used in some cases. [35]
> x <- 1:6  # Create a numeric vector in the current environment
> y <- x^2  # Create vector based on the values in x.
> print(y)  # Print the vector’s contents.
[1]  1  4  9 16 25 36
> z <- x + y  # Create a new vector that is the sum of x and y
> z  # Print the contents of z to the console.
[1]  2  6 12 20 30 42
> z_matrix <- matrix(z, nrow = 3)  # Create a new matrix that turns the vector z into a 3x2 matrix object
> z_matrix
     [,1] [,2]
[1,]    2   20
[2,]    6   30
[3,]   12   42
> 2 * t(z_matrix) - 2  # Transpose the matrix, multiply every element by 2, subtract 2 from each element in the matrix, and return the results to the terminal.
     [,1] [,2] [,3]
[1,]    2   10   22
[2,]   38   58   82
> new_df <- data.frame(t(z_matrix), row.names = c("A", "B"))  # Create a new data.frame object that contains the data from a transposed z_matrix, with row names 'A' and 'B'
> names(new_df) <- c("X", "Y", "Z")  # Set the column names of new_df as X, Y, and Z.
> print(new_df)  # Print the current results.
   X  Y  Z
A  2  6 12
B 20 30 42
> new_df$Z  # Output the Z column
[1] 12 42
> all(new_df$Z == new_df['Z']) && all(new_df[3] == new_df$Z)  # The data.frame column Z can be accessed using $Z, ['Z'], or [3] syntax and the values are the same. all() reduces each comparison to a single value, which && requires.
[1] TRUE
> attributes(new_df)  # Print attributes information about the new_df object
$names
[1] "X" "Y" "Z"
$row.names
[1] "A" "B"
$class
[1] "data.frame"
> attributes(new_df)$row.names <- c("one", "two")  # Access and then change the row.names attribute; can also be done using rownames()
> new_df
     X  Y  Z
one  2  6 12
two 20 30 42
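One case where the two assignment operators differ is inside a function call. The sketch below is illustrative only (`assignment_demo` and its variable names are made up here): within a call, = names an argument, whereas <- performs an assignment and then passes the value.

```r
assignment_demo <- function() {
  # Inside a call, `=` only names the argument: no variable `x` is created.
  m1 <- mean(x = c(4, 5, 6))
  x_created <- exists("x", envir = environment(), inherits = FALSE)

  # Inside a call, `<-` first assigns `v` in this function's environment,
  # then passes the value on as the argument.
  m2 <- mean(v <- c(1, 2, 3))
  v_created <- exists("v", envir = environment(), inherits = FALSE)

  list(m1 = m1, x_created = x_created, m2 = m2, v_created = v_created)
}

str(assignment_demo())
```

At the top level of a script the two operators behave the same way, which is why the difference is easy to overlook.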
One of R's strengths is the ease of creating new functions. [36] Objects in the function body remain local to the function, and any data type may be returned. In R, almost all functions and all user-defined functions are closures. [37]
Create a function:
# The input parameters are x and y.
# The function returns a linear combination of x and y.
f <- function(x, y) {
  z <- 3 * x + 4 * y
  # an explicit return() statement is optional, could be replaced with simply `z`
  return(z)
}
Usage output:
> f(1, 2)
[1] 11
> f(c(1, 2, 3), c(5, 3, 4))
[1] 23 18 25
> f(1:3, 4)
[1] 19 22 25
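The statement that user-defined functions are closures can be illustrated with a small counter. This is a sketch with hypothetical names (`make_counter`, `counter_a`, `counter_b`): each returned function keeps access to the environment in which it was created, so `count` persists between calls but stays private to each counter.

```r
make_counter <- function() {
  count <- 0              # local to this particular call of make_counter()
  function() {
    count <<- count + 1   # `<<-` updates `count` in the enclosing environment
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

counter_a()  # first call on counter_a
counter_a()  # second call on counter_a
counter_b()  # counter_b has its own, independent `count`
```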
It is possible to define functions to be used as infix operators with the special syntax `%name%` where "name" is the function variable name:
> `%sumx2y2%` <- function(e1, e2) {e1 ^ 2 + e2 ^ 2}
> 1:3 %sumx2y2% -(1:3)
[1]  2  8 18
Since version 4.1.0, functions can be written in a short notation, which is useful for passing anonymous functions to higher-order functions: [38]
> sapply(1:5, \(i) i^2)  # here \(i) is the same as function(i)
[1]  1  4  9 16 25
In R version 4.1.0, a native pipe operator, |>, was introduced. [39] This operator allows users to chain functions together one after another, instead of nesting function calls.
> nrow(subset(mtcars, cyl == 4))  # Nested without the pipe character
[1] 11
> mtcars |> subset(cyl == 4) |> nrow()  # Using the pipe character
[1] 11
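The equivalence of the two forms can be checked by inspecting the parsed expressions. The sketch below assumes R 4.1.0 or later, where the native pipe is handled by the parser; `piped` and `nested` are illustrative names.

```r
# quote() captures each expression without evaluating it.
piped  <- quote(mtcars |> subset(cyl == 4) |> nrow())
nested <- quote(nrow(subset(mtcars, cyl == 4)))

print(piped)               # shows the expression as R parsed it
identical(piped, nested)   # compares the two parsed calls
```

Because the pipe is rewritten at parse time, it adds no function-call overhead of its own.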
Another alternative to nested functions, in contrast to using the pipe character, is using intermediate objects:
> mtcars_subset_rows <- subset(mtcars, cyl == 4)
> num_mtcars_subset <- nrow(mtcars_subset_rows)
> print(num_mtcars_subset)
[1] 11
While the pipe operator can produce code that is easier to read, it has been advised to pipe together at most 10 to 15 lines and chunk code into sub-tasks which are saved into objects with meaningful names. [40] Here is an example with fewer than 10 lines that some readers may still struggle to grasp without intermediate named steps:
(\(x, n = 42, key = c(letters, LETTERS, " ", ":", ")"))
  strsplit(x, "")[[1]] |>
    (Vectorize(\(chr) which(chr == key) - 1))() |>
    (`+`)(n) |>
    (`%%`)(length(key)) |>
    (\(i) key[i + 1])() |>
    paste(collapse = "")
)("duvFkvFksnvEyLkHAErnqnoyr")
The R language has native support for object-oriented programming. There are two native frameworks, the so-called S3 and S4 systems. The former, being more informal, supports single dispatch on the first argument, and objects are assigned to a class simply by setting a "class" attribute on each object. The latter is a Common Lisp Object System (CLOS)-like system of formal classes (also derived from S) and generic methods that supports multiple dispatch and multiple inheritance. [41]
In the example, summary is a generic function that dispatches to different methods depending on whether its argument is a character vector or a "factor":
> data <- c("a", "b", "c", "a", NA)
> summary(data)
   Length     Class      Mode
        5 character character
> summary(as.factor(data))
   a    b    c NA's
   2    1    1    1
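User-defined classes work the same way. The following is a minimal sketch of both systems; every name in it (`new_circle`, `shape_area`, `Celsius`, `Fahrenheit`, `warmer`) is hypothetical and chosen only for illustration. The S3 part dispatches on the class attribute of the first argument, while the S4 part selects a method from the classes of both arguments.

```r
library(methods)  # usually attached already; loaded explicitly for clarity

# S3: the class is just an attribute, and dispatch uses the first argument.
new_circle <- function(r) structure(list(r = r), class = "circle")
shape_area <- function(shape, ...) UseMethod("shape_area")
shape_area.circle  <- function(shape, ...) pi * shape$r^2
shape_area.default <- function(shape, ...) stop("no shape_area() method for this class")

shape_area(new_circle(2))

# S4: formal class definitions, and dispatch can depend on several arguments.
setClass("Celsius", representation(value = "numeric"))
setClass("Fahrenheit", representation(value = "numeric"))

setGeneric("warmer", function(a, b) standardGeneric("warmer"))
setMethod("warmer", signature("Celsius", "Celsius"),
          function(a, b) a@value > b@value)
setMethod("warmer", signature("Celsius", "Fahrenheit"),
          function(a, b) a@value > (b@value - 32) * 5 / 9)

warmer(new("Celsius", value = 30), new("Fahrenheit", value = 50))
```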
The R language has built-in support for data modeling and graphics. The following example shows how R can generate and plot a linear model with residuals.
# Create x and y values
x <- 1:6
y <- x^2
# Linear regression model y = A + B * x
model <- lm(y ~ x)
# Display an in-depth summary of the model
summary(model)
# Create a 2 by 2 layout for figures
par(mfrow = c(2, 2))
# Output diagnostic plots of the model
plot(model)
Output:
Residuals:
      1       2       3       4       5       6
 3.3333 -0.6667 -2.6667 -2.6667 -0.6667  3.3333
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -9.3333     2.8441  -3.282 0.030453 *
x             7.0000     0.7303   9.585 0.000662 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared:  0.9583, Adjusted R-squared:  0.9478
F-statistic: 91.88 on 1 and 4 DF,  p-value: 0.000662
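The fitted values in this summary can also be extracted programmatically and cross-checked against the closed-form least-squares formulas. This is a small sketch using the same x and y as above; `slope`, `intercept`, and `r2` are illustrative names.

```r
x <- 1:6
y <- x^2
model <- lm(y ~ x)

# Coefficients as a named numeric vector: (Intercept) and x.
coefs <- coef(model)
print(coefs)

# Closed-form simple linear regression for comparison.
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# Proportion of variance explained, as reported by summary().
r2 <- summary(model)$r.squared

all.equal(unname(coefs), c(intercept, slope))
```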
This Mandelbrot set example highlights the use of complex numbers. It models the first 20 iterations of the equation z = z^2 + c, where c represents different complex constants.
Install the package that provides the write.gif() function beforehand:
install.packages("caTools")
R Source code:
library(caTools)
jet.colors <- colorRampPalette(
  c("green", "pink", "#007FFF", "cyan", "#7FFF7F",
    "white", "#FF7F00", "red", "#7F0000"))
dx <- 1500 # define width
dy <- 1400 # define height
C <- complex(
  real = rep(seq(-2.2, 1.0, length.out = dx), each = dy),
  imag = rep(seq(-1.2, 1.2, length.out = dy), times = dx)
)
# reshape as matrix of complex numbers
C <- matrix(C, dy, dx)
# initialize output 3D array
X <- array(0, c(dy, dx, 20))
Z <- 0
# loop with 20 iterations
for (k in 1:20) {
  # the central difference equation
  Z <- Z^2 + C
  # capture the results
  X[, , k] <- exp(-abs(Z))
}
write.gif(X, "Mandelbrot.gif", col = jet.colors, delay = 100)
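The same iteration can be sketched for a single constant with base R only, which makes the role of complex arithmetic easier to see. The function name `mandelbrot_escapes` and the escape radius of 2 are assumptions made for this illustration; the code above instead records exp(-abs(Z)) for every grid point.

```r
# Hypothetical helper: does z = z^2 + c leave the disk of radius 2
# within `max_iter` iterations, starting from z = 0?
mandelbrot_escapes <- function(c, max_iter = 20) {
  z <- 0 + 0i
  for (k in seq_len(max_iter)) {
    z <- z^2 + c
    if (Mod(z) > 2) {
      return(TRUE)   # the orbit has escaped; c is outside the set
    }
  }
  FALSE              # still bounded after max_iter steps
}

mandelbrot_escapes(1 + 0i)    # orbit 1, 2, 5, ... grows quickly
mandelbrot_escapes(-1 + 0i)   # orbit alternates between -1 and 0
```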
All R version releases from 2.14.0 onward have codenames that make reference to Peanuts comics and films. [42] [43] [44]
In 2018, core R developer Peter Dalgaard presented a history of R releases since 1997. [45] Some notable early releases before the named releases include:
The idea of naming R version releases was inspired by the Debian and Ubuntu version naming system. Dalgaard also noted that another reason for using Peanuts references as R codenames is that "everyone in statistics is a P-nut". [45]
Version | Release date | Name | Peanuts reference | Reference |
---|---|---|---|---|
4.4.2 | 2024-10-31 | Pile of Leaves | [46] | [47] |
4.4.1 | 2024-06-14 | Race for Your Life | [48] | [49] |
4.4.0 | 2024-04-24 | Puppy Cup | [50] | [51] |
4.3.3 | 2024-02-29 | Angel Food Cake | [52] | [53] |
4.3.2 | 2023-10-31 | Eye Holes | [54] | [55] |
4.3.1 | 2023-06-16 | Beagle Scouts | [56] | [57] |
4.3.0 | 2023-04-21 | Already Tomorrow | [58] [59] [60] | [61] |
4.2.3 | 2023-03-15 | Shortstop Beagle | [62] | [63] |
4.2.2 | 2022-10-31 | Innocent and Trusting | [64] | [65] |
4.2.1 | 2022-06-23 | Funny-Looking Kid | [66] [67] [68] [69] [70] [71] | [72] |
4.2.0 | 2022-04-22 | Vigorous Calisthenics | [73] | [74] |
4.1.3 | 2022-03-10 | One Push-Up | [73] | [75] |
4.1.2 | 2021-11-01 | Bird Hippie | [76] [77] | [75] |
4.1.1 | 2021-08-10 | Kick Things | [78] | [79] |
4.1.0 | 2021-05-18 | Camp Pontanezen | [80] | [81] |
4.0.5 | 2021-03-31 | Shake and Throw | [82] | [83] |
4.0.4 | 2021-02-15 | Lost Library Book | [84] [85] [86] | [87] |
4.0.3 | 2020-10-10 | Bunny-Wunnies Freak Out | [88] | [89] |
4.0.2 | 2020-06-22 | Taking Off Again | [90] | [91] |
4.0.1 | 2020-06-06 | See Things Now | [92] | [93] |
4.0.0 | 2020-04-24 | Arbor Day | [94] | [95] |
3.6.3 | 2020-02-29 | Holding the Windsock | [96] | [97] |
3.6.2 | 2019-12-12 | Dark and Stormy Night | See It was a dark and stormy night [98] | [99] |
3.6.1 | 2019-07-05 | Action of the Toes | [100] | [101] |
3.6.0 | 2019-04-26 | Planting of a Tree | [102] | [103] |
3.5.3 | 2019-03-11 | Great Truth | [104] | [105] |
3.5.2 | 2018-12-20 | Eggshell Igloos | [106] | [107] |
3.5.1 | 2018-07-02 | Feather Spray | [108] | [109] |
3.5.0 | 2018-04-23 | Joy in Playing | [110] | [111] |
3.4.4 | 2018-03-15 | Someone to Lean On | [112] | [113] |
3.4.3 | 2017-11-30 | Kite-Eating Tree | See Kite-Eating Tree [114] | [115] |
3.4.2 | 2017-09-28 | Short Summer | See It Was a Short Summer, Charlie Brown | [116] |
3.4.1 | 2017-06-30 | Single Candle | [117] | [118] |
3.4.0 | 2017-04-21 | You Stupid Darkness | [117] | [119] |
3.3.3 | 2017-03-06 | Another Canoe | [120] | [121] |
3.3.2 | 2016-10-31 | Sincere Pumpkin Patch | [122] | [123] |
3.3.1 | 2016-06-21 | Bug in Your Hair | [124] | [125] |
3.3.0 | 2016-05-03 | Supposedly Educational | [126] | [127] |
3.2.5 | 2016-04-11 | Very, Very Secure Dishes | [128] | [129] [130] [131] |
3.2.4 | 2016-03-11 | Very Secure Dishes | [128] | [132] |
3.2.3 | 2015-12-10 | Wooden Christmas-Tree | See A Charlie Brown Christmas [133] | [134] |
3.2.2 | 2015-08-14 | Fire Safety | [135] [136] | [137] |
3.2.1 | 2015-06-18 | World-Famous Astronaut | [138] | [139] |
3.2.0 | 2015-04-16 | Full of Ingredients | [140] | [141] |
3.1.3 | 2015-03-09 | Smooth Sidewalk | [142] | [143] |
3.1.2 | 2014-10-31 | Pumpkin Helmet | See You're a Good Sport, Charlie Brown | [144] |
3.1.1 | 2014-07-10 | Sock it to Me | [145] [146] [147] [148] | [149] |
3.1.0 | 2014-04-10 | Spring Dance | [100] | [150] |
3.0.3 | 2014-03-06 | Warm Puppy | [151] | [152] |
3.0.2 | 2013-09-25 | Frisbee Sailing | [153] | [154] |
3.0.1 | 2013-05-16 | Good Sport | [155] | [156] |
3.0.0 | 2013-04-03 | Masked Marvel | [157] | [158] |
2.15.3 | 2013-03-01 | Security Blanket | [159] | [160] |
2.15.2 | 2012-10-26 | Trick or Treat | [161] | [162] |
2.15.1 | 2012-06-22 | Roasted Marshmallows | [163] | [164] |
2.15.0 | 2012-03-30 | Easter Beagle | [165] | [166] |
2.14.2 | 2012-02-29 | Gift-Getting Season | See It's the Easter Beagle, Charlie Brown [167] | [168] |
2.14.1 | 2011-12-22 | December Snowflakes | [169] | [170] |
2.14.0 | 2011-10-31 | Great Pumpkin | See It's the Great Pumpkin, Charlie Brown [171] | [172] |
r-devel | N/A | Unsuffered Consequences | [173] | [45] |
We set a goal of developing enough of a language to teach introductory statistics courses at Auckland.
The R language and related software play a major role in computing for data science. ... R packages provide tools for a wide range of purposes and users.