Dplyr

Original authors	Hadley Wickham, Romain François, Lionel Henry, Kirill Müller, Davis Vaughan
Initial release	January 7, 2014; 11 years ago
Stable release	1.1.4 / 17 November 2023; 23 months ago
Repository	github.com/tidyverse/dplyr
Written in	R
License	MIT License
Website	dplyr.tidyverse.org

Last updated November 02, 2025

dplyr is an R package whose functions are designed to make manipulating a dataframe (a spreadsheet-like data structure) intuitive and user-friendly. It is one of the core packages of the popular tidyverse set of packages in the R programming language.^[2] Data analysts typically use dplyr to transform existing datasets into a format better suited to a particular type of analysis or data visualization.^[3]^[4]

For instance, someone analyzing a large dataset may wish to view only a smaller subset of the data. Alternatively, a user may wish to rearrange the data to see the rows ranked by some numerical value, or by a combination of values from the original dataset. Functions in the dplyr package allow a user to perform such tasks.

dplyr was launched in 2014.^[5] On the dplyr web page, the package is described as "a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges."^[6]

The five core verbs

While dplyr includes several dozen functions that enable various forms of data manipulation, the package features five primary verbs, or actions:^[7]

filter(), which is used to extract rows from a dataframe, based on conditions specified by a user;
select(), which is used to subset a dataframe by its columns;
arrange(), which is used to sort rows in a dataframe based on attributes held by particular columns;
mutate(), which is used to create new variables, by altering and/or combining values from existing columns; and
summarize(), also spelled summarise(), which is used to collapse values from a dataframe into a single summary.
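
The five verbs above can be sketched on a small dataframe. This is a minimal illustration, not taken from the dplyr documentation: the scores dataframe and its column names are hypothetical and were invented for this example, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
scores <- data.frame(
  name   = c("Ana", "Ben", "Cy", "Dee"),
  team   = c("red", "blue", "red", "blue"),
  points = c(12, 7, 9, 15),
  games  = c(3, 2, 3, 5),
  stringsAsFactors = FALSE
)

# filter(): keep only the rows that satisfy a condition
high <- filter(scores, points > 8)

# select(): keep only the named columns
cols <- select(scores, name, points)

# arrange(): sort rows, here from most to fewest points
sorted <- arrange(scores, desc(points))

# mutate(): add a new column computed from existing ones
rated <- mutate(scores, per_game = points / games)

# summarise(): collapse all rows into a single summary row
total <- summarise(scores, mean_points = mean(points))
```

Each verb takes a dataframe as its first argument and returns a new dataframe, leaving the input unchanged, which is why the results are assigned to new names here.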

Additional functions

In addition to its five main verbs, dplyr also includes several other functions that enable exploration and manipulation of dataframes. Included among these are:

count(), which is used to count the number of observations (rows) for each unique value, or combination of values, of one or more variables;
rename(), which enables a user to alter the column names for variables, often to improve ease of use and intuitive understanding of a dataset;
slice_max(), which returns a data subset that contains the rows with the highest values of some particular variable;
slice_min(), which returns a data subset that contains the rows with the lowest values of some particular variable.
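
A brief sketch of these four functions follows. As before, the scores dataframe is hypothetical example data, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
scores <- data.frame(
  name   = c("Ana", "Ben", "Cy", "Dee"),
  team   = c("red", "blue", "red", "blue"),
  points = c(12, 7, 9, 15),
  stringsAsFactors = FALSE
)

# count(): one row per distinct team, with the row count in column n
per_team <- count(scores, team)

# rename(): the syntax is new_name = old_name
renamed <- rename(scores, player = name)

# slice_max(): the two rows with the largest points values
top2 <- slice_max(scores, points, n = 2)

# slice_min(): the single row with the smallest points value
low1 <- slice_min(scores, points, n = 1)
```

Note that slice_max() and slice_min() may return more rows than n when values are tied, unless with_ties = FALSE is passed.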

Built-in datasets

The dplyr package comes with five datasets. These are: band_instruments, band_instruments2, band_members, starwars, storms.
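
The bundled datasets become available as soon as the package is attached, so they are convenient for trying out the verbs. The following sketch assumes dplyr is installed and that starwars contains the columns name, height, and mass; the variable name tallest is chosen only for this example.

```r
library(dplyr)

# The datasets can be referred to by name once dplyr is attached
head(band_members)
dim(storms)

# A short pipeline on starwars: keep three columns, drop rows with
# an unknown height, then sort from tallest to shortest
tallest <- starwars %>%
  select(name, height, mass) %>%
  filter(!is.na(height)) %>%
  arrange(desc(height))

head(tallest, 3)
```

The %>% pipe operator, which dplyr re-exports, passes the result of each step as the first argument of the next, so the chain reads top to bottom in the order the operations are applied.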

Copyright & license

The copyright to dplyr is held by Posit PBC, formerly RStudio PBC. dplyr was originally released under a GPL license^[citation needed], but in 2022, Posit changed the license terms for the package to the "more permissive" MIT License.^[8] The main difference between the two types of license is that the MIT license allows code to be re-used within proprietary software, whereas a GPL license requires derivative works to be distributed under the GPL as well.

References

[1] "Release 1.1.4". 17 November 2023. Retrieved 16 July 2025.
[2] Wickham, Hadley; Averick, Mara; Bryan, Jennifer; Chang, Winston; McGowan, Lucy D'Agostino; François, Romain; Grolemund, Garrett; Hayes, Alex; Henry, Lionel; Hester, Jim; Kuhn, Max; Pedersen, Thomas Lin; Miller, Evan; Bache, Stephan Milton; Müller, Kirill (2019-11-21). "Welcome to the Tidyverse". Journal of Open Source Software. 4 (43): 1686. doi:10.21105/joss.01686. ISSN 2475-9066.
[3] Yadav, Rohit (2019-10-29). "Python's Pandas vs R's Tidyverse: Who Comes Out On Top?". Analytics India Magazine. Retrieved 2021-02-06.
[4] Krill, Paul (2015-06-30). "Why R? The pros and cons of the R language". InfoWorld. Retrieved 2021-02-06.
[5] "Introducing dplyr". blog.rstudio.com. 17 January 2014. Retrieved 2020-09-02.
[6] "Function reference". dplyr.tidyverse.org. Retrieved 2021-02-06.
[7] Grolemund, Garrett; Wickham, Hadley. "5 Data transformation". R for Data Science.
[8] "A Grammar of Data Manipulation". tidyverse.org. Retrieved 2023-01-14.

