Software repository

A software repository, or repo for short, is a storage location for software packages. Often a table of contents is also stored, along with metadata. A software repository is typically managed by source or version control, or by repository managers. Package managers automate installing and updating the software held in repositories, which is usually distributed as "packages".

Overview

Many software publishers and other organizations maintain servers on the Internet for this purpose, either free of charge or for a subscription fee. Repositories may be solely for particular programs, such as CPAN for the Perl programming language, or for an entire operating system. Operators of such repositories typically provide a package management system: tools intended to search for, install and otherwise manipulate software packages from the repositories. For example, many Linux distributions use the Advanced Packaging Tool (APT), commonly found in Debian-based distributions, or Yellowdog Updater, Modified (yum), found in Red Hat-based distributions. There are also multiple independent package management systems, such as pacman, used in Arch Linux, and equo, found in Sabayon Linux.

Example of a signed repository key (with ZYpp on openSUSE)

As software repositories are designed to include useful packages, major repositories are designed to be malware free. If a computer is configured to use a digitally signed repository from a reputable vendor, and is coupled with an appropriate permissions system, this significantly reduces the threat of malware to these systems. As a side effect, many systems that have these abilities do not need anti-malware software such as antivirus software. [1]
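The integrity check that underlies this protection can be sketched in a few lines. The example below is a simplified illustration, not the implementation of any particular package manager: it assumes a hypothetical trusted index that maps file names to SHA-256 digests, and it omits the step in which the index itself is authenticated by a digital signature.

```python
import hashlib
import hmac


def make_index(files: dict[str, bytes]) -> dict[str, str]:
    """Build a hypothetical repository index: file name -> SHA-256 hex digest.

    In a real signed repository this index would itself be signature-checked.
    """
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def verify_package(name: str, data: bytes, index: dict[str, str]) -> bool:
    """Return True only if the downloaded bytes match the indexed digest."""
    expected = index.get(name)
    if expected is None:
        return False  # unknown package: refuse to install
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest performs a constant-time comparison
    return hmac.compare_digest(actual, expected)


index = make_index({"hello_1.0.pkg": b"original package contents"})
ok = verify_package("hello_1.0.pkg", b"original package contents", index)
tampered = verify_package("hello_1.0.pkg", b"malicious package contents", index)
```

Here `ok` is true and `tampered` is false: a file altered after the index was published no longer matches its recorded digest, so the client can reject it before installation.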

Most major Linux distributions have many repositories around the world that mirror the main repository.

In an enterprise environment, a software repository is usually used to store artifacts, or to mirror external repositories which may be inaccessible due to security restrictions. Such repositories may provide additional functionality, such as access control, versioning, security checks for uploaded software, and cluster functionality. They typically support a variety of formats in one package, so as to cater for all the needs of an enterprise and provide a single point of truth. Popular examples are JFrog Artifactory, [2] [3] Nexus Repository [4] and Cloudsmith, [5] a cloud-based product.

On the client side, a package manager helps install and update packages from the repositories.

On the server side, a software repository is typically managed by source control or repository managers. Some repository managers can aggregate several repository locations behind one URL and act as a caching proxy. Continuous builds produce many artifacts, which are often stored centrally, so it is important to delete automatically those that are never released.
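A cleanup rule of this kind might look like the sketch below. The `Artifact` record, the `select_for_deletion` function and the 14-day threshold are all assumptions made for illustration; real repository managers expose their own retention settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Artifact:
    name: str
    built_at: datetime
    released: bool


def select_for_deletion(artifacts, now, max_age=timedelta(days=14)):
    """Return unreleased artifacts older than max_age; released ones are always kept."""
    return [a for a in artifacts if not a.released and now - a.built_at > max_age]


now = datetime(2024, 1, 31)
store = [
    Artifact("app-1.0.0", datetime(2023, 11, 1), released=True),
    Artifact("app-1.1.0-build17", datetime(2024, 1, 2), released=False),
    Artifact("app-1.1.0-build42", datetime(2024, 1, 30), released=False),
]
doomed = select_for_deletion(store, now)
```

Only `app-1.1.0-build17` is selected: the released artifact is kept regardless of age, and the most recent build is still inside the retention window.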

Package management system vs. package development process

A package management system is different from a package development process.

A typical use of a package management system is to facilitate the integration of code from possibly different sources into a coherent stand-alone operating unit. Thus, a package management system might be used to produce a distribution of Linux, possibly a distribution tailored to a specific restricted application.

A package development process, by contrast, is used to manage the co-development of code and documentation of a collection of functions or routines with a common theme, producing thereby a package of software functions that typically will not be complete and usable by themselves. A good package development process will help users conform to good documentation and coding practices, integrating some level of unit testing.

Selected repositories

The following table lists a few languages with repositories for contributed software. The "Autochecks" column describes the routine checks done.

Very few people have the ability to test their software under multiple operating systems with different versions of the core code and with other contributed packages they may use. For the R programming language, the Comprehensive R Archive Network (CRAN) runs tests routinely.

To understand how this is valuable, imagine a situation with two developers, Sally and John. Sally contributes a package A. Sally only runs the current version of the software under one version of Microsoft Windows, and has only tested it in that environment. At more or less regular intervals, CRAN tests Sally's contribution under a dozen combinations of operating systems and versions of the core R language software. If one of them generates an error, she gets that error message. With luck, the details of that error message provide enough information to enable a fix, even if she cannot replicate the error with her current hardware and software. Next, suppose John contributes to the repository a package B that uses package A. Package B passes all the tests and is made available to users. Later, Sally submits an improved version of A, which unfortunately breaks B. The autochecks make it possible to provide information to John so he can fix the problem.
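Deciding which packages to re-test after Sally's update amounts to walking the dependency graph in reverse. The sketch below is not CRAN's actual machinery; the `DEPENDS_ON` map and the packages C and D are hypothetical additions to the scenario.

```python
from collections import deque

# Hypothetical dependency map: package -> packages it uses directly.
DEPENDS_ON = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": [],
}


def packages_to_recheck(updated: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Return every package that depends on `updated`, directly or transitively."""
    # Invert the map: package -> packages that use it.
    used_by: dict[str, list[str]] = {name: [] for name in depends_on}
    for pkg, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, []).append(pkg)

    # Breadth-first walk over the inverted graph.
    seen: set[str] = set()
    queue = deque([updated])
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


affected = packages_to_recheck("A", DEPENDS_ON)
```

With this map, an update to A schedules B and C for re-testing, while an update to D affects nothing else.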

This example exposes both a strength and a weakness in the R contributed-package system: CRAN supports this kind of automated testing of contributed packages, but packages contributed to CRAN need not specify the versions of other contributed packages that they use. Procedures for requesting specific versions of packages exist, but contributors might not use those procedures.
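The effect of declaring a version range can be illustrated with a small check. This is a generic sketch rather than R's own mechanism: the `satisfies` helper, its parameter names and the dotted numeric version format are assumptions.

```python
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed: str,
              at_least: Optional[str] = None,
              below: Optional[str] = None) -> bool:
    """Check an installed version against an optional [at_least, below) range."""
    version = parse_version(installed)
    if at_least is not None and version < parse_version(at_least):
        return False
    if below is not None and version >= parse_version(below):
        return False
    return True


# Suppose package B declares that it needs A in the range [1.2, 2.0).
b_accepts_old_a = satisfies("1.4.0", at_least="1.2", below="2.0")
b_accepts_new_a = satisfies("2.0.0", at_least="1.2", below="2.0")
# Without a declared range, any version of A is accepted.
unconstrained = satisfies("2.0.0")
```

If B had declared such a range, the incompatible release of A would be rejected at install time; with no declaration, as in the last line, the breakage is only discovered when B's tests run.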

Beyond this, a repository such as CRAN that runs regular checks of contributed packages provides an extensive, if ad hoc, test suite for development versions of the core language. If Sally (in the example above) gets an error message she does not understand or thinks is inappropriate, especially from a development version of the language, she can (and, with R, often does) ask the language's core development team for help. In this way, the repository can contribute to improving the quality of the core language software.

| Language, purpose | Package development process | Repository | Install methods | Collaborative development platform | Autochecks |
|---|---|---|---|---|---|
| Haskell | Common Architecture for Building Applications and Libraries [6] | Hackage | cabal (software) | | |
| Java | | Maven [7] | | | |
| Julia | | [8] | | | |
| Common Lisp | | Quicklisp [9] | | | |
| .NET | NuGet | NuGet [10] | dotnet add package <package> | | |
| Node.js | node | npm, [11] yarn, bower | npm install <package>; yarn add <package>; bower install <package> | | |
| Perl | | CPAN | PPM [12] | ActiveState | |
| PHP | PEAR, Composer | PECL, Packagist | composer require <package>; pear install <package> | | |
| Python | Setuptools | PyPI | pip, EasyInstall, PyPM, Anaconda | | |
| R | R CMD check process [13] [14] | CRAN [15] | install.packages, [16] remotes [17] | GitHub [18] | Often on 12 platforms or combinations of different versions of R (devel, prerel, patched, release) on different operating systems (different versions of Linux, Windows, macOS, and Solaris). |
| Ruby | RubyGems | RubyGems [19] | RubyGems, [19] Bundler [20] | | |
| Rust | Cargo [21] | crates.io [22] | Cargo [21] | | |
| Go | go | pkg.go.dev | go get <package> | GitHub [18] | |
| Dart | Flutter | pub.dev | flutter pub get <package> | | |
| D | DUB | dlang.org | dub add <package> | | |
| TeX, LaTeX | | CTAN | | | |

(Parts of this table were copied from a "List of Top Repositories by Programming Language" on Stack Overflow. [23])

Many other programming languages, among them C, C++, and Fortran, do not possess a central software repository with universal scope. Notable repositories with limited scope include:

Package managers

Package managers help manage repositories and the distribution of their contents. When a repository is updated, a package manager typically lets the user apply that update. Package managers also handle tasks such as resolving dependencies between packages. Some examples of package managers include:

Popular package managers

| Package manager | Description |
|---|---|
| npm | A package manager for Node.js [24] |
| pip | A package installer for Python [25] |
| apt | For managing Debian packages [26] |
| Homebrew | A package installer for macOS that allows one to install packages Apple did not include [27] |
| vcpkg | A package manager for C and C++ [28] [29] |
| yum and dnf | Package managers for Fedora and Red Hat Enterprise Linux [30] |
| pacman | Package manager for Arch Linux [31] |
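Dependency handling, mentioned above as one of a package manager's tasks, largely comes down to installing packages in an order where every dependency precedes the packages that need it. The following is a minimal sketch using Python's standard `graphlib` module; the `INDEX` contents and package names are hypothetical, and the tools listed in the table use considerably more elaborate resolvers.

```python
from graphlib import TopologicalSorter

# Hypothetical repository index: package -> direct dependencies.
INDEX = {
    "web-app": ["http-lib", "json-lib"],
    "http-lib": ["tls-lib"],
    "json-lib": [],
    "tls-lib": [],
}


def install_order(target: str, index: dict[str, list[str]]) -> list[str]:
    """List `target` and its transitive dependencies, dependencies first."""
    needed: dict[str, list[str]] = {}
    stack = [target]
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        needed[pkg] = index[pkg]  # KeyError means the repository lacks the package
        stack.extend(index[pkg])
    # TopologicalSorter takes node -> predecessors and raises CycleError on cycles.
    return list(TopologicalSorter(needed).static_order())


order = install_order("web-app", INDEX)
```

The resulting order is not unique, but `tls-lib` always comes before `http-lib`, and `web-app` always comes last.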

Repository managers

Relationship to continuous integration

As part of the development lifecycle, source code is continuously built into binary artifacts using continuous integration. The CI server may interact with a binary repository manager much as a developer would, fetching artifacts from the repositories and pushing builds to them. Tight integration with CI servers enables the storage of important metadata such as:

Artifacts and packages

Artifacts and packages are different things. An artifact is simply an output or collection of files (e.g. JAR, WAR, DLL, RPM), one of which may contain metadata (e.g. a POM file). A package, by contrast, is a single archive file in a well-defined format (e.g. NuGet) that contains files appropriate for the package type (e.g. DLL, PDB). [32] Many artifacts result from builds, but other types are crucial as well. Packages are essentially one of two things: a library or an application. [33]

Compared to source files, binary artifacts are often larger by orders of magnitude. They are rarely deleted or overwritten (except in rare cases such as snapshots or nightly builds), and they are usually accompanied by extensive metadata such as ID, package name, version, and license.

Metadata

Metadata describes a binary artifact, is stored and specified separately from the artifact itself, and can have several additional uses. The following table shows some common metadata types and their uses:

| Metadata type | Used for |
|---|---|
| Versions available | Upgrading and downgrading automatically |
| Dependencies | Specify other artifacts that the current artifact depends on |
| Downstream dependencies | Specify other artifacts that depend on the current artifact |
| License | Legal compliance |
| Build date and time | Traceability |
| Documentation | Provide offline availability for contextual documentation in IDEs |
| Approval information | Traceability |
| Metrics | Code coverage, compliance to rules, test results |
| User-created metadata | Custom reports and processes |
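Two of these uses, automatic upgrading and license compliance, can be sketched with a small metadata record kept apart from the artifact's bytes. The `ArtifactMetadata` class, its fields and the helper functions are hypothetical and do not reflect the schema of any particular repository manager.

```python
from dataclasses import dataclass, field


@dataclass
class ArtifactMetadata:
    name: str
    version: tuple[int, int, int]
    license: str
    dependencies: list[str] = field(default_factory=list)


def latest_upgrade(installed, available):
    """Return the newest record for the same artifact that is newer than `installed`, or None."""
    newer = [m for m in available
             if m.name == installed.name and m.version > installed.version]
    return max(newer, key=lambda m: m.version, default=None)


def license_violations(records, allowed):
    """Names of artifacts whose license is not in the allowed set."""
    return sorted({m.name for m in records if m.license not in allowed})


installed = ArtifactMetadata("parser", (1, 2, 0), "MIT")
available = [
    ArtifactMetadata("parser", (1, 2, 0), "MIT"),
    ArtifactMetadata("parser", (1, 3, 1), "MIT", dependencies=["lexer"]),
    ArtifactMetadata("parser", (1, 3, 0), "MIT"),
    ArtifactMetadata("lexer", (9, 0, 0), "GPL-3.0"),
]
upgrade = latest_upgrade(installed, available)
violations = license_violations(available, allowed={"MIT", "Apache-2.0"})
```

Because the metadata is separate from the binary, both questions are answered without downloading or unpacking any artifact: `upgrade` is the 1.3.1 record, and `violations` names `lexer`.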

References

  1. itmWEB: Coping with Computer Viruses Archived October 14, 2007, at the Wayback Machine
  2. "JFrog Artifactory - wikieduonline". Archived from the original on 2021-03-05. Retrieved 2021-04-25.
  3. "Artifactory - Universal Artifact Management". Archived from the original on 2021-05-01. Retrieved 2021-04-25.
  4. "Nexus Repository | Software Component Management". Archived from the original on 2021-04-25. Retrieved 2021-04-25.
  5. "Cloudsmith artifact repository". Archived from the original on 2023-07-16. Retrieved 2023-09-11.
  6. "The Haskell Cabal | Overview". www.haskell.org. Archived from the original on 2019-04-10. Retrieved 2019-03-25.
  7. "Maven – Welcome to Apache Maven". maven.apache.org. Archived from the original on 2011-07-24. Retrieved 2019-03-25.
  8. "Julia Package Listing". pkg.julialang.org. Archived from the original on 2019-01-20. Retrieved 2019-03-25.
  9. "Quicklisp beta". www.quicklisp.org. Archived from the original on 2019-03-23. Retrieved 2019-03-25.
  10. karann-msft. "NuGet Package Manager UI Reference". docs.microsoft.com. Archived from the original on 2019-03-25. Retrieved 2019-03-25.
  11. "npm". www.npmjs.com. Archived from the original on 2018-04-13. Retrieved 2019-03-25.
  12. "Installing Perl Modules - www.cpan.org". www.cpan.org. Archived from the original on 2019-03-14. Retrieved 2019-03-25.
  13. Leisch, Friedrich. "Creating R Packages: A Tutorial" (PDF). Archived (PDF) from the original on 2017-12-09. Retrieved 2016-07-19.
  14. Graves, Spencer B.; Dorai-Raj, Sundar. "Creating R Packages, Using CRAN, R-Forge, And Local R Archive Networks And Subversion (SVN) Repositories" (PDF). Archived (PDF) from the original on 2017-07-05. Retrieved 2016-07-19.
  15. "The Comprehensive R Archive Network". cran.r-project.org. Archived from the original on 2019-01-23. Retrieved 2019-03-25.
  16. "R Installation and Administration". cran.r-project.org. Archived from the original on 2015-11-23. Retrieved 2019-03-25.
  17. Wickham, Hadley; Bryan, Jenny. "Package structure and state". R Packages. O'Reilly. Archived from the original on 2020-11-09. Retrieved 2020-11-20.
  18. Decan, Alexandre; Mens, Tom; Claes, Maelick; Grosjean, Philippe (2015). "On the Development and Distribution of R Packages: An Empirical Analysis of the R Ecosystem". Proceedings of the 2015 European Conference on Software Architecture Workshops. pp. 1–6. doi:10.1145/2797433.2797476. ISBN 9781450333931. S2CID 1680582. Archived from the original on 2023-01-18. Retrieved 2021-10-26.
  19. "RubyGems.org your community gem host". rubygems. Archived from the original on 2019-02-13. Retrieved 2022-02-03.
  20. "Bundler: The best way to manage a Ruby application's gems". bundler.io. Archived from the original on 2022-01-29. Retrieved 2022-02-03.
  21. "The Cargo Book". Documentation. Rust Programming Language. Archived from the original on 2019-04-28. Retrieved 2019-08-26.
  22. "Rust Package Registry". crates.io. Archived from the original on 2019-08-28. Retrieved 2019-08-26.
  23. "List of Top Repositories by Programming Language". Stack Overflow. Archived from the original on 2018-12-26. Retrieved 2010-04-14.
  24. "npm About". www.npmjs.com. Archived from the original on 2019-11-19. Retrieved 2019-11-21.
  25. The pip developers. "pip: The PyPA recommended tool for installing Python packages". Archived from the original on 2020-07-14. Retrieved 2019-11-21.
  26. "Apt - Debian Wiki". wiki.debian.org. Archived from the original on 2019-10-19. Retrieved 2019-11-22.
  27. "Homebrew". Homebrew. Archived from the original on 2022-10-05. Retrieved 2019-11-22.
  28. "Yelp launches Yelp Fusion, Microsoft creates Vcpkg tool, and the new Touch Sense SDK for Android developers". SD Times. September 20, 2016. Archived from the original on November 27, 2020. Retrieved November 19, 2020.
  29. "Microsoft's C++ library manager now available for Linux and macOS". SD Times. April 25, 2018. Archived from the original on September 22, 2020. Retrieved November 19, 2020.
  30. Chinthaguntla, Keerthi (22 April 2020). "Linux package management with YUM and RPM". Enable Sysadmin. Archived from the original on 2021-04-11. Retrieved 2021-04-11.
  31. "pacman - ArchWiki". wiki.archlinux.org. Archived from the original on 2017-08-18. Retrieved 2021-04-11.
  32. Tucker, Chris (2007-03-15). "Optimal Package Install/Uninstall Manager" (PDF). UC San Diego: 1. Archived (PDF) from the original on 2011-06-14. Retrieved 2011-09-14.
  33. "Linux repository classification schemes". braintickle.blogspot.com. 13 January 2006. Archived from the original on 2007-10-11. Retrieved 2008-03-01.