Piper (source control system)

Last updated

Piper is a centralized version control system used by Google for its internal software development. Originally designed for Linux, it supports Microsoft Windows and macOS since October 2012. [1]

Contents

Scale

Since its founding years Google used a central codebase shared by the developers. [2] For over 10 years Google relied on a single Perforce instance, using proprietary caching for scalability. [3] This mode of operation was kept as Google grew, the need for further scaling led to the development of Piper. [4] Currently, Google's version control "is an extreme case": [5] as of 2016, the repository was storing 86 terabytes of data comprising two billion lines of code in nine million files (two orders of magnitude more than in the Linux kernel repository). 25 thousand developers contributed 16 thousand changes daily, with additional 24 thousand commit operations by bots. Read requests each day are measured in billions. [6]

Architecture

Piper uses the standard Google storage infrastructure, Bigtable (now called Spanner), distributed across 10 data centers worldwide and replicated through Paxos protocol. [3]

Use

When using Piper, developers apply changes to a local copy of files, similar to a working copy of Subversion, local clone of Git, or a client of Perforce. Updates made by other developers can be pulled from the central repository and merged into the local code. The commits are only allowed after a code review. [7]

Typical use involves Clients in the Cloud (CitC). This system utilizes cloud backend and a local FUSE filesystem to create an illusion of changes overlaid on top of a full repository. This approach enables seamless browsing and use of standard Unix tools without explicit synchronization operations, thus keeping the local copy very small (average size of a local copy is less than ten files). All file writes are mapped to snapshots thus permitting restoration of the previous states of the code without explicit snapshotting. Due to the always-connected operation, CitC allows easy switching of the computers as well as sharing the modified code with other developers, the automated build system and testing tools. [7] As a result, the majority of Google developers practices trunk-based development with no personal branches; the branches are mostly used for releases. [8]

Security

Most of the codebase is visible to all developers, sensitive individual files (less than 1% as of 2016) are access-controlled. All operations with Piper are logged, accidentally committed files can be purged. [7]

Related Research Articles

XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc (SGI) in 1993. It was the default file system in SGI's IRIX operating system starting with its version 5.3. XFS was ported to the Linux kernel in 2001; as of June 2014, XFS is supported by most Linux distributions; Red Hat Enterprise Linux uses it as its default file system.

Version control is the software engineering practice of controlling, organizing, and tracking different versions in history of computer files; primarily source code text files, but generally any type of file.

<span class="mw-page-title-main">Apache Subversion</span> Free and open-source software versioning and revision control system

Apache Subversion is a version control system distributed as open source under the Apache License. Software developers use Subversion to maintain current and historical versions of files such as source code, web pages, and documentation. Its goal is to be a mostly compatible successor to the widely used Concurrent Versions System (CVS).

In the field of computer science, an atomic commit is an operation that applies a set of distinct changes as a single operation. If the changes are applied, then the atomic commit is said to have succeeded. If there is a failure before the atomic commit can be completed, then all of the changes completed in the atomic commit are reversed. This ensures that the system is always left in a consistent state. The other key property of isolation comes from their nature as atomic operations. Isolation ensures that only one atomic commit is processed at a time. The most common uses of atomic commits are in database systems and version control systems.

A versioning file system is any computer file system which allows a computer file to exist in several versions at the same time. Thus it is a form of revision control. Most common versioning file systems keep a number of old copies of the file. Some limit the number of changes per minute or per hour to avoid storing large numbers of trivial changes. Others instead take periodic snapshots whose contents can be accessed using methods similar as those for normal file access.

<span class="mw-page-title-main">Git</span> Distributed version control software system

Git is a distributed version control system that tracks versions of files. It is often used to control source code by programmers who are developing software collaboratively.

In software development, distributed version control is a form of version control in which the complete codebase, including its full history, is mirrored on every developer's computer. Compared to centralized version control, this enables automatic management branching and merging, speeds up most operations, improves the ability to work offline, and does not rely on a single location for backups. Git, the world's most popular version control system, is a distributed version control system.

In software development, a codebase is a collection of source code used to build a particular software system, application, or software component. Typically, a codebase includes only human-written source code system files; thus, a codebase usually does not include source code files generated by tools or binary library files, as they can be built from the human-written source code. However, it generally does include configuration and property files, as they are the data necessary for the build.

<span class="mw-page-title-main">Darcs</span> Distributed version control system

Darcs is a distributed version control system created by David Roundy. Key features include the ability to choose which changes to accept from other repositories, interaction with either other local (on-disk) repositories or remote repositories via SSH, HTTP, or email, and an unusually interactive interface. The developers also emphasize the use of advanced software tools for verifying correctness: the expressive type system of the functional programming language Haskell enforces some properties, and randomized testing via QuickCheck verifies many others. The name is a recursive acronym for Darcs Advanced Revision Control System.

<span class="mw-page-title-main">Mercurial</span> Distributed revision-control tool for software developers

Mercurial is a distributed revision control tool for software developers. It is supported on Microsoft Windows, Linux, and other Unix-like systems, such as FreeBSD and macOS.

The following tables describe attributes of notable version control and software configuration management (SCM) software systems that can be used to compare and contrast the various systems.

<span class="mw-page-title-main">Software remastering</span>

Software remastering is software development that recreates system software and applications while incorporating customizations, with the intent that it is copied and run elsewhere for "off-label" usage. The term comes from remastering in media production, where it is similarly distinguished from mere copying.

Unity Version Control is a cross-platform commercial distributed version control tool developed by Códice Software for Microsoft Windows, Mac OS X, Linux, and other operating systems. It includes a command-line tool, native GUIs, diff and merge tool and integration with a number of IDEs. It is a full version control stack not based on Git.

In version control systems, a repository is a data structure that stores metadata for a set of files or directory structure. Depending on whether the version control system in use is distributed, like Git or Mercurial, or centralized, like Subversion, CVS, or Perforce, the whole set of information in the repository may be duplicated on every user's system or may be maintained on a single server. Some of the metadata that a repository contains includes, among other things, a historical record of changes in the repository, a set of commit objects, and a set of references to commit objects, called heads.

Dart is a programming language designed by Lars Bak and Kasper Lund and developed by Google. It can be used to develop web and mobile apps as well as server and desktop applications.

Perforce Software, Inc. is an American developer of software used for developing and running applications, including version control software, web-based repository management, developer collaboration, application lifecycle management, web application servers, debugging tools, platform automation, and agile planning software.

Sourcegraph Inc. is a company developing code search and code intelligence tool that semantically indexes and analyzes large codebases so that they can be searched across commercial, open-source, local, and cloud-based repositories.

In version-control systems, a monorepo is a software-development strategy in which the code for a number of projects is stored in the same repository. This practice dates back to at least the early 2000s, when it was commonly called a shared codebase. Google, Meta, Microsoft, Uber, Airbnb, and Twitter all employ very large monorepos with varying strategies to scale build systems and version control software with a large volume of code and daily changes.

Bcachefs is a copy-on-write (COW) file system for Linux-based operating systems. Its primary developer, Kent Overstreet, first announced it in 2015, and it was added to the Linux kernel beginning with 6.7. It is intended to compete with the modern features of ZFS or Btrfs, and the speed and performance of ext4 or XFS.

References

Sources