Reproducible builds

Reproducible builds, also known as deterministic compilation, is a process of compiling software which ensures the resulting binary code can be reproduced. Source code compiled using deterministic compilation will always output the same binary. [1] [2] [3]

Reproducible builds can act as part of a chain of trust: [1] the source code can be signed, and deterministic compilation can prove that the binary was compiled from that trusted source code. Verified reproducible builds provide a strong countermeasure against attacks in which binaries do not match their source code, e.g., because an attacker has inserted malicious code into a binary. This attack is realistic: attackers sometimes target binaries rather than source code, e.g., because they can only change the distributed binary, or to evade detection, since it is the source code that developers normally review and modify. In a survey of 17 experts, 58.8% of participants rated the utility of reproducible builds as very high, but 70.6% also rated their cost as high. [4] Various efforts are being made to modify software development tools to reduce these costs.
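
As a rough illustration of the verification step, an independent rebuild can be compared with the distributed binary by hashing both files. The sketch below is in Python; the function names sha256_of and builds_match are hypothetical and not part of any project's tooling, and SHA-256 is simply an assumed choice of digest.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(distributed: Path, rebuilt: Path) -> bool:
    """True when the distributed artifact and a local rebuild have the same digest."""
    return sha256_of(distributed) == sha256_of(rebuilt)
```

A mismatch does not by itself show tampering; it may only mean that the build is not yet reproducible.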

Methods

For the compilation process to be deterministic, the input to the compiler must be the same, regardless of the build environment used. This typically involves normalizing variables that may change, such as order of input files, timestamps, locales, and paths.
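
A minimal sketch of such normalization, assuming the build output is packed into a tar archive: files are added in sorted order, with relative paths and with fixed timestamps, ownership, and permissions, so the archive bytes depend only on file names and contents. The helper name normalized_tar and the chosen fixed values are illustrative assumptions, not a prescribed method.

```python
import io
import tarfile
from pathlib import Path


def normalized_tar(root: Path, mtime: int = 0) -> bytes:
    """Pack the regular files under root into an uncompressed tar whose
    bytes depend only on relative paths and file contents."""
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by relative POSIX path so directory listing order does not matter.
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for path in files:
            # Relative names keep the build directory out of the archive.
            info = tarfile.TarInfo(name=path.relative_to(root).as_posix())
            info.size = path.stat().st_size
            info.mtime = mtime            # fixed timestamp instead of the file's own
            info.mode = 0o644             # fixed permissions
            info.uid = info.gid = 0       # fixed numeric ownership
            info.uname = info.gname = ""  # no user or group names
            with path.open("rb") as handle:
                archive.addfile(info, handle)
    return buffer.getvalue()
```

Two checkouts of the same sources, created at different times and in different directories, should then yield identical archive bytes.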

Additionally, the compiler must not introduce non-determinism itself. This can happen when a compiler uses hash tables with a random hash seed, or when its output depends on the addresses of variables, because those addresses vary under address space layout randomization (ASLR).
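
The address problem can be illustrated with a toy example in Python rather than a real compiler. In CPython, id() returns an object's memory address, so ordering output by it may differ from run to run, whereas ordering by a property of the data does not. The Symbol class and both function names are hypothetical.

```python
class Symbol:
    """Toy stand-in for a compiler's symbol table entry."""

    def __init__(self, name: str) -> None:
        self.name = name


def emit_by_address(symbols):
    """Fragile: id() is a memory address in CPython, so this order may change between runs."""
    return [s.name for s in sorted(symbols, key=id)]


def emit_by_name(symbols):
    """Stable: the order depends only on the symbol names."""
    return [s.name for s in sorted(symbols, key=lambda s: s.name)]
```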

Build systems, such as Bazel and Gitian, [5] can be used to automate deterministic build processes.

History

The GNU Project used reproducible builds in the early 1990s. Changelogs from 1992 indicate the ongoing effort. [6]

One of the older [7] projects to promote reproducible builds is the Bitcoin project, with Gitian. Later, in 2013, the Tor anonymity network project started using Gitian for its reproducible builds. [8]

From 2011, a reproducible Java build system was developed for a decentralized peer-to-peer FOSS project, DirectDemocracyP2P. [9] The application of this system to recommendation support for automated updates was first presented in April 2013 at Decentralized Coordination. [10] [11] A treatise focusing on the implementation details of the reproducible Java compilation tool itself was published in 2015. [12]

In July 2013, the Debian project started implementing reproducible builds across its entire package archive. [13] [14] By July 2017, more than 90% of the packages in the repository had been shown to build reproducibly. [15]

In November 2018, the Reproducible Builds project joined the Software Freedom Conservancy. [16]

F-Droid uses reproducible builds to provide a guarantee that the distributed APKs use the claimed free source code. [17]

The Tails portable operating system uses reproducible builds and documents how others can verify its images. [18]

In June 2021, NixOS claimed 100% reproducible builds for its minimal ISO releases. [19]

As of May 2020, Arch Linux was working on making all official packages reproducible. [20]

Challenges

According to the Reproducible Builds project, timestamps are "the biggest source of reproducibility issues. Many build tools record the current date and time... and most archive formats will happily record modification times on top of their own timestamps." [21] They recommend that "it is better to use a date that is relevant to the source code instead of the build: old software can always be built later" if it is reproducible. They identify several ways to modify build processes to do this.
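
One widely used convention, not named in the text above, is the SOURCE_DATE_EPOCH environment variable, which carries a Unix timestamp tied to the source code (for example, the date of the last commit). The sketch below shows how a build script might honor it; the function names are hypothetical, and falling back to the current time when the variable is unset is an assumption of this example.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ) -> int:
    """Use SOURCE_DATE_EPOCH when set; otherwise fall back to the current time."""
    value = environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else int(time.time())


def build_date_string(environ=os.environ) -> str:
    """Format the build date in UTC so the local time zone cannot affect the output."""
    stamp = datetime.fromtimestamp(build_timestamp(environ), tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d")
```

Any date embedded in the output then comes from build_date_string() rather than from the wall clock, so rebuilding old sources later gives the same result.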

In some cases other changes must be made to make a build process reproducible. For example, some data structures do not guarantee a stable order in each execution. A typical solution is to modify the build process to specify a sorted output from those structures. [22]
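
For instance, a Python set of strings may iterate in a different order in each process, because string hashing is randomized by default. A build step that writes out such a collection can sort it first. The function name render_manifest and the one-name-per-line format are assumptions made for illustration.

```python
from typing import Iterable


def render_manifest(exported_names: Iterable[str]) -> str:
    """Render one name per line in sorted order, whatever order the input iterates in."""
    return "".join(f"{name}\n" for name in sorted(exported_names))
```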

See also

Bootstrappable builds, a process of compiling software that does not depend on compiler binaries that were not themselves built from source by the same process.

References

  1. "reproducible-builds.org". reproducible-builds.org. Archived from the original on 20 May 2016. Retrieved 22 August 2016. Reproducible builds are a set of software development practices which create a verifiable path from human readable source code to the binary code used by computers....build system needs to be made entirely deterministic: transforming a given source must always create the same result.
  2. Lamb, Chris; Zacchiroli, Stefano (March 2022). "Reproducible Builds: Increasing the Integrity of Software Supply Chains". IEEE Software. 39 (2): 62–70. arXiv:2104.06020. doi:10.1109/MS.2021.3073045. S2CID 233219473. Retrieved 26 March 2023.
  3. Ratliff, Emily (4 April 2016). "Establishing Correspondence Between an Application and its Source Code | SecurityWeek.com". www.securityweek.com. SecurityWeek. Archived from the original on 20 September 2016. Retrieved 22 August 2016.
  4. Ladisa, Piergiorgio; Plate, Henrik; Martinez, Matias; Barais, Olivier (19 April 2022). "Taxonomy of Attacks on Open-Source Software Supply Chains". arxiv.org. arXiv:2204.04008. doi:10.1109/SP46215.2023.00010.
  5. "Gitian: a secure software distribution method". gitian.org. Retrieved 2018-01-10.
  6. Gilmore, John (2017-01-24). "SOURCE_PREFIX_MAP and Occam's Razor". rb-general (Mailing list).
  7. "LICENSE-file of the Gitian-Project". GitHub. Retrieved 2019-12-03.
  8. Deterministic Builds Part Two: Technical Details. October 04, 2013
  9. "DDP2P". GitHub. 2011.
  10. Alhamed, Khalid, et al. "Security by Decentralized Certification of Automatic-Updates for Open Source Software controlled by Volunteers". Citeseer. Proceedings of Decentralized Coordination, pp. 40-59, Lulu Publisher, April 6, 2013.
  11. Silaghi, M. C., Alhamed, K., Dhannoon, O., Qin, S., Vishen, R., Knowles, R., ... & Hirayama, K. (2013, September). DirectDemocracyP2P—decentralized deliberative petition drives—. In IEEE P2P 2013 Proceedings (pp. 1-2). IEEE.
  12. Silaghi, M., Alhamed, K., & Stansifer, R. (2015, December). Java tool extensions for supporting multiple recommenders and distributed bundles. In 2015 International Conference on Computational Science and Computational Intelligence (CSCI) (pp. 722-725). IEEE.
  13. "Reproducible Builds talk in Debian". 21 September 2014.
  14. "Reproducible Builds history".
  15. "Linux-Distributionen: Mehr als 90 Prozent der Debian-Pakete reproduzierbar - Golem.de" (in German). 2017-07-24. Retrieved 2018-10-30.
  16. "Reproducible Builds joins the Software Freedom Conservancy". Retrieved 2018-12-15.
  17. "Reproducible Builds". F-Droid.
  18. "Verifying a Tails image for reproducibility". Tails.
  19. "Nixos-unstable's iso_minimal.x86_64-linux is 100% reproducible!". NixOS Discourse. 2021-06-20. Retrieved 2021-06-21.
  20. "ArchWiki - Reproducible Builds".
  21. "Timestamps". Reproducible builds. Retrieved 2022-04-16.
  22. "Timestamps". Reproducible builds. Retrieved 2022-04-16.