Bootstrappable builds

Bootstrappable builds is a process of compiling software that does not depend on (compiler) binaries that were not themselves built from source as part of that process. [1] [2] [3]

This process can protect against compiler backdoors: if the build process doesn't depend on binary code that is difficult to audit, then a compiler backdoor cannot be hidden in compiler binaries anymore.

Methods

One way for a software distribution to tackle the issue is to reduce the size of the binaries used to bootstrap the distribution, until they are either no longer needed or small enough to be easily reviewed by humans. [4]

Many compilers are written in the language they compile. For instance, the official Go compiler (gc) is written in Go.

So without alternative compilers such as GCC, which is written in other programming languages (here C and C++), building the Go compiler would require a binary of a previous version of the Go compiler.

To have bootstrappable builds, it is often possible to find an older version of the compiler that can be built from source, and from there to write code that automatically builds each subsequent version of the compiler until a recent version is reached. Identifying which version can build which other versions is often not trivial, and the resulting bootstrap procedure often has very long compilation times. Sometimes this also requires maintaining older compiler versions and backporting support for newer CPU architectures to them, so that those architectures can be bootstrapped.
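The search for such a version chain can be sketched as a shortest-path search over a "can build" graph. This is only an illustration: the version names and the `CAN_BUILD` table are hypothetical, and real data would have to come from testing or from each compiler's release notes.

```python
from collections import deque

# Hypothetical data: each key is a compiler we already know how to obtain,
# each value lists the compilers it is able to build from source.
CAN_BUILD = {
    "c-compiler": ["lang-1.4"],
    "lang-1.4": ["lang-1.5", "lang-1.6"],
    "lang-1.5": ["lang-1.6"],
    "lang-1.6": ["lang-1.9"],
    "lang-1.9": ["lang-1.12"],
}


def bootstrap_chain(start, target, can_build):
    """Return the shortest list of compilers to build, from start to target.

    Uses breadth-first search, so the chain has the fewest build steps.
    Returns None if target cannot be reached from start.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in can_build.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


chain = bootstrap_chain("c-compiler", "lang-1.12", CAN_BUILD)
print(" -> ".join(chain))
# prints: c-compiler -> lang-1.4 -> lang-1.6 -> lang-1.9 -> lang-1.12
```

Fewest steps is only one possible criterion. Since compilation time is a concern, a weighted search over measured build times might be a better fit in practice.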

This process can also be replaced by, or combined with, other ways to bootstrap compilers.

For instance, it is also possible to write a new compiler for a language in a different language.

These techniques can be used to reduce the size of the binaries used to bootstrap a distribution.

As for building the first compiler that can build the subsequent compilers, the binary seed can be reduced to a single 357-byte binary. [5] From that binary, a multi-stage bootstrapping procedure builds up to a C compiler, which is then used to build the other compilers and software. [6]
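The defining property of such a staged bootstrap can be checked mechanically: every tool a stage uses must be either part of the reviewed seed or the output of an earlier stage. The sketch below is an illustration under assumed names. The `Stage` type, the seed name and the stage list are hypothetical and do not describe the actual stages used by any distribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str     # the tool this stage produces
    needs: tuple  # the tools required to build it


def check_bootstrappable(seed, stages):
    """Return (stage name, missing tools) pairs for stages that break the chain.

    A stage is fine when all of its tools are in the seed or were produced
    by an earlier stage. An empty result means the whole plan is built
    from source starting from the seed alone.
    """
    available = set(seed)
    problems = []
    for stage in stages:
        missing = [tool for tool in stage.needs if tool not in available]
        if missing:
            problems.append((stage.name, missing))
        # Record the output even for a broken stage, so that one gap is
        # reported once instead of cascading to every later stage.
        available.add(stage.name)
    return problems


# Hypothetical plan: the last stage still relies on a prebuilt binary.
seed = {"hex-seed"}
plan = [
    Stage("assembler", ("hex-seed",)),
    Stage("tiny-c", ("assembler",)),
    Stage("c-compiler", ("tiny-c",)),
    Stage("go-compiler", ("c-compiler", "go-binary")),
]
print(check_bootstrappable(seed, plan))
# prints: [('go-compiler', ['go-binary'])]
```

Here the reported gap is the prebuilt `go-binary`, which would have to be replaced by a source-built alternative for the plan to be bootstrappable.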

History

Bootstrappable builds was started in 2016 as a spin-off of the reproducible builds project. [3]

Related Research Articles

In general, bootstrapping usually refers to a self-starting process that is supposed to continue or grow without external input.

Cygwin: Unix subsystem for Windows machines

Cygwin is a Unix-like environment and command-line interface for Microsoft Windows. Cygwin's purpose is expressed in its motto: "Get that Linux feeling – on Windows".

GNU Compiler Collection: Free and open-source compiler for various programming languages

The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software under the GNU General Public License. GCC is a key component of the GNU toolchain and the standard compiler for most projects related to GNU and the Linux kernel. With roughly 15 million lines of code in 2019, GCC is one of the biggest free programs in existence. It has played an important role in the growth of free software, as both a tool and an example.

GNU Debugger: Source-level debugger

The GNU Debugger (GDB) is a portable debugger that runs on many Unix-like systems and works for many programming languages, including Ada, Assembly, C, C++, D, Fortran, Haskell, Go, Objective-C, OpenCL C, Modula-2, Pascal, Rust, and partially others.

In computing, source code, or simply code, is any collection of text, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source code.

Source Mage: Linux distribution

Source Mage is a Linux distribution. As a package is being installed, its source code is automatically downloaded, compiled, and installed. Source Mage is descended from Sorcerer.

MinGW, formerly mingw32, is a free and open source software development environment to create Microsoft Windows applications.

Technical variations of Linux distributions include support for different hardware devices and systems or software package configurations. Organizational differences may be motivated by historical reasons. Other criteria include security, including how quickly security upgrades are available; ease of package management; and number of packages available.

GNU Guile: Extension language

GNU Ubiquitous Intelligent Language for Extensions is the preferred extension language system for the GNU Project and features an implementation of the programming language Scheme. Its first version was released in 1993. In addition to large parts of Scheme standards, Guile Scheme includes modularized extensions for many different programming tasks.

In computer science, bootstrapping is the technique for producing a self-compiling compiler – that is, a compiler written in the source programming language that it intends to compile. An initial core version of the compiler is generated in a different language; successive expanded versions of the compiler are developed using this minimal subset of the language. The problem of compiling a self-compiling compiler has been called the chicken-or-egg problem in compiler design, and bootstrapping is a solution to this problem.

Tiny C Compiler: Compiler for the C programming language

The Tiny C Compiler is an x86, X86-64 and ARM processor C compiler initially written by Fabrice Bellard. It is designed to work for slow computers with little disk space. Windows operating system support was added in version 0.9.23. TCC is distributed under the GNU Lesser General Public License.

Nix is a cross-platform package manager that uses a deployment model where software is installed into unique directories generated through cryptographic hashes. It is also the name of the tool's programming language. A package's hash takes into account the dependencies, which is claimed to eliminate dependency hell, as an alternative to the typical solution of installing multiple versions of dependencies at the same time. This package management model advertises more reliable, reproducible, and portable packages.

Linux-libre: Version of the Linux kernel without proprietary code

According to the Free Software Foundation Latin America, Linux-libre is a modified version of the Linux kernel that contains no binary blobs, obfuscated code, or code released under proprietary licenses. In the Linux kernel, they are mostly used for proprietary firmware images. While generally redistributable, binary blobs do not give the user the freedom to audit, modify, or, consequently, redistribute their modified versions. The GNU Project keeps Linux-libre in synchronization with the mainline Linux kernel.

NixOS is a free and open-source Linux distribution based on the purely functional Nix package manager. NixOS is composed using modules and packages defined in the nixpkgs project.

GNU Guix: Purely functional package manager for the GNU system

GNU Guix is a functional cross-platform package manager and a tool to instantiate and manage Unix-like operating systems, based on the Nix package manager. Configuration and package recipes are written in Guile Scheme. GNU Guix is the default package manager of the GNU Guix System distribution.

Reproducible builds, also known as deterministic compilation, is a process of compiling software which ensures the resulting binary code can be reproduced. Source code compiled using deterministic compilation will always output the same binary.

In computer programming, self-hosting is the use of a program as part of the toolchain or operating system that produces new versions of that same program—for example, a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger systems. Other programs that are typically self-hosting include kernels, assemblers, command-line interpreters and revision control software.

GNU Guix System: Rolling release distribution of the GNU operating system built around the GNU Guix package manager

GNU Guix System or Guix System is a rolling release, free and open source Linux distribution built around the GNU Guix package manager. It enables a declarative operating system configuration and allows system upgrades which the user can roll back. It uses the GNU Shepherd init system and the Linux-libre kernel, with support for the GNU Hurd kernel under development. On February 3, 2015, the Free Software Foundation added the distribution to its list of endorsed free Linux distributions. The Guix package manager and the Guix System drew inspiration from and were based on the Nix package manager and NixOS respectively.

References

  1. "Guix Further Reduces Bootstrap Seed to 25% — 2020 — Blog — GNU Guix".
  2. "Bootstrappable builds". Bootstrappable.org. Retrieved 2022-12-16.
  3. "Bootstrappable builds". LWN.net.
  4. "NLnet; GNU Mes: Full Source bootstrap".
  5. "[PATCH core-updates 0/8] the Full Source Bootstrap".
  6. https://reproducible-builds.org/news/2022/05/18/jan-nieuwenhuizen-on-bootrappable-builds-gnu-mes-and-gnu-guix/