BOSH (software)

Cloud Foundry BOSH
Developer(s) Cloud Foundry
Initial release 2010
Stable release
277.4.3 [1] / 25 July 2023
Repository
Written in Ruby, Go and Bash
Operating system Cross-platform
Type Cloud computing
License Apache License 2.0 [2]
Website bosh.io

BOSH is an open-source software project that offers a toolchain for release engineering, software deployment and application lifecycle management of large-scale distributed services. The toolchain is made up of a server (the BOSH Director) and a command-line tool. BOSH is typically used to package, deploy and manage cloud software over its whole lifecycle. Although VMware initially developed BOSH in 2010 to deploy the Cloud Foundry PaaS, it can be used to deploy other software, such as Hadoop, RabbitMQ, or MySQL.

Since March 2016, BOSH can manage deployments on both Microsoft Windows [3] and Linux servers.

A BOSH Director communicates with a single Infrastructure as a Service (IaaS) provider to manage the underlying networking and the virtual machines (VMs) or containers. Several IaaS providers are supported: Amazon Web Services EC2, Apache CloudStack, Google Compute Engine, Microsoft Azure, OpenStack, and VMware vSphere.

To help support more underlying IaaS providers, BOSH uses the concept of a Cloud Provider Interface (CPI). There is an implementation of the CPI for each of the IaaS providers listed above. Typically the CPI is used to deploy VMs, but it can be used to deploy containers as well.
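The idea of a CPI can be illustrated with a minimal Python sketch. This is not the actual BOSH CPI contract: the class names, method names and the in-memory backend are hypothetical, chosen only to show how a Director could stay independent of the IaaS by talking to an abstract interface.

```python
from abc import ABC, abstractmethod
from itertools import count


class CloudProviderInterface(ABC):
    """Hypothetical, simplified CPI: what a Director needs from an IaaS."""

    @abstractmethod
    def create_instance(self, stemcell_id: str) -> str:
        """Create an instance (VM or container) and return its identifier."""

    @abstractmethod
    def delete_instance(self, instance_id: str) -> None:
        """Destroy a previously created instance."""

    @abstractmethod
    def has_instance(self, instance_id: str) -> bool:
        """Report whether the instance still exists."""


class InMemoryCPI(CloudProviderInterface):
    """Toy backend that only records instances in a dict (no real IaaS)."""

    def __init__(self) -> None:
        self._instances: dict[str, str] = {}
        self._ids = count(1)

    def create_instance(self, stemcell_id: str) -> str:
        instance_id = f"instance-{next(self._ids)}"
        self._instances[instance_id] = stemcell_id
        return instance_id

    def delete_instance(self, instance_id: str) -> None:
        # Deleting an unknown instance is treated as a no-op.
        self._instances.pop(instance_id, None)

    def has_instance(self, instance_id: str) -> bool:
        return instance_id in self._instances
```

Under this assumption, supporting another IaaS or a container engine would mean writing another subclass, while the code that drives deployments stays unchanged.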

Few CPIs exist for deploying containers with BOSH, and only one is actively supported. This CPI deploys Pivotal Software's Garden containers (Garden is very similar to Docker) on a single virtual machine run by VirtualBox or VMware Workstation. In theory, any other container engine could be supported if the necessary CPI were developed.

Because BOSH supports deployments on VMs and on containers alike, it uses the generic term “instances” for both. It is up to the CPI to decide whether a BOSH “instance” is actually a VM or a container.

Workflow

Once installed, a BOSH server accepts uploads of root filesystems (called “stemcells”) and packages (called “releases”). When a BOSH server has everything needed to deploy a given software system, it can be told to proceed as described by a YAML deployment manifest. BOSH then progressively deploys “instances” (VMs or containers), using canaries to avoid rolling out a failing configuration.

Once a software system is deployed, BOSH continuously monitors its instances in order to detect failing instances and resurrect any missing ones.
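The monitoring and resurrection behaviour can be pictured as one pass of a reconciliation loop. The sketch below is an illustration only: the function name, its parameters and the `group/index` instance names are assumptions, and it does not reflect how BOSH implements health monitoring.

```python
def resurrect_missing(desired, running, create_instance):
    """Recreate every desired instance that is not currently running.

    desired:         iterable of instance names the deployment should have
    running:         set of instance names currently reported as healthy
    create_instance: callable that recreates one instance, given its name

    Returns the names that were recreated, in the order they were handled.
    """
    recreated = []
    for name in desired:
        if name not in running:
            create_instance(name)
            running.add(name)  # the instance is considered present again
            recreated.append(name)
    return recreated
```

Calling this function periodically with the current set of healthy instances would bring the running system back to the desired state after an instance disappears.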

When a BOSH deployment manifest is changed, BOSH rolls out the implied modifications progressively, instance by instance. This means that BOSH can upgrade live clusters with potentially no downtime.
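A progressive rollout guarded by canaries can be sketched as follows. This is a simplified model under stated assumptions, not BOSH's update algorithm: the function name, the `canaries` and `batch_size` parameters, and the `update` and `is_healthy` callbacks are all hypothetical.

```python
def rolling_update(instances, update, is_healthy, canaries=1, batch_size=2):
    """Update instances progressively and stop at the first failing batch.

    The first `canaries` instances are updated one at a time. The remaining
    instances are then updated in batches of `batch_size`. If any instance
    of a batch is unhealthy after its update, the rollout stops and later
    instances are left untouched.

    Returns (updated, failed): names updated successfully, and names that
    failed their health check (empty on success).
    """
    batches = [[name] for name in instances[:canaries]]
    rest = instances[canaries:]
    batches += [rest[i:i + batch_size] for i in range(0, len(rest), batch_size)]

    updated = []
    for batch in batches:
        for name in batch:
            update(name)
        failed = [name for name in batch if not is_healthy(name)]
        if failed:
            return updated, failed
        updated.extend(batch)
    return updated, []
```

With this scheme, a broken configuration affects at most the canary before the rollout stops, which is what keeps the rest of a live cluster serving traffic.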

Concepts

Release

A BOSH release can either be an archive file or a git repository. In both cases, it describes a software system that can be deployed with BOSH. For this purpose, it packages up all related binary assets, source code, compilation scripts, configurable properties, startup scripts and templates for configuration files.

BOSH releases are made of “packages” and “jobs”. Roughly, BOSH packages provide something that can be run, and BOSH jobs describe how these things are configured and run.

A BOSH package details the necessary source code, binary assets (called “blobs”), and compilation scripts for building a given software component. There are two ways to provide binary “blobs”. In a BOSH release that is provided as an archive file, blobs are directly included. With BOSH releases that are provided as git repositories, however, doing the same tends to be problematic when blobs get big. For this reason, a BOSH release provides the concept of a “blobstore”, from which referenced blobs can be fetched. Most BOSH releases use blobstores that are backed by public Amazon S3 buckets, but a BOSH release can also refer to a private or a local “blobstore”.

BOSH packages are always subject to a compilation phase, even if this just extracts files from an archive and copies them to the proper target directory. To compile a given package, BOSH spawns an ephemeral compilation instance (VM or container) that only includes any required packages and blobs, as declared by the package specification. In this dedicated instance, BOSH runs the compilation script, and seals the compilation result in its database, so that it can be safely used for reproducible deployments.
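Declared package dependencies determine both what a compilation instance needs and the order in which packages can be compiled. The sketch below illustrates that reasoning with a plain dependency graph; the package names and both helper functions are hypothetical, and treating dependencies as transitive is an assumption of the example.

```python
from graphlib import TopologicalSorter


def compilation_order(dependencies):
    """Return package names so that each package follows its dependencies.

    dependencies maps a package name to the packages it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


def required_packages(name, dependencies):
    """Return the transitive dependencies of `name`, excluding `name`."""
    seen = set()
    stack = list(dependencies.get(name, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(dependencies.get(dep, ()))
    return seen


# Hypothetical release with three packages.
deps = {"app": ["ruby"], "ruby": ["libyaml"], "libyaml": []}
print(compilation_order(deps))
print(required_packages("app", deps))
```

In this model, the compilation instance for `app` would only be given `ruby` and `libyaml`, and nothing else from the release.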

BOSH jobs, on the other hand, provide configuration properties (which can be documented), templates for configuration files, and startup scripts. BOSH jobs refer to one or more packages as dependencies. Jobs are also sealed into the BOSH database, but the templates for configuration files are rendered at deploy time, when all configuration properties are resolved. These configuration properties are typically IP addresses, port numbers, user names, passwords, domain names, and so on.
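Rendering a configuration template at deploy time amounts to substituting resolved property values into placeholders. The sketch below uses Python's `string.Template` purely as a stand-in: it is not the template syntax BOSH uses, and the function name and property names are hypothetical.

```python
from string import Template


def render_config(template_text, properties):
    """Fill a configuration template with resolved properties.

    Raises KeyError if the template references a property that was not
    provided, so an incomplete configuration fails before anything starts.
    """
    return Template(template_text).substitute(properties)


# Hypothetical template and properties for a database job.
config = render_config(
    "listen $address:$port\n",
    {"address": "10.0.0.5", "port": 5432},
)
print(config, end="")
```

Failing on a missing property, rather than leaving the placeholder in the file, mirrors the idea that all properties must be resolved before a job is started.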

Stemcell

A BOSH stemcell packages the basics for creating a new instance (VM or container). Namely, a BOSH stemcell ships an operating system image along with a BOSH agent and a copy of monit, which is used to manage the services (called “jobs”) that will be hosted by the instance. The BOSH agent lets BOSH communicate with the instance throughout its life cycle.

The stemcell concept in BOSH is similar to virtual machine images such as Amazon's AMIs, but BOSH stemcells are not meant to be specialized for any particular usage. Instead, BOSH only provides different stemcells to support different operating systems (CentOS, Ubuntu or Windows) or different underlying IaaS providers (such as AWS or OpenStack).

The name “stemcell” comes from the biological term “stem cells”, which refers to undifferentiated cells that are able to grow into diverse cell types later. Similarly, instances created from a BOSH stemcell are identical at the beginning.

After inception, instances are configured with different CPU, memory, storage and network settings, and have different software packages installed. Hence, instances built from the same BOSH stemcell can behave differently.

BOSH Agent

The BOSH agent is a service that runs on every BOSH-deployed VM. It does the following:

Deployment

A BOSH deployment is essentially a YAML deployment manifest, in which the user describes the BOSH releases and BOSH stemcells to use, and how to set up and compose jobs into groups of identical instances (historically misnamed “jobs” and later renamed “instance groups”). Within these “instance groups”, BOSH can spread identical instances (VMs or containers) across different availability zones, in order to minimise the risk of all instances going down at the same time. This is particularly useful when deploying highly available databases or applications.
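Spreading the instances of a group across availability zones can be illustrated with a round-robin assignment. This is a sketch under assumptions: the function name, the `group/index` naming and the zone names are hypothetical, and BOSH's actual placement logic may differ.

```python
from itertools import cycle


def spread_across_zones(group_name, instance_count, zones):
    """Assign the instances of a group to zones in round-robin order.

    Returns a dict mapping "group/index" to the zone chosen for it.
    """
    if not zones:
        raise ValueError("at least one availability zone is required")
    zone_cycle = cycle(zones)
    return {
        f"{group_name}/{index}": next(zone_cycle)
        for index in range(instance_count)
    }


# Three database instances over two hypothetical zones.
print(spread_across_zones("db", 3, ["z1", "z2"]))
```

With three instances over two zones, losing either zone still leaves at least one instance of the group running.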

In most cases, users don't work with a deployment manifest as one big YAML file. Instead, deployment manifests are split into smaller files that are easier to maintain. These separate files are merged by tools like spiff or spruce right before they are uploaded to the BOSH server and deployed.
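The kind of merge such tools perform can be approximated by a recursive merge of nested mappings. The sketch below is a generic deep merge on Python dicts standing in for parsed YAML files; it does not reproduce the operators or semantics of spiff or spruce, and the manifest keys shown are hypothetical.

```python
def deep_merge(base, override):
    """Return a new dict in which `override` is merged into `base`.

    Nested dicts are merged key by key. Any other value in `override`
    replaces the corresponding value in `base`. Neither input is modified.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# A shared base file and an environment-specific file (hypothetical keys).
base = {"name": "redis", "properties": {"port": 6379, "password": "changeme"}}
production = {"properties": {"password": "s3cret"}}
print(deep_merge(base, production))
```

Keeping the environment-specific file small, with only the values that differ, is what makes the split manifests easier to maintain.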

In a deployment manifest, all configuration properties, as declared by jobs from all referenced releases, can be customized. Different jobs can refer to configuration properties with the same name in order to share common settings.

Key principles

BOSH was purposefully constructed to address the four principles of modern release engineering in the following ways:

Identifiability

Being able to identify all of the source, tools, environment, and other components that make up a particular release. In its concept of “release”, BOSH packages up all related source code, binary assets, configurable properties, compilation scripts, and startup scripts. This allows users to easily track what is actually deployed, and how it is run. Additionally, BOSH provides a way to capture the root filesystems that will be the basis of deployed instances (VMs or containers), as single images called “stemcells”. BOSH releases and BOSH stemcells are identified by UUIDs and sealed by SHA-1 checksums.
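Sealing an artifact with a SHA-1 checksum means that any later change to its bytes can be detected. The following sketch shows the check with Python's standard `hashlib` module; the function names are hypothetical and the example says nothing about where BOSH stores or compares its checksums.

```python
import hashlib


def sha1_of(data: bytes) -> str:
    """Return the hexadecimal SHA-1 digest of `data`."""
    return hashlib.sha1(data).hexdigest()


def verify(data: bytes, expected_sha1: str) -> bool:
    """Check that `data` still matches the checksum it was sealed with."""
    return sha1_of(data) == expected_sha1.lower()


# Seal some hypothetical release content, then detect a modification.
sealed = sha1_of(b"release contents")
print(verify(b"release contents", sealed))
print(verify(b"tampered contents", sealed))
```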

Reproducibility

The ability to integrate source, third-party components, data, and deployment externals of a software system in order to guarantee operational stability. The BOSH toolchain provides a centralized server for operating the deployed systems. This server holds software “releases”, operating system images (called “stemcells”), persistent data, and system configuration. Therefore, a given deployment is guaranteed to reproduce an identical result.

Consistency

The mission to provide a stable framework for development, deployment, audit, and accountability for software components. BOSH achieves such consistency with its software “releases”, which bring a consistent framework for developing and deploying software systems. Moreover, audit and accountability are provided by the BOSH server, which allows users to see and track changes made to the deployed systems.

Agility

The ongoing research into the repercussions of modern software engineering practices (i.e. Continuous Integration) on productivity in the software cycle. The BOSH toolchain integrates well with current best practices of software engineering (including Continuous Delivery) by providing ways to create software releases in an automated way and to update complex deployed systems with simple commands.

History

BOSH was designed to address shortcomings found in the tools available to manage Cloud Foundry. Chef was used originally, but it was limited in its ability to package software and to spin servers up and down, and limited in its monitoring and self-management capabilities. BOSH was originally developed for Cloud Foundry's own needs, but the project has since grown to be completely generic, and can be used to orchestrate other software such as Hadoop, RabbitMQ, MySQL and similar platform or application software.

Architecture

A BOSH installation is made of several separate components that can possibly be split across different VMs or containers:

A BOSH managed environment usually centers around the Director deployed on a VM.

Figure: BOSH Architecture

Cloud / Platform / OS compatibility

BOSH connects to the underlying IaaS layer through an abstraction called the CPI (Cloud Provider Interface). CPIs are available for Amazon Web Services, certain OpenStack versions, vSphere, and vCloud. Some community-maintained CPIs exist for Google Compute Engine, Microsoft Azure and CloudStack.

Deployment

BOSH can be deployed as a BOSH release, which may create a “chicken or egg” surprise for newcomers.

A BOSH server is not the only software that can deploy BOSH releases. There is a BOSH provisioner project that can deploy BOSH in a VM, a Docker container, or a bare metal server. This component is used by the BOSH packer provisioner, which creates a Vagrant box running BOSH-lite, which is what most users rely on when learning BOSH.

Governance

Once a sub-component of Cloud Foundry, BOSH is now a separate open-source project that aims at deploying any distributed software. BOSH is managed by the Cloud Foundry Foundation. Nearly all contributions to BOSH are made by Pivotal.

Users

Pivotal uses BOSH to orchestrate Cloud Foundry within Pivotal Cloud Foundry (PCF), as well as all of the Pivotal Data Services for Cloud Foundry. Announced public users of BOSH and PCF include Axel Springer, Corelogic, IBM, Monsanto, Philips, SAP, and Swisscom.

Distributions

BOSH is not commercially distributed as a standalone product. It is included as part of Pivotal Cloud Foundry, IBM Bluemix, and HP Helion Developer Platform, and is also used and supported commercially by Cloud Credo, Stark & Wayne, Gstack, and others.

References

  1. "Release 277.4.3". 25 July 2023. Retrieved 29 July 2023.
  2. "LICENSE file". Retrieved 5 November 2019 via GitHub.
  3. "[cf-dev] Announcement: BOSH for Windows - cf-dev - Mailing-List Archives". lists.cloudfoundry.org. Archived from the original on 2016-04-25. Retrieved 2016-03-31.