Kubeflow

Original author(s): Google
Developer(s): Kubeflow Contributors [1] (AWS, Bloomberg, Google, IBM, NVIDIA, Nutanix, Red Hat, Arrikto, and others)
Initial release: April 5, 2018 [2]
Stable release: 1.9 [3] / July 22, 2024
Repository: github.com/kubeflow
Written in: Go, Python, TypeScript
Platform: Kubernetes
Type: Machine Learning Platform
License: Apache License 2.0
Website: kubeflow.org

Kubeflow is an open-source platform for machine learning and MLOps on Kubernetes introduced by Google. The different stages in a typical machine learning lifecycle are represented with different software components in Kubeflow, including model development (Kubeflow Notebooks [4]), model training (Kubeflow Pipelines, [5] Kubeflow Training Operator [6]), model serving (KServe [a] [7]), and automated machine learning (Katib [8]).

Each Kubeflow component can be deployed on its own; deploying every component is not required. [9]

History

The Kubeflow project was first announced at KubeCon + CloudNativeCon North America 2017 by Google engineers David Aronchick, Jeremy Lewi, and Vishnu Kannan [10] to address a perceived lack of flexible options for building production-ready machine learning systems. [11] The project has also stated that it began as a way for Google to open-source how it ran TensorFlow internally. [12]

The first release of Kubeflow (Kubeflow 0.1) was announced at KubeCon + CloudNativeCon Europe 2018. [13] [14] Kubeflow 1.0 was announced in a public blog post in March 2020, which stated that many Kubeflow components were graduating to a "stable status", indicating they were now ready for production usage. [15]

In October 2022, Google announced that the Kubeflow project had applied to join the Cloud Native Computing Foundation. [16] [17] In July 2023, the foundation voted to accept Kubeflow as an incubating stage project. [18] [19]

Components

Kubeflow Notebooks for model development

Machine learning models are developed in the Kubeflow Notebooks component. The component runs web-based development environments inside a Kubernetes cluster, with native support for Jupyter Notebook, Visual Studio Code, and RStudio. [20]

Kubeflow Pipelines for model training

Once developed, models are trained in the Kubeflow Pipelines component. The component acts as a platform for building and deploying portable, scalable machine learning workflows based on Docker containers. [21] Google Cloud Platform has adopted the Kubeflow Pipelines DSL within its Vertex AI Pipelines product. [22]
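
As a rough illustration of the workflow idea, the sketch below models a pipeline as a small dependency graph of steps and runs them in order. It uses only the Python standard library and is not the Kubeflow Pipelines SDK; the step names (load_data, normalize, train) and the run_pipeline helper are hypothetical, and in Kubeflow Pipelines each step would run in its own container rather than in a single process.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical step functions. Each receives a dict of upstream outputs.
def load_data(_inputs):
    return [1.0, 2.0, 3.0, 4.0]


def normalize(inputs):
    data = inputs["load_data"]
    peak = max(data)
    return [x / peak for x in data]


def train(inputs):
    data = inputs["normalize"]
    return {"mean": sum(data) / len(data)}


# Each step name maps to (function, names of the steps it depends on).
PIPELINE = {
    "load_data": (load_data, []),
    "normalize": (normalize, ["load_data"]),
    "train": (train, ["normalize"]),
}


def run_pipeline(pipeline):
    """Run steps in dependency order, passing upstream outputs downstream."""
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = pipeline[name]
        outputs[name] = func({dep: outputs[dep] for dep in deps})
    return outputs


if __name__ == "__main__":
    print(run_pipeline(PIPELINE)["train"])
```

Declaring dependencies explicitly, rather than calling steps in a fixed sequence, is what lets a workflow engine decide the execution order and run independent steps separately.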

Kubeflow Training Operator for model training

For certain machine learning frameworks, the Kubeflow Training Operator component provides Kubernetes custom resources. The component runs distributed or non-distributed TensorFlow, PyTorch, Apache MXNet, XGBoost, and MPI training jobs on Kubernetes. [6]
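
To give a sense of what such a custom resource looks like, the sketch below builds a manifest for a PyTorch training job as a plain Python dictionary and prints it as JSON. The field layout (apiVersion kubeflow.org/v1, kind PyTorchJob, pytorchReplicaSpecs with Master and Worker entries) reflects the v1 API as commonly documented and should be checked against the version installed in a given cluster; the build_pytorch_job helper, the job name, and the image name are hypothetical.

```python
import json


def build_pytorch_job(name, image, workers):
    """Return an illustrative PyTorchJob manifest as a plain dict."""

    def replica(count):
        # One replica group: how many pods to run and which container to use.
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {"containers": [{"name": "pytorch", "image": image}]}
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",  # assumed API version
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica(1),
                "Worker": replica(workers),
            }
        },
    }


if __name__ == "__main__":
    # "example.com/train:latest" is a placeholder image name.
    job = build_pytorch_job("demo-train", "example.com/train:latest", 2)
    print(json.dumps(job, indent=2))
```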

KServe for model serving

The KServe component (previously named KFServing [23] ) provides Kubernetes custom resources for serving machine learning models on arbitrary frameworks including TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. [24] KServe was developed collaboratively by Google, IBM, Bloomberg, NVIDIA, and Seldon. [23] Publicly disclosed adopters of KServe include Bloomberg, [25] Gojek, [26] the Wikimedia Foundation, [27] and others. [28]
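
A serving custom resource can be sketched in the same way. The example below assembles an InferenceService manifest as a Python dictionary; the field layout (apiVersion serving.kserve.io/v1beta1, a predictor with a model format and a storage URI) follows the v1beta1 API as commonly documented and should be verified against the installed KServe version. The build_inference_service helper, the service name, and the storage location are hypothetical.

```python
import json


def build_inference_service(name, model_format, storage_uri):
    """Return an illustrative InferenceService manifest as a plain dict."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",  # assumed API version
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    # The framework the saved model was produced with.
                    "modelFormat": {"name": model_format},
                    # Where the serving runtime should fetch the model from.
                    "storageUri": storage_uri,
                }
            }
        },
    }


if __name__ == "__main__":
    # The bucket path below is a placeholder, not a real model location.
    service = build_inference_service(
        "demo-sklearn", "sklearn", "gs://example-bucket/models/demo"
    )
    print(json.dumps(service, indent=2))
```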

Katib for automated machine learning

Kubeflow also includes Katib, a component for automated training and development of machine learning models. It is described as a Kubernetes-native project and features hyperparameter tuning, early stopping, and neural architecture search. [29]
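
The following is a minimal sketch of two of these ideas, hyperparameter tuning and early stopping, using random search over a single learning rate. It is a toy in plain Python, not Katib's API or implementation: the objective function, the search range, and the simplified median-based stopping rule are all assumptions made for illustration.

```python
import random
from statistics import median


def objective(learning_rate, step):
    """Toy validation loss: lower is better, best near learning_rate = 0.1."""
    return (learning_rate - 0.1) ** 2 + 1.0 / (step + 1)


def random_search(num_trials=20, max_steps=10, seed=0):
    """Random search with a simplified median-style early stopping rule."""
    rng = random.Random(seed)
    histories = []     # loss curves of trials that ran to completion
    best = None        # (final_loss, learning_rate) of the best completed trial
    stopped_early = 0
    for _ in range(num_trials):
        lr = rng.uniform(0.001, 1.0)
        curve = []
        for step in range(max_steps):
            curve.append(objective(lr, step))
            # Once a few trials have finished, stop any trial whose loss is
            # worse than the median loss of finished trials at the same step.
            if len(histories) >= 3 and curve[step] > median(
                h[step] for h in histories
            ):
                break
        else:
            # The trial ran all steps without being stopped.
            histories.append(curve)
            if best is None or curve[-1] < best[0]:
                best = (curve[-1], lr)
            continue
        stopped_early += 1
    return best, stopped_early


if __name__ == "__main__":
    (loss, lr), stopped = random_search()
    print(f"best lr={lr:.3f} loss={loss:.4f}, stopped early: {stopped}")
```

Stopping unpromising trials early frees compute for further trials, which is the main motivation for combining early stopping with a search strategy.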

Release timeline

Version | Release date | Release information | Release blog
Kubeflow 0.1 | 5 April 2018 [2] | - | https://kubernetes.io/blog/2018/05/04/announcing-kubeflow-0.1/
Kubeflow 0.2 | 2 July 2018 [30] | - | https://medium.com/kubeflow/kubeflow-0-2-offers-new-components-and-simplified-setup-735e4c56988d
Kubeflow 0.3 | 5 October 2018 [31] | - | https://medium.com/kubeflow/kubeflow-0-3-simplifies-setup-improves-ml-development-98b8ca10bd69
Kubeflow 0.4 | 8 January 2019 [32] | - | https://medium.com/kubeflow/kubeflow-0-4-release-enhancements-for-machine-learning-productivity-d77c54df07a9
Kubeflow 0.5 | 9 April 2019 [33] | - | https://medium.com/kubeflow/kubeflow-v0-5-simplifies-model-development-with-enhanced-ui-and-fairing-library-78e19cdc9f50
Kubeflow 0.6 | 19 July 2019 [34] | https://www.kubeflow.org/docs/releases/kubeflow-0.6/ | https://medium.com/kubeflow/kubeflow-v0-6-a-robust-foundation-for-artifact-tracking-data-versioning-multi-user-support-9896d329412c
Kubeflow 0.7 | 17 October 2019 [35] | https://www.kubeflow.org/docs/releases/kubeflow-0.7/ | https://medium.com/kubeflow/kubeflow-v0-7-delivers-beta-functionality-in-the-leadup-to-v1-0-1e63036c07b8
Kubeflow 1.0 | 20 February 2020 [36] | https://www.kubeflow.org/docs/releases/kubeflow-1.0/ | https://blog.kubeflow.org/releases/2020/03/02/kubeflow-1-0-cloud-native-ml-for-everyone
Kubeflow 1.1 | 31 July 2020 [37] | https://www.kubeflow.org/docs/releases/kubeflow-1.1/ | https://blog.kubeflow.org/release/official/2020/07/31/kubeflow-1.1-blog-post
Kubeflow 1.2 | 18 November 2020 [38] | https://www.kubeflow.org/docs/releases/kubeflow-1.2/ | https://blog.kubeflow.org/release/official/2020/11/18/kubeflow-1.2-blog-post
Kubeflow 1.3 | 23 April 2021 [39] | https://www.kubeflow.org/docs/releases/kubeflow-1.3/ | https://blog.kubeflow.org/kubeflow-1.3-release/
Kubeflow 1.4 | 12 October 2021 [40] | https://www.kubeflow.org/docs/releases/kubeflow-1.4/ | https://blog.kubeflow.org/kubeflow-1.4-release/
Kubeflow 1.5 | 10 March 2022 [41] | https://www.kubeflow.org/docs/releases/kubeflow-1.5/ | https://blog.kubeflow.org/kubeflow-1.5-release/
Kubeflow 1.6 | 7 September 2022 [42] | https://www.kubeflow.org/docs/releases/kubeflow-1.6/ | https://blog.kubeflow.org/kubeflow-1.6-release/
Kubeflow 1.7 | 29 March 2023 [43] | https://www.kubeflow.org/docs/releases/kubeflow-1.7/ | https://blog.kubeflow.org/kubeflow-1.7-release/
Kubeflow 1.8 | 1 November 2023 [44] | https://www.kubeflow.org/docs/releases/kubeflow-1.8/ | https://blog.kubeflow.org/kubeflow-1.8-release/
Kubeflow 1.9 | 22 July 2024 [3] | https://www.kubeflow.org/docs/releases/kubeflow-1.9/ | https://blog.kubeflow.org/kubeflow-1.9-release/

Notes

  a. KServe was previously known as KFServing. [23]

References

  1. "Kubeflow Website - Working Groups".
  2. "Kubeflow 0.1 - Release Tag". GitHub.
  3. "Kubeflow 1.9 - Release Information".
  4. "Kubeflow Website - Kubeflow Notebooks".
  5. "Kubeflow Website - Kubeflow Pipelines".
  6. "Kubeflow GitHub - Kubeflow Training Operator". GitHub.
  7. "Kubeflow Website - KServe".
  8. "Kubeflow Website - Katib".
  9. "Kubeflow Website - Installing Kubeflow".
  10. "'Hot Dogs or Not' - At Scale with Kubernetes [I] - Vish Kannan & David Aronchick, Google". YouTube. 15 December 2017.
  11. "Introducing Kubeflow - A Composable, Portable, Scalable ML Stack Built for Kubernetes". 21 December 2017.
  12. "Kubeflow Website - History".
  13. "Google-led Kubeflow, machine learning for Kubernetes, begins to take shape". 4 May 2018.
  14. "Announcing Kubeflow 0.1". 4 May 2018.
  15. "Kubeflow 1.0: Cloud-Native ML for Everyone". 2 March 2020.
  16. Lamkin, Thea (2022-10-24). "Kubeflow has applied to become a CNCF incubating project". Kubeflow. Retrieved 2023-11-02.
  17. "Kubeflow applies to become a CNCF incubating project". Google Open Source Blog. 2022-10-24. Retrieved 2023-11-02.
  18. "Kubeflow brings MLOps to the CNCF Incubator". Cloud Native Computing Foundation. 2023-07-25. Retrieved 2023-11-02.
  19. "Kubeflow joins the CNCF family". Google Open Source Blog. 2023-07-25. Retrieved 2023-11-02.
  20. "Kubeflow Website - Kubeflow Notebooks Overview".
  21. "Kubeflow Website - Kubeflow Pipelines Introduction".
  22. "Vertex AI - Building a pipeline".
  23. "KServe: The next generation of KFServing". 27 September 2021.
  24. "KServe GitHub". GitHub.
  25. "The journey to build Bloomberg's ML Inference Platform Using KServe (formerly KFServing)". Bloomberg L.P. 12 October 2021.
  26. "Merlin: Making ML Model Deployments Magical".
  27. "Machine Learning/LiftWing".
  28. "KServe Website - Adopters of KServe".
  29. "Kubeflow GitHub - Katib". GitHub.
  30. "Kubeflow 0.2 - Release Tag". GitHub.
  31. "Kubeflow 0.3 - Release Tag". GitHub.
  32. "Kubeflow 0.4 - Release Tag". GitHub.
  33. "Kubeflow 0.5 - Release Tag". GitHub.
  34. "Kubeflow 0.6 - Release Information".
  35. "Kubeflow 0.7 - Release Information".
  36. "Kubeflow 1.0 - Release Information".
  37. "Kubeflow 1.1 - Release Information".
  38. "Kubeflow 1.2 - Release Information".
  39. "Kubeflow 1.3 - Release Information".
  40. "Kubeflow 1.4 - Release Information".
  41. "Kubeflow 1.5 - Release Information".
  42. "Kubeflow 1.6 - Release Information".
  43. "Kubeflow 1.7 - Release Information".
  44. "Kubeflow 1.8 - Release Information".