Kubeflow

Original author(s): Google
Developer(s): Kubeflow Contributors [1], including AWS, Bloomberg, Google, IBM, NVIDIA, Nutanix, Red Hat, Arrikto, and others
Initial release: April 5, 2018 [2]
Stable release: 1.8 (November 1, 2023) [3]
Repository: github.com/kubeflow
Written in: Go, Python
Platform: Kubernetes
Type: Machine learning platform
License: Apache License 2.0
Website: kubeflow.org

Kubeflow is an open-source platform for machine learning and MLOps on Kubernetes, introduced by Google. Kubeflow represents the stages of a typical machine learning lifecycle with separate software components: model development (Kubeflow Notebooks [4]), model training (Kubeflow Pipelines [5] and the Kubeflow Training Operator [6]), model serving (KServe [note 1] [7]), and automated machine learning (Katib [8]).

Each Kubeflow component can be deployed independently; a deployment does not need to include every component. [9]

History

The Kubeflow project was first announced at KubeCon + CloudNativeCon North America 2017 by Google engineers David Aronchick, Jeremy Lewi, and Vishnu Kannan [10] to address a perceived lack of flexible options for building production-ready machine learning systems. [11] The project has also stated that it began as a way for Google to open-source how it ran TensorFlow internally. [12]

The first release of Kubeflow (Kubeflow 0.1) was announced at KubeCon + CloudNativeCon Europe 2018, [13] with the claim that the project had already become one of the top 2% of GitHub projects ever. [14] Kubeflow 1.0 was announced in March 2020 in a public blog post stating that many Kubeflow components were graduating to a "stable status", meaning they were considered ready for production use. [15]

In October 2022, Google announced that the Kubeflow project had applied to join the Cloud Native Computing Foundation. [16] [17] In July 2023, the foundation voted to accept Kubeflow as an incubating stage project. [18] [19]

Components

Kubeflow Notebooks for model development

Machine learning models are developed in the Kubeflow Notebooks component. The component runs web-based development environments inside a Kubernetes cluster, with native support for Jupyter Notebook, Visual Studio Code, and RStudio. [20]

Kubeflow Pipelines for model training

Once developed, models are trained in the Kubeflow Pipelines component. The component acts as a platform for building and deploying portable, scalable machine learning workflows based on Docker containers. [21] Google Cloud Platform has adopted the Kubeflow Pipelines DSL within its Vertex AI Pipelines product. [22]
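
The idea of a workflow assembled from containerized steps can be illustrated without the Kubeflow Pipelines SDK itself. The following Python sketch is a toy model, not the `kfp` API: each step names a hypothetical container image and the steps it depends on, and Python's standard `graphlib` module groups the steps into stages whose members could run in parallel. The step names and image references are placeholders chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline definition (not the kfp SDK): each step names the
# container image it would run and the upstream steps whose outputs it uses.
# Image references are placeholders, not real images.
STEPS = {
    "preprocess": {"image": "example.com/preprocess:1.0", "after": []},
    "train": {"image": "example.com/train:1.0", "after": ["preprocess"]},
    "baseline": {"image": "example.com/baseline:1.0", "after": ["preprocess"]},
    "evaluate": {"image": "example.com/evaluate:1.0", "after": ["train", "baseline"]},
}


def parallel_stages(steps):
    """Group steps into stages; each step depends only on steps in earlier stages."""
    graph = {name: set(spec["after"]) for name, spec in steps.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make output deterministic
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage finished, unlocking its dependents
    return stages


if __name__ == "__main__":
    for index, stage in enumerate(parallel_stages(STEPS)):
        print(index, [f"{name} ({STEPS[name]['image']})" for name in stage])
```

In Kubeflow Pipelines proper, pipelines are written with the `kfp` Python SDK and compiled into a pipeline specification that the Pipelines backend runs on the cluster, so scheduling is handled by the platform rather than by user code as in this sketch.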

Kubeflow Training Operator for model training

The Kubeflow Training Operator component provides Kubernetes custom resources for training jobs built with certain machine learning frameworks. It runs distributed or non-distributed TensorFlow, PyTorch, Apache MXNet, XGBoost, and MPI training jobs on Kubernetes. [6]
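
As an illustration of what a Training Operator custom resource looks like, the Python sketch below builds a `PyTorchJob` manifest as a plain dictionary with one master replica and a configurable number of worker replicas, then prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The field names follow the `kubeflow.org/v1` `PyTorchJob` examples in the Training Operator documentation as we understand them and should be checked against the deployed version; the job name, image, and `pytorch_job` helper are hypothetical.

```python
import json


def pytorch_job(name, image, workers):
    """Build a PyTorchJob manifest with one Master and `workers` Worker replicas.

    Hypothetical helper; field names assume the kubeflow.org/v1 PyTorchJob schema.
    """

    def replica_spec(count):
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    # Assumed default container name that the operator configures
                    # for distributed training.
                    "containers": [{"name": "pytorch", "image": image}]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica_spec(1),
                "Worker": replica_spec(workers),
            }
        },
    }


if __name__ == "__main__":
    # Placeholder image; a real job would use an image containing the training script.
    manifest = pytorch_job("mnist-ddp", "example.com/mnist-train:latest", workers=2)
    print(json.dumps(manifest, indent=2))
```

The manifest only declares how many replicas of each role to run; creating the pods and setting up the distributed-training environment is left to the operator.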

KServe for model serving

The KServe component (previously named KFServing [23]) provides Kubernetes custom resources for serving machine learning models built with a range of frameworks, including TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. [24] KServe was developed collaboratively by Google, IBM, Bloomberg, NVIDIA, and Seldon. [23] Publicly disclosed adopters of KServe include Bloomberg, [25] Gojek, [26] and others. [27]
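
A KServe deployment is declared as an `InferenceService` custom resource. The sketch below builds one in Python for a scikit-learn model and prints it as JSON. It assumes the `serving.kserve.io/v1beta1` schema, in which the predictor names a model format and a storage URI from which the model is loaded; the service name, bucket path, and `inference_service` helper are hypothetical.

```python
import json


def inference_service(name, model_format, storage_uri):
    """Build a minimal InferenceService manifest.

    Hypothetical helper; assumes the serving.kserve.io/v1beta1 schema.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    # The format name (e.g. "sklearn", "xgboost", "onnx") is used to
                    # select a serving runtime that supports it.
                    "modelFormat": {"name": model_format},
                    # Location of the trained model artifacts (placeholder path).
                    "storageUri": storage_uri,
                }
            }
        },
    }


if __name__ == "__main__":
    isvc = inference_service("iris-classifier", "sklearn", "gs://example-bucket/models/iris")
    print(json.dumps(isvc, indent=2))
```

In this layout, switching frameworks changes only the `modelFormat` name and the storage location, which reflects the framework-agnostic design described above.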

Katib for automated machine learning

Kubeflow also includes Katib, a component for automating the training and development of machine learning models. Katib is described as a Kubernetes-native project and supports hyperparameter tuning, early stopping, and neural architecture search. [28]
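
To make hyperparameter tuning with early stopping concrete, the following plain-Python sketch runs a random search over a learning rate and applies a simplified median stopping rule: a trial is stopped at a given step if its loss is worse than the median loss that earlier trials reported at the same step. This is a toy model of the kind of technique Katib automates, not Katib's API or implementation; the `objective` function is a made-up stand-in for a real training run.

```python
import random
from statistics import median


def objective(lr, step):
    """Made-up validation loss: lowest near lr = 0.01 and improving with each step."""
    return (lr - 0.01) ** 2 * 1e4 + 1.0 / (step + 1)


def random_search(num_trials=8, max_steps=10, seed=0):
    rng = random.Random(seed)  # seeded so that runs are reproducible
    history = {}  # step -> losses that earlier trials reported at that step
    results = []
    for _ in range(num_trials):
        lr = 10 ** rng.uniform(-4, -1)  # log-uniform sample between 1e-4 and 1e-1
        stopped_early = False
        for step in range(max_steps):
            loss = objective(lr, step)
            peers = history.setdefault(step, [])
            # Simplified median stopping rule; needs at least two earlier trials.
            worse_than_median = len(peers) >= 2 and loss > median(peers)
            peers.append(loss)
            if worse_than_median:
                stopped_early = True
                break
        results.append(
            {"lr": lr, "loss": loss, "steps": step + 1, "stopped_early": stopped_early}
        )
    # Only trials that ran to completion are eligible to be the best trial.
    completed = [r for r in results if not r["stopped_early"]]
    best = min(completed, key=lambda r: r["loss"])
    return best, results


if __name__ == "__main__":
    best, results = random_search()
    stopped = sum(r["stopped_early"] for r in results)
    print(f"best lr={best['lr']:.4g} loss={best['loss']:.4f}; "
          f"{stopped} of {len(results)} trials stopped early")
```

Stopping unpromising trials early frees cluster resources for other trials, which is the motivation for combining early stopping with hyperparameter search.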

Release timeline

Version | Release date | Release information | Release blog
Kubeflow 0.1 | 5 April 2018 [2] | - | https://kubernetes.io/blog/2018/05/04/announcing-kubeflow-0.1/
Kubeflow 0.2 | 2 July 2018 [29] | - | https://medium.com/kubeflow/kubeflow-0-2-offers-new-components-and-simplified-setup-735e4c56988d
Kubeflow 0.3 | 5 October 2018 [30] | - | https://medium.com/kubeflow/kubeflow-0-3-simplifies-setup-improves-ml-development-98b8ca10bd69
Kubeflow 0.4 | 8 January 2019 [31] | - | https://medium.com/kubeflow/kubeflow-0-4-release-enhancements-for-machine-learning-productivity-d77c54df07a9
Kubeflow 0.5 | 9 April 2019 [32] | - | https://medium.com/kubeflow/kubeflow-v0-5-simplifies-model-development-with-enhanced-ui-and-fairing-library-78e19cdc9f50
Kubeflow 0.6 | 19 July 2019 [33] | https://www.kubeflow.org/docs/releases/kubeflow-0.6/ | https://medium.com/kubeflow/kubeflow-v0-6-a-robust-foundation-for-artifact-tracking-data-versioning-multi-user-support-9896d329412c
Kubeflow 0.7 | 17 October 2019 [34] | https://www.kubeflow.org/docs/releases/kubeflow-0.7/ | https://medium.com/kubeflow/kubeflow-v0-7-delivers-beta-functionality-in-the-leadup-to-v1-0-1e63036c07b8
Kubeflow 1.0 | 20 February 2020 [35] | https://www.kubeflow.org/docs/releases/kubeflow-1.0/ | https://blog.kubeflow.org/releases/2020/03/02/kubeflow-1-0-cloud-native-ml-for-everyone
Kubeflow 1.1 | 31 July 2020 [36] | https://www.kubeflow.org/docs/releases/kubeflow-1.1/ | https://blog.kubeflow.org/release/official/2020/07/31/kubeflow-1.1-blog-post
Kubeflow 1.2 | 18 November 2020 [37] | https://www.kubeflow.org/docs/releases/kubeflow-1.2/ | https://blog.kubeflow.org/release/official/2020/11/18/kubeflow-1.2-blog-post
Kubeflow 1.3 | 23 April 2021 [38] | https://www.kubeflow.org/docs/releases/kubeflow-1.3/ | https://blog.kubeflow.org/kubeflow-1.3-release/
Kubeflow 1.4 | 12 October 2021 [39] | https://www.kubeflow.org/docs/releases/kubeflow-1.4/ | https://blog.kubeflow.org/kubeflow-1.4-release/
Kubeflow 1.5 | 10 March 2022 [40] | https://www.kubeflow.org/docs/releases/kubeflow-1.5/ | https://blog.kubeflow.org/kubeflow-1.5-release/
Kubeflow 1.6 | 7 September 2022 [41] | https://www.kubeflow.org/docs/releases/kubeflow-1.6/ | https://blog.kubeflow.org/kubeflow-1.6-release/
Kubeflow 1.7 | 29 March 2023 [42] | https://www.kubeflow.org/docs/releases/kubeflow-1.7/ | https://blog.kubeflow.org/kubeflow-1.7-release/
Kubeflow 1.8 | 1 November 2023 [3] | https://www.kubeflow.org/docs/releases/kubeflow-1.8/ | https://blog.kubeflow.org/kubeflow-1.8-release/

Notes

  1. KServe was previously known as KFServing.

References

  1. "Kubeflow Website - Working Groups".
  2. "Kubeflow 0.1 - Release Tag". GitHub.
  3. "Kubeflow 1.8 - Release Information".
  4. "Kubeflow Website - Kubeflow Notebooks".
  5. "Kubeflow Website - Kubeflow Pipelines".
  6. "Kubeflow GitHub - Kubeflow Training Operator". GitHub.
  7. "Kubeflow Website - KServe".
  8. "Kubeflow Website - Katib".
  9. "Kubeflow Website - Installing Kubeflow".
  10. "'Hot Dogs or Not' - At Scale with Kubernetes [I] - Vish Kannan & David Aronchick, Google". YouTube.
  11. "Introducing Kubeflow - A Composable, Portable, Scalable ML Stack Built for Kubernetes". 21 December 2017.
  12. "Kubeflow Website - History".
  13. "Google-led Kubeflow, machine learning for Kubernetes, begins to take shape". 4 May 2018.
  14. "Announcing Kubeflow 0.1". 4 May 2018.
  15. "Kubeflow 1.0: Cloud-Native ML for Everyone". 2 March 2020.
  16. Lamkin, Thea (2022-10-24). "Kubeflow has applied to become a CNCF incubating project". Kubeflow. Retrieved 2023-11-02.
  17. "Kubeflow applies to become a CNCF incubating project". Google Open Source Blog. 2022-10-24. Retrieved 2023-11-02.
  18. "Kubeflow brings MLOps to the CNCF Incubator". Cloud Native Computing Foundation. 2023-07-25. Retrieved 2023-11-02.
  19. "Kubeflow joins the CNCF family". Google Open Source Blog. 2023-07-25. Retrieved 2023-11-02.
  20. "Kubeflow Website - Kubeflow Notebooks Overview".
  21. "Kubeflow Website - Kubeflow Pipelines Introduction".
  22. "Vertex AI - Building a pipeline".
  23. "KServe: The next generation of KFServing". 27 September 2021.
  24. "KServe GitHub". GitHub.
  25. "The journey to build Bloomberg's ML Inference Platform Using KServe (formerly KFServing)". Bloomberg L.P. 12 October 2021.
  26. "Merlin: Making ML Model Deployments Magical".
  27. "KServe Website - Adopters of KServe".
  28. "Kubeflow GitHub - Katib". GitHub.
  29. "Kubeflow 0.2 - Release Tag". GitHub.
  30. "Kubeflow 0.3 - Release Tag". GitHub.
  31. "Kubeflow 0.4 - Release Tag". GitHub.
  32. "Kubeflow 0.5 - Release Tag". GitHub.
  33. "Kubeflow 0.6 - Release Information".
  34. "Kubeflow 0.7 - Release Information".
  35. "Kubeflow 1.0 - Release Information".
  36. "Kubeflow 1.1 - Release Information".
  37. "Kubeflow 1.2 - Release Information".
  38. "Kubeflow 1.3 - Release Information".
  39. "Kubeflow 1.4 - Release Information".
  40. "Kubeflow 1.5 - Release Information".
  41. "Kubeflow 1.6 - Release Information".
  42. "Kubeflow 1.7 - Release Information".