| Kubeflow | |
|---|---|
| Original author | |
| Developers | Kubeflow Contributors [1] - AWS, Bloomberg, Google, IBM, NVIDIA, Nutanix, Red Hat, Arrikto, and others |
| Initial release | April 5, 2018 [2] |
| Stable release | 1.10 [3] / April 1, 2025 |
| Repository | github |
| Written in | Go, Python, TypeScript |
| Platform | Kubernetes |
| Type | Machine Learning Platform |
| License | Apache License 2.0 |
| Website | kubeflow |
Kubeflow is an open-source platform for machine learning and MLOps on Kubernetes, introduced by Google. Each stage of a typical machine learning lifecycle is handled by a separate Kubeflow component: model development (Kubeflow Notebooks [4]), model training (Kubeflow Pipelines [5] and Kubeflow Training Operator [6]), model serving (KServe [a] [7]), and automated machine learning (Katib [8]).
Each Kubeflow component can be deployed independently; deploying all of them is not required. [9]
The Kubeflow project was first announced at KubeCon + CloudNativeCon North America 2017 by Google engineers David Aronchick, Jeremy Lewi, and Vishnu Kannan [10] to address a perceived lack of flexible options for building production-ready machine learning systems. [11] The project has also stated that it began as a way for Google to open-source how it ran TensorFlow internally. [12]
The first release of Kubeflow (Kubeflow 0.1) was announced at KubeCon + CloudNativeCon Europe 2018. [13] [14] Kubeflow 1.0 was released in March 2020; the announcement blog post stated that many Kubeflow components were graduating to "stable status", indicating that they were ready for production use. [15]
In October 2022, Google announced that the Kubeflow project had applied to join the Cloud Native Computing Foundation. [16] [17] In July 2023, the foundation voted to accept Kubeflow as an incubating stage project. [18] [19]
Machine learning models are developed in the notebooks component called Kubeflow Notebooks. The component runs web-based development environments inside a Kubernetes cluster, with native support for Jupyter Notebook, Visual Studio Code, and RStudio. [20]
Once developed, models are trained in the Kubeflow Pipelines component. The component acts as a platform for building and deploying portable, scalable machine learning workflows based on Docker containers. [21] Google Cloud Platform has adopted the Kubeflow Pipelines DSL within its Vertex AI Pipelines product. [22]
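The idea of a workflow built from containerized steps can be illustrated with a minimal sketch. The example below does not use the Kubeflow Pipelines SDK; it only models a pipeline as a dependency graph of steps, using Python's standard-library `graphlib`, and derives a valid execution order. The step names and container image references are hypothetical placeholders.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step names a container image and the steps
# that must finish before it starts. Names and images are placeholders.
PIPELINE = {
    "preprocess": {"image": "example.local/preprocess:latest", "after": []},
    "train": {"image": "example.local/train:latest", "after": ["preprocess"]},
    "evaluate": {"image": "example.local/evaluate:latest", "after": ["train"]},
    "deploy": {"image": "example.local/deploy:latest", "after": ["evaluate"]},
}


def execution_order(pipeline):
    """Return step names in an order that respects every dependency."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(step["after"]) for name, step in pipeline.items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(PIPELINE)
for name in order:
    print(f"{name}: {PIPELINE[name]['image']}")
```

A real pipeline engine would additionally schedule each step as a container on the cluster and pass artifacts between steps; this sketch covers only the ordering.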
For certain machine learning models and libraries, the Kubeflow Training Operator component provides Kubernetes custom resources. The component runs distributed or non-distributed TensorFlow, PyTorch, Apache MXNet, XGBoost, and MPI training jobs on Kubernetes. [6]
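As an illustration of what such a custom resource can look like, the following sketch builds a `PyTorchJob` manifest as a plain Python dictionary and prints it as JSON. The field layout follows the `kubeflow.org/v1` `PyTorchJob` resource as commonly documented and should be checked against the installed operator version; the job name and container image are hypothetical.

```python
import json


def pytorch_job(name, image, workers):
    """Build a PyTorchJob manifest with one master and `workers` worker replicas."""

    def replica(count):
        # Each replica type carries its own pod template.
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {"containers": [{"name": "pytorch", "image": image}]}
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",  # assumed API version; verify for your cluster
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica(1),
                "Worker": replica(workers),
            }
        },
    }


# Hypothetical job name and image, for illustration only.
job_manifest = pytorch_job("mnist-example", "example.local/pytorch-mnist:latest", workers=2)
print(json.dumps(job_manifest, indent=2))
```

The printed JSON could be saved to a file and submitted with `kubectl apply -f`, assuming the operator is installed in the cluster.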
The KServe component (previously named KFServing [23]) provides Kubernetes custom resources for serving machine learning models built with a range of frameworks, including TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. [24] KServe was developed collaboratively by Google, IBM, Bloomberg, NVIDIA, and Seldon. [23] Publicly disclosed adopters of KServe include Bloomberg, [25] Gojek, [26] the Wikimedia Foundation, [27] and others. [28]
Lastly, the Katib component automates the training and development of machine learning models. It is described as a Kubernetes-native project and features hyperparameter tuning, early stopping, and neural architecture search. [29]
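A minimal sketch of a KServe custom resource is shown below, again built as a Python dictionary. It assumes the `serving.kserve.io/v1beta1` `InferenceService` layout with a `predictor.model` block; the service name and storage URI are hypothetical, and the exact fields should be verified against the KServe version in use.

```python
import json


def inference_service(name, model_format, storage_uri):
    """Build an InferenceService manifest that serves one stored model."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",  # assumed API version
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    # The model format selects a serving runtime for the framework.
                    "modelFormat": {"name": model_format},
                    "storageUri": storage_uri,
                }
            }
        },
    }


# Hypothetical service name and bucket path, for illustration only.
service_manifest = inference_service(
    "sklearn-iris", "sklearn", "gs://example-bucket/models/iris"
)
print(json.dumps(service_manifest, indent=2))
```

Changing `model_format` (for example to `"xgboost"` or `"onnx"`) is how this sketch would target a different framework, assuming a matching serving runtime is available in the cluster.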
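Hyperparameter tuning with early stopping can be illustrated without Katib itself. The sketch below is a conceptual, standard-library-only example: it runs a random search over a single learning rate and abandons a trial when its accuracy falls below the median of earlier trials at the same step. The objective function, parameter range, and thresholds are all assumptions made for illustration and do not reproduce Katib's algorithms or configuration.

```python
import random
from statistics import median


def toy_accuracy(learning_rate, step, total_steps):
    """Stand-in for a training run: best near learning_rate=0.1, improves with steps."""
    closeness = max(0.0, 1.0 - abs(learning_rate - 0.1) / 0.1)
    return closeness * (step / total_steps)


def random_search(num_trials=20, total_steps=5, seed=0):
    """Random search over the learning rate with a median-based early-stopping rule."""
    rng = random.Random(seed)
    history = []  # accuracy curves of all earlier trials, possibly truncated
    best = {"learning_rate": None, "accuracy": -1.0}
    stopped_early = 0
    for _ in range(num_trials):
        learning_rate = rng.uniform(0.01, 0.2)
        curve = []
        for step in range(1, total_steps + 1):
            accuracy = toy_accuracy(learning_rate, step, total_steps)
            curve.append(accuracy)
            # Compare against earlier trials that reached this step.
            earlier = [c[step - 1] for c in history if len(c) >= step]
            if len(earlier) >= 3 and accuracy < median(earlier):
                stopped_early += 1
                break
        history.append(curve)
        # Only trials that ran to completion can become the best trial.
        if len(curve) == total_steps and curve[-1] > best["accuracy"]:
            best = {"learning_rate": learning_rate, "accuracy": curve[-1]}
    return best, stopped_early


best, stopped_early = random_search()
print(f"best learning rate: {best['learning_rate']:.4f}")
print(f"best accuracy: {best['accuracy']:.4f}")
print(f"trials stopped early: {stopped_early}")
```

In a real tuning system each trial would be a separate training job reporting metrics, rather than a function call; the search and stopping logic is the part this sketch is meant to show.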