A major contributor to this article appears to have a close connection with its subject.(September 2022) |
Original author(s) | |
---|---|
Developer(s) | Kubeflow Contributors [1] - AWS, Bloomberg, Google, IBM, NVIDIA, Nutanix, Red Hat, Arrikto, and others |
Initial release | April 5, 2018 [2] |
Stable release | 1.8 [3] / November 1, 2023 |
Repository | github |
Written in | Go, Python |
Platform | Kubernetes |
Type | Machine Learning Platform |
License | Apache License 2.0 |
Website | kubeflow |
Kubeflow is an open-source platform for machine learning and MLOps on Kubernetes introduced by Google. The different stages in a typical machine learning lifecycle are represented with different software components in Kubeflow, including model development (Kubeflow Notebooks [4] ), model training (Kubeflow Pipelines , [5] Kubeflow Training Operator [6] ), model serving (KServe [lower-alpha 1] [7] ), and automated machine learning (Katib [8] ).
Each component of Kubeflow can be deployed separately, and it is not a requirement to deploy every component. [9]
The Kubeflow project was first announced at KubeCon + CloudNativeCon North America 2017 by Google engineers David Aronchick, Jeremy Lewi, and Vishnu Kannan [10] to address a perceived lack of flexible options for building production-ready machine learning systems. [11] The project has also stated it began as a way for Google to open-source how they ran TensorFlow internally. [12]
The first release of Kubeflow (Kubeflow 0.1) was announced at KubeCon + CloudNativeCon Europe 2018 [13] with claims of having already become among the top 2% of GitHub projects ever. [14] Kubeflow 1.0 was released in March 2020 via a public blog post announcing that many Kubeflow components were graduating to a "stable status", indicating they were now ready for production usage. [15]
In October 2022, Google announced that the Kubeflow project had applied to join the Cloud Native Computing Foundation. [16] [17] In July 2023, the foundation voted to accept Kubeflow as an incubating stage project. [18] [19]
Machine learning models are developed in the notebooks component called Kubeflow Notebooks. The component runs web-based development environments inside a Kubernetes cluster, with native support for Jupyter Notebook, Visual Studio Code, and RStudio. [20]
Once developed, models are trained in the Kubeflow Pipelines component. The component acts as a platform for building and deploying portable, scalable machine learning workflows based on Docker containers. [21] Google Cloud Platform has adopted the Kubeflow Pipelines DSL within its Vertex AI Pipelines product. [22]
For certain machine learning models and libraries, the Kubeflow Training Operator component provides Kubernetes custom resources support. The component runs distributed or non-distributed TensorFlow, PyTorch, Apache MXNet, XGBoost, and MPI training jobs on Kubernetes. [6]
The KServe component (previously named KFServing [23] ) provides Kubernetes custom resources for serving machine learning models on arbitrary frameworks including TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. [24] KServe was developed collaboratively by Google, IBM, Bloomberg, NVIDIA, and Seldon. [23] Publicly disclosed adopters of KServe include Bloomberg, [25] Gojek, [26] and others. [27]
Lastly, Kubeflow includes a component for automated training and development of machine learning models, the Katib component. It is described as a Kubernetes-native project and features hyperparameter tuning, early stopping, and neural architecture search. [28]
Orange is an open-source data visualization, machine learning and data mining toolkit. It features a visual programming front-end for explorative qualitative data analysis and interactive data visualization.
Vertica is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as CEOs later on.
OpenShift is a family of containerization software products developed by Red Hat. Its flagship product is the OpenShift Container Platform — a hybrid cloud platform as a service built around Linux containers orchestrated and managed by Kubernetes on a foundation of Red Hat Enterprise Linux. The family's other products provide this platform through different environments: OKD serves as the community-driven upstream, Several deployment methods are available including self-managed, cloud native under ROSA, ARO and RHOIC on AWS, Azure, and IBM Cloud respectively, OpenShift Online as software as a service, and OpenShift Dedicated as a managed service.
Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services that provides a series of modular cloud services including computing, data storage, data analytics and machine learning, alongside a set of management tools. It runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, and Google Docs, according to Verma, et.al. Registration requires a credit card or bank account details.
Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Originally designed by Google, the project is now maintained by the Cloud Native Computing Foundation.
Fluentd is a cross-platform open-source data collection software project originally developed at Treasure Data. It is written primarily in the Ruby programming language.
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.
XGBoost is an open-source software library which provides a regularizing gradient boosting framework for C++, Java, Python, R, Julia, Perl, and Scala. It works on Linux, Microsoft Windows, and macOS. From the project description, it aims to provide a "Scalable, Portable and Distributed Gradient Boosting Library". It runs on a single machine, as well as the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask.
A notebook interface or computational notebook is a virtual notebook environment used for literate programming, a method of writing computer programs. Some notebooks are WYSIWYG environments including executable calculations embedded in formatted documents; others separate calculations and text into separate sections. Notebooks share some goals and features with spreadsheets and word processors but go beyond their limited data models.
spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython. The library is published under the MIT license and its main developers are Matthew Honnibal and Ines Montani, the founders of the software company Explosion.
Prometheus is a free software application used for event monitoring and alerting. It records metrics in a time series database built using an HTTP pull model, with flexible queries and real-time alerting. The project is written in Go and licensed under the Apache 2 License, with source code available on GitHub, and is a graduated project of the Cloud Native Computing Foundation, along with Kubernetes and Envoy.
TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and can provide horizontal scalability, strong consistency, and high availability. It is developed and supported primarily by PingCAP and licensed under Apache 2.0. TiDB drew its initial design inspiration from Google's Spanner and F1 papers.
IBM Cloud is a set of cloud computing services for business offered by the information technology company IBM.
Cloud native computing is an approach in software development that utilizes cloud computing to "build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds". These technologies such as containers, microservices, serverless functions, cloud native processors and immutable infrastructure, deployed via declarative code are common elements of this architectural style. Cloud native technologies focus on minimizing users' operational burden.
The Cloud Native Computing Foundation (CNCF) is a Linux Foundation project that was founded in 2015 to help advance container technology and align the tech industry around its evolution.
LightGBM, short for light gradient-boosting machine, is a free and open-source distributed gradient-boosting framework for machine learning, originally developed by Microsoft. It is based on decision tree algorithms and used for ranking, classification and other machine learning tasks. The development focus is on performance and scalability.
Open Service Mesh (OSM) was a free and open source cloud native service mesh developed by Microsoft that ran on Kubernetes.
Dapr is a free and open source runtime system designed to support cloud native and serverless computing. Its initial release supported SDKs and APIs for Java, .NET, Python, and Go, and targeted the Kubernetes cloud deployment system.
Seldon Technologies Limited is a British technology company founded in 2014, and headquartered in London, England. It makes MLOps software for enterprise deployment of machine learning models, and is a primary maintainer and contributor to a number of popular open source repositories such as Seldon Core.
DVC is a free and open-source, platform-agnostic version system for data, machine learning models, and experiments. It is designed to make ML models shareable, experiments reproducible, and to track versions of models, data, and pipelines. DVC works on top of Git repositories and cloud storage.