Kubeflow v0.6: support for artifact tracking, data versioning & multi-user

Josh Bottum · Published in kubeflow · Jul 31, 2019

The Kubeflow Community is excited to announce the release of Kubeflow v0.6, which introduces new primitives to enable artifact tracking and data versioning in Istio-based multi-user environments.

Early user comments

Per Jeff Fogarty, an Innovation Engineer at US Bank and a Kubeflow user and contributor, “Multi-user functionality is a foundational building block, especially for on-prem environments, and we are excited to integrate this enhancement into our deployment.” Laura Schornack, who is using Kubeflow as part of the City Scholars partnership between Chase and the University of Illinois, added, “Data versioning, especially for Kubeflow Pipelines, allows us to snapshot datasets and recreate models quickly. This significantly simplifies workflows and improves productivity.”

New feature overview

For artifact tracking, Kubeflow v0.6 introduces a new Metadata component, along with a Metadata API and initial corresponding client libraries. These allow users to track their artifacts and execution contexts through an end-to-end ML workflow. Users are now able to interact with the Metadata component from inside their Notebooks or from Kubeflow Pipelines. In addition, Metadata comes with an intuitive web UI to view the list of artifacts and detailed information about each individual artifact.

Kubeflow v0.6 extends the Kubeflow Pipelines’ Domain Specific Language (DSL) with new primitives that enable data and execution context versioning in Kubeflow Pipelines. Kubeflow Pipelines now exposes two new resources: Persistent Volumes and Volume Snapshots. Both are integrated via standards-based Kubernetes PVC primitives backed by the latest functionality of the Container Storage Interface (CSI).

v0.6 also delivers a secure architecture for multi-user support by leveraging a new integration with Istio. The architecture provides flexible options to integrate Kubeflow with authentication services in cloud and on-prem environments.

In addition, Kubeflow v0.6 delivers several documentation updates and valuable new operational and configuration capabilities, most notably the introduction of Kustomize as a complete replacement for ksonnet. The sections below provide more detail on the major deliverables in Kubeflow v0.6:

Artifact Tracking

Superior model builders need to track the intermediate and final artifacts during their end-to-end ML workflows. When artifact tracking is combined with versioning, the model builders have access to completely reproducible results, which accelerates their model iteration and re-creation.

Kubeflow v0.6 introduces several enhancements including:

  • an initial metadata schema to track artifacts related to execution contexts,
  • a Metadata API for storing and retrieving metadata,
  • a new Metadata component for storing and serving this metadata,
  • initial client libraries for end-users to interact with the Metadata service from their Notebook or Pipeline code.

The new architecture allows arbitrary artifact types to be defined and tracked. Kubeflow v0.6 ships with three pre-defined artifact types:

  • Models
  • Datasets
  • Evaluation Metrics

Models and Datasets can be visualized in the Metadata Web UI, so end-users can start tracking them out of the box, with no extra work required.

The following walkthrough demonstrates the artifact tracking enhancements using an XGBoost training example, with Fairing and Metadata logging, for a Housing Price Prediction model (a code sketch of the logging setup follows the list):

  • A close-up of the Metadata logging setup that represents the training process.
  • The Metadata UI artifact list view, with filtering and sorting.
  • The Metadata UI model details for various versions trained with different hyperparameters (HParams).
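
For orientation, here is a minimal sketch of what the Metadata logging setup in such a notebook can look like, based on the v0.6-era kubeflow-metadata Python client. The service address, URIs, names, and hyperparameter values are illustrative placeholders, and exact parameter names may differ between client versions.

```python
# Sketch of notebook-side Metadata logging (v0.6-era kubeflow-metadata client).
# All names, URIs, and hyperparameters below are placeholders.
from kubeflow.metadata import metadata

# A workspace groups related runs; the backend address points at the
# in-cluster Metadata service (placeholder value).
ws = metadata.Workspace(
    backend_url_prefix="metadata-service.kubeflow:8080",
    name="housing-price-prediction",
    description="XGBoost housing price model workspace",
)
run = metadata.Run(workspace=ws, name="xgboost-run-1")
execution = metadata.Execution(name="training", workspace=ws, run=run)

# Record the dataset version consumed by this training execution.
dataset = execution.log_input(metadata.DataSet(
    name="housing-train",
    uri="gs://my-bucket/housing/train.csv",  # placeholder path
    version="v1",
))

# Record the resulting model artifact and its hyperparameters.
model = execution.log_output(metadata.Model(
    name="housing-xgboost",
    uri="gs://my-bucket/housing/model.bst",  # placeholder path
    version="v1",
    model_type="xgboost",
    hyperparameters={"max_depth": 6, "learning_rate": 0.1},
))
```

The logged Dataset and Model then appear in the Metadata UI views described above.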

Versioning for Data Volumes

Kubeflow v0.6 extends the Kubeflow Pipelines’ DSL to seamlessly support the use of Persistent Volumes and Volume Snapshots as distinct Kubeflow Pipelines resources. The storage volumes are managed by standards-based, vendor-neutral Kubernetes primitives, namely PersistentVolumeClaim and VolumeSnapshot API objects. These primitives simplify the delivery of persistent data volumes to Kubeflow Pipelines users and eliminate the burden of manually manipulating low-level K8s objects as part of a Pipeline workflow. Pipelines can now exchange multi-GB data via standard file I/O on mounted Persistent Volumes, without having to upload and download data to and from external object stores.

In v0.6, Kubeflow Pipelines supports data versioning via integration with external storage and data management systems. For example, the new primitives can be added to critical Pipeline steps to produce immutable, versioned snapshots. In addition, the extensions enable simplified data sharing between Kubeflow components, e.g. attaching Pipeline results to Notebooks for further experimentation. They also lay the foundation for a feature candidate for our next release: snapshotting a step’s whole execution context, including data (volumes) and metadata (K8s objects).
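
As an illustration, here is a minimal sketch of how the new primitives appear in pipeline code; the container image, sizes, paths, and the trivial command are placeholders, and the exact DSL surface may have evolved since v0.6.

```python
# Sketch of a pipeline that creates a volume, writes to it, and snapshots it.
import kfp.dsl as dsl


@dsl.pipeline(
    name="volume-versioning-example",
    description="Create a PVC, write data to it, then snapshot it.",
)
def volume_pipeline():
    # Creates a PersistentVolumeClaim that later steps can mount.
    vop = dsl.VolumeOp(
        name="create-volume",
        resource_name="pipeline-data",
        size="10Gi",
        modes=dsl.VOLUME_MODE_RWM,
    )

    # A step that writes results to the mounted volume via plain file I/O.
    ingest = dsl.ContainerOp(
        name="ingest",
        image="library/bash:4.4.23",  # placeholder image
        command=["sh", "-c"],
        arguments=["echo 'training data' > /data/dataset.csv"],
        pvolumes={"/data": vop.volume},
    )

    # Takes an immutable VolumeSnapshot of the volume after the step completes.
    dsl.VolumeSnapshotOp(
        name="snapshot-data",
        resource_name="pipeline-data-snap",
        volume=ingest.pvolume,
    )
```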

Building and deploying ML pipelines faster is a core value of Kubeflow. With the v0.6 functionality, we reduce the steps required to create and run Kubeflow Pipelines with versioned data volumes. Since building a high-quality pipeline can require scores of runs, an automated version tracking solution enables data scientists to build better models faster. It also supports pipeline reproducibility, which is critical for ML projects with compliance, auditing, and bias-analysis requirements.

With v0.6, we’ve reduced the number of steps needed to build and run a pipeline in an on-prem environment by roughly 50%. Data scientists have far less need to wait for an ML engineer to create persistent volumes and seed data into their pipelines via YAML files and low-level Kubernetes commands. For example, in the popular Chicago Taxi Cab demo, a data scientist developing models on-prem can now build and deploy the complete pipeline from a Python notebook and the Pipelines UI.

Pipelines Improvements

Kubeflow v0.6 also includes a number of new features and bug fixes for Pipelines, including support for preemptible VMs, which can significantly reduce usage costs.
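
For example, a hedged sketch of opting a training step into a preemptible GKE node pool with the kfp.gcp helper might look like the following; the image, command, and retry count are placeholders.

```python
# Sketch: run one pipeline step on preemptible VMs and retry on preemption.
import kfp.dsl as dsl
from kfp import gcp


@dsl.pipeline(
    name="preemptible-example",
    description="Run a training step on a preemptible node pool.",
)
def preemptible_pipeline():
    train = dsl.ContainerOp(
        name="train",
        image="gcr.io/my-project/trainer:latest",  # placeholder image
        command=["python", "train.py"],
    )
    # Schedule onto the preemptible node pool and retry if the VM is reclaimed.
    train.apply(gcp.use_preemptible_nodepool())
    train.set_retry(3)
```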

We have also introduced a number of usability and UI improvements, such as streamlined run creation, improved visualization of pipeline metadata, support for default experiments, and 10x faster handling of UI queries. v0.6 also introduces a new Kubernetes controller for managing TensorBoard instances, which are tightly integrated with the UI.

The Pipelines authoring SDK has also been updated with a number of new features, and the documentation has been comprehensively revamped. Notably, the SDK now allows free-form Python functions to be packaged as pipeline components, as sketched below.
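
As a quick illustration, here is a minimal sketch that packages a free-form Python function as a component with the SDK’s func_to_container_op helper; the function itself is a trivial placeholder.

```python
# Sketch: turn a plain Python function into a reusable pipeline component.
import kfp.dsl as dsl
from kfp.components import func_to_container_op


def add(a: float, b: float) -> float:
    """Return the sum of two numbers."""
    return a + b


# Build a component (container op factory) from the function.
add_op = func_to_container_op(add)


@dsl.pipeline(
    name="add-pipeline",
    description="Sums two numbers with a Python-function component.",
)
def add_pipeline(a: float = 1.0, b: float = 7.0):
    add_task = add_op(a, b)
```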

Finally, community-contributed pipelines continue to grow, including pipelines that run against Google Cloud, IBM Cloud and Watson, and AWS SageMaker. A full list of new features can be found in the changelog.

Multi-user Authentication & Authorization with SSO

Kubeflow v0.6 provides a flexible architecture for multi-user isolation and Single Sign-On (SSO). It leverages Istio and Kubernetes namespaces, together with the new “Profile” Kubernetes Custom Resource. These building blocks enable dynamic per-user creation of namespaces, so each user runs in isolation by default. In addition to isolation, Kubeflow’s new Istio functionality enables integration with authentication services such as LDAP and Active Directory, along with RBAC-based authorization. If needed, this configuration can be installed and operated in air-gapped environments that have no internet connectivity. For a detailed description of the Istio integration with independent OIDC providers on-prem, please take a look at this post.
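
For illustration only, here is a hedged sketch of creating a per-user Profile with the Kubernetes Python client. The kubeflow.org/v1alpha1 group/version and the spec fields reflect the v0.6-era Profile custom resource and may differ in later releases; the user name and email are placeholders.

```python
# Sketch: create a Profile custom resource, which the Profile controller
# reconciles into an isolated namespace with Istio and RBAC policies.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# Assumed v0.6-era schema; field names are illustrative and may vary.
profile = {
    "apiVersion": "kubeflow.org/v1alpha1",
    "kind": "Profile",
    "metadata": {"name": "alice"},
    "spec": {"owner": {"kind": "User", "name": "alice@example.com"}},
}

api.create_cluster_custom_object(
    group="kubeflow.org", version="v1alpha1", plural="profiles", body=profile
)
```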

Kubeflow 0.6 operational updates and improvements

In addition, Kubeflow v0.6 includes valuable operational updates and improvements, including the replacement of ksonnet with kustomize. The Kubeflow and kustomize technical teams share similar philosophies and are already collaborating on advanced use cases. Please find a quick summary of “Why Kustomize?” below:

  • Easier-to-read YAML
  • Integrated with Kubernetes
  • kubectl-level commands, no extra tools
  • Supports common customization needs: images, env vars, secrets, config maps
  • Kustomize overlays provide for easier extensions, supporting complex parameterization or customization without a PR
  • Doesn’t create another API

In addition, we are happy to announce that two Kubernetes / Kubeflow operators, TFJob and PyTorchJob, have moved from incubating to graduated status. These operators have matured their code, documentation, v1 APIs, and test plans.

Kubeflow doc sprint

The Kubeflow community held its first doc sprint on July 10–12. For three days, across time zones around the world, we worked together to write documentation and build samples that help people understand and use Kubeflow. The sprint produced 27 merged pull requests (and counting), closing 28 docs issues. You can find more details in our blog post.

What’s next

With this release under our belts, the community is starting to plan for the v0.7 release, which will focus on enhancing data scientist usability. Some work areas include:

  • Simplifying TensorBoard creation and management.
  • Enhancing Metadata functionality with cluster-level logging and parameter values captured from Kubeflow subsystems, e.g. Pipelines, Katib, and TFJob.
  • Building a GUI-based volume manager to simplify access to, and sharing of, data between Kubeflow subsystems and between users.
  • Defining operational improvements such as simplified upgradability for Kubeflow and its APIs.
  • Offering additional architectural options for multi-user authentication and authorization, as detailed in the Multi-User Critical User Journey (CUJ).

You can follow our release progress in the 0.7 Kanban board. The Kubeflow roadmap is driving our development towards Kubeflow 1.0, and we welcome your input!

Community-driven development

We put a lot of work into improving Kubeflow’s stability, fit, and finish across 150+ closed issues and 250+ merged PRs. For this release, the community gathered extensive end-user input from KubeCon Barcelona, Kubeflow user surveys, Kubeflow community meetings, and several reviews of the Critical User Journeys (CUJs) we use to define the core experiences to build for each release.

Finally, thanks to all who contributed to v0.6! Kubeflow is home to 150+ contributors from 25+ organizations working together to build a Kubernetes-native, portable and scalable ML stack, and we need even more help. Here’s how to get involved:

Thanks to Jeff Fogarty (US Bank), Laura Schornack (City Scholars/Chase), Constantinos Venetsanopoulos (Arrikto), Animesh Singh (IBM), Johnu George (Cisco), Zhenghui Wang, Pavel Dournov, Sarah Maddox, Abishek Gupta, Anand Iyer, Ajay Gopinathan and Thea Lamkin (all of Google) for contributing to this post.

Josh Bottum
Kubeflow Community Product Manager & VP of Arrikto