Difficulty: Beginner
Estimated Time: 10 minutes

In this scenario, you will learn how to deploy PyTorch workloads using Kubeflow.

The example uses a distributed MNIST model written in PyTorch, which will be trained using Kubeflow and Kubernetes.

The Kubeflow project is dedicated to making Machine Learning on Kubernetes easy, portable, and scalable. The goal is not to recreate other services, but to provide a straightforward way to spin up best-of-breed OSS solutions.

Details of the project can be found at https://github.com/kubeflow/kubeflow

In this scenario, you learned how to deploy PyTorch workloads using Kubernetes and Kubeflow.

The aim of Kubeflow is to provide a set of simple manifests that give you an easy-to-use ML stack anywhere Kubernetes is already running, one that can self-configure based on the cluster it deploys into.

More details can be found at https://github.com/kubeflow/kubeflow

Deploying PyTorch with Kubeflow

Step 1 of 4

Deploy Kubeflow

Because Kubeflow is an extension to Kubernetes, all of its components need to be deployed onto the platform itself.

The team has provided an installation script that uses Ksonnet to deploy Kubeflow to an existing Kubernetes cluster. Ksonnet requires a valid GitHub token; the following one can be used within Katacoda. Run the command to set the required environment variable:

export GITHUB_TOKEN=99510f2ccf40e496d1e97dbec9f31cb16770b884

With the token set, you can run the installation script:

curl https://raw.githubusercontent.com/kubeflow/kubeflow/v${KUBEFLOW_VERSION}/scripts/deploy.sh | bash
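Note that the command above interpolates KUBEFLOW_VERSION into the download URL, so that variable must be exported before running it. A minimal sketch of the full sequence (the version number here is illustrative, not a pinned recommendation):

```shell
# Choose the Kubeflow release to deploy; 0.2.2 is an illustrative example.
export KUBEFLOW_VERSION=0.2.2

# Build the URL the deploy script is fetched from and show it for verification.
DEPLOY_URL="https://raw.githubusercontent.com/kubeflow/kubeflow/v${KUBEFLOW_VERSION}/scripts/deploy.sh"
echo "${DEPLOY_URL}"

# Then run the installer against the current kubectl context:
# curl "${DEPLOY_URL}" | bash
```

If KUBEFLOW_VERSION is unset, the URL would contain an empty version segment and the download would fail, so checking the echoed URL first is a cheap sanity check.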

You should see the Kubeflow pods starting:

kubectl get pods
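The pods can take a short while to schedule and pull images. A sketch for watching them come up (which namespace the script deploys into depends on the version you installed, so check all namespaces first):

```shell
# List pods across all namespaces to find where the script placed Kubeflow.
kubectl get pods --all-namespaces

# Watch pod status until everything reaches Running (Ctrl+C to stop).
kubectl get pods --watch

# Inspect any pod stuck in Pending or CrashLoopBackOff; <pod-name> is a placeholder.
kubectl describe pod <pod-name>
```

`kubectl describe` shows scheduling events and image-pull errors, which covers most first-run failures on a fresh cluster.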