Kubeflow

  • Kubeflow Charmers | bundle
| Channel          | Revision | Published   |
|------------------|----------|-------------|
| latest/candidate | 294      | 24 Jan 2022 |
| latest/beta      | 430      | 30 Aug 2024 |
| latest/edge      | 423      | 26 Jul 2024 |
| 1.10/stable      | 436      | 07 Apr 2025 |
| 1.10/candidate   | 434      | 02 Apr 2025 |
| 1.10/beta        | 433      | 24 Mar 2025 |
| 1.9/stable       | 432      | 03 Dec 2024 |
| 1.9/beta         | 420      | 19 Jul 2024 |
| 1.9/edge         | 431      | 03 Dec 2024 |
| 1.8/stable       | 414      | 22 Nov 2023 |
| 1.8/beta         | 411      | 22 Nov 2023 |
| 1.8/edge         | 413      | 22 Nov 2023 |
| 1.7/stable       | 409      | 27 Oct 2023 |
| 1.7/beta         | 408      | 27 Oct 2023 |
| 1.7/edge         | 407      | 27 Oct 2023 |
```shell
juju deploy kubeflow --channel 1.10/stable
```


This guide discusses Kubernetes (K8s) scheduling patterns for Charmed Kubeflow (CKF) workloads.

Scheduling CKF workloads into Pods to run on K8s nodes with specialised hardware requires specific configurations. These vary depending on the use case and the working environment.

The most common scheduling patterns are the following:

  1. Schedule on GPU nodes.
  2. Schedule on a specific node pool.
  3. Schedule on Tainted nodes.

Schedule on GPU nodes

In most production scenarios, Pods are scheduled on GPUs using one or a combination of the following methods:

  1. Requesting GPUs through the workload Pod's resources.
  2. Configuring Tolerations so Pods can be scheduled on tainted GPU nodes.
  3. Configuring Affinities so Pods are scheduled on nodes with specialised hardware.

See Use NVIDIA GPUs for more details on how to leverage NVIDIA GPU resources in your CKF deployment.
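As a minimal sketch, a workload Pod requests a GPU through its container resource limits. The Pod name and image below are illustrative; the `nvidia.com/gpu` resource is only advertised on nodes running the NVIDIA device plugin:

```yaml
# Illustrative Pod requesting one NVIDIA GPU via resource limits.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload          # hypothetical name
spec:
  containers:
    - name: training
      image: ubuntu:22.04     # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1   # schedule only on nodes exposing a free GPU
```

Because GPUs are requested as limits, the scheduler places the Pod only on nodes with an unallocated GPU of that resource type.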

Schedule on a specific node pool

Configuring resources in the workload Pod allows Kubernetes to schedule it on a node with the required hardware. However, there may be additional scheduling requirements beyond hardware needs.

For example, a workload might require GPU resources but must also run on a development node rather than a production one, within a specific availability zone or data center.

This is achieved by configuring the underlying workload Pod’s nodeSelector or node Affinities, specifying the labels that candidate nodes must carry for the Pod to be scheduled on them.
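A minimal sketch of pinning a Pod to a labelled node pool with nodeSelector; the pool label key and value are assumptions and depend on your cloud provider or how administrators labelled the nodes:

```yaml
# Illustrative Pod pinned to a labelled node pool.
apiVersion: v1
kind: Pod
metadata:
  name: notebook-workload                 # hypothetical name
spec:
  nodeSelector:
    environment: dev                      # hypothetical label applied by administrators
    topology.kubernetes.io/zone: zone-a   # hypothetical zone label
  containers:
    - name: notebook
      image: ubuntu:22.04                 # hypothetical image
```

Node Affinities express the same constraint with richer operators (for example, matching any of several zones) via `spec.affinity.nodeAffinity`, at the cost of a more verbose spec.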

Schedule on Tainted nodes

Nodes with specialised hardware, such as GPUs, are expensive. As a result, a common pattern is to use autoscaling node pools for these nodes, so they are scaled down when not in use.

To support this setup, administrators often apply Taints to these nodes, ensuring that only Pods configured with the appropriate Tolerations can be scheduled on them. See K8s use cases for more details.

In this scenario, CKF workload Pods must also be configured with the necessary Tolerations to be scheduled on the specialised nodes.
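A minimal sketch of such a Pod, assuming administrators tainted the GPU nodes with a `nvidia.com/gpu:NoSchedule` taint; the taint key here is an assumption and must match whatever taint was actually applied to your nodes:

```yaml
# Illustrative Pod tolerating a GPU node taint.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-tolerant-workload    # hypothetical name
spec:
  tolerations:
    - key: "nvidia.com/gpu"      # hypothetical taint key set by administrators
      operator: "Exists"
      effect: "NoSchedule"
  containers:
    - name: training
      image: ubuntu:22.04        # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1      # still needed: the Toleration only permits scheduling
```

Note that a Toleration only allows the Pod onto tainted nodes; it does not attract it there. Combining the Toleration with a GPU resource request (or a node Affinity) is what actually steers the Pod to the specialised pool.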
