Kubeflow

  • Kubeflow Charmers | bundle
| Channel          | Revision | Published   |
|------------------|----------|-------------|
| latest/candidate | 294      | 24 Jan 2022 |
| latest/beta      | 430      | 30 Aug 2024 |
| latest/edge      | 423      | 26 Jul 2024 |
| 1.10/stable      | 436      | 07 Apr 2025 |
| 1.10/candidate   | 434      | 02 Apr 2025 |
| 1.10/beta        | 433      | 24 Mar 2025 |
| 1.9/stable       | 432      | 03 Dec 2024 |
| 1.9/beta         | 420      | 19 Jul 2024 |
| 1.9/edge         | 431      | 03 Dec 2024 |
| 1.8/stable       | 414      | 22 Nov 2023 |
| 1.8/beta         | 411      | 22 Nov 2023 |
| 1.8/edge         | 413      | 22 Nov 2023 |
| 1.7/stable       | 409      | 27 Oct 2023 |
| 1.7/beta         | 408      | 27 Oct 2023 |
| 1.7/edge         | 407      | 27 Oct 2023 |
```shell
juju deploy kubeflow --channel 1.10/stable
```


This guide discusses Kubernetes (K8s) scheduling patterns for Charmed Kubeflow (CKF) workloads.

Scheduling CKF workloads into Pods to run on K8s nodes with specialised hardware requires specific configurations. These vary depending on the use case and the working environment.

The most common scheduling patterns are the following:

  1. Schedule on GPU nodes.
  2. Schedule on a specific node pool.
  3. Schedule on Tainted nodes.

Schedule on GPU nodes

In most production scenarios, Pods are scheduled on GPUs using one or a combination of the following methods:

  1. Requesting GPUs through the workload Pod's resources.
  2. Configuring Tolerations so Pods can be scheduled on tainted GPU nodes.
  3. Configuring Affinities so Pods are scheduled on nodes with specialised hardware.

See Use NVIDIA GPUs for more details on how to leverage NVIDIA GPU resources in your CKF deployment.
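As a minimal sketch, a workload Pod requests a GPU through its container resource limits. The Pod name and image below are illustrative; the `nvidia.com/gpu` resource is only advertised on nodes running the NVIDIA device plugin:

```yaml
# Illustrative Pod requesting one NVIDIA GPU via resource limits.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload          # hypothetical name
spec:
  containers:
    - name: training
      image: ubuntu:22.04     # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1   # schedule only on nodes exposing a free GPU
```

Because GPUs are requested as limits, the scheduler places the Pod only on nodes with an unallocated GPU of that resource type.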

Schedule on a specific node pool

Configuring resources in the workload Pod allows Kubernetes to schedule it on a node with the required hardware. However, there may be additional scheduling requirements beyond hardware needs.

For example, a workload might require GPU resources but must also run on a development node rather than a production one, within a specific availability zone or data center.

This is achieved by configuring the underlying workload Pod’s nodeSelector or node Affinities, specifying the labels that candidate nodes must carry for the Pod to be scheduled on them.
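A minimal sketch of pinning a Pod to a labelled node pool with nodeSelector; the pool label key and value are assumptions and depend on your cloud provider or how administrators labelled the nodes:

```yaml
# Illustrative Pod pinned to a labelled node pool.
apiVersion: v1
kind: Pod
metadata:
  name: notebook-workload                 # hypothetical name
spec:
  nodeSelector:
    environment: dev                      # hypothetical label applied by administrators
    topology.kubernetes.io/zone: zone-a   # hypothetical zone label
  containers:
    - name: notebook
      image: ubuntu:22.04                 # hypothetical image
```

Node Affinities express the same constraint with richer operators (for example, matching any of several zones) via `spec.affinity.nodeAffinity`, at the cost of a more verbose spec.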

Schedule on Tainted nodes

Nodes with specialised hardware, such as GPUs, are expensive. As a result, a common pattern is to use autoscaling node pools for these nodes, so they are scaled down when not in use.

To support this setup, administrators often apply Taints to these nodes, ensuring that only Pods configured with the appropriate Tolerations can be scheduled on them. See K8s use cases for more details.

In this scenario, CKF workload Pods must also be configured with the necessary Tolerations to be scheduled on the specialised nodes.
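A minimal sketch of such a Pod, assuming administrators tainted the GPU nodes with a `nvidia.com/gpu:NoSchedule` taint; the taint key here is an assumption and must match whatever taint was actually applied to your nodes:

```yaml
# Illustrative Pod tolerating a GPU node taint.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-tolerant-workload    # hypothetical name
spec:
  tolerations:
    - key: "nvidia.com/gpu"      # hypothetical taint key set by administrators
      operator: "Exists"
      effect: "NoSchedule"
  containers:
    - name: training
      image: ubuntu:22.04        # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1      # still needed: the Toleration only permits scheduling
```

Note that a Toleration only allows the Pod onto tainted nodes; it does not attract it there. Combining the Toleration with a GPU resource request (or a node Affinity) is what actually steers the Pod to the specialised pool.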
