Advanced Scheduling & Custom Schedulers in Kubernetes

Introduction

Kubernetes, an open-source container orchestration platform, excels at managing and scheduling containerized applications across clusters of machines. The default Kubernetes scheduler is highly effective, but for specialized workloads and advanced requirements, custom scheduling can offer significant advantages. This article explores how Kubernetes achieves advanced scheduling and dives into the concept of custom schedulers.

Advanced Scheduling in Kubernetes

1. Default Kubernetes Scheduler

The default Kubernetes scheduler is responsible for placing pods onto nodes in a cluster. It follows a two-step process:

  1. Filtering: The scheduler filters out nodes that cannot accommodate the pod because of resource constraints, taints, or other conditions.

  2. Scoring: The scheduler ranks the remaining nodes based on various factors (e.g., resource availability, affinity/anti-affinity rules) and selects the best node for the pod.

2. Scheduling Policies and Features

Kubernetes provides several built-in features to enhance scheduling capabilities:

a. Resource Requests and Limits

Resource requests tell the scheduler the minimum CPU and memory a pod needs (and are what placement decisions are based on), while limits cap what the pod's containers may actually consume at runtime.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: example-container
    image: example-image
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

b. Node Affinity and Anti-Affinity

Node affinity rules let pods require or prefer nodes with particular labels; avoidance of nodes is expressed with operators such as NotIn or DoesNotExist.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
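The rule above is a hard requirement. A soft preference uses preferredDuringSchedulingIgnoredDuringExecution with a weight instead, so the scheduler favors matching nodes but can still fall back to others (the weight value here is arbitrary):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod-preferred
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
```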

c. Pod Affinity and Anti-Affinity

Pod affinity and anti-affinity enable the co-location or separation of pods based on labels.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - example-app
        topologyKey: "kubernetes.io/hostname"

d. Taints and Tolerations

Taints, applied to nodes (for example with kubectl taint nodes node1 key1=value1:NoSchedule), repel pods unless those pods declare a matching toleration.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"

Custom Schedulers

Overview

While the default Kubernetes scheduler covers most use cases, custom schedulers provide greater flexibility and control for specific scheduling requirements. Custom schedulers are user-defined components that implement custom scheduling logic tailored to unique needs.

Use Cases for Custom Schedulers

  1. Specialized Workloads: Applications requiring specialized scheduling policies (e.g., GPU workloads, real-time processing).

  2. Custom Constraints: Complex constraints that the default scheduler cannot handle (e.g., specific hardware requirements, business rules).

  3. Enhanced Performance: Optimizing scheduling for performance-sensitive applications (e.g., low latency, high throughput).

How to Implement a Custom Scheduler

  1. Develop the Custom Scheduler

    Custom schedulers are typically implemented as separate components that interact with the Kubernetes API. They watch for unscheduled pods, pick a node using custom logic, and then record the decision by creating a Binding for the pod.

     package main

     import (
         "context"
         "fmt"
         "log"
         "path/filepath"
         "time"

         metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
         "k8s.io/client-go/kubernetes"
         "k8s.io/client-go/tools/clientcmd"
         "k8s.io/client-go/util/homedir"
     )

     func main() {
         kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
         config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
         if err != nil {
             log.Fatal(err)
         }
         clientset, err := kubernetes.NewForConfig(config)
         if err != nil {
             log.Fatal(err)
         }
         for {
             // List pods that are still waiting to be scheduled.
             podList, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
                 FieldSelector: "status.phase=Pending",
             })
             if err != nil {
                 log.Fatal(err)
             }
             for _, pod := range podList.Items {
                 // Custom scheduling logic goes here
                 fmt.Println("Found pending pod:", pod.Name)
             }
             time.Sleep(10 * time.Second) // avoid hammering the API server
         }
     }
    
  2. Deploy the Custom Scheduler

    Deploy the custom scheduler as a pod within the Kubernetes cluster. Ensure it has the necessary permissions to interact with the Kubernetes API.

     apiVersion: v1
     kind: Pod
     metadata:
       name: custom-scheduler
     spec:
       containers:
       - name: custom-scheduler
         image: custom-scheduler-image
         args:
         - --kubeconfig=/root/.kube/config
         volumeMounts:
         - name: kubeconfig
           mountPath: /root/.kube
       volumes:
       - name: kubeconfig
         hostPath:
           path: /root/.kube
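The "necessary permissions" usually mean RBAC rules that let the scheduler read pods and nodes and create bindings. A minimal sketch, assuming the scheduler runs under a custom-scheduler service account in kube-system (all names here are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-scheduler-role
rules:
- apiGroups: [""]
  resources: ["pods", "nodes"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods/binding"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-scheduler-role
subjects:
- kind: ServiceAccount
  name: custom-scheduler
  namespace: kube-system
```

With a service account and in-cluster configuration, the kubeconfig host mount shown above becomes unnecessary.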
    
  3. Assign Pods to the Custom Scheduler

    Set spec.schedulerName on pods that should be scheduled by the custom scheduler. (Early Kubernetes releases used the scheduler.alpha.kubernetes.io/name annotation for this; it has long been replaced by the schedulerName field.)

     apiVersion: v1
     kind: Pod
     metadata:
       name: example-pod
     spec:
       schedulerName: custom-scheduler
       containers:
       - name: example-container
         image: example-image

Example: Custom Scheduler for GPU Workloads

A company runs machine learning workloads requiring GPUs. The default scheduler can account for extended resources such as nvidia.com/gpu, but it does not implement the company's placement policy for distributing these workloads across GPU nodes.

Solution: Implement a custom scheduler to handle GPU allocation.

package main

import (
    "context"
    "fmt"
    "log"
    "path/filepath"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/homedir"
)

func main() {
    kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
        log.Fatal(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        log.Fatal(err)
    }
    for {
        podList, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
            FieldSelector: "status.phase=Pending",
        })
        if err != nil {
            log.Fatal(err)
        }
        for _, pod := range podList.Items {
            // Only handle pods that requested this scheduler by name.
            if pod.Spec.SchedulerName != "gpu-scheduler" {
                continue
            }
            // Custom scheduling logic for GPU allocation
            fmt.Println("Scheduling GPU workload:", pod.Name)
        }
        time.Sleep(10 * time.Second) // avoid hammering the API server
    }
}

Deploy the custom scheduler and direct GPU workloads to it:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-scheduler
spec:
  containers:
  - name: gpu-scheduler
    image: gpu-scheduler-image
    args:
    - --kubeconfig=/root/.kube/config
    volumeMounts:
    - name: kubeconfig
      mountPath: /root/.kube
  volumes:
  - name: kubeconfig
    hostPath:
      path: /root/.kube
---
apiVersion: v1
kind: Pod
metadata:
  name: ml-workload
spec:
  schedulerName: gpu-scheduler
  containers:
  - name: ml-container
    image: ml-image
    resources:
      limits:
        nvidia.com/gpu: 1

Conclusion

Kubernetes provides advanced scheduling capabilities to meet diverse workload requirements. While the default scheduler handles most scenarios effectively, custom schedulers offer the flexibility to implement specialized scheduling logic for unique use cases. By leveraging custom schedulers, organizations can optimize resource utilization, enhance performance, and ensure that their specific application needs are met efficiently.