Advanced Scheduling & Custom Schedulers in Kubernetes

Introduction

Kubernetes, an open-source container orchestration platform, excels at managing and scheduling containerized applications across clusters of machines. The default Kubernetes scheduler is highly effective, but for specialized workloads and advanced requirements, custom scheduling can offer significant advantages. This article explores how Kubernetes achieves advanced scheduling and dives into the concept of custom schedulers.

Advanced Scheduling in Kubernetes

1. Default Kubernetes Scheduler

The default Kubernetes scheduler is responsible for placing pods onto nodes in a cluster. It follows a two-step process:

  1. Filtering: The scheduler filters out nodes that cannot accommodate the pod because of resource constraints, taints, or other conditions.

  2. Scoring: The scheduler ranks the remaining nodes based on various factors (e.g., resource availability, affinity/anti-affinity rules) and selects the best node for the pod.

2. Scheduling Policies and Features

Kubernetes provides several built-in features to enhance scheduling capabilities:

a. Resource Requests and Limits

Resource requests tell the scheduler the minimum CPU and memory a pod needs (and are what placement decisions are based on), while limits cap what the pod's containers may actually consume at runtime.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: example-container
    image: example-image
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

b. Node Affinity and Anti-Affinity

Node affinity rules let pods require or prefer nodes with particular labels; avoidance of nodes is expressed with operators such as NotIn or DoesNotExist.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
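The rule above is a hard requirement. A soft preference uses preferredDuringSchedulingIgnoredDuringExecution with a weight instead, so the scheduler favors matching nodes but can still fall back to others (the weight value here is arbitrary):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod-preferred
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
```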

c. Pod Affinity and Anti-Affinity

Pod affinity and anti-affinity enable the co-location or separation of pods based on labels.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - example-app
        topologyKey: "kubernetes.io/hostname"

d. Taints and Tolerations

Taints, applied to nodes (for example with kubectl taint nodes node1 key1=value1:NoSchedule), repel pods unless those pods declare a matching toleration.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"

Custom Schedulers

Overview

While the default Kubernetes scheduler covers most use cases, custom schedulers provide greater flexibility and control for specific scheduling requirements. Custom schedulers are user-defined components that implement custom scheduling logic tailored to unique needs.

Use Cases for Custom Schedulers

  1. Specialized Workloads: Applications requiring specialized scheduling policies (e.g., GPU workloads, real-time processing).

  2. Custom Constraints: Complex constraints that the default scheduler cannot handle (e.g., specific hardware requirements, business rules).

  3. Enhanced Performance: Optimizing scheduling for performance-sensitive applications (e.g., low latency, high throughput).

How to Implement a Custom Scheduler

  1. Develop the Custom Scheduler

    Custom schedulers are typically implemented as separate components that interact with the Kubernetes API. They watch for unscheduled pods, pick a node using custom logic, and then record the decision by creating a Binding for the pod.

     package main

     import (
         "context"
         "fmt"
         "log"
         "path/filepath"
         "time"

         metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
         "k8s.io/client-go/kubernetes"
         "k8s.io/client-go/tools/clientcmd"
         "k8s.io/client-go/util/homedir"
     )

     func main() {
         kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
         config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
         if err != nil {
             log.Fatal(err)
         }
         clientset, err := kubernetes.NewForConfig(config)
         if err != nil {
             log.Fatal(err)
         }
         for {
             // List pods that are still waiting to be scheduled.
             podList, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
                 FieldSelector: "status.phase=Pending",
             })
             if err != nil {
                 log.Fatal(err)
             }
             for _, pod := range podList.Items {
                 // Custom scheduling logic goes here
                 fmt.Println("Found pending pod:", pod.Name)
             }
             time.Sleep(10 * time.Second) // avoid hammering the API server
         }
     }
    
  2. Deploy the Custom Scheduler

    Deploy the custom scheduler as a pod within the Kubernetes cluster. Ensure it has the necessary permissions to interact with the Kubernetes API.

     apiVersion: v1
     kind: Pod
     metadata:
       name: custom-scheduler
     spec:
       containers:
       - name: custom-scheduler
         image: custom-scheduler-image
         args:
         - --kubeconfig=/root/.kube/config
         volumeMounts:
         - name: kubeconfig
           mountPath: /root/.kube
       volumes:
       - name: kubeconfig
         hostPath:
           path: /root/.kube
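The "necessary permissions" usually mean RBAC rules that let the scheduler read pods and nodes and create bindings. A minimal sketch, assuming the scheduler runs under a custom-scheduler service account in kube-system (all names here are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-scheduler-role
rules:
- apiGroups: [""]
  resources: ["pods", "nodes"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods/binding"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-scheduler-role
subjects:
- kind: ServiceAccount
  name: custom-scheduler
  namespace: kube-system
```

With a service account and in-cluster configuration, the kubeconfig host mount shown above becomes unnecessary.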
    
  3. Assign Pods to the Custom Scheduler

    Set spec.schedulerName on pods that should be scheduled by the custom scheduler. (Early Kubernetes releases used the scheduler.alpha.kubernetes.io/name annotation for this; it has long been replaced by the schedulerName field.)

     apiVersion: v1
     kind: Pod
     metadata:
       name: example-pod
     spec:
       schedulerName: custom-scheduler
       containers:
       - name: example-container
         image: example-image

Example: Custom Scheduler for GPU Workloads

A company runs machine learning workloads requiring GPUs. The default scheduler can account for extended resources such as nvidia.com/gpu, but it does not implement the company's placement policy for distributing these workloads across GPU nodes.

Solution: Implement a custom scheduler to handle GPU allocation.

package main

import (
    "context"
    "fmt"
    "log"
    "path/filepath"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/homedir"
)

func main() {
    kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
        log.Fatal(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        log.Fatal(err)
    }
    for {
        podList, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
            FieldSelector: "status.phase=Pending",
        })
        if err != nil {
            log.Fatal(err)
        }
        for _, pod := range podList.Items {
            // Only handle pods that requested this scheduler by name.
            if pod.Spec.SchedulerName != "gpu-scheduler" {
                continue
            }
            // Custom scheduling logic for GPU allocation
            fmt.Println("Scheduling GPU workload:", pod.Name)
        }
        time.Sleep(10 * time.Second) // avoid hammering the API server
    }
}

Deploy the custom scheduler and direct GPU workloads to it:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-scheduler
spec:
  containers:
  - name: gpu-scheduler
    image: gpu-scheduler-image
    args:
    - --kubeconfig=/root/.kube/config
    volumeMounts:
    - name: kubeconfig
      mountPath: /root/.kube
  volumes:
  - name: kubeconfig
    hostPath:
      path: /root/.kube
---
apiVersion: v1
kind: Pod
metadata:
  name: ml-workload
spec:
  schedulerName: gpu-scheduler
  containers:
  - name: ml-container
    image: ml-image
    resources:
      limits:
        nvidia.com/gpu: 1

Conclusion

Kubernetes provides advanced scheduling capabilities to meet diverse workload requirements. While the default scheduler handles most scenarios effectively, custom schedulers offer the flexibility to implement specialized scheduling logic for unique use cases. By leveraging custom schedulers, organizations can optimize resource utilization, enhance performance, and ensure that their specific application needs are met efficiently.