Introduction
Kubernetes, an open-source container orchestration platform, excels at managing and scheduling containerized applications across clusters of machines. The default Kubernetes scheduler is highly effective, but for specialized workloads and advanced requirements, custom scheduling can offer significant advantages. This article explores how Kubernetes achieves advanced scheduling and dives into the concept of custom schedulers.
Advanced Scheduling in Kubernetes
1. Default Kubernetes Scheduler
The default Kubernetes scheduler is responsible for placing pods onto nodes in a cluster. It follows a two-step process:
Filtering: The scheduler filters out nodes that cannot accommodate the pod due to resource constraints, taints, or other conditions.
Scoring: The scheduler ranks the remaining nodes based on various factors (e.g., resource availability, affinity/anti-affinity rules) and selects the best node for the pod.
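Both phases are implemented as scheduler plugins, so their behaviour can be tuned by passing a KubeSchedulerConfiguration file to kube-scheduler. The sketch below is only illustrative; it disables one built-in score plugin (NodeResourcesBalancedAllocation) for the default profile:
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        disabled:
          - name: NodeResourcesBalancedAllocation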
2. Scheduling Policies and Features
Kubernetes provides several built-in features to enhance scheduling capabilities:
a. Resource Requests and Limits
Resource requests tell the scheduler how much CPU and memory a pod needs, so it is only placed on a node with enough unreserved capacity; limits cap what its containers may consume at runtime.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: example-container
      image: example-image
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
b. Node Affinity and Anti-Affinity
Node affinity and anti-affinity rules allow pods to prefer or avoid specific nodes based on labels.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
  containers:
    - name: example-container
      image: example-image
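The rule above is hard: nodes without disktype=ssd are filtered out entirely. The "prefer or avoid" side is expressed with preferredDuringSchedulingIgnoredDuringExecution and operators such as NotIn; the affinity stanza below is a minimal sketch (the label key and weight are only illustrative):
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          preference:
            matchExpressions:
              - key: disktype
                operator: NotIn
                values:
                  - hdd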
c. Pod Affinity and Anti-Affinity
Pod affinity and anti-affinity enable the co-location or separation of pods based on labels.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - example-app
          topologyKey: "kubernetes.io/hostname"
  containers:
    - name: example-container
      image: example-image
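The separation side uses podAntiAffinity with the same structure. As a minimal sketch (reusing the app=example-app label from above), the stanza below keeps pods of example-app off nodes that already run one:
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - example-app
          topologyKey: "kubernetes.io/hostname"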
d. Taints and Tolerations
Taints and tolerations allow nodes to repel certain pods unless the pods tolerate the taint.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  tolerations:
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoSchedule"
  containers:
    - name: example-container
      image: example-image
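A toleration only matters on nodes that actually carry the matching taint. Taints are set on the Node object, typically with kubectl taint nodes example-node key1=value1:NoSchedule; in manifest form, the relevant part of the node looks like this (node name assumed for illustration):
apiVersion: v1
kind: Node
metadata:
  name: example-node
spec:
  taints:
    - key: "key1"
      value: "value1"
      effect: "NoSchedule"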
Custom Schedulers
Overview
While the default Kubernetes scheduler covers most use cases, custom schedulers provide greater flexibility and control for specific scheduling requirements. Custom schedulers are user-defined components that implement custom scheduling logic tailored to unique needs.
Use Cases for Custom Schedulers
Specialized Workloads: Applications requiring specialized scheduling policies (e.g., GPU workloads, real-time processing).
Custom Constraints: Complex constraints that the default scheduler cannot handle (e.g., specific hardware requirements, business rules).
Enhanced Performance: Optimizing scheduling for performance-sensitive applications (e.g., low latency, high throughput).
How to Implement a Custom Scheduler
Develop the Custom Scheduler
Custom schedulers are typically implemented as separate components that interact with the Kubernetes API. They watch for unscheduled pods (pods whose spec.nodeName is not yet set), apply custom logic to choose a node, and then bind each pod to the node they select.
package main

import (
	"context"
	"fmt"
	"log"
	"path/filepath"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Build a clientset from the local kubeconfig.
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// Poll for pods that are pending and not yet assigned to a node.
	for {
		podList, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
			FieldSelector: "spec.nodeName=,status.phase=Pending",
		})
		if err != nil {
			log.Fatal(err)
		}
		for _, pod := range podList.Items {
			// Custom scheduling logic goes here.
			fmt.Println("Found pending pod:", pod.Name)
		}
		time.Sleep(10 * time.Second)
	}
}
Deploy the Custom Scheduler
Deploy the custom scheduler as a pod within the Kubernetes cluster. Ensure it has the necessary permissions to interact with the Kubernetes API.
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduler
spec:
  containers:
    - name: custom-scheduler
      image: custom-scheduler-image
      args:
        - --kubeconfig=/root/.kube/config
      volumeMounts:
        - name: kubeconfig
          mountPath: /root/.kube
  volumes:
    - name: kubeconfig
      hostPath:
        path: /root/.kube
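The manifest above reuses a kubeconfig mounted from the host, which is fine for experiments. In a real cluster the scheduler would more commonly run under a ServiceAccount granted access through RBAC; a minimal sketch, assuming the built-in system:kube-scheduler ClusterRole covers the API calls your logic makes:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: custom-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler
subjects:
  - kind: ServiceAccount
    name: custom-scheduler
    namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler
The scheduler pod would then set serviceAccountName: custom-scheduler and drop the hostPath kubeconfig volume.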
Assign Pods to the Custom Scheduler
Set spec.schedulerName on each pod that should be handled by the custom scheduler. (The scheduler.alpha.kubernetes.io/name annotation used for this in early Kubernetes releases has long been replaced by the schedulerName field.) The default scheduler ignores pods whose schedulerName does not match its own.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  schedulerName: custom-scheduler
  containers:
    - name: example-container
      image: example-image
Example: Custom Scheduler for GPU Workloads
A company runs machine learning workloads that require GPUs. The default scheduler can account for GPU capacity through extended resources, but it does not implement the company's policies for distributing these workloads across GPU nodes (for example, packing jobs onto as few GPU nodes as possible), so GPU nodes end up poorly utilized.
Solution: Implement a custom scheduler to handle GPU allocation.
package main

import (
	"context"
	"fmt"
	"log"
	"path/filepath"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Build a clientset from the local kubeconfig.
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// Poll for unscheduled pods and handle only those that request this scheduler.
	for {
		podList, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
			FieldSelector: "spec.nodeName=,status.phase=Pending",
		})
		if err != nil {
			log.Fatal(err)
		}
		for _, pod := range podList.Items {
			if pod.Spec.SchedulerName != "gpu-scheduler" {
				continue
			}
			// Custom scheduling logic for GPU allocation goes here, e.g. pick the
			// node with the most free GPUs and bind the pod to it (see below).
			fmt.Println("Scheduling GPU workload:", pod.Name)
		}
		time.Sleep(10 * time.Second)
	}
}
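The loop above only prints the pods it finds. To actually place a pod, a scheduler creates a Binding object through the API, which is also how the default scheduler records its decisions. The helper below is a minimal sketch of that step, not part of the article's scheduler: nodeName would come from your own GPU-selection logic, and it needs corev1 "k8s.io/api/core/v1" alongside the imports above.
// bindPod records the scheduling decision by binding the pod to nodeName.
func bindPod(ctx context.Context, clientset *kubernetes.Clientset, pod corev1.Pod, nodeName string) error {
	binding := &corev1.Binding{
		ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
		Target:     corev1.ObjectReference{Kind: "Node", Name: nodeName},
	}
	return clientset.CoreV1().Pods(pod.Namespace).Bind(ctx, binding, metav1.CreateOptions{})
}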
Deploy the custom scheduler, then point GPU workloads at it by setting their schedulerName:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-scheduler
spec:
  containers:
    - name: gpu-scheduler
      image: gpu-scheduler-image
      args:
        - --kubeconfig=/root/.kube/config
      volumeMounts:
        - name: kubeconfig
          mountPath: /root/.kube
  volumes:
    - name: kubeconfig
      hostPath:
        path: /root/.kube
apiVersion: v1
kind: Pod
metadata:
  name: ml-workload
spec:
  schedulerName: gpu-scheduler
  containers:
    - name: ml-container
      image: ml-image
      resources:
        limits:
          nvidia.com/gpu: 1
Conclusion
Kubernetes provides advanced scheduling capabilities to meet diverse workload requirements. While the default scheduler handles most scenarios effectively, custom schedulers offer the flexibility to implement specialized scheduling logic for unique use cases. By leveraging custom schedulers, organizations can optimize resource utilization, enhance performance, and ensure that their specific application needs are met efficiently.