Upgrading a Kubernetes cluster on Amazon EKS (Elastic Kubernetes Service) involves updating both the control plane and the worker nodes to a new Kubernetes version. Managing upgrades carefully is crucial to ensuring minimal downtime and maintaining the stability of your applications. Here's a detailed guide on the process and best practices for upgrading an EKS cluster:
Step-by-Step Process for Upgrading an EKS Cluster
1. Preparation and Planning
Review Release Notes: Check the Kubernetes release notes and EKS documentation for the target version to understand new features, deprecations, and changes that might impact your workloads.
Check Compatibility: Ensure that your add-ons, tools (e.g., Helm, kubectl), and custom resources are compatible with the new Kubernetes version.
Back Up Cluster Data:
Take a backup of your application data, including any stateful data stored in persistent volumes.
Export current cluster configurations:
kubectl get <resource> -o yaml > <resource>.yaml
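To cover more than one resource type, the export can be wrapped in a small loop. The sketch below is one possible approach, not a complete backup strategy: the function name, the default output directory, and the list of resource kinds are illustrative assumptions, and Secrets are left out on purpose so that sensitive data is not written to disk by accident.

```shell
# Export several resource kinds from every namespace, one YAML file per kind.
# The default directory and kind list are assumptions; adjust them to your cluster.
export_cluster_config() (
  out_dir="${1:-cluster-backup}"
  kinds="${2:-deployments statefulsets daemonsets services configmaps ingresses}"
  mkdir -p "$out_dir" || exit 1
  for kind in $kinds; do
    # Writes e.g. cluster-backup/deployments.yaml
    kubectl get "$kind" --all-namespaces -o yaml > "$out_dir/$kind.yaml" || exit 1
  done
)
```

Calling export_cluster_config with no arguments writes to ./cluster-backup; a directory and a space-separated kind list can be passed as the first and second arguments. These exports capture object definitions only, so data in persistent volumes still needs its own backup.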
2. Upgrade the EKS Control Plane
Use AWS Management Console, AWS CLI, or eksctl:
Using AWS Management Console:
Go to the Amazon EKS console.
Select your cluster and click on "Update now" in the Kubernetes version section.
Choose the target Kubernetes version and start the update process.
Using AWS CLI:
aws eks update-cluster-version --name <cluster-name> --kubernetes-version <new-version>
Using eksctl (without --approve, eksctl only prints the changes it would make):
eksctl upgrade cluster --name <cluster-name> --version <new-version> --approve
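EKS upgrades the control plane one minor version at a time, so a jump such as 1.28 to 1.30 has to be done as two separate upgrades. A small guard like the one below can catch an invalid target before the update is requested. It is a sketch only: the function name is hypothetical and it assumes plain numeric versions such as 1.29 or 1.29.3.

```shell
# Succeeds only when the target is exactly one minor version above the
# current version (for example 1.29 -> 1.30). Patch components are ignored.
is_single_minor_step() (
  current="$1"; target="$2"
  cur_major="${current%%.*}"; tgt_major="${target%%.*}"
  cur_minor="${current#*.}";  tgt_minor="${target#*.}"
  # Drop any patch component such as the ".3" in "1.29.3"
  cur_minor="${cur_minor%%.*}"; tgt_minor="${tgt_minor%%.*}"
  [ "$cur_major" = "$tgt_major" ] && [ "$tgt_minor" -eq $((cur_minor + 1)) ]
)
```

It can be chained in front of the upgrade command, for example: is_single_minor_step 1.29 1.30 && aws eks update-cluster-version --name <cluster-name> --kubernetes-version 1.30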
Verify the Control Plane Upgrade:
Confirm that the control plane is upgraded by checking the server version (kubectl get nodes reports the kubelet version of each node, not the control plane version):
kubectl version
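For scripted checks, the control plane version can also be read from the EKS API. The following is a minimal sketch that assumes the AWS CLI is installed and configured for the cluster's account and region; the function names are illustrative.

```shell
# Print the Kubernetes version reported by the EKS control plane.
control_plane_version() {
  aws eks describe-cluster --name "$1" --query 'cluster.version' --output text
}

# Succeed only if the control plane reports the expected version.
verify_control_plane() (
  cluster="$1"; expected="$2"
  actual="$(control_plane_version "$cluster")" || exit 1
  if [ "$actual" = "$expected" ]; then
    echo "control plane is at $actual"
  else
    echo "control plane is at $actual, expected $expected" >&2
    exit 1
  fi
)
```

A control plane update takes a while, so running aws eks wait cluster-active --name <cluster-name> before the check avoids reading the version while the update is still in progress.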
3. Upgrade Managed Node Groups or Self-Managed Nodes
Upgrade Managed Node Groups:
Update Node Group Version:
In the EKS console, navigate to the "Compute" tab, select your managed node group, and click "Update version."
Choose the Kubernetes version that matches your control plane.
Rolling Update:
EKS performs a rolling update, creating new nodes with the updated version and draining old nodes one by one.
Verify that the new nodes are Ready and have joined the cluster before the old nodes are terminated.
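The same managed node group update can be started from the AWS CLI instead of the console. The wrapper below is a sketch under the assumption that the node group uses the default EKS optimized AMI; the function name is hypothetical, and node groups built on a custom launch template may need different options.

```shell
# Start a managed node group update, then block until the group is ACTIVE again.
upgrade_nodegroup() (
  cluster="$1"; nodegroup="$2"; version="$3"
  aws eks update-nodegroup-version \
    --cluster-name "$cluster" \
    --nodegroup-name "$nodegroup" \
    --kubernetes-version "$version" || exit 1
  # The waiter polls the node group until its status returns to ACTIVE
  aws eks wait nodegroup-active \
    --cluster-name "$cluster" \
    --nodegroup-name "$nodegroup"
)
```

Example: upgrade_nodegroup <cluster-name> <nodegroup-name> <new-version>. Once it returns, kubectl get nodes should list the nodes with the new kubelet version.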
Upgrade Self-Managed Nodes:
Launch New AMIs:
Use the latest EKS optimized AMI for the target Kubernetes version.
Update your autoscaling group to use the new AMI ID and set the desired capacity to a higher number to start creating new nodes.
Drain Old Nodes:
Once new nodes are available, gracefully drain old nodes to ensure that workloads are shifted properly:
kubectl drain <old-node-name> --ignore-daemonsets --delete-emptydir-data
Terminate Old Nodes:
After draining, you can terminate the old nodes manually or reduce the autoscaling group's desired capacity.
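When many self-managed nodes are involved, the old nodes can be picked out by their kubelet version rather than by name. The sketch below drains every node that is not yet on the target minor version. The function names are illustrative, and it assumes the new nodes are already Ready so that evicted pods have somewhere to go.

```shell
# List nodes whose kubelet version is not on the target minor version.
# Kubelet versions on EKS look like "v1.30.0-eks-<build>", so match the "v1.30." prefix.
nodes_not_on_version() (
  target="$1"
  kubectl get nodes --no-headers \
    -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion |
    awk -v prefix="v${target}." 'index($2, prefix) != 1 { print $1 }'
)

# Drain each old node in turn, stopping at the first failure.
drain_old_nodes() (
  target="$1"
  for node in $(nodes_not_on_version "$target"); do
    echo "draining $node"
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || exit 1
  done
)
```

Running nodes_not_on_version <new-version> on its own first gives a preview of which nodes would be drained.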
4. Upgrade Add-ons and Custom Resources
Upgrade EKS Add-ons:
Upgrade EKS add-ons like CoreDNS, kube-proxy, and the AWS VPC CNI plugin to versions compatible with the new Kubernetes version.
Use AWS CLI or eksctl to update add-ons:
eksctl utils update-kube-proxy --cluster <cluster-name> --approve
eksctl utils update-coredns --cluster <cluster-name> --approve
eksctl utils update-aws-node --cluster <cluster-name> --approve
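If the add-ons were installed as EKS managed add-ons (named vpc-cni, coredns, and kube-proxy), they can be updated through the AWS CLI instead. The helpers below are a sketch with hypothetical function names. The choice of PRESERVE for conflict resolution is an assumption that keeps settings changed on the cluster; pick the mode that fits your setup.

```shell
# List the add-on versions published for a given Kubernetes version.
list_addon_versions() {
  aws eks describe-addon-versions \
    --addon-name "$1" \
    --kubernetes-version "$2" \
    --query 'addons[].addonVersions[].addonVersion' \
    --output text
}

# Update one EKS managed add-on to an explicit version.
# PRESERVE keeps configuration values that were changed on the cluster.
update_addon() {
  aws eks update-addon \
    --cluster-name "$1" \
    --addon-name "$2" \
    --addon-version "$3" \
    --resolve-conflicts PRESERVE
}
```

A typical sequence is list_addon_versions coredns <new-version> to see the candidates, followed by update_addon <cluster-name> coredns <addon-version> with one of the listed versions.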
Upgrade Custom Add-ons:
Update custom add-ons like Prometheus, Grafana, or Ingress controllers using Helm or kubectl to ensure compatibility.
5. Test and Validate
Run Application Tests:
Deploy test workloads or run smoke tests to validate that applications are functioning correctly after the upgrade.
Monitor Logs and Metrics:
Use monitoring tools (e.g., CloudWatch, Prometheus) to check for any errors or performance issues during and after the upgrade.
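A quick cluster-level check can complement application smoke tests. The sketch below flags nodes that are not Ready and pods that are neither Running nor Succeeded. The function names are illustrative, and the check is deliberately coarse: a pod in the Running phase can still be crash-looping, so it does not replace application-level tests or monitoring.

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready"
# (this also catches cordoned nodes shown as "Ready,SchedulingDisabled").
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 }'
}

# Print namespace/name for pods that are neither Running nor Succeeded.
unhealthy_pods() {
  kubectl get pods --all-namespaces --no-headers \
    --field-selector='status.phase!=Running,status.phase!=Succeeded' |
    awk '{ print $1 "/" $2 }'
}

# Exit non-zero and report details if anything looks unhealthy.
post_upgrade_check() (
  bad_nodes="$(not_ready_nodes)"
  bad_pods="$(unhealthy_pods)"
  if [ -n "$bad_nodes" ] || [ -n "$bad_pods" ]; then
    echo "nodes not Ready: ${bad_nodes:-none}" >&2
    echo "unhealthy pods: ${bad_pods:-none}" >&2
    exit 1
  fi
  echo "all nodes Ready, no unhealthy pods"
)
```

Running post_upgrade_check after each phase (control plane, nodes, add-ons) helps narrow down which step introduced a problem.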
6. Roll Back if Necessary
Revert Control Plane:
EKS does not support downgrading a control plane in place. If there are major issues, roll back by re-creating the cluster with the previous Kubernetes version and restoring workloads and data from your backups.
Revert Node Groups:
For managed node groups, move workloads back to a node group running the previous version; for self-managed nodes, roll back to the previous AMI in the autoscaling group.
Best Practices for Minimal Downtime
Use Managed Node Groups: Leveraging managed node groups simplifies the upgrade process with automated rolling updates.
Perform Upgrades During Maintenance Windows: Schedule upgrades during low-traffic periods to minimize the impact on production workloads.
Use Blue-Green or Canary Deployments: For critical services, consider using Blue-Green or Canary deployments to test the upgraded environment before fully committing.
Monitor Health and Roll Back Quickly if Needed: Continuously monitor the health of the cluster and be prepared to roll back if there are critical issues.
By following these steps and best practices, you can upgrade your EKS cluster with minimal downtime and ensure a smooth transition to the new Kubernetes version.
Reference:
https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html