How do you manage Kubernetes cluster upgrades on Amazon EKS?

Upgrading a Kubernetes cluster on Amazon EKS (Elastic Kubernetes Service) involves updating both the control plane and the worker nodes to a new Kubernetes version. Managing upgrades carefully is crucial to ensuring minimal downtime and maintaining the stability of your applications. Here's a detailed guide on the process and best practices for upgrading an EKS cluster:

Step-by-Step Process for Upgrading an EKS Cluster

1. Preparation and Planning

  • Review Release Notes: Check the Kubernetes release notes and EKS documentation for the target version to understand new features, deprecations, and changes that might impact your workloads.

  • Check Compatibility: Ensure that your add-ons, tools (e.g., Helm, kubectl), and custom resources are compatible with the new Kubernetes version.

  • Back Up Cluster Data:

    • Take a backup of your application data, including any stateful data stored in persistent volumes.

    • Export current cluster configurations using kubectl get <resource> -o yaml > <resource>.yaml.
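The export step above can be scripted so that several resource kinds are saved in one pass. This is a minimal sketch, not a complete backup tool: the function name export_cluster_resources and the output layout are hypothetical, and it captures manifests only, not the data held in persistent volumes.

```shell
# Export each listed resource kind, across all namespaces, to <outdir>/<kind>.yaml.
# Usage: export_cluster_resources <outdir> <kind> [<kind> ...]
# (Function name and file layout are illustrative assumptions.)
export_cluster_resources() (
  outdir=$1
  shift
  mkdir -p "$outdir" || exit 1
  for kind in "$@"; do
    # Stop on the first failed export so a partial backup is not mistaken for a full one.
    kubectl get "$kind" --all-namespaces -o yaml > "$outdir/$kind.yaml" || exit 1
  done
)
```

For example, export_cluster_resources ./backup deployments services configmaps would write one YAML file per kind into ./backup.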

2. Upgrade the EKS Control Plane

  • Use AWS Management Console, AWS CLI, or eksctl:

    • Using AWS Management Console:

      1. Go to the Amazon EKS console.

      2. Select your cluster and click on "Update now" in the Kubernetes version section.

      3. Choose the target Kubernetes version and start the update process.

    • Using AWS CLI:

        aws eks update-cluster-version --name <cluster-name> --kubernetes-version <new-version>
      
    • Using eksctl:

        eksctl upgrade cluster --name <cluster-name> --version <new-version> --approve
      
  • Verify the Control Plane Upgrade:

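    • Confirm that the control plane is upgraded by checking the version the cluster itself reports (kubectl get nodes shows only kubelet versions, which stay on the old release until the nodes are upgraded):

        aws eks describe-cluster --name <cluster-name> --query cluster.version --output text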
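The control plane step can be wrapped in a small guard. The sketch below relies on an EKS constraint not stated above: the control plane moves one minor version at a time, so 1.28 to 1.30 needs two upgrades. The function names are hypothetical, versions are assumed to be in MAJOR.MINOR form, and the final waiter simply polls until the cluster reports ACTIVE, so treat it as a convenience rather than proof that the update succeeded.

```shell
# Succeed only if <target> is exactly one minor version above <current>.
# Both arguments are assumed to be in MAJOR.MINOR form, e.g. 1.29.
is_single_minor_step() (
  current=$1
  target=$2
  [ "${current%%.*}" = "${target%%.*}" ] || exit 1
  [ "$(( ${current#*.} + 1 ))" -eq "${target#*.}" ]
)

# Upgrade the control plane of <cluster> to <target> after checking the step size.
# Usage: upgrade_control_plane <cluster-name> <target-version>
upgrade_control_plane() (
  cluster=$1
  target=$2
  current=$(aws eks describe-cluster --name "$cluster" \
    --query 'cluster.version' --output text) || exit 1
  if ! is_single_minor_step "$current" "$target"; then
    echo "refusing to upgrade $cluster from $current to $target: not a single minor step" >&2
    exit 1
  fi
  aws eks update-cluster-version --name "$cluster" --kubernetes-version "$target" || exit 1
  # Poll until the cluster status is ACTIVE again.
  aws eks wait cluster-active --name "$cluster"
)
```

Calling upgrade_control_plane <cluster-name> <new-version> then either starts the update or stops with an error before anything is changed.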
      

3. Upgrade Managed Node Groups or Self-Managed Nodes

  • Upgrade Managed Node Groups:

    1. Update Node Group Version:

      • In the EKS console, navigate to the "Compute" tab, select your managed node group, and click "Update version."

      • Choose the Kubernetes version that matches your control plane.

    2. Rolling Update:

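      • EKS performs a rolling update: it launches nodes with the updated version, then cordons and drains the old nodes, by default one at a time (the node group's update configuration controls how many nodes may be unavailable at once).

      • Verify that the new nodes are Ready and have joined the cluster before the old nodes are terminated.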

  • Upgrade Self-Managed Nodes:

    1. Launch New AMIs:

      • Use the latest EKS optimized AMI for the target Kubernetes version.

      • Update your autoscaling group to use the new AMI ID and set the desired capacity to a higher number to start creating new nodes.

    2. Drain Old Nodes:

      • Once new nodes are available, gracefully drain old nodes to ensure that workloads are shifted properly:

          kubectl drain <old-node-name> --ignore-daemonsets --delete-emptydir-data
        
    3. Terminate Old Nodes:

      • After draining, you can terminate the old nodes manually or reduce the autoscaling group's desired capacity.
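The node steps above can also be driven from the command line. The following is a sketch under stated assumptions: the function names are hypothetical, the target version is given in MAJOR.MINOR form, and kubelet versions are assumed to look like v1.30.4-eks-..., so a node counts as outdated when its kubelet version does not start with v<target>.

```shell
# Start a managed node group update and wait for the group to become ACTIVE again.
# Usage: upgrade_managed_nodegroup <cluster-name> <nodegroup-name> <target-version>
upgrade_managed_nodegroup() {
  aws eks update-nodegroup-version --cluster-name "$1" --nodegroup-name "$2" \
    --kubernetes-version "$3" &&
    aws eks wait nodegroup-active --cluster-name "$1" --nodegroup-name "$2"
}

# Print the names of nodes whose kubelet version does not start with v<target>.
# Usage: list_outdated_nodes <target-version>
list_outdated_nodes() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' |
    awk -v prefix="v$1." 'index($2, prefix) != 1 { print $1 }'
}

# For self-managed nodes: drain every outdated node in turn, stopping at the first failure.
# Usage: drain_outdated_nodes <target-version>
drain_outdated_nodes() (
  for node in $(list_outdated_nodes "$1"); do
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || exit 1
  done
)
```

Running list_outdated_nodes <new-version> first is a cheap way to review which nodes would be drained before calling drain_outdated_nodes.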

4. Upgrade Add-ons and Custom Resources

  • Upgrade EKS Add-ons:

    • Upgrade EKS add-ons like CoreDNS, kube-proxy, and the AWS VPC CNI plugin to versions compatible with the new Kubernetes version.

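    • Use eksctl to update these components when they run as self-managed add-ons (add-ons installed as EKS managed add-ons are updated through the EKS add-on API instead, for example with aws eks update-addon):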

        eksctl utils update-kube-proxy --cluster <cluster-name> --approve
        eksctl utils update-coredns --cluster <cluster-name> --approve
        eksctl utils update-aws-node --cluster <cluster-name> --approve
      
  • Upgrade Custom Add-ons:

    • Update custom add-ons like Prometheus, Grafana, or Ingress controllers using Helm or kubectl to ensure compatibility.
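If CoreDNS, kube-proxy, and the VPC CNI are installed as EKS managed add-ons (an assumption; the commands above do not say how they were installed), the AWS CLI can list the versions published for the target Kubernetes release and apply one. The function names here are hypothetical, and PRESERVE is one possible conflict policy, chosen to keep existing custom settings rather than overwrite them.

```shell
# List the versions of a managed add-on published for a given Kubernetes version.
# Usage: list_addon_versions <addon-name> <kubernetes-version>
list_addon_versions() {
  aws eks describe-addon-versions --addon-name "$1" --kubernetes-version "$2" \
    --query 'addons[].addonVersions[].addonVersion' --output text
}

# Update a managed add-on to a specific version, preserving custom configuration.
# Usage: update_managed_addon <cluster-name> <addon-name> <addon-version>
update_managed_addon() {
  aws eks update-addon --cluster-name "$1" --addon-name "$2" --addon-version "$3" \
    --resolve-conflicts PRESERVE
}
```

A typical sequence would be list_addon_versions coredns <new-version>, followed by update_managed_addon <cluster-name> coredns with one of the versions it prints.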

5. Test and Validate

  • Run Application Tests:

    • Deploy test workloads or run smoke tests to validate that applications are functioning correctly after the upgrade.

  • Monitor Logs and Metrics:

    • Use monitoring tools (e.g., CloudWatch, Prometheus) to check for any errors or performance issues during and after the upgrade.
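A basic post-upgrade check can complement application tests and dashboards. This sketch, with hypothetical function names, reports nodes that are not Ready and pods that are neither Running nor Succeeded. It treats a cordoned node (status Ready,SchedulingDisabled) as not Ready, which is a deliberately strict assumption.

```shell
# Print "<name> <status>" for nodes whose STATUS column is not exactly "Ready".
list_unready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 " " $2 }'
}

# Print pods that are neither Running nor Succeeded, across all namespaces.
list_unhealthy_pods() {
  kubectl get pods --all-namespaces --no-headers \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
}

# Exit 0 when both lists are empty; otherwise print them to stderr and exit 1.
post_upgrade_check() (
  unready=$(list_unready_nodes) || exit 1
  pods=$(list_unhealthy_pods) || exit 1
  [ -z "$unready" ] && [ -z "$pods" ] && exit 0
  printf 'Nodes not Ready:\n%s\nPods not Running or Succeeded:\n%s\n' "$unready" "$pods" >&2
  exit 1
)
```

Because post_upgrade_check returns a non-zero status on any finding, it can gate the next step of an upgrade script.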

6. Roll Back if Necessary

  • Revert Control Plane:

    • EKS does not support downgrading the control plane in place. If there are major issues, re-create the cluster with the previous Kubernetes version and restore workloads and data from backups.

  • Revert Node Groups:

    • Managed node groups cannot be downgraded in place; create a new node group on the previous version and move workloads to it. For self-managed nodes, roll back to the previous AMI version.

Best Practices for Minimal Downtime

  • Use Managed Node Groups: Leveraging managed node groups simplifies the upgrade process with automated rolling updates.

  • Perform Upgrades During Maintenance Windows: Schedule upgrades during low-traffic periods to minimize the impact on production workloads.

  • Use Blue-Green or Canary Deployments: For critical services, consider using Blue-Green or Canary deployments to test the upgraded environment before fully committing.

  • Monitor Health and Roll Back Quickly if Needed: Continuously monitor the health of the cluster and be prepared to roll back if there are critical issues.

By following these steps and best practices, you can upgrade your EKS cluster with minimal downtime and make a smooth transition to the new Kubernetes version.

Reference:

https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html