Automatically scale the number of pod replicas based on custom metrics, such as application-specific performance metrics
Objective: Automatically scale the number of pod replicas based on the total number of HTTP requests to efficiently handle varying traffic loads.
Steps:
Install Metrics Server and Prometheus
Purpose: The Metrics Server provides basic resource metrics, while Prometheus collects custom metrics like HTTP request counts.
Commands:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/prometheus
helm install prometheus-adapter prometheus-community/prometheus-adapter
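Note that prometheus-adapter only exposes metrics it has a discovery rule for. A minimal rule sketch, passed to the Helm install via a values file (the filename adapter-values.yaml and the exact seriesQuery labels are assumptions to adapt to how your pods are actually scraped), might look like this; it also converts the cumulative counter into a per-second rate, which is usually what you want to scale on:

# adapter-values.yaml (hypothetical filename)
# Exposes http_requests_total as a per-pod custom metric
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "http_requests_total"
        as: "http_requests_total"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'

Apply it with helm install prometheus-adapter prometheus-community/prometheus-adapter -f adapter-values.yaml, then verify the metric appears via kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1".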
Expose HTTP Request Count Metric
Why: Your application needs to expose metrics for Prometheus to collect. This allows Kubernetes to make scaling decisions based on real usage.
Python Application Example:
Install Prometheus Client:
pip install prometheus_client
Update Application Code:
from prometheus_client import start_http_server, Counter
import random
import time

# Metric to count HTTP requests
request_counter = Counter('http_requests_total', 'Total number of HTTP requests')

# Expose metrics on port 9090 at /metrics
start_http_server(9090)

while True:
    request_counter.inc(random.randint(1, 5))  # Simulate incoming requests
    time.sleep(10)  # Wait 10 seconds between batches
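For reference, what Prometheus actually scrapes from port 9090 is a plain-text exposition format. The real formatting is done by prometheus_client; the stdlib-only helper below is just an illustrative sketch of what a counter looks like on the wire:

```python
def render_counter(name: str, help_text: str, value: float) -> str:
    """Render a single counter in the Prometheus text exposition
    format, mirroring what prometheus_client serves at /metrics."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name} {value}\n"
    )

text = render_counter("http_requests_total",
                      "Total number of HTTP requests", 42.0)
print(text)
```

Seeing this format makes it easier to debug scraping issues: you can curl the pod's metrics port and check the counter by eye before involving Prometheus at all.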
Deploy Your Application
Purpose: Ensure the application runs and exposes the metrics endpoint.
Deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-web-app
  template:
    metadata:
      labels:
        app: my-web-app
    spec:
      containers:
        - name: web-container
          image: my-web-app-image
          ports:
            - containerPort: 80
            - containerPort: 9090  # Metrics endpoint
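If your Prometheus uses the common annotation-based kubernetes-pods scrape job (the prometheus-community chart ships one by default), the pods also need scrape annotations. This is an assumption about your Prometheus configuration, so verify the annotation keys against your setup:

# Add under spec.template.metadata in the Deployment above
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9090"
  prometheus.io/path: "/metrics"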
Create a Horizontal Pod Autoscaler (HPA)
Why: The HPA will scale your pods based on the custom metric (HTTP request count), ensuring the application handles traffic efficiently.
HPA YAML:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-web-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_total
        target:
          type: AverageValue
          averageValue: "100"
Note: autoscaling/v2 is the stable HPA API (since Kubernetes 1.23); the older autoscaling/v2beta2 API was removed in Kubernetes 1.26.
Command to Apply HPA:
kubectl apply -f hpa.yaml
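To see how the HPA arrives at a replica count, the core formula from the Kubernetes documentation is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped between minReplicas and maxReplicas. A quick sketch using the averageValue target of 100 from the HPA above (the function name and example numbers are illustrative):

```python
import math

def desired_replicas(current_replicas: int,
                     current_avg_value: float,
                     target_avg_value: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Core HPA scaling formula:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to [minReplicas, maxReplicas]."""
    desired = math.ceil(current_replicas * current_avg_value / target_avg_value)
    return max(min_replicas, min(max_replicas, desired))

# 3 pods averaging 250 requests each vs. a target of 100 per pod
print(desired_replicas(3, 250, 100))  # -> 8 (ceil(3 * 2.5))
# Load drops to 40 per pod
print(desired_replicas(8, 40, 100))   # -> 4 (ceil(8 * 0.4))
```

In practice the controller also applies a tolerance band (10% by default) and stabilization windows, so small fluctuations around the target do not trigger churn.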
The scenario of automatically scaling pods based on custom metrics is useful for several key reasons:
1. Optimizes Resource Utilization
Why: Custom metrics like request count or latency often reflect the real load on your application more accurately than basic metrics such as CPU or memory.
Benefit: Autoscaling based on these metrics ensures that your application scales in response to actual demand, reducing resource wastage and avoiding under-provisioning.
2. Improves Performance and Reliability
Why: By scaling pods in response to specific metrics, you ensure that your application can handle increased load effectively.
Benefit: This proactive scaling helps prevent performance bottlenecks and maintains a consistent user experience, even during traffic spikes or high usage periods.
3. Enhances Cost Efficiency
Why: Scaling based on custom metrics allows for more precise control over resource allocation.
Benefit: You only use and pay for the resources you need, as pods are scaled up or down according to actual traffic or load patterns rather than just generic resource usage.
4. Supports Complex Applications
Why: Many modern applications have complex scaling needs that are not fully captured by CPU and memory alone.
Benefit: Custom metrics enable scaling based on application-specific factors, such as the number of active sessions, request rates, or other business-critical indicators.
5. Adapts to Variable Load Patterns
Why: Applications often experience variable traffic and load patterns that cannot be predicted solely by static resource limits.
Benefit: Custom metrics allow your autoscaling setup to adapt dynamically to changing conditions, ensuring that your application remains responsive and efficient under varying loads.
Example Use Case:
Imagine you have a web application that handles different types of user requests, such as viewing content and making purchases. If the purchase requests surge, CPU and memory usage might not increase significantly, but the application might still be under heavy load. By scaling based on the number of purchase requests, you can add more pods to handle the increased demand effectively, ensuring smooth operation and a positive user experience during peak times.
In summary, autoscaling on custom metrics keeps your application performant, cost-effective, and responsive to actual usage patterns rather than generic resource consumption, which is crucial for modern, dynamic workloads with varying traffic loads.