Table of contents
- Introduction
- Not Setting Resource Requests 🙅
- Omit Health Checks 🚫
- Using the Latest Tag 🛑
- Overprivileged Containers 📛
- Lack of Monitoring and Logging 👀
- Default Namespace for all Objects 😵
- Missing Security Configurations 🔒
- Missing PodDisruptionBudget
- Self anti-affinities for pods 😠
- LoadBalancer for every HTTP service ⚖️
- Non-kubernetes-aware cluster autoscaling
- Conclusion:
Introduction
If you're new to Kubernetes or considering it for your applications, it's important to be aware of common mistakes. While Kubernetes offers powerful tools for managing automatically scalable, highly available distributed cloud-native applications, many users encounter pitfalls. In this article, we'll explore these pitfalls and provide tips on how to avoid them.
Not Setting Resource Requests 🙅
This definitely deserves the most attention and first place in this list.
CPU requests are usually either not set or set very low (so that we can fit a lot of pods on each node) and nodes are thus overcommited. In times of high demand, the CPUs of the node are fully utilized and our workload is getting only “what it had requested” and gets CPU throttled, causing increased application latency, timeouts, etc.
BestEffort (please don’t):
resources: {}
very low CPU (please don’t):
resources:
requests:
cpu: "1m"
On the other hand, having a CPU limit can unnecessarily throttle pods even if the node’s CPU is not fully utilized which again can cause increased latency.
There is an open discussion around CPU CFS quota in Linux kernel and CPU throttling based on set CPU limits and turning off the CFS quota.
CPU limits can cause more problems than they solve. See more in the link below.
Memory overcommitting can get you in more trouble. Reaching a CPU limit results in throttling, reaching the memory limit will get your pod killed. Ever seen OOMkill(Out of Memory)? Yep, that’s the one we are talking about. Want to minimize how often it can happen? Don’t overcommit your memory and use Guaranteed QoS (Quality of Service) setting memory requests equal to the limit like in the example below. Read more about the topic in Henning Jacobs’ (Zalando) presentation.
Burstable (more likely to get OOMkilled more often):
resources:
requests:
memory: "128Mi"
cpu: "500m"
limits:
memory: "256Mi"
cpu: 2
Guaranteed:
resources:
requests:
memory: "128Mi"
cpu: 2
limits:
memory: "128Mi"
cpu: 2
So what can help you when setting resources?
You can see the current cpu and memory usage of pods (and containers in them) using metrics-server. Chances are, you are already running it. Simply run these:
kubectl top pods
kubectl top pods --containers
kubectl top nodes
However these show just the current usage. That is great to get the rough idea about the numbers but you end up wanting to see these usage metrics in time (to answer questions like: what was the cpu usage in peak, yesterday morning, etc.). For that, you can use Prometheus, DataDog and many others. They just ingest the metrics from metrics-server and store them, then you can query & graph them.
VerticalPodAutoscaler can help you automate away this manual process — looking at cpu/mem usage in time and setting new requests and limits based on that all over again.
Utilizing effectively your compute is not an easy task. It is like playing tetris all the time. If you find yourself paying a lot for compute while having low average utilization (say ~10%), you might want to check AWS Fargate or Virtual Kubelet based products that leverage more of a serverless/pay-per-usage billing model that might be cheaper for you.
Omit Health Checks 🚫
When deploying your services to Kubernetes, health checks play an important role in maintaining your services.
Health checks are highlyunder-utilized 😿 in the Kubernetes environment. Through health checks, you keep an eye on the health of the pods and their containers.
Kubernetes has three main tools you can use for health checks:
Liveness Check allows Kubernetes to check whether your app is alive or not. The Kubelet agent running on each node uses liveness probes to ensure that the containers are running as expected. Readiness checks run during the entire lifecycle of the container. Kubernetes uses this probe to know when the container is ready to accept traffic. The startup probe determines when the container application has been started successfully. If the startup check fails, the pod is restarted.
Using the Latest Tag 🛑
This one is a classic. The latest is not descriptive and hard to work with. The Kubernetes docs are very clear on using docker images:latest tags in a production environment:
You should avoid using the :latest tag when deploying containers in production, as it makes it difficult to track which version of the image is running and hard to roll back.
I feel like lately we don’t see this very often as a lot of us got burned too many times and we stopped using :latest
and everyone started to pin the versions. Yay!
ECR has a great feature of tag immutability, definitely worth checking out.
Overprivileged Containers 📛
Overprivileged containers are containers that have been given too many privileges, such as access to resources that are not accessible in ordinary containers.
This is a common mistake that developers make while using Kubernetes, and it can create security risks.
For example, running a Docker daemon inside a Docker container is an example of a privileged container, which is not necessarily secure.
To avoid this, it is recommended to avoid giving CAP_SYS_ADMIN capability to your container, as it comprises more than 25% of all kernel vulnerabilities.
Additionally, it is important to avoid giving full privileges to the container and host file system privileges given to the container. This means the container can be used to compromise the whole host by replacing binaries with malicious ones.
To prevent overprivileged containers, it is essential to carefully configure permission settings and never run processes with higher privileges than they need.
It is also important to use monitoring and logging to detect and resolve issues.
Lack of Monitoring and Logging 👀
The lack of monitoring and logging in a Kubernetes environment can be detrimental to its security and overall performance. Inadequate logging and monitoring present challenges during incident investigation and response efforts, making it difficult to detect and resolve issues effectively.
One common pitfall is being unable to find failure points in the Kubernetes platform and applications due to a lack of relevant logs or metrics.
To address this, it is essential to set up proper monitoring and logging tools, such as Prometheus, Grafana, Fluentd, and Jaeger, to collect, analyze, and visualize metrics, logs, and traces, gaining insights into the performance and health of the Kubernetes environment.
By implementing robust monitoring and logging practices, organizations can effectively correlate information, gain deeper insights, and overcome the challenges associated with the dynamic and complex nature of Kubernetes environments.
Default Namespace for all Objects 😵
Using the default namespace for all objects in Kubernetes can lead to organizational and management challenges.
The default
namespace is where services and apps are created by default, and it is also the active namespace unless explicitly specified.
Relying solely on the default namespace can result in a lack of isolation and organization for different components or teams within the cluster. This can lead to difficulties in resource management, access control, and visibility. To avoid this, it is recommended to create custom namespaces for different projects, teams, or applications, allowing for better organization, resource allocation, and access control within the Kubernetes cluster.
By utilizing multiple namespaces, users can effectively compartmentalize and manage their resources, enhancing the overall operational efficiency and security of the Kubernetes environment.
Missing Security Configurations 🔒
When deploying your application, you should always keep security in mind. So what are some of the most important things to consider when it comes to security? For example, using an endpoint accessible outside of your cluster, not securing your secrets, not considering how to run privileged containers, etc. safely.
Kubernetes security is an integral part of any Kubernetes deployment. Security challenges include:
● Authorization — Authentication and authorization are essential for controlling access to resources in a Kubernetes cluster.
● Networking — Kubernetes networking involves managing overlay networks and service endpoints to ensure that traffic between containers is routed securely within the cluster.
● Storage — Securing storage in a cluster consists of providing that data cannot be accessed by unauthorized users or processes and that data.
The Kubernetes API server has a REST interface that provides access to all the information stored. This means that users can access any information stored in the API by simply sending HTTP requests to it. To protect this data from unauthenticated users, you need to configure authentication for the API server using supported methods like username/password or token-based authentication.
It’s not just about securing the cluster itself but also the secrets and configurations on it. To protect the cluster from vulnerabilities, you will need to configure a set of security controls on it.
One such robust security control is securing a Kubernetes cluster with RBAC: Role-Based Access Control can be used to secure Kubernetes clusters by limiting access to resources based on roles assigned to users. These roles can be configured as “admin” or “operator.”
The admin role has full access rights, while the operator role has limited rights over resources within the cluster. We can control and manage anyone getting access to the cluster by doing this.
Missing PodDisruptionBudget
You run the production workload on kubernetes. Your nodes & cluster have to be upgraded, or decommissioned, from time to time. PodDisruptionBudget (pdb) is sort of an API for service guarantees between cluster administrators and cluster-users.
Make sure to create pdb
to avoid unnecessary service outages due to draining nodes.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: db-pdb
spec:
minAvailable: 2
selector:
matchLabels:
app: database
With this as a cluster-user you tell the cluster-administrators: “hey, I have this database service here and no matter what you have to do, I’d like to have at least 2 replicas always available”.
Self anti-affinities for pods 😠
Running e.g. 3 pod replicas of some deployment, node goes down and all the replicas with it. Huh? All the replicas were running on one node? Wasn’t Kubernetes supposed to be magical and provide HA?!
You can’t expect a kubernetes scheduler to enforce anti-affinites
for your pods. You have to define them explicitly.
......
labels:
app: db
......
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- db
topologyKey: "kubernetes.io/hostname"
That’s it. This will make sure the pods will be scheduled to different nodes (this is being checked only at scheduling time, not at execution time, hence the requiredDuringSchedulingIgnoredDuringExecution
).
We are talking about podAntiAffinity on different node names — topologyKey: "kubernetes.io/hostname"
- not different availability zones. If you really need proper HA, dig a bit deeper into this topic.
LoadBalancer for every HTTP service ⚖️
Chances are you have more http services in your cluster that you’d like to expose to the outside world.
If you expose the kubernetes service as a type: LoadBalancer
, its controller (vendor specific) will provision and reconcile an external LoadBalancer and those resources might get expensive (external static IPv4 address, compute, per-second pricing…) as you create many of them.
In that case, sharing one external loadbalancer might make more sense and you expose your services as type: NodePort
. Or yet better, deploying something like nginx-ingress-controller (or traefik or Istio) being the single NodePort endpoint exposed to the external loadbalancer and routing the traffic in the cluster based on kubernetes ingress resources.
The other in-cluster (micro)services that talk to each other can talk through ClusterIP services and out-of-box DNS service discovery.
Be careful about not using their public DNS/IPs as it could affect their latency and cloud cost.
Non-kubernetes-aware cluster autoscaling
When adding and removing nodes to/from the cluster, you shouldn’t consider some simple metrics like a CPU utilization of those nodes. When scheduling pods, you decide based on a lot of scheduling constraints like pod & node affinities, taints, and tolerations, resource requests, QoS, etc. Having an external autoscaler that does not understand these constraints might be troublesome.
Imagine there is a new pod to be scheduled but all of the CPU available is requested and the pod is stuck in the Pending state. External autoscaler sees the average CPU currently used (not requested) and won’t scale out (will not add another node). The pod won’t be scheduled.
Scaling-in (removing a node from the cluster) is always harder. Imagine you have a stateful pod (with persistent volume attached) and as persistent volumes are usually resources that belong to a specific availability zone and are not replicated in the region, your custom autoscaler removes a node with this pod on it and scheduler cannot schedule it onto a different node as it is very limited by the only availability zone with your persistent disk in it. Pod is again stuck in Pending state.
The community is widely usingcluster-autoscalerwhich runs in your cluster and is integrated with most major public cloud vendors APIs, understands all these constraints and would scale-out in the mentioned cases. It will also figure out if it can gracefully scale-in without affecting any constraints we have set and saves you money on compute.
Conclusion:
In conclusion, Kubernetes is a powerful tool for managing containerized applications, but it comes with its own set of challenges. To avoid common mistakes and pitfalls, it is essential to pay close attention to your interactions with Kubernetes and understand the differences between how it interacts with your deployed services.
Don’t expect everything to work automagically, and invest some time in making your app cloud-native. By avoiding these mistakes, you can work efficiently with your Kubernetes deployments and enhance the stability, performance, and security of your Kubernetes environment.