Setting Up Fluentd in a Kubernetes Environment with Elasticsearch and Kibana for Production

Introduction

In modern Kubernetes environments, log management is crucial for monitoring application health, debugging issues, and ensuring security compliance. One of the most popular logging solutions for Kubernetes is the EFK stack (Elasticsearch, Fluentd, Kibana).

This guide provides a step-by-step approach to setting up Fluentd for log collection in a Kubernetes cluster, while deploying Elasticsearch and Kibana for centralized log storage and visualization in a production-ready environment.

Why Use Fluentd, Elasticsearch, and Kibana in Kubernetes?

1. Fluentd (Log Collector and Forwarder)

  • Collects logs from Kubernetes pods, services, and nodes.

  • Processes, filters, and enriches logs before sending them to Elasticsearch.

  • Lightweight and scales well in production.

2. Elasticsearch (Log Storage and Search Engine)

  • A distributed and scalable search engine designed to store, index, and query logs efficiently.

  • Ideal for real-time log analysis.

3. Kibana (Visualization and Monitoring)

  • Provides a web-based interface for searching, analyzing, and visualizing logs.

  • Helps DevOps teams monitor system health and detect anomalies.

Prerequisites

Before setting up Fluentd, Elasticsearch, and Kibana, ensure that:
✅ You have a Kubernetes cluster (AWS EKS, GKE, AKS, or a self-hosted cluster).
✅ kubectl is installed and configured to interact with the cluster.
✅ Helm is installed (for easier deployment of Elasticsearch and Kibana).
✅ You have sufficient CPU, memory, and storage resources.

Why is Elasticsearch Deployed as a StatefulSet?

In Kubernetes, StatefulSets are used for stateful applications like Elasticsearch that require:

  • Stable network identities (predictable pod names for easy clustering).

  • Persistent storage (data should not be lost if a pod restarts).

  • Ordered deployments and scaling to avoid data corruption.

How Elasticsearch Works as a StatefulSet

  1. Persistent Storage

    • Each Elasticsearch pod gets a PersistentVolumeClaim (PVC), ensuring that log data is retained even if a pod restarts.

    • Example: each pod mounts its own dedicated PVC at the same container path:

        elasticsearch-master-0 → PVC #0 → /usr/share/elasticsearch/data
        elasticsearch-master-1 → PVC #1 → /usr/share/elasticsearch/data
        elasticsearch-master-2 → PVC #2 → /usr/share/elasticsearch/data
      
  2. Stable Network Identity

    • Each pod gets a unique, predictable hostname (e.g., elasticsearch-master-0, elasticsearch-master-1).

    • This helps Elasticsearch nodes discover each other easily.

  3. Ordered Scaling & Updates

    • Kubernetes ensures that nodes are started in order and shut down gracefully.

    • This prevents data corruption or split-brain issues in Elasticsearch.

Example: Deploying Elasticsearch as a StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: "elasticsearch"  # must match the headless Service shown below
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.10.0
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: transport
        env:
        # Each node takes its pod name, so node names stay predictable
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: cluster.name
          value: "k8s-logs"
        - name: discovery.seed_hosts
          value: "elasticsearch-0.elasticsearch,elasticsearch-1.elasticsearch,elasticsearch-2.elasticsearch"
        - name: cluster.initial_master_nodes
          value: "elasticsearch-0,elasticsearch-1,elasticsearch-2"
        - name: ES_JAVA_OPTS
          value: "-Xms1g -Xmx1g"  # heap sized to roughly half the memory request
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
        volumeMounts:
        - name: storage
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
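
The serviceName field above must point at a headless Service, which gives each pod a stable DNS name (e.g., elasticsearch-0.elasticsearch). The Helm chart used in Step 1 creates one automatically; for the manual manifest above, a minimal sketch:

apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
spec:
  clusterIP: None  # headless: DNS resolves directly to the pod IPs
  selector:
    app: elasticsearch
  ports:
  - port: 9200
    name: http
  - port: 9300
    name: transport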

Benefits of Using StatefulSets for Elasticsearch

  • Ensures high availability and data persistence.

  • Prevents data loss and ensures graceful scaling.

  • Stable networking for Elasticsearch cluster discovery.

Step 1: Deploy Elasticsearch in Kubernetes (Production Setup)

Elasticsearch requires persistent storage and proper resource allocation in production. We will use Helm to deploy it.

1. Add the Helm Repository

helm repo add elastic https://helm.elastic.co
helm repo update

2. Deploy Elasticsearch with Helm

helm install elasticsearch elastic/elasticsearch \
  --set replicas=3 \
  --set minimumMasterNodes=2 \
  --set persistence.enabled=true \
  --set resources.requests.cpu=1 \
  --set resources.requests.memory=2Gi
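
Equivalently, the same settings can be kept in a version-controlled values file and passed with -f (a sketch, assuming a file named values.yaml):

# values.yaml
replicas: 3
minimumMasterNodes: 2
persistence:
  enabled: true
resources:
  requests:
    cpu: "1"
    memory: "2Gi"

helm install elasticsearch elastic/elasticsearch -f values.yaml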

Configuration Explanation:

  • replicas=3: Deploys a 3-node cluster (a production-ready baseline).

  • minimumMasterNodes=2: Quorum setting that prevents split-brain (used by the chart for Elasticsearch 6.x; 7.x manages master election automatically).

  • persistence.enabled=true: Enables persistent storage for index data.

  • resources.requests.cpu=1, memory=2Gi: Reserves sufficient resources for each node.

3. Verify Elasticsearch Deployment

kubectl get pods -n default -l app=elasticsearch-master
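
You can also confirm that each Elasticsearch pod received its own PersistentVolumeClaim:

kubectl get pvc

You should see one Bound claim per replica.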

4. Expose Elasticsearch (Optional for External Access)

kubectl port-forward svc/elasticsearch-master 9200:9200

Now, you can access Elasticsearch at http://localhost:9200.
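
To confirm the nodes discovered each other, query the cluster health API; a "green" status means all three nodes joined and shards are fully replicated:

curl http://localhost:9200/_cluster/health?pretty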

Step 2: Deploy Kibana in Kubernetes

Kibana will connect to Elasticsearch to visualize logs.

1. Deploy Kibana Using Helm

helm install kibana elastic/kibana \
  --set service.type=ClusterIP

2. Verify Kibana Deployment

kubectl get pods -l app=kibana

3. Expose Kibana UI (Port Forwarding for Testing)

kubectl port-forward svc/kibana-kibana 5601:5601

Now, you can access Kibana at http://localhost:5601.
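
To verify that Kibana can reach Elasticsearch, you can also query its status API:

curl http://localhost:5601/api/status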

Step 3: Deploy Fluentd as a DaemonSet in Kubernetes

Fluentd will run as a DaemonSet, ensuring that logs from all nodes in the cluster are collected and forwarded to Elasticsearch.

1. Create a Fluentd Configuration File

Create a ConfigMap for the Fluentd configuration (the fluent.conf key below will replace the image's default pipeline):

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: kube-system
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        # Assumes the Docker JSON log driver; containerd/CRI-O nodes
        # write a different (CRI) format and need a CRI parser instead.
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <match kubernetes.**>
      @type elasticsearch
      # Pick up the target host and port from the DaemonSet environment
      host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
      port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
      logstash_format true
      logstash_prefix kubernetes-logs
      <buffer>
        flush_interval 5s
      </buffer>
    </match>
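
Optionally, records can be enriched with pod, namespace, and label metadata via the kubernetes_metadata filter, which is bundled with the official Fluentd Kubernetes image; a minimal sketch to add to the same fluent.conf, between the <source> and <match> blocks:

    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>

Note that this filter calls the Kubernetes API, so the Fluentd ServiceAccount needs read access to pods and namespaces (see the RBAC sketch later in this step).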

2. Apply the ConfigMap to the Cluster

kubectl apply -f fluentd-config.yaml

3. Deploy Fluentd DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      serviceAccountName: fluentd
      tolerations:
      # Also collect logs from control-plane nodes
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.15-debian-elasticsearch7
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch-master"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        # /var/log/containers/*.log are symlinks into this directory on Docker nodes
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        # Override the image's default pipeline with our ConfigMap
        - name: config-volume
          mountPath: /fluentd/etc/fluent.conf
          subPath: fluent.conf
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-volume
        configMap:
          name: fluentd-config
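
The manifest above references a fluentd ServiceAccount that does not exist yet; it must be created, and the kubernetes_metadata filter additionally needs read access to pod and namespace metadata. A minimal sketch (apply it before the DaemonSet):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
rules:
- apiGroups: [""]
  resources: ["pods", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: kube-system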

4. Apply Fluentd DaemonSet to the Cluster

kubectl apply -f fluentd-daemonset.yaml

5. Verify Fluentd Logs

kubectl logs -l name=fluentd -n kube-system

If Fluentd is running correctly, the logs should show buffer chunks being flushed to Elasticsearch without connection errors.
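
With the Elasticsearch port-forward from Step 1 still active, you can also check that daily indices are being created:

curl "http://localhost:9200/_cat/indices/kubernetes-logs-*?v"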

Step 4: Visualizing Logs in Kibana

1. Access Kibana

If Kibana is not exposed externally, you can use port forwarding:

kubectl port-forward svc/kibana-kibana 5601:5601

Then, open localhost:5601 in your browser.

2. Configure Kibana to Read Logs from Elasticsearch

  1. In Kibana, navigate to Management → Stack Management → Index Patterns.

  2. Click Create Index Pattern and enter kubernetes-logs-* (same as defined in the Fluentd config).

  3. Select the @timestamp field and save.

3. Explore Kubernetes Logs

  • Go to Discover in Kibana.

  • Filter logs using pod names, namespaces, or error messages.
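
For example, assuming the kubernetes_metadata filter from Step 3 is enabled, a KQL query like the following narrows Discover to error lines from a single namespace (field names depend on your Fluentd configuration):

kubernetes.namespace_name : "production" and log : *error*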

Step 5: (Optional) Securing the Setup for Production

For a secure and production-ready setup, you should:

  • Enable authentication and role-based access control (RBAC) for Elasticsearch and Kibana.

  • Use persistent storage (PVCs) for Elasticsearch data.

  • Enable TLS encryption for Elasticsearch and Fluentd communications.

  • Set up log retention policies in Elasticsearch, as sketched below.
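
For example, retention can be enforced with an index lifecycle management (ILM) policy, available in Elasticsearch 7.x; a minimal sketch that deletes indices 30 days after creation (the policy name is illustrative):

curl -X PUT "http://localhost:9200/_ilm/policy/logs-retention" \
  -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}'

The policy must then be referenced from an index template so that new kubernetes-logs-* indices pick it up automatically.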

Conclusion

By deploying Fluentd as a DaemonSet, Elasticsearch for storage, and Kibana for visualization, we achieve a scalable, centralized logging system for Kubernetes clusters.

🚀 Benefits of This Setup:
✅ Real-time log aggregation across Kubernetes nodes.
✅ Advanced search & filtering for troubleshooting.
✅ Scalable & production-ready architecture.
✅ Integrated with Kibana for visualization.

With this setup, your team can monitor, analyze, and troubleshoot logs efficiently in any Kubernetes production environment! 🚀