Deploy cluster-level logging on Amazon EKS clusters using Fluentd and Amazon CloudWatch Logs
Logging provides critical observability for your containerized applications. As part of providing a fully managed Kubernetes control plane, Amazon Elastic Container Service for Kubernetes (Amazon EKS) provides automatic logging to Amazon CloudWatch Logs. This lets you view detailed information about the Amazon EKS managed Kubernetes control plane by using the CloudWatch Logs console or the AWS Command Line Interface (AWS CLI).
Worker nodes, the EC2 instances that run your containers, are managed by the Amazon EKS control plane, but they run in your account. On worker nodes, there are two types of logs:
- System logs generated by kubelet, kube-proxy, or dockerd
- Application logs generated by your application containers
In this post, I describe one of the ways to collect and search these worker node logs using Fluentd and CloudWatch Logs. I provide a sample manifest to deploy a DaemonSet to your worker nodes. The DaemonSet forwards the node’s logs to CloudWatch Logs. I also show how to search these logs using the CloudWatch Logs console and the AWS CLI.
Kubernetes logging architecture
By default, Kubernetes stores all logs on the Amazon EC2 instance. After an instance is terminated, all the logs are deleted if you don’t export them or preserve the instance’s EBS volume. Additionally, it makes sense to see logs grouped by some meaningful dimension such as by service. However, pods tend to be distributed across instances, so it is hard to collate logs in a single, meaningful view.
To solve these problems, it’s common to add additional cluster-level logging architecture to your Kubernetes cluster. Typically, this means that all logs are forwarded from the individual instances in the cluster to a logging backend where they are combined for higher-level reporting.
To forward logs from an instance to the logging backend, you need something between your containers and the backend. Because Kubernetes itself doesn't provide an implementation or abstraction layer for this (as of version 1.10), you must choose and build an architecture to do this job.
The most common pattern is to use a node logging agent as described in Logging Architecture. With this pattern, you use a DaemonSet to deploy a special pod on each instance that forwards the instance's logs to the backend. Application pods then don't have to handle log forwarding themselves; they just write logs to STDOUT. In addition, the agent pods can also forward system logs, giving you better insight into the status and performance of your underlying infrastructure.
There are many options for logging backends. Many of these require you to implement and manage the backend yourself, which can be a lot of work. Managed services for logging are easier to implement and manage.
Fluentd and CloudWatch Logs
Fluentd, a CNCF project like Kubernetes, is a popular logging agent. Fluentd has a plugin system and there are many useful plugins available for ingress and egress:
- Using in_tail, you can easily tail and parse most log files.
- Using fluent-plugin-systemd, you can ingest logs from the systemd journal as well.
There are also output plugins for many backends, such as Amazon Kinesis. To use these plugins with Fluentd, install them using RubyGems and configure them in Fluentd configuration files.
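If you ever run Fluentd directly on an instance instead of using the prebuilt container image described later, you can install these plugins yourself. A minimal sketch, assuming Ruby and Fluentd are already installed on the host; the plugin names are the ones referenced in this post:

# Install the output and filter plugins used in this post
gem install fluent-plugin-cloudwatch-logs fluent-plugin-systemd fluent-plugin-kubernetes_metadata_filter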
Amazon CloudWatch Logs is a fully managed logging service from AWS. CloudWatch Logs is designed for storing and filtering logs and integrating with other AWS services. You don't need to provision any resources in advance; you just push log events to CloudWatch Logs. You can then filter logs to search them, or count matching events with metric filters and alarm on them in CloudWatch. Also, by adding subscriptions, you can process filtered logs in real time with AWS Lambda or stream them to Amazon Kinesis.
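As an illustration of the metric and alarm integration, the following AWS CLI call creates a metric filter that counts matching events in a log group. This is only a sketch: the log group name assumes the /eks/demo/containers group created by the sample manifest later in this post, and the filter and metric names are arbitrary placeholders.

aws logs put-metric-filter \
  --log-group-name /eks/demo/containers \
  --filter-name application-errors \
  --filter-pattern "error" \
  --metric-transformations metricName=ApplicationErrors,metricNamespace=EKS/Logging,metricValue=1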
CloudWatch Logs has two dimensions:
- Log streams are the actual event streams of logs that you specify when you write logs.
- Log groups are top-level resources that identify a group of log streams. You can filter or subscribe at the log group level, so log groups are often thought of as collections of log streams.
In this post, you use CloudWatch Logs as the logging backend and Fluentd as the logging agent on each EKS node. To send logs from Fluentd to CloudWatch Logs, use the fluent-plugin-cloudwatch-logs plugin.
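For example, after the DaemonSet described below starts shipping logs, you could list the streams inside one of its log groups. A sketch with the AWS CLI, assuming the /eks/demo/containers log group name produced by the sample manifest:

aws logs describe-log-streams \
  --log-group-name /eks/demo/containers \
  --order-by LastEventTime \
  --descending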
Deploy Fluentd DaemonSet to your EKS cluster
To build the cluster-level logging architecture, start by deploying a DaemonSet for Fluentd to the EKS cluster. You could install and run Fluentd at the instance level, outside Kubernetes, for example with a cloud-init script or a prebaked AMI. However, combining a DaemonSet, Pod, and ConfigMap is more than capable:
- You can run only one Fluentd agent on each node by using DaemonSet. This means updating the agent is easy compared to installing it for each instance.
- You can mount instance volumes to a pod. Even inside containers, Fluentd can read log files on the instance. It is possible to mount instance volumes with read-only permission for security.
- You can mount ConfigMap data onto volumes as well, so you can deploy the Fluentd config file via a ConfigMap. There is no need to bake the config file into container images or the AMI, or to distribute config files to every instance.
- You may also want to enrich application logs with Kubernetes metadata, such as the pod name and namespace. Another Fluentd plugin, fluent-plugin-kubernetes_metadata_filter, does this. To use it, the pods must be authorized to access the Kubernetes API. Because EKS enables role-based access control (RBAC), you create a service account and a role binding for this purpose.
Application logs appear as files under /var/log/containers, but these are symbolic links into /var/log/pods and then /var/lib/docker/containers, which is where the actual log files are stored. Mount these directories onto the Fluentd container. System logs are managed by systemd and stored under /run/log/journal, so also mount this directory.
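You can confirm this layout by logging in to a worker node over SSH. This is only a quick check, not a required step; the paths are the Docker-runtime defaults used on EKS worker nodes at the time of writing:

# Each entry under /var/log/containers is a symlink into /var/log/pods,
# which in turn points at the JSON log files under /var/lib/docker/containers.
ls -l /var/log/containers/ | head
sudo ls /var/lib/docker/containers/ | head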
For the container image, use the prebuilt Fluentd images for Kubernetes. In these images, all required plugins are preinstalled at build time, so no additional work is required. There are several image variants, depending on the output backend. For this post, use the v1.1-debian-cloudwatch image because it contains the cloudwatch-logs and systemd plugins.
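If you want to double-check which plugins a given image variant ships with before deploying it, one option is to list its installed gems locally. A sketch, assuming Docker is available on your workstation:

docker run --rm --entrypoint gem \
  fluent/fluentd-kubernetes-daemonset:v1.1-debian-cloudwatch \
  list | grep -E 'cloudwatch|systemd|kubernetes'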
Finally, set up your IAM credentials because the Fluentd container must call the CloudWatch Logs API. Here’s an IAM policy to add to the IAM role for the EC2 instances.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
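One way to attach this policy is as an inline policy on the worker nodes' instance role using the AWS CLI. A sketch, assuming the JSON above is saved as fluentd-cloudwatch-policy.json and that your worker node role is named eks-worker-node-role (both names are placeholders for your own):

aws iam put-role-policy \
  --role-name eks-worker-node-role \
  --policy-name fluentd-cloudwatch-logs \
  --policy-document file://fluentd-cloudwatch-policy.json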
Now, let’s look into a sample manifest to deploy the Fluentd agent:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: fluentd
  namespace: kube-system
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: fluentd
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: kube-system
  labels:
    k8s-app: fluentd-cloudwatch
data:
  fluent.conf: |
    @include containers.conf
    @include systemd.conf

    <match fluent.**>
      @type null
    </match>
  containers.conf: |
    <source>
      @type tail
      @id in_tail_container_logs
      @label @containers
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag *
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <label @containers>
      <filter **>
        @type kubernetes_metadata
        @id filter_kube_metadata
      </filter>

      <filter **>
        @type record_transformer
        @id filter_containers_stream_transformer
        <record>
          stream_name ${tag_parts[3]}
        </record>
      </filter>

      <match **>
        @type cloudwatch_logs
        @id out_cloudwatch_logs_containers
        region "#{ENV.fetch('REGION')}"
        log_group_name "/eks/#{ENV.fetch('CLUSTER_NAME')}/containers"
        log_stream_name_key stream_name
        remove_log_stream_name_key true
        auto_create_stream true
        <buffer>
          flush_interval 5
          chunk_limit_size 2m
          queued_chunks_limit_size 32
          retry_forever true
        </buffer>
      </match>
    </label>
  systemd.conf: |
    <source>
      @type systemd
      @id in_systemd_kubelet
      @label @systemd
      filters [{ "_SYSTEMD_UNIT": "kubelet.service" }]
      <entry>
        field_map {"MESSAGE": "message", "_HOSTNAME": "hostname", "_SYSTEMD_UNIT": "systemd_unit"}
        field_map_strict true
      </entry>
      path /run/log/journal
      pos_file /var/log/fluentd-journald-kubelet.pos
      read_from_head true
      tag kubelet.service
    </source>

    <source>
      @type systemd
      @id in_systemd_kubeproxy
      @label @systemd
      filters [{ "_SYSTEMD_UNIT": "kubeproxy.service" }]
      <entry>
        field_map {"MESSAGE": "message", "_HOSTNAME": "hostname", "_SYSTEMD_UNIT": "systemd_unit"}
        field_map_strict true
      </entry>
      path /run/log/journal
      pos_file /var/log/fluentd-journald-kubeproxy.pos
      read_from_head true
      tag kubeproxy.service
    </source>

    <source>
      @type systemd
      @id in_systemd_docker
      @label @systemd
      filters [{ "_SYSTEMD_UNIT": "docker.service" }]
      <entry>
        field_map {"MESSAGE": "message", "_HOSTNAME": "hostname", "_SYSTEMD_UNIT": "systemd_unit"}
        field_map_strict true
      </entry>
      path /run/log/journal
      pos_file /var/log/fluentd-journald-docker.pos
      read_from_head true
      tag docker.service
    </source>

    <label @systemd>
      <filter **>
        @type record_transformer
        @id filter_systemd_stream_transformer
        <record>
          stream_name ${tag}-${record["hostname"]}
        </record>
      </filter>

      <match **>
        @type cloudwatch_logs
        @id out_cloudwatch_logs_systemd
        region "#{ENV.fetch('REGION')}"
        log_group_name "/eks/#{ENV.fetch('CLUSTER_NAME')}/systemd"
        log_stream_name_key stream_name
        auto_create_stream true
        remove_log_stream_name_key true
        <buffer>
          flush_interval 5
          chunk_limit_size 2m
          queued_chunks_limit_size 32
          retry_forever true
        </buffer>
      </match>
    </label>
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd-cloudwatch
  namespace: kube-system
  labels:
    k8s-app: fluentd-cloudwatch
spec:
  template:
    metadata:
      labels:
        k8s-app: fluentd-cloudwatch
    spec:
      serviceAccountName: fluentd
      terminationGracePeriodSeconds: 30
      # Because the image's entrypoint requires writing to /fluentd/etc, but we mount a ConfigMap
      # there, which is read-only, this initContainers workaround (or a similar one) is needed.
      # See https://github.com/fluent/fluentd-kubernetes-daemonset/issues/90
      initContainers:
        - name: copy-fluentd-config
          image: busybox
          command: ['sh', '-c', 'cp /config-volume/..data/* /fluentd/etc']
          volumeMounts:
            - name: config-volume
              mountPath: /config-volume
            - name: fluentdconf
              mountPath: /fluentd/etc
      containers:
        - name: fluentd-cloudwatch
          image: fluent/fluentd-kubernetes-daemonset:v1.1-debian-cloudwatch
          env:
            - name: REGION
              value: us-west-2
            - name: CLUSTER_NAME
              value: demo
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 200Mi
          volumeMounts:
            - name: config-volume
              mountPath: /config-volume
            - name: fluentdconf
              mountPath: /fluentd/etc
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: runlogjournal
              mountPath: /run/log/journal
              readOnly: true
      volumes:
        - name: config-volume
          configMap:
            name: fluentd-config
        - name: fluentdconf
          emptyDir: {}
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: runlogjournal
          hostPath:
            path: /run/log/journal
To use this sample directly, modify the REGION and CLUSTER_NAME environment variables in the DaemonSet's env section. REGION determines where the CloudWatch Logs log groups are created, and CLUSTER_NAME is used in the log group names. Then run the following command:
kubectl create -f manifest.yaml
That’s it! To troubleshoot, view the output of Fluentd containers using the following command:
kubectl logs -l k8s-app=fluentd-cloudwatch -n kube-system
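You can also confirm that the DaemonSet scheduled one Fluentd pod per worker node, assuming the sample manifest was applied unchanged:

kubectl get daemonset fluentd-cloudwatch -n kube-system
kubectl get pods -n kube-system -l k8s-app=fluentd-cloudwatch -o wide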
Next, search the published logs.
Search logs with CloudWatch Logs
CloudWatch Logs is a managed logging service. You can search logs directly through the console, AWS CLI, or SDK.
CloudWatch console
1. Open the CloudWatch Logs console.
2. Scroll down and select the log group named /eks/CLUSTER_NAME/containers.
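AWS CLI
You can run the same kind of search from the command line. A sketch, assuming the demo cluster name used in the sample manifest and AWS CLI credentials with access to CloudWatch Logs:

# List the log groups created by the Fluentd DaemonSet
aws logs describe-log-groups --log-group-name-prefix /eks/demo

# Search the application log group for a keyword across all of its streams
aws logs filter-log-events \
  --log-group-name /eks/demo/containers \
  --filter-pattern "error" \
  --limit 25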
Summary
In this post, I described the Kubernetes logging pattern, especially cluster-level logging using a node agent. I then introduced Fluentd and CloudWatch Logs as tools for forwarding and storing logs, and showed a sample manifest that deploys this architecture on your EKS cluster. I also showed how to search the logs stored in CloudWatch Logs from the console and with the AWS CLI.
With this architecture, you can easily deploy a centralized logging system onto your cluster to have more visibility into your cluster. Because both Fluentd and CloudWatch Logs are flexible, you can configure this implementation as needed.