AWS EKS: Architecture and Monitoring

Amazon Elastic Kubernetes Service (EKS), originally launched as Amazon Elastic Container Service for Kubernetes, is a managed service suited to large clusters of nodes running heavy and variable workloads. Because of how account permissions work in AWS, the EKS architecture is unusual and calls for slight changes to your monitoring strategy. Overall, it's still the same Kubernetes you know and love.

Kubernetes cluster architecture

Kubernetes consists of a control plane and a data plane. The control plane includes the services Kubernetes requires to manage the nodes, primarily kube-apiserver (used by kubectl, among others), kube-controller-manager, and kube-scheduler. These usually run on one or more master nodes within the cluster, and the master node can be replicated for fault tolerance. The control plane components can also be deployed as pods within Kubernetes, which you'll see when developing on minikube, for example.

The data plane is made up of server or VM nodes running the kubelet service and kube-proxy, which allow them to respond to changes in the Kubernetes configuration. Kubelet also reports node and pod status through the Node and Pod APIs and drives container execution on its node. Like all APIs in Kubernetes, they are visible and available to serve as the foundation for tools and extensions.

Cluster state is coordinated through etcd, a distributed key-value store. When a request is made to kube-apiserver that changes the state of the cluster, kube-apiserver updates the object in etcd, and the change is then propagated across the cluster. Kubelet then implements those changes on each node.

If the master node goes down, the group of nodes effectively ceases to be a cluster, but applications will usually continue to function even without a master. Restarting nodes may cause issues with DNS routing until the master is back up, but a short outage usually has no effect.

The Control Plane in EKS

EKS is a managed Kubernetes service that splits the responsibility for the two planes between your account and AWS. It consists of all the same moving parts as Kubernetes, so you have complete control of the cluster via kubectl and kube-apiserver, but the master nodes and control plane are managed and maintained by AWS in an AWS-owned account. In addition, a cloud-controller-manager daemon runs to handle interactions with AWS resources. You are given the configuration details your local kubectl needs to connect to kube-apiserver, and the AWS-owned account is permitted to access the instances in your account that act as nodes in your cluster.

There are a few requirements:

  • The Kubernetes master node needs full access to the kubelet's APIs on each node, and each kubelet needs access to the master in return;
  • Cloud-controller-manager needs access to your account;
  • etcd needs to be able to sync between nodes on both accounts; and
  • The nodes and pods need to maintain static, DNS-findable IP addresses in order to communicate with each other.

When you set up EKS, you'll be asked to provide three things: an IAM role, which is used to grant access to the nodes from the AWS account; a security group that grants access to the Kubernetes-specific ports; and a VPC (virtual private cloud), which provides the static internal addresses that the pods and nodes use to find each other.

The size of the VPC address space limits the number of pods you can run, so make sure to choose a large enough range. For example, a /24 subnet contains only 256 addresses, of which AWS reserves five, and each node in the cluster also needs an address, which reduces the total available to pods. There's also a limit to the number of IPs that can be assigned to a single node, based on the instance type's number of network interfaces and how many IPs each can hold.
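The arithmetic above can be sketched in a few lines of Python. This is a rough estimate rather than an authoritative calculator: it assumes the five addresses AWS reserves in every subnet, and the commonly cited per-node limit for the default Amazon VPC CNI behaviour (one address per network interface is kept for the interface itself, plus two pods that use host networking). The function names are made up for illustration, and settings such as prefix delegation change the result.

```python
import ipaddress

# AWS reserves five addresses in every subnet (the first four and the last).
AWS_RESERVED_PER_SUBNET = 5


def usable_subnet_addresses(cidr: str) -> int:
    """Addresses left for nodes and pods in one VPC subnet."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def max_pods_per_node(enis: int, ips_per_eni: int) -> int:
    """Estimated pod limit for one node under default VPC CNI behaviour.

    Each network interface keeps one address for itself; the "+ 2"
    covers pods that run with host networking.
    """
    return enis * (ips_per_eni - 1) + 2


if __name__ == "__main__":
    print(usable_subnet_addresses("10.0.0.0/24"))       # 251
    # Assumed figures for an instance with 3 interfaces of 10 addresses each.
    print(max_pods_per_node(enis=3, ips_per_eni=10))    # 29
```

The interface and address counts differ per instance type, so look them up for the instances you actually plan to run.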

To simplify the process of spinning up and managing your EKS clusters, AWS provides a small command-line utility called eksctl. This utility can use credentials stored by the AWS CLI to create cluster nodes and roles on your behalf. It also writes the connection details and credentials that kubectl needs to control your cluster into your kubeconfig.



Services in EKS

If you launch a Service, Kubernetes' cloud-controller-manager will choose an appropriate AWS resource to launch depending on the service type. In EKS, NodePort and ClusterIP services function as in a standard Kubernetes setup, but LoadBalancer services and Ingress resources both trigger the creation of AWS resources:

  • By default, a LoadBalancer service type creates a Classic Elastic Load Balancer, the original load balancer created by AWS. This type of load balancer consists of a small instance that routes requests to defined endpoints using a round-robin approach.
  • Alternatively, Network Load Balancers are available, which route requests to the different endpoints using a flow hash based on protocol, source IP/port, destination IP/port, and TCP sequence number. All traffic for one TCP connection will go to the same endpoint. You can tell Kubernetes to use a Network Load Balancer instead of a Classic Load Balancer by setting an annotation on the service: service.beta.kubernetes.io/aws-load-balancer-type: nlb
  • An Ingress (strictly a separate resource rather than a Service type) distributes traffic to different pods based on the requested destination URL (layer seven routing), so that traffic for several destinations can arrive at the same IP address and still reach different sets of pods. For EKS, this is handled by an Application Load Balancer, AWS's layer seven routing LB.
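As an illustration of the Network Load Balancer case, here is a minimal sketch that builds a LoadBalancer Service manifest in Python and prints it as JSON, which kubectl accepts alongside YAML. The service name, label, and ports are hypothetical. The annotation shown is the one recognised by Kubernetes' built-in AWS integration; the controller running in your cluster may expect different values, so check its documentation.

```python
import json


def nlb_service_manifest(name: str, app_label: str, port: int, target_port: int) -> dict:
    """Build a LoadBalancer Service that asks for a Network Load Balancer."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                # Without this annotation the default is a Classic Load Balancer.
                "service.beta.kubernetes.io/aws-load-balancer-type": "nlb",
            },
        },
        "spec": {
            "type": "LoadBalancer",
            # Traffic is forwarded to pods carrying this label.
            "selector": {"app": app_label},
            "ports": [{"protocol": "TCP", "port": port, "targetPort": target_port}],
        },
    }


if __name__ == "__main__":
    # "web", 80 and 8080 are placeholder values for illustration only.
    print(json.dumps(nlb_service_manifest("web", "web", 80, 8080), indent=2))
```

If the script were saved as nlb_service.py (a made-up filename), its output could be applied with `python nlb_service.py | kubectl apply -f -`.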

This page provides more information on how AWS's different load balancers route traffic.

Volumes in EKS

Kubernetes supports many storage types, the simplest of which is an emptyDir. This volume persists for the lifetime of the pod it's attached to, regardless of container restarts within the pod. By default, emptyDir volumes are created in the disk space of the node. In EKS, since the node is an EC2 instance, the actual storage behind an emptyDir is usually an Elastic Block Store (EBS) volume. EC2 also offers temporary local storage, known as the Instance Store, but that's no longer the default because of its transient nature.
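To make the emptyDir case concrete, the following sketch builds a Pod manifest with a single emptyDir volume and prints it as JSON for kubectl. The pod name, image, and mount path are placeholders chosen for illustration, not values from any real deployment.

```python
import json


def pod_with_scratch_volume(name: str, image: str, mount_path: str) -> dict:
    """Build a Pod whose container mounts an emptyDir volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "volumeMounts": [{"name": "scratch", "mountPath": mount_path}],
                }
            ],
            # An empty emptyDir spec uses the node's own disk and is
            # deleted together with the pod.
            "volumes": [{"name": "scratch", "emptyDir": {}}],
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(pod_with_scratch_volume("worker", "busybox", "/scratch"), indent=2))
```

The data in /scratch survives container restarts inside the pod but is lost as soon as the pod itself is removed from the node.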

Kubernetes also supports using an EBS volume as storage directly, creating a volume that persists between pods. Optionally, you can pre-load the EBS volume with data the pods need access to. An EBS volume can only be mounted to one EC2 instance at a time, so any pod that uses it must run on that instance.

Monitoring EKS

In a typical Kubernetes setup, one of the most important things to monitor is the master node (or nodes), since the Kubernetes cluster ceases to be a cluster without it. In EKS, the master node and most of the control plane are managed by AWS, which means they're out of your reach. It may be possible to get metrics about the cluster as a whole, but not about the actual master node.

Since one of Kubernetes' jobs is to deal with outages of nodes and containers, it has built-in monitoring that can be used as part of your monitoring setup. Kubelet is responsible for collecting data on the states of nodes, pods, and containers (cAdvisor is built in) so that it can tell when a container is having issues and needs to be restarted, and the metrics are available via the Metrics Server. Metrics Server is a very lightweight service with only short-term storage, so it's used to capture the current state of Kubernetes resources on request. Another tool is needed to capture the metrics as a time series and graph them.

AWS already monitors EC2 instances and stores their metrics in CloudWatch. It also provides Container Insights, which runs an agent on your EKS cluster as a DaemonSet to collect metrics about nodes, pods, and containers and send them back to CloudWatch as custom metrics.

Custom metrics have a relatively steep cost in CloudWatch, though: $0.30 per metric per month, before alerting is added. A more common approach is to use Prometheus to monitor the cluster. It's easy to get up and running since Kubernetes exposes metrics in the Prometheus format natively. Optionally, you can use Helm, a Kubernetes package manager, to install and configure Prometheus and its supporting services, including Alertmanager, Pushgateway, and node-exporter.
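For readers who haven't seen the Prometheus format, it is plain text with one sample per line: a metric name, optional labels in braces, and a value. The sketch below is a deliberately small parser for that format, not a replacement for the official client libraries. It skips comment lines, ignores timestamps, and leaves escaped characters in label values as they are. The metric names in the sample are invented.

```python
import re

# name, optional {labels}, value, optional timestamp
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<timestamp>-?\d+))?$'
)
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Invented sample in the text exposition format.
EXAMPLE = """\
# HELP demo_queue_depth Items waiting in a hypothetical queue.
# TYPE demo_queue_depth gauge
demo_queue_depth 3
# TYPE demo_http_requests_total counter
demo_http_requests_total{code="200",method="get"} 1027
"""


def parse_exposition(text: str) -> list:
    """Return (name, labels, value) tuples for every sample line."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # skip anything this simplified parser does not understand
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_exposition(EXAMPLE):
        print(name, labels, value)
```

Because every component exposes the same simple format over HTTP, Prometheus only needs a list of endpoints to scrape.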

Since AWS CloudWatch exposes its metrics through an API, there's a middle ground in the form of the Prometheus CloudWatch exporter. This exporter reads metrics from CloudWatch and makes them available to Prometheus. You can install it using Helm or directly from GitHub.

With Prometheus, there is a choice about where to keep the metric data being retrieved. An emptyDir-type storage volume (described above) that lives as long as the Prometheus pod makes sense but doesn't provide resilience. A separate EBS volume is a safe option since it exists separately from the node and containers and can even be increased in size if needed. There is a minimum size, though, so it could be money wasted if you don't need all that storage. Remote storage provides resiliency and may allow alerts and dashboards to be created outside the cluster (MetricFire's Prometheus plugin, for example).
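When sizing a volume for Prometheus, a rough estimate helps. The sketch below uses the rule of thumb from the Prometheus documentation: disk space is roughly retention time multiplied by ingested samples per second multiplied by bytes per sample, with one to two bytes per sample on average. The series count and scrape interval are assumed figures for illustration, and real usage varies with churn and compaction.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_second(active_series: int, scrape_interval_seconds: float) -> float:
    """Each active series produces one sample per scrape."""
    return active_series / scrape_interval_seconds


def estimated_disk_bytes(retention_days: float, ingest_rate: float,
                         bytes_per_sample: float = 2.0) -> float:
    """Retention time x ingestion rate x bytes per sample."""
    return retention_days * SECONDS_PER_DAY * ingest_rate * bytes_per_sample


if __name__ == "__main__":
    # Assumed workload: 100,000 active series scraped every 30 seconds,
    # kept for 15 days at the pessimistic 2 bytes per sample.
    rate = samples_per_second(active_series=100_000, scrape_interval_seconds=30)
    size_gb = estimated_disk_bytes(retention_days=15, ingest_rate=rate) / 1e9
    print(f"{size_gb:.2f} GB")  # roughly 8.64 GB
```

Leaving some headroom on top of the estimate is sensible, since the volume can be grown later but not easily shrunk.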

EKS is a service best suited to large cross-AZ deployments, and that's what Kubernetes does best. However, monitoring is essential because it is very easy to spend money on costly EC2 instances you don't need. Ensuring the cluster's capacity matches your resource consumption is vital, and good monitoring can help you achieve that.

Do you want to discuss monitoring best practices and any problems you have? Get in touch. We offer Prometheus as a service and Graphite as a service, with a 14-day free trial. You can also book a demo and talk to us directly about monitoring solutions that work for you.


