
What Are Linux cgroups and Namespaces? The Foundation of Containers Explained

Docker and Kubernetes containers are built on Linux cgroups and namespaces. Understanding these fundamentals helps you debug container issues and set resource limits properly.

DevOpsBoys · 5 min read

Every Docker container and Kubernetes pod is built on two Linux kernel features: namespaces and cgroups. If you've ever wondered how containers provide isolation and resource limits, this is the answer.

Namespaces: Isolation

Namespaces give each container its own isolated view of the system. A process inside a namespace sees only the resources in that namespace.

Linux has 8 namespace types:

Namespace   What It Isolates
pid         Process IDs (the container sees its own PID 1)
net         Network interfaces, routes, firewall rules
mnt         Filesystem mounts
uts         Hostname and domain name
ipc         Inter-process communication (shared memory, semaphores, message queues)
user        User and group IDs
cgroup      cgroup root (the container sees its own view of the cgroup tree)
time        CLOCK_MONOTONIC and CLOCK_BOOTTIME offsets; the wall clock stays shared (Linux 5.6+)
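Each namespace in the table is a real kernel object you can inspect. A process's namespaces appear as symlinks under /proc/<pid>/ns/ whose targets look like pid:[4026531836], and two processes share a namespace exactly when those targets match. A minimal Python sketch, assuming a Linux host with /proc mounted; parse_ns_link, namespaces_of, and shared_namespaces are hypothetical helper names:

```python
import os


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a link target such as 'pid:[4026531836]' into ('pid', 4026531836)."""
    ns_type, _, inode = target.partition(":")
    return ns_type, int(inode.strip("[]"))


def namespaces_of(pid="self") -> dict[str, int]:
    """Map each entry in /proc/<pid>/ns (e.g. 'net', 'pid_for_children') to its inode."""
    ns_dir = f"/proc/{pid}/ns"
    return {
        name: parse_ns_link(os.readlink(os.path.join(ns_dir, name)))[1]
        for name in sorted(os.listdir(ns_dir))
    }


def shared_namespaces(pid_a, pid_b) -> list[str]:
    """Names of the namespaces two processes have in common (same inode)."""
    a, b = namespaces_of(pid_a), namespaces_of(pid_b)
    return sorted(name for name, inode in a.items() if b.get(name) == inode)
```

Reading another process's ns links usually requires root. Run on the host against a container's host PID, shared_namespaces("self", <host-pid>) lists the namespaces the container did not get its own copy of; with Docker's defaults that includes user.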

Why This Matters for DevOps

PID namespace: In a container, the main process has PID 1. From outside, it has a different PID. PID 1 is special: when it exits, the kernel kills every other process in that namespace, which is why the container stops when its main process ends.

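bash
# Start a container and ask Docker for its PID on the host
docker run -d --name nginx nginx
docker inspect --format '{{.State.Pid}}' nginx
# 45231  ← host PID

# On the host, NSpid lists the PID in every namespace: host first, container last
grep NSpid /proc/45231/status
# NSpid:  45231   1  ← PID 1 inside, 45231 outside

# Inside the container (the nginx image ships without ps, so read /proc directly)
docker exec nginx grep Name /proc/1/status
# Name:   nginx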
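The NSpid line shown above can be read programmatically. A small Python sketch, assuming you pass it the text of a /proc/<pid>/status file read on the host (NSpid exists since Linux 4.1); parse_nspid, host_and_container_pid, and read_status are hypothetical names:

```python
def parse_nspid(status_text: str) -> list[int]:
    """Return the PIDs on the NSpid line, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    raise ValueError("no NSpid line (kernel older than 4.1?)")


def host_and_container_pid(status_text: str) -> tuple[int, int]:
    """First entry: PID as seen from the reader's namespace. Last: PID inside the container."""
    pids = parse_nspid(status_text)
    return pids[0], pids[-1]


def read_status(pid: int) -> str:
    """Read /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()
```

For the nginx container above, host_and_container_pid(read_status(45231)) run on the host would return (45231, 1). A process that is not in a nested PID namespace has a single NSpid entry, so both values are equal.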

Network namespace: Each container gets its own network stack: its own lo and eth0 interfaces, IP address, and routing table. This is why two containers can both listen on port 3000 without conflicting; a clash only happens when both try to publish the same host port with -p.

bash
# Container's network view (the image must include the ip tool)
docker exec myapp ip addr
# 1: lo: <LOOPBACK> inet 127.0.0.1
# 28: eth0: inet 172.17.0.2  ← container's IP
 
# Host's network view
ip addr
# Shows host's interfaces, not the container's
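Many slim images ship without the ip tool. Because /proc/<pid>/net reflects the network namespace of that process, you can also list a container's interfaces from the host. A Python sketch, assuming you know the container's host PID (for example from docker inspect); parse_net_dev and container_interfaces are hypothetical names:

```python
def parse_net_dev(text: str) -> dict[str, tuple[int, int]]:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/<pid>/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, _, counters = line.partition(":")
        fields = counters.split()
        if len(fields) < 9:
            continue
        # Field 0 is received bytes; field 8 is the first transmit column (bytes)
        stats[name.strip()] = (int(fields[0]), int(fields[8]))
    return stats


def container_interfaces(host_pid: int) -> dict[str, tuple[int, int]]:
    """Interfaces visible inside the network namespace of host_pid, read from the host."""
    with open(f"/proc/{host_pid}/net/dev") as f:
        return parse_net_dev(f.read())
```

For a typical Docker container this returns only lo and eth0, no matter how many interfaces the host has.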

Mount namespace: Each container has its own filesystem mount view. docker run -v /host:/container creates a bind mount that appears in the container's namespace.

cgroups: Resource Control

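While namespaces provide isolation, cgroups (control groups) limit how much of the host's resources each container can use: CPU time, memory, block I/O, and the number of processes. Network bandwidth is not a cgroup resource; it is shaped separately with traffic control (tc).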

CPU Control

bash
# Limit a container to 0.5 CPUs
docker run --cpus=0.5 nginx

yaml
# Or in Kubernetes
resources:
  limits:
    cpu: "500m"    # 500 millicores = 0.5 CPU
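Both limits above end up as the same number. A Python sketch of the arithmetic, assuming the default 100 ms CFS period: convert a Kubernetes CPU quantity to millicores, then scale it to a per-period quota in microseconds. Docker's --cpus=0.5 yields the same 50000. The 1000 µs floor mirrors the kernel's minimum quota; the function names are hypothetical, and only the plain and "m" quantity forms are handled:

```python
import math
from decimal import Decimal

CFS_PERIOD_US = 100_000  # default CFS period: 100 ms
MIN_QUOTA_US = 1_000     # the kernel rejects quotas shorter than 1 ms


def cpu_to_millicores(quantity: str) -> int:
    """Kubernetes CPU quantity -> millicores: '500m' -> 500, '0.5' -> 500, '2' -> 2000."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return math.ceil(Decimal(quantity) * 1000)  # fractions of a millicore round up


def millicores_to_quota_us(millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time in microseconds the cgroup may use per period."""
    if millicores <= 0:
        raise ValueError("a CPU limit must be positive")
    return max(millicores * period_us // 1000, MIN_QUOTA_US)
```

A limit of "2" gives a quota of 200000 µs per 100000 µs period: the container may keep two CPUs busy for the whole period.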

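Under the hood, the container runtime (Docker here; containerd or CRI-O when the kubelet asks) turns the limit into a CFS quota on the container's cgroup. The paths below assume cgroup v1 and Docker's cgroupfs driver: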

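bash
# Find the cgroup for a container
cat /proc/$(docker inspect nginx --format '{{.State.Pid}}')/cgroup

# Read the CPU quota (<container-id> is the full 64-character ID)
cat /sys/fs/cgroup/cpu/docker/<container-id>/cpu.cfs_quota_us
# 50000  ← 50ms per 100ms period = 50% of one CPU

cat /sys/fs/cgroup/cpu/docker/<container-id>/cpu.cfs_period_us
# 100000  ← 100ms period

# cgroup v2 with the systemd driver: quota and period share one file
cat /sys/fs/cgroup/system.slice/docker-<container-id>.scope/cpu.max
# 50000 100000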
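Instead of guessing the path, you can derive a container's cgroup directory from /proc/<pid>/cgroup. On cgroup v1 each line reads hierarchy-ID:controllers:path; on cgroup v2 there is a single line starting with 0::. A Python sketch, assuming the hierarchies are mounted at the usual /sys/fs/cgroup location and the host is pure v1 or pure v2 (not a hybrid layout); cgroup_dir is a hypothetical name:

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup"  # assumption: the usual mount point


def cgroup_dir(proc_cgroup_text: str, controller: str = "cpu") -> str:
    """Cgroup directory for `controller`, given the text of /proc/<pid>/cgroup."""
    unified = None
    for line in proc_cgroup_text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        if controller in controllers.split(","):
            # cgroup v1: one mount per controller (/sys/fs/cgroup/cpu links to cpu,cpuacct)
            return os.path.join(CGROUP_ROOT, controller) + path
        if hierarchy_id == "0" and controllers == "":
            unified = CGROUP_ROOT + path  # cgroup v2: a single unified hierarchy
    if unified is None:
        raise LookupError(f"no cgroup found for controller {controller!r}")
    return unified
```

Usage: read /proc/<host-pid>/cgroup on the host, pass its text in, and append a file name such as cpu.cfs_quota_us (v1) or cpu.max (v2) to the result.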

Exceeding the CPU quota causes throttling, not killing: once a container's processes have used their quota for the current period, the kernel pauses them until the next period begins. You'll see this as increased latency, never as OOMKilled.
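To see why throttling turns into latency, consider a deliberately simplified model: one single-threaded request that needs a fixed amount of CPU time, starts at the beginning of a period, and is the only work in the cgroup. The function name and numbers are illustrative, not measurements:

```python
import math


def throttled_wall_time_ms(cpu_work_ms: float, quota_ms: float, period_ms: float) -> float:
    """Wall-clock time for single-threaded work under a CFS quota (simplified model)."""
    if quota_ms >= period_ms or cpu_work_ms <= quota_ms:
        return cpu_work_ms  # the quota is never exhausted, so no throttling
    # Every period except the last burns the whole quota, then sits out the remainder
    full_periods = math.ceil(cpu_work_ms / quota_ms) - 1
    remaining = cpu_work_ms - full_periods * quota_ms
    return full_periods * period_ms + remaining
```

With --cpus=0.5 (a 50 ms quota per 100 ms period), throttled_wall_time_ms(200, 50, 100) gives 350: a request needing 200 ms of CPU takes about 350 ms of wall-clock time, while a 30 ms request is unaffected.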

Memory Control

bash
# Limit container to 512MB
docker run --memory=512m nginx

yaml
# Kubernetes
resources:
  limits:
    memory: "512Mi"
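Memory units are easy to mix up: Docker's 512m and Kubernetes' 512Mi both mean 512 × 2^20 bytes, while Kubernetes' 512M means 512 × 10^6 bytes. A Python sketch that converts the common integer forms to the byte value the cgroup sees; memory_to_bytes is a hypothetical name, and rarer forms (Pi, Ei, exponents, decimals) are deliberately rejected:

```python
import re

# Binary (Ki, Mi, ...) and decimal (k, M, ...) suffixes used by Kubernetes quantities
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "": 1,
}


def memory_to_bytes(quantity: str) -> int:
    """'512Mi' -> 536870912, '512M' -> 512000000 (integer values only)."""
    match = re.fullmatch(r"(\d+)([KMGT]i|[kMGT])?", quantity)
    if not match:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIXES[suffix or ""]
```

memory_to_bytes("512Mi") returns 536870912, the same value that shows up in memory.limit_in_bytes.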

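When a container's memory usage reaches its limit and the kernel cannot reclaim enough (for example by dropping page cache), the OOM killer terminates a process inside the container's cgroup. If that is the main process, Kubernetes reports the container as OOMKilled.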

bash
# Check memory limit for a container
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.limit_in_bytes
# 536870912  ← 512MB in bytes
 
# Current memory usage
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.usage_in_bytes
# 234881024  ← ~224MB currently used
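The files above use cgroup v1 names. On cgroup v2 the same numbers live in memory.max (which reads max when unlimited) and memory.current, and memory.events carries an oom_kill counter; on v1, memory.oom_control has the same counter on kernel 4.13 and later. A Python sketch that accepts either layout, assuming the files have been read into a dict; memory_status and read_cgroup_files are hypothetical names:

```python
import os


def _counters(text: str) -> dict[str, int]:
    """Parse 'key value' lines, the format of memory.events and memory.oom_control."""
    pairs = (line.split() for line in text.splitlines() if line.strip())
    return {key: int(value) for key, value in pairs}


def memory_status(files: dict[str, str]) -> dict:
    """Limit, usage and OOM-kill count from cgroup file contents (v1 or v2)."""
    if "memory.max" in files:  # cgroup v2
        raw_limit = files["memory.max"].strip()
        limit = None if raw_limit == "max" else int(raw_limit)
        usage = int(files["memory.current"])
        counters = _counters(files.get("memory.events", ""))
    else:  # cgroup v1: "no limit" appears as a huge number rather than "max"
        limit = int(files["memory.limit_in_bytes"])
        usage = int(files["memory.usage_in_bytes"])
        counters = _counters(files.get("memory.oom_control", ""))
    return {"limit": limit, "usage": usage, "oom_kills": counters.get("oom_kill")}


def read_cgroup_files(cgroup_dir: str) -> dict[str, str]:
    """Read whichever of the relevant memory files exist in one cgroup directory."""
    names = ["memory.max", "memory.current", "memory.events",
             "memory.limit_in_bytes", "memory.usage_in_bytes", "memory.oom_control"]
    files = {}
    for name in names:
        path = os.path.join(cgroup_dir, name)
        if os.path.exists(path):
            with open(path) as f:
                files[name] = f.read()
    return files
```

A non-zero oom_kills value confirms that the kernel, not the application, ended a process in that cgroup.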

Memory Requests vs Limits in Kubernetes

yaml
resources:
  requests:
    memory: "256Mi"    # scheduler uses this to pick a node
  limits:
    memory: "512Mi"    # cgroup enforces this — OOMKilled if exceeded

  • Request: The amount the scheduler reserves. A pod is only placed on a node whose allocatable memory, minus the requests of the pods already there, covers it.
  • Limit: The maximum allowed, enforced by the cgroup. Exceeding it triggers the OOM killer.

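If every container in the pod sets both CPU and memory requests equal to its limits, the pod gets the Guaranteed QoS class. Guaranteed pods are the last to be evicted under node memory pressure, which makes this a good default for critical production workloads.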
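The QoS rules are easy to get subtly wrong, because Guaranteed needs both CPU and memory on every container. A Python sketch of the classification, assuming a simplified input (one dict per container with optional requests and limits), ignoring init containers, and comparing quantities as strings, so write equal values in the same form; qos_class is a hypothetical name:

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' resources (simplified model)."""

    def pick(container: dict, kind: str) -> dict:
        # Only CPU and memory affect the QoS class
        return {k: v for k, v in container.get(kind, {}).items() if k in ("cpu", "memory")}

    if all(not pick(c, "requests") and not pick(c, "limits") for c in containers):
        return "BestEffort"

    for c in containers:
        limits = pick(c, "limits")
        requests = {**limits, **pick(c, "requests")}  # an omitted request defaults to the limit
        if set(limits) != {"cpu", "memory"} or requests != limits:
            return "Burstable"
    return "Guaranteed"
```

The manifest above (256Mi request, 512Mi limit, no CPU settings) classifies as Burstable. Setting only limits for both CPU and memory yields Guaranteed, since the requests default to those limits.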

How Containers Actually Work

When Docker creates a container, it does roughly this:

python
# Pseudocode showing roughly what Docker (via runc) does; the helper functions are illustrative
def create_container(container_id, image_layers, command, memory_limit_bytes, cpus):
    # 1. Create namespaces
    pid_ns = create_namespace("pid")
    net_ns = create_namespace("net")
    mnt_ns = create_namespace("mnt")

    # 2. Set up cgroups (cgroup v1 file names)
    cgroup = create_cgroup(f"docker/{container_id}")
    cgroup.set("memory.limit_in_bytes", memory_limit_bytes)
    cgroup.set("cpu.cfs_period_us", 100000)
    cgroup.set("cpu.cfs_quota_us", int(cpus * 100000))

    # 3. Prepare the root filesystem from the image layers (UnionFS/OverlayFS)
    rootfs = overlay_mount(image_layers)

    # 4. Fork a child into the new namespaces and the cgroup
    pid = fork_into_namespaces([pid_ns, net_ns, mnt_ns], cgroup)

    # 5. In the child: switch to the new root, then exec the command, which becomes PID 1
    pivot_root(pid, rootfs)
    exec(pid, command)

No VM, no hypervisor. Containers are just processes with special kernel features applied. This is why containers start in milliseconds and have near-zero overhead — they're using the same kernel as the host.
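The create_namespace steps in the pseudocode correspond to flags passed to the real clone(2) and unshare(2) system calls. A hedged Python sketch, assuming a Linux host where unprivileged user namespaces are enabled: it builds the flag mask from the values in <linux/sched.h>, then, in a forked child, unshares a user and a UTS namespace and changes the hostname. The helper names are hypothetical:

```python
import ctypes
import os
import socket

# Flag values from <linux/sched.h>
CLONE_FLAGS = {
    "time": 0x00000080, "mnt": 0x00020000, "cgroup": 0x02000000, "uts": 0x04000000,
    "ipc": 0x08000000, "user": 0x10000000, "pid": 0x20000000, "net": 0x40000000,
}


def clone_flags(*namespaces: str) -> int:
    """OR together the flags for the named namespaces, as unshare(2)/clone(2) expect."""
    mask = 0
    for name in namespaces:
        mask |= CLONE_FLAGS[name]
    return mask


def demo_hostname_isolation(new_name: str = "sandbox") -> None:
    """Fork a child that gets its own user + UTS namespace and renames itself."""
    libc = ctypes.CDLL(None, use_errno=True)  # glibc exports unshare()
    pid = os.fork()
    if pid == 0:  # child
        code = 1
        try:
            if libc.unshare(clone_flags("user", "uts")) != 0:
                print("unshare failed:", os.strerror(ctypes.get_errno()), flush=True)
            else:
                # Allowed without root: the child holds CAP_SYS_ADMIN in its new user namespace
                socket.sethostname(new_name)
                print("child sees: ", socket.gethostname(), flush=True)
                code = 0
        except Exception as exc:  # never let the child fall through into the parent's code
            print("child error:", exc, flush=True)
        finally:
            os._exit(code)
    os.waitpid(pid, 0)
    print("parent sees:", socket.gethostname())
```

Calling demo_hostname_isolation() on such a host should print "sandbox" for the child and the unchanged hostname for the parent: the same process tree, the same kernel, but two different views of one resource.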

Practical Debugging with cgroup Info

bash
# Inspect a Docker container's cgroup (cgroup v1, cgroupfs driver)
# Use the full container ID: cgroup directories are named with all 64 characters,
# while docker ps shows only the first 12
CONTAINER_ID=$(docker inspect --format '{{.Id}}' my-app)

# Check memory usage
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.usage_in_bytes

# Check if being throttled
cat /sys/fs/cgroup/cpu/docker/${CONTAINER_ID}/cpu.stat
# nr_throttled 1523  ← number of periods in which the container was throttled
# throttled_time 45000000000  ← total time throttled, in nanoseconds (45 s)

# See what limits are set (Memory in bytes, NanoCpus in billionths of a CPU)
docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' my-app

A high ratio of nr_throttled to nr_periods means the container keeps hitting its CPU quota and latency is suffering. Raise the CPU limit, or reduce the work the container does.
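Because nr_throttled only ever grows, it is more useful as a fraction of nr_periods, the number of enforcement periods that have elapsed. A Python sketch that summarizes cpu.stat from either cgroup version: v1 reports throttled_time in nanoseconds, v2 reports throttled_usec in microseconds. The function names are hypothetical:

```python
def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cpu.stat file."""
    pairs = (line.split() for line in text.splitlines() if line.strip())
    return {key: int(value) for key, value in pairs}


def throttle_summary(text: str) -> dict[str, float]:
    """Fraction of periods throttled, plus total throttled time in seconds."""
    stats = parse_cpu_stat(text)
    periods = stats.get("nr_periods", 0)
    ratio = stats.get("nr_throttled", 0) / periods if periods else 0.0
    if "throttled_usec" in stats:  # cgroup v2
        seconds = stats["throttled_usec"] / 1e6
    else:  # cgroup v1 reports nanoseconds
        seconds = stats.get("throttled_time", 0) / 1e9
    return {"throttled_ratio": ratio, "throttled_seconds": seconds}
```

Sampling this twice a minute apart and comparing the deltas shows whether throttling is happening now rather than at some point since the container started.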

Security Implications

Understanding namespaces helps you understand container security:

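  • User namespaces are not used by default in Docker (userns-remap is opt-in). UID 0 inside the container is UID 0 on the host, only with a reduced set of capabilities, so a process that escapes the container is root on the host. This is why running containers as non-root matters.
  • --privileged grants all capabilities, gives access to host devices, and disables the seccomp and AppArmor profiles. Never use it in production.
  • securityContext.runAsNonRoot: true does not create a user namespace; it makes the kubelet refuse to start a container that would run as UID 0. Kubernetes user namespaces are a separate pod setting, hostUsers: false.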
yaml
# Kubernetes security context
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
  containers:
    - name: app
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]

This is what actually makes a container "secure" — not the container itself, but the combination of namespaces, cgroups, and dropped capabilities.
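You can check which capabilities a container process actually holds by decoding the CapEff hex mask in /proc/<pid>/status. A Python sketch, with capability numbers taken from <linux/capability.h> up to CAP_CHECKPOINT_RESTORE (newer kernels may define more, which are reported as unknown bits); decode_caps and effective_caps are hypothetical names:

```python
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER", "CAP_FSETID",
    "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE",
    "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST", "CAP_NET_ADMIN", "CAP_NET_RAW",
    "CAP_IPC_LOCK", "CAP_IPC_OWNER", "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT",
    "CAP_SYS_PTRACE", "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD", "CAP_LEASE",
    "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP", "CAP_MAC_OVERRIDE",
    "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM", "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
    "CAP_PERFMON", "CAP_BPF", "CAP_CHECKPOINT_RESTORE",
]  # list index == capability number


def decode_caps(hex_mask: str) -> list[str]:
    """Names of the capabilities set in a mask such as '00000000a80425fb'."""
    mask = int(hex_mask, 16)
    names = [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]
    unknown = mask >> len(CAP_NAMES)
    if unknown:
        names.append(f"<unknown bits {unknown << len(CAP_NAMES):#x}>")
    return names


def effective_caps(status_text: str) -> list[str]:
    """Decode the CapEff line of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(line.split()[1])
    raise ValueError("no CapEff line")
```

A container running as a non-root user with capabilities: drop: ["ALL"] should report an all-zero CapEff, which decodes to an empty list. A root process under Docker's default capability set typically reports 00000000a80425fb, which decodes to 14 capabilities, including CAP_CHOWN and CAP_NET_RAW.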

Understanding these fundamentals makes you better at:

  • Setting appropriate resource limits
  • Debugging OOMKilled and CPU throttling issues
  • Understanding security risks in container configurations
  • Building better Kubernetes manifests