What is a Linux Process and Thread? Explained Simply for DevOps Engineers

Processes, threads, PIDs, and signals — these come up constantly in DevOps work. Here's a clear explanation with real examples you'll actually use.

DevOpsBoys · 5 min read

Understanding processes and threads is fundamental to debugging container issues, understanding Kubernetes resource limits, and knowing why your application behaves the way it does.

What is a Process?

A process is a running instance of a program. When you run nginx, a Python script, or a Java app, the OS creates a process for it.

Each process has:

  • A unique PID (Process ID)
  • Its own memory space (no other process can read its memory without explicit sharing)
  • File descriptors (open files, sockets)
  • A parent process (the process that started it)
bash
# List all processes
ps aux
 
# Your process list looks like:
# USER       PID %CPU %MEM    VSZ   RSS COMMAND
# root         1  0.0  0.0   4236   764 /sbin/init
# root       412  0.0  0.2  12548  4792 nginx: master process
# www-data   413  0.0  0.1  12980  2048 nginx: worker process
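
The attributes listed above can also be read from inside a running program. The following is a minimal sketch, assuming a Linux host with /proc mounted; the function name describe_current_process is illustrative and not part of any library.

```python
import os

def describe_current_process():
    """Return the PID, parent PID, and open file descriptors of this process."""
    info = {
        "pid": os.getpid(),    # unique ID of this process
        "ppid": os.getppid(),  # ID of the process that started it
    }
    fd_dir = "/proc/self/fd"   # Linux-specific: one entry per open descriptor
    if os.path.isdir(fd_dir):
        # The listing can include the descriptor used to read the directory itself.
        info["open_fds"] = sorted(int(name) for name in os.listdir(fd_dir))
    else:
        info["open_fds"] = None  # /proc is not available on this system
    return info

if __name__ == "__main__":
    print(describe_current_process())
```

Descriptors 0, 1, and 2 in the output are normally stdin, stdout, and stderr.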

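In Docker and Kubernetes, the container's PID 1 is special. It is the process that receives the stop signal when the container is stopped, the kernel does not apply default signal actions to it (a SIGTERM with no handler installed is simply ignored), and it inherits orphaned child processes, so it is responsible for reaping zombies.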

What is a Thread?

A thread is a unit of execution within a process. A single process can have multiple threads.

The key difference:

  • Processes have separate memory — changing memory in process A doesn't affect process B
  • Threads share the same memory within a process — thread A can read/write memory that thread B uses
Process (e.g. a multi-threaded Java app):
├── Thread 1 (request handler)
├── Thread 2 (request handler)
└── Thread 3 (request handler)
     All share the same memory space
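
The difference in memory sharing can be observed directly. This is a small sketch, assuming a Unix-like system where os.fork is available; the function name thread_vs_process_memory is made up for the illustration.

```python
import os
import threading

def thread_vs_process_memory():
    """Show that a thread's write is visible to its process, a child process's is not."""
    data = {"value": 0}

    # A thread runs inside the same address space, so its write is visible here.
    t = threading.Thread(target=lambda: data.update(value=1))
    t.start()
    t.join()
    after_thread = data["value"]

    # fork() gives the child its own (copy-on-write) address space.
    pid = os.fork()
    if pid == 0:
        data["value"] = 2   # changes only the child's copy
        os._exit(0)         # leave immediately without running parent cleanup
    os.waitpid(pid, 0)      # wait for the child and collect its exit status
    after_child = data["value"]

    return after_thread, after_child

if __name__ == "__main__":
    print(thread_vs_process_memory())
```

The thread's write shows up in the parent, while the child process's write stays in the child's own copy of the memory.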

Why use threads?

  • They're cheaper to create than processes (a new thread reuses the existing address space instead of setting up a new one)
  • Good for concurrent I/O operations (web server handling multiple requests)
  • Shared memory makes communication between threads simple

Why use multiple processes instead?

  • Processes are isolated — a crash in one doesn't take down others
  • Better security (memory isolation)
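  • A way to use multiple cores when the runtime cannot run threads in parallel (for example CPython's GIL or Node.js's single JavaScript thread); the kernel itself schedules threads across cores just as it does processes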

nginx uses a multi-process model: one master process + multiple worker processes. If one worker crashes, the master restarts it without affecting other workers.

Node.js runs your JavaScript on a single thread by default: one event loop handles all requests. That's why CPU-heavy work blocks everything in Node.
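
The same effect can be reproduced with any single-threaded event loop. The sketch below uses Python's asyncio as a stand-in for Node's event loop, and a blocking time.sleep as a stand-in for CPU-heavy work; both substitutions, and the names ticker and measure_blocking, are assumptions made for the illustration.

```python
import asyncio
import time

async def ticker(ticks, stop):
    """Record a timestamp roughly every 10 ms for as long as the loop lets us run."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)

async def measure_blocking():
    ticks = []
    stop = asyncio.Event()
    task = asyncio.create_task(ticker(ticks, stop))

    await asyncio.sleep(0.05)      # yield to the loop: the ticker gets to run
    ticks_before = len(ticks)

    time.sleep(0.2)                # blocking call with no await: the loop is stuck
    ticks_during = len(ticks) - ticks_before

    stop.set()
    await task
    return ticks_before, ticks_during

if __name__ == "__main__":
    before, during = asyncio.run(measure_blocking())
    print(f"ticks while yielding: {before}, ticks while blocked: {during}")
```

While the blocking call runs, the ticker records nothing, because the only thread that could run it is busy.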

Commands DevOps Engineers Use Daily

bash
# Find a process by name
ps aux | grep nginx
pgrep nginx       # just returns PID(s)
 
# Get detailed info about a process
ps -p 1234 -o pid,ppid,pcpu,pmem,cmd
 
# See process tree (parent-child relationships)
pstree -p
pstree -p 1234  # tree starting from PID 1234
 
# Real-time process monitor
top
htop            # better than top, install with: apt install htop
 
# How many threads does a process have?
ps -p 1234 -o nlwp  # nlwp = number of lightweight processes (threads)
 
# List all threads of a process
ps -p 1234 -T
 
# Or with top:
top -H -p 1234  # -H shows individual threads

Signals: How Processes Communicate

Signals are messages sent to processes. As a DevOps engineer, you need to know:

Signal    Number   What it does
SIGTERM   15       Graceful shutdown request; the process can handle it and clean up
SIGKILL   9        Force kill; cannot be caught or ignored
SIGHUP    1        Hang up; often used to reload config
SIGINT    2        Interrupt; what Ctrl+C sends
SIGSTOP   19       Pause process execution
SIGCONT   18       Resume paused process

bash
# Send SIGTERM (graceful shutdown)
kill 1234
kill -15 1234
kill -TERM 1234
 
# Force kill (when SIGTERM doesn't work)
kill -9 1234
kill -KILL 1234
 
# Kill all processes named nginx
pkill nginx
pkill -9 nginx
 
# Reload nginx config (SIGHUP)
kill -HUP "$(pgrep -o nginx)"  # -o picks the oldest match, normally the master process
nginx -s reload  # easier
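
The difference between a catchable signal and SIGKILL can be checked from a program. This is a rough sketch, assuming Linux and that the code runs in the main thread (Python only allows signal handlers to be installed there); the two function names are illustrative.

```python
import os
import signal
import time

def sigterm_can_be_caught():
    """Send SIGTERM to this process and report whether our handler ran."""
    received = []
    previous = signal.signal(signal.SIGTERM, lambda signum, frame: received.append(signum))
    try:
        os.kill(os.getpid(), signal.SIGTERM)   # same effect as `kill <pid>` from a shell
        deadline = time.monotonic() + 1.0
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)                   # give the interpreter a chance to run the handler
    finally:
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)  # restore the earlier handler
    return received == [signal.SIGTERM]

def sigkill_can_be_caught():
    """Try to install a handler for SIGKILL; the kernel is expected to refuse."""
    try:
        signal.signal(signal.SIGKILL, lambda signum, frame: None)
    except OSError:
        return False   # the request was rejected, so SIGKILL stays uncatchable
    return True

if __name__ == "__main__":
    print("SIGTERM catchable:", sigterm_can_be_caught())
    print("SIGKILL catchable:", sigkill_can_be_caught())
```

The process survives its own SIGTERM because a handler is installed; no such option exists for SIGKILL.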

In Kubernetes: When a pod is terminated, Kubernetes sends SIGTERM to PID 1. If the process doesn't exit within terminationGracePeriodSeconds (default 30s), Kubernetes sends SIGKILL.

This is why your application must handle SIGTERM properly:

python
# Python: handle SIGTERM for graceful shutdown
import signal
import sys
 
def graceful_shutdown(signum, frame):
    print("Received SIGTERM, shutting down gracefully...")
    # Close DB connections, flush buffers, etc.
    sys.exit(0)
 
signal.signal(signal.SIGTERM, graceful_shutdown)
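
The terminate-then-kill sequence described above can be imitated locally. The sketch below is not how Kubernetes implements it; it only mirrors the pattern with Python's subprocess module. The child program, the helper stop_with_grace_period, and the 0.5 second grace period are assumptions chosen for the demo.

```python
import subprocess
import sys

# Child program for the demo: ignores SIGTERM, reports readiness, then idles.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

def stop_with_grace_period(proc, grace_seconds):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL."""
    proc.terminate()                       # SIGTERM: polite request
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                        # SIGKILL: cannot be caught or ignored
        return proc.wait()

def demo():
    proc = subprocess.Popen([sys.executable, "-c", STUBBORN_CHILD],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()                 # block until the child ignores SIGTERM
    code = stop_with_grace_period(proc, grace_seconds=0.5)
    proc.stdout.close()
    return code

if __name__ == "__main__":
    print("exit code:", demo())
```

On POSIX systems a negative return code from subprocess means the child was ended by that signal number, so a child that ignores SIGTERM ends with the code for SIGKILL.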

Zombie Processes

A zombie is a process that has finished but its exit status hasn't been read by its parent. Zombies take up a PID slot but no CPU/memory.

bash
# See zombie processes (the STAT column starts with 'Z' for zombies)
ps -eo pid,ppid,stat,cmd | awk 'NR==1 || $3 ~ /^Z/'
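
A zombie is easy to create and reap on purpose, which makes the definition concrete. This is a minimal sketch, assuming Linux (it reads /proc) and a system where os.fork is available; the helper names are illustrative.

```python
import os
import time

def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        # Format: "pid (comm) state ..."; comm may contain spaces, so split on the last ")".
        return f.read().rsplit(")", 1)[1].split()[0]

def make_and_reap_zombie():
    pid = os.fork()
    if pid == 0:
        os._exit(0)                      # child exits immediately

    # The child has exited but we have not called wait() yet, so it becomes a zombie.
    deadline = time.monotonic() + 2.0
    state = process_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = process_state(pid)

    os.waitpid(pid, 0)                   # reading the exit status reaps the zombie
    gone = not os.path.exists(f"/proc/{pid}")
    return state, gone

if __name__ == "__main__":
    print(make_and_reap_zombie())
```

The child stays in state Z until the parent calls waitpid, after which its entry disappears from the process table.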

In Docker containers, this happens when PID 1 doesn't properly reap child processes. Fix: use tini as your init process:

dockerfile
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y tini
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["/app/myapp"]

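Kubernetes offers a related option. With shareProcessNamespace: true, the containers in a pod share one PID namespace, and the pod's pause container runs as PID 1 and reaps orphaned zombies:

yaml
spec:
  shareProcessNamespace: true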
  containers:
    - name: app
      securityContext:
        runAsUser: 1000

Background Processes and Jobs

bash
# Run a process in background
./long-running-script.sh &
 
# See background jobs
jobs
 
# Bring background job to foreground
fg %1
 
# Send running process to background
# (press Ctrl+Z in the terminal first to suspend it)
bg      # resume the suspended job in the background
 
# Run process that survives terminal close
nohup ./script.sh &
nohup ./script.sh > /var/log/myapp.log 2>&1 &
 
# Better option: use screen or tmux
screen -S mysession
tmux new -s mysession

Process Resource Usage

bash
# CPU and memory of a specific process
ps -p 1234 -o pid,pcpu,pmem,vsz,rss,cmd
 
# VSZ = virtual memory (total address space)
# RSS = resident set size (actual RAM used right now)
 
# File descriptors (useful for debugging "too many open files")
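ls /proc/1234/fd | wc -l   # exact count of open descriptors
lsof -p 1234 | wc -l       # rough count: also includes a header line, cwd, and mapped files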
 
# Limit file descriptors
ulimit -n 65535  # set for current shell
# Permanent: /etc/security/limits.conf
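
A process can also inspect and raise its own descriptor limit at startup. The sketch below uses Python's standard resource module (Unix only); the helper names fd_limits and raise_soft_fd_limit are assumptions for the illustration, and the soft limit can only be raised up to the hard limit without extra privileges.

```python
import resource

def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def raise_soft_fd_limit(target):
    """Raise the soft limit toward target, never above the hard limit."""
    soft, hard = fd_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)       # an unprivileged process cannot exceed the hard limit
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft                      # nothing to raise
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target

if __name__ == "__main__":
    print("before:", fd_limits())
    print("soft limit now:", raise_soft_fd_limit(65535))
```

Like ulimit -n in a shell, this affects only the current process and the children it starts afterwards.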

In Kubernetes Context

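When you set resources.limits.memory: 512Mi, Kubernetes configures the container's cgroup to enforce it. If the container's memory usage exceeds 512Mi, the kernel's OOM killer terminates a process in that cgroup with SIGKILL, and Kubernetes reports the container as OOMKilled (exit code 137).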

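When you set resources.limits.cpu: "0.5", Kubernetes caps the CPU time available to the container's cgroup. With the default 100ms scheduling period, the container may run for 50ms per period and is then paused until the next one. The process is not killed; it just runs slower.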
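
The arithmetic behind both limits is simple enough to check by hand. This sketch assumes the default CFS period of 100ms; the constant and function names are illustrative and do not correspond to any Kubernetes API.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS scheduling period: 100 ms

def cpu_quota_us(cpu_limit, period_us=CFS_PERIOD_US):
    """CPU time in microseconds the cgroup may use per scheduling period."""
    return round(cpu_limit * period_us)

def mebibytes_to_bytes(mi):
    """Convert a Kubernetes 'Mi' quantity to bytes (1 Mi = 1024 * 1024 bytes)."""
    return mi * 1024 * 1024

if __name__ == "__main__":
    print(cpu_quota_us(0.5))        # quota for a 0.5 CPU limit
    print(mebibytes_to_bytes(512))  # byte value of a 512Mi memory limit
```

A limit of 0.5 CPU therefore works out to 50,000 microseconds of CPU time per 100,000 microsecond period, and 512Mi to 536,870,912 bytes.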

Understanding this helps you interpret:

  • OOMKilled → process used more RAM than the memory limit
  • High latency → process is CPU throttled (check CPU throttling in Prometheus/Grafana)

These fundamentals show up in real incidents constantly. The better you understand them, the faster you debug.
