10 DevOps Interview Questions You Will Definitely Get Asked (With Answers)

These 10 questions come up in almost every senior DevOps interview. Real answers that show depth — not textbook definitions.

DevOpsBoys · Jun 4, 2026 · 6 min read

Every DevOps interview asks variations of the same core questions. Here are the 10 you'll definitely face — with answers that show depth, not just definitions.


1. "Walk me through how you'd design a CI/CD pipeline from scratch."

What they want: Architecture thinking, not tool names.

Strong answer: "I'd start with the developer workflow — what triggers a build, where code lives. Then work outward: source control → build → test → artifact storage → deploy → verify.

For a typical Kubernetes-based stack: GitHub Actions triggers on PR and on merge. The PR run builds the image, runs unit and integration tests, and scans the image with Trivy. On merge to main, the image is tagged with the commit SHA and pushed to ECR. The pipeline then updates the image tag in the Helm values file; ArgoCD detects the change and syncs to staging. After smoke tests pass, the same process promotes to production behind a manual approval gate.

Key decisions I'd flag: should deploys be push-based (the pipeline runs kubectl apply) or pull-based (ArgoCD pulls from Git)? For production I prefer pull-based, because no cluster credentials have to live in CI."
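The "pipeline updates the Helm values file" step can be sketched in a few lines. This is a minimal illustration, not how any particular pipeline does it: the `bump_image_tag` helper, the values layout, and the repository name are all assumptions made up for the example, and a real pipeline would more likely use a YAML-aware tool.

```python
import re


def bump_image_tag(values_text: str, new_tag: str) -> str:
    """Return values_text with the first `tag:` value replaced by new_tag.

    Hypothetical helper: assumes a single `tag:` key on its own line,
    optionally quoted, as in a simple Helm values file.
    """
    pattern = re.compile(r'^([ \t]*tag:[ \t]*)(["\']?)[^"\'\s#]+(["\']?)', re.MULTILINE)
    updated, count = pattern.subn(
        # A function replacement avoids backreference surprises if the tag starts with a digit.
        lambda m: f"{m.group(1)}{m.group(2)}{new_tag}{m.group(3)}",
        values_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no 'tag:' key found in values file")
    return updated


# Made-up values file content; the repository name is a placeholder.
VALUES = """\
image:
  repository: example.com/team/api
  tag: "a1b2c3d"
replicaCount: 2
"""

updated = bump_image_tag(VALUES, "9f8e7d6")
print(updated)
```

The pipeline would commit the updated file and stop there. Applying it to the cluster is left to the GitOps agent, which is what keeps cluster credentials out of CI.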


2. "How would you handle a production outage at 2 AM?"

What they want: Incident management process, not heroics.

Strong answer: "First: assess scope before touching anything. Is it a total outage or a partial one? One region or all? Check the monitoring dashboard, not Slack.

Immediate mitigation first, root cause second. If a bad deploy caused it, roll back immediately; don't debug the root cause while users are down.

I'd open an incident channel, keep stakeholders updated every 15 minutes with: what we know, what we're doing, ETA. After resolution — blameless postmortem within 48 hours with 5 Whys and concrete action items.

I've found that having runbooks for the top 10 failure modes means you're executing a checklist at 2 AM, not improvising."


3. "Explain the difference between containers and VMs."

What they want: Understanding, not just 'containers are lighter.'

Strong answer: "VMs virtualize hardware: each VM runs its own kernel and a full guest OS on top of a hypervisor. Containers virtualize the OS: they share the host kernel and rely on Linux namespaces for isolation and cgroups for resource limits.

This means containers typically start in well under a second to a few seconds, versus tens of seconds to minutes for VMs, and their overhead is measured in MBs rather than GBs of memory. But the isolation is weaker: a kernel exploit can affect every container on the host. VMs provide stronger security boundaries.

In practice: containers for stateless microservices, VMs for multi-tenant workloads or anything requiring strong isolation. Kubernetes runs containers but often on VMs — you get both layers."


4. "How do you manage secrets in Kubernetes?"

What they want: You know the problems with native K8s secrets.

Strong answer: "Kubernetes Secrets are base64-encoded, not encrypted by default — anyone with etcd access can read them. For production I use one of:

External Secrets Operator + AWS Secrets Manager: secrets live in AWS, and ESO syncs them into K8s Secrets. Rotating a secret means updating it in Secrets Manager; ESO picks up the new value on its next refresh.

HashiCorp Vault with the agent injector — Vault injects secrets as files into pods at runtime. Supports dynamic secrets (auto-expiring DB credentials).

What I avoid: hardcoding secrets in ConfigMaps, committing them to Git even temporarily, or using default namespace Secrets with no RBAC."
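To make the "base64-encoded, not encrypted" point concrete, here is a small sketch using only the Python standard library. The secret value is made up for illustration; the point is that decoding needs no key.

```python
import base64

# Hypothetical secret value, encoded the way it would appear under `data:` in a Secret manifest.
encoded = base64.b64encode(b"s3cr3t-db-password").decode("ascii")

# Anyone who can read the manifest can reverse the encoding: no key is involved.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded)
```

Base64 is an encoding for transporting binary data, not a protection mechanism, which is why the answer above moves the real secret into an external store.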


5. "What's the difference between Deployment and StatefulSet?"

Strong answer: "Deployments are for stateless workloads — pods are interchangeable, any pod can handle any request. StatefulSets are for stateful workloads where pods need stable identity.

StatefulSets give you: ordered pod names (db-0, db-1, db-2), stable per-pod DNS via a headless Service (db-0.service.namespace), a persistent volume claim per pod that is reattached when the pod is rescheduled, and ordered startup/shutdown.

Use StatefulSet for databases, message queues, anything with local persistent state. Use Deployment for APIs, web servers, workers."
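The naming scheme is predictable enough to compute. The sketch below lists the pod names and fully qualified DNS names a StatefulSet would get, assuming a headless Service and the default `cluster.local` cluster domain. The function name and the example values (`db`, `data`) are illustrative.

```python
def statefulset_identities(
    name: str,
    service: str,
    namespace: str,
    replicas: int,
    cluster_domain: str = "cluster.local",  # assumption: default cluster domain
) -> list[tuple[str, str]]:
    """Return (pod_name, fqdn) pairs for each ordinal of a StatefulSet."""
    identities = []
    for ordinal in range(replicas):
        pod_name = f"{name}-{ordinal}"
        fqdn = f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}"
        identities.append((pod_name, fqdn))
    return identities


# Hypothetical StatefulSet "db" behind a headless Service "db" in namespace "data".
identities = statefulset_identities("db", "db", "data", replicas=3)
for pod_name, fqdn in identities:
    print(pod_name, fqdn)
```

Because these names survive rescheduling, a replica can be configured to reach its primary at a fixed address such as `db-0`, which is not possible with the random pod names a Deployment produces.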


6. "How do you monitor a Kubernetes cluster?"

What they want: Layers — infrastructure, app, business metrics.

Strong answer: "I think in three layers: infrastructure, application, business.

Infrastructure: Prometheus + node-exporter for node CPU/memory/disk, kube-state-metrics for cluster state (pod counts, deployment availability). Alerts for node not ready, high memory pressure, PVC filling up.

Application: service-level metrics via Prometheus client libraries (request rate, error rate, latency P99). Distributed traces via OpenTelemetry → Tempo. Logs via Fluent Bit → Loki.

Business: custom metrics surfaced to Grafana — orders per minute, conversion rate, whatever matters to the product.

For alerting I prefer SLO-based alerts over static thresholds: page when the error budget burn rate is high, not just when error rate > X%."
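Burn rate is simple arithmetic: the observed error ratio divided by the error budget (1 minus the SLO target). The sketch below is one way to express it. The two-window check and the 14.4 threshold follow a convention popularized by the Google SRE Workbook and are assumptions here, not something the answer above prescribes.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' the error budget is being spent."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumption: common fast-burn threshold for a 30-day SLO
) -> bool:
    """Page only if both windows burn fast, so a short spike that already ended does not page."""
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )


# With a 99.9% SLO the budget is 0.1%, so a 2% error ratio burns it about 20x too fast.
print(burn_rate(0.02, 0.999))
print(should_page(0.02, 0.02, 0.999))
print(should_page(0.005, 0.005, 0.999))
```

The same 2% error rate would be a non-event against a 95% SLO, which is the argument for alerting on budget burn rather than on a fixed error percentage.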


7. "Explain Infrastructure as Code and why it matters."

Strong answer: "IaC means your infrastructure is defined in version-controlled code, using tools such as Terraform, Pulumi, or AWS CDK.

Why it matters beyond 'automation': consistency (no more snowflake servers configured differently), auditability (git log tells you who changed what and when), reproducibility (spin up identical staging), and disaster recovery (rebuild entire infra from code, not memory).

The key cultural shift: infrastructure changes go through PR review, just like application code. A terraform plan in the PR comment shows exactly what will change — reviewers can catch mistakes before they hit production."


8. "What is GitOps and how does it differ from traditional CI/CD?"

Strong answer: "In traditional CI/CD, the pipeline pushes to the cluster: it holds cluster credentials and runs kubectl apply or helm upgrade. The cluster's state is effectively whatever the pipeline last applied.

In GitOps, Git is the source of truth. An agent (ArgoCD, Flux) inside the cluster watches a Git repo and pulls changes. The pipeline never touches the cluster — it just updates the Git repo (e.g., bumps an image tag in a Helm values file).

Benefits: you can see the desired state at any time by looking at Git. Rollback = git revert. Drift detection is automatic: if someone runs kubectl apply manually, ArgoCD flags the resource as out of sync and, with self-heal enabled, reverts it."
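The pull model boils down to a reconcile loop: compare what Git says with what the cluster has, and converge. The sketch below models that with plain dictionaries. It is a toy illustration, not how ArgoCD or Flux is implemented; the function names and the `self_heal` flag are stand-ins for whether the agent is configured to revert drift automatically.

```python
def detect_drift(desired: dict, live: dict) -> dict:
    """Return {key: (desired_value, live_value)} for every key that differs.

    A key missing on one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(desired.keys() | live.keys()):
        if desired.get(key) != live.get(key):
            drift[key] = (desired.get(key), live.get(key))
    return drift


def reconcile(desired: dict, live: dict, self_heal: bool) -> tuple[dict, dict]:
    """Return (new_live_state, drift). With self_heal, live is reset to desired."""
    drift = detect_drift(desired, live)
    if drift and self_heal:
        return dict(desired), drift
    return dict(live), drift


# Desired state as declared in Git (hypothetical values).
desired = {"image": "api:9f8e7d6", "replicas": 3}
# Live state after someone scaled the workload by hand.
live = {"image": "api:9f8e7d6", "replicas": 5}

new_live, drift = reconcile(desired, live, self_heal=True)
print(drift)
print(new_live)
```

Note that the loop only ever reads Git and writes to the cluster. Nothing outside the cluster needs write access to it.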


9. "How would you reduce Kubernetes costs?"

Strong answer: "I'd attack it in four areas:

Right-sizing: VPA recommendations show actual usage against requested resources. In my experience, teams often over-provision by 2-3x out of caution.

Spot/preemptible nodes: stateless workloads (web, workers, CI runners) can run on spot. Karpenter makes this easier by choosing from the instance types that fit the pending pods, weighing price and availability.

Scheduling: dev/staging environments don't need to run at night. Use KEDA or CronJobs to scale them to zero outside business hours.

Visibility: Kubecost per-namespace cost allocation so each team sees their own bill. People optimize what they can measure."
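The right-sizing idea can be sketched as: take observed usage, pick a high percentile, add headroom, and compare with the current request. This is a simplified illustration and not VPA's actual algorithm. The percentile, the headroom, and the sample numbers are assumptions chosen for the example.

```python
def recommend_request(
    usage_millicores: list[int],
    percentile: int = 95,       # assumption: size for the 95th percentile of usage
    headroom_percent: int = 20, # assumption: 20% safety margin on top
) -> int:
    """Recommend a CPU request (millicores) from usage samples, using integer math."""
    if not usage_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_millicores)
    # Nearest-rank percentile: rank = ceil(p/100 * n), converted to a 0-based index.
    rank = -(-percentile * len(ordered) // 100)
    value = ordered[max(rank, 1) - 1]
    # Ceiling division so the recommendation never rounds below the margin.
    return -(-value * (100 + headroom_percent) // 100)


# Hypothetical CPU usage samples for one container, in millicores.
samples = [120, 135, 110, 150, 140, 125, 130, 145, 115, 160]
current_request = 500  # hypothetical request set "to be safe"

recommended = recommend_request(samples)
over_provision_factor = current_request / recommended

print(recommended)  # 192
print(round(over_provision_factor, 1))
```

Running this kind of comparison per workload and summing the gap gives a quick estimate of how many nodes' worth of capacity is reserved but idle.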


10. "Tell me about a production incident you caused or handled."

What they want: Ownership, learning, process improvement.

Strong answer (structure): "I'll be specific. [Describe incident briefly]. The root cause was [X]. What I should have caught earlier was [Y].

We fixed it by [Z]. In the postmortem we identified three action items: [specific improvements]. Since then we've [concrete change that prevented recurrence].

The thing I learned is [genuine insight — not 'I learned to be more careful']."

Interviewers remember honest, specific stories with concrete outcomes. Generic answers about 'a database went down' don't stick.


Preparation tip: write 2-3 specific stories from your experience that answer multiple questions — the production incident can also be the 'tell me about a challenge' question. Specificity beats polish every time.

