Google SRE Roadmap
A roadmap for preparing for Google SRE interviews. It covers system design, coding, Linux internals, networking, SRE principles, and Google-specific infrastructure knowledge.
Data Structures & Algorithms
LeetCode medium-hard level required
What to learn
- Arrays, strings, hashmaps, linked lists
- Trees, graphs, BFS/DFS, topological sort
- Dynamic programming and greedy algorithms
- System design data structures — LRU cache, trie, bloom filters
- Time/space complexity analysis (Big O)
- Practice 150–200 LeetCode problems (medium focus)
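As an illustration of the "system design data structures" item above, here is a minimal LRU cache sketch. It assumes Python's standard `collections.OrderedDict`; the class and method names (`LRUCache`, `get`, `put`) are illustrative choices, not a required interface. In an interview you may be asked to build the same thing from a hash map plus a doubly linked list.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache with O(1) average get and put."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Touching a key makes it the most recently used entry.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # The first entry in the ordered dict is the least recently used.
            self._items.popitem(last=False)
```

With a capacity of 2, inserting `a` and `b`, reading `a`, then inserting `c` evicts `b`, because `a` was touched more recently.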
Linux & Systems Internals
Deep OS knowledge — Google tests this hard
What to learn
- Linux boot process — BIOS/UEFI to userspace
- Process management — fork, exec, signals, zombies
- Memory management — virtual memory, page tables, swap
- File systems — inodes, ext4, VFS layer
- Networking stack — socket programming, TCP/IP internals
- Systemd, cgroups, namespaces (container foundations)
- Performance tuning — strace, perf, vmstat, iostat
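To make the process-management item above concrete, the sketch below forks a child and reaps it on a POSIX system. The helper name `run_child` is hypothetical. The point is the window between the child's exit and the parent's `waitpid()` call, during which the child is a zombie.

```python
import os


def run_child(exit_code: int) -> int:
    """Fork a child that exits with exit_code, reap it, and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately, skipping the parent's cleanup handlers.
        os._exit(exit_code)
    # Parent: the exited child stays in the process table as a zombie
    # until its status is collected here.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    raise RuntimeError("child did not exit normally")
```

If the parent never calls `waitpid()` and keeps running, the zombie entry persists. If the parent exits first, the child is reparented (typically to PID 1), which reaps it.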
Networking Deep Dive
From L2 to L7 — Google's infra is network-heavy
What to learn
- OSI and TCP/IP model — every layer matters
- DNS resolution in depth — recursive, authoritative, caching
- TCP — 3-way handshake, congestion control, window scaling
- HTTP/2, HTTP/3 (QUIC), and gRPC
- Load balancing — L4 vs L7, consistent hashing
- CDN architecture and anycast routing
- Network debugging — tcpdump, Wireshark, dig, traceroute
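The consistent hashing mentioned under load balancing can be sketched as a hash ring with virtual nodes. This is a simplified illustration under assumptions: the class name `HashRing`, the use of SHA-256 for placement, and the default of 100 virtual nodes per backend are all arbitrary choices, not how any particular load balancer implements it.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring: each key maps to the next node clockwise."""

    def __init__(self, nodes, vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owners:
                continue  # extremely unlikely collision: keep the existing owner
            self._owners[pos] = node
            bisect.insort(self._positions, pos)

    def remove(self, node: str) -> None:
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            if self._owners.get(pos) == node:
                del self._owners[pos]
                self._positions.remove(pos)

    def lookup(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # First position strictly after the key's hash, wrapping around.
        idx = bisect.bisect_right(self._positions, self._hash(key))
        return self._owners[self._positions[idx % len(self._positions)]]
```

The property worth explaining in an interview: removing a backend only remaps the keys that backend owned, while every other key keeps its assignment.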
System Design
Design Google-scale distributed systems
What to learn
- Scalability — horizontal scaling, sharding, partitioning
- Consistency models — strong, eventual, causal
- CAP theorem and real-world tradeoffs
- Database selection — SQL vs NoSQL, when to use what
- Message queues — Pub/Sub, Kafka patterns
- Caching strategies — cache-aside, write-through, invalidation
- Design: URL shortener, rate limiter, chat system, search engine
- Design: monitoring system, distributed file system, CDN
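For the rate limiter design exercise listed above, one common starting point is a token bucket. The sketch below is single-process and not thread-safe. The names `TokenBucket` and `allow` and the injectable `clock` parameter are assumptions made for illustration. A distributed design would also need shared state, for example in a central store.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill lazily based on elapsed time, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

The `capacity` bounds the burst size and `rate` bounds the sustained throughput. Stating that distinction explicitly tends to be the first follow-up question in a design round.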
SRE Principles & Practices
The Google SRE book is your bible
What to learn
- SLIs, SLOs, SLAs — defining reliability targets
- Error budgets and error budget policies
- Toil reduction and automation philosophy
- Incident response — ICS, blameless postmortems
- Capacity planning and load testing
- Release engineering — canary, blue-green, progressive
- On-call best practices and escalation policies
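The arithmetic behind SLOs and error budgets is worth being able to do on the spot. The helpers below are an illustrative sketch with hypothetical names. They assume an availability SLO expressed as a fraction, such as 0.999, and a request-based SLI.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full downtime an availability SLO allows over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the request-based error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - slo) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

For example, a 99.9% SLO over 30 days leaves about 43.2 minutes of downtime. With 1,000,000 requests and 250 failures against that SLO, roughly 75% of the budget remains.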
Google-Specific Knowledge
Understand Google's infrastructure philosophy
What to learn
- Borg → Kubernetes evolution (container orchestration at Google)
- GFS and its successor Colossus: Google's distributed file systems
- Spanner — globally distributed database
- Zanzibar — global authorization system
- B4 (software-defined WAN) and Andromeda (network virtualization): Google's network infrastructure
- Monarch — monitoring at Google scale
- Read Google's published SRE case studies
Interview Prep & Mock Interviews
Practice the exact interview format
What to learn
- Coding rounds — 2-3 LeetCode medium/hard in 45 mins
- System design round — design for scale and reliability
- Troubleshooting round — debug a production outage live
- Linux/networking deep dive — explain kernel internals
- Behavioral (Googleyness) — collaboration, ambiguity
- Mock interviews — practice with peers or platforms