What is Continuous Profiling? (Explained with Pyroscope — No PhD Required)
Continuous profiling tells you exactly which function is burning your CPU or leaking memory — in production, all the time. Here's what it is, how it works, and how to set it up with Pyroscope.
You have metrics. You have logs. You have traces. Your observability stack is technically complete.
And yet when your service slows down at 3 PM every Tuesday, you can't tell which function is causing it. Your CPU graph goes up. Your latency graph goes up. Your logs say nothing useful. You attach a profiler, but that requires redeploying and the issue disappears before you can reproduce it.
Continuous profiling is the missing piece. Here's what it is and why it matters.
The Observability Gap
Think of the three pillars of observability:
- Metrics tell you that something is wrong (CPU is high, latency is up)
- Logs tell you what happened (errors, events)
- Traces tell you where in your system the request went (which services, which APIs)
None of these tell you why your CPU is high at the code level. Which function? Which loop? Which SQL query? Which line?
That's what profiling answers. And "continuous" profiling means you're doing it all the time — in production — without waiting for an incident to manually attach a profiler.
What a Profiler Actually Does
A profiler samples your running process at regular intervals — say, every 10 milliseconds. At each sample, it captures a stack trace: what function is currently executing, and which functions called it.
After collecting thousands of samples, you know: "Function X appeared in 40% of all samples." That means Function X, together with everything it called, accounted for roughly 40% of your CPU time. This is called sampling-based profiling, and it's how most modern profilers work: low overhead and statistically accurate.
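To make the counting step concrete, here is a minimal sketch of how stack samples turn into percentages. The samples and function names are invented for illustration; a real profiler would capture them from the running process.

```python
from collections import Counter

# Hypothetical stack samples, root first and leaf last. A real profiler
# would capture one of these at every sampling tick.
SAMPLES = [
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "query_db"),
    ("main", "handle_request", "query_db"),
    ("main", "handle_request", "render"),
    ("main", "handle_request", "query_db"),
]


def inclusive_share(samples):
    """Return {function: fraction of samples whose stack contains it}."""
    counts = Counter()
    for stack in samples:
        # set() so a recursive function is counted once per sample.
        for func in set(stack):
            counts[func] += 1
    total = len(samples)
    return {func: n / total for func, n in counts.items()}


if __name__ == "__main__":
    shares = inclusive_share(SAMPLES)
    for func, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{func:15s} {share:.0%}")
```

Here `query_db` shows up in 3 of 5 samples, so it is credited with about 60% of the sampled time. With only five samples that figure is very noisy, which is why profilers collect thousands.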
The result is usually visualized as a flame graph — a horizontal bar chart where each bar is a function call, and the width represents how much time was spent in that function.
┌─────────────────────────────────────────────────────────┐
│                   HTTP Handler (100%)                   │
├───────────────────────────────┬─────────────────────────┤
│     processOrder() (67%)      │  validateAuth() (33%)   │
├─────────────┬─────────────────┤                         │
│  queryDB()  │ calculatePrice()│                         │
│    (55%)    │      (12%)      │                         │
└─────────────┴─────────────────┴─────────────────────────┘
You look at this and immediately know: queryDB() is where most time is spent. That's your optimization target.
Why "Continuous" Matters
Traditional profiling is reactive. Production slows down → you SSH in → attach a profiler → collect data for 30 seconds → analyze → production returns to normal → you have data from a 30-second window that may or may not represent the real issue.
Continuous profiling runs all the time. When you notice the slowdown 3 hours later, you can go back in time and see exactly what your code was doing at 3 PM. It's like having a DVR for your application's behavior.
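The "go back in time" idea can be sketched as a store that keeps sample counts per time bucket and merges the buckets you ask for. This is a toy in-memory illustration, not how Pyroscope stores data; the `ProfileStore` class and the 10-second bucket size are assumptions made for the example.

```python
from collections import Counter


class ProfileStore:
    """Toy in-memory store: one Counter of stack samples per time bucket."""

    def __init__(self, bucket_seconds=10):
        self.bucket_seconds = bucket_seconds
        self.buckets = {}  # bucket start (epoch seconds) -> Counter of stacks

    def record(self, timestamp, stack):
        """Add one sample of `stack` to the bucket containing `timestamp`."""
        start = int(timestamp) // self.bucket_seconds * self.bucket_seconds
        self.buckets.setdefault(start, Counter())[stack] += 1

    def query(self, since, until):
        """Merge every bucket whose start time falls in [since, until)."""
        merged = Counter()
        for start, counts in self.buckets.items():
            if since <= start < until:
                merged.update(counts)
        return merged


if __name__ == "__main__":
    store = ProfileStore()
    store.record(1000, "main;query_db")
    store.record(1005, "main;query_db")
    store.record(1012, "main;render")
    store.record(2000, "main;query_db")
    # Ask what the code was doing during the first 20 seconds only.
    print(store.query(1000, 1020))
```

Because every bucket is kept, you can pick the time range after the incident. A manually attached profiler only has the window it happened to record.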
This is especially powerful for:
- Intermittent performance issues that don't reproduce on demand
- Memory leaks that grow slowly over hours or days
- Regression detection — compare profiles before and after a deployment
- Cost optimization — find functions consuming CPU unnecessarily
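Regression detection comes down to comparing two profiles. The sketch below compares each function's share of samples before and after a deployment. The sample counts are made up, and comparing shares rather than raw counts is a choice made here because the two windows rarely contain the same number of samples.

```python
def diff_profiles(before, after):
    """Return each function's change in share of samples, in percentage points.

    `before` and `after` map function name -> sample count. The result is
    ordered by the size of the change, largest first.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_before == 0 or total_after == 0:
        raise ValueError("both profiles need at least one sample")

    deltas = {}
    for func in set(before) | set(after):
        share_before = before.get(func, 0) / total_before
        share_after = after.get(func, 0) / total_after
        deltas[func] = round((share_after - share_before) * 100, 1)
    return dict(sorted(deltas.items(), key=lambda kv: -abs(kv[1])))


if __name__ == "__main__":
    # Hypothetical sample counts from the hour before and after a deploy.
    before = {"query_db": 500, "render": 300, "parse_json": 200}
    after = {"query_db": 450, "render": 300, "parse_json": 750}
    for func, delta in diff_profiles(before, after).items():
        print(f"{func:12s} {delta:+.1f} pp")
```

In this made-up data `parse_json` grows from 20% to 50% of samples, so it is the first place to look. The shares of the other two functions shrink only because the total grew.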
Pyroscope: Open Source Continuous Profiling
Pyroscope (now Grafana Pyroscope) is the leading open source continuous profiling tool. It supports Go, Python, Java, Node.js, Ruby, Rust, and .NET.
It works in two modes:
- Pull mode — Pyroscope scrapes profiling data from your application (like Prometheus scrapes metrics)
- Push mode — your application sends profiling data to Pyroscope
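To show what push mode looks like on the wire, here is a hedged sketch that builds (but does not send) an upload request in the collapsed-stack text format. It assumes the server exposes an `/ingest` HTTP endpoint that accepts `name`, `from`, `until`, and `format` query parameters. Check that against the API documentation for your Pyroscope version. The `build_ingest_request` helper is hypothetical, and in practice the language SDKs do this for you.

```python
from urllib.parse import urlencode


def build_ingest_request(server, app_name, folded_stacks, start, end):
    """Build the URL and body for a push-style profile upload.

    `folded_stacks` maps "root;child;leaf" -> sample count.
    ASSUMPTION: the /ingest path and the parameter names below follow
    Pyroscope's HTTP ingestion API; verify them for your server version.
    """
    query = urlencode(
        {
            "name": app_name,
            "from": int(start),   # UNIX time when the profiling window began
            "until": int(end),    # UNIX time when it ended
            "format": "folded",   # one "stack count" pair per line
        }
    )
    url = f"{server.rstrip('/')}/ingest?{query}"
    body = "\n".join(f"{stack} {count}" for stack, count in folded_stacks.items())
    return url, body.encode("utf-8")


if __name__ == "__main__":
    url, body = build_ingest_request(
        "http://pyroscope:4040",
        "my-service",
        {"main;query_db": 3, "main;render": 1},
        start=1700000000,
        end=1700000010,
    )
    print(url)
    print(body.decode())
    # Sending is left out on purpose. With the standard library it would be:
    #   urllib.request.urlopen(urllib.request.Request(url, data=body, method="POST"))
```

In pull mode the direction is reversed: your application exposes its profiles over HTTP and a collector fetches them on a schedule.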
Setting Up Pyroscope with Go
Add the Pyroscope agent to your Go application:
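go get github.com/grafana/pyroscope-go

package main

import (
	"log"

	"github.com/grafana/pyroscope-go"
)

func main() {
	// Start the profiler once at startup; it samples in the background.
	_, err := pyroscope.Start(pyroscope.Config{
		ApplicationName: "my-service",
		ServerAddress:   "http://pyroscope:4040",
		ProfileTypes: []pyroscope.ProfileType{
			pyroscope.ProfileCPU,
			pyroscope.ProfileAllocObjects,
			pyroscope.ProfileAllocSpace,
			pyroscope.ProfileInuseObjects,
			pyroscope.ProfileInuseSpace,
		},
	})
	if err != nil {
		// A profiler failure should not take the service down.
		log.Printf("pyroscope disabled: %v", err)
	}

	// Your application code
}
That's it. Pyroscope now continuously samples your application and uploads CPU, memory allocation, and heap profiles at a short fixed interval (around 10 seconds; the exact default depends on the SDK version).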
Setting Up Pyroscope with Python
pip install pyroscope-io

import pyroscope

pyroscope.configure(
    application_name="my-python-service",
    server_address="http://pyroscope:4040",
    tags={
        "version": "1.0.0",
        "env": "production",
    },
)

# Your application code runs here
Deploying Pyroscope on Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pyroscope
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pyroscope
  template:
    metadata:
      labels:
        app: pyroscope  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: pyroscope
          image: grafana/pyroscope:latest
          ports:
            - containerPort: 4040
          env:
            # Storage settings as given in this article; confirm the variable
            # names against the configuration reference for your Pyroscope version.
            - name: PYROSCOPE_STORAGE_TYPE
              value: "s3"
            - name: PYROSCOPE_S3_BUCKET
              value: "my-pyroscope-data"
---
apiVersion: v1
kind: Service
metadata:
  name: pyroscope
spec:
  selector:
    app: pyroscope
  ports:
    - port: 4040
      targetPort: 4040
Reading a Flame Graph
Once Pyroscope is collecting data, you'll see flame graphs in the UI. Here's how to read them:
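- Width = time: wider bars mean more CPU time spent in that function and everything it called
- Vertical position = call stack: adjacent rows are caller and callee. Pyroscope draws the root at the top, like the diagram earlier in this article, so callers sit above callees; classic flame graphs are flipped, with the root at the bottom
- Look for wide bars at the leaf end of a stack (the edge farthest from the root): those functions are doing the work themselves rather than delegating it
- Look for surprisingly wide bars that you didn't expect: those are your performance wins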
The color is usually meaningless (just for visual distinction) unless your tool uses color coding to indicate hot paths.
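The difference between a wide bar and a wide leaf is the difference between total time and self time. Here is a small sketch using the numbers from the example diagram earlier in this article, treating each percentage point as one sample. The `self_and_total` helper is hypothetical.

```python
from collections import Counter


def self_and_total(samples):
    """Count self samples (function is the leaf) and total samples
    (function appears anywhere in the stack) for every function."""
    self_counts, total_counts = Counter(), Counter()
    for stack in samples:
        self_counts[stack[-1]] += 1   # the leaf was actually on the CPU
        for func in set(stack):       # every frame was "in progress"
            total_counts[func] += 1
    return self_counts, total_counts


# 100 samples shaped like the example flame graph: root first, leaf last.
SAMPLES = (
    [("HTTP Handler", "processOrder", "queryDB")] * 55
    + [("HTTP Handler", "processOrder", "calculatePrice")] * 12
    + [("HTTP Handler", "validateAuth")] * 33
)

if __name__ == "__main__":
    self_counts, total_counts = self_and_total(SAMPLES)
    for func, total in total_counts.most_common():
        print(f"{func:15s} total={total:3d}  self={self_counts[func]:3d}")
```

`processOrder` has a wide bar (67 samples in total) but no self time, so optimizing its own body gains nothing. `queryDB` has 55 self samples, which is why it is the real target.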
The Four Profile Types You Should Collect
- CPU profile — which functions consume CPU cycles. Most common, most useful.
- Heap/Memory profile — which functions allocate the most memory (helps find leaks)
- Goroutine/Thread profile — how many goroutines exist and what they're doing (helps find goroutine leaks)
- Blocking profile — where goroutines are blocked waiting (helps find lock contention)
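To get a feel for what a heap profile reports, the following sketch uses Python's standard-library `tracemalloc` module to list the source lines holding the most live memory. It is a Python analogue of a heap profile, not the mechanism Pyroscope uses. `build_cache` is a made-up workload.

```python
import tracemalloc


def top_allocations(workload, limit=3):
    """Run workload() under tracemalloc and return the top allocation sites
    as (file:line, bytes) pairs, largest first."""
    tracemalloc.start()
    try:
        result = workload()  # keep a reference so the memory stays live
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    del result
    stats = snapshot.statistics("lineno")  # grouped by source line
    return [(str(stat.traceback[0]), stat.size) for stat in stats[:limit]]


def build_cache():
    # Hypothetical workload: roughly 1 MB of small byte strings.
    return [bytes(1024) for _ in range(1000)]


if __name__ == "__main__":
    for location, size in top_allocations(build_cache):
        print(f"{size / 1024:8.1f} KiB  {location}")
```

The line inside `build_cache` should come out on top. In a real leak hunt you would compare snapshots taken minutes or hours apart and look for sites whose size keeps growing.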
Don't try to optimize everything at once. Start with CPU, find the biggest function, optimize it, measure again.
Overhead: Is It Safe for Production?
Yes — sampling profilers have very low overhead. Pyroscope's CPU overhead is typically 0.5-2% of total CPU. Memory overhead is minimal. The sampling rate (default: 100Hz for Go) can be adjusted lower if needed.
The trade-off is accuracy vs overhead: lower sampling rate = less overhead = less accurate data. For production, 10-100Hz is the standard range.
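The accuracy side of that trade-off can be estimated with the standard error of a sampled proportion. This is a back-of-the-envelope sketch that assumes samples are independent and that one thread is on the CPU for the whole window. Real profiles only approximate that.

```python
import math


def sampling_error(share, rate_hz, seconds):
    """Approximate standard error of a function's measured share of samples.

    share:   true fraction of time spent in the function (0..1)
    rate_hz: samples per second
    seconds: length of the profiling window
    """
    n = rate_hz * seconds  # number of samples collected
    return math.sqrt(share * (1 - share) / n)


if __name__ == "__main__":
    # A function that truly takes 40% of CPU, observed over a 10-second window.
    for rate in (100, 10):
        err = sampling_error(0.4, rate, 10)
        print(f"{rate:3d} Hz: 40% +/- {err * 100:.1f} percentage points")
```

At 100 Hz the estimate is good to about 1.5 percentage points, and at 10 Hz to about 4.9. Because continuous profiling lets you aggregate over minutes or hours, a lower rate over a longer window can recover the same precision.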
Continuous profiling is the last piece of observability most teams add, but it often delivers the highest return. When you can see exactly which function is costing you CPU, you stop guessing and start fixing.
Already using Prometheus + Grafana? Grafana Pyroscope integrates natively and lets you correlate metrics, traces, and profiles in a single view. Start there.