LLM Output Validation with Instructor + Pydantic in Production
LLMs return unpredictable text. Instructor + Pydantic turns that into validated, typed Python objects — automatically retrying when the model returns garbage. Here's how to use it in production.
The hardest part of building LLM-powered applications isn't the LLM call — it's what happens to the output.
You ask the model to return JSON. It returns JSON wrapped in a markdown code block. Or it returns JSON with a comment explaining what the JSON means. Or it returns valid JSON but with a field spelled differently than you expected. Or it returns null for a required field.
Your JSON parser crashes. Your application breaks. Your users see errors.
Instructor is the library that fixes this. It wraps your LLM client and enforces Pydantic schema validation on the output, with automatic retry on failure.
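To make those failure modes concrete, here is a minimal sketch that reproduces three of them using only the standard library and Pydantic (v2 assumed). The `Summary` model and the sample replies are hypothetical stand-ins, not real model output.

```python
import json

from pydantic import BaseModel, ValidationError


class Summary(BaseModel):
    severity: str
    root_cause: str


FENCE = "`" * 3  # a markdown code fence, built here to keep the literal out of the example
PAYLOAD = '{"severity": "high", "root_cause": "pool exhausted"}'

# Hypothetical replies of the kinds described above
replies = {
    "fenced": FENCE + "json\n" + PAYLOAD + "\n" + FENCE,
    "misspelled": '{"severity": "high", "rootCause": "pool exhausted"}',
    "null_field": '{"severity": null, "root_cause": "pool exhausted"}',
    "clean": PAYLOAD,
}


def classify(raw: str) -> str:
    """Report how a naive parse-then-validate pipeline handles one reply."""
    try:
        Summary.model_validate(json.loads(raw))
    except json.JSONDecodeError:
        return "not valid JSON"
    except ValidationError:
        return "valid JSON, wrong shape"
    return "ok"


outcomes = {name: classify(raw) for name, raw in replies.items()}
print(outcomes)
```

Only the clean reply gets through; the other three fail at a different stage, which is why hand-written parsing code tends to grow a special case for each one.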
Installation
pip install instructor pydantic anthropic
Basic Usage: Guaranteed Typed Output
import instructor
from anthropic import Anthropic
from pydantic import BaseModel, Field

# Patch the Anthropic client with Instructor
client = instructor.from_anthropic(Anthropic())

class IncidentSummary(BaseModel):
    severity: str = Field(description="One of: critical, high, medium, low")
    affected_service: str = Field(description="The primary service affected")
    root_cause: str = Field(description="One sentence root cause")
    immediate_action: str = Field(description="First thing the on-call engineer should do")
    estimated_impact: str = Field(description="User-facing impact description")

# Returns an IncidentSummary object rather than raw text. If the output still
# fails validation after Instructor's retries, an exception is raised instead.
summary = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": """Analyze this incident and return a structured summary:
Logs show: checkout service returning 503, payment service connection refused,
database showing 95% connection pool usage. Started 14 minutes ago.
Affecting ~12% of checkout attempts."""
    }],
    response_model=IncidentSummary,
)

print(summary.severity)          # e.g. "critical"
print(summary.affected_service)  # e.g. "checkout-service"
print(summary.root_cause)        # e.g. "Database connection pool exhausted..."
print(summary.immediate_action)  # e.g. "Restart payment-service pods..."
No JSON parsing. No .get() with fallbacks. Just a clean Python object with type safety.
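One caveat on that type safety: a field typed `str` accepts any string, so the "One of: critical, high, medium, low" text in the description is only a hint to the model. A minimal sketch of enforcing the allowed values with `typing.Literal` follows (Pydantic v2 assumed). `StrictIncidentSummary` is a hypothetical, trimmed-down variant of the model above.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class StrictIncidentSummary(BaseModel):
    # Only these four strings pass validation; anything else is rejected
    severity: Literal["critical", "high", "medium", "low"]
    affected_service: str


accepted = StrictIncidentSummary(severity="high", affected_service="checkout-service")

try:
    StrictIncidentSummary(severity="urgent", affected_service="checkout-service")
    rejected = False
except ValidationError as exc:
    rejected = True
    print(exc.errors()[0]["loc"])  # location of the offending field
```

With a constraint like this in place, an out-of-range value becomes a validation error that can be fed back to the model, instead of a string that slips through to downstream code.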
How Instructor Works
Under the hood, Instructor:
- Converts your Pydantic model to a JSON schema
- Passes it to the LLM as a tool definition (or system prompt depending on the mode)
- Gets the LLM response
- Validates the response against your schema
- If validation fails — automatically retries with the error message included, so the model can fix it
- Returns a validated Pydantic object
The retry mechanism is the key part. Instead of crashing on bad output, it tells the model: "You returned X but I expected Y, please fix it." Models fix their output correctly 95%+ of the time on the first retry.
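The validate-and-retry loop described above can be sketched in a few lines. This is an illustration of the idea, not Instructor's actual implementation: `Ticket`, `validate_with_retry`, and `fake_model` are hypothetical names, and `fake_model` stands in for a real LLM call so the example runs offline (Pydantic v2 assumed).

```python
from typing import Callable

from pydantic import BaseModel, ValidationError


class Ticket(BaseModel):
    title: str
    priority: int


def validate_with_retry(
    call_model: Callable[[str], str], prompt: str, max_retries: int = 2
) -> Ticket:
    """Validate the model's reply; on failure, re-prompt with the error attached."""
    for attempt in range(max_retries + 1):
        raw = call_model(prompt)
        try:
            return Ticket.model_validate_json(raw)
        except ValidationError as exc:
            if attempt == max_retries:
                raise  # retries exhausted: surface the validation error
            prompt = (
                f"{prompt}\n\nYour previous reply was invalid:\n{exc}\n"
                "Please return corrected JSON."
            )
    raise ValueError("max_retries must be >= 0")


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: wrong at first, correct once shown the error."""
    if "previous reply was invalid" in prompt:
        return '{"title": "Checkout 503s", "priority": 1}'
    return '{"title": "Checkout 503s", "priority": "urgent"}'


ticket = validate_with_retry(fake_model, "Summarize the incident as JSON.")
print(ticket)
```

The first reply fails because `priority` is not an integer; the second attempt sees the error text in its prompt and returns a valid object. When every attempt fails, the last `ValidationError` propagates to the caller.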
Production Patterns
Pattern 1: Nested Schemas for Complex Data
from enum import Enum
from pydantic import BaseModel, Field

class Severity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class AffectedComponent(BaseModel):
    name: str
    impact: str
    team_owner: str

class KubernetesAction(BaseModel):
    command: str = Field(description="Exact kubectl command to run")
    explanation: str = Field(description="Why this command helps")
    safe_to_run: bool = Field(description="Whether this is safe to run without approval")

class IncidentAnalysis(BaseModel):
    severity: Severity
    title: str = Field(max_length=80, description="Short incident title for Slack")
    affected_components: list[AffectedComponent]
    root_cause_hypothesis: str
    recommended_actions: list[KubernetesAction]
    escalate_immediately: bool
    estimated_resolution_time_minutes: int

# incident_data is assumed to hold the raw incident text (logs, alerts) gathered elsewhere
analysis = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": f"Analyze this incident: {incident_data}"}],
    response_model=IncidentAnalysis,
)

# Now you have fully typed, validated data
for action in analysis.recommended_actions:
    if action.safe_to_run:
        print(f"Running: {action.command}")
        # safe_to_run is the model's own judgment; review before executing anything
        # os.system(action.command)  # only safe actions
Pattern 2: Partial Extraction (Extract Structure from Unstructured Text)
from typing import Optional
from pydantic import BaseModel

class PodErrorExtract(BaseModel):
    pod_name: Optional[str] = None
    namespace: Optional[str] = None
    error_type: Optional[str] = None
    error_message: str
    suggested_fix: str

# Feed raw log output; Instructor extracts the structure
# (raw_log_output is assumed to be a string of log lines collected elsewhere)
result = client.messages.create(
    model="claude-haiku-4-5-20251001",  # Use fast model for extraction
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": f"""Extract error information from these logs:
{raw_log_output}"""
    }],
    response_model=PodErrorExtract,
)
Pattern 3: Validation with Custom Validators
import re
from pydantic import BaseModel, field_validator, Field

class ResourceRecommendation(BaseModel):
    cpu_request: str = Field(description="CPU request in Kubernetes format, e.g. '100m' or '0.5'")
    cpu_limit: str = Field(description="CPU limit, must be >= cpu_request")
    memory_request: str = Field(description="Memory in Mi or Gi, e.g. '256Mi'")
    memory_limit: str = Field(description="Memory limit, must be >= memory_request")
    reasoning: str

    # Note: the validators below check format only. The ">=" constraints in the
    # descriptions are hints to the model and are not enforced here.
    @field_validator('cpu_request', 'cpu_limit')
    @classmethod
    def validate_cpu_format(cls, v):
        if not re.match(r'^\d+m$|^\d+(\.\d+)?$', v):
            raise ValueError(f"Invalid CPU format: {v}. Use '100m' or '0.5'")
        return v

    @field_validator('memory_request', 'memory_limit')
    @classmethod
    def validate_memory_format(cls, v):
        if not re.match(r'^\d+(Mi|Gi|Ki|M|G|K)$', v):
            raise ValueError(f"Invalid memory format: {v}. Use '256Mi' or '1Gi'")
        return v

# If Claude returns "0.1 cores" instead of "100m", the validator rejects it
# and Instructor re-prompts Claude with the error message
# (workload_metrics is assumed to be defined elsewhere)
recommendation = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": f"Recommend Kubernetes resources for: {workload_metrics}"}],
    response_model=ResourceRecommendation,
    max_retries=3,  # retry up to 3 times if validation fails
)
Pattern 4: Streaming Partial Objects
For long responses where you want to show progress:
import instructor
from anthropic import Anthropic
from pydantic import BaseModel

client = instructor.from_anthropic(Anthropic(), mode=instructor.Mode.ANTHROPIC_TOOLS)

class DeploymentChecklist(BaseModel):
    pre_deploy_checks: list[str]
    deployment_steps: list[str]
    post_deploy_validation: list[str]
    rollback_procedure: list[str]

# Stream the object as it's being generated
for partial_checklist in client.messages.create_partial(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Generate deployment checklist for a Node.js API on Kubernetes"}],
    response_model=DeploymentChecklist,
):
    # partial_checklist is populated as fields complete; unfinished fields may still be empty
    if partial_checklist.pre_deploy_checks:
        print(f"Pre-deploy: {len(partial_checklist.pre_deploy_checks)} checks ready")
Production Configuration
import os

import instructor
from anthropic import Anthropic
from tenacity import retry, stop_after_attempt, wait_exponential

# Production-ready client setup
def create_instructor_client():
    return instructor.from_anthropic(
        Anthropic(
            api_key=os.getenv("ANTHROPIC_API_KEY"),
            timeout=30.0,
            max_retries=2,  # SDK-level retries for transient API errors
        ),
        mode=instructor.Mode.ANTHROPIC_TOOLS,  # use tool calling, not JSON mode
    )

client = create_instructor_client()

# Wrap your calls with retry for network errors.
# Note: with no retry= filter, tenacity retries on any exception, including
# the one raised when Instructor's validation retries are exhausted.
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10)
)
def extract_with_retry(text: str, model_class):
    return client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": text}],
        response_model=model_class,
        max_retries=2,  # Instructor retries for validation errors
    )
When to Use Which Model
# Fast extraction tasks — use Haiku (cheap + fast)
PodErrorExtract, ResourceRecommendation, SimpleClassification
# Complex reasoning tasks — use Sonnet
IncidentAnalysis, DeploymentChecklist, RootCauseAnalysis
# Never use Opus for structured extraction: overkill, slow, expensive
Instructor vs Raw Tool Calling
Without Instructor, extracting structured data from Claude looks like:
# Without Instructor: fragile, lots of boilerplate
raw_client = Anthropic()
response = raw_client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,  # required by the Messages API
    tools=[{"name": "extract", "input_schema": {"type": "object", "properties": {...}}}],
    messages=[...]
)
# Parse tool call manually
tool_use = next(b for b in response.content if b.type == "tool_use")
data = tool_use.input
# No validation: what if a required field is missing?
# No retry: what if the schema wasn't followed?
# Manual Pydantic conversion...
With Instructor, all of that goes away. You define your schema once as a Pydantic model, and every call returns a validated, typed object. The retry logic, the JSON parsing, and the tool call handling are all handled automatically.
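For a sense of what that boilerplate involves, here is a minimal sketch of the manual path with the two missing safeguards added. It runs offline: the `SimpleNamespace` objects are hypothetical stand-ins for the content blocks of a real response, and `Extracted` and `extract_manually` are illustrative names (Pydantic v2 assumed).

```python
from types import SimpleNamespace
from typing import Optional

from pydantic import BaseModel, ValidationError


class Extracted(BaseModel):
    severity: str
    affected_service: str


def extract_manually(content: list) -> Optional[Extracted]:
    """Find the tool call in a response and validate its input by hand."""
    # A default for next() avoids StopIteration when the model replied with text only
    tool_use = next((b for b in content if b.type == "tool_use"), None)
    if tool_use is None:
        return None
    try:
        return Extracted.model_validate(tool_use.input)
    except ValidationError:
        # A real pipeline would re-prompt here; this sketch just gives up
        return None


# Hypothetical response contents covering the three cases
text_only = [SimpleNamespace(type="text", text="Here is my analysis...")]
incomplete = [SimpleNamespace(type="tool_use", input={"severity": "high"})]
complete = [
    SimpleNamespace(
        type="tool_use",
        input={"severity": "high", "affected_service": "checkout-service"},
    )
]

print(extract_manually(text_only), extract_manually(incomplete), extract_manually(complete))
```

Even this version still has no retry step, which is the part that takes the most code to get right by hand.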
For production LLM applications that need reliable structured output, Instructor is the fastest path from "LLM returns text" to "application gets typed data."