
LLM Output Validation with Instructor + Pydantic in Production

LLMs return unpredictable text. Instructor + Pydantic turns that into validated, typed Python objects — automatically retrying when the model returns garbage. Here's how to use it in production.

DevOpsBoys · Jun 14, 2026 · 5 min read

The hardest part of building LLM-powered applications isn't the LLM call — it's what happens to the output.

You ask the model to return JSON. It returns JSON wrapped in a markdown code block. Or it returns JSON with a comment explaining what the JSON means. Or it returns valid JSON but with a field spelled differently than you expected. Or it returns null for a required field.

Your JSON parser crashes. Your application breaks. Your users see errors.
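To make these failure modes concrete, here is a minimal standard-library sketch. The sample strings are hand-written for illustration (they are not captured model output), and `naive_parse` is a hypothetical helper standing in for typical first-draft parsing code.

```python
import json

FENCE = "`" * 3  # markdown code fence marker, built here to avoid nesting fences

# Hypothetical model outputs, hand-written to mirror the failure modes above.
samples = {
    "fenced": f'{FENCE}json\n{{"severity": "high"}}\n{FENCE}',
    "commented": '{"severity": "high"}  // high because checkout is down',
    "renamed_field": '{"sev": "high"}',
    "null_field": '{"severity": null}',
}


def naive_parse(raw: str) -> str:
    """Parse the way a first draft usually does: json.loads plus a key lookup."""
    return json.loads(raw)["severity"].lower()


outcomes = {}
for name, raw in samples.items():
    try:
        outcomes[name] = naive_parse(raw)
    except Exception as exc:  # broad on purpose: we want to record every failure type
        outcomes[name] = type(exc).__name__

print(outcomes)
```

Each of the four samples fails, and they fail with three different exception types, which is why ad hoc `try`/`except` handling around `json.loads` tends to grow without ever covering everything.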

Instructor is the library that fixes this. It wraps your LLM client and enforces Pydantic schema validation on the output, with automatic retry on failure.

Installation

bash
pip install instructor pydantic anthropic

Basic Usage — Guaranteed Typed Output

python
import instructor
from anthropic import Anthropic
from pydantic import BaseModel, Field
 
# Patch the Anthropic client with Instructor
client = instructor.from_anthropic(Anthropic())
 
class IncidentSummary(BaseModel):
    severity: str = Field(description="One of: critical, high, medium, low")
    affected_service: str = Field(description="The primary service affected")
    root_cause: str = Field(description="One sentence root cause")
    immediate_action: str = Field(description="First thing the on-call engineer should do")
    estimated_impact: str = Field(description="User-facing impact description")
 
# Returns a validated IncidentSummary instance, never raw text.
# If the output still fails validation after retries, the call raises instead.
summary = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": """Analyze this incident and return a structured summary:
        
        Logs show: checkout service returning 503, payment service connection refused,
        database showing 95% connection pool usage. Started 14 minutes ago.
        Affecting ~12% of checkout attempts."""
    }],
    response_model=IncidentSummary,
)
 
# Example values; the actual text depends on the model's response
print(summary.severity)          # e.g. "critical"
print(summary.affected_service)  # e.g. "checkout-service"
print(summary.root_cause)        # e.g. "Database connection pool exhausted..."
print(summary.immediate_action)  # e.g. "Restart payment-service pods..."

No JSON parsing. No .get() with fallbacks. Just a clean Python object with type safety.

How Instructor Works

Under the hood, Instructor:

  1. Converts your Pydantic model to a JSON schema
  2. Passes it to the LLM as a tool definition (or system prompt depending on the mode)
  3. Gets the LLM response
  4. Validates the response against your schema
  5. If validation fails — automatically retries with the error message included, so the model can fix it
  6. Returns a validated Pydantic object

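The retry mechanism is the key part. Instead of crashing on bad output, Instructor tells the model, in effect: "You returned X but I expected Y, please fix it." Models usually fix their output on the first retry (roughly 95%+ of the time as a rule of thumb, not a benchmark; the rate depends on the model and the schema).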
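The steps above can be sketched as a small validate-and-retry loop. This is not Instructor's actual implementation: `Triage`, `extract`, and `fake_model` are hypothetical names, the schema is passed in the prompt text rather than as a tool definition, and the LLM call is replaced by a stand-in function so the example runs without an API key. It assumes Pydantic v2.

```python
from typing import Callable, Literal

from pydantic import BaseModel, ValidationError


class Triage(BaseModel):
    severity: Literal["critical", "high", "medium", "low"]
    affected_service: str


def extract(call_model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> Triage:
    """Validate the model's reply, feeding validation errors back on failure."""
    schema = Triage.model_json_schema()  # step 1: Pydantic model -> JSON schema
    message = f"{prompt}\n\nReply with JSON matching this schema: {schema}"
    for _ in range(max_attempts):
        raw = call_model(message)  # steps 2-3: send the request, get the reply
        try:
            return Triage.model_validate_json(raw)  # steps 4 and 6: validate, return
        except ValidationError as err:
            # step 5: retry with the validation error included in the next message
            message = f"{prompt}\n\nYour last reply was invalid:\n{err}\nPlease fix it."
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


# Stand-in for a real LLM call: wrong on the first attempt, valid on the second.
replies = iter([
    '{"severity": "urgent", "affected_service": "checkout"}',
    '{"severity": "critical", "affected_service": "checkout"}',
])
prompts_seen = []


def fake_model(message: str) -> str:
    prompts_seen.append(message)
    return next(replies)


result = extract(fake_model, "Classify: checkout returning 503")
print(result.severity, len(prompts_seen))
```

The second prompt carries Pydantic's error text, which names the failing field and the allowed values. That is what gives the model enough information to correct itself.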

Production Patterns

Pattern 1: Nested Schemas for Complex Data

python
from enum import Enum

from pydantic import BaseModel, Field
 
class Severity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
 
class AffectedComponent(BaseModel):
    name: str
    impact: str
    team_owner: str
 
class KubernetesAction(BaseModel):
    command: str = Field(description="Exact kubectl command to run")
    explanation: str = Field(description="Why this command helps")
    safe_to_run: bool = Field(description="Whether this is safe to run without approval")
 
class IncidentAnalysis(BaseModel):
    severity: Severity
    title: str = Field(max_length=80, description="Short incident title for Slack")
    affected_components: list[AffectedComponent]
    root_cause_hypothesis: str
    recommended_actions: list[KubernetesAction]
    escalate_immediately: bool
    estimated_resolution_time_minutes: int
 
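# client is the Instructor-patched client from Basic Usage;
# incident_data is the raw incident text (logs, alerts) collected elsewhere
analysis = client.messages.create(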
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": f"Analyze this incident: {incident_data}"}],
    response_model=IncidentAnalysis,
)
 
# Now you have fully typed, validated data
for action in analysis.recommended_actions:
    if action.safe_to_run:
        print(f"Running: {action.command}")
        # safe_to_run is the model's own claim, not a guarantee:
        # check the command yourself before executing it
        # os.system(action.command)
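
Schema validation guarantees that `safe_to_run` is a boolean, not that the command is actually safe. Before executing anything the model proposes, it is worth adding an independent check. Below is a minimal sketch of one possible policy. The allowlist of read-only verbs and the `is_allowed` helper are assumptions for illustration, not part of Instructor or kubectl.

```python
import shlex

# Assumed policy: only these read-only kubectl verbs may run without approval.
READ_ONLY_VERBS = {"get", "describe", "logs", "top"}
SHELL_METACHARACTERS = set(";|&$`<>\n")


def is_allowed(command: str) -> bool:
    """Return True only for a single read-only kubectl invocation."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False  # reject chaining, substitution, and redirection outright
    try:
        parts = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    # Conservative: the verb must come right after "kubectl", so commands
    # with global flags before the verb are rejected too.
    return len(parts) >= 2 and parts[0] == "kubectl" and parts[1] in READ_ONLY_VERBS


print(is_allowed("kubectl get pods -n checkout"))   # True
print(is_allowed("kubectl delete pod payment-7d9f"))  # False
print(is_allowed("kubectl get pods; rm -rf /"))     # False
```

A command that passes can then be run with `subprocess.run(shlex.split(command))` rather than `os.system`, so no shell ever interprets the string.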

Pattern 2: Partial Extraction — Extract Structure from Unstructured Text

python
from typing import Optional
 
class PodErrorExtract(BaseModel):
    pod_name: Optional[str] = None
    namespace: Optional[str] = None
    error_type: Optional[str] = None
    error_message: str
    suggested_fix: str
 
# Feed raw log output — Instructor extracts the structure
result = client.messages.create(
    model="claude-haiku-4-5-20251001",  # Use fast model for extraction
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": f"""Extract error information from these logs:
        
        {raw_log_output}"""
    }],
    response_model=PodErrorExtract,
)

Pattern 3: Validation with Custom Validators

python
import re

from pydantic import BaseModel, field_validator, Field
 
class ResourceRecommendation(BaseModel):
    cpu_request: str = Field(description="CPU request in Kubernetes format, e.g. '100m' or '0.5'")
    cpu_limit: str = Field(description="CPU limit, must be >= cpu_request")
    memory_request: str = Field(description="Memory in Mi or Gi, e.g. '256Mi'")
    memory_limit: str = Field(description="Memory limit, must be >= memory_request")
    reasoning: str
    
    @field_validator('cpu_request', 'cpu_limit')
    @classmethod
    def validate_cpu_format(cls, v):
        # Accept millicores ("100m") or whole/decimal cores ("1", "0.5")
        if not re.match(r'^\d+m$|^\d+(\.\d+)?$', v):
            raise ValueError(f"Invalid CPU format: {v}. Use '100m' or '0.5'")
        return v
    
    @field_validator('memory_request', 'memory_limit')
    @classmethod
    def validate_memory_format(cls, v):
        # Accept an integer followed by a binary or decimal unit suffix
        if not re.match(r'^\d+(Mi|Gi|Ki|M|G|K)$', v):
            raise ValueError(f"Invalid memory format: {v}. Use '256Mi' or '1Gi'")
        return v
 
# If Claude returns "0.1 cores" instead of "100m", 
# validator catches it and Claude retries with the correct format
recommendation = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": f"Recommend Kubernetes resources for: {workload_metrics}"}],
    response_model=ResourceRecommendation,
    max_retries=3,  # retry up to 3 times if validation fails
)
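
The field descriptions above ask for `cpu_limit >= cpu_request`, but the format validators only check each value on its own. A cross-field rule can be expressed with Pydantic v2's `model_validator`. The sketch below uses a separate, hypothetical `CpuBounds` model and `cpu_to_millicores` helper to keep it short. It assumes the values are already in '100m' or '0.5' form, which holds in `ResourceRecommendation` because an "after" model validator only runs once the field validators have passed.

```python
from decimal import Decimal

from pydantic import BaseModel, ValidationError, model_validator


def cpu_to_millicores(value: str) -> Decimal:
    """Convert '100m' or '0.5' to millicores (assumes a well-formed value)."""
    if value.endswith("m"):
        return Decimal(value[:-1])
    return Decimal(value) * 1000


class CpuBounds(BaseModel):
    cpu_request: str
    cpu_limit: str

    @model_validator(mode="after")
    def limit_not_below_request(self):
        # Runs after field-level validation, with both values available
        if cpu_to_millicores(self.cpu_limit) < cpu_to_millicores(self.cpu_request):
            raise ValueError(
                f"cpu_limit {self.cpu_limit} is below cpu_request {self.cpu_request}"
            )
        return self


ok = CpuBounds(cpu_request="250m", cpu_limit="0.5")  # 250m <= 500m, accepted

try:
    CpuBounds(cpu_request="1", cpu_limit="500m")  # 1000m > 500m, rejected
    rejected = False
except ValidationError:
    rejected = True

print(ok.cpu_limit, rejected)
```

Because the failure surfaces as a regular `ValidationError`, it would go through the same retry path as the format errors, with the message telling the model which bound to fix.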

Pattern 4: Streaming Partial Objects

For long responses where you want to show progress:

python
import instructor
from anthropic import Anthropic
from pydantic import BaseModel
 
client = instructor.from_anthropic(Anthropic(), mode=instructor.Mode.ANTHROPIC_TOOLS)
 
class DeploymentChecklist(BaseModel):
    pre_deploy_checks: list[str]
    deployment_steps: list[str]
    post_deploy_validation: list[str]
    rollback_procedure: list[str]
 
# Stream the object as it's being generated
for partial_checklist in client.messages.create_partial(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Generate deployment checklist for a Node.js API on Kubernetes"}],
    response_model=DeploymentChecklist,
):
    # partial_checklist is populated as fields complete
    if partial_checklist.pre_deploy_checks:
        print(f"Pre-deploy: {len(partial_checklist.pre_deploy_checks)} checks ready")

Production Configuration

python
import os

import instructor
from anthropic import Anthropic
from tenacity import retry, stop_after_attempt, wait_exponential
 
# Production-ready client setup
def create_instructor_client():
    return instructor.from_anthropic(
        Anthropic(
            api_key=os.getenv("ANTHROPIC_API_KEY"),
            timeout=30.0,
            max_retries=2,
        ),
        mode=instructor.Mode.ANTHROPIC_TOOLS,  # use tool calling, not JSON mode
    )
 
client = create_instructor_client()
 
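# Outer retry with exponential backoff for transient failures.
# As written it retries on any exception, and it stacks on top of the
# SDK's max_retries and Instructor's max_retries below.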
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10)
)
def extract_with_retry(text: str, model_class):
    return client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": text}],
        response_model=model_class,
        max_retries=2,  # Instructor retries for validation errors
    )
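
With three retry layers in play (tenacity, Instructor, and the Anthropic SDK), it helps to know the upper bound on requests a single logical call can trigger. The arithmetic below is a rough sketch. `worst_case_requests` is a hypothetical helper, and it assumes Instructor's `max_retries=2` means two attempts in total, which may differ between Instructor versions.

```python
def worst_case_requests(outer_attempts: int, validation_attempts: int, sdk_retries: int) -> int:
    """Upper bound on HTTP requests for one logical call when every layer keeps failing."""
    requests_per_validation_attempt = 1 + sdk_retries  # first try plus SDK retries
    return outer_attempts * validation_attempts * requests_per_validation_attempt


# Values taken from the configuration above:
# tenacity stop_after_attempt(3), Instructor max_retries=2, SDK max_retries=2
print(worst_case_requests(outer_attempts=3, validation_attempts=2, sdk_retries=2))  # 18
```

The bound is rarely reached, since the layers react to different failures (network errors versus validation errors), but it is a useful number to have when setting timeouts and cost alerts.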

When to Use Which Model

python
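# Constant names are illustrative; model IDs are the ones used in this article.

# Fast extraction tasks: use Haiku (cheap and fast)
# e.g. PodErrorExtract, ResourceRecommendation, SimpleClassification
EXTRACTION_MODEL = "claude-haiku-4-5-20251001"
 
# Complex reasoning tasks: use Sonnet
# e.g. IncidentAnalysis, DeploymentChecklist, RootCauseAnalysis
REASONING_MODEL = "claude-sonnet-4-6"
 
# Avoid Opus for structured extraction: overkill, slow, expensive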

Instructor vs Raw Tool Calling

Without Instructor, extracting structured data from Claude looks like:

python
# Without Instructor: fragile, lots of boilerplate
from anthropic import Anthropic

raw_client = Anthropic()
response = raw_client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,  # required by the Messages API
    tools=[{"name": "extract", "input_schema": {"type": "object", "properties": {...}}}],
    messages=[...]
)
# Parse tool call manually
tool_use = next(b for b in response.content if b.type == "tool_use")
data = tool_use.input
# No validation — what if a required field is missing?
# No retry — what if the schema wasn't followed?
# Manual Pydantic conversion...

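With Instructor, all of that goes away. You define your schema once as a Pydantic model, and every successful call returns a validated, typed object; if the output still fails validation after the configured retries, the call raises instead of handing you bad data. The retry logic, the JSON parsing, and the tool call handling are all done for you.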

For production LLM applications that need reliable structured output, Instructor is the fastest path from "LLM returns text" to "application gets typed data."
