LLM Output Validation with Instructor + Pydantic in Production

LLMs return unpredictable text. Instructor + Pydantic turns that into validated, typed Python objects — automatically retrying when the model returns garbage. Here's how to use it in production.

The hardest part of building LLM-powered applications isn't the LLM call — it's what happens to the output.

You ask the model to return JSON. It returns JSON wrapped in a markdown code block. Or it returns JSON with a comment explaining what the JSON means. Or it returns valid JSON but with a field spelled differently than you expected. Or it returns null for a required field.

Your JSON parser crashes. Your application breaks. Your users see errors.
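
To make those failure modes concrete, here is a small sketch using only the standard library. The sample strings and the `naive_parse` helper are made up for illustration; they stand in for what a model might return and for the parsing code many applications start with.

```python
import json

FENCE = "`" * 3  # markdown code fence, built here to keep the sample readable

# Hypothetical model outputs, one per failure mode described above
raw_outputs = {
    "fenced": f'{FENCE}json\n{{"severity": "high"}}\n{FENCE}',
    "commented": 'Here is the summary:\n{"severity": "high"}',
    "renamed_field": '{"sev": "high"}',
    "null_required": '{"severity": null}',
}

def naive_parse(text: str) -> str:
    """Fragile parsing: assumes clean JSON with a string 'severity' field."""
    data = json.loads(text)
    return data["severity"].upper()

# Record which exception each output triggers
failures = {}
for name, text in raw_outputs.items():
    try:
        naive_parse(text)
    except Exception as exc:
        failures[name] = type(exc).__name__

print(failures)
```

Each input fails in a different place: two at parse time, one at key lookup, and one only when the value is used.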

Instructor is the library that fixes this. It wraps your LLM client and enforces Pydantic schema validation on the output, with automatic retry on failure.

Installation

bash

pip install instructor pydantic anthropic

Basic Usage — Guaranteed Typed Output

python

import instructor
from anthropic import Anthropic
from pydantic import BaseModel, Field
 
# Patch the Anthropic client with Instructor
client = instructor.from_anthropic(Anthropic())
 
class IncidentSummary(BaseModel):
    severity: str = Field(description="One of: critical, high, medium, low")
    affected_service: str = Field(description="The primary service affected")
    root_cause: str = Field(description="One sentence root cause")
    immediate_action: str = Field(description="First thing the on-call engineer should do")
    estimated_impact: str = Field(description="User-facing impact description")
 
# Returns a validated IncidentSummary, or raises if validation still fails after retries
summary = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": """Analyze this incident and return a structured summary:
        
        Logs show: checkout service returning 503, payment service connection refused,
        database showing 95% connection pool usage. Started 14 minutes ago.
        Affecting ~12% of checkout attempts."""
    }],
    response_model=IncidentSummary,
)
 
print(summary.severity)          # e.g. "critical"
print(summary.affected_service)  # e.g. "checkout-service"
print(summary.root_cause)        # e.g. "Database connection pool exhausted..."
print(summary.immediate_action)  # e.g. "Restart payment-service pods..."

No JSON parsing. No .get() with fallbacks. Just a clean Python object with type safety.

How Instructor Works

Under the hood, Instructor:

1. Converts your Pydantic model to a JSON schema
2. Passes it to the LLM as a tool definition (or in the system prompt, depending on the mode)
3. Gets the LLM response
4. Validates the response against your schema
5. If validation fails, automatically retries with the error message included, so the model can fix it
6. Returns a validated Pydantic object
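
The first two steps can be sketched with Pydantic alone. The schema call below is Pydantic's own `model_json_schema()`; the `tool_definition` dict is only an assumption about the general shape of a tool payload, not the exact structure Instructor builds internally.

```python
from pydantic import BaseModel, Field

class IncidentSummary(BaseModel):
    severity: str = Field(description="One of: critical, high, medium, low")
    affected_service: str = Field(description="The primary service affected")

# Step 1: Pydantic model -> JSON schema
schema = IncidentSummary.model_json_schema()

# Step 2 (illustrative shape only): wrap the schema as a tool definition
tool_definition = {
    "name": schema["title"],
    "description": "Return the incident summary as structured data",
    "input_schema": schema,
}

print(schema["required"])
print(schema["properties"]["severity"]["description"])
```

The field descriptions travel with the schema, which is why it pays to write them as instructions to the model.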

The retry mechanism is the key part. Instead of crashing on bad output, it tells the model: "You returned X but I expected Y, please fix it." Models fix their output correctly 95%+ of the time on the first retry.
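
A simplified sketch of that validate-and-retry loop is shown below. This is not Instructor's implementation: `fake_llm` is a hypothetical stand-in for a model call (wrong on the first attempt, correct once it sees feedback), and `create_validated` is an illustrative helper name.

```python
import json
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    severity: str
    affected_service: str

def fake_llm(messages: list[dict]) -> str:
    """Hypothetical model: omits a required field until it sees validation feedback."""
    saw_feedback = any("validation error" in m["content"].lower() for m in messages)
    if saw_feedback:
        return json.dumps({"severity": "critical", "affected_service": "checkout"})
    return json.dumps({"severity": "critical"})  # missing affected_service

def create_validated(model_cls, messages, llm, max_attempts=3):
    """Call the model, validate, and feed validation errors back on failure."""
    messages = list(messages)  # do not mutate the caller's list
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw = llm(messages)
        try:
            return model_cls.model_validate_json(raw), attempt
        except ValidationError as exc:
            last_error = exc
            # Show the model its own output plus the exact validation error
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": f"Validation error, please fix:\n{exc}"})
    raise last_error

ticket, attempts_used = create_validated(
    Ticket, [{"role": "user", "content": "Summarize the incident"}], fake_llm
)
print(ticket, attempts_used)
```

The important design choice is that the error text goes back into the conversation, so the second attempt is a correction rather than a blind re-roll.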

Production Patterns

Pattern 1: Nested Schemas for Complex Data

python

from enum import Enum

from pydantic import BaseModel, Field

# Reuses `client` from Basic Usage; `incident_data` is your raw incident text
 
class Severity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
 
class AffectedComponent(BaseModel):
    name: str
    impact: str
    team_owner: str
 
class KubernetesAction(BaseModel):
    command: str = Field(description="Exact kubectl command to run")
    explanation: str = Field(description="Why this command helps")
    safe_to_run: bool = Field(description="Whether this is safe to run without approval")
 
class IncidentAnalysis(BaseModel):
    severity: Severity
    title: str = Field(max_length=80, description="Short incident title for Slack")
    affected_components: list[AffectedComponent]
    root_cause_hypothesis: str
    recommended_actions: list[KubernetesAction]
    escalate_immediately: bool
    estimated_resolution_time_minutes: int
 
analysis = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": f"Analyze this incident: {incident_data}"}],
    response_model=IncidentAnalysis,
)
 
# Now you have fully typed, validated data
for action in analysis.recommended_actions:
    if action.safe_to_run:
        print(f"Running: {action.command}")
        # safe_to_run is the model's own judgment, so check the command against
        # your own allowlist before executing anything
        # os.system(action.command)
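
To see what nested validation buys you, the sketch below validates a hand-written payload against a trimmed-down copy of the schema. The `bad_payload` dict is invented for illustration; it stands in for a model response that got the enum wrong and left fields out of a nested item.

```python
from enum import Enum
from pydantic import BaseModel, Field, ValidationError

class Severity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class KubernetesAction(BaseModel):
    command: str
    explanation: str
    safe_to_run: bool

class IncidentAnalysis(BaseModel):
    severity: Severity
    title: str = Field(max_length=80)
    recommended_actions: list[KubernetesAction]

bad_payload = {
    "severity": "urgent",  # not a member of Severity
    "title": "Checkout 503s",
    "recommended_actions": [
        {"command": "kubectl get pods -n checkout",
         "explanation": "Inspect pod state", "safe_to_run": True},
        {"command": "kubectl rollout restart deploy/payment"},  # two fields missing
    ],
}

error_locations = []
try:
    IncidentAnalysis.model_validate(bad_payload)
except ValidationError as exc:
    # Each error carries a path into the nested structure
    error_locations = [err["loc"] for err in exc.errors()]

print(error_locations)
```

The error paths point at the exact nested item and field, which is the detail a retry prompt needs in order to be useful.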

Pattern 2: Partial Extraction — Extract Structure from Unstructured Text

python

from typing import Optional
 
class PodErrorExtract(BaseModel):
    pod_name: Optional[str] = None
    namespace: Optional[str] = None
    error_type: Optional[str] = None
    error_message: str
    suggested_fix: str
 
# Feed raw log output — Instructor extracts the structure
result = client.messages.create(
    model="claude-haiku-4-5-20251001",  # Use fast model for extraction
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": f"""Extract error information from these logs:
        
        {raw_log_output}"""
    }],
    response_model=PodErrorExtract,
)
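
The split between `Optional` and required fields is what makes this pattern tolerant of messy logs. A minimal sketch with hand-written payloads (invented for illustration) shows the behavior: absent optional fields become `None`, while absent required fields are rejected.

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class PodErrorExtract(BaseModel):
    pod_name: Optional[str] = None
    namespace: Optional[str] = None
    error_type: Optional[str] = None
    error_message: str
    suggested_fix: str

# The log never mentioned a pod or namespace: still valid
extract = PodErrorExtract.model_validate(
    {"error_message": "OOMKilled", "suggested_fix": "Raise the memory limit"}
)

# Required fields missing: rejected
try:
    PodErrorExtract.model_validate({"pod_name": "api-7d9f"})
    missing = []
except ValidationError as exc:
    missing = sorted(err["loc"][0] for err in exc.errors())

print(extract.pod_name, missing)
```

Marking a field optional tells the model it may leave the value out instead of inventing one.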

Pattern 3: Validation with Custom Validators

python

import re

from pydantic import BaseModel, Field, field_validator

class ResourceRecommendation(BaseModel):
    cpu_request: str = Field(description="CPU request in Kubernetes format, e.g. '100m' or '0.5'")
    cpu_limit: str = Field(description="CPU limit, must be >= cpu_request")
    memory_request: str = Field(description="Memory in Mi or Gi, e.g. '256Mi'")
    memory_limit: str = Field(description="Memory limit, must be >= memory_request")
    reasoning: str

    @field_validator('cpu_request', 'cpu_limit')
    @classmethod
    def validate_cpu_format(cls, v: str) -> str:
        # Accept millicores ('100m') or whole/fractional cores ('2', '0.5')
        if not re.match(r'^\d+m$|^\d+(\.\d+)?$', v):
            raise ValueError(f"Invalid CPU format: {v}. Use '100m' or '0.5'")
        return v

    @field_validator('memory_request', 'memory_limit')
    @classmethod
    def validate_memory_format(cls, v: str) -> str:
        # Accept an integer followed by a binary or decimal unit suffix
        if not re.match(r'^\d+(Mi|Gi|Ki|M|G|K)$', v):
            raise ValueError(f"Invalid memory format: {v}. Use '256Mi' or '1Gi'")
        return v
 
# If Claude returns "0.1 cores" instead of "100m", the validator rejects it
# and Instructor re-prompts Claude with the validation error
recommendation = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": f"Recommend Kubernetes resources for: {workload_metrics}"}],
    response_model=ResourceRecommendation,
    max_retries=3,  # retry up to 3 times if validation fails
)
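
The field descriptions above say the limit must be at least the request, but the format validators do not enforce that. One way to enforce it is a cross-field check with Pydantic's `model_validator`. The sketch below is an assumption about how you might do this, not part of the original schema: `CpuBounds` and `cpu_to_millicores` are illustrative names, and the helper assumes the values already match the CPU format.

```python
from pydantic import BaseModel, ValidationError, model_validator

def cpu_to_millicores(value: str) -> int:
    """Convert '100m' or '0.5' to integer millicores (assumes a valid format)."""
    if value.endswith("m"):
        return int(value[:-1])
    return round(float(value) * 1000)

class CpuBounds(BaseModel):
    cpu_request: str
    cpu_limit: str

    @model_validator(mode="after")
    def limit_not_below_request(self):
        # Runs after the individual fields have been validated
        if cpu_to_millicores(self.cpu_limit) < cpu_to_millicores(self.cpu_request):
            raise ValueError(
                f"cpu_limit {self.cpu_limit} is below cpu_request {self.cpu_request}"
            )
        return self

accepted = CpuBounds(cpu_request="100m", cpu_limit="0.5")

try:
    CpuBounds(cpu_request="2", cpu_limit="500m")
    rejected = False
except ValidationError:
    rejected = True

print(accepted, rejected)
```

Because the failure surfaces as a `ValidationError`, it can feed the same retry path as the format checks.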

Pattern 4: Streaming Partial Objects

For long responses where you want to show progress:

python

import instructor
from anthropic import Anthropic
from pydantic import BaseModel
 
client = instructor.from_anthropic(Anthropic(), mode=instructor.Mode.ANTHROPIC_TOOLS)
 
class DeploymentChecklist(BaseModel):
    pre_deploy_checks: list[str]
    deployment_steps: list[str]
    post_deploy_validation: list[str]
    rollback_procedure: list[str]
 
# Stream the object as it's being generated
for partial_checklist in client.messages.create_partial(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Generate deployment checklist for a Node.js API on Kubernetes"}],
    response_model=DeploymentChecklist,
):
    # partial_checklist is populated as fields complete
    if partial_checklist.pre_deploy_checks:
        print(f"Pre-deploy: {len(partial_checklist.pre_deploy_checks)} checks ready")

Production Configuration

python

import os

import instructor
from anthropic import Anthropic
from tenacity import retry, stop_after_attempt, wait_exponential
 
# Production-ready client setup
def create_instructor_client():
    return instructor.from_anthropic(
        Anthropic(
            api_key=os.getenv("ANTHROPIC_API_KEY"),
            timeout=30.0,
            max_retries=2,
        ),
        mode=instructor.Mode.ANTHROPIC_TOOLS,  # use tool calling, not JSON mode
    )
 
client = create_instructor_client()
 
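# Outer retry for transient failures such as network errors. With no retry= filter,
# tenacity retries on any exception, including exhausted validation retries.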
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10)
)
def extract_with_retry(text: str, model_class):
    return client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": text}],
        response_model=model_class,
        max_retries=2,  # Instructor retries for validation errors
    )
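
One thing worth checking with a setup like this is how the retry layers multiply. The arithmetic below is a rough upper bound under stated assumptions: the outer tenacity wrapper makes up to 3 attempts, the SDK sends up to 3 HTTP requests per call (1 initial plus `max_retries=2`), and Instructor makes up to 2 validation attempts. Whether Instructor counts `max_retries` as retries or as total attempts is an assumption here, so verify it against the version you run.

```python
def worst_case_requests(outer_attempts: int, validation_attempts: int, http_attempts: int) -> int:
    """Upper bound on HTTP requests if every layer exhausts its attempts."""
    return outer_attempts * validation_attempts * http_attempts

REQUEST_TIMEOUT_SECONDS = 30.0  # matches timeout=30.0 in the client setup

total_requests = worst_case_requests(
    outer_attempts=3,       # tenacity stop_after_attempt(3)
    validation_attempts=2,  # assumed total attempts for Instructor's max_retries=2
    http_attempts=3,        # SDK: 1 initial request + 2 retries
)
# Ignores backoff waits, so the real worst case is somewhat longer
worst_case_seconds = total_requests * REQUEST_TIMEOUT_SECONDS

print(total_requests, worst_case_seconds)
```

If that bound is larger than your caller is willing to wait, reduce the attempts at one layer rather than all of them.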

When to Use Which Model

python

# Fast extraction tasks: use Haiku (cheap + fast)
#   PodErrorExtract, ResourceRecommendation, SimpleClassification

# Complex reasoning tasks: use Sonnet
#   IncidentAnalysis, DeploymentChecklist, RootCauseAnalysis

# Never use Opus for structured extraction: overkill, slow, expensive
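
If you want that choice in code rather than in a comment, one option is a small lookup keyed on the schema class name. This is a sketch: `pick_model` and `FAST_SCHEMAS` are illustrative names, the placeholder classes stand in for the real Pydantic models, and defaulting unknown schemas to Sonnet is an assumption.

```python
FAST_MODEL = "claude-haiku-4-5-20251001"
REASONING_MODEL = "claude-sonnet-4-6"

# Schemas that are simple extraction tasks
FAST_SCHEMAS = {"PodErrorExtract", "ResourceRecommendation", "SimpleClassification"}

def pick_model(response_model: type) -> str:
    """Route simple extraction schemas to the fast model, everything else to Sonnet."""
    if response_model.__name__ in FAST_SCHEMAS:
        return FAST_MODEL
    return REASONING_MODEL

# Placeholder classes standing in for the real Pydantic models
class PodErrorExtract: ...
class IncidentAnalysis: ...

print(pick_model(PodErrorExtract), pick_model(IncidentAnalysis))
```

The result can be passed straight to the `model=` argument alongside `response_model=`.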

Instructor vs Raw Tool Calling

Without Instructor, extracting structured data from Claude looks like:

python

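from anthropic import Anthropic

# Without Instructor: fragile, lots of boilerplate
anthropic_client = Anthropic()
response = anthropic_client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,  # required by the Messages API
    tools=[{"name": "extract", "input_schema": {"type": "object", "properties": {...}}}],
    tool_choice={"type": "tool", "name": "extract"},  # force the tool call
    messages=[...]
)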
# Parse tool call manually
tool_use = next(b for b in response.content if b.type == "tool_use")
data = tool_use.input
# No validation — what if a required field is missing?
# No retry — what if the schema wasn't followed?
# Manual Pydantic conversion...

With Instructor, all of that goes away. You define your schema once as a Pydantic model, and every call returns a validated, typed object. The retry logic, the JSON parsing, the tool call handling — all handled automatically.

For production LLM applications that need reliable structured output, Instructor is the fastest path from "LLM returns text" to "application gets typed data."

