
LLM Cost Attribution and Chargeback for Multi-Tenant Teams

One Anthropic API key, ten teams using it, and one bill at the end of the month with no idea who spent what. Here's how to attribute LLM costs per team, per feature, and per customer so finance can actually charge them back.

DevOpsBoys · Jun 17, 2026 · 4 min read

The same FinOps problem that hit cloud infrastructure a decade ago is now hitting LLM spend: a shared API key, growing usage across multiple teams and features, and a monthly bill with no breakdown. Cloud cost allocation has mature tooling. LLM cost attribution usually has to be built by hand, because with a single shared key the provider's bill cannot tell one team's or feature's traffic from another's.

The Core Idea: Tag Every Request at Call Time

python
import time
from dataclasses import dataclass
from anthropic import Anthropic
 
client = Anthropic()
 
@dataclass
class UsageContext:
    team: str
    feature: str
    customer_id: str | None = None
    environment: str = "production"
 
def tracked_call(messages: list, model: str, max_tokens: int, context: UsageContext):
    """Call the API, log a tagged usage record, and return the SDK response object."""
    start = time.time()
    
    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=messages
    )
    
    usage_record = {
        "timestamp": time.time(),
        "team": context.team,
        "feature": context.feature,
        "customer_id": context.customer_id,
        "environment": context.environment,
        "model": model,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "latency_seconds": time.time() - start,
    }
    
    log_usage(usage_record)
    return response

The discipline here is simple but easy to skip under deadline pressure: every single LLM call anywhere in your systems goes through a wrapper that requires a UsageContext. No bare API calls. This is the entire foundation — if calls bypass this, attribution has gaps.
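
Requiring a context only helps if the tags in it are usable. One possible hardening step, sketched here as an assumption rather than as part of the wrapper above, is to reject blank or misspelled tags when the context is built. `StrictUsageContext` and `KNOWN_TEAMS` are hypothetical names, and the team list is illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allow-list; in practice this might come from config or a service catalog.
KNOWN_TEAMS = {"support-team", "search-team", "platform-team", "growth-team"}

@dataclass(frozen=True)
class StrictUsageContext:
    team: str
    feature: str
    customer_id: Optional[str] = None
    environment: str = "production"

    def __post_init__(self):
        # Fail fast so a call can never be logged with an unknown or empty tag.
        if self.team not in KNOWN_TEAMS:
            raise ValueError(f"unknown team: {self.team!r}")
        if not self.feature.strip():
            raise ValueError("feature must be a non-empty string")
```

A typo such as `team="suport-team"` then raises at the call site instead of creating a phantom cost center in the monthly report.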

Step 1: Compute Actual Cost Per Call

python
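# Pricing changes, so keep this centralized and easy to update.
# Treat the figures below as illustrative and verify them against the provider's current price list.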
MODEL_PRICING = {
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},   # per million tokens
    "claude-haiku-4-5-20251001": {"input": 0.80, "output": 4.00},
    "claude-opus-4-8": {"input": 15.00, "output": 75.00},
}
 
def calculate_cost(record: dict) -> float:
    pricing = MODEL_PRICING[record["model"]]
    input_cost = (record["input_tokens"] / 1_000_000) * pricing["input"]
    output_cost = (record["output_tokens"] / 1_000_000) * pricing["output"]
    return round(input_cost + output_cost, 6)
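
To make the arithmetic concrete, here is a small self-contained sketch of the same per-million-token calculation for a single call. `cost_usd` and `PRICING` are illustrative names, the price entry is copied from the table above, and the explicit error for unpriced models is an added assumption.

```python
# Illustrative prices in USD per million tokens; not authoritative.
PRICING = {"claude-sonnet-4-6": {"input": 3.00, "output": 15.00}}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    if model not in PRICING:
        # Surface missing price entries instead of failing with a bare KeyError.
        raise KeyError(f"no pricing configured for model {model!r}")
    p = PRICING[model]
    total = (input_tokens / 1_000_000) * p["input"] + (output_tokens / 1_000_000) * p["output"]
    return round(total, 6)

# 1,200 input tokens:  1,200 / 1,000,000 * 3.00  = 0.0036
#   350 output tokens:   350 / 1,000,000 * 15.00 = 0.00525
example = cost_usd("claude-sonnet-4-6", 1_200, 350)  # 0.00885
```

Individual calls cost fractions of a cent, which is why the record keeps six decimal places and rounding to cents is left to the report.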

Step 2: Store Usage Records for Aggregation

python
import sqlite3  # use Postgres/Timescale in production; this is the simplest version
 
def log_usage(record: dict):
    cost = calculate_cost(record)
    
    conn = sqlite3.connect("llm_usage.db")
    conn.execute("""
        INSERT INTO llm_usage 
        (timestamp, team, feature, customer_id, environment, model, 
         input_tokens, output_tokens, cost_usd)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        record["timestamp"], record["team"], record["feature"], 
        record["customer_id"], record["environment"], record["model"],
        record["input_tokens"], record["output_tokens"], cost
    ))
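    conn.commit()
    conn.close()  # a new connection is opened per call, so release it

The insert assumes this table already exists:

sql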
CREATE TABLE llm_usage (
    timestamp REAL,
    team TEXT,
    feature TEXT,
    customer_id TEXT,
    environment TEXT,
    model TEXT,
    input_tokens INTEGER,
    output_tokens INTEGER,
    cost_usd REAL
);
 
CREATE INDEX idx_usage_team_time ON llm_usage(team, timestamp);
CREATE INDEX idx_usage_customer ON llm_usage(customer_id);
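
Since `log_usage` fails if the table is missing, one option is to create the schema idempotently at startup. This is a minimal sketch under that assumption. `init_db` and `SCHEMA` are hypothetical names, and the `IF NOT EXISTS` clauses are an addition to the DDL above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS llm_usage (
    timestamp REAL,
    team TEXT,
    feature TEXT,
    customer_id TEXT,
    environment TEXT,
    model TEXT,
    input_tokens INTEGER,
    output_tokens INTEGER,
    cost_usd REAL
);
CREATE INDEX IF NOT EXISTS idx_usage_team_time ON llm_usage(team, timestamp);
CREATE INDEX IF NOT EXISTS idx_usage_customer ON llm_usage(customer_id);
"""

def init_db(path: str = "llm_usage.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # IF NOT EXISTS makes this safe to run on every process start.
    conn.executescript(SCHEMA)
    return conn
```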

Step 3: Generate Chargeback Reports

python
def monthly_chargeback_report(year: int, month: int) -> list[dict]:
    conn = sqlite3.connect("llm_usage.db")
    
    rows = conn.execute("""
        SELECT team, feature, 
               SUM(cost_usd) as total_cost,
               SUM(input_tokens + output_tokens) as total_tokens,
               COUNT(*) as request_count
        FROM llm_usage
        WHERE strftime('%Y', datetime(timestamp, 'unixepoch')) = ?
          AND strftime('%m', datetime(timestamp, 'unixepoch')) = ?
        GROUP BY team, feature
        ORDER BY total_cost DESC
    """, (str(year), f"{month:02d}")).fetchall()
    
    return [
        {"team": r[0], "feature": r[1], "cost_usd": round(r[2], 2), 
         "tokens": r[3], "requests": r[4]}
        for r in rows
    ]

Example output:

team           feature             cost_usd   tokens       requests
support-team   ticket-triage       $847.32    28,200,000   142,000
search-team    semantic-search     $412.18    13,700,000    89,000
platform-team  incident-summaries  $156.90     5,200,000    12,400
growth-team    content-generation  $98.45      3,280,000     4,100

This is the report finance actually needs — not "total LLM spend was $1,514.85" but a breakdown they can charge back to cost centers or use to evaluate whether a feature's LLM cost is justified by its value.
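
The report query formats every row's timestamp to pick out the month. An alternative, sketched here as an assumption and not the article's method, is to compute the month's epoch bounds once and filter with a plain range. `month_bounds_utc` is a hypothetical helper. It uses UTC, which matches how SQLite's `'unixepoch'` modifier interprets the stored values.

```python
from datetime import datetime, timezone

def month_bounds_utc(year: int, month: int) -> tuple[float, float]:
    """Return (start, end) in epoch seconds for the given month; end is exclusive."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    if month == 12:
        # December rolls over to January of the following year.
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return start.timestamp(), end.timestamp()
```

The two values can then be bound to `WHERE timestamp >= ? AND timestamp < ?` in place of the two `strftime` conditions.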

Step 4: Per-Customer Attribution (for Usage-Based Pricing)

If you're billing customers based on their actual LLM usage (common in B2B AI features), the same data answers that directly:

python
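from datetime import datetime, timezone

def customer_usage_for_billing(customer_id: str, start_date: str, end_date: str) -> dict:
    # Dates are ISO strings (YYYY-MM-DD) read as UTC; end_date is exclusive.
    # The timestamp column holds epoch seconds, so convert before comparing.
    start_ts = datetime.fromisoformat(start_date).replace(tzinfo=timezone.utc).timestamp()
    end_ts = datetime.fromisoformat(end_date).replace(tzinfo=timezone.utc).timestamp()

    conn = sqlite3.connect("llm_usage.db")
    result = conn.execute("""
        SELECT SUM(cost_usd), SUM(input_tokens + output_tokens), COUNT(*)
        FROM llm_usage
        WHERE customer_id = ? AND timestamp >= ? AND timestamp < ?
    """, (customer_id, start_ts, end_ts)).fetchone()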
    
    return {
        "customer_id": customer_id,
        "raw_llm_cost": round(result[0] or 0, 2),
        "total_tokens": result[1] or 0,
        "request_count": result[2] or 0
    }

Most teams add a margin multiplier on top of raw LLM cost when passing this to customer billing — the raw cost covers the API bill, not your infrastructure, support, or product development.
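
As a rough illustration of that markup step, the sketch below applies a multiplier and rounds to cents. `billable_amount` is a hypothetical helper, and the 1.30 multiplier is an arbitrary example value, not a recommendation. `Decimal` is used so the invoiced figure rounds predictably.

```python
from decimal import Decimal, ROUND_HALF_UP

def billable_amount(raw_llm_cost: float, margin_multiplier: str = "1.30") -> Decimal:
    """Apply a markup to the raw LLM cost and round to whole cents."""
    raw = Decimal(str(raw_llm_cost))
    marked_up = raw * Decimal(margin_multiplier)
    return marked_up.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# 84.73 * 1.30 = 110.149, which rounds to 110.15
amount = billable_amount(84.73)
```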

Step 5: Catching Runaway Spend Before the Bill Arrives

python
def check_team_budget_alerts(daily_budgets: dict[str, float]):
    """Run hourly — catch a team blowing through their daily budget before month-end surprise."""
    conn = sqlite3.connect("llm_usage.db")
    today_start = get_today_start_timestamp()  # helper not shown: epoch seconds at the start of today
    
    for team, budget in daily_budgets.items():
        spent_today = conn.execute("""
            SELECT SUM(cost_usd) FROM llm_usage 
            WHERE team = ? AND timestamp >= ?
        """, (team, today_start)).fetchone()[0] or 0
        
        if spent_today > budget * 0.8:
            send_slack_alert(
                f"⚠️ {team} has spent ${spent_today:.2f} of ${budget:.2f} "
                f"daily LLM budget ({spent_today/budget:.0%})"
            )
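
The alert check calls `get_today_start_timestamp()` without defining it. One plausible definition, assuming budgets reset at midnight UTC (the article does not say which timezone applies), is sketched below. The optional `now` parameter is an addition to make the function testable. `send_slack_alert` is left to whatever notification client you already use.

```python
from datetime import datetime, timezone
from typing import Optional

def get_today_start_timestamp(now: Optional[datetime] = None) -> float:
    """Epoch seconds for 00:00 UTC of the current day (or of `now`, if given)."""
    now = now or datetime.now(timezone.utc)
    midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return midnight.timestamp()
```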

The pattern that matters most: attribution has to be designed in from the first API call, not bolted on after six months of unattributed spend. Retrofitting usage tags onto historical data is usually impossible — you can't tag a request after the fact if the context wasn't captured at call time.

Reduce the underlying spend too: LLM Cost Optimization — Caching and Batching in Production
