LLM Cost Attribution and Chargeback for Multi-Tenant Teams

One Anthropic API key, ten teams using it, and one bill at the end of the month with no idea who spent what. Here's how to attribute LLM costs per team, per feature, and per customer so finance can actually charge them back.

The same FinOps problem that hit cloud infrastructure a decade ago is now hitting LLM spend — a shared API key, growing usage across multiple teams and features, and a monthly bill with no breakdown. Unlike cloud cost allocation (which has mature tooling), LLM cost attribution usually has to be built by hand, because the provider's billing dashboard only shows the total.

The Core Idea: Tag Every Request at Call Time

python

import time
from dataclasses import dataclass
from anthropic import Anthropic
 
client = Anthropic()
 
@dataclass
class UsageContext:
    team: str
    feature: str
    customer_id: str | None = None
    environment: str = "production"
 
def tracked_call(messages: list, model: str, max_tokens: int, context: UsageContext):
    """Call the API, log one usage record, and return the SDK response object unchanged."""
    start = time.time()
    
    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=messages
    )
    
    usage_record = {
        "timestamp": time.time(),
        "team": context.team,
        "feature": context.feature,
        "customer_id": context.customer_id,
        "environment": context.environment,
        "model": model,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "latency_seconds": time.time() - start,
    }
    
    log_usage(usage_record)
    return response

The discipline here is simple but easy to skip under deadline pressure: every single LLM call anywhere in your systems goes through a wrapper that requires a UsageContext. No bare API calls. This is the entire foundation — if calls bypass this, attribution has gaps.
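
One way to keep that discipline from eroding is an automated check that flags direct SDK calls outside the wrapper module. The sketch below is an assumption about how such a check might look, not part of the setup described here: `find_bare_calls` is a hypothetical helper, and it only recognizes the literal `<something>.messages.create(...)` call shape, so an aliased method would slip past it.

```python
import ast

def find_bare_calls(source: str) -> list[int]:
    """Return line numbers of direct `*.messages.create(...)` calls in source.

    Hypothetical helper: run it over every module except the one that
    defines tracked_call, and fail the build if it returns anything.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the attribute chain <anything>.messages.create
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "create"
            and isinstance(func.value, ast.Attribute)
            and func.value.attr == "messages"
        ):
            hits.append(node.lineno)
    return sorted(hits)

# Demo input: one bare call and one call that goes through the wrapper
sample = '''
def summarize(client, text):
    return client.messages.create(model="m", max_tokens=10, messages=[])

def safe(text, ctx):
    return tracked_call([], "m", 10, ctx)
'''
bare_call_lines = find_bare_calls(sample)
```

Here `bare_call_lines` contains only the line of the direct call in `summarize`; the call through `tracked_call` is not reported.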

Step 1: Compute Actual Cost Per Call

python

# Pricing changes; keep this centralized and easy to update.
# Treat the rates below as illustrative and check them against the provider's current price list.
MODEL_PRICING = {
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},   # per million tokens
    "claude-haiku-4-5-20251001": {"input": 0.80, "output": 4.00},
    "claude-opus-4-8": {"input": 15.00, "output": 75.00},
}
 
def calculate_cost(record: dict) -> float:
    pricing = MODEL_PRICING[record["model"]]
    input_cost = (record["input_tokens"] / 1_000_000) * pricing["input"]
    output_cost = (record["output_tokens"] / 1_000_000) * pricing["output"]
    return round(input_cost + output_cost, 6)
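
To make the arithmetic concrete, here is a small worked example. It is a standalone sketch: the two rates are assumed values copied from the first entry of the pricing table above, and `cost_usd` is a hypothetical stand-in for `calculate_cost` so the block runs on its own.

```python
# Assumed rates in USD per million tokens (taken from the first table entry above)
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Same formula as calculate_cost, with the rates fixed for the demo."""
    input_cost = (input_tokens / 1_000_000) * INPUT_RATE
    output_cost = (output_tokens / 1_000_000) * OUTPUT_RATE
    return round(input_cost + output_cost, 6)

# One request: 1,200 input tokens and 350 output tokens
# input:  1,200 / 1,000,000 * 3.00  = 0.0036
# output:   350 / 1,000,000 * 15.00 = 0.00525
single_call = cost_usd(1_200, 350)

# The same request repeated 100,000 times in a month
monthly = round(single_call * 100_000, 2)
```

At these rates a single call costs $0.00885 and 100,000 such calls cost about $885. Note that the 350 output tokens cost more than the 1,200 input tokens, which is why output length is usually the first thing to look at when a feature's bill seems high.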

Step 2: Store Usage Records for Aggregation

python

import sqlite3  # use Postgres/Timescale in production; this is the simplest version
from contextlib import closing
 
def log_usage(record: dict):
    cost = calculate_cost(record)

    # latency_seconds is captured by the wrapper but has no column in this schema.
    # closing() releases the connection after each write; the inner "conn" context
    # commits on success and rolls back if the INSERT raises.
    with closing(sqlite3.connect("llm_usage.db")) as conn, conn:
        conn.execute("""
            INSERT INTO llm_usage 
            (timestamp, team, feature, customer_id, environment, model, 
             input_tokens, output_tokens, cost_usd)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            record["timestamp"], record["team"], record["feature"], 
            record["customer_id"], record["environment"], record["model"],
            record["input_tokens"], record["output_tokens"], cost
        ))

sql

CREATE TABLE llm_usage (
    timestamp REAL,
    team TEXT,
    feature TEXT,
    customer_id TEXT,
    environment TEXT,
    model TEXT,
    input_tokens INTEGER,
    output_tokens INTEGER,
    cost_usd REAL
);
 
CREATE INDEX idx_usage_team_time ON llm_usage(team, timestamp);
CREATE INDEX idx_usage_customer ON llm_usage(customer_id);
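
The insert in `log_usage` fails if the table has not been created yet. One possible way to handle that, sketched here as an assumption rather than the article's own setup, is to run an idempotent version of the schema at application startup. `ensure_schema` is a hypothetical helper name, and the in-memory database is only for the demo.

```python
import sqlite3

# Same columns and indexes as the schema above, made safe to run repeatedly
SCHEMA = """
CREATE TABLE IF NOT EXISTS llm_usage (
    timestamp REAL,
    team TEXT,
    feature TEXT,
    customer_id TEXT,
    environment TEXT,
    model TEXT,
    input_tokens INTEGER,
    output_tokens INTEGER,
    cost_usd REAL
);
CREATE INDEX IF NOT EXISTS idx_usage_team_time ON llm_usage(team, timestamp);
CREATE INDEX IF NOT EXISTS idx_usage_customer ON llm_usage(customer_id);
"""

def ensure_schema(conn: sqlite3.Connection) -> None:
    """Create the usage table and its indexes if they are missing."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")  # demo only; use the real database path in practice
ensure_schema(conn)
ensure_schema(conn)  # calling it again changes nothing

columns = [row[1] for row in conn.execute("PRAGMA table_info(llm_usage)")]
```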

Step 3: Generate Chargeback Reports

python

def monthly_chargeback_report(year: int, month: int) -> list[dict]:
    conn = sqlite3.connect("llm_usage.db")
    
    rows = conn.execute("""
        SELECT team, feature, 
               SUM(cost_usd) as total_cost,
               SUM(input_tokens + output_tokens) as total_tokens,
               COUNT(*) as request_count
        FROM llm_usage
        WHERE strftime('%Y', datetime(timestamp, 'unixepoch')) = ?
          AND strftime('%m', datetime(timestamp, 'unixepoch')) = ?
        GROUP BY team, feature
        ORDER BY total_cost DESC
    """, (str(year), f"{month:02d}")).fetchall()
    
    return [
        {"team": r[0], "feature": r[1], "cost_usd": round(r[2], 2), 
         "tokens": r[3], "requests": r[4]}
        for r in rows
    ]

team           feature                cost_usd   tokens      requests
support-team   ticket-triage          $847.32    28,200,000  142,000
search-team    semantic-search        $412.18    13,700,000  89,000
platform-team  incident-summaries     $156.90     5,200,000   12,400
growth-team    content-generation     $98.45      3,280,000    4,100

This is the report finance actually needs — not "total LLM spend was $1,514.85" but a breakdown they can charge back to cost centers or use to evaluate whether a feature's LLM cost is justified by its value.
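
The computed costs will rarely match the provider's invoice to the cent, so finance may prefer to split the actual invoice in proportion to each row's computed cost. The following is a minimal sketch of that idea under stated assumptions: `allocate_invoice` is a hypothetical helper, the rows are the sample figures from the report above, and the $1,600.00 invoice total is made up for illustration.

```python
def allocate_invoice(rows: list[dict], invoice_total: float) -> list[dict]:
    """Split invoice_total across rows in proportion to each row's computed cost."""
    computed_total = sum(r["cost_usd"] for r in rows)
    allocated = []
    for r in rows:
        share = r["cost_usd"] / computed_total if computed_total else 0.0
        allocated.append({**r, "share": share, "allocated_usd": share * invoice_total})
    return allocated

# Sample rows from the report above
report = [
    {"team": "support-team", "feature": "ticket-triage", "cost_usd": 847.32},
    {"team": "search-team", "feature": "semantic-search", "cost_usd": 412.18},
    {"team": "platform-team", "feature": "incident-summaries", "cost_usd": 156.90},
    {"team": "growth-team", "feature": "content-generation", "cost_usd": 98.45},
]

# Hypothetical invoice amount, slightly above the computed total
chargeback = allocate_invoice(report, invoice_total=1600.00)

for row in chargeback:
    print(f'{row["team"]:<14} {row["feature"]:<20} {row["share"]:>6.1%} ${row["allocated_usd"]:>9.2f}')
```

Amounts are left unrounded inside the function so the allocations still sum to the invoice; round only when rendering the report.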

Step 4: Per-Customer Attribution (for Usage-Based Pricing)

If you're billing customers based on their actual LLM usage (common in B2B AI features), the same data answers that directly:

python

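from datetime import datetime, timezone

def customer_usage_for_billing(customer_id: str, start_date: str, end_date: str) -> dict:
    """Sum usage for one customer. Dates are ISO strings such as "2026-01-01" (UTC);
    start_date is inclusive and end_date is exclusive."""
    # timestamp is stored as Unix epoch seconds, so convert the dates before
    # comparing; matching the REAL column against date strings returns no rows.
    start_ts = datetime.fromisoformat(start_date).replace(tzinfo=timezone.utc).timestamp()
    end_ts = datetime.fromisoformat(end_date).replace(tzinfo=timezone.utc).timestamp()

    conn = sqlite3.connect("llm_usage.db")
    
    result = conn.execute("""
        SELECT SUM(cost_usd), SUM(input_tokens + output_tokens), COUNT(*)
        FROM llm_usage
        WHERE customer_id = ? AND timestamp >= ? AND timestamp < ?
    """, (customer_id, start_ts, end_ts)).fetchone()
    
    return {
        "customer_id": customer_id,
        "raw_llm_cost": round(result[0] or 0, 2),
        "total_tokens": result[1] or 0,
        "request_count": result[2] or 0
    }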

Most teams add a margin multiplier on top of raw LLM cost when passing this to customer billing — the raw cost covers the API bill, not your infrastructure, support, or product development.
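
A margin multiplier can be applied in a few lines. This is a sketch only: `billable_amount` is a hypothetical helper, the 1.30 default is an arbitrary example value rather than a recommendation, and `Decimal` is used so the invoiced figure rounds to whole cents predictably.

```python
from decimal import Decimal, ROUND_HALF_UP

def billable_amount(raw_llm_cost: float, margin_multiplier: str = "1.30") -> Decimal:
    """Apply a margin to the raw LLM cost and round to whole cents."""
    # Go through str() so the float's binary representation does not leak into the invoice
    raw = Decimal(str(raw_llm_cost))
    marked_up = raw * Decimal(margin_multiplier)
    return marked_up.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Example: a customer whose raw_llm_cost for the period was $42.17
invoice_line = billable_amount(42.17)
```

With the example multiplier, a raw cost of $42.17 becomes a billable $54.82.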

Step 5: Catching Runaway Spend Before the Bill Arrives

python

def check_team_budget_alerts(daily_budgets: dict[str, float]):
    """Run hourly — catch a team blowing through their daily budget before month-end surprise."""
    conn = sqlite3.connect("llm_usage.db")
    today_start = get_today_start_timestamp()
    
    for team, budget in daily_budgets.items():
        spent_today = conn.execute("""
            SELECT SUM(cost_usd) FROM llm_usage 
            WHERE team = ? AND timestamp >= ?
        """, (team, today_start)).fetchone()[0] or 0
        
        # Skip teams with a zero budget so the percentage below never divides by zero
        if budget > 0 and spent_today > budget * 0.8:
            send_slack_alert(
                f"⚠️ {team} has spent ${spent_today:.2f} of ${budget:.2f} "
                f"daily LLM budget ({spent_today/budget:.0%})"
            )
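
The alert check calls `get_today_start_timestamp` and `send_slack_alert` without defining them. Below is one possible sketch of the first, assuming "today" means the current UTC day, which matches the epoch seconds stored in `llm_usage`. It also adds a hypothetical `should_alert` guard so an hourly run does not repeat the same alert all afternoon. `send_slack_alert` is left out because it depends on your own Slack integration.

```python
import time
from datetime import datetime, timezone

def get_today_start_timestamp() -> float:
    """Unix timestamp for 00:00 UTC today."""
    now = datetime.now(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight.timestamp()

# (team, UTC date) pairs that have already triggered an alert.
# Assumption: this lives in a long-running process. If the check runs as a
# fresh cron job each hour, keep this state in a table instead.
_alerted: set[tuple[str, str]] = set()

def should_alert(team: str, spent: float, budget: float, threshold: float = 0.8) -> bool:
    """True only the first time a team crosses the threshold on a given UTC day."""
    if budget <= 0 or spent <= budget * threshold:
        return False
    key = (team, datetime.now(timezone.utc).date().isoformat())
    if key in _alerted:
        return False
    _alerted.add(key)
    return True

today_start = get_today_start_timestamp()
first = should_alert("support-team", spent=85.0, budget=100.0)
repeat = should_alert("support-team", spent=90.0, budget=100.0)
```

In this run `first` is `True` and `repeat` is `False`, so the second hourly pass stays quiet even though spend has kept climbing.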

The pattern that matters most: attribution has to be designed in from the first API call, not bolted on after six months of unattributed spend. Retrofitting usage tags onto historical data is usually impossible — you can't tag a request after the fact if the context wasn't captured at call time.

Reduce the underlying spend too: LLM Cost Optimization — Caching and Batching in Production
