LLM Cost Attribution and Chargeback for Multi-Tenant Teams
One Anthropic API key, ten teams using it, and one bill at the end of the month with no idea who spent what. Here's how to attribute LLM costs per team, per feature, and per customer so finance can actually charge them back.
The same FinOps problem that hit cloud infrastructure a decade ago is now hitting LLM spend: a shared API key, growing usage across multiple teams and features, and a monthly bill with no breakdown. Unlike cloud cost allocation, which has mature tooling, LLM cost attribution usually has to be built by hand, because with one shared key the provider's billing dashboard cannot tell your teams, features, or customers apart.
The Core Idea: Tag Every Request at Call Time
import time
from dataclasses import dataclass

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment


@dataclass
class UsageContext:
    team: str
    feature: str
    customer_id: str | None = None
    environment: str = "production"


def tracked_call(messages: list, model: str, max_tokens: int, context: UsageContext):
    """Call the Messages API and log one attributed usage record for the call."""
    start = time.time()
    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=messages,
    )
    usage_record = {
        "timestamp": time.time(),
        "team": context.team,
        "feature": context.feature,
        "customer_id": context.customer_id,
        "environment": context.environment,
        "model": model,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "latency_seconds": time.time() - start,
    }
    log_usage(usage_record)  # defined in Step 2
    return response  # the SDK's Message object, not a dict

The discipline here is simple but easy to skip under deadline pressure: every single LLM call anywhere in your systems goes through a wrapper that requires a UsageContext. No bare API calls. This is the entire foundation: if calls bypass the wrapper, attribution has gaps.
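Requiring a UsageContext only helps if the tags themselves are consistent; a misspelled team name quietly creates a new bucket in every report. One way to guard against that is to validate the context when it is constructed. The sketch below is a minimal illustration, not part of the wrapper above: the `ValidatedUsageContext` name and the `KNOWN_TEAMS` allowlist are assumptions made for the example, and a real allowlist would more likely come from configuration or a team registry.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist; in practice this might be loaded from config.
KNOWN_TEAMS = {"support-team", "search-team", "platform-team", "growth-team"}


@dataclass(frozen=True)
class ValidatedUsageContext:
    team: str
    feature: str
    customer_id: Optional[str] = None
    environment: str = "production"

    def __post_init__(self):
        # Reject typos early so they never reach the usage table.
        if self.team not in KNOWN_TEAMS:
            raise ValueError(f"unknown team: {self.team!r}")
        if not self.feature.strip():
            raise ValueError("feature must be a non-empty string")


ctx = ValidatedUsageContext(team="support-team", feature="ticket-triage")
```

Failing loudly at construction time keeps bad tags out of the data, at the cost of having to maintain the allowlist as teams are added.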
Step 1: Compute Actual Cost Per Call
# Pricing changes; keep this centralized and easy to update.
# Values are USD per million tokens. Verify them against the provider's current price list.
MODEL_PRICING = {
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4-5-20251001": {"input": 0.80, "output": 4.00},
    "claude-opus-4-8": {"input": 15.00, "output": 75.00},
}


def calculate_cost(record: dict) -> float:
    """Return the USD cost of one call from its token counts."""
    pricing = MODEL_PRICING[record["model"]]  # raises KeyError for a model with no price entry
    input_cost = (record["input_tokens"] / 1_000_000) * pricing["input"]
    output_cost = (record["output_tokens"] / 1_000_000) * pricing["output"]
    return round(input_cost + output_cost, 6)

Step 2: Store Usage Records for Aggregation
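Because the computed cost is stored with every record, it is worth sanity-checking the Step 1 arithmetic by hand before relying on it. The sketch below repeats the same formula on a made-up call size (1,200 input tokens, 350 output tokens) using the illustrative Sonnet prices from the pricing table. The call size and the 100,000-calls-per-month volume are assumptions for the example, and the formula, like the one above, does not model prompt caching or batch discounts.

```python
# Illustrative prices in USD per million tokens (the Sonnet row from the pricing table).
PRICING = {"input": 3.00, "output": 15.00}


def cost_usd(input_tokens: int, output_tokens: int, pricing: dict) -> float:
    """Same formula as Step 1: tokens / 1M * price per million, summed."""
    input_cost = (input_tokens / 1_000_000) * pricing["input"]
    output_cost = (output_tokens / 1_000_000) * pricing["output"]
    return round(input_cost + output_cost, 6)


# One call: 0.0012M * $3.00 + 0.00035M * $15.00 = $0.0036 + $0.00525
single_call = cost_usd(1_200, 350, PRICING)

# The same call repeated 100,000 times in a month: 120M * $3.00 + 35M * $15.00
monthly = cost_usd(1_200 * 100_000, 350 * 100_000, PRICING)

print(f"per call: ${single_call:.5f}, per month: ${monthly:.2f}")
```

Note that output tokens dominate here even though there are fewer of them, which is why storing input and output counts separately matters for later analysis.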
import sqlite3  # use Postgres/Timescale in production; this is the simplest version


def log_usage(record: dict):
    """Compute the cost of one call and append it to the usage table."""
    cost = calculate_cost(record)
    conn = sqlite3.connect("llm_usage.db")
    try:
        conn.execute("""
            INSERT INTO llm_usage
                (timestamp, team, feature, customer_id, environment, model,
                 input_tokens, output_tokens, cost_usd)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            record["timestamp"], record["team"], record["feature"],
            record["customer_id"], record["environment"], record["model"],
            record["input_tokens"], record["output_tokens"], cost,
        ))
        conn.commit()
    finally:
        conn.close()  # release the connection even if the insert fails

The table this writes to:

CREATE TABLE llm_usage (
    timestamp REAL,
    team TEXT,
    feature TEXT,
    customer_id TEXT,
    environment TEXT,
    model TEXT,
    input_tokens INTEGER,
    output_tokens INTEGER,
    cost_usd REAL
);
CREATE INDEX idx_usage_team_time ON llm_usage(team, timestamp);
CREATE INDEX idx_usage_customer ON llm_usage(customer_id);

Step 3: Generate Chargeback Reports
def monthly_chargeback_report(year: int, month: int) -> list[dict]:
    """Return per-team, per-feature cost rows for one calendar month (UTC), highest cost first."""
    conn = sqlite3.connect("llm_usage.db")
    rows = conn.execute("""
        SELECT team, feature,
               SUM(cost_usd) AS total_cost,
               SUM(input_tokens + output_tokens) AS total_tokens,
               COUNT(*) AS request_count
        FROM llm_usage
        WHERE strftime('%Y', datetime(timestamp, 'unixepoch')) = ?
          AND strftime('%m', datetime(timestamp, 'unixepoch')) = ?
        GROUP BY team, feature
        ORDER BY total_cost DESC
    """, (str(year), f"{month:02d}")).fetchall()
    conn.close()
    return [
        {"team": r[0], "feature": r[1], "cost_usd": round(r[2], 2),
         "tokens": r[3], "requests": r[4]}
        for r in rows
    ]

Example output:

team            feature               cost_usd    tokens        requests
support-team    ticket-triage         $847.32     28,200,000    142,000
search-team     semantic-search       $412.18     13,700,000    89,000
platform-team   incident-summaries    $156.90     5,200,000     12,400
growth-team     content-generation    $98.45      3,280,000     4,100
This is the report finance actually needs — not "total LLM spend was $1,514.85" but a breakdown they can charge back to cost centers or use to evaluate whether a feature's LLM cost is justified by its value.
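Cost centers usually map to teams rather than to individual features, so finance may also want the same rows rolled up one level with each team's share of the total. The sketch below is one possible way to do that on the list returned by the report function. The `team_rollup` name and the `share_pct` field are assumptions made for this example, and the input rows are the sample figures from the report above.

```python
from collections import defaultdict


def team_rollup(report_rows: list) -> list:
    """Sum feature-level rows per team and add each team's share of total spend."""
    totals = defaultdict(float)
    for row in report_rows:
        totals[row["team"]] += row["cost_usd"]
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [
        {
            "team": team,
            "cost_usd": round(cost, 2),
            # Guard against an empty month so the division cannot fail.
            "share_pct": round(100 * cost / grand_total, 1) if grand_total else 0.0,
        }
        for team, cost in ranked
    ]


sample_rows = [
    {"team": "support-team", "feature": "ticket-triage", "cost_usd": 847.32},
    {"team": "search-team", "feature": "semantic-search", "cost_usd": 412.18},
    {"team": "platform-team", "feature": "incident-summaries", "cost_usd": 156.90},
    {"team": "growth-team", "feature": "content-generation", "cost_usd": 98.45},
]
rollup = team_rollup(sample_rows)
for line in rollup:
    print(line)
```

Floats are adequate for a report like this; anything that ends up on an invoice is better summed with `Decimal` or in integer cents.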
Step 4: Per-Customer Attribution (for Usage-Based Pricing)
If you're billing customers based on their actual LLM usage (common in B2B AI features), the same data answers that directly:
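from datetime import datetime, timezone


def customer_usage_for_billing(customer_id: str, start_date: str, end_date: str) -> dict:
    """Sum one customer's usage for start_date <= t < end_date (ISO dates, treated as UTC)."""
    # timestamp is stored as epoch seconds, so convert the ISO dates before comparing;
    # comparing the REAL column against date strings would match no rows
    start_ts = datetime.fromisoformat(start_date).replace(tzinfo=timezone.utc).timestamp()
    end_ts = datetime.fromisoformat(end_date).replace(tzinfo=timezone.utc).timestamp()
    conn = sqlite3.connect("llm_usage.db")
    result = conn.execute("""
        SELECT SUM(cost_usd), SUM(input_tokens + output_tokens), COUNT(*)
        FROM llm_usage
        WHERE customer_id = ? AND timestamp >= ? AND timestamp < ?
    """, (customer_id, start_ts, end_ts)).fetchone()
    conn.close()
    return {
        "customer_id": customer_id,
        "raw_llm_cost": round(result[0] or 0, 2),
        "total_tokens": result[1] or 0,
        "request_count": result[2] or 0,
    }

Many teams add a margin multiplier on top of raw LLM cost when passing this to customer billing, because the raw cost covers only the API bill, not your infrastructure, support, or product development.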
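As a rough illustration of the margin step, the sketch below turns a raw LLM cost into a billable amount. The `billable_amount` name and the 1.30 multiplier are placeholders chosen for the example, not recommended values. It uses `Decimal` so that rounding to cents is explicit rather than left to float behavior.

```python
from decimal import Decimal, ROUND_HALF_UP


def billable_amount(raw_llm_cost: float, margin_multiplier: str = "1.30") -> Decimal:
    """Apply a margin multiplier to raw LLM cost and round to whole cents."""
    raw = Decimal(str(raw_llm_cost))  # go through str to avoid float representation noise
    billed = raw * Decimal(margin_multiplier)
    return billed.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical customer whose raw LLM cost for the period was $41.87.
invoice_line = billable_amount(41.87)
print(invoice_line)
```

Keeping the multiplier in one place, next to the pricing table, makes it easy to change the margin without touching the usage data, which always stores raw cost.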
Step 5: Catching Runaway Spend Before the Bill Arrives
def check_team_budget_alerts(daily_budgets: dict[str, float]):
    """Run hourly to catch a team blowing through its daily budget before a month-end surprise."""
    # get_today_start_timestamp() and send_slack_alert() are helpers not shown in this article
    conn = sqlite3.connect("llm_usage.db")
    today_start = get_today_start_timestamp()
    for team, budget in daily_budgets.items():
        spent_today = conn.execute("""
            SELECT SUM(cost_usd) FROM llm_usage
            WHERE team = ? AND timestamp >= ?
        """, (team, today_start)).fetchone()[0] or 0
        if spent_today > budget * 0.8:  # alert once spend passes 80% of the daily budget
            send_slack_alert(
                f"⚠️ {team} has spent ${spent_today:.2f} of ${budget:.2f} "
                f"daily LLM budget ({spent_today/budget:.0%})"
            )
    conn.close()

The pattern that matters most: attribution has to be designed in from the first API call, not bolted on after six months of unattributed spend. Retrofitting usage tags onto historical data is usually impossible, because you can't tag a request after the fact if the context wasn't captured at call time.
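The Step 5 budget check calls `get_today_start_timestamp()`, which is not defined in the article. One possible version is sketched below, on the assumption that daily budgets reset at midnight UTC; a team that budgets in local time would need a different cutoff. The optional `now` parameter is an addition for this example so the function can be exercised with a fixed clock.

```python
from datetime import datetime, timezone
from typing import Optional


def get_today_start_timestamp(now: Optional[datetime] = None) -> float:
    """Return epoch seconds for 00:00 UTC on the current day."""
    # Assumption: budgets reset at UTC midnight. Pass `now` to pin the clock in tests.
    current = now if now is not None else datetime.now(timezone.utc)
    midnight = current.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return midnight.timestamp()


fixed_now = datetime(2025, 3, 14, 15, 30, tzinfo=timezone.utc)
start_of_day = get_today_start_timestamp(fixed_now)
```

Because the usage table stores `time.time()` values, which are also epoch seconds, the result can be compared directly against the `timestamp` column.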
Reduce the underlying spend too: LLM Cost Optimization — Caching and Batching in Production