LLM Auto-Capture
Automatically track LLM token usage for billing
Nozle's LLM wrappers intercept OpenAI and Anthropic API calls, extract token usage, and automatically send billing events — no manual tracking code needed.
Cost calculation happens server-side via the Go engine's cost model system. The SDK only sends raw token counts.
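To make the interception idea concrete, here is a minimal sketch of how a wrapper of this kind could work. It is not Nozle's actual implementation: `with_usage_capture`, `FakeResponse`, `FakeUsage`, and `fake_create` are hypothetical names invented for illustration, and the "provider" is a stub so the example runs without network access.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FakeUsage:  # hypothetical stand-in for a provider's usage object
    prompt_tokens: int
    completion_tokens: int


@dataclass
class FakeResponse:  # hypothetical stand-in for a provider response
    model: str
    usage: FakeUsage
    text: str  # completion content; never copied into the event


def with_usage_capture(
    create: Callable[..., Any], track: Callable[[dict], None]
) -> Callable[..., Any]:
    """Return a function that behaves like `create` but also reports usage."""

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        response = create(*args, **kwargs)
        latency_ms = int((time.perf_counter() - start) * 1000)
        # Only metadata goes into the event: raw token counts, no cost, no content
        track({
            "model": response.model,
            "input_tokens": response.usage.prompt_tokens,
            "output_tokens": response.usage.completion_tokens,
            "latency_ms": latency_ms,
        })
        return response

    return wrapped


def fake_create(model: str, messages: list) -> FakeResponse:
    """Stub provider call that returns fixed token counts."""
    return FakeResponse(model=model, usage=FakeUsage(12, 34), text="Hi there")


events: list = []
create = with_usage_capture(fake_create, events.append)
response = create(model="gpt-4o", messages=[{"role": "user", "content": "Hello"}])
```

The caller still receives the unmodified response, while the event list holds one metadata-only record per call.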
OpenAI
pip install nozle-sdk[openai] # installs openai>=1.0

from openai import OpenAI
from nozle import Nozle, wrap_openai
nozle = Nozle(api_key="sk_live_...")
openai = wrap_openai(
    OpenAI(),
    nozle,
    customer_id="cust_123",
    feature="code_completion",  # optional: tag for entitlement tracking
    metric_code="llm_tokens",  # optional: defaults to "llm_tokens"
)
# Use OpenAI normally — tracking happens automatically
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
Streaming
Streaming is fully supported. Usage is captured from the final chunk:
stream = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)
for chunk in stream:
    # The final usage-only chunk may arrive with an empty choices list
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
# Token usage is automatically tracked after the stream completes
Anthropic
pip install nozle-sdk[anthropic] # installs anthropic>=0.30.0

from anthropic import Anthropic
from nozle import Nozle, wrap_anthropic
nozle = Nozle(api_key="sk_live_...")
anthropic = wrap_anthropic(
    Anthropic(),
    nozle,
    customer_id="cust_123",
    feature="code_completion",
)
message = anthropic.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| customer_id | str | Yes | Customer to bill for this usage |
| metric_code | str | No | Billable metric code (default: "llm_tokens") |
| feature | str | No | Feature tag for entitlement tracking |
What gets tracked
Each LLM call sends a single event via nozle.track() with these properties:
| Property | Source | Description |
|---|---|---|
| model | Response | Model name (e.g. gpt-4o, claude-sonnet-4-20250514) |
| input_tokens | Response usage | Prompt/input token count |
| output_tokens | Response usage | Completion/output token count |
| latency_ms | Measured | End-to-end call duration |
| feature | wrap options | Feature tag (if provided) |
The SDK does not calculate costs. The Go engine matches the model property against your cost models of type per_model and calculates cost_cents server-side. Make sure a cost model is configured for the llm_tokens metric, with a rate for every model you call.
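As a rough illustration of what per-model costing means, the sketch below looks up a rate by the event's model property and multiplies it by the raw token counts. It is not the Go engine's actual logic: the function name `cost_cents`, the rate table layout, and the rates themselves are all assumptions made up for this example, and real cost models may be structured differently.

```python
from decimal import Decimal

# Placeholder rates in cents per 1,000,000 tokens. These are made-up numbers,
# not real provider prices.
RATES = {
    "gpt-4o": {"input": Decimal("100"), "output": Decimal("200")},
    "claude-sonnet-4-20250514": {"input": Decimal("150"), "output": Decimal("300")},
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def cost_cents(model: str, input_tokens: int, output_tokens: int, rates=RATES) -> Decimal:
    """Price one event by matching its model name against a rate table."""
    try:
        rate = rates[model]
    except KeyError:
        # Mirrors the advice above: a model without a configured rate cannot be priced
        raise LookupError(f"no rate configured for model {model!r}") from None
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / TOKENS_PER_UNIT


# 2,000 input and 500 output tokens at the placeholder gpt-4o rates
example = cost_cents("gpt-4o", 2000, 500)  # Decimal("0.3")
```

Using Decimal rather than float avoids rounding drift when many small per-call amounts are summed.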
Privacy
Wrappers never capture prompt content or completion text — only metadata (model name, token counts, latency). No PII passes through the billing pipeline.
Manual tracking
If you prefer manual control or use a provider without a wrapper, you can track LLM usage directly:
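# Here `openai` should be a plain OpenAI() client; a wrapped client would
# presumably report the same call a second time
messages = [{"role": "user", "content": "Hello"}]
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
# Send only the raw token counts; cost is calculated server-side
nozle.track("cust_123", "llm_tokens", metadata={
    "model": response.model,
    "input_tokens": response.usage.prompt_tokens,
    "output_tokens": response.usage.completion_tokens,
})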
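When tracking manually across several providers, note that usage field names differ: OpenAI chat completions report prompt_tokens and completion_tokens, while Anthropic messages report input_tokens and output_tokens. The helper below is a small sketch for normalizing both shapes into the property names listed above. `usage_properties` is a hypothetical name, not part of the Nozle SDK, and the responses are simulated with SimpleNamespace so the example runs standalone.

```python
from types import SimpleNamespace


def usage_properties(response) -> dict:
    """Build the event properties from an OpenAI-style or Anthropic-style response."""
    usage = response.usage
    if hasattr(usage, "prompt_tokens"):
        # OpenAI chat completions naming
        input_tokens, output_tokens = usage.prompt_tokens, usage.completion_tokens
    else:
        # Anthropic messages naming
        input_tokens, output_tokens = usage.input_tokens, usage.output_tokens
    return {
        "model": response.model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }


# Simulated responses standing in for real SDK objects
openai_like = SimpleNamespace(
    model="gpt-4o",
    usage=SimpleNamespace(prompt_tokens=10, completion_tokens=20),
)
anthropic_like = SimpleNamespace(
    model="claude-sonnet-4-20250514",
    usage=SimpleNamespace(input_tokens=7, output_tokens=9),
)

openai_props = usage_properties(openai_like)
anthropic_props = usage_properties(anthropic_like)
```

The resulting dict could then be passed as the metadata argument of the nozle.track() call shown above.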