LLM Auto-Capture

Automatically track LLM token usage for billing

Nozle's LLM wrappers intercept OpenAI and Anthropic API calls, extract token usage, and automatically send billing events — no manual tracking code needed.

Cost calculation happens server-side via the Go engine's cost model system. The SDK only sends raw token counts.
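
To make the interception idea concrete, here is a minimal sketch of the general wrapper pattern. It is not Nozle's actual implementation: `fake_create`, `with_usage_tracking`, and the `events` list are hypothetical stand-ins so the example runs without network access or an API key.

```python
import time
from types import SimpleNamespace


def fake_create(**kwargs):
    """Hypothetical stand-in for a provider call; returns an OpenAI-shaped response."""
    usage = SimpleNamespace(prompt_tokens=12, completion_tokens=30)
    return SimpleNamespace(model=kwargs["model"], usage=usage)


def with_usage_tracking(create, send_event, customer_id, metric_code="llm_tokens"):
    """Return a function that calls `create`, then reports raw token counts."""

    def tracked_create(**kwargs):
        start = time.perf_counter()
        response = create(**kwargs)
        latency_ms = int((time.perf_counter() - start) * 1000)
        # Only metadata is reported; prompt and completion text are never read.
        send_event(customer_id, metric_code, {
            "model": response.model,
            "input_tokens": response.usage.prompt_tokens,
            "output_tokens": response.usage.completion_tokens,
            "latency_ms": latency_ms,
        })
        return response  # the caller sees the provider response unchanged

    return tracked_create


events = []  # stands in for the billing event sink
create = with_usage_tracking(fake_create, lambda *e: events.append(e), "cust_123")
response = create(model="gpt-4o", messages=[{"role": "user", "content": "Hello"}])
```

The caller keeps the provider's normal call signature and return value, which is why no manual tracking code is needed at the call site.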

OpenAI

pip install "nozle-sdk[openai]"  # installs openai>=1.0; quotes keep shells like zsh from expanding the brackets

from openai import OpenAI
from nozle import Nozle, wrap_openai

nozle = Nozle(api_key="sk_live_...")
openai = wrap_openai(
    OpenAI(),
    nozle,
    customer_id="cust_123",
    feature="code_completion",   # optional: tag for entitlement tracking
    metric_code="llm_tokens",    # optional: defaults to "llm_tokens"
)

# Use OpenAI normally — tracking happens automatically
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

Streaming

Streaming is fully supported. Usage is captured from the final chunk:

stream = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)

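for chunk in stream:
    # Some chunks carry no choices (for example, a usage-only final chunk), so guard the index
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
# Token usage is automatically tracked after the stream completes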
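
The "captured from the final chunk" behavior can be pictured as a pass-through generator. The sketch below is an assumption about the general technique, not the wrapper's real code: `track_stream`, `fake_stream`, and `reported` are hypothetical names, and the chunks are simple stand-ins shaped like OpenAI stream chunks. With the OpenAI API itself, a streamed usage chunk is sent only when `stream_options={"include_usage": True}` is set, and that chunk has an empty `choices` list. Whether the Nozle wrapper sets this option for you is not stated here.

```python
from types import SimpleNamespace


def track_stream(stream, on_usage):
    """Yield chunks unchanged; report usage once the stream is exhausted."""
    usage = None
    for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage  # remember the most recent usage payload
        yield chunk
    if usage is not None:
        on_usage(usage)


def fake_stream():
    """Hypothetical stream: two text chunks, then a usage-only final chunk."""
    for text in ("Hel", "lo"):
        delta = SimpleNamespace(content=text)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=None)
    final_usage = SimpleNamespace(prompt_tokens=5, completion_tokens=2)
    yield SimpleNamespace(choices=[], usage=final_usage)


reported = []
text = ""
for chunk in track_stream(fake_stream(), reported.append):
    if chunk.choices:  # the final usage chunk has no choices
        text += chunk.choices[0].delta.content or ""
```

In this sketch, usage is reported only after the consumer has iterated to the end. A caller that breaks out of the loop early would never reach the reporting step, so a production wrapper would need to handle that case separately.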

Anthropic

pip install "nozle-sdk[anthropic]"  # installs anthropic>=0.30.0; quotes keep shells like zsh from expanding the brackets

from anthropic import Anthropic
from nozle import Nozle, wrap_anthropic

nozle = Nozle(api_key="sk_live_...")
anthropic = wrap_anthropic(
    Anthropic(),
    nozle,
    customer_id="cust_123",
    feature="code_completion",
)

message = anthropic.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| customer_id | str | Yes | Customer to bill for this usage |
| metric_code | str | No | Billable metric code (default: "llm_tokens") |
| feature | str | No | Feature tag for entitlement tracking |

What gets tracked

Each LLM call sends a single event via nozle.track() with these properties:

| Property | Source | Description |
| --- | --- | --- |
| model | Response | Model name (e.g. gpt-4o, claude-sonnet-4-20250514) |
| input_tokens | Response usage | Prompt/input token count |
| output_tokens | Response usage | Completion/output token count |
| latency_ms | Measured | End-to-end call duration in milliseconds |
| feature | wrap options | Feature tag (if provided) |

The SDK does not calculate costs. The Go engine matches the model property against your cost models of type per_model and calculates cost_cents server-side. Make sure you have a cost model configured for the llm_tokens metric (or for your custom metric_code, if you set one) with rates for every model you call.
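
As a rough illustration of what a per-model calculation involves, the sketch below multiplies token counts by per-model rates. Everything in it is an assumption: the `RATES` table, its cents-per-1,000-tokens unit, and the `cost_cents` function are hypothetical, the numbers are not real prices, and the engine's actual cost model schema, rounding, and handling of unknown models are not described in this page. The engine itself is written in Go; Python is used here only to match the SDK examples.

```python
from decimal import Decimal

# Hypothetical per-model rates in cents per 1,000 tokens (not real prices).
RATES = {
    "gpt-4o": {"input": Decimal("0.25"), "output": Decimal("1.00")},
}


def cost_cents(model, input_tokens, output_tokens, rates=RATES):
    """Look up the model's rates and price input and output tokens separately."""
    rate = rates.get(model)
    if rate is None:
        # This sketch fails loudly when a model has no configured rate.
        raise KeyError(f"no rate configured for model {model!r}")
    total = Decimal(input_tokens) * rate["input"] + Decimal(output_tokens) * rate["output"]
    return total / 1000


# 1,200 input tokens and 400 output tokens at the hypothetical rates above
example = cost_cents("gpt-4o", 1200, 400)
```

Because the lookup is keyed on the model name, a call to a model with no configured rate cannot be priced, which is why the cost model needs an entry for each model in use.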

Privacy

Wrappers never capture prompt content or completion text. They send only metadata (model name, token counts, latency), so no PII from prompts or completions passes through the billing pipeline.
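
If you build event properties yourself, one way to keep the same metadata-only guarantee is an allow-list filter. This is a minimal sketch under stated assumptions: `ALLOWED_KEYS` and `metadata_only` are hypothetical names, and the key set simply mirrors the properties listed under "What gets tracked".

```python
# Hypothetical allow-list mirroring the tracked properties documented above.
ALLOWED_KEYS = frozenset(
    {"model", "input_tokens", "output_tokens", "latency_ms", "feature"}
)


def metadata_only(properties):
    """Keep only allow-listed metadata keys; drop everything else."""
    return {key: value for key, value in properties.items() if key in ALLOWED_KEYS}


raw = {
    "model": "gpt-4o",
    "input_tokens": 12,
    "output_tokens": 30,
    "prompt": "Hello, my name is ...",  # must never reach the billing pipeline
}
safe = metadata_only(raw)
```

An allow-list is safer than a block-list here, because any new field is dropped by default until someone explicitly permits it.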

Manual tracking

If you prefer manual control or use a provider without a wrapper, you can track LLM usage directly:

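# Assumes `openai` is a plain OpenAI() client here; a wrapped client
# would presumably auto-track this call as well and count it twice.
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)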

nozle.track("cust_123", "llm_tokens", metadata={
    "model": response.model,
    "input_tokens": response.usage.prompt_tokens,
    "output_tokens": response.usage.completion_tokens,
})
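
When tracking manually across more than one provider, the usage field names differ: OpenAI chat completions report `prompt_tokens` and `completion_tokens`, while Anthropic messages report `input_tokens` and `output_tokens`. The helper below is a sketch that normalizes both into the property names used on this page. `usage_metadata` is a hypothetical name, and the response objects are stand-ins so the example runs offline.

```python
from types import SimpleNamespace


def usage_metadata(response):
    """Map provider-specific usage fields to input_tokens / output_tokens."""
    usage = response.usage
    if hasattr(usage, "prompt_tokens"):
        # OpenAI chat completions naming
        input_tokens, output_tokens = usage.prompt_tokens, usage.completion_tokens
    else:
        # Anthropic messages naming
        input_tokens, output_tokens = usage.input_tokens, usage.output_tokens
    return {
        "model": response.model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }


# Stand-in responses shaped like each provider's result
openai_like = SimpleNamespace(
    model="gpt-4o",
    usage=SimpleNamespace(prompt_tokens=10, completion_tokens=4),
)
anthropic_like = SimpleNamespace(
    model="claude-sonnet-4-20250514",
    usage=SimpleNamespace(input_tokens=8, output_tokens=3),
)

openai_meta = usage_metadata(openai_like)
anthropic_meta = usage_metadata(anthropic_like)
# With a real client: nozle.track("cust_123", "llm_tokens", metadata=usage_metadata(response))
```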
