LLM Auto-Capture
Automatically track LLM token usage for billing
Nozle's LLM wrappers intercept OpenAI and Anthropic API calls, extract token usage, and automatically send billing events — no manual tracking code needed.
Cost calculation happens server-side via the Go engine's cost model system. The SDK only sends raw token counts.
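To make the interception idea concrete, here is a minimal sketch of how a wrapper of this kind could work. It is not Nozle's actual implementation: `toUsageEvent`, `withUsageTracking`, `TrackFn`, and the fake provider call are hypothetical names used only for illustration. The sketch wraps a `create`-style function, measures latency, normalizes the usage fields (OpenAI reports `prompt_tokens`/`completion_tokens`, Anthropic reports `input_tokens`/`output_tokens`), and forwards only metadata to a tracking callback.

```typescript
// Hypothetical sketch; not the actual @nozle-js/node implementation.
type Usage = {
  prompt_tokens?: number;     // OpenAI naming
  completion_tokens?: number; // OpenAI naming
  input_tokens?: number;      // Anthropic naming
  output_tokens?: number;     // Anthropic naming
};

type LlmResponse = { model: string; usage?: Usage };

type UsageEvent = {
  model: string;
  input_tokens: number;
  output_tokens: number;
  latency_ms: number;
};

type TrackFn = (customerId: string, metricCode: string, props: UsageEvent) => Promise<void>;

// Builds the metadata-only event from a provider response; missing counts become 0.
function toUsageEvent(response: LlmResponse, latencyMs: number): UsageEvent {
  const usage = response.usage ?? {};
  return {
    model: response.model,
    input_tokens: usage.prompt_tokens ?? usage.input_tokens ?? 0,
    output_tokens: usage.completion_tokens ?? usage.output_tokens ?? 0,
    latency_ms: latencyMs,
  };
}

// Wraps a create-style call so each successful response emits one usage event.
function withUsageTracking<Args extends unknown[], R extends LlmResponse>(
  create: (...args: Args) => Promise<R>,
  track: TrackFn,
  customerId: string,
  metricCode = 'llm_tokens',
): (...args: Args) => Promise<R> {
  return async (...args: Args): Promise<R> => {
    const start = Date.now();
    const response = await create(...args);
    await track(customerId, metricCode, toUsageEvent(response, Date.now() - start));
    return response; // the caller receives the provider response unchanged
  };
}

// Usage with a fake provider call standing in for a real SDK client.
const recorded: UsageEvent[] = [];
const fakeCreate = async (_req: { model: string }) => ({
  model: 'gpt-4o',
  usage: { prompt_tokens: 12, completion_tokens: 30 },
});
const tracked = withUsageTracking(
  fakeCreate,
  async (_customerId, _metricCode, props) => {
    recorded.push(props);
  },
  'cust_123',
);
tracked({ model: 'gpt-4o' }).then(() => console.log(recorded));
```

Keeping the event construction in a pure function makes it easy to verify that no prompt or completion text can leak into the event, since the function never reads those fields.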
OpenAI
npm install openai # peer dependency, >=4.0.0
import OpenAI from 'openai';
import { Nozle, wrapOpenAI } from '@nozle-js/node';
const nozle = new Nozle({ apiKey: 'sk_live_...' });
const openai = wrapOpenAI(new OpenAI(), nozle, {
  customerId: 'cust_123',
  feature: 'code_completion', // optional: tag for entitlement tracking
  metricCode: 'llm_tokens', // optional: defaults to "llm_tokens"
});
// Use OpenAI normally — tracking happens automatically
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
Streaming
Streaming is fully supported. Usage is captured from the final chunk:
const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
// Token usage is automatically tracked after the stream completes
Anthropic
npm install @anthropic-ai/sdk # peer dependency, >=0.30.0
import Anthropic from '@anthropic-ai/sdk';
import { Nozle, wrapAnthropic } from '@nozle-js/node';
const nozle = new Nozle({ apiKey: 'sk_live_...' });
const anthropic = wrapAnthropic(new Anthropic(), nozle, {
  customerId: 'cust_123',
  feature: 'code_completion',
});
const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello' }],
});
WrapOptions
| Field | Type | Required | Description |
|---|---|---|---|
| customerId | string | Yes | Customer to bill for this usage |
| metricCode | string | No | Billable metric code (default: "llm_tokens") |
| feature | string | No | Feature tag for entitlement tracking |
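Expressed as a TypeScript type, the table corresponds roughly to the shape below. This is a sketch inferred from the table, not the SDK's published type definition, and `resolveWrapOptions` is a hypothetical helper that only illustrates how the documented `metricCode` default could be applied.

```typescript
// Sketch inferred from the table above; the SDK's real type may differ.
interface WrapOptions {
  customerId: string;  // required: customer to bill for this usage
  metricCode?: string; // optional: defaults to "llm_tokens"
  feature?: string;    // optional: feature tag for entitlement tracking
}

// Hypothetical helper: fills in the documented default for metricCode.
function resolveWrapOptions(options: WrapOptions): WrapOptions & { metricCode: string } {
  return { ...options, metricCode: options.metricCode ?? 'llm_tokens' };
}

console.log(resolveWrapOptions({ customerId: 'cust_123' }).metricCode); // prints "llm_tokens"
```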
What gets tracked
Each LLM call sends a single event via nozle.track() with these properties:
| Property | Source | Description |
|---|---|---|
| model | Response | Model name (e.g. gpt-4o, claude-sonnet-4-20250514) |
| input_tokens | Response usage | Prompt/input token count |
| output_tokens | Response usage | Completion/output token count |
| latency_ms | Measured | End-to-end call duration |
| feature | WrapOptions | Feature tag (if provided) |
The SDK does not calculate costs. The Go engine matches the model property against your cost models with per_model type and calculates cost_cents server-side. Make sure you have a cost model configured for the llm_tokens metric with rates for your models.
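As a rough illustration of what a per-model cost calculation involves, the sketch below computes a cost in cents from raw token counts. The rate table, its per-million-token units, the model names, and the `costCents` function are all assumptions made for illustration. The actual cost model schema and rounding rules are defined by the Go engine, and the rates shown are placeholders, not real provider prices.

```typescript
// Hypothetical per-model rates in cents per 1,000,000 tokens (placeholder values).
type ModelRate = { inputCentsPerMillion: number; outputCentsPerMillion: number };

const rates: Record<string, ModelRate> = {
  'example-model-a': { inputCentsPerMillion: 250, outputCentsPerMillion: 1000 },
  'example-model-b': { inputCentsPerMillion: 300, outputCentsPerMillion: 1500 },
};

// Looks up the rate by model name and prices input and output tokens separately.
function costCents(model: string, inputTokens: number, outputTokens: number): number {
  const rate = rates[model];
  if (!rate) {
    throw new Error(`no rate configured for model "${model}"`);
  }
  return (
    (inputTokens * rate.inputCentsPerMillion + outputTokens * rate.outputCentsPerMillion) /
    1_000_000
  );
}

// 2,000 input tokens and 500 output tokens on example-model-a:
// (2000 * 250 + 500 * 1000) / 1,000,000 = 1 cent
console.log(costCents('example-model-a', 2_000, 500)); // prints 1
```

In this sketch an event whose model has no matching rate cannot be priced at all, which is why every model you call needs a configured rate.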
Privacy
Wrappers never capture prompt content or completion text — only metadata (model name, token counts, latency). No PII passes through the billing pipeline.
Manual tracking
If you prefer manual control or use a provider without a wrapper, you can track LLM usage directly:
// Assumes `openai`, `nozle`, and `messages` are already defined
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages,
});
await nozle.track('cust_123', 'llm_tokens', {
  model: response.model,
  input_tokens: response.usage?.prompt_tokens ?? 0,
  output_tokens: response.usage?.completion_tokens ?? 0,
});
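Manual tracking can also cover streamed responses. The sketch below rests on stated assumptions: `collectStreamUsage` is a hypothetical helper, and the fake stream stands in for an OpenAI stream created with `stream: true` and `stream_options: { include_usage: true }`, in which case the final chunk carries a `usage` object and an empty `choices` array. The helper forwards text as it arrives and returns the properties you would pass to `nozle.track()`.

```typescript
// Minimal chunk shape, modeled on OpenAI chat completion stream chunks.
type StreamChunk = {
  model: string;
  choices: { delta?: { content?: string | null } }[];
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
};

type StreamUsage = { model: string; input_tokens: number; output_tokens: number };

// Hypothetical helper: consumes the stream, forwards text, and collects usage.
async function collectStreamUsage(
  stream: AsyncIterable<StreamChunk>,
  onText: (text: string) => void,
): Promise<StreamUsage> {
  const result: StreamUsage = { model: '', input_tokens: 0, output_tokens: 0 };
  for await (const chunk of stream) {
    result.model = chunk.model;
    onText(chunk.choices[0]?.delta?.content ?? '');
    if (chunk.usage) {
      // Only the final chunk is expected to carry usage.
      result.input_tokens = chunk.usage.prompt_tokens;
      result.output_tokens = chunk.usage.completion_tokens;
    }
  }
  return result;
}

// Fake stream standing in for a real provider stream.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { model: 'gpt-4o', choices: [{ delta: { content: 'Hi' } }] };
  yield { model: 'gpt-4o', choices: [{ delta: { content: '!' } }] };
  yield { model: 'gpt-4o', choices: [], usage: { prompt_tokens: 8, completion_tokens: 2 } };
}

let text = '';
collectStreamUsage(fakeStream(), (piece) => {
  text += piece;
}).then((props) => {
  // With a real client: await nozle.track('cust_123', 'llm_tokens', props);
  console.log(text, props);
});
```

If the stream is created without usage reporting enabled, no chunk carries `usage` and this helper returns zero counts, so it is worth checking for that case before sending the event.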