LLM Auto-Capture

Nozle's LLM wrappers intercept OpenAI and Anthropic API calls, extract token usage, and send billing events automatically, so no manual tracking code is needed.

Cost is calculated server-side by the Go engine's cost model system. The SDK sends only raw token counts and call metadata, never prices.

OpenAI

npm install openai  # peer dependency, >=4.0.0

import OpenAI from 'openai';
import { Nozle, wrapOpenAI } from '@nozle-js/node';

const nozle = new Nozle({ apiKey: 'sk_live_...' });
const openai = wrapOpenAI(new OpenAI(), nozle, {
  customerId: 'cust_123',
  feature: 'code_completion',   // optional: tag for entitlement tracking
  metricCode: 'llm_tokens',     // optional: defaults to "llm_tokens"
});

// Use OpenAI normally — tracking happens automatically
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});

Streaming

Streaming is fully supported. Usage is captured from the final chunk:

const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
// Token usage is automatically tracked after the stream completes
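
To make "captured from the final chunk" concrete, here is a minimal sketch of one way a pass-through wrapper could do it. It is not Nozle's actual implementation: `captureUsage`, `fakeStream`, `collect`, and the `StreamChunk` shape are hypothetical names introduced for illustration. With the OpenAI API, streamed chunks carry `usage` only when the request sets `stream_options: { include_usage: true }`; whether the wrapper sets that for you is not covered here.

```typescript
// Hypothetical chunk shape: only the fields this sketch reads.
interface StreamChunk {
  choices: { delta?: { content?: string } }[];
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
}

type Usage = { prompt_tokens: number; completion_tokens: number };

// Yield every chunk unchanged, remember the last usage object seen,
// and report it once the consumer has drained the stream.
async function* captureUsage(
  stream: AsyncIterable<StreamChunk>,
  onUsage: (usage: Usage) => void,
): AsyncGenerator<StreamChunk> {
  let usage: Usage | undefined;
  for await (const chunk of stream) {
    if (chunk.usage) usage = chunk.usage;
    yield chunk;
  }
  if (usage) onUsage(usage);
}

// Stand-in for a provider stream so the sketch runs without network access.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { choices: [{ delta: { content: 'Hel' } }] };
  yield { choices: [{ delta: { content: 'lo' } }] };
  yield { choices: [], usage: { prompt_tokens: 9, completion_tokens: 2 } };
}

// Consume the wrapped stream the same way the loop above does.
async function collect(): Promise<{ text: string; usage?: Usage }> {
  let text = '';
  let seen: Usage | undefined;
  for await (const chunk of captureUsage(fakeStream(), (u) => { seen = u; })) {
    text += chunk.choices[0]?.delta?.content ?? '';
  }
  return { text, usage: seen };
}
```

In this sketch, a consumer that breaks out of the loop early never reaches the reporting step, so usage is recorded only for streams that are read to the end.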

Anthropic

npm install @anthropic-ai/sdk  # peer dependency, >=0.30.0

import Anthropic from '@anthropic-ai/sdk';
import { Nozle, wrapAnthropic } from '@nozle-js/node';

const nozle = new Nozle({ apiKey: 'sk_live_...' });
const anthropic = wrapAnthropic(new Anthropic(), nozle, {
  customerId: 'cust_123',
  feature: 'code_completion',
});

const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello' }],
});

WrapOptions

Field	Type	Required	Description
`customerId`	string	Yes	Customer to bill for this usage
`metricCode`	string	No	Billable metric code (default: `"llm_tokens"`)
`feature`	string	No	Feature tag for entitlement tracking
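
As a reading aid, the table can be written as a TypeScript type. This is a sketch reconstructed from the table, not the SDK's exported definition; `ResolvedWrapOptions` and `resolveWrapOptions` are hypothetical names used to show how the documented default might be applied.

```typescript
// Assumed shape, reconstructed from the table above; the SDK's own type may differ.
interface WrapOptions {
  customerId: string;   // required: customer to bill for this usage
  metricCode?: string;  // optional: defaults to "llm_tokens"
  feature?: string;     // optional: feature tag for entitlement tracking
}

// Hypothetical resolved form, with the default already applied.
interface ResolvedWrapOptions {
  customerId: string;
  metricCode: string;
  feature?: string;
}

// Hypothetical helper: reject a missing customerId and fill in the documented default.
function resolveWrapOptions(options: WrapOptions): ResolvedWrapOptions {
  if (!options.customerId) {
    throw new Error('customerId is required');
  }
  const resolved: ResolvedWrapOptions = {
    customerId: options.customerId,
    metricCode: options.metricCode ?? 'llm_tokens',
  };
  if (options.feature !== undefined) {
    resolved.feature = options.feature;
  }
  return resolved;
}
```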

What gets tracked

Each LLM call sends a single event via `nozle.track()` with these properties:

Property	Source	Description
`model`	Response	Model name (e.g. `gpt-4o`, `claude-sonnet-4-20250514`)
`input_tokens`	Response usage	Prompt/input token count
`output_tokens`	Response usage	Completion/output token count
`latency_ms`	Measured	End-to-end call duration
`feature`	WrapOptions	Feature tag (if provided)
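
The two providers name their usage fields differently: OpenAI chat completions report `prompt_tokens` and `completion_tokens`, while Anthropic messages report `input_tokens` and `output_tokens`. The sketch below shows one way both could be normalized into the properties in the table. `toEventProperties` and the interface names are hypothetical and illustrate the mapping only; they are not the SDK's internals.

```typescript
// Only the response fields this sketch reads.
interface OpenAIStyleResponse {
  model: string;
  usage?: { prompt_tokens: number; completion_tokens: number };
}

interface AnthropicStyleResponse {
  model: string;
  usage: { input_tokens: number; output_tokens: number };
}

type ProviderResponse = OpenAIStyleResponse | AnthropicStyleResponse;

// Event properties as listed in the table above.
interface LlmEventProperties {
  model: string;
  input_tokens: number;
  output_tokens: number;
  latency_ms: number;
  feature?: string;
}

// Map either provider's usage object onto the shared property names.
// Missing usage falls back to zero tokens (an assumption of this sketch).
function toEventProperties(
  response: ProviderResponse,
  latencyMs: number,
  feature?: string,
): LlmEventProperties {
  const usage = response.usage;
  let inputTokens = 0;
  let outputTokens = 0;
  if (usage && 'prompt_tokens' in usage) {
    inputTokens = usage.prompt_tokens;
    outputTokens = usage.completion_tokens;
  } else if (usage) {
    inputTokens = usage.input_tokens;
    outputTokens = usage.output_tokens;
  }
  const properties: LlmEventProperties = {
    model: response.model,
    input_tokens: inputTokens,
    output_tokens: outputTokens,
    latency_ms: latencyMs,
  };
  if (feature !== undefined) properties.feature = feature;
  return properties;
}
```

Note that nothing from the request messages or the response text appears in the result; only the model name, counts, timing, and optional tag are copied.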

The SDK does not calculate costs. The Go engine matches the `model` property against your cost models of type `per_model` and calculates `cost_cents` server-side. Make sure you have a cost model configured for the `llm_tokens` metric with rates for your models.
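
To illustrate the kind of arithmetic a per-model cost model implies, here is a rough sketch. The real calculation runs in the Go engine and its rate format is not documented here, so everything below is an assumption: the `ModelRate` shape, the cents-per-million-tokens unit, the placeholder rates, and the behaviour for an unconfigured model.

```typescript
// Hypothetical rate format: cents per one million tokens, split by direction.
interface ModelRate {
  inputCentsPerMillion: number;
  outputCentsPerMillion: number;
}

// Placeholder rates for illustration only; these are not real provider prices.
const perModelRates = new Map<string, ModelRate>([
  ['gpt-4o', { inputCentsPerMillion: 100, outputCentsPerMillion: 200 }],
  ['claude-sonnet-4-20250514', { inputCentsPerMillion: 300, outputCentsPerMillion: 400 }],
]);

// Look up the rate by the event's model and price both token counts.
// Returns undefined when no rate exists for the model (assumed behaviour).
function costCents(
  event: { model: string; input_tokens: number; output_tokens: number },
  rates: Map<string, ModelRate>,
): number | undefined {
  const rate = rates.get(event.model);
  if (!rate) return undefined;
  const weighted =
    event.input_tokens * rate.inputCentsPerMillion +
    event.output_tokens * rate.outputCentsPerMillion;
  return weighted / 1_000_000;
}
```

The lookup is keyed on the model name, which is why an event for a model with no configured rate cannot be priced in this sketch.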

Privacy

Wrappers never capture prompt content or completion text. Only metadata (model name, token counts, latency, and the optional feature tag) is sent, so no PII from prompts or completions passes through the billing pipeline.

Manual tracking

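If you prefer manual control or use a provider without a wrapper, you can track LLM usage directly. Use an unwrapped client for this; calling `nozle.track()` on top of a wrapped client would record the same usage twice: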

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages,
});

await nozle.track('cust_123', 'llm_tokens', {
  model: response.model,
  input_tokens: response.usage?.prompt_tokens ?? 0,
  output_tokens: response.usage?.completion_tokens ?? 0,
});
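
The manual example above omits `latency_ms` and `feature`, which the wrappers include. A minimal sketch of a helper that adds them follows. `trackedCall`, `Tracker`, and `UsageResult` are hypothetical names; the `track(customerId, metricCode, properties)` call mirrors the usage shown above, and its promise return type is assumed from the `await`.

```typescript
// Minimal stand-in for the Nozle client: only the track() call used in this document.
interface Tracker {
  track(
    customerId: string,
    metricCode: string,
    properties: Record<string, unknown>,
  ): Promise<unknown>;
}

// The response fields this sketch reads (OpenAI-style usage names).
interface UsageResult {
  model: string;
  usage?: { prompt_tokens: number; completion_tokens: number };
}

// Time the provider call, then send one event with the same properties
// the wrappers report. The response is returned unchanged.
async function trackedCall<T extends UsageResult>(
  tracker: Tracker,
  customerId: string,
  call: () => Promise<T>,
  feature?: string,
): Promise<T> {
  const startedAt = Date.now();
  const response = await call();
  const latencyMs = Date.now() - startedAt;

  const properties: Record<string, unknown> = {
    model: response.model,
    input_tokens: response.usage?.prompt_tokens ?? 0,
    output_tokens: response.usage?.completion_tokens ?? 0,
    latency_ms: latencyMs,
  };
  if (feature !== undefined) properties.feature = feature;

  await tracker.track(customerId, 'llm_tokens', properties);
  return response;
}
```

Usage would look like `await trackedCall(nozle, 'cust_123', () => openai.chat.completions.create({ model: 'gpt-4o', messages }))`, with `openai` being an unwrapped client. In this sketch a failed `track()` call rejects the whole operation; wrapping it in `try`/`catch` is an option if billing errors should not fail the user's request.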