ctx.llm

Language model completions, embeddings, and streaming — with automatic cost tracking, provenance logging, and budget enforcement on every call.

Why this exists: Raw LLM provider calls give you text back. `ctx.llm` gives you text back plus the cost, token counts, a provenance ID for the audit trail, and automatic budget enforcement from the delegation. When your agent exceeds its budget, it throws `BudgetExceededError` — not a surprise bill at the end of the month.

ctx.llm.complete(options)

Generate a text completion. Returns a fully typed response with cost and provenance.

```typescript
const response = await ctx.llm.complete({
  prompt: 'Summarize the following document in 3 bullet points.',
  temperature: 0.3,
  maxTokens: 500,
});

console.log(response.content);           // Generated text
console.log(response.cost.usd);          // e.g. 0.0023
console.log(response.usage.totalTokens); // e.g. 847
console.log(response.provenanceId);      // Audit trail reference
```

With message arrays

```typescript
// Use message arrays for multi-turn conversations
const response = await ctx.llm.complete({
  system: 'You are a financial analyst. Be precise and cite sources.',
  prompt: [
    { role: 'user', content: 'What are the key risks in this contract?' },
    { role: 'assistant', content: 'I see three main areas of concern...' },
    { role: 'user', content: 'Focus on the indemnification clause.' },
  ],
  temperature: 0.2,
  maxTokens: 1000,
});
```

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `prompt` | `string \| Message[]` | — | The prompt text or conversation messages. Required. |
| `system` | `string` | `undefined` | System message. Sets the model's persona/behavior. |
| `temperature` | `number` | `0.7` | Sampling temperature, 0.0–1.0. Lower = more deterministic. |
| `maxTokens` | `number` | `undefined` | Maximum output tokens. Capped by the model's context window. |
| `model` | `string` | delegation default | Override model for this call, e.g. `"gpt-4o"`, `"claude-3-5-sonnet"`. |
| `stop` | `string[]` | `undefined` | Stop sequences that end generation. |
| `topP` | `number` | `undefined` | Nucleus sampling. Alternative to temperature. |

Response

| Field | Type | Description |
| --- | --- | --- |
| `content` | `string` | Generated text. |
| `cost` | `Cost` | Cost in USD with breakdown (llm, embedding, etc). |
| `usage` | `TokenUsage` | `promptTokens`, `completionTokens`, `totalTokens`. |
| `model` | `string` | Actual model used (may differ from requested if delegated). |
| `finishReason` | `'stop' \| 'length' \| 'content_filter'` | Why generation stopped. |
| `provenanceId` | `string` | Audit trail reference. Queryable via `ctx.provenance`. |
| `confidence` | `number \| undefined` | Confidence score if the model provides it. |
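The `finishReason` field is worth checking in code, not just logging: a value of `'length'` means the output was cut off by `maxTokens` and is likely incomplete. A minimal guard sketch — the `CompletionResponse` interface below is abridged from the response table for illustration, not the framework's exported type:

```typescript
// Abridged response shape, based on the fields documented above.
interface CompletionResponse {
  content: string;
  finishReason: 'stop' | 'length' | 'content_filter';
}

// Returns the content, or throws if generation was cut off by maxTokens.
function assertComplete(response: CompletionResponse): string {
  if (response.finishReason === 'length') {
    throw new Error('Completion truncated: raise maxTokens and retry');
  }
  return response.content;
}

console.log(assertComplete({ content: 'All done.', finishReason: 'stop' })); // All done.
```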

ctx.llm.stream(options)

Streaming completion. Returns an AsyncIterableIterator. Same options as complete(). Cost is reported in the final chunk.

```typescript
// Stream tokens in real-time
for await (const chunk of ctx.llm.stream({ prompt: userQuery })) {
  process.stdout.write(chunk.delta); // Print new tokens as they arrive
  if (chunk.finishReason) {
    console.log('\nDone. Tokens used:', chunk.usage?.totalTokens);
  }
}
```
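When you stream for display but also need the full text afterwards, accumulate the deltas as they arrive. A self-contained sketch against a generic async iterable of chunks — the `Chunk` shape here is abridged from the example above, and `mockStream` is a stand-in for `ctx.llm.stream`:

```typescript
interface Chunk { delta: string; finishReason?: string }

// Collects streamed deltas into the final completed text.
async function collectStream(stream: AsyncIterable<Chunk>): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    text += chunk.delta;
  }
  return text;
}

// Works with any async iterable, e.g. a mock stream:
async function* mockStream(): AsyncGenerator<Chunk> {
  yield { delta: 'Hello, ' };
  yield { delta: 'world', finishReason: 'stop' };
}

collectStream(mockStream()).then((text) => console.log(text)); // Hello, world
```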

ctx.llm.embed(text)

Generate a vector embedding for semantic search. Embedding costs are tracked separately from completion costs in the Cost.breakdown field.

```typescript
// Generate embeddings for semantic search
const embedding = await ctx.llm.embed('How do I process an invoice?');

// embedding.vector is a float array ready for vector search
const results = await ctx.memory.persistent.search(embedding.vector, {
  limit: 5,
  minScore: 0.7,
});

console.log(embedding.dimensions); // e.g. 1536
console.log(embedding.cost.usd);   // Embedding costs are tracked too
```
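If you score embedding vectors yourself rather than going through `ctx.memory`, cosine similarity is the usual comparison — a `minScore` threshold like the one above is this kind of score. A self-contained sketch, assuming plain `number[]` vectors:

```typescript
// Cosine similarity between two embedding vectors:
// 1 = same direction, 0 = orthogonal (unrelated), -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector dimensions must match');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```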

Cost Tracking

Every call is tracked. Use `getLastCost()` and `getTotalCost()` to monitor spend within an execution.

```typescript
// Track costs across multiple LLM calls
const step1 = await ctx.llm.complete({ prompt: 'Extract entities from: ' + text });
const step2 = await ctx.llm.complete({ prompt: 'Classify entities: ' + step1.content });

// Cost of the last call
const lastCost = ctx.llm.getLastCost();

// Total cost of ALL llm calls in this execution
const totalCost = ctx.llm.getTotalCost();
console.log(`Total: $${totalCost.usd.toFixed(4)}`); // e.g. Total: $0.0047
```

Real-World Example

From the invoice-processor reference implementation:

```typescript
// Real-world: invoice extraction (from packages/agents-reference)
const extractionResult = await ctx.llm.complete({
  prompt: [
    {
      role: 'user',
      content: `Extract invoice data as JSON with fields:
vendor, date, total, line_items (array with description, qty, unit_price, total).
Invoice content:
${invoiceContent}`,
    },
  ],
  temperature: 0.2, // Low temp for structured extraction
  maxTokens: 2000,
});

const invoiceData = JSON.parse(extractionResult.content);
```
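Note that `JSON.parse` throws if the model wraps its output in a markdown code fence or adds commentary, which even low-temperature models sometimes do. A defensive parsing sketch — `parseModelJson` is a hypothetical helper, not part of `ctx.llm`:

```typescript
// Strips an optional ```json ... ``` fence before parsing, since models
// sometimes wrap structured output in markdown despite instructions.
function parseModelJson(raw: string): unknown {
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const body = (fenced ? fenced[1] : raw).trim();
  return JSON.parse(body);
}

const data = parseModelJson('```json\n{"vendor": "Acme", "total": 99.5}\n```');
console.log(data); // { vendor: 'Acme', total: 99.5 }
```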

Errors

| Error | Thrown when |
| --- | --- |
| `BudgetExceededError` | The delegation budget would be exceeded by this call. |
| `AccessDeniedError` | The delegation does not include LLM access scope. |
| `TimeoutError` | Model took longer than the configured timeout. |
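A common pattern is to catch `BudgetExceededError` and degrade gracefully instead of failing the whole execution. A minimal sketch — the `BudgetExceededError` class and `callModel` parameter here are stand-ins for illustration, not the framework's actual exports:

```typescript
// Stand-in for the framework's error type, for illustration only.
class BudgetExceededError extends Error {}

// Attempts an expensive call, falling back to a cached/default answer
// when the delegation budget would be exceeded.
async function completeWithFallback(
  callModel: () => Promise<string>,
  fallback: string,
): Promise<string> {
  try {
    return await callModel();
  } catch (err) {
    if (err instanceof BudgetExceededError) return fallback;
    throw err; // AccessDeniedError, TimeoutError, etc. still propagate
  }
}
```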

See Error Reference for all error types.

In the wild

Reference agents that demonstrate ctx.llm in production.
