ctx.llm
Language model completions, embeddings, and streaming — with automatic cost tracking, provenance logging, and budget enforcement on every call.
Why this exists: Raw LLM provider calls give you text back. `ctx.llm` gives you text back plus the cost, token counts, a provenance ID for the audit trail, and automatic budget enforcement from the delegation. When your agent exceeds its budget, the call throws `BudgetExceededError` — not a surprise bill at the end of the month.
ctx.llm.complete(options)
Generate a text completion. Returns a fully typed response with cost and provenance.
```typescript
const response = await ctx.llm.complete({
  prompt: 'Summarize the following document in 3 bullet points.',
  temperature: 0.3,
  maxTokens: 500,
});
```
```typescript
console.log(response.content);           // Generated text
console.log(response.cost.usd);          // e.g. 0.0023
console.log(response.usage.totalTokens); // e.g. 847
console.log(response.provenanceId);      // Audit trail reference
```
With message arrays
```typescript
// Use message arrays for multi-turn conversations
const response = await ctx.llm.complete({
  system: 'You are a financial analyst. Be precise and cite sources.',
  prompt: [
    { role: 'user', content: 'What are the key risks in this contract?' },
    { role: 'assistant', content: 'I see three main areas of concern...' },
    { role: 'user', content: 'Focus on the indemnification clause.' },
  ],
  temperature: 0.2,
  maxTokens: 1000,
});
```
Options
| Option | Type | Default | Description |
|---|---|---|---|
| `prompt` | `string` or `Message[]` | — | The prompt text or conversation messages. Required. |
| `system` | `string` | `undefined` | System message. Sets the model's persona/behavior. |
| `temperature` | `number` | `0.7` | Sampling temperature, 0.0–1.0. Lower = more deterministic. |
| `maxTokens` | `number` | `undefined` | Maximum output tokens. Capped by the model's context window. |
| `model` | `string` | delegation default | Override the model for this call, e.g. `"gpt-4o"`, `"claude-3-5-sonnet"`. |
| `stop` | `string[]` | `undefined` | Stop sequences that end generation. |
| `topP` | `number` | `undefined` | Nucleus sampling. Alternative to `temperature`. |
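As a quick illustration of the defaults in the table, here is a hypothetical helper (not part of the SDK; `withDefaults` and the local `CompleteOptions` interface are names invented for this sketch) that applies the documented `temperature` default of `0.7` and clamps it to the documented 0.0–1.0 range before a call:

```typescript
// Hypothetical helper (not part of the SDK). Applies the documented
// temperature default (0.7) and clamps it to the documented 0.0-1.0 range.
interface CompleteOptions {
  prompt: string;
  temperature?: number;
  maxTokens?: number;
  stop?: string[];
}

function withDefaults(options: CompleteOptions): CompleteOptions {
  const temperature = Math.min(1, Math.max(0, options.temperature ?? 0.7));
  return { ...options, temperature };
}
```

You could then pass the result straight through, e.g. `ctx.llm.complete(withDefaults({ prompt }))`.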
Response
| Field | Type | Description |
|---|---|---|
| `content` | `string` | Generated text. |
| `cost` | `Cost` | Cost in USD with breakdown (llm, embedding, etc.). |
| `usage` | `TokenUsage` | `promptTokens`, `completionTokens`, `totalTokens`. |
| `model` | `string` | Actual model used (may differ from the requested model if delegated). |
| `finishReason` | `'stop'` \| `'length'` \| `'content_filter'` | Why generation stopped. |
| `provenanceId` | `string` | Audit trail reference. Queryable via `ctx.provenance`. |
| `confidence` | `number` or `undefined` | Confidence score, if the model provides one. |
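One practical use of `finishReason` is detecting truncated output. Below is a sketch of a retry wrapper (hypothetical; `completeUntruncated` and `CompletionResponse` are names invented here, not SDK exports) that doubles `maxTokens` whenever a response stops with `'length'`:

```typescript
// Hypothetical wrapper (not part of the SDK): retry a completion with a
// larger output budget when the response was cut off by the token limit.
type CompletionResponse = {
  content: string;
  finishReason: 'stop' | 'length' | 'content_filter';
};

async function completeUntruncated(
  complete: (maxTokens: number) => Promise<CompletionResponse>,
  initialMaxTokens = 500,
  maxRetries = 2,
): Promise<CompletionResponse> {
  let maxTokens = initialMaxTokens;
  let response = await complete(maxTokens);
  for (let i = 0; i < maxRetries && response.finishReason === 'length'; i++) {
    maxTokens *= 2; // Double the output budget and try again.
    response = await complete(maxTokens);
  }
  return response;
}
```

You might call it as `completeUntruncated((maxTokens) => ctx.llm.complete({ prompt, maxTokens }))`. Note that each retry is a fresh billed call, so the budget enforcement described above still applies.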
ctx.llm.stream(options)
Streaming completion. Returns an `AsyncIterableIterator`. Same options as `complete()`. Cost is reported in the final chunk.
```typescript
// Stream tokens in real-time
for await (const chunk of ctx.llm.stream({ prompt: userQuery })) {
  process.stdout.write(chunk.delta); // Print new tokens as they arrive

  if (chunk.finishReason) {
    console.log('\nDone. Tokens used:', chunk.usage?.totalTokens);
  }
}
```
ctx.llm.embed(text)
Generate a vector embedding for semantic search. Embedding costs are tracked separately from completion costs in the `Cost.breakdown` field.
```typescript
// Generate embeddings for semantic search
const embedding = await ctx.llm.embed('How do I process an invoice?');

// embedding.vector is a float array ready for vector search
const results = await ctx.memory.persistent.search(embedding.vector, {
  limit: 5,
  minScore: 0.7,
});

console.log(embedding.dimensions); // e.g. 1536
console.log(embedding.cost.usd);   // Embedding costs are tracked too
```
Cost Tracking
Every call is tracked. Use `getLastCost()` and `getTotalCost()` to monitor spend within an execution.
```typescript
// Track costs across multiple LLM calls
const step1 = await ctx.llm.complete({ prompt: 'Extract entities from: ' + text });
const step2 = await ctx.llm.complete({ prompt: 'Classify entities: ' + step1.content });

// Cost of the last call
const lastCost = ctx.llm.getLastCost();

// Total cost of ALL llm calls in this execution
const totalCost = ctx.llm.getTotalCost();
console.log(`Total: $${totalCost.usd.toFixed(4)}`); // e.g. Total: $0.0047
```
Real-World Example
From the invoice-processor reference implementation:
```typescript
// Real-world: invoice extraction (from packages/agents-reference)
const extractionResult = await ctx.llm.complete({
  prompt: [
    {
      role: 'user',
      content: `Extract invoice data as JSON with fields:
vendor, date, total, line_items (array with description, qty, unit_price, total).

Invoice content:
${invoiceContent}`,
    },
  ],
  temperature: 0.2, // Low temp for structured extraction
  maxTokens: 2000,
});

const invoiceData = JSON.parse(extractionResult.content);
```
Errors
| Error | Thrown when |
|---|---|
| `BudgetExceededError` | The delegation budget would be exceeded by this call. |
| `AccessDeniedError` | The delegation does not include LLM access scope. |
| `TimeoutError` | The model took longer than the configured timeout. |

See Error Reference for all error types.
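A defensive sketch of handling these errors around a completion call. Assumptions: each error's `name` property matches the class names in the table (checking `name` avoids depending on a specific import path), and `safeComplete` is a name invented for this example; the fallback behavior is a policy decision for your agent, not SDK behavior.

```typescript
// Sketch only: handle the documented error types around a completion call.
// Assumes each thrown error's `name` matches the class names listed above.
async function safeComplete(
  llm: { complete(options: { prompt: string; maxTokens?: number }): Promise<{ content: string }> },
  prompt: string,
): Promise<string | null> {
  try {
    const response = await llm.complete({ prompt, maxTokens: 500 });
    return response.content;
  } catch (error) {
    const name = error instanceof Error ? error.name : '';
    if (name === 'BudgetExceededError') {
      return null; // Budget exhausted: degrade gracefully instead of crashing.
    }
    if (name === 'TimeoutError') {
      return null; // Could also retry with backoff.
    }
    throw error; // AccessDeniedError and unknown errors should surface.
  }
}
```

You would call it as `await safeComplete(ctx.llm, prompt)`; whether to swallow, retry, or escalate each error type depends on your agent's requirements.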
In the wild
Reference agents that demonstrate ctx.llm in production.
`ctx.llm.stream()` · streaming-analyzer
Real-time streaming analysis with token accumulation and async iteration.
`ctx.llm.embed()` · semantic-search
Generate vector embeddings for documents and queries. The RAG foundation.
`ctx.llm.complete()` · invoice-processor
JSON extraction prompt with low temperature for structured, deterministic output.
Deep Dives
Prompt Management · Part 1 of 3
From Inline Strings to ctx.prompts: A Developer's Guide to HUMΛN Prompt Management
A hands-on walkthrough of HUMΛN's prompt SDK: authoring prompts, validating schemas, composing layers, publishing versions, and wiring telemetry — with code examples from real agents.
Prompt Management · Part 3 of 3
The Self-Improving Prompt Loop: How Telemetry Closes the Gap Between Good and Great
Most AI platforms ship prompts and forget them. HUMΛN's protocol-level telemetry, model affinity tracking, and Prompt Refinement Agent create a virtuous cycle of continuous improvement — with humans always in the loop.