Testing

Test agents in isolation with createMockExecutionContext. No infrastructure, no real LLM calls, no secrets required — just fast, deterministic unit tests.

Why it exists

Tests run without real APIs or a ledger. Every ctx.* method is a mock you can configure and assert on, so CI needs no credentials and external services cannot introduce flakiness.

How it makes life better

With the mock context you get fast, deterministic tests and clear assertions (e.g. "ctx.call.human was called with priority: high"). Rely on integration tests alone and you get slow CI and hard-to-debug failures.

Setup

The testing utilities are in the @human/agent-sdk/testing sub-path. They use vitest mock functions internally and work with any vitest or jest setup.

Basic Agent Test

typescript
import { describe, it, expect, vi } from 'vitest';
import { createMockExecutionContext } from '@human/agent-sdk/testing';
import myAgent from './my-agent';

describe('my-agent', () => {
  it('processes a question and returns an answer', async () => {
    const ctx = createMockExecutionContext();

    // Configure the LLM mock to return a specific response
    ctx.llm.complete.mockResolvedValue({
      content: 'The answer is 42.',
      cost: { usd: 0.001 },
      usage: { promptTokens: 50, completionTokens: 10, totalTokens: 60 },
      model: 'gpt-4o',
      finishReason: 'stop',
      provenanceId: 'prov_test_1',
    });

    const result = await myAgent.execute(ctx, { question: 'What is the meaning of life?' });

    expect(result.answer).toBe('The answer is 42.');
    expect(ctx.llm.complete).toHaveBeenCalledOnce();
    expect(ctx.llm.complete).toHaveBeenCalledWith(
      expect.objectContaining({ prompt: 'What is the meaning of life?' })
    );
  });
});
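
For orientation, here is a minimal sketch of the kind of agent the test above could exercise. The interfaces, the agent body, and the hand-written stub are all assumptions made for illustration: they model only the fields this example reads and are not the SDK's real types or the real my-agent module.

```typescript
// Hypothetical shapes: only the fields this sketch reads, not the SDK's types.
interface LlmResult {
  content: string;
}

interface QaContext {
  llm: {
    complete(request: { prompt: string }): Promise<LlmResult>;
  };
}

// A minimal agent: forwards the question as the prompt and returns the
// completion text as the answer.
const myAgent = {
  async execute(ctx: QaContext, input: { question: string }): Promise<{ answer: string }> {
    const completion = await ctx.llm.complete({ prompt: input.question });
    return { answer: completion.content };
  },
};

// Hand-written stub standing in for createMockExecutionContext().
// It records every prompt it receives so a test can inspect them.
const recordedPrompts: string[] = [];
const stubCtx: QaContext = {
  llm: {
    async complete(request) {
      recordedPrompts.push(request.prompt);
      return { content: 'The answer is 42.' };
    },
  },
};

myAgent
  .execute(stubCtx, { question: 'What is the meaning of life?' })
  .then((result) => console.log(result.answer)); // prints "The answer is 42."
```

Because the agent receives ctx as a plain argument, swapping the real context for a mock needs no dependency injection framework.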

Mock Context API

createMockExecutionContext() returns a mock ctx where every method is a vi.fn(). Configure return values, assert call counts, and inspect arguments using standard vitest/jest mock APIs.

typescript
import { createMockExecutionContext } from '@human/agent-sdk/testing';

const ctx = createMockExecutionContext({
  // Optional overrides
  executionId: 'exec_test_123',
  agentId: 'did:agent:test-agent',
  orgId: 'org_test',
});

// All ctx.* methods are vi.fn() mocks ready to spy on or configure:
ctx.llm.complete              // mock function
ctx.llm.embed                 // mock function
ctx.llm.stream                // mock async generator
ctx.call.agent                // mock function
ctx.call.human                // mock function
ctx.call.parallel             // mock function
ctx.memory.execution.get      // mock function
ctx.memory.execution.set      // mock function
ctx.memory.persistent.search  // mock function
ctx.provenance.log            // mock function
ctx.secrets.get               // mock function
ctx.prompts.load              // mock function

Testing Human Escalation

Mock ctx.call.human to simulate human approval, rejection, or timeout — and verify your agent handles each case correctly.

typescript
// Assumes it, expect, and createMockExecutionContext are imported as in the
// basic test, and that invoiceProcessor is the agent under test.
it('escalates to human when invoice total mismatches', async () => {
  const ctx = createMockExecutionContext();

  // Simulate invoice extraction returning a mismatched total
  ctx.llm.complete.mockResolvedValueOnce({
    content: JSON.stringify({ total: 14200, line_items: [] }),
    cost: { usd: 0.002 },
    provenanceId: 'prov_1',
    finishReason: 'stop',
    model: 'gpt-4o',
    usage: { promptTokens: 100, completionTokens: 50, totalTokens: 150 },
  });

  // Simulate human approving the mismatch
  ctx.call.human.mockResolvedValue({
    approved: true,
    humanId: 'did:passport:alice',
    reason: 'Vendor discount not reflected in original quote.',
    metadata: {
      decidedAt: Date.now(),
      duration: 45000,
      provenanceId: 'prov_human_1',
      escalationId: 'esc_1',
    },
  });

  const result = await invoiceProcessor.execute(ctx, {
    invoicePath: '/test/invoice.pdf',
    expectedTotal: 15000,
  });

  expect(result.success).toBe(true);
  expect(ctx.call.human).toHaveBeenCalledOnce();
  expect(ctx.call.human).toHaveBeenCalledWith(
    expect.objectContaining({ priority: 'high' })
  );
});
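
The test above covers approval only. The sketch below shows one way agent-side logic might map all three outcomes (approval, rejection, timeout) to a result. Everything here is hypothetical: the resolveMismatch helper, the trimmed-down decision shape, and in particular the assumption that a timeout surfaces as a rejected promise. Plain async stubs stand in for ctx.call.human so the example runs without the SDK.

```typescript
// Hypothetical decision shape: only the fields this sketch reads.
interface HumanDecision {
  approved: boolean;
  reason?: string;
}

type AskHuman = (request: { priority: string; question: string }) => Promise<HumanDecision>;

type Outcome = { success: true } | { success: false; reason: string };

// Hypothetical helper: escalate a total mismatch and map each outcome to a
// result. Treating a timeout as a rejected promise is an assumption.
async function resolveMismatch(
  askHuman: AskHuman,
  expectedTotal: number,
  actualTotal: number
): Promise<Outcome> {
  if (expectedTotal === actualTotal) return { success: true };
  try {
    const decision = await askHuman({
      priority: 'high',
      question: `Invoice total ${actualTotal} does not match expected ${expectedTotal}. Approve?`,
    });
    if (decision.approved) return { success: true };
    return { success: false, reason: decision.reason ?? 'rejected by reviewer' };
  } catch {
    return { success: false, reason: 'escalation failed or timed out' };
  }
}

// Plain stubs standing in for ctx.call.human in each scenario.
const approveStub: AskHuman = async () => ({ approved: true });
const rejectStub: AskHuman = async () => ({ approved: false, reason: 'Total is wrong.' });
const timeoutStub: AskHuman = async () => {
  throw new Error('Escalation timed out');
};
```

In a vitest test the equivalent setup would be ctx.call.human.mockResolvedValue({ approved: false, ... }) for a rejection and ctx.call.human.mockRejectedValue(new Error(...)) for a failure, provided the SDK reports timeouts by throwing.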

Testing Memory and Caching

typescript
// Assumes the same imports as the basic test; entityExtractor is the agent under test.
it('caches LLM results in execution memory', async () => {
  const ctx = createMockExecutionContext();
  const cachedResult = { entities: ['Alice', 'Bob'] };

  // Simulate cache hit
  ctx.memory.execution.get.mockResolvedValue(cachedResult);

  const result = await entityExtractor.execute(ctx, { text: 'Alice met Bob.' });

  // Should read from cache, not call LLM
  expect(ctx.memory.execution.get).toHaveBeenCalledWith('entities:Alice met Bob.');
  expect(ctx.llm.complete).not.toHaveBeenCalled();
  expect(result.entities).toEqual(['Alice', 'Bob']);
});
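
The behavior this test checks is a cache-aside read: look in execution memory first and call the LLM only on a miss. A minimal sketch of such an agent follows. The context interface, the prompt text, the assumption that the LLM replies with a JSON array, and the Map-backed stub are all illustrative; only the entities:<text> key format comes from the test above.

```typescript
// Hypothetical context shape: only what this sketch touches.
interface CacheCtx {
  memory: {
    execution: {
      get(key: string): Promise<unknown>;
      set(key: string, value: unknown): Promise<void>;
    };
  };
  llm: { complete(request: { prompt: string }): Promise<{ content: string }> };
}

interface Entities {
  entities: string[];
}

const entityExtractor = {
  async execute(ctx: CacheCtx, input: { text: string }): Promise<Entities> {
    const key = `entities:${input.text}`;
    const cached = (await ctx.memory.execution.get(key)) as Entities | undefined;
    if (cached) return cached; // cache hit: skip the LLM entirely

    // Cache miss: ask the LLM (assumed to reply with a JSON array of names),
    // then store the parsed result for later calls.
    const completion = await ctx.llm.complete({ prompt: `List the people in: ${input.text}` });
    const result: Entities = { entities: JSON.parse(completion.content) as string[] };
    await ctx.memory.execution.set(key, result);
    return result;
  },
};

// Map-backed stub standing in for the mock context; llmCalls counts LLM use.
const cacheStore = new Map<string, unknown>();
let llmCalls = 0;
const cacheCtx: CacheCtx = {
  memory: {
    execution: {
      async get(key) {
        return cacheStore.get(key);
      },
      async set(key, value) {
        cacheStore.set(key, value);
      },
    },
  },
  llm: {
    async complete() {
      llmCalls += 1;
      return { content: '["Alice", "Bob"]' };
    },
  },
};
```

Calling entityExtractor.execute(cacheCtx, { text: 'Alice met Bob.' }) twice should increment llmCalls only once, which is the miss-then-hit counterpart to the cache-hit test above.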

Testing Error Handling

typescript
import { AccessDeniedError } from '@human/agent-sdk';

// Assumes the same test imports as the basic test; notificationAgent is the agent under test.
it('handles missing secret gracefully', async () => {
  const ctx = createMockExecutionContext();

  // Simulate missing secret
  ctx.secrets.get.mockRejectedValue(
    new AccessDeniedError('No access to SLACK_BOT_TOKEN in delegation')
  );

  const result = await notificationAgent.execute(ctx, { message: 'hello' });

  // Agent should degrade gracefully, not throw
  expect(result.notified).toBe(false);
  expect(result.reason).toContain('Slack not configured');
});
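
For the test above to pass, the agent has to catch the access error and return a result instead of throwing. The sketch below shows one possible shape for that logic. It defines a local AccessDeniedError as a stand-in for the SDK class so it runs on its own; the context interface, the sendToSlack helper, and the result type are hypothetical.

```typescript
// Local stand-in for the SDK's AccessDeniedError so the sketch is self-contained.
class AccessDeniedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'AccessDeniedError';
    // Keeps instanceof working when the class is compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical context shape: only the secrets accessor.
interface SecretsCtx {
  secrets: { get(name: string): Promise<string> };
}

type NotifyResult = { notified: true } | { notified: false; reason: string };

// Hypothetical sender; a real agent would call the Slack API here.
async function sendToSlack(_token: string, _message: string): Promise<void> {}

const notificationAgent = {
  async execute(ctx: SecretsCtx, input: { message: string }): Promise<NotifyResult> {
    let token: string;
    try {
      token = await ctx.secrets.get('SLACK_BOT_TOKEN');
    } catch (err) {
      // Only a delegation failure is treated as "not configured";
      // anything else is unexpected and is rethrown.
      if (err instanceof AccessDeniedError) {
        return { notified: false, reason: 'Slack not configured' };
      }
      throw err;
    }
    await sendToSlack(token, input.message);
    return { notified: true };
  },
};

// Stub context whose secret lookup always fails with an access error.
const deniedCtx: SecretsCtx = {
  secrets: {
    async get(name) {
      throw new AccessDeniedError(`No access to ${name} in delegation`);
    },
  },
};
```

Narrowing the catch to AccessDeniedError keeps genuine bugs visible: a network failure or a typo in the secret name still fails the test rather than being reported as "not configured".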

Tips

Test the execute function directly

Call myAgent.execute(ctx, input) — not the handler wrapper. This gives you direct control over ctx.

Use mockResolvedValueOnce for sequential calls

When an agent calls ctx.llm.complete() multiple times, use mockResolvedValueOnce() to return different values for each call in order.
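
With the mock context this is typically written as ctx.llm.complete.mockResolvedValueOnce(first).mockResolvedValueOnce(second). To show the ordering semantics without depending on vitest, the sketch below uses a hand-written queue-backed stub. The respondInOrder helper and the draftAndRefine agent are hypothetical. One difference to note: this stub throws once its queue is empty, whereas a vitest mock falls back to its default implementation.

```typescript
interface Completion {
  content: string;
}

type Complete = (request: { prompt: string }) => Promise<Completion>;

// Queue-backed stub: each call consumes the next queued response, in order.
function respondInOrder(...responses: Completion[]): Complete {
  const queue = responses.slice();
  return async () => {
    const next = queue.shift();
    if (next === undefined) throw new Error('No stubbed response left');
    return next;
  };
}

// Hypothetical two-step agent: write a draft, then ask the LLM to tighten it.
async function draftAndRefine(complete: Complete, topic: string): Promise<string> {
  const draft = await complete({ prompt: `Draft a summary of ${topic}` });
  const refined = await complete({ prompt: `Tighten this: ${draft.content}` });
  return refined.content;
}

// The first call receives 'rough draft', the second receives 'polished summary'.
const sequencedComplete = respondInOrder(
  { content: 'rough draft' },
  { content: 'polished summary' }
);
```

Passing sequencedComplete to draftAndRefine returns the second response, because the agent's final answer comes from its second LLM call.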

Check provenance.log calls

Verify your agent logs the right business events: expect(ctx.provenance.log).toHaveBeenCalledWith(expect.objectContaining({ action: "invoice.validated" })).

Test delegation errors explicitly

Throw AccessDeniedError from mocks to verify your agent degrades gracefully when delegation scope is missing.

See Also