Testing
Test agents in isolation with createMockExecutionContext. No infrastructure, no real LLM calls, no secrets required — just fast, deterministic unit tests.
Why it exists
Tests run without real APIs or a ledger. Every ctx.* method is a mock you can configure and assert on, so CI needs no credentials and external services cannot cause flaky runs.
How it makes life better
With the mock context you get fast, deterministic tests and clear assertions (e.g. "ctx.call.human was called with priority: high"). Rely on integration tests alone and you get slow CI and failures that are hard to debug.
Setup
The testing utilities are in the @human/agent-sdk/testing sub-path. They use vitest mock functions internally and work with any vitest or jest setup.
Basic Agent Test
import { describe, it, expect, vi } from 'vitest';
import { createMockExecutionContext } from '@human/agent-sdk/testing';
import myAgent from './my-agent';

describe('my-agent', () => {
  it('processes a question and returns an answer', async () => {
    const ctx = createMockExecutionContext();

    // Configure the LLM mock to return a specific response
    ctx.llm.complete.mockResolvedValue({
      content: 'The answer is 42.',
      cost: { usd: 0.001 },
      usage: { promptTokens: 50, completionTokens: 10, totalTokens: 60 },
      model: 'gpt-4o',
      finishReason: 'stop',
      provenanceId: 'prov_test_1',
    });

    const result = await myAgent.execute(ctx, { question: 'What is the meaning of life?' });

    expect(result.answer).toBe('The answer is 42.');
    expect(ctx.llm.complete).toHaveBeenCalledOnce();
    expect(ctx.llm.complete).toHaveBeenCalledWith(
      expect.objectContaining({ prompt: 'What is the meaning of life?' })
    );
  });
});

Mock Context API
createMockExecutionContext() returns a mock ctx where every method is a vi.fn(). Configure return values, assert call counts, and inspect arguments using standard vitest/jest mock APIs.
import { createMockExecutionContext } from '@human/agent-sdk/testing';

const ctx = createMockExecutionContext({
  // Optional overrides
  executionId: 'exec_test_123',
  agentId: 'did:agent:test-agent',
  orgId: 'org_test',
});

// All ctx.* methods are vi.fn() mocks ready to spy on or configure:
ctx.llm.complete               // mock function
ctx.llm.embed                  // mock function
ctx.llm.stream                 // mock async generator
ctx.call.agent                 // mock function
ctx.call.human                 // mock function
ctx.call.parallel              // mock function
ctx.memory.execution.get       // mock function
ctx.memory.execution.set       // mock function
ctx.memory.persistent.search   // mock function
ctx.provenance.log             // mock function
ctx.secrets.get                // mock function
ctx.prompts.load               // mock function

Testing Human Escalation
Mock ctx.call.human to simulate human approval, rejection, or timeout — and verify your agent handles each case correctly.
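From the agent's side, the three outcomes could be handled along the lines of the sketch below. It is an illustration under stated assumptions rather than the SDK's behavior: the helper resolveEscalation, the question field on the request, and the EscalationTimeoutError class are all hypothetical, and the idea that a timeout surfaces as a rejected promise is an assumption. Only the approved, humanId, and reason fields and the priority option come from the examples in this document.

```typescript
// Subset of the decision shape used in this document's examples.
interface HumanDecision {
  approved: boolean;
  humanId: string;
  reason?: string;
}

// ASSUMPTION: a timeout surfaces as a rejected promise. This class is a local
// stand-in for illustration, not an SDK export.
class EscalationTimeoutError extends Error {}

type Outcome = "approved" | "rejected" | "timed_out";

// Hypothetical request shape: `priority` appears in this document,
// `question` is invented for the sketch.
type CallHuman = (req: { priority: string; question: string }) => Promise<HumanDecision>;

// Hypothetical agent-side helper that maps each escalation result to an outcome.
async function resolveEscalation(callHuman: CallHuman, question: string): Promise<Outcome> {
  try {
    const decision = await callHuman({ priority: "high", question });
    return decision.approved ? "approved" : "rejected";
  } catch (err) {
    if (err instanceof EscalationTimeoutError) {
      return "timed_out";
    }
    throw err; // unexpected failures should still surface
  }
}
```

In a vitest test, the rejection branch would be driven by ctx.call.human.mockResolvedValue({ approved: false, ... }) and, if the timeout assumption holds for your setup, the timeout branch by ctx.call.human.mockRejectedValue(...).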
it('escalates to human when invoice total mismatches', async () => {
  const ctx = createMockExecutionContext();

  // Simulate invoice extraction returning a mismatched total
  ctx.llm.complete.mockResolvedValueOnce({
    content: JSON.stringify({ total: 14200, line_items: [] }),
    cost: { usd: 0.002 },
    provenanceId: 'prov_1',
    finishReason: 'stop',
    model: 'gpt-4o',
    usage: { promptTokens: 100, completionTokens: 50, totalTokens: 150 },
  });

  // Simulate human approving the mismatch
  ctx.call.human.mockResolvedValue({
    approved: true,
    humanId: 'did:passport:alice',
    reason: 'Vendor discount not reflected in original quote.',
    metadata: {
      decidedAt: Date.now(),
      duration: 45000,
      provenanceId: 'prov_human_1',
      escalationId: 'esc_1',
    },
  });

  const result = await invoiceProcessor.execute(ctx, {
    invoicePath: '/test/invoice.pdf',
    expectedTotal: 15000,
  });

  expect(result.success).toBe(true);
  expect(ctx.call.human).toHaveBeenCalledOnce();
  expect(ctx.call.human).toHaveBeenCalledWith(
    expect.objectContaining({ priority: 'high' })
  );
});

Testing Memory and Caching
it('caches LLM results in execution memory', async () => {
  const ctx = createMockExecutionContext();
  const cachedResult = { entities: ['Alice', 'Bob'] };

  // Simulate cache hit
  ctx.memory.execution.get.mockResolvedValue(cachedResult);

  const result = await entityExtractor.execute(ctx, { text: 'Alice met Bob.' });

  // Should read from cache, not call LLM
  expect(ctx.memory.execution.get).toHaveBeenCalledWith('entities:Alice met Bob.');
  expect(ctx.llm.complete).not.toHaveBeenCalled();
  expect(result.entities).toEqual(['Alice', 'Bob']);
});

Testing Error Handling
import { AccessDeniedError } from '@human/agent-sdk';

it('handles missing secret gracefully', async () => {
  const ctx = createMockExecutionContext();

  // Simulate missing secret
  ctx.secrets.get.mockRejectedValue(
    new AccessDeniedError('No access to SLACK_BOT_TOKEN in delegation')
  );

  const result = await notificationAgent.execute(ctx, { message: 'hello' });

  // Agent should degrade gracefully, not throw
  expect(result.notified).toBe(false);
  expect(result.reason).toContain('Slack not configured');
});

Tips
Test the execute function directly
Call myAgent.execute(ctx, input) — not the handler wrapper. This gives you direct control over ctx.
Use mockResolvedValueOnce for sequential calls
When an agent calls ctx.llm.complete() multiple times, use mockResolvedValueOnce() to return different values for each call in order.
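The ordering guarantee is easiest to see with a small, framework-free sketch. The queuedStub helper below is a hand-written stand-in that mimics the "one queued value per call, in order" behavior so the example runs without vitest; it is not how vitest implements it. The draftThenSummarize agent and the Completion shape are hypothetical and exist only for illustration.

```typescript
// Hypothetical shape of an LLM completion; only `content` matters here.
interface Completion {
  content: string;
}

type Complete = (req: { prompt: string }) => Promise<Completion>;

// Hand-written stand-in for a mock configured with mockResolvedValueOnce():
// each call consumes the next queued value, in the order it was queued.
function queuedStub(responses: Completion[]): { complete: Complete; calls: string[] } {
  const queue = [...responses];
  const calls: string[] = [];
  const complete: Complete = async (req) => {
    calls.push(req.prompt);
    const next = queue.shift();
    if (next === undefined) {
      throw new Error("stub called more times than values were queued");
    }
    return next;
  };
  return { complete, calls };
}

// Hypothetical two-step agent: draft an answer, then ask for a summary of it.
async function draftThenSummarize(
  llm: { complete: Complete },
  question: string,
): Promise<{ draft: string; summary: string }> {
  const draft = await llm.complete({ prompt: question });
  const summary = await llm.complete({ prompt: `Summarize: ${draft.content}` });
  return { draft: draft.content, summary: summary.content };
}
```

In a vitest test the same ordering comes from chaining ctx.llm.complete.mockResolvedValueOnce(first).mockResolvedValueOnce(second) before calling the agent; the second prompt can then be asserted to contain the first response.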
Check provenance.log calls
Verify your agent logs the right business events: expect(ctx.provenance.log).toHaveBeenCalledWith(expect.objectContaining({ action: "invoice.validated" })).
Test delegation errors explicitly
Throw AccessDeniedError from mocks to verify your agent degrades gracefully when delegation scope is missing.
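As a rough illustration of what "degrades gracefully" can mean on the agent side, the sketch below turns a denied secret into a result value instead of an exception. Everything here is an assumption made for the example: the AccessDeniedError class is a local stand-in so the code runs without the SDK, and the notify function, the NotifyCtx type, and the send callback are hypothetical. The secret name and the 'Slack not configured' reason mirror the test in the Testing Error Handling section.

```typescript
// Local stand-in for the SDK's AccessDeniedError so the sketch runs on its own.
class AccessDeniedError extends Error {}

// Minimal slice of the context that this hypothetical agent needs.
interface NotifyCtx {
  secrets: { get: (name: string) => Promise<string> };
}

interface NotifyResult {
  notified: boolean;
  reason?: string;
}

// Hypothetical agent body: a missing delegation scope becomes a result, not a throw.
// `send` is an illustrative callback standing in for the real Slack call.
async function notify(
  ctx: NotifyCtx,
  input: { message: string },
  send: (token: string, message: string) => Promise<void>,
): Promise<NotifyResult> {
  try {
    const token = await ctx.secrets.get("SLACK_BOT_TOKEN");
    await send(token, input.message);
    return { notified: true };
  } catch (err) {
    if (err instanceof AccessDeniedError) {
      return { notified: false, reason: "Slack not configured" };
    }
    throw err; // anything else is unexpected and should still fail the test
  }
}
```

Rethrowing errors other than AccessDeniedError keeps the degradation narrow, so a test that mocks a different failure still fails loudly rather than passing with notified: false.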