Testing

Test agents in isolation with createMockExecutionContext. No infrastructure, no real LLM calls, no secrets required — just fast, deterministic unit tests.

Why it exists

The mock context lets you test agents without real APIs or a ledger. Every ctx.* method is a mock you can configure and assert on, so CI needs no credentials and tests do not flake on external services.

How it makes life better

With the mock context you get fast, deterministic tests and precise assertions (e.g. "ctx.escalate was called with priority: high"). Relying on integration tests alone means slow CI and failures that are hard to debug.

Setup

The testing utilities are in the @human/agent-sdk/testing sub-path. They use vitest mock functions internally and work with any vitest or jest setup.

Basic Agent Test

typescript

import { describe, it, expect } from 'vitest';
import { createMockExecutionContext } from '@human/agent-sdk/testing';
import myAgent from './my-agent';

describe('my-agent', () => {
  it('processes a question and returns an answer', async () => {
    const ctx = createMockExecutionContext();

    // Configure the LLM mock to return a specific response
    ctx.llm.complete.mockResolvedValue({
      content: 'The answer is 42.',
      cost: { usd: 0.001 },
      usage: { promptTokens: 50, completionTokens: 10, totalTokens: 60 },
      model: 'gpt-4o',
      finishReason: 'stop',
      provenanceId: 'prov_test_1',
    });

    const result = await myAgent.execute(ctx, { question: 'What is the meaning of life?' });

    expect(result.answer).toBe('The answer is 42.');
    expect(ctx.llm.complete).toHaveBeenCalledOnce();
    expect(ctx.llm.complete).toHaveBeenCalledWith(
      expect.objectContaining({ prompt: 'What is the meaning of life?' })
    );
  });
});
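
The response object passed to ctx.llm.complete above tends to be repeated in every test. One way to cut the repetition is a small fixture builder. This is a minimal sketch: makeLlmResponse and the LlmResponse interface are hypothetical names, and the field list is copied from the example above rather than from the SDK's own types, which may differ.

```typescript
// Assumed shape, taken from the fields used in the example above.
// The SDK's real response type may have more or different fields.
interface LlmResponse {
  content: string;
  cost: { usd: number };
  usage: { promptTokens: number; completionTokens: number; totalTokens: number };
  model: string;
  finishReason: string;
  provenanceId: string;
}

// Hypothetical helper: returns a complete response and lets each test
// override only the fields it cares about.
function makeLlmResponse(overrides: Partial<LlmResponse> = {}): LlmResponse {
  return {
    content: '',
    cost: { usd: 0.001 },
    usage: { promptTokens: 50, completionTokens: 10, totalTokens: 60 },
    model: 'gpt-4o',
    finishReason: 'stop',
    provenanceId: 'prov_test_1',
    ...overrides,
  };
}

const answerResponse = makeLlmResponse({ content: 'The answer is 42.' });
```

In a test this would be used as ctx.llm.complete.mockResolvedValue(makeLlmResponse({ content: 'The answer is 42.' })), which keeps the assertion-relevant field visible and the rest out of the way.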

Mock Context API

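createMockExecutionContext() returns a mock ctx whose methods are vi.fn() mocks. The one exception is ctx.escalate, a plain stub: wrap it with vi.spyOn(ctx, 'escalate') before configuring or asserting on it. Configure return values, assert call counts, and inspect arguments using the standard vitest/jest mock APIs.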

typescript

import { createMockExecutionContext } from '@human/agent-sdk/testing';

const ctx = createMockExecutionContext({
  // Optional overrides
  executionId: 'exec_test_123',
  agentId: 'did:agent:test-agent',
  orgId: 'org_test',
});

// Most ctx.* methods are vi.fn() mocks, ready to configure or assert on:
ctx.llm.complete              // mock function
ctx.llm.embed                 // mock function
ctx.llm.stream                // mock async generator
ctx.call.agent                // mock function
ctx.call.parallel             // mock function
ctx.memory.execution.get      // mock function
ctx.memory.execution.set      // mock function
ctx.memory.persistent.search  // mock function
ctx.provenance.log            // mock function
ctx.config.get                // mock function
ctx.credentials.get           // mock function
ctx.prompts.load              // mock function

// The exception is a plain stub:
ctx.escalate                  // stub: wrap with vi.spyOn(ctx, 'escalate') to use mockResolvedValue and assertions

Testing Human Escalation

Mock ctx.escalate to simulate approval, rejection, or timeout — and verify your agent handles each case correctly.

typescript

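import { it, expect, vi } from 'vitest';
import { createMockExecutionContext } from '@human/agent-sdk/testing';
import invoiceProcessor from './invoice-processor'; // hypothetical path to the agent under test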

it('escalates to human when invoice total mismatches', async () => {
  const ctx = createMockExecutionContext();

  // Simulate invoice extraction returning a mismatched total
  ctx.llm.complete.mockResolvedValueOnce({
    content: JSON.stringify({ total: 14200, line_items: [] }),
    cost: { usd: 0.002 },
    provenanceId: 'prov_1',
    finishReason: 'stop',
    model: 'gpt-4o',
    usage: { promptTokens: 100, completionTokens: 50, totalTokens: 150 },
  });

  // Simulate human approving the mismatch
  vi.spyOn(ctx, 'escalate').mockResolvedValue({
    approved: true,
    humanId: 'did:passport:alice',
    reason: 'Vendor discount not reflected in original quote.',
    metadata: {
      decidedAt: Date.now(),
      duration: 45000,
      provenanceId: 'prov_human_1',
      escalationId: 'esc_1',
    },
  } as any);

  const result = await invoiceProcessor.execute(ctx, {
    invoicePath: '/test/invoice.pdf',
    expectedTotal: 15000,
  });

  expect(result.success).toBe(true);
  expect(ctx.escalate).toHaveBeenCalledOnce();
  expect(ctx.escalate).toHaveBeenCalledWith(
    expect.objectContaining({ priority: 'high' })
  );
});
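
The test above covers approval only. For the rejection case, a fixture builder keeps the result shape in one place. This is a sketch under assumptions: makeEscalationResult and the EscalationResult interface are hypothetical names, and the fields mirror the object mocked above, not the SDK's published type.

```typescript
// Assumed shape, copied from the approval example above.
interface EscalationResult {
  approved: boolean;
  humanId: string;
  reason: string;
  metadata: {
    decidedAt: number;
    duration: number;
    provenanceId: string;
    escalationId: string;
  };
}

// Hypothetical helper: defaults to an approval, with a fixed decidedAt
// (instead of Date.now()) so the fixture is identical on every run.
function makeEscalationResult(overrides: Partial<EscalationResult> = {}): EscalationResult {
  return {
    approved: true,
    humanId: 'did:passport:alice',
    reason: '',
    metadata: {
      decidedAt: 0,
      duration: 45000,
      provenanceId: 'prov_human_1',
      escalationId: 'esc_1',
    },
    ...overrides,
  };
}

const rejectedResult = makeEscalationResult({
  approved: false,
  reason: 'Total does not match the purchase order.',
});
```

A rejection test would pass it to vi.spyOn(ctx, 'escalate').mockResolvedValue(rejectedResult as any) and then assert that the agent reports failure. How a timeout surfaces (a rejected promise, a flag on the result, or something else) is not shown here, so check the SDK's behaviour before mocking that case.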

Testing Memory and Caching

typescript

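import { it, expect } from 'vitest';
import { createMockExecutionContext } from '@human/agent-sdk/testing';
import entityExtractor from './entity-extractor'; // hypothetical path to the agent under test

it('caches LLM results in execution memory', async () => {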
  const ctx = createMockExecutionContext();
  const cachedResult = { entities: ['Alice', 'Bob'] };

  // Simulate cache hit
  ctx.memory.execution.get.mockResolvedValue(cachedResult);

  const result = await entityExtractor.execute(ctx, { text: 'Alice met Bob.' });

  // Should read from cache, not call LLM
  expect(ctx.memory.execution.get).toHaveBeenCalledWith('entities:Alice met Bob.');
  expect(ctx.llm.complete).not.toHaveBeenCalled();
  expect(result.entities).toEqual(['Alice', 'Bob']);
});
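
The test above fixes the cache to always hit. To exercise a miss followed by a hit in one test, the get and set mocks can be backed by a real Map. A minimal sketch, assuming get(key) and set(key, value) both return promises; only the get(key) call shape appears in the example above, and createFakeExecutionMemory is a hypothetical helper name.

```typescript
// Hypothetical in-memory fake for ctx.memory.execution.
// Assumption: set takes (key, value); the source only shows get(key).
function createFakeExecutionMemory() {
  const store = new Map<string, unknown>();
  return {
    store,
    // Resolves to undefined on a miss, like Map.get.
    get: async (key: string): Promise<unknown> => store.get(key),
    set: async (key: string, value: unknown): Promise<void> => {
      store.set(key, value);
    },
  };
}

const fakeMemory = createFakeExecutionMemory();
// The write reaches the Map before the returned promise settles,
// so the store can be inspected immediately.
void fakeMemory.set('entities:Alice met Bob.', { entities: ['Alice', 'Bob'] });
```

Wire it in with ctx.memory.execution.get.mockImplementation(fakeMemory.get) and ctx.memory.execution.set.mockImplementation(fakeMemory.set). The mocks still record calls, so call-count and argument assertions keep working.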

Testing Error Handling

typescript

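import { it, expect, vi } from 'vitest';
import { AccessDeniedError } from '@human/agent-sdk';
import { createMockExecutionContext } from '@human/agent-sdk/testing';
import notificationAgent from './notification-agent'; // hypothetical path to the agent under test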

it('handles missing secret gracefully', async () => {
  const ctx = createMockExecutionContext();

  // Simulate missing credential
  vi.spyOn(ctx.credentials, 'get').mockRejectedValue(
    new AccessDeniedError('No access to SLACK_BOT_TOKEN in delegation')
  );

  const result = await notificationAgent.execute(ctx, { message: 'hello' });

  // Agent should degrade gracefully, not throw
  expect(result.notified).toBe(false);
  expect(result.reason).toContain('Slack not configured');
});

Tips

Test the execute function directly

Call myAgent.execute(ctx, input) — not the handler wrapper. This gives you direct control over ctx.

Use mockResolvedValueOnce for sequential calls

When an agent calls ctx.llm.complete() multiple times, use mockResolvedValueOnce() to return different values for each call in order.
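
Once its queued values are used up, a mockResolvedValueOnce chain falls back to the mock's default behaviour, so an unexpected extra call can go unnoticed. If you would rather have the test fail loudly, a strict queue is one option. This is a dependency-free sketch; queueResponses is a hypothetical helper, not part of the SDK or vitest.

```typescript
// Hypothetical helper: hands out responses in order and throws once
// the queue is empty, so an extra call cannot pass silently.
function queueResponses<T>(responses: readonly T[]) {
  const pending = [...responses];
  const calls: unknown[][] = [];
  const respond = async (...args: unknown[]): Promise<T> => {
    calls.push(args);
    if (pending.length === 0) {
      throw new Error(`Unexpected call #${calls.length}: no responses left in the queue`);
    }
    return pending.shift() as T;
  };
  return { respond, calls, remaining: () => pending.length };
}

const llmQueue = queueResponses(['first draft', 'final answer']);
// The first call consumes 'first draft' and leaves one response queued.
void llmQueue.respond({ prompt: 'Draft a reply' });
```

Plug it in with ctx.llm.complete.mockImplementation(llmQueue.respond), queueing full response objects rather than the strings used here. Asserting llmQueue.remaining() is 0 at the end of the test also catches an agent that made fewer calls than expected.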

Check provenance.log calls

Verify your agent logs the right business events: expect(ctx.provenance.log).toHaveBeenCalledWith(expect.objectContaining({ action: "invoice.validated" })).
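
When the order of events matters, it can help to reduce the recorded calls to a list of action names and compare that list in one assertion. A minimal sketch, assuming ctx.provenance.log receives an object with a string action as its first argument, as in the assertion above. loggedActions is a hypothetical helper and 'invoice.extracted' is an invented action name used only as sample data.

```typescript
// Pulls the `action` of each logged event out of a mock's recorded calls.
// Calls whose first argument has no string `action` are skipped.
function loggedActions(calls: readonly unknown[][]): string[] {
  const actions: string[] = [];
  for (const [first] of calls) {
    if (typeof first === 'object' && first !== null && 'action' in first) {
      const action = (first as { action: unknown }).action;
      if (typeof action === 'string') actions.push(action);
    }
  }
  return actions;
}

// Sample data in the shape of `mock.calls`: one argument array per call.
const sampleCalls: unknown[][] = [
  [{ action: 'invoice.extracted' }],
  [{ action: 'invoice.validated', total: 14200 }],
];
const actionsInOrder = loggedActions(sampleCalls);
```

In a test, pass the mock's own record: expect(loggedActions(ctx.provenance.log.mock.calls)).toEqual(['invoice.extracted', 'invoice.validated']).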

Test delegation errors explicitly

Throw AccessDeniedError from mocks to verify your agent degrades gracefully when delegation scope is missing.
