Reference Implementations

Business

receipt-ocr-processor

beginner

Extract structured data from receipt images via LLM vision.

APIs Used

ctx.files, ctx.llm, ctx.credentials

Capabilities Required

finance/receipt/extract

What this demonstrates

  1. ctx.files.read() to load the receipt file (image or text)
  2. ctx.llm.complete() for OCR and structured data extraction
  3. ctx.credentials.get() to retrieve an external API key
  4. The simplest complete pattern: read → LLM extract → return output
typescript
/**
 * Receipt OCR Processor - Production Reference Agent
 *
 * Canon alignment: KB 105
 * Demonstrates: ctx.files, ctx.llm, ctx.credentials
 *
 * Real use case: Extract data from receipt images via LLM vision or OCR API.
 * Uses secrets for API keys.
 */
import { handler, withProvenanceContext } from '@human/agent-sdk';
import type { ExecutionContext } from '@human/agent-sdk';

export const AGENT_ID = 'receipt-ocr-processor';
export const VERSION = '1.0.0';
export const CAPABILITIES = ['finance/receipt/extract'];

export interface ReceiptOCRInput {
  /** Path to receipt image or text file */
  receipt_path: string;
}

export interface ReceiptOCROutput {
  success: boolean;
  merchant?: string;
  total: number;
  date?: string;
  items?: Array<{ description: string; amount: number }>;
  provenance_id: string;
}

const execute = async (
  ctx: ExecutionContext,
  input: ReceiptOCRInput
): Promise<ReceiptOCROutput> => {
  ctx.log.info('Processing receipt', { path: input.receipt_path });

  const content = await ctx.files.read(input.receipt_path);
  const isImage =
    input.receipt_path.endsWith('.png') ||
    input.receipt_path.endsWith('.jpg') ||
    input.receipt_path.endsWith('.jpeg');

  // Get OCR API credentials (ctx.credentials: progressive permission acquisition)
  const apiKey = await ctx.credentials.get('OCR_API_KEY');

  let textContent: string;
  if (isImage) {
    // In production, send to external OCR service with the API key.
    // The secret is fetched before the branch to demonstrate that
    // credential access is always capability-gated and logged.
    ctx.log.info('Processing image receipt via OCR', {
      size: content.length,
      hasApiKey: !!apiKey,
    });
    textContent = `[Image file, ${content.length} bytes. OCR service called with key ${apiKey ? 'present' : 'missing'}.]`;
  } else {
    textContent = content.toString('utf-8');
  }

  const result = await ctx.llm.complete({
    prompt: [
      {
        role: 'system',
        content: `Extract receipt data as JSON: { "merchant":"", "total":0, "date":"", "items": [{"description":"","amount":0}] }. Only return valid JSON.`,
      },
      {
        role: 'user',
        content: `Extract from receipt:\n\n${textContent}`,
      },
    ],
    temperature: 0.1,
    maxTokens: 1000,
  });

  let merchant: string | undefined;
  let total = 0;
  let date: string | undefined;
  let items: ReceiptOCROutput['items'] = [];
  try {
    const parsed = JSON.parse(result.content) as ReceiptOCROutput;
    merchant = parsed.merchant;
    total = typeof parsed.total === 'number' ? parsed.total : 0;
    date = parsed.date;
    items = parsed.items ?? [];
  } catch {
    // Fallback if the LLM returns invalid JSON: keep the defaults
    // (total 0, no items). Note that success is still reported as true.
    total = 0;
  }

  const provenanceId = await ctx.provenance.log(
    withProvenanceContext(ctx, {
      action: 'receipt:processed',
      status: 'success',
      input: { receipt_path: input.receipt_path },
      output: { merchant, total, item_count: items.length },
    })
  );

  return {
    success: true,
    merchant,
    total,
    date,
    items,
    provenance_id: provenanceId,
  };
};

export default handler({
  name: AGENT_ID,
  id: AGENT_ID,
  version: VERSION,
  capabilities: CAPABILITIES,
  manifest: {
    operations: [
      {
        name: 'extract',
        description: 'Extract merchant, total, date, and line items from a receipt image or text',
        paramsSchema: {
          receipt_path: { type: 'string', required: true, description: 'Path to receipt image or text file' },
        },
        resultKind: 'agent.receipt-ocr-processor.result',
      },
    ],
  },
  execute,
});
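
The listing's try/catch falls back to a zero total whenever the LLM reply is not valid JSON, and it casts the parsed value without checking its fields. A stricter parsing step could look like the sketch below. Everything in it is an assumption for illustration: `parseReceiptJson`, `stripFence`, and the `ok` flag are hypothetical names, not part of @human/agent-sdk, and the Markdown-fence handling assumes the model sometimes wraps its JSON in a fence.

```typescript
// Hypothetical helper for illustration; not part of @human/agent-sdk.
interface ReceiptItem {
  description: string;
  amount: number;
}

interface ParsedReceipt {
  ok: boolean; // false when the reply was not a JSON object
  merchant?: string;
  total: number;
  date?: string;
  items: ReceiptItem[];
}

// Built at runtime so the marker does not appear literally in the source.
const FENCE = '`'.repeat(3);

// Remove a leading and trailing Markdown fence if the reply is wrapped in one.
function stripFence(raw: string): string {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    const firstNewline = text.indexOf('\n');
    text = firstNewline === -1 ? '' : text.slice(firstNewline + 1);
  }
  if (text.endsWith(FENCE)) {
    text = text.slice(0, -FENCE.length);
  }
  return text.trim();
}

function parseReceiptJson(raw: string): ParsedReceipt {
  const empty: ParsedReceipt = { ok: false, total: 0, items: [] };

  let data: unknown;
  try {
    data = JSON.parse(stripFence(raw));
  } catch {
    return empty;
  }
  if (typeof data !== 'object' || data === null || Array.isArray(data)) {
    return empty;
  }
  const obj = data as Record<string, unknown>;

  // Keep only line items that have the expected field types.
  const items: ReceiptItem[] = [];
  if (Array.isArray(obj.items)) {
    for (const entry of obj.items as unknown[]) {
      if (typeof entry !== 'object' || entry === null) continue;
      const item = entry as Record<string, unknown>;
      if (typeof item.description === 'string' && typeof item.amount === 'number') {
        items.push({ description: item.description, amount: item.amount });
      }
    }
  }

  return {
    ok: true,
    merchant: typeof obj.merchant === 'string' && obj.merchant !== '' ? obj.merchant : undefined,
    total: typeof obj.total === 'number' && Number.isFinite(obj.total) ? obj.total : 0,
    date: typeof obj.date === 'string' && obj.date !== '' ? obj.date : undefined,
    items,
  };
}
```

With a helper along these lines, the agent could pass `result.content` to `parseReceiptJson` and use the returned `ok` flag to decide what to report, rather than treating a failed parse and a genuine zero total as the same outcome.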

Run the tests

From the monorepo root:

$ pnpm test:agents:reference

$ pnpm test:agents:reference:verbose

The reference suite runs all 23 agents with createMockExecutionContext(), verifying every ctx.* API call and output shape.
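
To illustrate what "verifying every ctx.* API call and output shape" can mean in practice, here is a minimal sketch that uses a hand-rolled recording context. It does not use the SDK's createMockExecutionContext(), whose signature is not shown here. `MiniContext`, `createRecordingContext`, `extractTotal`, and `runSketch` are hypothetical names, and the context is reduced to the three APIs this agent touches, with file contents simplified to strings.

```typescript
// Hypothetical, hand-rolled stand-in; the SDK's createMockExecutionContext() may differ.
type Call = { api: string; args: unknown[] };

interface MiniContext {
  files: { read(path: string): Promise<string> };
  credentials: { get(name: string): Promise<string | undefined> };
  llm: { complete(request: { prompt: string }): Promise<{ content: string }> };
}

// Build a context whose methods return canned values and record each call.
function createRecordingContext(receiptText: string, llmReply: string) {
  const calls: Call[] = [];
  const ctx: MiniContext = {
    files: {
      read: async (path) => {
        calls.push({ api: 'files.read', args: [path] });
        return receiptText;
      },
    },
    credentials: {
      get: async (name) => {
        calls.push({ api: 'credentials.get', args: [name] });
        return 'test-key';
      },
    },
    llm: {
      complete: async (request) => {
        calls.push({ api: 'llm.complete', args: [request] });
        return { content: llmReply };
      },
    },
  };
  return { ctx, calls };
}

// Reduced version of the read -> LLM extract -> return output pattern.
async function extractTotal(
  ctx: MiniContext,
  receiptPath: string
): Promise<{ success: boolean; total: number }> {
  const text = await ctx.files.read(receiptPath);
  await ctx.credentials.get('OCR_API_KEY');
  const result = await ctx.llm.complete({ prompt: 'Extract from receipt:\n\n' + text });

  let total = 0;
  try {
    const parsed = JSON.parse(result.content);
    if (parsed !== null && typeof parsed === 'object' && typeof parsed.total === 'number') {
      total = parsed.total;
    }
  } catch {
    // Invalid JSON: keep the default total of 0.
  }
  return { success: true, total };
}

// Run the reduced agent against the recording context and check the result.
async function runSketch(): Promise<string[]> {
  const { ctx, calls } = createRecordingContext('Coffee 4.50', '{"total": 4.5}');
  const output = await extractTotal(ctx, 'receipts/demo.txt');
  if (output.success !== true || output.total !== 4.5) {
    throw new Error('unexpected output shape or value');
  }
  return calls.map((call) => call.api);
}
```

A test built this way can assert both the returned object and the ordered list of recorded API names, which is one plausible reading of how the suite checks call coverage.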

See Also