Reference Implementations

beginner · Business
receipt-ocr-processor
Extract structured data from receipt images via LLM vision.
APIs Used

- ctx.files
- ctx.llm
- ctx.credentials

Capabilities Required

- finance/receipt/extract

What this demonstrates
1. ctx.files.read() to load the receipt image or text file
2. ctx.llm.complete() for OCR and structured data extraction
3. ctx.credentials.get() to retrieve an external API key
4. The simplest complete pattern: read → LLM extract → return output
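The full source follows below; in miniature, the four steps can be sketched like this. Note this is an illustrative sketch, not SDK code: the `Ctx` interface is a structural stand-in for the SDK's `ExecutionContext`, trimmed to the three calls the agent relies on, and `extractTotal` is a hypothetical helper.

```typescript
// Illustrative sketch only: Ctx is a structural stand-in for the SDK's
// ExecutionContext, trimmed to the three calls this agent relies on.
interface Ctx {
  files: { read(path: string): Promise<string> };
  llm: {
    complete(req: {
      prompt: Array<{ role: string; content: string }>;
    }): Promise<{ content: string }>;
  };
  credentials: { get(name: string): Promise<string | undefined> };
}

// read → LLM extract → return, with a zero-total fallback on bad JSON
async function extractTotal(ctx: Ctx, path: string): Promise<number> {
  const text = await ctx.files.read(path);  // 1. read the receipt
  await ctx.credentials.get('OCR_API_KEY'); // 3. fetch the secret (a real OCR call would use it)
  const result = await ctx.llm.complete({   // 2. ask the LLM for structured JSON
    prompt: [{ role: 'user', content: `Extract the total as JSON from:\n${text}` }],
  });
  try {
    const parsed = JSON.parse(result.content);
    return typeof parsed.total === 'number' ? parsed.total : 0;
  } catch {
    return 0; // 4. never throw on malformed LLM output
  }
}
```

Because the context is a plain structural type, the helper can be exercised with a hand-written stub and no SDK dependency.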
Source
View on GitHub

```typescript
/**
 * Receipt OCR Processor - Production Reference Agent
 *
 * Canon alignment: KB 105
 * Demonstrates: ctx.files, ctx.llm, ctx.credentials
 *
 * Real use case: Extract data from receipt images via LLM vision or OCR API.
 * Uses secrets for API keys.
 */
import { handler, withProvenanceContext } from '@human/agent-sdk';
import type { ExecutionContext } from '@human/agent-sdk';

export const AGENT_ID = 'receipt-ocr-processor';
export const VERSION = '1.0.0';
export const CAPABILITIES = ['finance/receipt/extract'];

export interface ReceiptOCRInput {
  /** Path to receipt image or text file */
  receipt_path: string;
}

export interface ReceiptOCROutput {
  success: boolean;
  merchant?: string;
  total: number;
  date?: string;
  items?: Array<{ description: string; amount: number }>;
  provenance_id: string;
}

const execute = async (
  ctx: ExecutionContext,
  input: ReceiptOCRInput
): Promise<ReceiptOCROutput> => {
  ctx.log.info('Processing receipt', { path: input.receipt_path });

  const content = await ctx.files.read(input.receipt_path);
  const isImage =
    input.receipt_path.endsWith('.png') ||
    input.receipt_path.endsWith('.jpg') ||
    input.receipt_path.endsWith('.jpeg');

  // Get OCR API credentials (ctx.credentials — progressive permission acquisition)
  const apiKey = await ctx.credentials.get('OCR_API_KEY');

  let textContent: string;
  if (isImage) {
    // In production, send to external OCR service with the API key.
    // The secret is fetched before the branch to demonstrate that
    // credential access is always capability-gated and logged.
    ctx.log.info('Processing image receipt via OCR', {
      size: content.length,
      hasApiKey: !!apiKey,
    });
    textContent = `[Image file, ${content.length} bytes. OCR service called with key ${apiKey ? 'present' : 'missing'}.]`;
  } else {
    textContent = content.toString('utf-8');
  }

  const result = await ctx.llm.complete({
    prompt: [
      {
        role: 'system',
        content: `Extract receipt data as JSON: { "merchant":"", "total":0, "date":"", "items": [{"description":"","amount":0}] }. Only return valid JSON.`,
      },
      {
        role: 'user',
        content: `Extract from receipt:\n\n${textContent}`,
      },
    ],
    temperature: 0.1,
    maxTokens: 1000,
  });

  let merchant: string | undefined;
  let total = 0;
  let date: string | undefined;
  let items: ReceiptOCROutput['items'] = [];

  try {
    const parsed = JSON.parse(result.content) as ReceiptOCROutput;
    merchant = parsed.merchant;
    total = typeof parsed.total === 'number' ? parsed.total : 0;
    date = parsed.date;
    items = parsed.items ?? [];
  } catch {
    // Fallback if LLM returns invalid JSON
    total = 0;
  }

  const provenanceId = await ctx.provenance.log(
    withProvenanceContext(ctx, {
      action: 'receipt:processed',
      status: 'success',
      input: { receipt_path: input.receipt_path },
      output: { merchant, total, item_count: items.length },
    })
  );

  return {
    success: true,
    merchant,
    total,
    date,
    items,
    provenance_id: provenanceId,
  };
};

export default handler({
  name: AGENT_ID,
  id: AGENT_ID,
  version: VERSION,
  capabilities: CAPABILITIES,
  manifest: {
    operations: [
      {
        name: 'extract',
        description: 'Extract merchant, total, date, and line items from a receipt image or text',
        paramsSchema: {
          receipt_path: {
            type: 'string',
            required: true,
            description: 'Path to receipt image or text file',
          },
        },
        resultKind: 'agent.receipt-ocr-processor.result',
      },
    ],
  },
  execute,
});
```

Run the tests
From monorepo root:

```shell
$ pnpm test:agents:reference
$ pnpm test:agents:reference:verbose
```
The reference suite runs all 23 agents with createMockExecutionContext(), verifying every ctx.* API call and output shape.
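One of the things that suite asserts is the output shape. A hand-rolled version of that check might look like the sketch below; `isReceiptOCROutput` is illustrative and not the actual test-suite helper, and only the required fields of the output interface are validated here.

```typescript
// Output interface, copied from the agent source above.
interface ReceiptOCROutput {
  success: boolean;
  merchant?: string;
  total: number;
  date?: string;
  items?: Array<{ description: string; amount: number }>;
  provenance_id: string;
}

// Illustrative shape check, mirroring what a test could assert about
// this agent's result; not the actual test-suite code.
function isReceiptOCROutput(value: unknown): value is ReceiptOCROutput {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.success === 'boolean' &&
    typeof v.total === 'number' &&
    typeof v.provenance_id === 'string' &&
    (v.items === undefined || Array.isArray(v.items))
  );
}
```

The type-predicate return (`value is ReceiptOCROutput`) lets a test narrow an `unknown` agent result before asserting on individual fields.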
See Also
SDK Reference