Business

receipt-ocr-processor

beginner

Extract structured data from receipt images via LLM vision.

APIs Used

ctx.files, ctx.llm, ctx.credentials

Capabilities Required

finance/receipt/extract

What this demonstrates

1. ctx.files.read() to load the receipt image or text file
2. ctx.llm.complete() for OCR and structured data extraction
3. ctx.credentials.get() to retrieve an external API key
4. The simplest complete pattern: read → LLM extract → return output
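
The read → LLM extract → return pattern in the last item can be sketched in isolation. This is a minimal sketch, not the SDK's API: `MiniContext`, `MiniReceipt`, `extractReceipt`, and `stubContext` are hypothetical names introduced here so the example runs without `@human/agent-sdk`, and the stub returns canned values instead of calling a model.

```typescript
// Hypothetical, trimmed-down stand-ins for the execution context. These are
// NOT the real @human/agent-sdk types; they exist so the sketch runs alone.
interface MiniContext {
  files: { read(path: string): Promise<string> };
  llm: { complete(request: { prompt: string }): Promise<{ content: string }> };
}

interface MiniReceipt {
  success: boolean;
  merchant: string | null;
  total: number;
}

async function extractReceipt(ctx: MiniContext, path: string): Promise<MiniReceipt> {
  // Step 1: read the receipt text.
  const text = await ctx.files.read(path);

  // Step 2: ask the model to return JSON only.
  const reply = await ctx.llm.complete({
    prompt: `Return only JSON like {"merchant":"","total":0} for this receipt:\n\n${text}`,
  });

  // Step 3: parse defensively and return a typed output.
  try {
    const parsed: unknown = JSON.parse(reply.content);
    const record = (typeof parsed === 'object' && parsed !== null ? parsed : {}) as Record<string, unknown>;
    const merchant = record['merchant'];
    const total = record['total'];
    return {
      success: typeof total === 'number',
      merchant: typeof merchant === 'string' ? merchant : null,
      total: typeof total === 'number' ? total : 0,
    };
  } catch {
    // Invalid JSON from the model: report failure instead of throwing.
    return { success: false, merchant: null, total: 0 };
  }
}

// Stub context with canned responses (assumed values, for illustration only).
const stubContext: MiniContext = {
  files: { read: async () => 'ACME MART\nTOTAL 12.50' },
  llm: { complete: async () => ({ content: '{"merchant":"ACME MART","total":12.5}' }) },
};
```

Calling `extractReceipt(stubContext, 'receipt.txt')` resolves to an object with `success: true`, `merchant: 'ACME MART'`, and `total: 12.5`, because both stubs return fixed values.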

Source

typescript

/**
 * Receipt OCR Processor - Production Reference Agent
 *
 * Canon alignment: KB 105
 * Demonstrates: ctx.files, ctx.llm, ctx.credentials
 *
 * Real use case: Extract data from receipt images via LLM vision or OCR API.
 * Uses secrets for API keys.
 */

import { handler, withProvenanceContext } from '@human/agent-sdk';
import type { ExecutionContext } from '@human/agent-sdk';

export const AGENT_ID = 'receipt-ocr-processor';
export const VERSION = '1.0.0';
export const CAPABILITIES = ['finance/receipt/extract'];

export interface ReceiptOCRInput {
  /** Path to receipt image or text file */
  receipt_path: string;
}

export interface ReceiptOCROutput {
  success: boolean;
  merchant?: string;
  total: number;
  date?: string;
  items?: Array<{ description: string; amount: number }>;
  provenance_id: string;
}

const execute = async (
  ctx: ExecutionContext,
  input: ReceiptOCRInput
): Promise<ReceiptOCROutput> => {
  ctx.log.info('Processing receipt', { path: input.receipt_path });

  const content = await ctx.files.read(input.receipt_path);
  // Match image extensions case-insensitively so files such as IMG_0001.JPG
  // are also treated as images.
  const isImage = /\.(png|jpe?g)$/i.test(input.receipt_path);

  // Get OCR API credentials (ctx.credentials — progressive permission acquisition)
  const apiKey = await ctx.credentials.get('OCR_API_KEY');

  let textContent: string;
  if (isImage) {
    // In production, send to external OCR service with the API key.
    // The secret is fetched before the branch to demonstrate that
    // credential access is always capability-gated and logged.
    ctx.log.info('Processing image receipt via OCR', {
      size: content.length,
      hasApiKey: !!apiKey,
    });
    textContent = `[Image file, ${content.length} bytes. OCR service called with key ${apiKey ? 'present' : 'missing'}.]`;
  } else {
    textContent = content.toString('utf-8');
  }

  const result = await ctx.llm.complete({
    prompt: [
      {
        role: 'system',
        content: `Extract receipt data as JSON: { "merchant":"", "total":0, "date":"", "items": [{"description":"","amount":0}] }. Only return valid JSON.`,
      },
      {
        role: 'user',
        content: `Extract from receipt:\n\n${textContent}`,
      },
    ],
    temperature: 0.1,
    maxTokens: 1000,
  });

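  let merchant: string | undefined;
  let total = 0;
  let date: string | undefined;
  let items: ReceiptOCROutput['items'] = [];
  let parsedOk = false;

  try {
    // The LLM reply is untrusted, so treat every field as optional.
    const parsed = JSON.parse(result.content) as Partial<ReceiptOCROutput>;
    merchant = parsed.merchant;
    total = typeof parsed.total === 'number' ? parsed.total : 0;
    date = parsed.date;
    items = Array.isArray(parsed.items) ? parsed.items : [];
    parsedOk = typeof parsed.total === 'number';
  } catch {
    // The LLM returned invalid JSON: keep the defaults and report failure below.
    ctx.log.info('Receipt JSON could not be parsed', { path: input.receipt_path });
  }

  const provenanceId = await ctx.provenance.log(
    withProvenanceContext(ctx, {
      action: 'receipt:processed',
      status: 'success',
      input: { receipt_path: input.receipt_path },
      output: { merchant, total, item_count: items.length },
    })
  );

  return {
    // True only when the reply parsed as JSON and contained a numeric total.
    success: parsedOk,
    merchant,
    total,
    date,
    items,
    provenance_id: provenanceId,
  };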
};

export default handler({
  name: AGENT_ID,
  id: AGENT_ID,
  version: VERSION,
  capabilities: CAPABILITIES,
  manifest: {
    operations: [
      {
        name: 'extract',
        description: 'Extract merchant, total, date, and line items from a receipt image or text',
        paramsSchema: {
          receipt_path: { type: 'string', required: true, description: 'Path to receipt image or text file' },
        },
        resultKind: 'agent.receipt-ocr-processor.result',
      },
    ],
  },
  execute,
});
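
The agent above parses the model reply with a bare `JSON.parse`. Models sometimes wrap JSON in a Markdown code fence or return fields of the wrong type, so a stricter parser may be useful. The following is a sketch under that assumption; `stripCodeFence`, `parseReceipt`, `ParsedReceipt`, and `ReceiptItem` are hypothetical helpers, not part of the SDK or of the agent source.

```typescript
interface ReceiptItem {
  description: string;
  amount: number;
}

interface ParsedReceipt {
  merchant?: string;
  total: number;
  date?: string;
  items: ReceiptItem[];
}

// Three backticks, built at runtime so this listing contains no literal fence.
const FENCE = '`'.repeat(3);

// Remove a surrounding Markdown code fence (with optional language tag).
function stripCodeFence(raw: string): string {
  let text = raw.trim();
  if (text.length >= FENCE.length * 2 && text.startsWith(FENCE) && text.endsWith(FENCE)) {
    text = text.slice(FENCE.length, text.length - FENCE.length);
    // Drop the first line only if it is empty or a tag such as "json".
    const newline = text.indexOf('\n');
    if (newline !== -1 && /^[a-zA-Z]*$/.test(text.slice(0, newline).trim())) {
      text = text.slice(newline + 1);
    }
  }
  return text.trim();
}

// Return a validated receipt, or null when the reply is unusable.
function parseReceipt(raw: string): ParsedReceipt | null {
  let value: unknown;
  try {
    value = JSON.parse(stripCodeFence(raw));
  } catch {
    return null;
  }
  if (typeof value !== 'object' || value === null || Array.isArray(value)) return null;
  const record = value as Record<string, unknown>;

  // A finite numeric total is the minimum needed for a usable result.
  const total = record['total'];
  if (typeof total !== 'number' || !Number.isFinite(total)) return null;

  // Keep only line items that have the expected shape.
  const rawItems: unknown[] = Array.isArray(record['items']) ? (record['items'] as unknown[]) : [];
  const items: ReceiptItem[] = [];
  for (const entry of rawItems) {
    if (typeof entry !== 'object' || entry === null) continue;
    const item = entry as Record<string, unknown>;
    const description = item['description'];
    const amount = item['amount'];
    if (typeof description === 'string' && typeof amount === 'number') {
      items.push({ description, amount });
    }
  }

  const result: ParsedReceipt = { total, items };
  const merchant = record['merchant'];
  if (typeof merchant === 'string') result.merchant = merchant;
  const date = record['date'];
  if (typeof date === 'string') result.date = date;
  return result;
}
```

Returning `null` rather than a zeroed receipt lets the caller distinguish "the model reported a total of 0" from "the reply could not be used", which the agent's `success` flag can then reflect.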

Run the tests

From monorepo root

$ pnpm test:agents:reference

$ pnpm test:agents:reference:verbose

The reference suite runs all 23 agents with createMockExecutionContext(), verifying every ctx.* API call and output shape.
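
To illustrate what "verifying every ctx.* API call and output shape" can look like, here is a hand-rolled recording stub. It is a sketch only: `createRecordingContext`, `StubContext`, `CallRecord`, and `tinyAgent` are hypothetical names, and the SDK's `createMockExecutionContext()` may expose a different shape.

```typescript
// Hypothetical recording stub, NOT the SDK's createMockExecutionContext().
interface CallRecord {
  api: string;
  args: unknown[];
}

interface StubContext {
  files: { read(path: string): Promise<string> };
  credentials: { get(name: string): Promise<string> };
  llm: { complete(request: { prompt: string }): Promise<{ content: string }> };
}

// Build a context whose methods log each call and return canned values.
function createRecordingContext(
  fileText: string,
  llmReply: string
): { ctx: StubContext; calls: CallRecord[] } {
  const calls: CallRecord[] = [];
  const ctx: StubContext = {
    files: {
      read: async (path: string) => {
        calls.push({ api: 'files.read', args: [path] });
        return fileText;
      },
    },
    credentials: {
      get: async (name: string) => {
        calls.push({ api: 'credentials.get', args: [name] });
        return 'test-key';
      },
    },
    llm: {
      complete: async (request: { prompt: string }) => {
        calls.push({ api: 'llm.complete', args: [request] });
        return { content: llmReply };
      },
    },
  };
  return { ctx, calls };
}

// A tiny agent that mirrors the call order of the reference source:
// read the file, fetch the credential, then call the LLM.
async function tinyAgent(ctx: StubContext, path: string): Promise<{ total: number }> {
  const text = await ctx.files.read(path);
  await ctx.credentials.get('OCR_API_KEY');
  const reply = await ctx.llm.complete({ prompt: text });
  const parsed = JSON.parse(reply.content) as { total?: unknown };
  return { total: typeof parsed.total === 'number' ? parsed.total : 0 };
}
```

A test would run `tinyAgent` against the stub, then assert on the recorded `calls` array (which APIs were hit, in which order, with which arguments) and on the shape of the returned object.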
