Pattern 3: UDOP Classification with Claude Extraction

This pattern implements an intelligent document processing workflow that uses UDOP (Unified Document Processing) for page classification and grouping, followed by Claude for information extraction.

Fine tuning a UDOP model
Architecture Overview
Testing
Best Practices

Fine tuning a UDOP model for classification

See Fine-Tuning Models on SageMaker

Once you have trained the model, deploy the GenAIIDP stack for Pattern-3 using the path for your new fine tuned model.

Architecture Overview

The workflow consists of three main processing steps:

OCR processing using Amazon Textract
Page classification and grouping using a UDOP model deployed on SageMaker
Field extraction using Claude via Amazon Bedrock

State Machine Workflow

The Step Functions state machine (workflow.asl.json) orchestrates the following flow:

OCRStep → ClassificationStep → ProcessPageGroups (Map State for Extraction)

Each step includes comprehensive retry logic for handling transient errors:

Initial retry after 2 seconds
Exponential backoff with rate of 2
Maximum of 10 retry attempts

Lambda Functions

OCR Function

Purpose: Processes input PDFs using Amazon Textract

Input:

{
  "execution_arn": "<ARN>",
  "output_bucket": "<BUCKET>",
  "input": {
    "detail": {
      "bucket": { "name": "<BUCKET>" },
      "object": { "key": "<KEY>" }
    }
  }
}

Output:

{
  "metadata": {
    "input_bucket": "<BUCKET>",
    "object_key": "<KEY>",
    "output_bucket": "<BUCKET>",
    "output_prefix": "<PREFIX>",
    "num_pages": "<NUMBER OF PAGES>"
  },
  "pages": {
    "<PAGE_NUMBER>": {
      "rawTextUri": "<S3_URI>",
      "parsedTextUri": "<S3_URI>",
      "imageUri": "<S3_URI>"
    }
  }
}

Classification Function

Purpose: Classifies pages using UDOP model on SageMaker, and segments into sections using class boundaries
Input: Output from OCR function plus output bucket

Output:

{
  "metadata": "<FROM_OCR>",
  "sections": [
    {
      "id": "<GROUP_ID>",
      "class": "<CLASS>",
      "pages": [...]
    }
  ]
}

Extraction Function

Purpose: Extracts fields using Claude via Amazon Bedrock
Input: Individual section from Classification output

Output:

{
  "section": {
    "id": "<ID>",
    "class": "<CLASS>",
    "page_ids": ["<PAGEID>", "..."],
    "outputJSONUri": "<S3_URI>"
  },
  "pages": [
    {
      "Id": "<ID>",
      "Class": "<CLASS>",
      "TextUri": "<S3_URI>",
      "ImageUri": "<S3_URI>"
    }
  ]
}

ProcessResults Function

Purpose: Aggregates results for all sections
Input: Extraction output from each section extraction

Output: Consumed by the GenAIIDP parent stack workflow tracker to update job status/UI etc

{
  "Sections": [
    {
      "Id": "<ID>",
      "PageIds": ["<PAGEID>", "..."],
      "Class": "<CLASS>",
      "OutputJSONUri": "<S3_URI>"
    }
  ],
  "Pages": [
    {
      "Id": "<ID>",
      "Class": "<CLASS>",
      "TextUri": "<S3_URI>",
      "ImageUri": "<S3_URI>"
    }
  ],
  "PageCount": "<TOTAL_PAGES>"
}

UDOP Model on SageMaker

The pattern includes a complete UDOP model deployment:

SageMaker Endpoint: sagemaker_classifier_endpoint.yaml provisions:
- SageMaker model
- Endpoint configuration
- Endpoint with auto-scaling
- IAM roles and permissions

To create a new UDOP model fine tuned for your data, see Fine tuning a UDOP model.

Monitoring and Metrics

The pattern includes a comprehensive CloudWatch dashboard with:

Performance Metrics

Document and page throughput (per minute)
Token usage (input/output/total)
Bedrock request statistics and latency
SageMaker endpoint invocation metrics
Processing latencies by function

Error Tracking

Lambda function errors with detailed logs
Long-running invocations with duration metrics
Classification/extraction failures
API throttling and retry metrics
SageMaker endpoint errors

Lambda Function Metrics

Duration (average, p90, maximum)
Memory usage and utilization
Error rates and exception counts
Concurrent execution metrics

Template Outputs

The pattern exports these outputs to the parent stack:

StateMachineName: Name of Step Functions state machine
StateMachineArn: ARN of Step Functions state machine
StateMachineLogGroup: CloudWatch log group for state machine
DashboardName: Name of pattern-specific dashboard
DashboardArn: ARN of pattern-specific dashboard

Configuration

Stack Deployment Parameters:

UDOPModelArtifactPath: S3 path to UDOP model artifacts (see Fine tuning a UDOP model)
Summarization: Control summarization via configuration file summarization.enabled property (replaces IsSummarizationEnabled parameter)
ConfigurationDefaultS3Uri: Optional S3 URI to custom configuration (uses default configuration if not specified)
MaxConcurrentWorkflows: Workflow concurrency limit
LogRetentionDays: CloudWatch log retention period
ExecutionTimeThresholdMs: Latency threshold for alerts

Configuration Management:

Model selection for extraction is now handled through configuration files rather than CloudFormation parameters
Configuration supports multiple presets per pattern (e.g., default, checkboxed_attributes_extraction, medical_records_summarization)
Configuration can be updated through the Web UI without stack redeployment
Extraction model choices are constrained through enum constraints in the configuration schema
UDOP classification model is still specified via CloudFormation parameter due to SageMaker endpoint requirements

Customizing Extraction

The system uses a combination of prompt engineering and predefined attributes to extract information from documents. You can customize both to match your specific document types and extraction needs.

Extraction Prompts

The extraction prompts are now defined in the configuration files rather than CloudFormation template. The structure is similar to the one shown below:

DEFAULT_SYSTEM_PROMPT = "You are a document assistant. Respond only with JSON..."
BASELINE_PROMPT = """
<background>
You are an expert in business document analysis and information extraction. You can understand and extract key information from various types of business documents including letters, memos, financial documents, scientific papers, news articles, advertisements, emails, forms, handwritten notes, invoices, purchase orders, questionnaires, resumes, scientific publications, and specifications...
</background>
...

To modify the extraction behavior:

Modify the configuration settings through the Web UI or configuration files
Edit the system_prompt to change the AI assistant's basic behavior
Customize the task_prompt to:
- Provide domain expertise for your document types
- Add specific instructions for handling edge cases
- Modify output formatting requirements

Model Selection

Extraction model selection is now handled through configuration files with enum constraints
Available models are defined in the configuration schema
Changes can be made through the Web UI without redeployment
Classification model (UDOP) is still specified via CloudFormation parameter due to SageMaker endpoint requirements

Extraction Attributes

Attributes to be extracted are defined in the configuration files' classes section using JSON Schema format. The structure is similar to the example below:

Example attribute definition:

classes:
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Letter
    x-aws-idp-document-type: Letter
    type: object
    description: A formal written message that is typically sent from one person to another
    properties:
      SenderName:
        type: string
        description: The name of the person or entity who wrote or sent the letter. Look for text following or near terms like 'from', 'sender', 'authored by', 'written by', or at the end of the letter before a signature.
      SenderAddress:
        type: string
        description: The physical address of the sender, typically appearing at the top of the letter. May be labeled as 'address', 'location', or 'from address'.
      RecipientName:
        type: string
        description: The name of the person or entity receiving the letter. Look for this after 'to', 'recipient', 'addressee', or at the beginning of the letter.
      RecipientAddress:
        type: string
        description: The physical address where the letter is to be delivered. Often labeled as 'to address' or 'delivery address', typically appearing below the recipient name.
  - $schema: "https://json-schema.org/draft/2020-12/schema"
    $id: Form
    x-aws-idp-document-type: Form
    type: object
    description: A document with blank spaces for filling in information
    properties:
      FormType:
        type: string
        description: The category or purpose of the form, such as 'application', 'registration', 'request', etc. May be identified by 'form name', 'document type', or 'form category'.
      FormId:
        type: string
        description: The unique identifier for the form, typically a number or alphanumeric code. Often labeled as 'form number', 'id', or 'reference number'.

To customize attributes:

Modify the classes section in the configuration files or through the Web UI
For each attribute, provide a clear name and detailed description
Changes are applied immediately without requiring function redeployment

Note: Configuration changes through the Web UI take effect immediately for new document processing jobs.

Testing

Use the provided test events in the testing/ directory:

# Test OCR function
sam local invoke OCRFunction --env-vars testing/env.json -e testing/OCRFunction-event.json

# Test classification
sam local invoke ClassificationFunction --env-vars testing/env.json -e testing/ClassificationFunctionEvent.json

# Test extraction
sam local invoke ExtractionFunction --env-vars testing/env.json -e testing/ExtractionFunction-event.json

Note that for proper testing of the classification function, you'll need access to a running SageMaker endpoint with the UDOP model deployed.

Best Practices

Configuration Management:
- Use the configuration library for different use cases (default, medical_records, etc.)
- Test configuration changes thoroughly before production deployment
- Leverage the Web UI for extraction configuration updates without redeployment
SageMaker Endpoint Management:
- Configure appropriate instance type based on model size and throughput requirements
- Enable auto-scaling for cost optimization during varying loads
- Monitor endpoint performance metrics for potential bottlenecks
Retry Handling:
- All functions implement exponential backoff with jitter
- Configure appropriate retry limits for different types of failures
- Handle SageMaker endpoint throttling differently from transient failures
Performance Optimization:
- Pre-warm the SageMaker endpoint with sample requests during initialization
- Configure appropriate memory for Lambda functions based on document size
- Use efficient image preprocessing and data handling techniques
Monitoring:
- Set up CloudWatch alarms for critical metrics
- Use detailed logging with correlation IDs for request tracking
- Monitor token usage and cost metrics for GenAI components
Security:
- Use KMS encryption for data at rest and in transit
- Implement least privilege IAM roles for all components
- Protect SageMaker endpoints with appropriate security groups
Error Handling:
- Implement graceful degradation strategies
- Provide clear error messages with actionable information
- Track error rates by category for targeted improvements

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pattern 3: UDOP Classification with Claude Extraction

Table of Contents

Fine tuning a UDOP model for classification

Architecture Overview

State Machine Workflow

Lambda Functions

OCR Function

Classification Function

Extraction Function

ProcessResults Function

UDOP Model on SageMaker

Monitoring and Metrics

Performance Metrics

Error Tracking

Lambda Function Metrics

Template Outputs

Configuration

Customizing Extraction

Extraction Prompts

Model Selection

Extraction Attributes

Testing

Best Practices

FilesExpand file tree

pattern-3.md

Latest commit

History

pattern-3.md

File metadata and controls

Pattern 3: UDOP Classification with Claude Extraction

Table of Contents

Fine tuning a UDOP model for classification

Architecture Overview

State Machine Workflow

Lambda Functions

OCR Function

Classification Function

Extraction Function

ProcessResults Function

UDOP Model on SageMaker

Monitoring and Metrics

Performance Metrics

Error Tracking

Lambda Function Metrics

Template Outputs

Configuration

Customizing Extraction

Extraction Prompts

Model Selection

Extraction Attributes

Testing

Best Practices