| title | JSON Schema Migration Guide |
|---|---|
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0
Starting with version 0.3.21, the GenAI IDP solution uses JSON Schema format for document class definitions instead of the legacy custom format. This provides:
- ✅ Industry standard format with broad tooling support
- ✅ Better validation using standard JSON Schema validators
- ✅ Improved documentation through self-describing schemas
- ✅ Backward compatibility - automatic migration of legacy configurations

Legacy format:

    classes:
      - name: Payslip
        description: An employee wage statement
        attributes:
          - name: YTDNetPay
            description: Year-to-date net pay amount
            attributeType: simple
            evaluation_method: NUMERIC_EXACT
          - name: CompanyAddress
            description: Complete business address
            attributeType: group
            evaluation_method: LLM
            groupAttributes:
              - name: Street
                description: Street address
              - name: City
                description: City name
          - name: Deductions
            description: List of deductions
            attributeType: list
            listItemTemplate:
              itemDescription: A single deduction
              itemAttributes:
                - name: Type
                  description: Deduction type
                - name: Amount
                  description: Deduction amount

JSON Schema format:

    classes:
      - $schema: "https://json-schema.org/draft/2020-12/schema"
        $id: Payslip
        x-aws-idp-document-type: Payslip
        type: object
        description: An employee wage statement
        properties:
          YTDNetPay:
            type: string
            description: Year-to-date net pay amount
            x-aws-idp-evaluation-method: NUMERIC_EXACT
          CompanyAddress:
            type: object
            description: Complete business address
            x-aws-idp-evaluation-method: LLM
            properties:
              Street:
                type: string
                description: Street address
              City:
                type: string
                description: City name
          Deductions:
            type: array
            description: List of deductions
            x-aws-idp-list-item-description: A single deduction
            items:
              type: object
              properties:
                Type:
                  type: string
                  description: Deduction type
                Amount:
                  type: string
                  description: Deduction amount

Field mapping:

| Legacy Field | JSON Schema Field | Notes |
|---|---|---|
| `name` | `$id` and `x-aws-idp-document-type` | Document class name |
| `description` | `description` | Same field name |
| `attributes` | `properties` | List → Object |
| `attributeType: simple` | `type: string` | Simple values are strings |
| `attributeType: group` | `type: object` with `properties` | Nested object |
| `attributeType: list` | `type: array` with `items` | Array of items |
| `groupAttributes` | `properties` (nested) | Object properties |
| `listItemTemplate` | `items` | Array item schema |
| `itemAttributes` | `items.properties` | Properties of array items |
| `itemDescription` | `x-aws-idp-list-item-description` | AWS IDP extension |
| `evaluation_method` | `x-aws-idp-evaluation-method` | AWS IDP extension |
| `confidence_threshold` | `x-aws-idp-confidence-threshold` | AWS IDP extension |
| `prompt_override` | `x-aws-idp-prompt-override` | AWS IDP extension |

| Legacy Type | JSON Schema Type |
|---|---|
| `attributeType: simple` | `type: string` |
| `attributeType: group` | `type: object` |
| `attributeType: list` | `type: array` |

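The mapping tables above can be read as a small recursive conversion. The following is a minimal sketch of that idea, not the solution's actual migration code: the function names (`migrate_class`, `migrate_attribute`, `migrate_attributes`) are hypothetical, and it assumes an attribute without `attributeType` is treated as `simple`, as the Payslip example suggests.

```python
from typing import Any

# Legacy attribute keys and the extension keys they map to (from the table above).
EXTENSION_FIELDS = {
    "evaluation_method": "x-aws-idp-evaluation-method",
    "confidence_threshold": "x-aws-idp-confidence-threshold",
    "prompt_override": "x-aws-idp-prompt-override",
}


def migrate_attributes(attrs: list[dict[str, Any]]) -> dict[str, Any]:
    """Turn a legacy attribute list into a JSON Schema `properties` object."""
    return {attr["name"]: migrate_attribute(attr) for attr in attrs}


def migrate_attribute(attr: dict[str, Any]) -> dict[str, Any]:
    """Convert one legacy attribute into a JSON Schema property (hypothetical helper)."""
    # Assumption: a missing attributeType means "simple".
    kind = attr.get("attributeType", "simple")
    prop: dict[str, Any] = {
        "type": {"group": "object", "list": "array"}.get(kind, "string")
    }
    if "description" in attr:
        prop["description"] = attr["description"]
    for legacy_key, schema_key in EXTENSION_FIELDS.items():
        if legacy_key in attr:
            prop[schema_key] = attr[legacy_key]
    if kind == "group":
        prop["properties"] = migrate_attributes(attr.get("groupAttributes", []))
    elif kind == "list":
        template = attr.get("listItemTemplate", {})
        if "itemDescription" in template:
            prop["x-aws-idp-list-item-description"] = template["itemDescription"]
        prop["items"] = {
            "type": "object",
            "properties": migrate_attributes(template.get("itemAttributes", [])),
        }
    return prop


def migrate_class(legacy: dict[str, Any]) -> dict[str, Any]:
    """Convert one legacy class definition into a JSON Schema class definition."""
    schema: dict[str, Any] = {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "$id": legacy["name"],
        "x-aws-idp-document-type": legacy["name"],
        "type": "object",
    }
    if "description" in legacy:
        schema["description"] = legacy["description"]
    schema["properties"] = migrate_attributes(legacy.get("attributes", []))
    return schema
```

Applied to the legacy Payslip class shown earlier, this sketch produces the same structure as the JSON Schema example. It does not cover `x-aws-idp-original-name` or any other behavior the real migration may have.
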
The solution automatically migrates legacy configurations to JSON Schema format:
- First read after upgrade - When configuration is loaded from DynamoDB
- Automatic persistence - Migrated format is saved back to DynamoDB
- One-time process - Subsequent reads use JSON Schema format directly
- ✅ Non-destructive - Legacy data is preserved during migration
- ✅ Idempotent - Won't re-migrate already migrated data
- ✅ Transparent - Happens automatically without user intervention
- ✅ Logged - Migration activity logged to CloudWatch
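
The migrate-on-first-read behavior described above can be sketched as a small guard around the configuration load. This is an illustration only: the in-memory `store` dict stands in for the DynamoDB table, `load_config` and `is_legacy_class` are hypothetical names, and detecting legacy classes by the presence of `attributes` without `properties` is an assumption. The log strings mirror the ones shown in this guide.

```python
from typing import Any, Callable

Config = dict[str, Any]


def is_legacy_class(class_def: dict[str, Any]) -> bool:
    """Assumption: a legacy class has `attributes` and no `properties`."""
    return "attributes" in class_def and "properties" not in class_def


def load_config(
    store: dict[str, Config],
    key: str,
    migrate_class: Callable[[dict[str, Any]], dict[str, Any]],
    log: Callable[[str], None] = print,
) -> Config:
    """Load a configuration, migrating and persisting it once if it is legacy."""
    config = store[key]
    classes = config.get("classes", [])
    legacy_count = sum(1 for c in classes if is_legacy_class(c))
    if legacy_count == 0:
        # Idempotent: already-migrated data is returned untouched.
        return config

    log(f"Migrating {legacy_count} legacy classes to JSON Schema format")
    # Build a new config rather than mutating the one that was read.
    migrated = dict(config)
    migrated["classes"] = [
        migrate_class(c) if is_legacy_class(c) else c for c in classes
    ]
    store[key] = migrated  # persist so later reads skip the migration
    log("Successfully migrated classes to JSON Schema format")
    return migrated
```

Calling `load_config` twice with the same key runs the conversion only on the first call; the second call finds no legacy classes and returns the stored JSON Schema form directly.
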

Check Lambda logs to verify migration:

    aws logs tail /aws/lambda/<STACK>-ConfigurationResolverFunction-<ID> \
      --region <REGION> --follow

Look for:

    Migrating 6 legacy classes to JSON Schema format
    Successfully migrated classes to JSON Schema format

JSON Schema is extended with custom AWS IDP fields:
- `x-aws-idp-document-type` - Marks a schema as a document type (value is the document class name)
- `x-aws-idp-evaluation-method` - Evaluation method for attribute comparison
  - Valid values: `EXACT`, `NUMERIC_EXACT`, `FUZZY`, `SEMANTIC`, `LLM`
- `x-aws-idp-confidence-threshold` - Confidence threshold (0.0 to 1.0)
- `x-aws-idp-prompt-override` - Custom prompt for attribute extraction
- `x-aws-idp-list-item-description` - Description for array items
- `x-aws-idp-original-name` - Preserved original attribute name from legacy format
- `x-aws-idp-class-prompt` - Classification prompt for example
- `x-aws-idp-attributes-prompt` - Extraction prompt for example
- `x-aws-idp-image-path` - Path to example image
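
As a quick sanity check on the value constraints listed above, a schema can be walked recursively before it is saved. The sketch below is a hypothetical helper (`check_extensions` is not part of the solution) that checks only the evaluation method and the confidence threshold. It assumes the threshold may be stored either as a number or as a numeric string.

```python
from typing import Any

# Valid values as listed in this guide.
VALID_EVALUATION_METHODS = {"EXACT", "NUMERIC_EXACT", "FUZZY", "SEMANTIC", "LLM"}


def check_extensions(path: str, schema: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in AWS IDP extension values."""
    problems: list[str] = []

    method = schema.get("x-aws-idp-evaluation-method")
    if method is not None and method not in VALID_EVALUATION_METHODS:
        problems.append(f"{path}: unknown evaluation method {method!r}")

    threshold = schema.get("x-aws-idp-confidence-threshold")
    if threshold is not None:
        try:
            value = float(threshold)  # assumption: numeric strings are accepted
        except (TypeError, ValueError):
            value = None
        if value is None or not 0.0 <= value <= 1.0:
            problems.append(f"{path}: confidence threshold {threshold!r} not in 0.0 to 1.0")

    # Recurse into nested objects and array item schemas.
    for name, child in schema.get("properties", {}).items():
        problems.extend(check_extensions(f"{path}.{name}", child))
    items = schema.get("items")
    if isinstance(items, dict):
        problems.extend(check_extensions(f"{path}[]", items))
    return problems
```

An empty result means none of the checked fields violate the listed constraints; it says nothing about whether the rest of the schema is valid JSON Schema.
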
The web UI provides two ways to create/edit document schemas:
- Schema Builder - Visual editor with drag-and-drop interface
  - Navigate to Configuration → Document Schema tab
  - Click "Schema Builder" view
  - Click "Add Class" to choose between:
    - Custom Class - define your own class with custom fields
    - Standard Class - import from 35+ pre-built document types (Invoice, Receipt, W-2, Bank Statement, Payslip, Driver License, Passport, tax forms, insurance cards, certificates, and more) derived from AWS BDA standard blueprints. Imported classes are fully editable.
  - Add/edit document types and properties visually
- JSON View - Direct JSON editing with validation
  - Navigate to Configuration → JSON View
  - Edit the `classes` array directly
  - Validation happens in real-time
When creating configurations manually, use JSON Schema format:

    classes:
      - $schema: "https://json-schema.org/draft/2020-12/schema"
        $id: MyDocument
        x-aws-idp-document-type: MyDocument
        type: object
        description: Document description here
        properties:
          FieldName:
            type: string
            description: Field description
            x-aws-idp-evaluation-method: EXACT

Find JSON Schema templates in:
- `config_library/unified/` - Pattern 2 examples (Bedrock)
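
When writing a class by hand, it can help to confirm it has the same top-level shape as the template above before loading it. The following sketch is a hypothetical check (`check_class_definition` is not a function of the solution). Treating the template's keys as required, and expecting every property to carry a `type` and a `description`, are assumptions of the sketch rather than documented rules.

```python
from typing import Any

# Keys the template sets on every class; assumed (not documented) to be required.
EXPECTED_KEYS = ("$schema", "$id", "x-aws-idp-document-type", "properties")


def check_class_definition(class_def: dict[str, Any]) -> list[str]:
    """Return a list of structural problems in a hand-written class definition."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in class_def]
    if class_def.get("type") != "object":
        problems.append("top-level type should be 'object'")

    properties = class_def.get("properties", {})
    if not isinstance(properties, dict):
        # A list here usually means the legacy `attributes` layout was copied over.
        problems.append("properties should map field names to schemas")
        return problems

    for name, prop in properties.items():
        if not isinstance(prop, dict):
            problems.append(f"{name}: property schema should be a mapping")
            continue
        if "type" not in prop:
            problems.append(f"{name}: missing type")
        if "description" not in prop:
            problems.append(f"{name}: missing description")
    return problems
```
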
The solution supports these evaluation methods:
- `EXACT` - Exact string match
- `NUMERIC_EXACT` - Exact numeric match (handles different number formats)
- `FUZZY` - Fuzzy string matching (Levenshtein distance)
- `SEMANTIC` - Semantic similarity using embeddings
- `LLM` - LLM-based evaluation for complex/contextual comparisons
  - Useful for address blocks, multi-field groups
  - Higher cost but more flexible
  - Requires evaluation configuration with LLM model

Good:

    properties:
      InvoiceDate:
        type: string
        description: Date when invoice was issued

Avoid:

    properties:
      Date:  # Too generic
        type: string

For typed and constrained values:

    properties:
      TotalAmount:
        type: string  # Store as string for exact extraction
        description: Total invoice amount including taxes
        x-aws-idp-evaluation-method: NUMERIC_EXACT
      Email:
        type: string
        format: email  # Standard JSON Schema format
        description: Customer email address
      Status:
        type: string
        enum: [PAID, PENDING, OVERDUE]  # Constrain values
        description: Payment status

For nested data like addresses:

    properties:
      ShippingAddress:
        type: object
        description: Complete shipping address
        x-aws-idp-evaluation-method: LLM  # Use LLM for complex structures
        properties:
          Street:
            type: string
          City:
            type: string
          State:
            type: string
          ZipCode:
            type: string

For line items, deductions, etc.:

    properties:
      LineItems:
        type: array
        description: Invoice line items
        x-aws-idp-list-item-description: A single line item
        items:
          type: object
          properties:
            Description:
              type: string
            Quantity:
              type: string
            UnitPrice:
              type: string
            Total:
              type: string
              x-aws-idp-evaluation-method: NUMERIC_EXACT

Symptoms:
- UI displays `attributes` array instead of `properties` object
- Configuration tab is blank
Solution:
- Refresh browser cache (hard refresh: Ctrl+Shift+R or Cmd+Shift+R)
- Check Lambda logs for migration errors
- Verify Lambda has latest code with migration support
Symptoms:
- Error: "Invalid evaluation_method 'LLM'"
Solution:
- Ensure you are using version 0.3.21 or later
- Check that `lib/idp_common_pkg/idp_common/config/schema_constants.py` includes `EVALUATION_METHOD_LLM`
Symptoms:
- Legacy format still in DynamoDB after upgrade
- No migration logs in CloudWatch
Solution:
- Verify Lambda has `requirements.txt` with `./lib/idp_common_pkg`
- Check Lambda includes `ConfigurationManager` code
- Trigger migration manually via UI configuration load