Merged
416 changes: 222 additions & 194 deletions config_library/pattern-2/bank-statement-sample/config.yaml

Large diffs are not rendered by default.

2,075 changes: 1,174 additions & 901 deletions config_library/pattern-2/lending-package-sample/config.yaml

Large diffs are not rendered by default.

1,326 changes: 891 additions & 435 deletions config_library/pattern-2/rvl-cdip-package-sample/config.yaml

Large diffs are not rendered by default.

1,314 changes: 885 additions & 429 deletions config_library/pattern-3/rvl-cdip-package-sample/config.yaml

Large diffs are not rendered by default.

79 changes: 46 additions & 33 deletions docs/assessment.md
@@ -318,9 +318,9 @@ For basic single-value extractions like dates, amounts, or names:

**Configuration:**
```yaml
-attributes:
-  - name: "StatementDate"
-    attributeType: "simple"
+properties:
+  StatementDate:
+    type: string
     description: "The date of the bank statement"
```

@@ -360,14 +360,16 @@ For nested object structures with multiple related fields:

**Configuration:**
```yaml
-attributes:
-  - name: "AccountDetails"
-    attributeType: "group"
+properties:
+  AccountDetails:
+    type: object
     description: "Bank account information"
-    groupAttributes:
-      - name: "AccountNumber"
+    properties:
+      AccountNumber:
+        type: string
         description: "The account number"
-      - name: "RoutingNumber"
+      RoutingNumber:
+        type: string
         description: "The bank routing number"
```

@@ -413,18 +415,22 @@ For arrays of items, such as transactions in a bank statement:

**Configuration:**
```yaml
-attributes:
-  - name: "Transactions"
-    attributeType: "list"
+properties:
+  Transactions:
+    type: array
     description: "List of all transactions on the statement"
-    listItemTemplate:
-      itemDescription: "Individual transaction entry"
-      itemAttributes:
-        - name: "Date"
+    x-aws-idp-list-item-description: "Individual transaction entry"
+    items:
+      type: object
+      properties:
+        Date:
+          type: string
           description: "Transaction date"
-        - name: "Description"
+        Description:
+          type: string
           description: "Transaction description"
-        - name: "Amount"
+        Amount:
+          type: string
           description: "Transaction amount"
```
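
The simple, group, and list changes in this file appear to follow one mapping from the legacy `attributes` list to a JSON Schema style `properties` mapping. The sketch below is inferred from these hunks only and is not the project's migration code: the function names are hypothetical, and treating a missing `attributeType` as `simple` (emitted as `type: string`) is an assumption.

```python
from __future__ import annotations

from typing import Any


def legacy_attribute_to_property(attr: dict[str, Any]) -> dict[str, Any]:
    """Convert one legacy attribute entry into a JSON Schema style property."""
    attr_type = attr.get("attributeType", "simple")  # assumption: default is simple
    json_type = {"group": "object", "list": "array"}.get(attr_type, "string")
    prop: dict[str, Any] = {"type": json_type}
    if "description" in attr:
        prop["description"] = attr["description"]

    if attr_type == "group":
        # groupAttributes become nested properties of an object.
        prop["properties"] = legacy_attributes_to_properties(
            attr.get("groupAttributes", [])
        )
    elif attr_type == "list":
        template = attr.get("listItemTemplate", {})
        if "itemDescription" in template:
            prop["x-aws-idp-list-item-description"] = template["itemDescription"]
        # itemAttributes become the properties of each array item.
        prop["items"] = {
            "type": "object",
            "properties": legacy_attributes_to_properties(
                template.get("itemAttributes", [])
            ),
        }
    return prop


def legacy_attributes_to_properties(
    attributes: list[dict[str, Any]],
) -> dict[str, Any]:
    """Convert a legacy `attributes` list into a mapping keyed by attribute name."""
    return {attr["name"]: legacy_attribute_to_property(attr) for attr in attributes}
```

Passing the legacy `Transactions` entry from the hunk above through `legacy_attributes_to_properties` should yield the same structure as the added lines, with the item description moved to `x-aws-idp-list-item-description`.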

@@ -979,27 +985,34 @@ attributes:
Processes complex nested structures as single units:
```yaml
 # Each group becomes one focused task
-attributes:
-  - name: "AccountDetails"
-    attributeType: "group"
-    groupAttributes:
-      - name: "AccountNumber"
-      - name: "RoutingNumber"
-      - name: "AccountType"
+properties:
+  AccountDetails:
+    type: object
+    properties:
+      AccountNumber:
+        type: string
+      RoutingNumber:
+        type: string
+      AccountType:
+        type: string
```

#### List Item Tasks
Assesses each list item individually for maximum accuracy:
```yaml
 # 100 transactions = 100 individual assessment tasks
-attributes:
-  - name: "Transactions"
-    attributeType: "list"
-    listItemTemplate:
-      itemAttributes:
-        - name: "Date"
-        - name: "Description"
-        - name: "Amount"
+properties:
+  Transactions:
+    type: array
+    items:
+      type: object
+      properties:
+        Date:
+          type: string
+        Description:
+          type: string
+        Amount:
+          type: string
```
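
The two comments above ("each group becomes one focused task", "100 transactions = 100 individual assessment tasks") suggest a simple way to estimate task counts from a schema and an extraction result. This is a rough sketch under that reading, not the assessment service's logic: `count_group_and_list_tasks` is a hypothetical name, and simple attributes are left out because this hunk does not show how they are batched.

```python
from __future__ import annotations

from typing import Any


def count_group_and_list_tasks(
    properties: dict[str, Any], extracted: dict[str, Any]
) -> int:
    """Estimate assessment tasks for group and list properties only."""
    tasks = 0
    for name, prop in properties.items():
        if prop.get("type") == "object":
            tasks += 1  # one task per group
        elif prop.get("type") == "array":
            tasks += len(extracted.get(name) or [])  # one task per list item
        # Simple (scalar) properties are intentionally not counted here.
    return tasks
```

With one `AccountDetails` group and 100 extracted `Transactions`, this estimate comes to 101 tasks.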

### Performance Tuning
46 changes: 29 additions & 17 deletions docs/classification.md
@@ -602,13 +602,16 @@ When you want all pages of a document to be classified as the same class, you ca

```yaml
 classes:
-  - name: Payslip
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: Payslip
+    x-aws-idp-document-type: Payslip
+    type: object
     description: "Employee wage statement showing earnings and deductions"
-    document_name_regex: "(?i).*(payslip|paystub|salary|wage).*"
-    attributes:
-      - name: EmployeeName
+    x-aws-idp-document-name-regex: "(?i).*(payslip|paystub|salary|wage).*"
+    properties:
+      EmployeeName:
+        type: string
         description: "Name of the employee"
-        attributeType: simple
```

**Benefits:**
@@ -632,24 +635,33 @@ classification:
   classificationMethod: multimodalPageLevelClassification

 classes:
-  - name: Invoice
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: Invoice
+    x-aws-idp-document-type: Invoice
+    type: object
     description: "Business invoice document"
-    document_page_content_regex: "(?i)(invoice\\s+number|bill\\s+to|amount\\s+due)"
-    attributes:
-      - name: InvoiceNumber
+    x-aws-idp-document-page-content-regex: "(?i)(invoice\\s+number|bill\\s+to|amount\\s+due)"
+    properties:
+      InvoiceNumber:
+        type: string
         description: "Invoice number"
-        attributeType: simple
-  - name: Payslip
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: Payslip
+    x-aws-idp-document-type: Payslip
+    type: object
     description: "Employee wage statement"
-    document_page_content_regex: "(?i)(gross\\s+pay|net\\s+pay|employee\\s+id)"
-    attributes:
-      - name: EmployeeName
+    x-aws-idp-document-page-content-regex: "(?i)(gross\\s+pay|net\\s+pay|employee\\s+id)"
+    properties:
+      EmployeeName:
+        type: string
         description: "Employee name"
-        attributeType: simple
-  - name: Other
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: Other
+    x-aws-idp-document-type: Other
+    type: object
     description: "Documents that don't match specific patterns"
     # No regex - will always use LLM
-    attributes: []
+    properties: {}
```
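
At the class level, the change in this file amounts to wrapping each legacy class in a JSON Schema object and renaming the two regex keys. The sketch below illustrates that mapping as read from the diff; it is not the project's converter. `legacy_class_to_schema` is a hypothetical name, only simple attributes are handled, and reusing the class name as `$id` is an assumption based on the single-word names in these examples.

```python
from __future__ import annotations

from typing import Any

SCHEMA_URI = "https://json-schema.org/draft/2020-12/schema"

# Legacy class-level keys and the extension keys that replace them in this diff.
RENAMED_CLASS_KEYS = {
    "document_name_regex": "x-aws-idp-document-name-regex",
    "document_page_content_regex": "x-aws-idp-document-page-content-regex",
}


def legacy_class_to_schema(legacy: dict[str, Any]) -> dict[str, Any]:
    """Wrap a legacy class definition in a JSON Schema style object."""
    name = legacy["name"]
    schema: dict[str, Any] = {
        "$schema": SCHEMA_URI,
        "$id": name,  # assumption: class name reused as-is
        "x-aws-idp-document-type": name,
        "type": "object",
    }
    if "description" in legacy:
        schema["description"] = legacy["description"]
    for old_key, new_key in RENAMED_CLASS_KEYS.items():
        if old_key in legacy:
            schema[new_key] = legacy[old_key]

    properties: dict[str, Any] = {}
    for attr in legacy.get("attributes", []):
        prop: dict[str, Any] = {"type": "string"}  # simple attributes only
        if "description" in attr:
            prop["description"] = attr["description"]
        properties[attr["name"]] = prop
    schema["properties"] = properties
    return schema
```

A class without a regex, such as `Other`, simply ends up with no `x-aws-idp-*-regex` key and an empty `properties` mapping.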

**Benefits:**
97 changes: 58 additions & 39 deletions docs/evaluation.md
@@ -121,19 +121,24 @@ Basic single-value extractions evaluated as individual fields:

```yaml
 classes:
-  - name: invoice
-    attributes:
-      - name: invoice_number
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: invoice
+    x-aws-idp-document-type: invoice
+    type: object
+    properties:
+      invoice_number:
+        type: string
         description: The unique identifier for the invoice
-        attributeType: simple # or omit for default
-        evaluation_method: EXACT # Use exact string matching
-      - name: amount_due
+        x-aws-idp-evaluation-method: EXACT # Use exact string matching
+      amount_due:
+        type: string
         description: The total amount to be paid
-        evaluation_method: NUMERIC_EXACT # Use numeric comparison
-      - name: vendor_name
+        x-aws-idp-evaluation-method: NUMERIC_EXACT # Use numeric comparison
+      vendor_name:
+        type: string
         description: Name of the vendor
-        evaluation_method: FUZZY # Use fuzzy matching
-        evaluation_threshold: 0.8 # Minimum similarity threshold
+        x-aws-idp-evaluation-method: FUZZY # Use fuzzy matching
+        x-aws-idp-confidence-threshold: 0.8 # Minimum similarity threshold
```

### Group Attributes
@@ -142,30 +147,38 @@ Nested object structures where each sub-attribute is evaluated individually:

```yaml
 classes:
-  - name: "Bank Statement"
-    attributes:
-      - name: "Account Holder Address"
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: BankStatement
+    x-aws-idp-document-type: "Bank Statement"
+    type: object
+    properties:
+      Account Holder Address:
+        type: object
         description: "Complete address information for the account holder"
-        attributeType: group
-        groupAttributes:
-          - name: "Street Number"
+        properties:
+          Street Number:
+            type: string
             description: "House or building number"
-            evaluation_method: FUZZY
-            evaluation_threshold: 0.9
-          - name: "Street Name"
+            x-aws-idp-evaluation-method: FUZZY
+            x-aws-idp-confidence-threshold: 0.9
+          Street Name:
+            type: string
             description: "Name of the street"
-            evaluation_method: FUZZY
-            evaluation_threshold: 0.8
-          - name: "City"
+            x-aws-idp-evaluation-method: FUZZY
+            x-aws-idp-confidence-threshold: 0.8
+          City:
+            type: string
             description: "City name"
-            evaluation_method: FUZZY
-            evaluation_threshold: 0.9
-          - name: "State"
+            x-aws-idp-evaluation-method: FUZZY
+            x-aws-idp-confidence-threshold: 0.9
+          State:
+            type: string
             description: "State abbreviation (e.g., CA, NY)"
-            evaluation_method: EXACT
-          - name: "ZIP Code"
+            x-aws-idp-evaluation-method: EXACT
+          ZIP Code:
+            type: string
             description: "5 or 9 digit postal code"
-            evaluation_method: EXACT
+            x-aws-idp-evaluation-method: EXACT
```
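
Because each sub-attribute carries its own evaluation settings, it can help to flatten a schema into one entry per leaf field. The sketch below walks the new-style `properties` mapping and collects the `x-aws-idp-evaluation-method` and `x-aws-idp-confidence-threshold` values shown above. It is only an illustration, not the evaluator's implementation: `collect_evaluation_settings` is a hypothetical name, and the dotted paths (with `[]` for array items) are a convention chosen for this example.

```python
from __future__ import annotations

from typing import Any, Optional, Tuple

Setting = Tuple[Optional[str], Optional[float]]


def collect_evaluation_settings(
    properties: dict[str, Any], prefix: str = ""
) -> dict[str, Setting]:
    """Return {field path: (evaluation method, threshold)} for every leaf field."""
    settings: dict[str, Setting] = {}
    for name, prop in properties.items():
        path = f"{prefix}.{name}" if prefix else name
        items = prop.get("items", {})
        if prop.get("type") == "object":
            # Group: descend into the nested properties.
            nested = collect_evaluation_settings(prop.get("properties", {}), path)
            settings.update(nested)
        elif prop.get("type") == "array" and items.get("type") == "object":
            # List of objects: descend into the item properties.
            nested = collect_evaluation_settings(
                items.get("properties", {}), f"{path}[]"
            )
            settings.update(nested)
        else:
            # Leaf: missing keys are reported as None rather than guessed.
            settings[path] = (
                prop.get("x-aws-idp-evaluation-method"),
                prop.get("x-aws-idp-confidence-threshold"),
            )
    return settings
```

For the `Account Holder Address` group above, this gives one entry per sub-attribute, for example `"Account Holder Address.State"` mapped to `("EXACT", None)`.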

### List Attributes
@@ -174,19 +187,25 @@ Arrays of items where each item's attributes are evaluated individually across a

```yaml
 classes:
-  - name: "Bank Statement"
-    attributes:
-      - name: "Transactions"
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    $id: BankStatement
+    x-aws-idp-document-type: "Bank Statement"
+    type: object
+    properties:
+      Transactions:
+        type: array
         description: "List of all transactions in the statement period"
-        attributeType: list
-        listItemTemplate:
-          itemDescription: "Individual transaction record"
-          itemAttributes:
-            - name: "Date"
+        x-aws-idp-list-item-description: "Individual transaction record"
+        items:
+          type: object
+          properties:
+            Date:
+              type: string
               description: "Transaction date (MM/DD/YYYY)"
-              evaluation_method: FUZZY
-              evaluation_threshold: 0.9
-            - name: "Description"
+              x-aws-idp-evaluation-method: FUZZY
+              x-aws-idp-confidence-threshold: 0.9
+            Description:
+              type: string
               description: "Transaction description or merchant name"
               evaluation_method: SEMANTIC
               evaluation_threshold: 0.7