CHANGELOG.md (2 additions, 0 deletions)
@@ -30,6 +30,8 @@ SPDX-License-Identifier: MIT-0
 
 - **Chandra OCR Lambda Hook Sample** — New `GENAIIDP-chandra-ocr-hook` sample in `samples/lambda-hook-inference/` that integrates [Datalab Chandra OCR 2](https://github.com/datalab-to/chandra) with the LambdaHook feature for high-quality OCR. Supports 90+ languages, math, tables, forms, and handwriting. Uses the Datalab hosted async API (`/api/v1/convert`) with configurable output format (markdown/json/html) and conversion mode (fast/balanced/accurate). Includes standalone SAM template, local test script, and deployment instructions. See `docs/lambda-hook-inference.md` — Chandra OCR Integration section.
 
+- **Per-Class Extraction Model Override** — New `x-aws-idp-extraction-model` JSON Schema extension allows overriding the global `extraction.model` on a per-document-class basis. Useful when certain document types benefit from a different model (e.g., a more powerful model for complex financial forms, a faster/cheaper model for simple documents). Classes without the extension continue to use the global default. Works with both traditional and agentic extraction modes. See `docs/extraction.md` — Per-Class Extraction Model Override section.
+
 - **Wildcard pattern support for delete-documents** — `idp-cli delete-documents` and `client.batch.delete_documents()` now accept a `--pattern` / `pattern` parameter for fnmatch-style wildcard matching (e.g. `"batch-123/*.pdf"`, `"*invoice*"`). Combines with `--status-filter` to delete e.g. all failed invoices across batches.
 - **Prompt Preview** — New "Prompt Preview" tab in the Configuration page lets you preview the actual prompts sent to the LLM for each processing step (Classification, Extraction, Assessment, Summarization). Config-derived placeholders are filled in with real values (class names, cleaned JSON Schema), while document-specific placeholders are shown as highlighted markers. Includes token estimates, copy-to-clipboard, and a substitution details panel showing the exact schema sent to the LLM. Helps optimize document class schemas and prompt templates.
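The wildcard entry in the changelog describes fnmatch-style matching combined with a status filter. The sketch below illustrates that selection logic with Python's standard `fnmatch` module. It is not the `idp-cli` implementation: the record shape (`object_key`, `status`), the status strings, and the helper name `select_documents_for_deletion` are hypothetical, and using case-sensitive `fnmatchcase` is a choice made here.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_documents_for_deletion(
    documents: Iterable[dict],
    pattern: Optional[str] = None,
    status_filter: Optional[str] = None,
) -> List[str]:
    """Return object keys matching both the wildcard pattern and the status filter.

    Hypothetical helper: each document is assumed to be a dict with
    "object_key" and "status" entries. Not the idp-cli implementation.
    """
    selected = []
    for doc in documents:
        # fnmatch-style wildcards: "*" matches any run of characters (including "/"),
        # "?" matches a single character, and "[...]" matches a character set.
        if pattern is not None and not fnmatchcase(doc["object_key"], pattern):
            continue
        # The status filter narrows the wildcard matches further.
        if status_filter is not None and doc["status"] != status_filter:
            continue
        selected.append(doc["object_key"])
    return selected


# Example records; keys and status strings are made up for illustration.
docs = [
    {"object_key": "batch-123/invoice-01.pdf", "status": "FAILED"},
    {"object_key": "batch-123/receipt-02.pdf", "status": "COMPLETED"},
    {"object_key": "batch-456/invoice-03.pdf", "status": "COMPLETED"},
]

print(select_documents_for_deletion(docs, pattern="*invoice*", status_filter="FAILED"))
# ['batch-123/invoice-01.pdf']
print(select_documents_for_deletion(docs, pattern="batch-123/*.pdf"))
# ['batch-123/invoice-01.pdf', 'batch-123/receipt-02.pdf']
```

In `fnmatch` patterns `*` also matches `/`, so if object keys carry a batch prefix, `"*invoice*"` matches invoices in every batch, which fits the changelog's "all failed invoices across batches" example.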
docs/extraction.md (39 additions, 0 deletions)
@@ -112,6 +112,45 @@ classes:
 description: "The date by which payment is due, typically labeled as 'Due Date', 'Payment Due', or similar"
 ```
 
+### Per-Class Extraction Model Override
+
+By default, all document classes use the model specified in `extraction.model`. You can override this on a per-class basis using the `x-aws-idp-extraction-model` extension on any class schema. This is useful when certain document types benefit from a different model — for example, using a more powerful model for complex financial forms while keeping a faster, cheaper model for simpler documents.
+
+Classes without the override continue to use the global `extraction.model`. The override works with both **traditional** and **agentic** extraction modes.
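The fallback rule described above amounts to a single lookup per class. A minimal sketch, assuming the configuration and class schemas are already loaded as plain Python dicts; `resolve_extraction_model` and the model ID `example-override-model-id` are hypothetical, and this is not the solution's actual code path.

```python
from typing import Any, Mapping

# JSON Schema extension name taken from the docs; everything else here is illustrative.
MODEL_OVERRIDE_KEY = "x-aws-idp-extraction-model"


def resolve_extraction_model(config: Mapping[str, Any], class_schema: Mapping[str, Any]) -> str:
    """Return the model to use when extracting one document class (hypothetical helper).

    A non-empty x-aws-idp-extraction-model on the class schema wins; otherwise
    the class falls back to the global extraction.model.
    """
    override = class_schema.get(MODEL_OVERRIDE_KEY)
    if override:  # treating an empty value as "not set" is an assumption
        return override
    return config["extraction"]["model"]


# Example inputs; "example-override-model-id" is a placeholder, not a real model ID.
config = {"extraction": {"model": "us.amazon.nova-pro-v1:0"}}
simple_class = {"type": "object", "properties": {}}
complex_class = {
    "type": "object",
    "properties": {},
    MODEL_OVERRIDE_KEY: "example-override-model-id",
}

print(resolve_extraction_model(config, simple_class))   # us.amazon.nova-pro-v1:0
print(resolve_extraction_model(config, complex_class))  # example-override-model-id
```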
+
+```yaml
+extraction:
+  model: us.amazon.nova-pro-v1:0 # Default for most classes
+
+classes:
+  # This class uses the default model (us.amazon.nova-pro-v1:0)
patterns/unified/template.yaml (38 additions, 0 deletions)
@@ -377,6 +377,44 @@ Resources:
   type: string
   description: "Optional regex pattern to match against page content text. When matched during multi-modal page-level classification, the page will be classified as this class type without LLM processing."
   order: 2.6
+extraction_model:
+  type: string
+  description: "Optional per-class extraction model override. When set, this model is used for extraction instead of the global extraction.model. Useful for classes that benefit from a different model."