Skip to content

Commit 0a67db1

Browse files
committed
feat(docs): add new Chandra OCR Lambda Hook sample to CHANGELOG
1 parent 3aedbce commit 0a67db1

1 file changed

Lines changed: 2 additions & 6 deletions

File tree

CHANGELOG.md

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -14,14 +14,10 @@ SPDX-License-Identifier: MIT-0
1414
- **Two Input Modes**: S3 path (select bucket + prefix), zip upload (presigned URL), or local directory (CLI/SDK)
1515
- **Configuration Integration**: Discovered classes are saved directly to the selected config version's `classes` array in DynamoDB, immediately available for document processing without manual schema creation
1616

17-
### Fixed
18-
19-
- **"View Config" from Discovery shows wrong config version** — Fixed race condition where clicking "View in Configuration →" from a discovery job showed classes from the 'default' config version instead of the job's version. Root cause: `selectedVersion` was initialized as `null`, causing `useConfiguration('default')` to fire before URL params were read. Fix: initialize `selectedVersion` synchronously from URL hash params at mount time.
17+
- **Chandra OCR Lambda Hook Sample** — New `GENAIIDP-chandra-ocr-hook` sample in `samples/lambda-hook-inference/` that integrates [Datalab Chandra OCR 2](https://github.com/datalab-to/chandra) with the LambdaHook feature for high-quality OCR. Supports 90+ languages, math, tables, forms, and handwriting. Uses the Datalab hosted async API (`/api/v1/convert`) with configurable output format (markdown/json/html) and conversion mode (fast/balanced/accurate). Includes standalone SAM template, local test script, and deployment instructions. See `docs/lambda-hook-inference.md` — Chandra OCR Integration section.
2018

21-
- **Publish pipeline missing multi-doc discovery Docker build trigger** — Added `<MULTI_DOC_DISCOVERY_BUILD_HASH_TOKEN>` to template token replacements and `package_multi_doc_discovery_source()` to create/upload the source zip for CodeBuild, ensuring Docker images are rebuilt when handler code changes.
22-
- **Wildcard pattern support for delete-documents**`idp-cli delete-documents` and `client.batch.delete_documents()` now accept a `--pattern` / `pattern` parameter for fnmatch-style wildcard matching (e.g. `"batch-123/*.pdf"`, `"*invoice*"`). Combines with `--status-filter` to delete e.g. all failed invoices across batches.
2319

24-
- **Chandra OCR Lambda Hook Sample** — New `GENAIIDP-chandra-ocr-hook` sample in `samples/lambda-hook-inference/` that integrates [Datalab Chandra OCR 2](https://github.com/datalab-to/chandra) with the LambdaHook feature for high-quality OCR. Supports 90+ languages, math, tables, forms, and handwriting. Uses the Datalab hosted async API (`/api/v1/convert`) with configurable output format (markdown/json/html) and conversion mode (fast/balanced/accurate). Includes standalone SAM template, local test script, and deployment instructions. See `docs/lambda-hook-inference.md` — Chandra OCR Integration section.
20+
### Fixed
2521

2622
- **`delete-documents` fails with DynamoDB errors** — Fixed two bugs in `get_documents_by_batch()`: (1) passing empty `ExpressionAttributeNames={}` when no status filter caused `ValidationException`, and (2) using low-level DynamoDB client type descriptors (`{"S": "..."}`) with the high-level Table resource caused `begins_with` operand type mismatch. Rewrote to use the high-level `Table.scan()` API with `boto3.dynamodb.conditions.Attr`.
2723

0 commit comments

Comments
 (0)