CHANGELOG.md
@@ -7,8 +7,14 @@ SPDX-License-Identifier: MIT-0
### Added
- **Wildcard pattern support for delete-documents**: `idp-cli delete-documents` and `client.batch.delete_documents()` now accept a `--pattern` / `pattern` parameter for fnmatch-style wildcard matching (e.g. `"batch-123/*.pdf"`, `"*invoice*"`). It can be combined with `--status-filter`, for example to delete all failed invoices across batches.
- **Chandra OCR Lambda Hook Sample**: New `GENAIIDP-chandra-ocr-hook` sample in `samples/lambda-hook-inference/` that integrates [Datalab Chandra OCR 2](https://github.com/datalab-to/chandra) with the LambdaHook feature for high-quality OCR. Supports 90+ languages, math, tables, forms, and handwriting. Uses the Datalab hosted async API (`/api/v1/convert`) with a configurable output format (markdown/json/html) and conversion mode (fast/balanced/accurate). Includes a standalone SAM template, a local test script, and deployment instructions. See the Chandra OCR Integration section of `docs/lambda-hook-inference.md`.
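
Because the sample talks to an asynchronous API, the integration implies a submit-then-poll flow. The sketch below illustrates only that general pattern, plus an up-front check of the two documented options. It is not the sample's code: `check_status` is a hypothetical callable standing in for whatever status request the sample makes for a submitted `/api/v1/convert` job, and the `"status"` / `"complete"` response fields are assumptions, not Datalab's documented schema.

```python
import time
from typing import Callable

# Option values listed for the sample; checking them up front is illustrative.
OUTPUT_FORMATS = {"markdown", "json", "html"}
MODES = {"fast", "balanced", "accurate"}


def validate_options(output_format: str, mode: str) -> None:
    """Raise ValueError for option values outside the documented sets."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode!r}")


def poll_until_complete(
    check_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call check_status() until it reports completion or timeout_s elapses.

    check_status is a hypothetical stand-in for one status request against
    the async API; it is assumed to return a dict with a "status" key.
    """
    waited = 0.0  # counts sleep time only, not request latency
    while True:
        response = check_status()
        if response.get("status") == "complete":  # assumed field and value
            return response
        if waited >= timeout_s:
            raise TimeoutError(f"job not complete after {waited:.0f}s")
        sleep(interval_s)  # injectable so tests need not actually wait
        waited += interval_s
```

Injecting `sleep` keeps the loop testable without real delays; a production hook would also need to respect the Lambda function's own timeout.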
### Fixed
- **`delete-documents` fails with DynamoDB errors**: Fixed two bugs in `get_documents_by_batch()`: (1) passing an empty `ExpressionAttributeNames={}` when no status filter was given caused a `ValidationException`, and (2) passing low-level DynamoDB client type descriptors (`{"S": "..."}`) to the high-level Table resource caused a `begins_with` operand type mismatch. The function now uses the high-level `Table.scan()` API with `boto3.dynamodb.conditions.Attr`.
Permanently delete documents and their associated data from InputBucket, OutputBucket, and DynamoDB. Documents are selected by batch ID or by wildcard pattern.
**Parameters:**
- `batch_id` (str, optional): Batch identifier; selects all documents containing this string
- `pattern` (str, optional): fnmatch-style wildcard pattern matched against document keys (e.g., `"batch-123/*.pdf"`, `"*invoice*"`)
- `status_filter` (str, optional): Filter by document status (e.g., `"FAILED"`, `"COMPLETED"`)
- `stack_name` (str, optional): Stack name override
- `dry_run` (bool, optional): If `True`, simulate the deletion without deleting anything (default: `False`)
- `continue_on_error` (bool, optional): Continue with the remaining documents if one deletion fails (default: `True`)

**Note:** Must specify either `batch_id` or `pattern` (not both).
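
To make the selection rules concrete, here is a small local sketch. It assumes keys are matched with Python's `fnmatch` semantics (using the case-sensitive `fnmatchcase`), which is an assumption since the SDK's matching code is not shown here. `select_keys` and the `(key, status)` sample tuples are hypothetical; the function also enforces the one-of rule from the note.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional, Tuple


def select_keys(
    documents: Iterable[Tuple[str, str]],
    batch_id: Optional[str] = None,
    pattern: Optional[str] = None,
    status_filter: Optional[str] = None,
) -> List[str]:
    """Return keys chosen by batch_id (substring) or pattern (fnmatch-style)."""
    # Exactly one selector must be given.
    if (batch_id is None) == (pattern is None):
        raise ValueError("specify either batch_id or pattern, not both")

    selected = []
    for key, status in documents:
        if batch_id is not None:
            matched = batch_id in key
        else:
            # fnmatchcase is case-sensitive; '*' also matches '/'.
            matched = fnmatchcase(key, pattern)
        if matched and (status_filter is None or status == status_filter):
            selected.append(key)
    return selected
```

Because `*` in fnmatch also matches `/`, a pattern such as `"*invoice*"` spans batch prefixes, which is what allows it to select failed invoices across batches when combined with `status_filter="FAILED"`.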

**Returns:** `BatchDeletionResult` with `success`, `deleted_count`, `failed_count`, `total_count`, `dry_run`, and `results` (a list of `DocumentDeletionResult`)
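
Since `dry_run=True` reports what would be deleted without deleting it, a cautious workflow is to preview first and proceed only if the count looks right. In this sketch `delete_fn` stands for `client.batch.delete_documents` (client construction is not shown in this section); the helper `delete_after_preview` and its `max_expected` limit are hypothetical, and only the documented `total_count` attribute of `BatchDeletionResult` is read.

```python
from typing import Any, Callable


def delete_after_preview(delete_fn: Callable[..., Any], max_expected: int, **selection: Any) -> Any:
    """Run a dry run first; perform the real deletion only if the count is acceptable.

    delete_fn is assumed to accept the documented keyword arguments
    (batch_id or pattern, status_filter, dry_run, ...).
    """
    preview = delete_fn(dry_run=True, **selection)
    if preview.total_count > max_expected:
        # Abort before anything is deleted.
        raise RuntimeError(
            f"dry run matched {preview.total_count} documents; expected at most {max_expected}"
        )
    return delete_fn(dry_run=False, **selection)
```

The real run selects documents again, so the matched set can change between the two calls if documents are added or change status in the meantime.
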
```python
@@ -568,6 +571,17 @@ result = client.batch.delete_documents(