diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 44 additions & 0 deletions
diff --git a/Makefile b/Makefile
Lines changed: 29 additions & 1 deletion
diff --git a/VERSION b/VERSION
Lines changed: 1 addition & 1 deletion
diff --git a/config_library/pricing.yaml b/config_library/pricing.yaml
Lines changed: 29 additions & 0 deletions
diff --git a/config_library/unified/ocr-benchmark/README.md b/config_library/unified/ocr-benchmark/README.md
Lines changed: 35 additions & 3 deletions
@@ -5,6 +5,50 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+## [0.5.1]
+
+### Added
+
+- **Scalable Document List and Test Executions** — Comprehensive redesign to eliminate UI and backend bottlenecks when working with thousands of documents. ([#203](https://github.com/aws-solutions-library-samples/accelerated-intelligent-document-processing-on-aws/issues/203))
+  - **TypeDateIndex GSI on TrackingTable**: New DynamoDB Global Secondary Index (`ItemType` + `InitialEventTime`) enables efficient queries by item type (document, testrun, testset) sorted by time, replacing full table scans. Includes 20 projected attributes for list-view rendering without base table fetches.
+  - **GSI Attribute Backfill Mechanism**: Robust Step Functions state machine with parallel scan workers that automatically backfills `ItemType` and `HITLPendingReview` attributes on existing items during stack upgrades. Features timeout-safe continuation, idempotent conditional updates, and automatic trigger via CloudFormation Custom Resource.
+  - **GSI-Based Document List Resolver**: New `listDocuments` Lambda resolver queries the TypeDateIndex GSI with server-side pagination (`limit`/`nextToken`).
+  - **`getDocumentCount` API**: New efficient count query using GSI `Select: 'COUNT'` for accurate document totals without fetching data.
+  - **UI Document List Rewrite**: Eliminated the N+1 query pattern (shard queries → individual `getDocument` per document). Now uses a single paginated `listDocuments` GSI query for all time periods. First page renders immediately with incremental background loading of remaining pages.
+  - **Subscription Optimization**: `onUpdateDocument` events now use subscription data directly instead of triggering individual `getDocument` API calls, eliminating thousands of redundant requests during active processing.
+  - **GSI-Based Test Runs Query**: Replaced full table scan in `get_test_runs()` and `get_test_runs_by_date_range()` with GSI query + BatchGetItem pattern for efficient test run listing with all fields (including Context, ConfigVersion).
+  - **GSI-Based Test Sets Query**: Replaced full table scan in `get_test_sets()` with GSI query + BatchGetItem pattern, avoiding scanning the entire TrackingTable (which includes all documents) just to find ~10 test sets.
+  - **`ItemType` Written on All Creation Paths**: All document, test run, and test set creation paths (DynamoDB service, AppSync resolvers, test runners, dataset deployers) now write `ItemType` and `InitialEventTime` for immediate GSI indexing.
+  - **Improved Error Messages**: Document list errors now show the actual failure reason (e.g., Lambda throttling, timeout details) instead of generic "please try again" messages.
+
+- **Third-Party Model Support** — Added Meta Llama 4 Maverick 17B, Llama 4 Scout 17B, Google Gemma 3 27B IT, and NVIDIA Nemotron Nano 12B v2 VL as selectable models across all pipeline stages (OCR, Classification, Extraction, Assessment, Summarization, Evaluation, Discovery, Agents, Rule Validation). Includes per-token pricing configuration and EU region fallback mappings for Llama 4 models. ([#217](https://github.com/aws-solutions-library-samples/accelerated-intelligent-document-processing-on-aws/issues/217))
+
+- **Load Test Config Version Support** — Added `--config-version` parameter to the `idp-cli load-test` command, enabling load tests to target a specific configuration version. Files uploaded during load tests now include `config-version` S3 metadata, consistent with the `process` command behavior.
+
+- **Deploy Failure Root Cause Analysis** — Enhanced `idp-cli deploy` failure reporting to recursively analyze nested stack events and identify actual root causes. Previously, failures in nested stacks showed only a generic "Embedded stack was not successfully created" message. Now displays a structured "Root Cause Analysis" section with the specific resource, type, and error message from the nested stack that caused the failure, along with cascade failure counts.
+
+- **MCP Server** — Added an additional tool to the MCP Server for retrieving the results of a processed document from the IDP system.
+
+
+### Changed
+
+- **OCR Benchmark Config Optimization** — Optimized `config_library/unified/ocr-benchmark` configuration with targeted field descriptions, explicit model/prompt/OCR settings, and corrected date format (YYYY-MM-DD to match ground truth). Improved overall extraction accuracy from 51.5% to 75.2% on the full 293-document benchmark at equivalent cost (~$2.62). Classification remains 100% across all 9 document classes. ([#220](https://github.com/aws-solutions-library-samples/accelerated-intelligent-document-processing-on-aws/pull/220))
+
+- **GraphQL Type Generation & Unit Testing** — Replaced 60+ hand-written GraphQL query/mutation/subscription files with auto-generated types via `@graphql-codegen`, added typed AWSJSON parsers with unit tests (vitest + jsdom), and integrated a CI codegen-check to prevent type drift.
+
+### Fixed
+
+- **AgentCore Gateway Manager** — Fixed an issue where the gateway was not deleted when the stack was deleted.
+
+- **Configuration Page Error Display** — Fixed `[object Object]` error message when configuration loading fails (e.g., due to Lambda throttling) by properly extracting error messages from Amplify GraphQL error responses.
+
+### Templates
+   - us-west-2: `https://s3.us-west-2.amazonaws.com/aws-ml-blog-us-west-2/artifacts/genai-idp/idp-main_0.5.1.yaml`
+   - us-east-1: `https://s3.us-east-1.amazonaws.com/aws-ml-blog-us-east-1/artifacts/genai-idp/idp-main_0.5.1.yaml`
+   - eu-central-1: `https://s3.eu-central-1.amazonaws.com/aws-ml-blog-eu-central-1/artifacts/genai-idp/idp-main_0.5.1.yaml`
+
 ## [0.5.0]
 
 ### Added
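
Reviewer sketch for the 0.5.1 `TypeDateIndex` entries above: a minimal illustration of how a paginated GSI query and a `Select: 'COUNT'` query could be parameterized. It is not the resolver shipped in this PR. The index and attribute names (`TypeDateIndex`, `ItemType`, `InitialEventTime`) come from the changelog; the function names, the base64 `nextToken` encoding, and the newest-first sort order are assumptions made for illustration.

```python
import base64
import json
from typing import Optional

INDEX_NAME = "TypeDateIndex"  # index name taken from the changelog entry


def encode_token(last_evaluated_key: Optional[dict]) -> Optional[str]:
    """Wrap DynamoDB's LastEvaluatedKey in an opaque token (assumed encoding)."""
    if not last_evaluated_key:
        return None
    raw = json.dumps(last_evaluated_key, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(next_token: Optional[str]) -> Optional[dict]:
    """Inverse of encode_token; returns None when no token was supplied."""
    if not next_token:
        return None
    return json.loads(base64.urlsafe_b64decode(next_token.encode("ascii")))


def build_list_query(
    table_name: str,
    item_type: str,
    limit: int = 50,
    next_token: Optional[str] = None,
    count_only: bool = False,
) -> dict:
    """Build keyword arguments for the low-level DynamoDB Query operation."""
    params = {
        "TableName": table_name,
        "IndexName": INDEX_NAME,
        # Partition key of the GSI; InitialEventTime is its sort key.
        "KeyConditionExpression": "ItemType = :item_type",
        "ExpressionAttributeValues": {":item_type": {"S": item_type}},
        "ScanIndexForward": False,  # assumption: newest items first
    }
    if count_only:
        params["Select"] = "COUNT"  # return only a count, no items
    else:
        params["Limit"] = limit
    start_key = decode_token(next_token)
    if start_key:
        params["ExclusiveStartKey"] = start_key
    return params
```

With boto3 installed, the result could be passed as `boto3.client("dynamodb").query(**params)`, and the response's `LastEvaluatedKey` fed back through `encode_token` to produce the next page token. A count query is still paginated by DynamoDB, so the `Count` values of successive pages would need to be summed.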
 
@@ -118,7 +118,12 @@ lint-cicd:
 		echo -e "$(RED)ERROR: UI build failed$(NC)"; \
 		exit 1; \
 	fi
-	
+
+	@if ! make codegen-check; then \
+		echo -e "$(RED)ERROR: GraphQL codegen check failed$(NC)"; \
+		exit 1; \
+	fi
+
 	@echo -e "$(GREEN)All code quality checks passed!$(NC)"
 
 # Validate AWS CodeBuild buildspec files
@@ -194,6 +199,29 @@ ui-build:
 	@echo "Checking UI build"
 	cd src/ui && npm ci --prefer-offline --no-audit && npm run build
 
+# Regenerate GraphQL types and operations (codegen) or verify they are up-to-date (codegen-check)
+codegen:
+	@cd src/ui && npm run codegen
+	@echo -e "$(GREEN)✅ GraphQL types regenerated. Don't forget to commit the changes.$(NC)"
+
+codegen-check:
+	@echo "Checking if GraphQL codegen output is up-to-date..."
+	@cd src/ui && npm ci --prefer-offline --no-audit && npm run codegen
+	@if ! git diff --quiet src/ui/src/graphql/generated/; then \
+		if [ -n "$$CI" ] || [ -n "$$GITHUB_ACTIONS" ]; then \
+			echo -e "$(RED)ERROR: Generated GraphQL files are out of date!$(NC)"; \
+			echo -e "$(YELLOW)Run 'make codegen' and commit the updated files.$(NC)"; \
+			git diff --stat src/ui/src/graphql/generated/; \
+			exit 1; \
+		else \
+			echo -e "$(YELLOW)Generated GraphQL files were out of date — auto-updated.$(NC)"; \
+			git diff --stat src/ui/src/graphql/generated/; \
+			echo -e "$(YELLOW)Please commit the changes above.$(NC)"; \
+		fi \
+	else \
+		echo -e "$(GREEN)✅ GraphQL codegen output is up-to-date$(NC)"; \
+	fi
+
 commit: lint test
 	$(info Generating commit message...)
 	export COMMIT_MESSAGE="$(shell kiro-cli chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only on a single line." | grep ">" | tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')" && \
 
@@ -1 +1 @@
-0.5.0
+0.5.1
@@ -606,6 +606,35 @@ pricing:
       - name: outputTokens
         price: "2.66E-6"
 
+  - name: bedrock/us.meta.llama4-maverick-17b-instruct-v1:0
+    units:
+      - name: inputTokens
+        price: "2.4E-7"
+      - name: outputTokens
+        price: "9.7E-7"
+
+  - name: bedrock/us.meta.llama4-scout-17b-instruct-v1:0
+    units:
+      - name: inputTokens
+        price: "1.7E-7"
+      - name: outputTokens
+        price: "6.6E-7"
+
+  - name: bedrock/google.gemma-3-27b-it
+    units:
+      - name: inputTokens
+        price: "2.3E-7"
+      - name: outputTokens
+        price: "3.8E-7"
+
+  - name: bedrock/nvidia.nemotron-nano-12b-v2
+    units:
+      - name: inputTokens
+        price: "2.0E-7"
+      - name: outputTokens
+        price: "6.0E-7"
+
+
   # ---------------------------------------------------------------------------
   # AWS Lambda Pricing (US East - N. Virginia)
   # ---------------------------------------------------------------------------
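
Reviewer sketch for the pricing entries added above: a minimal example of turning the per-token price strings into a cost estimate. The prices are copied from this hunk; the `PRICES` dictionary layout and the `estimate_cost` helper are hypothetical and do not reflect how the accelerator itself reads `pricing.yaml`. `Decimal` is used so that strings such as `"2.4E-7"` are parsed exactly.

```python
from decimal import Decimal

# Per-token USD prices copied from the pricing.yaml entries in this hunk.
PRICES = {
    "bedrock/us.meta.llama4-maverick-17b-instruct-v1:0": {
        "inputTokens": "2.4E-7",
        "outputTokens": "9.7E-7",
    },
    "bedrock/us.meta.llama4-scout-17b-instruct-v1:0": {
        "inputTokens": "1.7E-7",
        "outputTokens": "6.6E-7",
    },
    "bedrock/google.gemma-3-27b-it": {
        "inputTokens": "2.3E-7",
        "outputTokens": "3.8E-7",
    },
    "bedrock/nvidia.nemotron-nano-12b-v2": {
        "inputTokens": "2.0E-7",
        "outputTokens": "6.0E-7",
    },
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one call (hypothetical helper)."""
    units = PRICES[model]
    return (
        Decimal(units["inputTokens"]) * input_tokens
        + Decimal(units["outputTokens"]) * output_tokens
    )
```

For example, one million input tokens plus one million output tokens on Llama 4 Maverick come to 0.24 + 0.97 = 1.21 USD under these prices.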
 
@@ -20,14 +20,46 @@ The OCR Benchmark dataset contains diverse document types with ground truth JSON
 | **REAL_ESTATE** | Real estate transaction data | transactions[], transactionsByCity[] |
 | **SHIFT_SCHEDULE** | Employee scheduling | title, facility, employees[] with shifts |
 
+## Benchmark Results
+
+Evaluated on the full 293-document dataset using IDP Accelerator v0.5.0 (pattern-2, pipeline mode). Evaluation methods are identical across all configs for an apples-to-apples comparison.
+
+| Metric | Previous Config | This Config (Nova 2 Lite) | With Sonnet 4.6 |
+|--------|----------------|---------------------------|------------------|
+| **Overall Accuracy** | 51.5% | 75.2% | 91.2% |
+| **Classification Accuracy** | 100% | 100% | 100% |
+| **Total Cost (293 docs)** | $2.60 | $2.62 | $9.73 |
+| **Cost per Document** | ~$0.009 | ~$0.009 | ~$0.033 |
+
+### Per-Class Extraction Accuracy
+
+| Class | Previous | This Config (Nova) | With Sonnet |
+|-------|----------|-------------------|-------------|
+| DELIVERY_NOTE (8) | 89.5% | 98.9% | 99.4% |
+| PETITION_FORM (51) | 74.7% | 96.7% | 98.4% |
+| COMMERCIAL_LEASE_AGREEMENT (52) | 75.5% | 96.3% | 98.5% |
+| SHIFT_SCHEDULE (18) | 68.9% | 95.7% | 96.0% |
+| REAL_ESTATE (59) | 80.6% | 91.4% | 98.9% |
+| BANK_CHECK (52) | 82.6% | 86.1% | 97.0% |
+| EQUIPMENT_INSPECTION (11) | 60.8% | 83.6% | 97.1% |
+| CREDIT_CARD_STATEMENT (11) | 53.1% | 74.7% | 82.3% |
+| GLOSSARY (31) | 68.0% | 67.3% | 95.0% |
+
+### Models Used
+
+- **Classification**: Nova 2 Lite (`us.amazon.nova-2-lite-v1:0`)
+- **Extraction**: Nova 2 Lite (`us.amazon.nova-2-lite-v1:0`)
+- **OCR**: Textract (Layout feature)
+
+To use Sonnet 4.6 for extraction, change `extraction.model` to `us.anthropic.claude-sonnet-4-6-20250929-v1:0`.
 
 ## Processing Mode
 
 **Default Mode**: Pipeline (use_bda: false). Set use_bda: true for BDA mode.
 
 ## Validation Level
 
-**Level**: 2 - Minimal Testing
+**Level**: 3 - Benchmarked
 
-- **Testing Evidence**: This configuration has been lightly tested with the RealKIE-FCC-Verified Dataset. 
-- **Known Limitations**: Performance may vary - consider this configuration a starting point. We welome Pull Requests to improve the accuracy.
+- **Testing Evidence**: Evaluated on the full 293-document OmniAI OCR Benchmark dataset with a per-class accuracy breakdown. Evaluation methods are identical to those used for the previous config, so the comparison is fair.
+- **Known Limitations**: The GLOSSARY class has the lowest accuracy (67.3%) due to OCR challenges with single-digit numbers. Upgrading the extraction model to Claude Sonnet 4.6 improves overall accuracy to 91.2% at higher cost ($9.73 vs. $2.62 for the 293 documents).
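
Reviewer sketch: the "Cost per Document" row of the Benchmark Results table can be reproduced from the total costs and the document count. The figures below are copied from that table; the variable and function names are illustrative only.

```python
DOCS = 293  # size of the benchmark dataset

# Total cost in USD per configuration, copied from the Benchmark Results table.
TOTAL_COST_USD = {
    "Previous Config": 2.60,
    "This Config (Nova 2 Lite)": 2.62,
    "With Sonnet 4.6": 9.73,
}


def cost_per_document(total_cost: float, docs: int = DOCS) -> float:
    """Average cost of processing one document."""
    return total_cost / docs


per_doc = {name: round(cost_per_document(cost), 3) for name, cost in TOTAL_COST_USD.items()}

# How much more the Sonnet 4.6 run costs relative to the Nova 2 Lite run.
sonnet_vs_nova = round(
    TOTAL_COST_USD["With Sonnet 4.6"] / TOTAL_COST_USD["This Config (Nova 2 Lite)"], 1
)
```

Rounded to three decimals this gives 0.009, 0.009, and 0.033 USD per document, matching the table, and the Sonnet 4.6 run costs roughly 3.7 times as much as the Nova 2 Lite run.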