
Commit 7456202

Merge branch 'develop' of ssh.gitlab.aws.dev:genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator into develop
2 parents 42a0d1f + 835c48a commit 7456202

16 files changed

Lines changed: 2174 additions & 174 deletions


CHANGELOG.md

Lines changed: 6 additions & 1 deletion
@@ -5,18 +5,23 @@ SPDX-License-Identifier: MIT-0

## [Unreleased]

### Added

- **Configuration Version in Metering Database** — Added a `config_version` field to the metering database to enable cost tracking and analytics per configuration version. The metering Glue table now includes a `config_version` column, and all metering Parquet files store the configuration version used for each document. This enables Athena queries to compare costs across different configurations, support A/B testing analytics, and optimize per-version costs. Documents without a config version default to "default".

## [0.5.6]

### Added

- **Test Studio CLI Commands**: Two new commands. `idp-cli test-result` retrieves test results, triggering evaluation automatically and supporting the `--wait` and `--output-dir` options; `idp-cli test-compare` compares multiple test runs and exports the comparison to JSON/CSV. See `docs/idp-cli.md`.

- **Custom Model Fine-Tuning** — Fine-tune Amazon Nova 2 models (Lite and Pro) for document classification and extraction using your own labeled Test Sets. The end-to-end workflow — validate data, generate training data, train via Bedrock, and deploy an on-demand custom model endpoint — is driven from a new **Custom Models** page in the Web UI. Custom models can then be selected in any configuration version for classification and/or extraction. Available to Admin and Author roles. **Note:** currently requires deployment in `us-east-1`. See `docs/custom-model-finetuning.md`.

- **External SAML/OIDC Identity Provider Federation** — Optional support for federating authentication through an external SAML or OIDC identity provider via Amazon Cognito. Enables organizations to use existing enterprise identity providers (PingOne, Okta, Microsoft Entra ID, etc.) for single sign-on. All federation functionality is opt-in through 12 new CloudFormation parameters — leaving them empty results in zero additional resources and identical behavior to existing Cognito-native authentication. See `docs/external-idp.md`.

- **Private Network Deployment** — Deploy the IDP Accelerator in fully private / air-gapped environments. New `AppSyncVisibility` parameter (`GLOBAL` | `PRIVATE`) makes the AppSync API accessible only from inside the VPC. All processing Lambda functions (21 across 3 templates) are conditionally placed in customer VPC subnets with an HTTPS-only security group. Includes a separate VPC endpoint CloudFormation template (`scripts/vpc-endpoints.yaml`) with 16 interface endpoints (AppSync, Bedrock, SQS, DynamoDB, S3, Lambda, SSM, KMS, STS, Textract, and more) and per-endpoint creation flags to skip pre-existing endpoints. All features are off by default — existing deployments are completely unaffected. See `docs/deployment-private-network.md`.

- **Enhanced Information Panels** — Added comprehensive help content to the Information (ⓘ) panel on every page in the Web UI. Each panel now includes a feature summary, a list of key capabilities, and "Learn more" links to the relevant documentation pages on the docs site. Created new panels for 8 pages that previously had none (including Pricing, Capacity Planning, Custom Models, Discovery, User Management, and Test Studio), and enriched the existing 7 panels with fuller descriptions and documentation links.

### Changed

- **Removed Claude Sonnet 4:1m and Sonnet 4.5:1m model variants** — The 1M context window beta for Claude Sonnet 4 (`claude-sonnet-4-20250514-v1:0:1m`) and Sonnet 4.5 (`claude-sonnet-4-5-20250929-v1:0:1m`) is being retired effective April 30, 2026. These `:1m` model variants have been removed from all enum lists, UI dropdowns, quota code mappings, pricing, and documentation. Users needing 1M context windows should migrate to Claude Sonnet 4.6 (`claude-sonnet-4-6:1m`), where the 1M context window is generally available (GA).

docs/idp-cli.md

Lines changed: 98 additions & 0 deletions
@@ -50,6 +50,8 @@ https://github.com/user-attachments/assets/3d448a74-ba5b-4a4a-96ad-ec03ac0b4d7d
- [config-list](#config-list)
- [config-activate](#config-activate)
- [config-delete](#config-delete)
- [test-result](#test-result)
- [test-compare](#test-compare)
- [chat](#chat)
- [Complete Evaluation Workflow](#complete-evaluation-workflow)
- [Step 1: Deploy Your Stack](#step-1-deploy-your-stack)
@@ -2055,6 +2057,102 @@ This uses the same mechanism as the Web UI configuration management system.

---

### `test-result`

Get the results of a specific Test Studio test run, automatically triggering evaluation if the metrics have not been calculated yet.

**Usage:**
```bash
idp-cli test-result [OPTIONS]
```

**Options:**
- `--stack-name` (required): CloudFormation stack name
- `--test-run-id` (required): ID of the test run to retrieve results for
- `--wait`: Wait for evaluation to complete (polls until metrics are calculated)
- `--timeout`: Timeout in seconds when using `--wait` (default: 600)
- `--output-dir`: Directory in which to save the results as a JSON file
- `--region`: AWS region (optional)

**Examples:**
```bash
# Get results immediately (status may show "EVALUATING" if metrics are not ready yet)
idp-cli test-result \
  --stack-name my-stack \
  --test-run-id fake-w2-20260409-123456

# Wait for evaluation to complete (recommended for CI/CD)
idp-cli test-result \
  --stack-name my-stack \
  --test-run-id fake-w2-20260409-123456 \
  --wait --timeout 900

# Save results to a JSON file
idp-cli test-result \
  --stack-name my-stack \
  --test-run-id fake-w2-20260409-123456 \
  --wait --output-dir ./results
```
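For scripted or CI/CD use, the documented options can be assembled into an argument list and run with Python's `subprocess` module. This is a hypothetical helper, not part of `idp-cli`: `build_test_result_command` and `expected_result_path` are illustrative names, and the file-name pattern `<test-run-id>-result.json` comes from the output notes for this command. Rejecting `timeout` without `wait` is a design choice of this sketch, since `--timeout` is documented only for use with `--wait`.

```python
import os
from typing import List, Optional


def build_test_result_command(
    stack_name: str,
    test_run_id: str,
    wait: bool = False,
    timeout: Optional[int] = None,
    output_dir: Optional[str] = None,
    region: Optional[str] = None,
) -> List[str]:
    """Assemble an `idp-cli test-result` argument list from the documented options."""
    if timeout is not None and not wait:
        # --timeout is only meaningful together with --wait
        raise ValueError("timeout requires wait=True")
    cmd = ["idp-cli", "test-result", "--stack-name", stack_name, "--test-run-id", test_run_id]
    if wait:
        cmd.append("--wait")
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if region is not None:
        cmd += ["--region", region]
    return cmd


def expected_result_path(output_dir: str, test_run_id: str) -> str:
    """Path of the JSON file the CLI writes when --output-dir is given."""
    return os.path.join(output_dir, f"{test_run_id}-result.json")
```

For example, `subprocess.run(build_test_result_command("my-stack", run_id, wait=True, timeout=900), check=True)` runs the command; `check=True` raises `CalledProcessError` on a non-zero exit status (whether `idp-cli` exits non-zero on failure is an assumption here).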
**Output:**
- Overall accuracy, precision, recall, and F1 score
- Total cost
- Number of files completed and failed
- Created and completed timestamps
- JSON file: `<test-run-id>-result.json` (when `--output-dir` is specified)

**Behavior:**
- Triggers lazy evaluation if metrics have not been calculated yet (that is, on the first call after the test run completes)
- Polls Lambda every 10 seconds when `--wait` is used
- Returns the complete test run data, including field-level metrics and a cost breakdown
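The `--wait` behavior described above (poll every 10 seconds until metrics are calculated, give up after `--timeout` seconds, default 600) follows a standard poll-with-deadline pattern. This is a minimal sketch of that pattern, not the CLI's actual implementation: `fetch_result` is a hypothetical callable returning the test run as a dict, and treating any status other than `"EVALUATING"` as finished is an assumption. `sleep` and `clock` are injectable so the loop can be tested without real delays.

```python
import time
from typing import Any, Callable, Dict


def wait_for_metrics(
    fetch_result: Callable[[], Dict[str, Any]],
    timeout: float = 600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_result() every `interval` seconds until evaluation is done or `timeout` expires."""
    deadline = clock() + timeout
    while True:
        result = fetch_result()
        # Assumed status value while metrics are still being calculated
        if result.get("status") != "EVALUATING":
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"evaluation not finished after {timeout} seconds")
        # Never sleep past the deadline
        sleep(min(interval, remaining))
```

Using `time.monotonic` instead of wall-clock time keeps the deadline correct if the system clock changes, and shortening the last sleep means the loop overshoots the timeout by at most one request.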
---

### `test-compare`

Compare metrics and configurations across multiple Test Studio test runs.

**Usage:**
```bash
idp-cli test-compare [OPTIONS]
```

**Options:**
- `--stack-name` (required): CloudFormation stack name
- `--test-run-ids` (required): Comma-separated list of test run IDs to compare (minimum 2)
- `--output-dir`: Directory in which to save the comparison as JSON and CSV files
- `--region`: AWS region (optional)

**Examples:**
```bash
# Compare two test runs
idp-cli test-compare \
  --stack-name my-stack \
  --test-run-ids "fake-w2-20260409-123456,fake-w2-20260409-234567"

# Compare three runs and export the comparison to files
idp-cli test-compare \
  --stack-name my-stack \
  --test-run-ids "run1,run2,run3" \
  --output-dir ./comparisons
```
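Because `--test-run-ids` takes a single comma-separated value, a script that builds the command from a list has to join the IDs and check the documented minimum of two runs. The helper below is a hypothetical sketch; its name and its validation rules (no empty IDs, no commas inside an ID, at least two distinct IDs) are illustrative and not part of `idp-cli`.

```python
from typing import List, Optional, Sequence


def build_test_compare_command(
    stack_name: str,
    test_run_ids: Sequence[str],
    output_dir: Optional[str] = None,
    region: Optional[str] = None,
) -> List[str]:
    """Assemble an `idp-cli test-compare` argument list from the documented options."""
    ids = [run_id.strip() for run_id in test_run_ids]
    if any(not run_id or "," in run_id for run_id in ids):
        # IDs are sent as one comma-separated value, so they must be non-empty and comma-free
        raise ValueError("test run IDs must be non-empty and must not contain commas")
    if len(set(ids)) < 2:
        raise ValueError("at least 2 distinct test run IDs are required")
    cmd = ["idp-cli", "test-compare", "--stack-name", stack_name, "--test-run-ids", ",".join(ids)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if region is not None:
        cmd += ["--region", region]
    return cmd
```

When the list is passed to `subprocess.run`, no shell quoting is needed; the quotes in the shell examples are shell syntax and are not part of the value.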
**Output:**
- **Console**: Side-by-side table with accuracy, precision, recall, F1 score, and cost for each test run
- **JSON file**: `comparison-<timestamp>.json` - Complete comparison data, including full test results and configuration differences
- **CSV file**: `comparison-<timestamp>.csv` - Metrics table suitable for spreadsheets

**Configuration Differences:**
- Automatically detects and displays configuration differences between test runs
- Shows nested config paths (e.g., `classification.model`, `extraction.temperature`)
- Highlights values that differ across test runs

**Requirements:**
- All test runs must be in `COMPLETE` or `PARTIAL_COMPLETE` status
- At least 2 test runs are required for a comparison
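The configuration-difference report described above can be understood as flattening each run's nested configuration into dotted paths such as `classification.model`, then keeping only the paths whose values are not identical across runs. This is a minimal sketch of that idea under assumptions: configurations are plain nested dicts, the function names are hypothetical, and the CLI's actual comparison logic may differ.

```python
from collections.abc import Mapping
from typing import Any, Dict

# Sentinel marking a path that is absent from one run's configuration
_MISSING = object()


def flatten_config(config: Mapping, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested mappings into dotted paths, e.g. {"classification.model": ...}."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Recurse into nested sections such as "classification" or "extraction"
            flat.update(flatten_config(value, path))
        else:
            flat[path] = value
    return flat


def config_differences(configs: Mapping) -> Dict[str, Dict[str, Any]]:
    """Return {path: {run_id: value}} for every path whose value differs between runs."""
    flat_by_run = {run_id: flatten_config(cfg) for run_id, cfg in configs.items()}
    all_paths = sorted({path for flat in flat_by_run.values() for path in flat})
    diffs: Dict[str, Dict[str, Any]] = {}
    for path in all_paths:
        values = [flat.get(path, _MISSING) for flat in flat_by_run.values()]
        if any(value != values[0] for value in values[1:]):
            # Report missing paths as None so the result is easy to print or serialize
            diffs[path] = {
                run_id: (None if value is _MISSING else value)
                for run_id, value in zip(flat_by_run.keys(), values)
            }
    return diffs
```

Detection uses a private sentinel, so a path missing from one run is still flagged as a difference even though it is displayed as `None`.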
---
### `discover`

Discover document class schemas from sample documents using Amazon Bedrock.

docs/idp-sdk.md

Lines changed: 62 additions & 1 deletion
@@ -1611,7 +1611,7 @@ else:

## Testing Operations

Operations for load testing, Test Studio evaluation results, and performance validation.

### testing.load_test()

@@ -1640,6 +1640,65 @@ print(f"Total files: {result.total_files}")
print(f"Success: {result.success}")
```

### testing.get_test_result()

Get Test Studio evaluation results for a specific test run.

**Parameters:**
- `test_run_id` (str, required): Test run identifier
- `stack_name` (str, optional): Stack name override
- `wait` (bool, optional): Wait for the test run to complete if it is still in progress (default: False)
- `timeout` (int, optional): Maximum wait time in seconds (default: 300)
- `poll_interval` (int, optional): Polling interval in seconds (default: 5)

**Returns:** `TestRunResult` with evaluation metrics

```python
# `client` is an initialized IDP SDK client
# Get the result immediately (evaluation may still be in progress)
result = client.testing.get_test_result(
    test_run_id="Fake-W2-Tax-Forms-20260410-173735"
)

# Wait for evaluation to complete
result = client.testing.get_test_result(
    test_run_id="Fake-W2-Tax-Forms-20260410-173735",
    wait=True,
    timeout=900
)

print(f"Status: {result.status}")
print(f"Overall Accuracy: {result.overall_accuracy:.2%}")
print(f"Precision: {result.accuracy_breakdown['precision']:.2%}")
print(f"Recall: {result.accuracy_breakdown['recall']:.2%}")
print(f"F1 Score: {result.accuracy_breakdown['f1_score']:.2%}")
print(f"Total Cost: ${result.total_cost:.2f}")
```
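In a CI pipeline, the returned `TestRunResult` can gate a deployment. The function below is a hypothetical sketch that reads only the attributes used in the example above (`overall_accuracy` and `accuracy_breakdown['f1_score']`); the thresholds are placeholders, and treating a missing metric as a failure is a design choice of this sketch.

```python
from types import SimpleNamespace
from typing import Any, List, Optional


def check_quality_gate(result: Any, min_accuracy: float = 0.9, min_f1: Optional[float] = None) -> List[str]:
    """Return a list of failure messages; an empty list means the gate passed."""
    failures: List[str] = []
    accuracy = getattr(result, "overall_accuracy", None)
    if accuracy is None:
        failures.append("overall accuracy not available (evaluation may still be running)")
    elif accuracy < min_accuracy:
        failures.append(f"overall accuracy {accuracy:.2%} is below {min_accuracy:.2%}")
    if min_f1 is not None:
        # accuracy_breakdown is assumed to be a dict, as in the example above
        f1 = (getattr(result, "accuracy_breakdown", None) or {}).get("f1_score")
        if f1 is None:
            failures.append("F1 score not available")
        elif f1 < min_f1:
            failures.append(f"F1 score {f1:.2%} is below {min_f1:.2%}")
    return failures


if __name__ == "__main__":
    # Stand-in object for illustration; in practice pass the result of get_test_result()
    demo = SimpleNamespace(overall_accuracy=0.87, accuracy_breakdown={"f1_score": 0.91})
    for problem in check_quality_gate(demo, min_accuracy=0.9, min_f1=0.9):
        print(problem)
```

A CI step could call `sys.exit(1)` whenever the returned list is non-empty.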
### testing.compare_test_runs()

Compare multiple Test Studio evaluation runs.

**Parameters:**
- `test_run_ids` (list[str], required): List of test run identifiers to compare (minimum 2)
- `stack_name` (str, optional): Stack name override

**Returns:** `TestComparisonResult` with metrics for each test run

```python
result = client.testing.compare_test_runs(
    test_run_ids=[
        "Fake-W2-Tax-Forms-20260410-173735",
        "Fake-W2-Tax-Forms-20260409-191545"
    ]
)

for test_run_id, metrics in result.metrics.items():
    print(f"\nTest Run: {test_run_id}")
    print(f" Accuracy: {metrics['overallAccuracy']:.2%}")
    print(f" Completed: {metrics['completedFiles']}/{metrics['filesCount']}")
    print(f" Cost: ${metrics['totalCost']:.2f}")
```
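The `metrics` mapping iterated above can also be used to pick a preferred configuration. This hypothetical helper ranks runs by `overallAccuracy` (highest first), breaks ties by `totalCost` (lowest first), and computes the cost per completed file; it uses only the keys shown in the example, and treating missing values as zero is an assumption of this sketch.

```python
from typing import Any, Dict, List, Mapping


def rank_test_runs(metrics_by_run: Mapping[str, Mapping[str, Any]]) -> List[Dict[str, Any]]:
    """Sort runs by accuracy (highest first), breaking ties by total cost (lowest first)."""
    rows = []
    for run_id, m in metrics_by_run.items():
        completed = m.get("completedFiles") or 0
        total_cost = m.get("totalCost") or 0.0
        rows.append({
            "test_run_id": run_id,
            "accuracy": m.get("overallAccuracy") or 0.0,
            "total_cost": total_cost,
            # Cost per successfully completed file; None when nothing completed
            "cost_per_file": total_cost / completed if completed else None,
        })
    return sorted(rows, key=lambda r: (-r["accuracy"], r["total_cost"]))
```

With the comparison result from above, `rank_test_runs(result.metrics)[0]` is the top-ranked run under this ordering.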
---

## Response Models
@@ -1728,6 +1787,8 @@ from idp_sdk import (
    ExecutionsStoppedResult,
    DocumentsAbortedResult,
    LoadTestResult,
    TestRunResult,
    TestComparisonResult,

    # Enums
    DocumentState,
