
feat: Make Discovery accessible from CLI and SDK (#228) #232

Merged
rstrahan merged 7 commits into develop from feature/discovery-cli-sdk
Mar 9, 2026


@rstrahan rstrahan commented Mar 9, 2026

Summary

Makes Discovery accessible from CLI (idp-cli discover) and SDK (client.discovery.run()), addressing #228.

Usage

Usage: idp-cli discover [OPTIONS]

  Discover document class schema from sample document(s)

  Analyzes document(s) using Amazon Bedrock to automatically generate JSON
  Schema definitions for document classes.

  Ground truth files (-g) are auto-matched to documents (-d) by filename stem:
  invoice.pdf matches invoice.json. Unmatched documents run without ground
  truth.

  For --output (-o) in batch mode: if path is a directory, writes one JSON
  file per schema; if path is a file, writes all schemas as a JSON array.

  Examples:

    # Single document
    idp-cli discover -d ./invoice.pdf

    # With ground truth (matched by filename stem)
    idp-cli discover -d ./invoice.pdf -g ./invoice.json

    # Output to file
    idp-cli discover -d ./form.pdf -o ./form-schema.json

    # Batch with auto-matched ground truth
    idp-cli discover -d ./invoice.pdf -d ./w2.pdf -g ./invoice.json -g ./w2.json

    # Batch with output directory
    idp-cli discover -d ./invoice.pdf -d ./w2.pdf -o ./schemas/

    # Batch with output file (JSON array)
    idp-cli discover -d ./invoice.pdf -d ./w2.pdf -o ./all-schemas.json

    # Stack mode (saves to config)
    idp-cli discover --stack-name my-stack -d ./invoice.pdf --config-version v2

Options:
  --stack-name TEXT        CloudFormation stack name (optional — if omitted,
                           runs in local mode without saving to config)
  -d, --document PATH      Path to document file(s). Specify multiple times
                           for batch: -d doc1.pdf -d doc2.pdf  [required]
  -g, --ground-truth PATH  Path to JSON ground truth file(s). Auto-matched to
                           documents by filename stem
  --config-version TEXT    Configuration version to save the discovered schema
                           to (default: active version)
  --region TEXT            AWS region (optional)
  -o, --output PATH        Output path: file (single doc or JSON array for
                           batch) or directory (one file per schema)
  --help                   Show this message and exit.
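
The matching and output rules described in the help text can be sketched roughly as follows. This is an illustrative sketch only, not the code added in this PR: the helper names `match_ground_truth` and `write_schemas` are hypothetical, and the rule used to decide whether `--output` is a directory (trailing slash or existing directory) is an assumption.

```python
import json
from pathlib import Path


def match_ground_truth(documents, ground_truths):
    """Pair each document with the ground truth file that shares its filename stem.

    Documents without a matching stem map to None (they run without ground truth).
    """
    by_stem = {Path(g).stem: Path(g) for g in ground_truths}
    return {Path(d): by_stem.get(Path(d).stem) for d in documents}


def write_schemas(schemas, output):
    """Write batch results: one JSON file per schema for a directory target,
    or a single JSON array for a file target.

    `schemas` maps a (hypothetical) class name to its JSON Schema dict.
    """
    out = Path(output)
    # Assumption: a trailing separator or an existing directory means directory mode.
    if str(output).endswith(("/", "\\")) or out.is_dir():
        out.mkdir(parents=True, exist_ok=True)
        written = []
        for name, schema in schemas.items():
            target = out / f"{name}.json"
            target.write_text(json.dumps(schema, indent=2))
            written.append(target)
        return written
    # File mode: all schemas go into one JSON array.
    out.write_text(json.dumps(list(schemas.values()), indent=2))
    return [out]
```

With `-d ./invoice.pdf -d ./w2.pdf -g ./invoice.json`, this sketch pairs `invoice.pdf` with `invoice.json` and leaves `w2.pdf` unmatched, which mirrors the "unmatched documents run without ground truth" rule.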

Changes

New files

  • lib/idp_sdk/idp_sdk/models/discovery.py — DiscoveryResult and DiscoveryBatchResult Pydantic models
  • lib/idp_sdk/idp_sdk/operations/discovery.py — DiscoveryOperation with run() and run_batch() methods, local and stack-connected modes
  • lib/idp_sdk/tests/unit/test_discovery_operations.py — 17 unit tests

Modified files

  • lib/idp_common_pkg/idp_common/discovery/classes_discovery.py — Added file_bytes, ground_truth_data, save_to_config optional params (backward compatible)
  • lib/idp_sdk/idp_sdk/client.py — Added discovery namespace (10th operation)
  • lib/idp_sdk/idp_sdk/operations/__init__.py — Export DiscoveryOperation
  • lib/idp_sdk/idp_sdk/models/__init__.py — Export discovery models
  • lib/idp_cli_pkg/idp_cli/cli.py — Added discover command with optional --stack-name
  • docs/idp-cli.md — Added discover command documentation
  • docs/idp-sdk.md — Added discovery operations documentation
  • lib/idp_common_pkg/README.md — Added Discovery to core services list
  • lib/idp_common_pkg/idp_common/README.md — Added Discovery usage example
  • CHANGELOG.md — Added entry under [Unreleased]

Key design decisions

  • No S3 uploads — document bytes read locally, passed directly to Bedrock via file_bytes parameter
  • --stack-name optional — without it, runs in local mode using system default Bedrock settings from base-discovery.yaml
  • Single discover CLI command — -d repeatable for batch mode
  • Backward compatible — all new ClassesDiscovery parameters are optional; existing Lambda callers unaffected
  • Merged fix/discovery — Discovery no longer injects default config classes into target version
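
To illustrate the "no S3 uploads" and "backward compatible" decisions, here is a minimal sketch of how keyword-only optional parameters let an existing caller keep working while a local caller passes bytes directly. The function `discover`, its `input_key` argument, its return shape, and the default of `save_to_config=True` are all hypothetical stand-ins. They are not the actual `ClassesDiscovery` signature, which this description does not show.

```python
from pathlib import Path
from typing import Optional


def discover(
    input_key: Optional[str] = None,
    *,
    file_bytes: Optional[bytes] = None,
    ground_truth_data: Optional[dict] = None,
    save_to_config: bool = True,  # assumed default, chosen to keep old behavior
) -> dict:
    """Hypothetical stand-in showing optional, backward-compatible parameters."""
    if input_key is None and file_bytes is None:
        raise ValueError("provide either input_key or file_bytes")
    # When bytes are supplied, nothing needs to be fetched from (or uploaded to) S3.
    source = "local-bytes" if file_bytes is not None else "s3"
    return {
        "source": source,
        "size": len(file_bytes) if file_bytes is not None else None,
        "has_ground_truth": ground_truth_data is not None,
        "saved_to_config": save_to_config,
    }


def discover_local(path: str) -> dict:
    """Local mode: read the document from disk and skip saving to config."""
    return discover(file_bytes=Path(path).read_bytes(), save_to_config=False)
```

An existing caller such as `discover("docs/invoice.pdf")` is unaffected because every new parameter has a default, while `discover_local("./invoice.pdf")` exercises the new path.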

Testing

  • 65 SDK unit tests pass (17 new discovery tests)
  • 28 idp_common discovery unit tests pass
  • All lint checks clean (ruff)

Closes #228

@rstrahan rstrahan merged commit cc745ae into develop Mar 9, 2026
1 check passed