Status: Implemented (Phase C operationalization, follow-up to issue #75 Milestone C2.d)
Parent epic: issue #75
Builds on: extractor_compilation_revalidation.md, extractor_compilation_bundle_loader.md
A one-shot CLI binary that runs revalidate_compiled_extractors against local OR BigQuery inputs so operators can periodically check the compiled extractor path without writing Python.
Local JSONL events:
bqaa-revalidate-extractors \
--bundles-root /var/bqaa/synced-bundles \
--events-jsonl events.jsonl \
--reference-extractors-module my_project.references \
--thresholds-json thresholds.json \
--report-out report.jsonBigQuery events:
bqaa-revalidate-extractors \
--bundles-root /var/bqaa/synced-bundles \
--events-bq-query-file events_query.sql \
--bq-project my-project \
--bq-location US \
--reference-extractors-module my_project.references \
--thresholds-json thresholds.json \
--report-out report.jsonWhere events_query.sql returns exactly one column named event_json (STRING) containing a JSON-encoded event dict per row. The SQL file must be fully self-contained — the CLI does not accept query parameters, so substitute concrete literals before invoking:
-- events_query.sql — bake in literal time bounds before running.
SELECT TO_JSON_STRING(STRUCT(
event_type,
span_id,
session_id,
content
)) AS event_json
FROM `my-project.my_dataset.agent_events`
WHERE event_timestamp BETWEEN TIMESTAMP('2026-05-01') AND TIMESTAMP('2026-05-12')
LIMIT 10000| Flag | Required | Description |
|---|---|---|
--bundles-root |
yes | Directory containing one subdirectory per compiled bundle (the layout discover_bundles walks). Fingerprint is auto-detected from the first bundle's manifest; every other bundle must declare the same fingerprint or sync fails with exit 2. |
--events-jsonl |
one of | Path to a JSONL file (one event JSON object per line). Empty lines are skipped; malformed lines abort with exit 2 naming the line number. Mutually exclusive with --events-bq-query-file; exactly one must be supplied. |
--events-bq-query-file |
one of | Path to a .sql file whose query returns one column named event_json (STRING) per row. The CLI does not auto-shape bigquery.Row objects — the query writer controls projection. Mutually exclusive with --events-jsonl; exactly one must be supplied. |
--bq-project |
no | BigQuery project ID for --events-bq-query-file. Optional: when omitted, the BigQuery client falls back to Application Default Credentials / environment for project inference. If both the flag is absent AND the inferred project is empty, the CLI exits 2 with Set --bq-project explicitly. |
--bq-location |
no | BigQuery location for --events-bq-query-file. Defaults to US. Ignored when --events-jsonl is used. |
--reference-extractors-module |
yes | Dotted Python path to a module exposing the reference-module contract below. |
--thresholds-json |
no | Optional JSON file mapping RevalidationThresholds field names to numeric rates in [0, 1]. When omitted, no threshold check is performed and exit is 0 on a successful run. |
--report-out |
yes | Path to write the combined JSON report. Parent directories are NOT created automatically; a missing parent directory fails at preflight with exit 2 before any work runs (no report written). Other write errors (permissions, disk full) also surface as clean exit 2. |
The dotted-path module passed to --reference-extractors-module must expose, at module scope:
EXTRACTORS: dict[str, Callable[[dict, Any], StructuredExtractionResult]]
RESOLVED_GRAPH: ResolvedGraph # output of resolve(ontology, binding)
SPEC: Any = None # optional; forwarded to extractor callsEXTRACTORS— same shaperevalidate_compiled_extractorsaccepts (event_type → callable).RESOLVED_GRAPH— the validator-input artifact. The CLI doesn't carry ontology / binding flags because the reference module is the operational contract that defined both the event_type-to-callable mapping AND the spec they validate against. One module, one contract.SPEC— optional. Defaults toNoneto match the harness's keyword default.
A module missing either EXTRACTORS or RESOLVED_GRAPH, or with EXTRACTORS of the wrong shape, fails fast at the CLI boundary (exit 2) — the harness never sees a malformed registry.
Intentionally narrow so cron / GitHub Actions can branch on them:
| Code | Meaning |
|---|---|
0 |
Revalidation completed; if thresholds were supplied, every threshold passed. |
1 |
Revalidation completed but at least one threshold was violated. The report JSON is still written; the caller inspects threshold_check.violations. |
2 |
Usage / load / input error: bad flags (missing required, unrecognized), missing files, malformed JSONL, missing reference module surface, mixed-fingerprint bundle root, threshold validation failure, etc. The report is not written. main(argv) returns this code rather than raising SystemExit (argparse's own error() is routed through the same _CliError boundary). --help still terminates via SystemExit(0) — that's the expected behavior. The CLI does not define a --version action today. |
{
"report": {
"total_events": ...,
"total_compiled_unchanged": ...,
"total_compiled_filtered": ...,
"total_fallback_for_event": ...,
"total_compiled_path_faults": ...,
"total_parity_matches": ...,
"total_parity_divergences": ...,
"total_parity_not_checked": ...,
"skipped_events": ...,
"counts_by_event_type": { ... },
"sample_decision_divergences": [ ... ],
"sample_parity_divergences": [ ... ],
"started_at": "...",
"finished_at": "..."
},
"threshold_check": null | {
"ok": true|false,
"violations": ["compiled_unchanged_rate 0.2500 < min 0.9500", ...]
}
}threshold_check is null when --thresholds-json wasn't supplied; the raw report is still written so an operator can inspect rates without committing to a gate.
The SQL must produce exactly one column named event_json (STRING) per row. The column contains a JSON-encoded event dict — same shape --events-jsonl consumes line-by-line. The CLI does not auto-shape bigquery.Row objects, which keeps the path predictable: the query writer is the one place that knows the table schema, and TO_JSON_STRING(STRUCT(...)) is the standard wrap.
Error handling:
| Failure | Behavior |
|---|---|
| BigQuery client construction fails (auth, ADC, invalid credentials, network) | exit 2 with BigQuery client construction failed: <Type>: <message> |
| BigQuery query raises (syntax, table-not-found, permission) | exit 2 with BigQuery query failed: <Type>: <message> |
Query returns extra columns beyond event_json (checked via job.schema, so caught even when the result is empty) |
exit 2 with query must produce exactly one column named 'event_json'; got [...] |
Row missing the event_json column |
exit 2 with row N: missing required column 'event_json' |
event_json value isn't a STRING (e.g. STRUCT projected without TO_JSON_STRING) |
exit 2 with row N: 'event_json' must be STRING; got <type> |
event_json STRING isn't valid JSON |
exit 2 with row N: invalid JSON in 'event_json': <msg> |
event_json decodes to non-object (array, scalar) |
exit 2 with row N: 'event_json' decodes to <type>, expected a JSON object |
Empty .sql file |
exit 2 with ... is empty |
Invalid UTF-8 in .sql file |
exit 2 with not valid UTF-8 |
The row index is the 0-based position in the result set, so an operator can LIMIT N OFFSET row_index to find the offending row.
Project resolution: --bq-project is optional. The BigQuery client tries Application Default Credentials / environment when the flag isn't set. If both fall through to an empty project, the CLI exits 2 with --bq-project not provided and the BigQuery client could not infer a project ... Set --bq-project explicitly. rather than letting a downstream BigQuery API error confuse the operator.
Any subset of RevalidationThresholds fields, with numeric rates in [0, 1]:
{
"min_compiled_unchanged_rate": 0.95,
"max_compiled_filtered_rate": 0.05,
"max_fallback_for_event_rate": 0.05,
"max_compiled_path_fault_rate": 0.01,
"min_parity_match_rate": 0.99
}Unknown fields, out-of-range rates (5.0 intended as 5%), NaN, and bool all fail at the CLI boundary with exit 2 — same __post_init__ validation that RevalidationThresholds enforces in-process.
- Events whose
event_typeisn't inEXTRACTORSor the compiled registry land inreport.skipped_events; they don't enter the rate denominators. - Empty JSONL lines are silently skipped; that's whitespace, not data.
- Malformed JSONL lines are not skipped — they abort the run with exit 2 to distinguish corrupt input from legitimately-uncovered event_types.
CI suite — tests/test_extractor_compilation_cli_revalidate.py (35 cases, 1 skipped under dev-install):
TestCliEndToEnd(3) — happy path (exit 0, report written,threshold_check: null); threshold pass (exit 0,ok: true); threshold violation (exit 1, report still written with violations listed).TestCliUsageErrors(18) — missing events file; malformed JSONL line; missing bundles root; mixed-fingerprint bundle root; empty bundle root; reference module not importable; reference module missingEXTRACTORS; reference module missingRESOLVED_GRAPH; badEXTRACTORSshape; thresholds JSON with unknown field; thresholds JSON with out-of-range rate; missing--report-outparent directory (preflight catches it before any work runs); invalid UTF-8 in--events-jsonl; invalid UTF-8 in--thresholds-json; missing required flag returns 2 (notSystemExit); unrecognized flag returns 2 (notSystemExit); both--events-jsonland--events-bq-query-filereturns 2 (argparse mutexnot allowed with); neither event source returns 2 (argparse mutexone of the arguments ... is required).TestCliEventsBQ(13) — BigQuery event-source paths, monkeypatching_make_bq_clientto inject an in-memory fake:- Happy path: 2
event_jsonrows, valid JSON, BKA shape → exit 0, report includes both events. - Project inferred from ADC:
_FakeBQClient(project="adc-project")is accepted without--bq-project. - No project anywhere:
bigquery.Client()returns a project-less client AND--bq-projectabsent → exit 2 withSet --bq-project explicitly. - Client construction failure (
_make_bq_clientraisesRuntimeError("could not authenticate")) → exit 2 withBigQuery client construction failed: .... Distinguishes auth/ADC failures from query-time failures. - Query exception (
RuntimeError("table not found")) → exit 2 withBigQuery query failed: RuntimeError: table not found. - Extra column rejected (non-empty result): row with
event_json+extra_colkeys → exit 2 withexactly one column ... got ['event_json', 'extra_col']. Enforces the documented contract; report not written. - Extra column rejected on empty result set: zero-row query (
SELECT event_json, extra_col FROM t WHERE FALSE) with a wrong schema → still exit 2. Locks the choice to validate viajob.schema(populated regardless of row count) rather than just first-row keys, which would silently accept a wrong-schema empty result. - Correct schema + empty result set: zero rows with the correct one-column schema → exit 0 with a zero-event report. An empty-but-well-shaped result is a valid revalidation outcome, not an error.
- Row missing
event_jsoncolumn → exit 2 withrow 1(the second row). event_jsonnon-string (dict) → exit 2 withrow 0: 'event_json' must be STRING.event_jsoninvalid JSON → exit 2 withrow 0: invalid JSON.event_jsondecodes to JSON array (not object) → exit 2 withrow 0: ... expected a JSON object.- Empty
.sqlfile → exit 2 withis empty.
- Happy path: 2
test_console_script_entry_point_registered(1) — locks thepyproject.toml[project.scripts]entry so a typo in the entry-point string fails CI rather than breaking the binary at user-install time.
Live BQ suite — tests/test_extractor_compilation_cli_revalidate_bq_live.py (1 case), gated behind BQAA_RUN_LIVE_TESTS=1 + BQAA_RUN_LIVE_BQ_REVALIDATE_TESTS=1 + PROJECT_ID + DATASET_ID. Creates a temp table, inserts two event_json rows, runs the CLI, asserts the report is written with both events as compiled_unchanged + parity_matches, deletes the table on the way out.
- Pagination strategy —
client.query(...).result()already paginates; the CLI iterates the full result set into memory. For ultra-large corpora a follow-up could add--events-row-limitor stream-based aggregation, but the current shape handles tens of thousands of events comfortably. - Scheduled execution — operator owns cron / Cloud Scheduler / GitHub Actions; the CLI is a one-shot.
- BQ persistence of reports —
--report-outwrites a local file; pushing it elsewhere is the caller's concern. - Multiple bundle roots — one fingerprint per run; the harness is designed for "what's currently deployed."
- Auto-row-shape inference — explicit non-goal. The
event_jsonsingle-column contract keeps the CLI predictable; the query writer owns projection.
extractor_compilation_revalidation.md— the underlyingrevalidate_compiled_extractors+check_thresholdsAPI. The CLI is a thin operational wrapper around it.extractor_compilation_bundle_loader.md—discover_bundlesis what the CLI uses internally to load compiled extractors.extractor_compilation_bq_bundle_mirror.md—sync_bundles_from_bqis the typical upstream of--bundles-rootfor Cloud-Run-style deployments.