Commit 128bc78

docs: end-to-end rollout guide for compiled structured extractors (#152)
* docs: end-to-end rollout guide for compiled structured extractors

Phase C wrap-up of issue #75. Operational playbook stitching the five Phase C stages — Compile, Publish, Sync, Wire, Revalidate — into one flow. Treats Publish/Sync as the remote-runtime path; co-located deployments can shortcut to Compile -> Wire -> Revalidate.

Structure:

* Overview + ASCII pipeline diagram + cadence table.
* Per-stage section: purpose, API + canonical Python snippet (or shell command for the real CLI in Stage 5), failure modes by stable code, pointer to the detailed per-PR doc.
* Worked BKA example walked through all five stages: Python for compile / publish / sync / wire (no CLIs exist for those), shell for ``bqaa-revalidate-extractors`` (real CLI). Snippets use the actual call signatures verified against the BKA live test.
* Trust-boundaries section documenting the four points where ``load_bundle`` runs — compile smoke gate, pre-publish, post-sync, runtime-startup discovery — so the trust model is one mental model across the pipeline.
* Failure-recovery playbook keyed on the stable failure codes each stage emits (``duplicate_fingerprint``, ``bundle_load_failed``, ``manifest_row_unreadable``, ``invalid_bundle_path``, ``fingerprint_not_in_table``, etc.) with the one-line action for each.

Index entry in docs/README.md positions the rollout guide as "Start here for compiled extractors" — the per-PR docs become deep dives once readers have the pipeline shape in their heads.

No code changes.

* docs(rollout-guide): four-trust-gates wording + self-contained sync snippet

Addresses PR #152 round-1 reviewer findings.

P2 - Trust-boundaries section previously claimed ``load_bundle`` runs at compile time. It doesn't: ``compile_extractor`` runs ``load_callable_from_source`` + ``run_smoke_test[_in_subprocess]`` and then writes the manifest. The actual ``load_bundle`` gate exists only at publish, sync, and runtime discovery — three places, not four. Re-framed as "four trust gates" with explicit annotation that gate 1 is the compile-time smoke check (NOT ``load_bundle`` itself — no manifest exists yet) and gates 2-4 are the real ``load_bundle`` runs. Same edit propagated to the docs/README.md index entry and the CHANGELOG bullet so the three places that describe the trust model use the same words.

P3 - Stage 3's sync snippet reused ``store`` from Stage 2, which only works in a single process. In a distributed deployment the sync host is a different process from the publish host. Both the standalone Stage 3 example and the worked-BKA Stage 3 example now reconstruct ``BigQueryBundleStore`` against the same ``table_id`` (the typical pattern: the runtime host uses its own service-account ADC). A one-line note now explicitly calls out that the runtime host constructs the same store handle before calling ``sync_bundles_from_bq``.

Sanity-checked: every public import used in the rollout doc's snippets resolves cleanly (``measure_compile``, ``compile_extractor``, ``compute_fingerprint``, ``BigQueryBundleStore``, ``publish_bundles_to_bq``, ``sync_bundles_from_bq``, ``OntologyGraphManager``, ``extract_bka_decision_event``).

No code changes.
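
For readers skimming this commit, the two deployment paths and the four trust gates described in the message can be summarized with a small, self-contained sketch. This is illustrative only: `Stage`, `TrustGate`, `TRUST_GATES`, and `stages_for` are hypothetical names made up for this note and are not part of the SDK; the sketch deliberately calls none of the real SDK functions, whose signatures are documented in the rollout guide itself.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The five Phase C stages, in pipeline order (names from the commit message)."""

    COMPILE = "Compile"
    PUBLISH = "Publish"
    SYNC = "Sync"
    WIRE = "Wire"
    REVALIDATE = "Revalidate"


@dataclass(frozen=True)
class TrustGate:
    """Hypothetical record type: one trust gate and whether it runs load_bundle."""

    name: str
    uses_load_bundle: bool


# Gate 1 is the compile-time smoke check (no manifest exists yet, so it is
# not load_bundle); gates 2-4 are the load_bundle runs named in the message.
TRUST_GATES = (
    TrustGate("compile-time smoke check", uses_load_bundle=False),
    TrustGate("pre-publish", uses_load_bundle=True),
    TrustGate("post-sync", uses_load_bundle=True),
    TrustGate("runtime-startup discovery", uses_load_bundle=True),
)


def stages_for(co_located: bool) -> list:
    """Return the stage sequence for a deployment shape.

    Co-located deployments skip the remote-runtime path (Publish/Sync);
    distributed deployments run all five stages in order.
    """
    if co_located:
        return [Stage.COMPILE, Stage.WIRE, Stage.REVALIDATE]
    return list(Stage)  # Enum iteration preserves definition order.


if __name__ == "__main__":
    for co_located in (True, False):
        path = " -> ".join(stage.value for stage in stages_for(co_located))
        print(f"co_located={co_located}: {path}")
    real_gates = [gate.name for gate in TRUST_GATES if gate.uses_load_bundle]
    print(f"load_bundle gates: {real_gates}")
```

Keeping the gates as data rather than prose makes the P2 correction easy to check: four gates in total, of which exactly three run ``load_bundle``.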
1 parent c0d6eac commit 128bc78

3 files changed

Lines changed: 467 additions & 0 deletions

CHANGELOG.md

Lines changed: 23 additions & 0 deletions
@@ -9,6 +9,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- **Compiled-extractor rollout guide** at
+[`docs/extractor_compilation_rollout_guide.md`](docs/extractor_compilation_rollout_guide.md).
+Operational playbook for the Phase C pipeline (issue
+[#75](https://github.com/GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK/issues/75))
+stitching the five stages — Compile, Publish, Sync, Wire,
+Revalidate — into one flow. Treats Publish/Sync as the
+remote-runtime path; local / co-located deployments can
+shortcut to Compile → Wire → Revalidate. Worked BKA
+example uses Python snippets for the non-CLI stages
+(``measure_compile``, ``publish_bundles_to_bq``,
+``sync_bundles_from_bq``,
+``OntologyGraphManager.from_bundles_root``) and the real
+``bqaa-revalidate-extractors`` shell invocation only
+where a CLI actually exists. Documents the **four trust
+gates** across the pipeline — the compile-time smoke gate
+inside ``compile_extractor`` (``load_callable_from_source``
++ ``run_smoke_test``, not ``load_bundle`` itself: there's
+no manifest at compile time) plus three real
+``load_bundle`` runs at pre-publish, post-sync, and
+runtime-startup discovery — so the trust model is one
+mental model across the pipeline.
+Includes a failure-recovery playbook keyed on the stable
+failure codes each stage emits.
 - **``--events-bq-query-file`` for ``bqaa-revalidate-extractors``**
 (issue [#75](https://github.com/GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK/issues/75)
 CLI follow-up). The CLI now accepts a BigQuery event

docs/README.md

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@ architecture, rationale, and implementation plans behind key SDK features.
 | [ontology/ontology-build.md](ontology/ontology-build.md) | `bq-agent-sdk ontology-build` orchestrator + `--skip-property-graph` reference |
 | [ontology/binding-validation.md](ontology/binding-validation.md) | `bq-agent-sdk binding-validate` pre-flight + `ontology-build --validate-binding[-strict]` reference |
 | [ontology/validation.md](ontology/validation.md) | `validate_extracted_graph(spec, graph)` post-extraction validator with NODE/FIELD/EDGE-scope failure classification |
+| [extractor_compilation_rollout_guide.md](extractor_compilation_rollout_guide.md) | **Start here for compiled extractors.** End-to-end rollout playbook stitching the five Phase C stages together: Compile → Publish → Sync → Wire → Revalidate. Covers when to run each stage, inputs/outputs, failure modes, trust boundaries (four gates across the pipeline: the compile-time smoke check plus three `load_bundle` runs at publish, sync, and runtime discovery), a worked BKA example with Python snippets + the real `bqaa-revalidate-extractors` shell invocation, and a failure-recovery playbook. Notes the local/co-located shortcut (Compile → Wire → Revalidate) vs the canonical distributed flow. Per-stage docs are the deep dives. |
 | [extractor_compilation_runtime_target.md](extractor_compilation_runtime_target.md) | Phase 1 runtime-target decision for compiled structured extractors (issue #75 P0.2): client-side Python via the existing `run_structured_extractors()` hook |
 | [extractor_compilation_scaffolding.md](extractor_compilation_scaffolding.md) | Compile-time scaffolding for compiled structured extractors (issue #75 PR 4b.1): fingerprint, manifest, AST allowlist, smoke-test runner, end-to-end `compile_extractor`. LLM-driven template fill is PR 4b.2; runtime loading is C2. |
 | [extractor_compilation_template_renderer.md](extractor_compilation_template_renderer.md) | Deterministic source generator for compiled structured extractors (issue #75 PR 4b.2.1): `render_extractor_source(plan)` turns a `ResolvedExtractorPlan` into Python source compatible with 4b.1's `compile_extractor`. LLM step that *resolves* raw rules into a plan is PR 4b.2.2. |
