Design Documents

This directory contains design documents and proposals that describe the architecture, rationale, and implementation plans behind key SDK features.

Architecture & Vision

Document	Description
design.md	Original SDK architecture and design rationale
prd_unified_analytics_interface.md	PRD for unified analytics interface

Evaluation

Document	Description
hatteras_evaluation.md	Hatteras-style categorical evaluation design

Agent Context Graph

Extract decision traces from your agent's context graph — the requests an agent handled, the options it weighed, and the outcomes it committed — materialized into a queryable BigQuery property graph.

Document	Description
codelabs/periodic_materialization.md	Start here. Deploy a graph, seed events, materialize with `bqaa context-graph --graph`, query decision traces in GQL
guides/scheduled-context-graph-deploy.md	Take the deployed graph to a scheduled Cloud Run + Cloud Scheduler production deploy
guides/conversational-analytics-first.md	Ask the decision graph questions in plain English before dropping to GQL
context_graph_v2_design.md	Property Graph V2 design
context_graph_v3_design.md	Property Graph V3 with GQL and world-change detection

Ontology & Binding Internals (advanced)

Internal machinery behind the Agent Context Graph's extraction spec. Most users never touch these — bqaa context-graph --graph derives everything from the deployed property graph.

Document	Description
ontology_graph_v4_design.md	YAML-driven ontology extraction and materialization
ontology_graph_v5_design.md	V5: TTL import, mixed extraction, temporal lineage
learning_ontology_and_context_graph.md	Learning guide for ontology and context graph
implementation_plan_concept_index_runtime.md	Phased implementation plan for concept index + runtime entity resolution (issue #58)
ontology_runtime_reader.md	Ontology runtime reader (issue #58 reader follow-on to PR #92). `OntologyRuntime` loads ontology + binding + optional concept-index lookup. `EntityResolver` Protocol + two reference impls: `ExactEntityResolver` (in-memory) + `LabelSynonymResolver` (BQ-backed). `ConceptIndexLookup` is fingerprint-strict: eager `verify()` at construction + every `lookup_*` query includes `WHERE compile_fingerprint = @expected_fp` as defense in depth. Stable failure codes: `FingerprintMismatchError`, `MetaTableMissingError`, `MetaTableEmptyError`, `MetaTableMultipleRowsError`. `table_id` validated at construction (same regex discipline as Phase C's bundle mirror); `verify()` always re-queries (no cache); SKOS traversal helpers (`in_scheme`, `broader`, `narrower`, `related`) + `relationships_by_name` (tuple, never singular) reflect #58's traversal-first contract. NO embedding / LLM / fuzzy in this slice — those are explicit non-goals; future PRs can implement the Protocol without changing the runtime surface.

Component reference

Document	Description
ontology/ontology.md	Ontology core design — logical ontology spec
ontology/binding.md	Binding design — attaching ontology to physical tables
ontology/compilation.md	Compilation — resolving ontology + binding into backend DDL
ontology/cli.md	CLI design for the `gm` tool (validate, compile, import-owl)
ontology/owl-import.md	OWL import — converting OWL ontologies to YAML format
ontology/ontology-build.md	`bq-agent-sdk ontology-build` orchestrator + `--skip-property-graph` reference
ontology/binding-validation.md	`bq-agent-sdk binding-validate` pre-flight + `ontology-build --validate-binding[-strict]` reference
ontology/validation.md	`validate_extracted_graph(spec, graph)` post-extraction validator with NODE/FIELD/EDGE-scope failure classification
extractor_compilation_rollout_guide.md	Start here for compiled extractors. End-to-end rollout playbook stitching the five Phase C stages together: Compile → Publish → Sync → Wire → Revalidate. Covers when to run each stage, inputs/outputs, failure modes, trust boundaries (four gates across the pipeline: the compile-time smoke check plus three `load_bundle` runs at publish, sync, and runtime discovery), a worked BKA example with Python snippets + the real `bqaa-revalidate-extractors` shell invocation, and a failure-recovery playbook. Notes the local/co-located shortcut (Compile → Wire → Revalidate) vs the canonical distributed flow. Per-stage docs are the deep dives.
extractor_compilation_runtime_target.md	Phase 1 runtime-target decision for compiled structured extractors (issue #75 P0.2): client-side Python via the existing `run_structured_extractors()` hook
extractor_compilation_scaffolding.md	Compile-time scaffolding for compiled structured extractors (issue #75 PR 4b.1): fingerprint, manifest, AST allowlist, smoke-test runner, end-to-end `compile_extractor`. LLM-driven template fill is PR 4b.2; runtime loading is C2.
extractor_compilation_template_renderer.md	Deterministic source generator for compiled structured extractors (issue #75 PR 4b.2.1): `render_extractor_source(plan)` turns a `ResolvedExtractorPlan` into Python source compatible with 4b.1's `compile_extractor`. LLM step that resolves raw rules into a plan is PR 4b.2.2.
extractor_compilation_plan_parser.md	JSON-to-plan parser for compiled structured extractors (issue #75 PR 4b.2.2.a): `parse_resolved_extractor_plan_json(payload)` turns LLM-emitted JSON into a `ResolvedExtractorPlan` with structured `PlanParseError` codes. The deterministic boundary the LLM step in PR 4b.2.2.b will plug into.
extractor_compilation_plan_resolver.md	LLM-driven plan resolver for compiled structured extractors (issue #75 PR 4b.2.2.b): `build_resolution_prompt(rule, schema)` produces the prompt; `PlanResolver(llm_client).resolve(rule, schema)` wires prompt → LLM call → parser. Adapter-free `LLMClient` Protocol; concrete provider adapters and retry orchestration land separately.
extractor_compilation_diagnostics.md	Diagnostic builders for retry-prompt feedback (issue #75 PR 4b.2.2.c.1): `build_plan_parse_diagnostic`, `build_ast_diagnostic`, `build_smoke_diagnostic`, `build_compile_result_diagnostic` (covers `invalid_identifier` / `invalid_event_types` / `load_error` plus AST/smoke fall-through), plus a `build_gate_diagnostic(kind, payload)` dispatcher. Output is actionable, bounded (ten-entry caps; tracebacks reduced to their last line), and deterministic — ready for retry-prompt embedding in PR 4b.2.2.c.2.
extractor_compilation_retry_loop.md	Retry-on-gate-failure orchestrator for compiled structured extractors (issue #75 PR 4b.2.2.c.2): `compile_with_llm(rule, schema, llm_client, compile_source, max_attempts)` loops resolver → renderer → `compile_extractor`, feeding `build_compile_result_diagnostic` / `build_plan_parse_diagnostic` / synthesized `RenderError` strings back to the LLM via `build_retry_prompt`. Returns `RetryCompileResult` with per-attempt `AttemptRecord` history (one failure channel populated each: parser / render / compile). LLM exceptions propagate unchanged.
extractor_compilation_bka_measurement.md	Compile-and-measure utility + BKA-decision end-to-end proof (issue #75 PR 4c): `measure_compile(...)` runs `compile_with_llm`, loads the compiled bundle, and computes parity against a reference extractor; returns a JSON-serializable `CompileMeasurement` (loop outcome + per-axis parity counts + audit fields). CI path is deterministic; gated live path (`BQAA_RUN_LIVE_LLM_COMPILE_TESTS=1`) regenerates the checked-in measurement artifact at `tests/fixtures_extractor_compilation/bka_decision_measurement_report.json`.
extractor_compilation_bundle_loader.md	Bundle loader + minimal runtime discovery for compiled extractors (issue #75 PR C2.a): `load_bundle(bundle_dir, expected_fingerprint, expected_event_types)` and `discover_bundles(parent_dir, expected_fingerprint, event_type_allowlist)`. Stable `LoadFailure` codes (manifest_missing / unreadable, fingerprint_mismatch, event_types_mismatch, module_not_found, import_failed, function_not_found, function_signature_mismatch, event_type_collision); never raises through to the caller. Multi-event bundles register the same callable under each declared event_type; collisions fail closed. Out of scope: fallback wiring, BQ mirror, ontology-graph call-site swap.
extractor_compilation_runtime_fallback.md	Runtime fallback wiring for compiled structured extractors (issue #75 PR C2.b): `run_with_fallback(...)` returning `FallbackOutcome` (`decision` is one of `compiled_unchanged` / `compiled_filtered` / `fallback_for_event`). Validates compiled output via #76; on per-element failures drops just the offending nodes / edges (with orphan cleanup) AND downgrades the event's span from `fully_handled` to `partially_handled` so the AI transcript still sees the source span. EVENT-scope, exception, wrong-type, and unpinpointable failures all trigger whole-event fallback. Does not validate fallback output; fallback exceptions propagate. Orchestrator call-site swap is C2.c.
extractor_compilation_runtime_registry.md	Runtime extractor-registry adapter (issue #75 PR C2.c.1): `build_runtime_extractor_registry(...)` glues C2.a's `discover_bundles` + C2.b's `run_with_fallback` into one call, returning a `WrappedRegistry` with an `extractors` dict ready for `run_structured_extractors` plus `bundles_without_fallback` (compiled-only, skipped) and `fallbacks_without_bundle` (no usable compiled registry entry — "never built" and "rejected by discovery"; cross-reference `discovery.failures` for the reason). Compiled-only event_types are skipped and recorded (fail-closed); fallback-only event_types pass through unchanged. Non-callable fallbacks are rejected at build time with `TypeError` naming the event_type. The `on_outcome(event_type, outcome)` callback fires on every wrapped invocation (denominator metric); callback exceptions propagate. Out of scope: actual orchestrator call-site swap (C2.c.2), BQ mirror (C2.c.3), revalidation (C2.d).
extractor_compilation_orchestrator_swap.md	Orchestrator call-site swap (issue #75 PR C2.c.2): `OntologyGraphManager.from_bundles_root(...)` classmethod that builds the runtime registry internally and constructs a manager whose `extractors` dict is the wrapped registry, so existing `run_structured_extractors` calls inside `extract_graph` pick up compiled-with-fallback behavior with no other code changes. Adds `manager.runtime_registry: WrappedRegistry
extractor_compilation_bq_bundle_mirror.md	BigQuery-table bundle mirror (issue #75 PR C2.c.3): `publish_bundles_to_bq(bundle_root, store, ...)` + `sync_bundles_from_bq(store, dest_dir, ...)`. Mirror is a publish/sync utility, NOT a runtime loader — the runtime path stays `sync_bundles_from_bq → discover_bundles → from_bundles_root`. Both functions call `load_bundle` as a gate: publish refuses bundles that wouldn't load at the runtime; sync writes to a side-by-side staging directory and `load_bundle`-validates the staged copy before performing a staged replace of the target (the rmtree+move pair is not strictly atomic — a crash between the two leaves the bundle absent on disk and is recoverable by re-sync — but the load-bundle-failure direction is atomic, so a bad mirror row never destroys a previously-good local bundle). Strict bundle-shape (exactly `manifest.json` + the manifest's `module_filename`) plus shape-check on the manifest's `module_filename` (bare filename only — no separators, no `..`, no NUL; otherwise `manifest_row_unreadable`). Path-safety rejects traversal / absolute / backslash / NUL. `duplicate_fingerprint` rejects publish-side cases where two subdirs claim the same fingerprint (neither published). `duplicate_row` rejects two rows sharing the same `(fingerprint, bundle_path)` at sync. `malformed_row` shape check. Idempotent republish via DELETE+INSERT in `BigQueryBundleStore.publish_rows` (NOT a single atomic transaction; a transient INSERT failure is recoverable by re-running publish). `publish_rows` raises `ValueError` on duplicate input pairs as defense in depth. `BundleStore` Protocol for testability; `BigQueryBundleStore` is the concrete impl. Stable `MirrorFailure` codes; per-bundle problems accumulate, store exceptions propagate. Out of scope: GCS signed URLs, caching, garbage collection, multi-region.
extractor_compilation_revalidate_cli.md	`bqaa-revalidate-extractors` CLI (Phase C operationalization): one-shot binary that wraps `revalidate_compiled_extractors`. Event source flags are mutually exclusive: `--events-jsonl` for local JSONL files OR `--events-bq-query-file` for BigQuery (SQL must return one column named `event_json` STRING per row). `--bq-project` is optional with ADC fallback. Other flags: `--bundles-root`, `--reference-extractors-module`, `--thresholds-json` (optional), `--report-out`. Reference module exposes `EXTRACTORS` dict + `RESOLVED_GRAPH` (+ optional `SPEC`) so the CLI doesn't need ontology/binding flags. Fingerprint auto-detected from the first bundle's manifest; mixed fingerprints fail-closed. Exit codes: `0` pass / `1` threshold violation / `2` usage-or-input error. Report JSON includes both the raw `RevalidationReport` and the `ThresholdCheckResult`. Out of scope: pagination strategy for ultra-large corpora, scheduled execution, BQ persistence, auto-row-shape inference (explicit non-goal — the `event_json` contract keeps the CLI predictable).
extractor_compilation_revalidation.md	Revalidation harness (issue #75 PR C2.d): `revalidate_compiled_extractors(events, compiled_extractors, reference_extractors, resolved_graph, ...)` drives `run_with_fallback` (with a no-op fallback) over a batch of events AND calls the reference extractor directly, aggregating outcomes into a `RevalidationReport` with two orthogonal dimensions: runtime decision (`compiled_unchanged` / `compiled_filtered` / `fallback_for_event`, plus `compiled_path_faults` split out so bundle bugs are distinguishable from ontology drift) and agreement against reference (`parity_match` / `parity_divergence` / `parity_not_checked`). Parity uses three comparators: `_compare_nodes` and `_compare_span_handling` from `measurement.py` plus `_compare_edges` in `revalidation.py` (same edge_id set with matching relationship_name / endpoints / property-set per shared edge; duplicate edge_ids on either side reported as a divergence rather than silently collapsed by dict keying). The parity dimension catches schema-valid but semantically wrong outputs the schema-only check would miss. Every failure mode on the reference side becomes a parity divergence, never a batch abort: exceptions, non-`StructuredExtractionResult` returns (including `None`), and comparator crashes all funnel into the divergence channel with a descriptive string. `check_thresholds(report, RevalidationThresholds(...))` evaluates policy gates; threshold rates are validated to `[0, 1]` at construction so a typo like `=5` (intended as 5%) fails loud. JSON-serializable for persistence; deterministic. Out of scope: scheduled orchestration, BQ persistence, CLI, sampling strategy.

Deployment Surfaces

Document	Description
Agent Context Graph periodic materialization playbook	Customer deployment path for keeping the MAKO context graph fresh on a schedule: local dry-run, Cloud Run Job + Cloud Scheduler deploy with `--smoke`, IAM matrix, schedule guidance, JSON log shape, Cloud Monitoring alert filters, state-table inspection, cleanup, and troubleshooting.
proposal_bigquery_agent_cli.md	CLI proposal and command design
python_udf_support_design.md	BigQuery Python UDF architecture
remote_function_rationale.md	Cloud Run remote function rationale
implementation_plan_remote_function.md	Remote function implementation plan

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Design Documents

Architecture & Vision

Evaluation

Agent Context Graph

Ontology & Binding Internals (advanced)

Component reference

Deployment Surfaces

Uh oh!

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Design Documents

Architecture & Vision

Evaluation

Agent Context Graph

Ontology & Binding Internals (advanced)

Component reference

Deployment Surfaces