Design Documents

This directory contains design documents and proposals that describe the architecture, rationale, and implementation plans behind key SDK features.

Architecture & Vision

Document Description
design.md Original SDK architecture and design rationale
prd_unified_analytics_interface.md PRD for unified analytics interface

Evaluation

Document Description
hatteras_evaluation.md Hatteras-style categorical evaluation design

Agent Context Graph

Extract decision traces from your agent's context graph — the requests an agent handled, the options it weighed, and the outcomes it committed — materialized into a queryable BigQuery property graph.

Document Description
codelabs/periodic_materialization.md Start here. Deploy a graph, seed events, materialize with bqaa context-graph --graph, query decision traces in GQL
guides/scheduled-context-graph-deploy.md Take the deployed graph to a scheduled Cloud Run + Cloud Scheduler production deploy
guides/conversational-analytics-first.md Ask the decision graph questions in plain English before dropping to GQL
context_graph_v2_design.md Property Graph V2 design
context_graph_v3_design.md Property Graph V3 with GQL and world-change detection

Ontology & Binding Internals (advanced)

Internal machinery behind the Agent Context Graph's extraction spec. Most users never touch these — bqaa context-graph --graph derives everything from the deployed property graph.

Document Description
ontology_graph_v4_design.md YAML-driven ontology extraction and materialization
ontology_graph_v5_design.md V5: TTL import, mixed extraction, temporal lineage
learning_ontology_and_context_graph.md Learning guide for ontology and context graph
implementation_plan_concept_index_runtime.md Phased implementation plan for concept index + runtime entity resolution (issue #58)
ontology_runtime_reader.md Ontology runtime reader (issue #58 reader follow-on to PR #92). OntologyRuntime loads ontology + binding + optional concept-index lookup. EntityResolver Protocol + two reference impls: ExactEntityResolver (in-memory) + LabelSynonymResolver (BQ-backed). ConceptIndexLookup is fingerprint-strict: eager verify() at construction + every lookup_* query includes WHERE compile_fingerprint = @expected_fp as defense in depth. Stable failure codes: FingerprintMismatchError, MetaTableMissingError, MetaTableEmptyError, MetaTableMultipleRowsError. table_id validated at construction (same regex discipline as Phase C's bundle mirror); verify() always re-queries (no cache); SKOS traversal helpers (in_scheme, broader, narrower, related) + relationships_by_name (tuple, never singular) reflect #58's traversal-first contract. NO embedding / LLM / fuzzy in this slice — those are explicit non-goals; future PRs can implement the Protocol without changing the runtime surface.
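
The exact-match resolver idea from the ontology_runtime_reader.md row above can be illustrated with a small sketch. Everything named here is hypothetical: `EntityResolverSketch`, `ExactResolverSketch`, the `resolve(label)` signature, and the `resolve_all` helper are stand-ins chosen for illustration, not the SDK's actual `EntityResolver` or `ExactEntityResolver` surface. The sketch only shows the shape of the contract, assuming a resolver maps a label to an entity identifier or to nothing, with no fuzzy, embedding, or LLM matching.

```python
from typing import Dict, Iterable, Optional, Protocol


class EntityResolverSketch(Protocol):
    """Hypothetical stand-in for a resolver Protocol (signature is assumed)."""

    def resolve(self, label: str) -> Optional[str]:
        """Return an entity identifier for the label, or None if unknown."""
        ...


class ExactResolverSketch:
    """In-memory resolver that matches labels exactly (case-sensitive)."""

    def __init__(self, label_to_entity: Dict[str, str]) -> None:
        # Copy the mapping so later mutation by the caller has no effect.
        self._index = dict(label_to_entity)

    def resolve(self, label: str) -> Optional[str]:
        # Exact dictionary lookup only: no normalization, no fuzzy matching.
        return self._index.get(label)


def resolve_all(
    resolver: EntityResolverSketch, labels: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Resolve each label through any object that satisfies the Protocol."""
    return {label: resolver.resolve(label) for label in labels}
```

Because the caller depends only on the Protocol, a different resolver could be swapped in later without changing `resolve_all`, which is the property the row above describes for future implementations.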

Component reference

Document Description
ontology/ontology.md Ontology core design — logical ontology spec
ontology/binding.md Binding design — attaching ontology to physical tables
ontology/compilation.md Compilation — resolving ontology + binding into backend DDL
ontology/cli.md CLI design for the gm tool (validate, compile, import-owl)
ontology/owl-import.md OWL import — converting OWL ontologies to YAML format
ontology/ontology-build.md bq-agent-sdk ontology-build orchestrator + --skip-property-graph reference
ontology/binding-validation.md bq-agent-sdk binding-validate pre-flight + ontology-build --validate-binding[-strict] reference
ontology/validation.md validate_extracted_graph(spec, graph) post-extraction validator with NODE/FIELD/EDGE-scope failure classification
extractor_compilation_rollout_guide.md Start here for compiled extractors. End-to-end rollout playbook stitching the five Phase C stages together: Compile → Publish → Sync → Wire → Revalidate. Covers when to run each stage, inputs/outputs, failure modes, trust boundaries (four gates across the pipeline: the compile-time smoke check plus three load_bundle runs at publish, sync, and runtime discovery), a worked BKA example with Python snippets + the real bqaa-revalidate-extractors shell invocation, and a failure-recovery playbook. Notes the local/co-located shortcut (Compile → Wire → Revalidate) vs the canonical distributed flow. Per-stage docs are the deep dives.
extractor_compilation_runtime_target.md Phase 1 runtime-target decision for compiled structured extractors (issue #75 P0.2): client-side Python via the existing run_structured_extractors() hook
extractor_compilation_scaffolding.md Compile-time scaffolding for compiled structured extractors (issue #75 PR 4b.1): fingerprint, manifest, AST allowlist, smoke-test runner, end-to-end compile_extractor. LLM-driven template fill is PR 4b.2; runtime loading is C2.
extractor_compilation_template_renderer.md Deterministic source generator for compiled structured extractors (issue #75 PR 4b.2.1): render_extractor_source(plan) turns a ResolvedExtractorPlan into Python source compatible with 4b.1's compile_extractor. LLM step that resolves raw rules into a plan is PR 4b.2.2.
extractor_compilation_plan_parser.md JSON-to-plan parser for compiled structured extractors (issue #75 PR 4b.2.2.a): parse_resolved_extractor_plan_json(payload) turns LLM-emitted JSON into a ResolvedExtractorPlan with structured PlanParseError codes. The deterministic boundary the LLM step in PR 4b.2.2.b will plug into.
extractor_compilation_plan_resolver.md LLM-driven plan resolver for compiled structured extractors (issue #75 PR 4b.2.2.b): build_resolution_prompt(rule, schema) produces the prompt; PlanResolver(llm_client).resolve(rule, schema) wires prompt → LLM call → parser. Adapter-free LLMClient Protocol; concrete provider adapters and retry orchestration land separately.
extractor_compilation_diagnostics.md Diagnostic builders for retry-prompt feedback (issue #75 PR 4b.2.2.c.1): build_plan_parse_diagnostic, build_ast_diagnostic, build_smoke_diagnostic, build_compile_result_diagnostic (covers invalid_identifier / invalid_event_types / load_error plus AST/smoke fall-through), plus a build_gate_diagnostic(kind, payload) dispatcher. Output is actionable, bounded (ten-entry caps; tracebacks reduced to their last line), and deterministic — ready for retry-prompt embedding in PR 4b.2.2.c.2.
extractor_compilation_retry_loop.md Retry-on-gate-failure orchestrator for compiled structured extractors (issue #75 PR 4b.2.2.c.2): compile_with_llm(rule, schema, llm_client, compile_source, max_attempts) loops resolver → renderer → compile_extractor, feeding build_compile_result_diagnostic / build_plan_parse_diagnostic / synthesized RenderError strings back to the LLM via build_retry_prompt. Returns RetryCompileResult with per-attempt AttemptRecord history (one failure channel populated each: parser / render / compile). LLM exceptions propagate unchanged.
extractor_compilation_bka_measurement.md Compile-and-measure utility + BKA-decision end-to-end proof (issue #75 PR 4c): measure_compile(...) runs compile_with_llm, loads the compiled bundle, and computes parity against a reference extractor; returns a JSON-serializable CompileMeasurement (loop outcome + per-axis parity counts + audit fields). CI path is deterministic; gated live path (BQAA_RUN_LIVE_LLM_COMPILE_TESTS=1) regenerates the checked-in measurement artifact at tests/fixtures_extractor_compilation/bka_decision_measurement_report.json.
extractor_compilation_bundle_loader.md Bundle loader + minimal runtime discovery for compiled extractors (issue #75 PR C2.a): load_bundle(bundle_dir, expected_fingerprint, expected_event_types) and discover_bundles(parent_dir, expected_fingerprint, event_type_allowlist). Stable LoadFailure codes (manifest_missing / unreadable, fingerprint_mismatch, event_types_mismatch, module_not_found, import_failed, function_not_found, function_signature_mismatch, event_type_collision); never raises through to the caller. Multi-event bundles register the same callable under each declared event_type; collisions fail closed. Out of scope: fallback wiring, BQ mirror, ontology-graph call-site swap.
extractor_compilation_runtime_fallback.md Runtime fallback wiring for compiled structured extractors (issue #75 PR C2.b): run_with_fallback(...) returning FallbackOutcome (decision is one of compiled_unchanged / compiled_filtered / fallback_for_event). Validates compiled output via #76; on per-element failures drops just the offending nodes / edges (with orphan cleanup) AND downgrades the event's span from fully_handled to partially_handled so the AI transcript still sees the source span. EVENT-scope, exception, wrong-type, and unpinpointable failures all trigger whole-event fallback. Does not validate fallback output; fallback exceptions propagate. Orchestrator call-site swap is C2.c.
extractor_compilation_runtime_registry.md Runtime extractor-registry adapter (issue #75 PR C2.c.1): build_runtime_extractor_registry(...) glues C2.a's discover_bundles + C2.b's run_with_fallback into one call, returning a WrappedRegistry with an extractors dict ready for run_structured_extractors plus bundles_without_fallback (compiled-only, skipped) and fallbacks_without_bundle (no usable compiled registry entry — "never built" and "rejected by discovery"; cross-reference discovery.failures for the reason). Compiled-only event_types are skipped and recorded (fail-closed); fallback-only event_types pass through unchanged. Non-callable fallbacks are rejected at build time with TypeError naming the event_type. The on_outcome(event_type, outcome) callback fires on every wrapped invocation (denominator metric); callback exceptions propagate. Out of scope: actual orchestrator call-site swap (C2.c.2), BQ mirror (C2.c.3), revalidation (C2.d).
extractor_compilation_orchestrator_swap.md Orchestrator call-site swap (issue #75 PR C2.c.2): OntologyGraphManager.from_bundles_root(...) classmethod that builds the runtime registry internally and constructs a manager whose extractors dict is the wrapped registry, so existing run_structured_extractors calls inside extract_graph pick up compiled-with-fallback behavior with no other code changes. Adds `manager.runtime_registry: WrappedRegistry
extractor_compilation_bq_bundle_mirror.md BigQuery-table bundle mirror (issue #75 PR C2.c.3): publish_bundles_to_bq(bundle_root, store, ...) + sync_bundles_from_bq(store, dest_dir, ...). Mirror is a publish/sync utility, NOT a runtime loader — the runtime path stays sync_bundles_from_bq → discover_bundles → from_bundles_root. Both functions call load_bundle as a gate: publish refuses bundles that wouldn't load at the runtime; sync writes to a side-by-side staging directory and load_bundle-validates the staged copy before performing a staged replace of the target (the rmtree+move pair is not strictly atomic — a crash between the two leaves the bundle absent on disk and is recoverable by re-sync — but the load-bundle-failure direction is atomic, so a bad mirror row never destroys a previously-good local bundle). Strict bundle-shape (exactly manifest.json + the manifest's module_filename) plus shape-check on the manifest's module_filename (bare filename only — no separators, no .., no NUL; otherwise manifest_row_unreadable). Path-safety rejects traversal / absolute / backslash / NUL. duplicate_fingerprint rejects publish-side cases where two subdirs claim the same fingerprint (neither published). duplicate_row rejects two rows sharing the same (fingerprint, bundle_path) at sync. malformed_row shape check. Idempotent republish via DELETE+INSERT in BigQueryBundleStore.publish_rows (NOT a single atomic transaction; a transient INSERT failure is recoverable by re-running publish). publish_rows raises ValueError on duplicate input pairs as defense in depth. BundleStore Protocol for testability; BigQueryBundleStore is the concrete impl. Stable MirrorFailure codes; per-bundle problems accumulate, store exceptions propagate. Out of scope: GCS signed URLs, caching, garbage collection, multi-region.
extractor_compilation_revalidate_cli.md bqaa-revalidate-extractors CLI (Phase C operationalization): one-shot binary that wraps revalidate_compiled_extractors. Event source flags are mutually exclusive: --events-jsonl for local JSONL files OR --events-bq-query-file for BigQuery (SQL must return one column named event_json STRING per row). --bq-project is optional with ADC fallback. Other flags: --bundles-root, --reference-extractors-module, --thresholds-json (optional), --report-out. Reference module exposes EXTRACTORS dict + RESOLVED_GRAPH (+ optional SPEC) so the CLI doesn't need ontology/binding flags. Fingerprint auto-detected from the first bundle's manifest; mixed fingerprints fail-closed. Exit codes: 0 pass / 1 threshold violation / 2 usage-or-input error. Report JSON includes both the raw RevalidationReport and the ThresholdCheckResult. Out of scope: pagination strategy for ultra-large corpora, scheduled execution, BQ persistence, auto-row-shape inference (explicit non-goal — the event_json contract keeps the CLI predictable).
extractor_compilation_revalidation.md Revalidation harness (issue #75 PR C2.d): revalidate_compiled_extractors(events, compiled_extractors, reference_extractors, resolved_graph, ...) drives run_with_fallback (with a no-op fallback) over a batch of events AND calls the reference extractor directly, aggregating outcomes into a RevalidationReport with two orthogonal dimensions: runtime decision (compiled_unchanged / compiled_filtered / fallback_for_event, plus compiled_path_faults split out so bundle bugs are distinguishable from ontology drift) and agreement against reference (parity_match / parity_divergence / parity_not_checked). Parity uses three comparators: _compare_nodes and _compare_span_handling from measurement.py plus _compare_edges in revalidation.py (same edge_id set with matching relationship_name / endpoints / property-set per shared edge; duplicate edge_ids on either side reported as a divergence rather than silently collapsed by dict keying). The parity dimension catches schema-valid but semantically wrong outputs the schema-only check would miss. Every failure mode on the reference side becomes a parity divergence, never a batch abort: exceptions, non-StructuredExtractionResult returns (including None), and comparator crashes all funnel into the divergence channel with a descriptive string. check_thresholds(report, RevalidationThresholds(...)) evaluates policy gates; threshold rates are validated to [0, 1] at construction so a typo like =5 (intended as 5%) fails loud. JSON-serializable for persistence; deterministic. Out of scope: scheduled orchestration, BQ persistence, CLI, sampling strategy.
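
The construction-time range check mentioned in the extractor_compilation_revalidation.md row above (rates validated to [0, 1] so that a value like 5, intended as 5%, fails immediately) can be sketched as follows. This is a minimal illustration under stated assumptions: `ThresholdsSketch` and its field names `max_fallback_rate` and `max_divergence_rate` are hypothetical and are not taken from the real `RevalidationThresholds` class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdsSketch:
    """Hypothetical policy thresholds; field names are illustrative only."""

    max_fallback_rate: float = 1.0
    max_divergence_rate: float = 1.0

    def __post_init__(self) -> None:
        # Validate every rate at construction so a typo such as 5
        # (meant as 0.05) raises here instead of silently passing later.
        for name in ("max_fallback_rate", "max_divergence_rate"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value!r}")
```

With this shape, `ThresholdsSketch(max_fallback_rate=0.05)` constructs normally, while `ThresholdsSketch(max_fallback_rate=5)` raises `ValueError` before any report is evaluated.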

Deployment Surfaces

Document Description
Agent Context Graph periodic materialization playbook Customer deployment path for keeping the MAKO context graph fresh on a schedule: local dry-run, Cloud Run Job + Cloud Scheduler deploy with --smoke, IAM matrix, schedule guidance, JSON log shape, Cloud Monitoring alert filters, state-table inspection, cleanup, and troubleshooting.
proposal_bigquery_agent_cli.md CLI proposal and command design
python_udf_support_design.md BigQuery Python UDF architecture
remote_function_rationale.md Cloud Run remote function rationale
implementation_plan_remote_function.md Remote function implementation plan