Tracking: SDK production cleanup — rename migration_v5, consolidate context-graph surface, blogpost-ready audit #282

@caohy1988

Context

We are taking the SDK to production-ready status in preparation for a public blogpost that highlights the context-graph capabilities. The current repo layout has a few rough edges that stand in the way:

  • examples/migration_v5/ is a load-bearing demo (MAKO ontology + the ontology-agnostic artifact pipeline + a runnable agent + the scheduled Cloud Run deployment), but its name does not describe what it does. "v5" was an internal milestone label; for an external reader landing from a blogpost it reads as version bookkeeping for something they have no context on.
  • examples/migration_v5/periodic_materialization/ will be promoted to adk.dev/integrations/bigquery-agent-analytics/ as user-manual content (see companion issue). The deep-link path that users land on cannot be migration_v5/....
  • The context-graph design docs are versioned in their filenames (docs/context_graph_v2_design.md, docs/context_graph_v3_design.md) rather than canonical + archive, so readers cannot tell which one to read.
  • The context-graph source surface is wider than it needs to be (20+ files across src/bigquery_agent_analytics/), with a mix of public API, internal helpers (one _-prefixed), and a _v2/_v3 design history that has bled into module names in a few places.
  • migration_v5 is referenced in 36 paths across the repo (file names, directory names, generated artifact paths, test names, deploy script literals), so the rename is non-trivial and needs to be done as one coordinated change.

This issue is the umbrella for that cleanup. Sub-issues will be filed for each renamable / auditable surface.

Audit findings

1. The migration_v5 name

From examples/migration_v5/README.md:

The demo's event source of truth is a runnable agent talking to the BQ AA plugin ... The artifact pipeline that turns a TTL into binding + DDL + property-graph SQL is ontology-agnostic ... and the MAKO config (in mako_artifacts.py) is one concrete configuration of it.

The demo is the ontology-driven context-graph extraction pipeline, with MAKO as the load-bearing example. Concrete rename candidates:

| Candidate | Pros | Cons |
| --- | --- | --- |
| examples/context_graph_demo/ | Matches the blogpost framing; obvious from a deep link | Mild overlap with src/bigquery_agent_analytics/context_graph.py (module); a shared namespace, not a collision |
| examples/ontology_driven_context_graph/ | More accurate description | Long for deep links |
| examples/mako_context_graph/ | Names the load-bearing example | Bakes one config into the demo's directory name, but the pipeline is supposed to be ontology-agnostic |
| examples/context_graph_extraction/ | Verb-based; matches "extraction" language | Slightly less obvious than _demo for someone exploring examples/ |

Recommended: examples/context_graph_demo/. It matches the blogpost framing, scales to multiple ontology configs (which is the pluggable-contract direction the README already documents), and produces clean deep links like examples/context_graph_demo/periodic_materialization/README.md.

2. Files / paths to rename

36 paths today contain migration_v5. The rename touches:

  • examples/migration_v5/ → examples/context_graph_demo/ (whole directory tree)
  • examples/migration_v5_demo_notebook.ipynb → examples/context_graph_demo_notebook.ipynb (and then into the demo directory; see §6)
  • tests/test_migration_v5_ontology_artifacts.py and any other test_migration_v5_*.py → test_context_graph_demo_*.py
  • Generated dataset names like migration_v5_demo that appear in binding.yaml, table_ddl.sql, property_graph.sql — these are user-visible BigQuery dataset names. Decide: rename the generated default to context_graph_demo, or keep the legacy default with a deprecation note for users with existing tables.
  • examples/migration_v5/periodic_materialization/README.md references — every internal cross-link inside the deploy scripts, Terraform, and the README needs updating.
  • Test fixtures that include the path in golden output / snapshots.
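
A minimal sketch of how the path inventory for this rename could be generated, assuming the recommended target name context_graph_demo is adopted. The helper names (renamed, plan_renames) are hypothetical and not part of the SDK. It only lists old/new path pairs; it does not rename anything, does not rewrite file contents, and does not model the notebook's separate move into the demo directory (§6).

```python
from pathlib import Path

OLD = "migration_v5"
NEW = "context_graph_demo"  # assumes the recommended candidate is chosen


def renamed(rel_path: str) -> str:
    """Map one repo-relative path to its post-rename form."""
    # Collapse "migration_v5_demo" first so a plain substitution does not
    # produce "context_graph_demo_demo".
    return rel_path.replace(OLD + "_demo", NEW).replace(OLD, NEW)


def plan_renames(repo_root: str) -> list[tuple[str, str]]:
    """Return (old, new) pairs for every path under repo_root that contains OLD."""
    root = Path(repo_root)
    plan = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".git" in rel.parts:  # skip Git metadata
            continue
        if OLD in rel.as_posix():
            plan.append((rel.as_posix(), renamed(rel.as_posix())))
    return plan
```

Calling plan_renames(".") from the repo root lists a directory and every path beneath it as separate entries, so the output is an inventory to compare against the 36-path count, not a sequence of moves to execute in order.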

3. Design-doc consolidation

The docs in scope are docs/context_graph_v{2,3}_design.md (50KB + 28KB) and docs/learning_ontology_and_context_graph.md (25KB). They have three problems:

  • Versioned filenames imply the reader needs to know which is current.
  • The "learning" doc reads like an explainer; the v3 design doc reads like a technical reference. They have overlapping scope but different audiences.
  • The blogpost will link readers to one canonical doc; today there isn't one.

Recommended consolidation:

  • docs/context_graph.md (canonical user-facing concept overview, sourced from the learning doc + the user-facing parts of v3).
  • docs/context_graph_design.md (technical reference, sourced from v3's design-doc material).
  • docs/_archive/context_graph_design_v2.md (preserved for history; out of the documented surface).

4. Context-graph source surface audit

src/bigquery_agent_analytics/ carries these context-graph-related modules:

context_graph.py
_ontology_routing.py
extracted_models.py
extractor_compilation/ (15 files)
graph_validation.py
materialize_window.py
ontology_graph.py
ontology_materializer.py
ontology_models.py
ontology_orchestrator.py
ontology_property_graph.py

Open questions:

  • _ontology_routing.py has the leading-underscore internal convention but lives in the public package. Decide: move to _internal/ (most common Python convention) or drop the underscore and treat it as public.
  • ontology_graph.py vs. ontology_property_graph.py vs. context_graph.py — three modules with overlapping mental-model names. Is each one a distinct concept? Audit and (if duplication exists) merge or document the distinction.
  • ontology_orchestrator.py and ontology_materializer.py — separate concerns or one renamed in flight?
  • extractor_compilation/ has 15 files. Verify which are public API and which should move into an underscore-prefixed internal sub-package (_compiler/, etc.).
  • materialize_window.py is the documented public API the periodic-materialization deploy depends on (per periodic_materialization/README.md). Make sure its docstring states that explicitly so the rename audit doesn't accidentally hide it.

The audit should produce a written docs/context_graph_surface_audit.md listing every module with: public vs. internal, primary concept, related modules, and a one-line summary. That doc becomes the input for any renaming/merging follow-ups.
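
A first draft of that inventory could be generated mechanically and then corrected by hand. The sketch below is one possible approach, not an existing SDK tool: it classifies modules by the leading-underscore convention, takes the first docstring line as the summary, and infers related modules from intra-package imports. The function names are hypothetical, and the import name bigquery_agent_analytics is an assumption based on the src/ directory name. The "primary concept" column still needs human judgment.

```python
import ast
from pathlib import Path

PACKAGE = "bigquery_agent_analytics"  # assumed import name, taken from the src/ directory


def related_modules(tree: ast.Module, package: str = PACKAGE) -> list[str]:
    """Collect sibling modules referenced through from-imports."""
    related = set()
    for node in ast.walk(tree):
        if not isinstance(node, ast.ImportFrom):
            continue
        if node.level == 1 and node.module:      # from .ontology_models import X
            related.add(node.module.split(".")[0])
        elif node.level == 1:                    # from . import ontology_models
            related.update(alias.name for alias in node.names)
        elif node.module and node.module.startswith(package + "."):
            related.add(node.module[len(package) + 1:].split(".")[0])
    return sorted(related)


def summarize_source(filename: str, source: str, package: str = PACKAGE) -> dict:
    """Classify one module and pull the first line of its docstring."""
    tree = ast.parse(source)
    doc = (ast.get_docstring(tree) or "").strip()
    return {
        "module": filename,
        "visibility": "internal" if filename.startswith("_") else "public",
        "summary": doc.splitlines()[0] if doc else "(no docstring)",
        "related": related_modules(tree, package),
    }


def audit_table(package_dir: str) -> str:
    """Render one table row per top-level module in package_dir."""
    lines = [
        "| Module | Visibility | Related modules | Summary |",
        "| --- | --- | --- | --- |",
    ]
    for path in sorted(Path(package_dir).glob("*.py")):
        if path.name == "__init__.py":
            continue
        info = summarize_source(path.name, path.read_text(encoding="utf-8"))
        related = ", ".join(info["related"]) or "-"
        lines.append(
            f"| {info['module']} | {info['visibility']} | {related} | {info['summary']} |"
        )
    return "\n".join(lines)
```

The underscore test only reflects the naming convention; a module such as materialize_window.py that is documented as public API would still need its status confirmed against the docs.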

5. README + SDK.md alignment

  • README.md is 7KB. For a public blogpost landing page it needs a clear "what is this, and what is the context graph" lede plus a 30-second quickstart pointing at examples/context_graph_demo/.
  • SDK.md is 64KB. Too long to be the front door. Consider:
    • Splitting into docs/sdk_reference.md (API reference) + docs/concepts.md (concepts) + retaining SDK.md as a thin landing-page index.
    • Or keeping it as one doc but adding a clear table-of-contents and section anchors so a blogpost can deep-link.
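
If the single-doc option is chosen, the table of contents could be generated, not hand-maintained. The sketch below is an illustration under assumptions: it handles ATX-style (#) headings only, and slugify approximates GitHub's heading-anchor rules (lowercase, punctuation dropped, spaces to hyphens, numeric suffix for duplicates). It is not GitHub's exact algorithm, so generated links should be spot-checked against the rendered page. The function names are hypothetical.

```python
import re

FENCE = "`" * 3  # code-fence marker, built this way to keep this example fence-safe
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def slugify(title: str) -> str:
    """Approximate a GitHub-style anchor for a heading title."""
    slug = re.sub(r"[^\w\- ]", "", title.strip().lower())
    return slug.replace(" ", "-")


def build_toc(markdown: str) -> list[str]:
    """Return nested list lines linking to every heading outside code fences."""
    toc, seen, in_fence = [], {}, False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # ignore '#' comment lines inside code blocks
            continue
        match = None if in_fence else HEADING.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        slug = slugify(title)
        count = seen.get(slug, 0)
        seen[slug] = count + 1
        anchor = slug if count == 0 else f"{slug}-{count}"
        toc.append(f"{'  ' * (level - 1)}- [{title}](#{anchor})")
    return toc
```

Running build_toc over the text of SDK.md and pasting the result under the title would give the blogpost stable section links, provided headings are not renamed afterwards.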

6. Notebook hygiene

  • examples/context_graph_adcp_demo.ipynb — "adcp" is an unexpanded acronym for someone landing fresh. Either expand it in the filename or document the term in the notebook's first cell. If it's a one-off demo not slated for blogpost reference, consider moving to examples/_archive/.
  • The top-level examples/migration_v5_demo_notebook.ipynb should move into the demo directory after rename: examples/context_graph_demo/context_graph_demo_notebook.ipynb.

7. Pre-blogpost checklist

  • All public-API functions have docstrings.
  • Type hints on public surface.
  • Error messages name the user-facing concept ("ontology binding", "context graph"), not internal model names.
  • Log output from bqaa context-graph is structured JSON (already true per periodic-materialization README) and the fields are documented.
  • bqaa --help / sub-command help text is curated.
  • examples/context_graph_demo/ runs end-to-end on a fresh BQ project with the documented quickstart.
  • pyproject.toml [project.urls] points at the right places (adk-docs page, repo, issues).
  • CHANGELOG.md notes the rename + design-doc consolidation.
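
The first two checklist items (docstrings and type hints) can be partly automated. The following is a rough sketch using the standard-library ast module; public_api_gaps is a hypothetical helper, and it treats every function or method without a leading underscore as public, which over-reports for nested functions and for methods of internal classes.

```python
import ast


def public_api_gaps(source: str) -> list[str]:
    """List public functions that lack a docstring or type hints."""
    gaps = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("_"):  # internal by naming convention
            continue
        if ast.get_docstring(node) is None:
            gaps.append(f"{node.name}: missing docstring")
        params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        untyped = [
            p.arg
            for p in params
            if p.annotation is None and p.arg not in ("self", "cls")
        ]
        if untyped:
            gaps.append(f"{node.name}: untyped parameters {untyped}")
        if node.returns is None:
            gaps.append(f"{node.name}: missing return annotation")
    return gaps
```

Running this over each module classified as public gives a worklist for the final gate; the remaining checklist items (error-message wording, help text, end-to-end smoke) still need manual review.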

Cleanup plan — proposed wave order

| Wave | Scope | Why first / last |
| --- | --- | --- |
| 1 | Source surface audit doc (docs/context_graph_surface_audit.md) | Provides the input for the rest of the renames |
| 2 | Design-doc consolidation (canonical context_graph.md + context_graph_design.md; archive v2) | Settles terminology before user-facing docs change |
| 3 | examples/migration_v5/ → examples/context_graph_demo/ rename + all cross-references + 36 path updates | Atomic single PR; touches deploy scripts, tests, generated artifacts, Terraform, notebook |
| 4 | Module-level renames (per §4 audit outcomes): _ontology_routing.py placement, ontology-graph deduplication if any | After audit but independent of the demo rename |
| 5 | README / SDK.md re-shape for blogpost narrative | Last so it can reference the new names |
| 6 | Pre-blogpost checklist run (docstrings, types, help text, end-to-end smoke) | Final gate |

Sub-issues to file

  • Wave 1: Write docs/context_graph_surface_audit.md — module inventory, public/internal classification, primary concept per module.
  • Wave 2: Consolidate context-graph design docs — docs/context_graph.md (concept), docs/context_graph_design.md (reference), archive v2.
  • Wave 3: Rename examples/migration_v5/ → examples/context_graph_demo/ (single atomic PR; 36 paths). Includes the top-level migration_v5_demo_notebook.ipynb moving into the demo directory.
  • Wave 3: Decide and apply BQ dataset-name policy (legacy default migration_v5_demo vs. new context_graph_demo; if changed, document the migration step for existing users).
  • Wave 3: Update periodic_materialization/ README + deploy scripts + Terraform for the new path. Coordinate timing with the adk-docs page update (companion issue) so external links don't break.
  • Wave 4: Module-level audit outcomes — _ontology_routing.py placement; ontology-graph naming deduplication.
  • Wave 4: Notebook hygiene — context_graph_adcp_demo.ipynb rename or archive decision.
  • Wave 5: README front-page rewrite for blogpost narrative.
  • Wave 5: SDK.md split decision (reference + concepts + thin index, or single curated doc with explicit TOC).
  • Wave 6: Pre-blogpost checklist completed (docstrings, type hints, error-message review, end-to-end smoke on fresh project, pyproject.toml URLs, CHANGELOG.md entry).

Acceptance

  • No migration_v5 path or identifier remains in the repo except inside docs/_archive/ historical material and the CHANGELOG entry that documents the rename.
  • A reader landing from a blogpost on README.md can find the context-graph demo in under three clicks and run it against their own BigQuery project from examples/context_graph_demo/README.md.
  • docs/context_graph.md is the single canonical concept doc. docs/context_graph_design.md is the single canonical technical reference. v2 lives in _archive/.
  • The periodic-materialization deploy works from the renamed path and the adk-docs page (companion issue) points at the new deep links.
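
The first acceptance criterion lends itself to an automated gate. The sketch below is one way it could be checked, assuming the two exemptions are exactly docs/_archive/ and a top-level CHANGELOG.md; the helper names are hypothetical. It scans both path names and the contents of UTF-8 text files.

```python
from pathlib import Path

OLD = "migration_v5"
ALLOWED_PREFIXES = ("docs/_archive/",)  # historical material may keep the old name
ALLOWED_FILES = ("CHANGELOG.md",)       # assumed to sit at the repo root


def is_allowed(rel: str) -> bool:
    """True if a repo-relative path is exempt from the check."""
    return rel in ALLOWED_FILES or rel.startswith(ALLOWED_PREFIXES)


def leftover_references(repo_root: str) -> list[str]:
    """Return one entry per path name or file body that still mentions OLD."""
    root = Path(repo_root)
    hits = []
    for path in sorted(root.rglob("*")):
        rel_path = path.relative_to(root)
        rel = rel_path.as_posix()
        if ".git" in rel_path.parts or is_allowed(rel):
            continue
        if OLD in rel:
            hits.append(f"{rel}: path name")
        if path.is_file():
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file; contents not checked
            if OLD in text:
                hits.append(f"{rel}: file contents")
    return hits
```

An empty result from leftover_references(".") would satisfy the criterion. If the script itself is checked into the repo it will flag its own OLD constant, so it would need to live outside the tree, be added to the allowlist, or take the identifier as an argument.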

Coordination

This work overlaps with the companion adk-docs user-manual issue. The adk-docs page should land after Wave 3 (rename) is merged so the deep links on adk.dev/integrations/bigquery-agent-analytics/ are stable from day one.

References

  • examples/migration_v5/README.md — the demo's own description (the pluggable contract and authorship boundary section is the source for the rename rationale).
  • examples/migration_v5/periodic_materialization/README.md — production-shape Cloud Run + Cloud Scheduler deploy that will be promoted to user-manual content.
  • docs/context_graph_v3_design.md, docs/context_graph_v2_design.md, docs/learning_ontology_and_context_graph.md — the three docs to consolidate.
