Agent Context Graph: extract decision traces from your agent's context graph — a runnable ADK agent + BQ AA plugin streaming events, the codelab artifacts (codelab/), and the scheduled Cloud Run + Cloud Scheduler deploy (periodic_materialization/). Start with the codelab.
An agent that rewrites its own versioned SKILL.md from its conversation traces (no teacher model): flawed V0 → evolve_skill() → tool-first V1, golden-Q&A scored, with the anti-parroting rule and Skill Registry versioning. See the dedicated section below.
Decision-lineage property graph (issue #98): live ADK media-planner agent + BQ AA plugin running across 6 campaign sessions → SDK build_context_graph(use_ai_generate=True, include_decisions=True) → six GQL blocks pasted into BigQuery Studio (one renders an interactive graph diagram, one is a portfolio roll-up).
Skill Evolution Lab — a self-improving agent
skill_evolution_lab/ is the runnable companion to the
blog post "Your Agent Can Learn From Its Own Conversations." One company-policy Q&A agent
reads its own conversation traces — successes and failures — and extracts a
structured, versioned SKILL.md. No teacher model, no managed optimizer.
The flaw with headroom. V0 is a deliberately flawed skill (a few facts
baked in plus "answer only from the above, else contact HR") that suppresses
a tool which already knows every answer. Only the skill is wrong — the model,
tools, and questions stay fixed across V0 and V1, so any delta is attributable
to the skill.
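The controlled setup can be illustrated with a toy harness. This is a minimal sketch, not the lab's code: `POLICY_DB`, `Skill`, `lookup_policy`, `answer`, and `accuracy` are hypothetical names, and a dictionary lookup stands in for the real agent and its tool. The tool and the question set are shared by both runs, so only the skill differs. The V0-style skill answers from a few baked-in facts and otherwise deflects to HR, which suppresses the tool that already knows every answer.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the policy tool, which "already knows every answer".
POLICY_DB = {
    "pto_days": "20",
    "remote_days": "3",
    "parental_leave_weeks": "16",
}


@dataclass(frozen=True)
class Skill:
    name: str
    baked_facts: dict = field(default_factory=dict)  # V0-style hard-coded facts
    tool_first: bool = False                          # V1-style: consult the tool


def lookup_policy(key):
    """The tool: identical for V0 and V1."""
    return POLICY_DB.get(key, "unknown")


def answer(skill, key):
    if skill.tool_first:
        return lookup_policy(key)
    # V0-style: "answer only from the above, else contact HR".
    return skill.baked_facts.get(key, "Please contact HR.")


def accuracy(skill, questions):
    correct = sum(answer(skill, q) == POLICY_DB[q] for q in questions)
    return correct / len(questions)


QUESTIONS = list(POLICY_DB)  # held fixed across both runs
v0 = Skill("v0", baked_facts={"pto_days": "20"})
v1 = Skill("v1", tool_first=True)
```

Because `QUESTIONS` and `lookup_policy` are shared, the gap between `accuracy(v0, QUESTIONS)` (1/3 here) and `accuracy(v1, QUESTIONS)` (1.0) can only come from the `Skill` value.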
The engine, imported not copied. analyze_and_evolve.py imports the SDK's
reusable scripts/skill_evolution.py (the
same evolve_skill() the quality lab uses): it partitions scored
conversations, runs a fleet of parallel analysts, and consolidates recurring
rules into a new skill version.
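The partition → parallel analysts → consolidate flow can be sketched as follows. This is an illustration under stated assumptions, not the signature or internals of `evolve_skill()`: `Conversation`, `partition`, `analyst`, `evolve`, and the `min_support` threshold are hypothetical, and each "analyst" is a fixed heuristic standing in for a model call. Consolidation keeps only rules that several analysts propose independently.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Conversation:
    question: str
    score: float     # golden-key grade, 0.0 to 1.0
    used_tool: bool  # did the agent call its policy tool?


def partition(convs, n):
    """Round-robin split so each analyst sees a mix of wins and losses."""
    return [convs[i::n] for i in range(n)]


def analyst(batch):
    """Hypothetical stand-in for one LLM analyst: contrasts failures with
    successes using a fixed heuristic instead of prompting a model."""
    rules = set()
    for c in batch:
        if c.score < 0.5 and not c.used_tool:
            rules.add("Call the policy tool before answering.")
        elif c.score >= 0.5 and c.used_tool:
            rules.add("Cite the figure the tool returns.")
    return rules


def evolve(convs, n_analysts=3, min_support=2):
    """Partition, fan out to parallel analysts, keep rules that recur."""
    batches = partition(convs, n_analysts)
    with ThreadPoolExecutor(max_workers=n_analysts) as pool:
        proposals = list(pool.map(analyst, batches))
    support = Counter(rule for rules in proposals for rule in rules)
    return sorted(rule for rule, n in support.items() if n >= min_support)
```

Requiring a rule to recur across analysts (`min_support`) filters out one-off observations, which matches the "consolidates recurring rules" step; raising the threshold above the number of analysts yields no rules at all.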
Ground-truth scoring. Quality is graded against a golden Q&A answer key
(eval/eval_spec.json) via scripts/quality_report.py
(--eval-spec), not a no-ground-truth "usefulness" guess.
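Grading against an answer key can be as simple as checking that the golden answer appears in the reply. This is a minimal sketch: the JSON layout below (`cases`, `id`, `question`, `expected`) is an assumption, not the actual schema of eval/eval_spec.json, and `grade`/`correctness` are hypothetical helpers rather than quality_report.py's logic. Whole-token matching keeps "120" from counting as a hit for "20".

```python
import json
import re

# Hypothetical eval-spec layout; the real eval/eval_spec.json schema may differ.
SPEC = json.loads("""
{"cases": [
  {"id": "pto", "question": "How many PTO days do I get?", "expected": "20"},
  {"id": "remote", "question": "How many remote days per week?", "expected": "3"}
]}
""")


def normalize(text):
    """Lowercase and collapse whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def grade(answer, expected):
    """Correct if the golden answer appears as a whole token in the reply."""
    pattern = rf"\b{re.escape(normalize(expected))}\b"
    return re.search(pattern, normalize(answer)) is not None


def correctness(answers, spec):
    """Fraction of golden cases answered correctly (missing answers count as wrong)."""
    cases = spec["cases"]
    hits = sum(grade(answers.get(c["id"], ""), c["expected"]) for c in cases)
    return hits / len(cases)
```

For example, a run that answers the PTO question with "You get 20 days of PTO." but deflects the remote-work question to HR scores 0.5.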
The anti-parroting rule. Multi-turn cases where the user asserts a wrong
correction; a good agent re-verifies with its tool and holds the right figure
instead of caving. The engine detects parroting (--tag-turns) and learns a
"re-verify, don't just agree" rule.
Skill Registry versioning. The evolved skill is mirrored to the Gemini
Enterprise Agent Platform Skill Registry as a new immutable revision
(V0 = revision 1, V1 = revision 2); reset.sh reverts both the local copy and
the registry to V0.
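The revision model can be pictured as an append-only store. This toy is explicitly NOT the Agent Platform Skill Registry API: `Revision`, `SkillRegistry`, `publish`, `get`, `revert_to`, and the `active` pointer are hypothetical. It also assumes, without confirmation from the source, that reverting to V0 means re-pointing at revision 1 rather than deleting revision 2.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    number: int   # 1-based, assigned in publish order
    content: str  # full SKILL.md text
    sha256: str   # content fingerprint


class SkillRegistry:
    """Toy append-only store; NOT the Agent Platform Skill Registry API."""

    def __init__(self):
        self._revisions = []
        self.active = None

    def publish(self, content):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        rev = Revision(len(self._revisions) + 1, content, digest)
        self._revisions.append(rev)
        self.active = rev.number
        return rev

    def get(self, number):
        if not 1 <= number <= len(self._revisions):
            raise KeyError(f"no revision {number}")
        return self._revisions[number - 1]

    def revert_to(self, number):
        # Existing revisions are never edited; only the active pointer moves.
        self.get(number)  # validates that the revision exists
        self.active = number
```

Publishing V0 and then V1 yields revisions 1 and 2. Reverting makes revision 1 active again while revision 2 stays readable, which is the property immutability buys.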
A verified run (gemini-3.5-flash, golden-grounded, 55-question held-out set):
V0 18.2% → V1 100% overall; corrections (anti-parroting) 0% → 100%;
evolved skill 2.9 KB. Across four models × three seeds, mean V1 correctness is 90–99%
per model (V0: 16–53%). See the example's
README and
VERIFICATION.
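As a quick arithmetic check on the headline run, assume each of the 55 held-out questions is graded simply right or wrong. This is an assumption, since the lab's scoring may weight cases differently. Under it, a percentage rounded to one decimal pins down the raw count. `implied_correct` is a hypothetical helper written for this check.

```python
def implied_correct(pct, n, places=1):
    """Every whole-number count k out of n whose percentage rounds to pct."""
    return [k for k in range(n + 1) if round(100 * k / n, places) == pct]
```

Under that assumption, 18.2% of 55 can only be 10 correct answers (10/55 ≈ 18.18%), and 100% is all 55.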
Note: ontology_graph_v4_demo.ipynb, ontology_graph_v5_demo.ipynb, and
ymgo_graph_spec.yaml are kept for reference. The current Agent Context Graph approach
needs none of these files: deploy your property graph to BigQuery and
bqaa context-graph --graph derives everything from it — start with the
codelab.