diff --git a/examples/README.md b/examples/README.md
index 35d0b887..66c32526 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -53,6 +53,7 @@ artifacts that demonstrate SDK capabilities.
 |-----------|-------------|
 | [context_graph/](context_graph/) | Agent Context Graph: extract decision traces from your agent's context graph — a runnable ADK agent + BQ AA plugin streaming events, the codelab artifacts ([codelab/](context_graph/codelab/)), and the scheduled Cloud Run + Cloud Scheduler deploy ([periodic_materialization/](context_graph/periodic_materialization/)). Start with the [codelab](../docs/codelabs/periodic_materialization.md). |
 | [agent_improvement_cycle/](agent_improvement_cycle/) | LoopAgent-driven prompt improvement cycle |
+| [self_evolving_agent_demo/](self_evolving_agent_demo/) | Metric-driven self-evolution demo for a single ADK agent. Uses trace signals to generate and gate a bounded prompt evolution. |
 | [decision_lineage_demo/](decision_lineage_demo/) | Decision-lineage property graph (issue #98): live ADK media-planner agent + BQ AA Plugin running across 6 campaign sessions → SDK `build_context_graph(use_ai_generate=True, include_decisions=True)` → six GQL blocks pasted into BigQuery Studio (one renders an interactive graph diagram, one is a portfolio roll-up) |
 
 ## Reference Artifacts
diff --git a/examples/self_evolving_agent_demo/.gitignore b/examples/self_evolving_agent_demo/.gitignore
new file mode 100644
index 00000000..b4de0192
--- /dev/null
+++ b/examples/self_evolving_agent_demo/.gitignore
@@ -0,0 +1,5 @@
+.env
+prompt_state.json
+reports/
+__pycache__/
+*/__pycache__/
diff --git a/examples/self_evolving_agent_demo/DEMO_NARRATION.md b/examples/self_evolving_agent_demo/DEMO_NARRATION.md
new file mode 100644
index 00000000..1a09698e
--- /dev/null
+++ b/examples/self_evolving_agent_demo/DEMO_NARRATION.md
@@ -0,0 +1,28 @@
+# Self-Evolving Agent Demo Narration
+
+## 30-second version
+
+This demo starts with a basketball analytics agent that answers correctly but
+wastes work. It logs every run to BigQuery through the analytics
+plugin. The SDK reads the traces, finds that the agent keeps calling a
+broad reference tool and spending excess tokens, generates a tighter V2
+prompt, reruns the same questions, and proves that quality stayed flat
+while token and tool usage dropped.
+
+## Walkthrough
+
+1. Run `./setup.sh`.
+2. Run `./run_e2e_demo.sh`.
+3. Watch the V1 run call broad and narrow sample tools.
+4. Watch `analyze_and_evolve.py` print the SDK-backed finding:
+   broad reference lookups were used on narrow tasks.
+5. Open `prompt_diff.md` to inspect the exact V1 -> generated V2 diff.
+6. Watch the V2 run use narrow tools directly.
+7. Open `comparison.md` for the final quality/token/tool diff.
+
+## Demo Message
+
+The important idea is not "save tokens" in isolation. The agent uses
+its own production-shaped traces as feedback. Token tracking gives the
+loop a measurable signal, but the goal is a self-evolving agent that
+gets cheaper or cleaner without losing answer quality.
diff --git a/examples/self_evolving_agent_demo/README.md b/examples/self_evolving_agent_demo/README.md
new file mode 100644
index 00000000..3f4f8283
--- /dev/null
+++ b/examples/self_evolving_agent_demo/README.md
@@ -0,0 +1,212 @@
+# Self-Evolving Agent Demo
+
+This demo shows a single ADK agent improving from its own logged
+behavior. The agent answers basketball analytics questions using deterministic
+fixture tools. V1 is intentionally wasteful: it loads broad basketball
+reference context and writes long scouting reports even when a narrow
+tool can answer the question. The BigQuery Agent Analytics Plugin logs
+the sessions to BigQuery, and the SDK reads those traces back to find a
+concrete improvement opportunity. The demo generates V2 during the run,
+then activates it only when the baseline answers already pass quality
+checks and the trace analysis shows broad-tool / token waste.
+
+```mermaid
+flowchart TD
+  A["Run sample agent V1"] --> B["Plugin logs agent_events to BigQuery"]
+  B --> C["SDK deterministic evaluators + trace SQL"]
+  C --> D["Find broad lookup and token waste"]
+  D --> E["Generate bounded V2 prompt"]
+  E --> F["Run same sample eval questions"]
+  F --> G["Show prompt diff + metric diff"]
+```
+
+The point is self-evolution. Token tracking is the measurement signal,
+not the product promise.
+
+This is a lightweight companion to `examples/agent_improvement_cycle/`.
+That demo shows a production-facing quality-improvement loop with
+Prompt Registry and Prompt Optimizer. This demo is intentionally smaller:
+it focuses on operational trace signals such as tool overuse and token
+waste, then gates a single generated prompt evolution against before/after
+metrics.
+
+## What Improves
+
+V1 behavior:
+
+- Calls `lookup_basketball_reference` before narrow tools.
+- Often calls more than one tool for a one-question task.
+- Produces long sectioned scouting reports.
+
+Generated V2 behavior:
+
+- Is created at runtime by a prompt generator from the SDK trace
+  summary, tool counts, quality summary, and available tool signatures.
+- Should use the cheapest sufficient narrow tool.
+- Should avoid `lookup_basketball_reference` unless no narrow tool fits.
+- Should give a short answer with decisive stats and a recommendation.
+
+The acceptance gate is:
+
+```mermaid
+flowchart TD
+  A["Generated V2"] --> B{"Quality not worse?"}
+  B -- no --> R["Reject"]
+  B -- yes --> C{"Avg tokens lower?"}
+  C -- no --> R
+  C -- yes --> D{"Broad lookup reduced?"}
+  D -- no --> R
+  D -- yes --> E{"No tool errors?"}
+  E -- no --> R
+  E -- yes --> P["Accept evolved prompt"]
+```
+
+## Run It
+
+Prerequisites:
+
+- Python 3.10+
+- `gcloud` and `bq` CLIs
+- Application Default Credentials
+- A Google Cloud project with billing enabled
+- IAM roles: BigQuery Data Editor, BigQuery Job User, and Vertex AI User
+
+Setup:
+
+```bash
+./setup.sh
+```
+
+If your default `python3` is older than 3.10, run with:
+
+```bash
+PYTHON_BIN=python3.11 ./setup.sh
+PYTHON_BIN=python3.11 ./run_e2e_demo.sh
+```
+
+Run the end-to-end demo:
+
+```bash
+./run_e2e_demo.sh
+```
+
+Reset local prompt state and reports:
+
+```bash
+./reset.sh
+```
+
+Expected default one-run cost is typically well under `$1`: four V1
+agent sessions, one small prompt-generation call, four generated-V2
+agent sessions, small BigQuery reads, and SDK deterministic evaluators.
+The demo does not deploy Cloud Run, Cloud Scheduler, Workflows, or any
+other long-running infrastructure.
+
+## Outputs
+
+Each run writes a timestamped directory under `reports/`:
+
+```text
+reports/run_<timestamp>/
+├── latest_eval_results_baseline.json  # V1 answers + session IDs
+├── candidate_prompt.json              # model-generated V2 prompt
+├── prompt_diff.md                     # exact V1 -> generated V2 diff
+├── self_evolution_analysis.json       # SDK-backed evolution decision
+├── latest_eval_results_evolved.json   # V2 answers + session IDs
+├── comparison.json                    # before/after gates
+└── comparison.md                      # readable metric diff report
+```
+
+For the main story, open these two files after a run:
+
+- `prompt_diff.md` — shows the exact prompt changes generated from
+  the trace/token signal.
+- `comparison.md` — shows quality, token, tool-call, and broad-lookup
+  deltas between agent V1 and generated V2.
+
+The tracked `VERIFICATION.md` file records the latest live end-to-end
+verification result for this demo.
+
+The raw traces land in:
+
+```text
+<PROJECT_ID>.self_evolving_agent_demo.agent_events
+```
+
+Override with:
+
+```bash
+export SELF_EVOLVING_DATASET_ID=my_dataset
+export SELF_EVOLVING_TABLE_ID=agent_events
+export SELF_EVOLVING_AGENT_MODEL=gemini-2.5-flash
+export SELF_EVOLVING_PROMPT_GENERATOR_MODEL=gemini-2.5-flash
+export DATASET_LOCATION=us-central1
+```
+
+Re-running `setup.sh` regenerates `.env` from the current environment.
+To customize a setting persistently, pass it as an environment variable
+when running setup, for example:
+
+```bash
+SELF_EVOLVING_AGENT_MODEL=gemini-2.5-pro ./setup.sh
+```
+
+Evolution thresholds can be tuned with:
+
+```bash
+python analyze_and_evolve.py \
+  --min-quality-pass-rate 1.0 \
+  --min-broad-lookup-rate 0.5 \
+  --max-avg-tool-calls 2.0
+```
+
+## File Map
+
+```text
+examples/self_evolving_agent_demo/
+├── README.md
+├── DEMO_NARRATION.md
+├── VERIFICATION.md
+├── setup.sh
+├── reset.sh
+├── run_e2e_demo.sh
+├── run_agent.py
+├── analyze_and_evolve.py
+├── compare_runs.py
+├── agent/
+│   ├── agent.py
+│   ├── prompts.py
+│   ├── prompt_store.py
+│   └── tools.py
+├── analytics/
+│   └── session_metrics.py
+└── eval/
+    └── eval_cases.json
+```
+
+## Productionization Roadmap
+
+The demo is intentionally one-shot. A production self-evolving loop
+would add durable orchestration, approvals, and rollout controls:
+
+```mermaid
+flowchart LR
+  A["Scheduler"] --> B["Cloud Run Job"]
+  B --> C["Analyze recent BigQuery traces"]
+  C --> D["Generate prompt or skill candidate"]
+  D --> E["Regression eval gate"]
+  E --> F["Human approval or policy gate"]
+  F --> G["Prompt Registry / config rollout"]
+  G --> H["Canary traffic"]
+  H --> C
+```
+
+Recommended next steps:
+
+- Store accepted and rejected candidates in BigQuery.
+- Add prompt registry support for managed version history.
+- Add a human approval step before production rollout.
+- Add canary routing and automatic rollback if quality or cost
+  regressions appear.
+- Extend the candidate generator from full-prompt generation to bounded
+  prompt/skill patch optimization.
diff --git a/examples/self_evolving_agent_demo/VERIFICATION.md b/examples/self_evolving_agent_demo/VERIFICATION.md
new file mode 100644
index 00000000..64fdd27f
--- /dev/null
+++ b/examples/self_evolving_agent_demo/VERIFICATION.md
@@ -0,0 +1,101 @@
+# Live Verification
+
+Last verified: 2026-06-09, America/Los_Angeles
+
+Run id: `run_20260609_171547`
+
+Command:
+
+```bash
+PYTHON_BIN=/path/to/python3.10+ ./run_e2e_demo.sh
+```
+
+Raw local artifacts were written to:
+
+```text
+reports/run_20260609_171547/
+```
+
+The raw `reports/` directory remains ignored because it is per-run output.
+This file records the live end-to-end result that should be stable enough
+to keep with the demo source.
+
+## What Ran
+
+```mermaid
+flowchart LR
+  A["ADK sample agent V1"] --> B["BigQuery analytics plugin"]
+  B --> C["BigQuery trace table"]
+  C --> D["SDK evaluators + trace SQL"]
+  D --> E["Gemini prompt generator"]
+  E --> F["Generated V2 prompt"]
+  F --> G["ADK sample agent V2"]
+  G --> H["Before/after gate report"]
+```
+
+The live run exercised:
+
+- ADK agent execution with Gemini.
+- BigQuery Agent Analytics Plugin trace logging.
+- BigQuery trace readback from
+  `rag-chatbot-485501.self_evolving_agent_demo.agent_events`.
+- SDK deterministic evaluator checks for token efficiency, cost, turn count,
+  and error rate.
+- Runtime generation of a replacement V2 prompt.
+- Evolved-agent rerun against the same deterministic sample eval set.
+- Before/after comparison gates.
+
+## Generated Change
+
+The generated V2 prompt changed the agent from broad-first behavior to a
+narrowest-sufficient-tool policy:
+
+- Player comparison -> `compare_players`.
+- Team comparison -> `compare_teams`.
+- Named-player scoring/profile/quick-read -> `get_player_stats`.
+- Named-team strategy/strengths/profile/late-game offense ->
+  `get_team_profile`.
+- `lookup_basketball_reference` only for broad, league-wide, or unsupported
+  ambiguous questions.
+
+Candidate source: `model`.
+
+It also changed the answer style from a long fixed scouting-report format
+to at most four bullets or 120 words.
+
+## Metrics
+
+| Metric | V1 | Generated V2 | Delta |
+|---|---:|---:|---:|
+| Quality pass rate | 100% | 100% | +0% |
+| Avg total tokens | 3640.2 | 1479.8 | -59.4% |
+| Avg tool calls | 2.5 | 1.0 | -60.0% |
+| Broad lookup calls | 4 | 0 | -4 |
+| Tool errors | 0 | 0 | +0 |
+
+## Gates
+
+| Gate | Result |
+|---|---:|
+| `quality_not_regressed` | PASS |
+| `tokens_reduced` | PASS |
+| `broad_lookup_reduced` | PASS |
+| `tool_errors_clear` | PASS |
+
+Final result: PASS.
+
+## Baseline SDK Signals
+
+The SDK-backed analysis observed the following V1 signals before generating
+the V2 prompt:
+
+- Sessions: 4.
+- Avg total tokens: 3640.2.
+- Avg tool calls: 2.5.
+- Broad lookup sessions: 4/4.
+- Quality pass rate: 100%.
+- Cost evaluator average observed value: 0.0015.
+
+The default one-run cost remains well under `$1`: the run uses four V1
+agent sessions, one prompt-generation call, four generated-V2 sessions,
+and small BigQuery reads.
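Review note on the Metrics and Gates tables in `VERIFICATION.md`: the sketch below shows one way the four gate results could be derived from the reported numbers. The gate names and metric values are taken from the tables above. The `RunMetrics` dataclass, its field names, and `evaluate_gates` are hypothetical; the actual logic in `compare_runs.py` is not part of this excerpt and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
  """Aggregate metrics for one eval run (hypothetical shape)."""

  quality_pass_rate: float
  avg_total_tokens: float
  broad_lookup_calls: int
  tool_errors: int


def evaluate_gates(baseline: RunMetrics, evolved: RunMetrics) -> dict:
  """Return one boolean per gate; names mirror the Gates table."""
  return {
      "quality_not_regressed": (
          evolved.quality_pass_rate >= baseline.quality_pass_rate
      ),
      "tokens_reduced": evolved.avg_total_tokens < baseline.avg_total_tokens,
      "broad_lookup_reduced": (
          evolved.broad_lookup_calls < baseline.broad_lookup_calls
      ),
      "tool_errors_clear": evolved.tool_errors == 0,
  }


# Values copied from the Metrics table for run_20260609_171547.
v1 = RunMetrics(1.0, 3640.2, 4, 0)
v2 = RunMetrics(1.0, 1479.8, 0, 0)

gates = evaluate_gates(v1, v2)
accepted = all(gates.values())
for name, passed in gates.items():
  print(f"{name}: {'PASS' if passed else 'FAIL'}")
print("Final result:", "PASS" if accepted else "FAIL")
```

Under these assumptions every gate must pass for the candidate to be accepted, which matches the all-or-nothing flowchart in the README. Comparing a run against itself would fail `tokens_reduced` and `broad_lookup_reduced`, since both use strict inequalities in this sketch.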
diff --git a/examples/self_evolving_agent_demo/agent/__init__.py b/examples/self_evolving_agent_demo/agent/__init__.py
new file mode 100644
index 00000000..be4ae66a
--- /dev/null
+++ b/examples/self_evolving_agent_demo/agent/__init__.py
@@ -0,0 +1 @@
+"""Agent package for the self-evolving agent demo."""
diff --git a/examples/self_evolving_agent_demo/agent/agent.py b/examples/self_evolving_agent_demo/agent/agent.py
new file mode 100644
index 00000000..cf505a25
--- /dev/null
+++ b/examples/self_evolving_agent_demo/agent/agent.py
@@ -0,0 +1,100 @@
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""ADK sample analytics agent used by the self-evolving demo."""
+
+from __future__ import annotations
+
+import os
+
+from dotenv import load_dotenv
+from google.adk.agents import Agent
+from google.adk.models import Gemini
+from google.adk.plugins.bigquery_agent_analytics_plugin import BigQueryAgentAnalyticsPlugin
+from google.adk.plugins.bigquery_agent_analytics_plugin import BigQueryLoggerConfig
+import google.auth
+from google.genai import types
+
+from .prompt_store import read_prompt
+from .tools import DEMO_TOOLS
+
+_DEMO_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+_ENV_PATH = os.path.join(_DEMO_DIR, ".env")
+if os.path.exists(_ENV_PATH):
+  load_dotenv(dotenv_path=_ENV_PATH)
+
+try:
+  _, _auth_project = google.auth.default()
+except Exception:
+  _auth_project = None
+
+PROJECT_ID = os.getenv("PROJECT_ID") or os.getenv("GOOGLE_CLOUD_PROJECT")
+if not PROJECT_ID:
+  PROJECT_ID = _auth_project
+if not PROJECT_ID:
+  raise RuntimeError(
+      "Could not resolve PROJECT_ID from .env, GOOGLE_CLOUD_PROJECT, or ADC. "
+      "Run ./setup.sh or `gcloud config set project YOUR_PROJECT_ID`."
+  )
+
+DATASET_LOCATION = os.getenv("DATASET_LOCATION", "us-central1")
+DATASET_ID = os.getenv("SELF_EVOLVING_DATASET_ID", "self_evolving_agent_demo")
+TABLE_ID = os.getenv("SELF_EVOLVING_TABLE_ID", "agent_events")
+MODEL_ID = os.getenv("SELF_EVOLVING_AGENT_MODEL", "gemini-2.5-flash")
+AGENT_LOCATION = os.getenv("SELF_EVOLVING_AGENT_LOCATION", "us-central1")
+APP_NAME = "self_evolving_agent"
+
+
+def _configure_environment() -> None:
+  """Configure Vertex AI environment variables required by ADK Gemini."""
+  os.environ["GOOGLE_CLOUD_PROJECT"] = PROJECT_ID
+  os.environ["GOOGLE_CLOUD_LOCATION"] = AGENT_LOCATION
+  os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "true"
+
+
+def create_agent(prompt: str, model_id: str | None = None) -> Agent:
+  """Create the sample agent with the supplied system prompt."""
+  _configure_environment()
+  return Agent(
+      name=APP_NAME,
+      model=Gemini(
+          model=model_id or MODEL_ID,
+          retry_options=types.HttpRetryOptions(attempts=3),
+      ),
+      description=(
+          "Basketball analytics assistant with deterministic fixture tools."
+      ),
+      instruction=prompt,
+      tools=DEMO_TOOLS,
+  )
+
+
+_prompt, PROMPT_VERSION = read_prompt()
+root_agent = create_agent(_prompt)
+
+bq_logging_plugin = BigQueryAgentAnalyticsPlugin(
+    project_id=PROJECT_ID,
+    dataset_id=DATASET_ID,
+    table_id=TABLE_ID,
+    location=DATASET_LOCATION,
+    config=BigQueryLoggerConfig(
+        enabled=True,
+        max_content_length=50 * 1024,
+        # Small batches make rows visible quickly for this one-shot demo.
+        batch_size=1,
+        shutdown_timeout=15.0,
+    ),
+)
+
+app = root_agent
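Review note on `agent.py`: the module resolves the project at import time in a fixed order (`PROJECT_ID`, then `GOOGLE_CLOUD_PROJECT`, then the ADC project). The sketch below restates that precedence as a pure function so it can be checked without credentials. `resolve_project_id` and its parameters are hypothetical names, not part of the demo source.

```python
from typing import Mapping, Optional


def resolve_project_id(
    env: Mapping[str, str], adc_project: Optional[str]
) -> str:
  """Mirror the precedence used in agent.py (illustrative only).

  Empty strings are treated as unset because the original chains the
  lookups with `or`.
  """
  project_id = (
      env.get("PROJECT_ID") or env.get("GOOGLE_CLOUD_PROJECT") or adc_project
  )
  if not project_id:
    raise RuntimeError(
        "Could not resolve PROJECT_ID from .env, GOOGLE_CLOUD_PROJECT, or ADC."
    )
  return project_id


# PROJECT_ID wins when set.
first = resolve_project_id(
    {"PROJECT_ID": "from-dotenv", "GOOGLE_CLOUD_PROJECT": "from-env"}, "from-adc"
)
# An empty PROJECT_ID falls through to GOOGLE_CLOUD_PROJECT.
second = resolve_project_id(
    {"PROJECT_ID": "", "GOOGLE_CLOUD_PROJECT": "from-env"}, "from-adc"
)
# With no environment values, the ADC project is used.
third = resolve_project_id({}, "from-adc")
print(first, second, third)
```

One consequence worth noting: because the real module performs this lookup at import time, importing `agent.agent` without any of the three sources raises `RuntimeError` before any agent is constructed.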
diff --git a/examples/self_evolving_agent_demo/agent/prompt_store.py b/examples/self_evolving_agent_demo/agent/prompt_store.py
new file mode 100644
index 00000000..a9ad5838
--- /dev/null
+++ b/examples/self_evolving_agent_demo/agent/prompt_store.py
@@ -0,0 +1,101 @@
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Tiny local prompt registry for the demo.
+
+The tracked source stays immutable during a run. The active prompt
+version is stored in ``prompt_state.json``, which is ignored by Git and
+created by setup/reset/evolution scripts.
+"""
+
+from __future__ import annotations
+
+import argparse
+from datetime import datetime
+from datetime import timezone
+import json
+import os
+from typing import Any
+
+from .prompts import V1_PROMPT
+
+_DEMO_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+STATE_PATH = os.path.join(_DEMO_DIR, "prompt_state.json")
+
+
+def _state(version: str, prompt: str, rationale: str) -> dict[str, Any]:
+  return {
+      "version": version,
+      "prompt": prompt,
+      "rationale": rationale,
+      "updated_at": datetime.now(timezone.utc).isoformat(),
+  }
+
+
+def read_state() -> dict[str, Any]:
+  """Read the current prompt state, falling back to V1."""
+  if not os.path.exists(STATE_PATH):
+    return _state("v1", V1_PROMPT, "Default V1 prompt.")
+  with open(STATE_PATH, encoding="utf-8") as f:
+    data = json.load(f)
+  version = str(data.get("version", "v1")).lower()
+  prompt = str(data.get("prompt") or V1_PROMPT)
+  return {
+      "version": version,
+      "prompt": prompt,
+      "rationale": str(data.get("rationale", "")),
+      "updated_at": str(data.get("updated_at", "")),
+  }
+
+
+def read_prompt() -> tuple[str, str]:
+  """Return ``(prompt, version)`` for agent construction."""
+  state = read_state()
+  return state["prompt"], state["version"]
+
+
+def write_prompt(version: str, prompt: str, rationale: str) -> dict[str, Any]:
+  """Persist prompt text as the active demo prompt version."""
+  normalized = version.strip().lower()
+  if normalized not in {"v1", "v2", "candidate"}:
+    raise ValueError(f"Unsupported prompt version: {version!r}")
+  if not prompt.strip():
+    raise ValueError("Prompt text must not be empty.")
+  state = _state(normalized, prompt.strip(), rationale)
+  with open(STATE_PATH, "w", encoding="utf-8") as f:
+    json.dump(state, f, indent=2)
+    f.write("\n")
+  return state
+
+
+def reset_state() -> dict[str, Any]:
+  """Reset the demo to the intentionally inefficient V1 prompt."""
+  return write_prompt("v1", V1_PROMPT, "Reset to baseline V1 prompt.")
+
+
+def main() -> None:
+  parser = argparse.ArgumentParser(description="Manage demo prompt state.")
+  parser.add_argument("action", choices=["show", "reset"])
+  args = parser.parse_args()
+
+  if args.action == "reset":
+    state = reset_state()
+  else:
+    state = read_state()
+
+  print(json.dumps(state, indent=2))
+
+
+if __name__ == "__main__":
+  main()
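Review note on `prompt_store.py`: `read_state` falls back to V1 both when the state file is missing and when its `prompt` field is empty, but it keeps whatever `version` the file declares. The sketch below reproduces that fallback in isolation. `load_state` and `DEFAULT_PROMPT` are hypothetical stand-ins for `read_state` and `V1_PROMPT`, and only the two fields relevant to the fallback are modeled.

```python
import json
import os
import tempfile

DEFAULT_PROMPT = "baseline prompt placeholder"  # stands in for V1_PROMPT


def load_state(path: str) -> dict:
  """Simplified copy of the read_state fallback rules (illustrative)."""
  if not os.path.exists(path):
    return {"version": "v1", "prompt": DEFAULT_PROMPT}
  with open(path, encoding="utf-8") as f:
    data = json.load(f)
  return {
      "version": str(data.get("version", "v1")).lower(),
      "prompt": str(data.get("prompt") or DEFAULT_PROMPT),
  }


with tempfile.TemporaryDirectory() as tmp:
  path = os.path.join(tmp, "prompt_state.json")

  # Case 1: no state file yet, so the baseline is returned.
  missing = load_state(path)

  # Case 2: a hand-edited file that declares V2 but has an empty prompt.
  with open(path, "w", encoding="utf-8") as f:
    json.dump({"version": "V2", "prompt": ""}, f)
  degraded = load_state(path)

print(missing)
print(degraded)
```

In the second case the result pairs version `v2` with the baseline prompt text. `write_prompt` rejects empty prompts, so this state should only arise from a manually edited `prompt_state.json`, but a reader comparing runs by version label may want to be aware of it.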
diff --git a/examples/self_evolving_agent_demo/agent/prompts.py b/examples/self_evolving_agent_demo/agent/prompts.py
new file mode 100644
index 00000000..780ce577
--- /dev/null
+++ b/examples/self_evolving_agent_demo/agent/prompts.py
@@ -0,0 +1,44 @@
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Baseline prompt for the self-evolving agent demo.
+
+The demo starts with V1, which is intentionally wasteful: it asks the
+agent to load broad reference context and write long analyst notes even
+when a narrow tool can answer the question. V2 is generated at runtime
+from SDK trace analysis and stored in ``prompt_state.json``.
+"""
+
+V1_PROMPT = """\
+You are Courtside Scout, a basketball analytics assistant.
+
+You must be exhaustive. For every user question, first call
+`lookup_basketball_reference(query)` using the full user question so you have
+league-wide context. Then call any narrow tool that could possibly be
+relevant. If a player appears, call `get_player_stats`. If a team
+appears, call `get_team_profile`. If the user compares two players,
+also call `compare_players`. If the user compares two teams, also call
+`compare_teams`.
+
+Write a scouting-report style answer with these sections:
+1. Context
+2. Numbers
+3. Reasoning
+4. Caveats
+5. Recommendation
+
+Use six to eight bullets. Mention that the data is a synthetic demo
+fixture and that a live production agent would verify against a
+licensed stats feed.
+"""
diff --git a/examples/self_evolving_agent_demo/agent/tools.py b/examples/self_evolving_agent_demo/agent/tools.py
new file mode 100644
index 00000000..35d9f980
--- /dev/null
+++ b/examples/self_evolving_agent_demo/agent/tools.py
@@ -0,0 +1,318 @@
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Deterministic basketball fixture tools for the self-evolving demo.
+
+The data below is intentionally synthetic. The point of the demo is the
+agent evolution loop and trace analytics, not live sports accuracy.
+"""
+
+from __future__ import annotations
+
+from typing import Any
+
+SEASON = "2025-26-demo"
+
+PLAYERS: dict[str, dict[str, Any]] = {
+    "nikola jokic": {
+        "player": "Nikola Jokic",
+        "team": "Denver Nuggets",
+        "ppg": 26.4,
+        "rpg": 12.4,
+        "apg": 9.1,
+        "ts_pct": 0.662,
+        "usage_pct": 28.8,
+        "assist_rate": 43.0,
+        "strength": "elite half-court creation through post play and passing",
+    },
+    "joel embiid": {
+        "player": "Joel Embiid",
+        "team": "Philadelphia 76ers",
+        "ppg": 31.8,
+        "rpg": 10.9,
+        "apg": 5.6,
+        "ts_pct": 0.646,
+        "usage_pct": 35.1,
+        "assist_rate": 28.4,
+        "strength": "dominant scoring pressure, foul generation, and rim defense",
+    },
+    "shai gilgeous-alexander": {
+        "player": "Shai Gilgeous-Alexander",
+        "team": "Oklahoma City Thunder",
+        "ppg": 30.6,
+        "rpg": 5.8,
+        "apg": 6.4,
+        "ts_pct": 0.635,
+        "usage_pct": 32.4,
+        "assist_rate": 29.9,
+        "strength": "paint pressure, midrange scoring, and low-turnover creation",
+    },
+    "luka doncic": {
+        "player": "Luka Doncic",
+        "team": "Dallas Mavericks",
+        "ppg": 29.8,
+        "rpg": 8.7,
+        "apg": 9.4,
+        "ts_pct": 0.612,
+        "usage_pct": 34.8,
+        "assist_rate": 45.5,
+        "strength": "pick-and-roll control and skip-pass creation",
+    },
+    "jayson tatum": {
+        "player": "Jayson Tatum",
+        "team": "Boston Celtics",
+        "ppg": 27.1,
+        "rpg": 8.2,
+        "apg": 4.9,
+        "ts_pct": 0.604,
+        "usage_pct": 30.6,
+        "assist_rate": 22.5,
+        "strength": "two-way wing scoring with switchable defense",
+    },
+    "anthony edwards": {
+        "player": "Anthony Edwards",
+        "team": "Minnesota Timberwolves",
+        "ppg": 27.8,
+        "rpg": 5.5,
+        "apg": 5.1,
+        "ts_pct": 0.589,
+        "usage_pct": 31.8,
+        "assist_rate": 24.7,
+        "strength": "rim pressure, transition force, and late-clock shot making",
+    },
+}
+
+TEAMS: dict[str, dict[str, Any]] = {
+    "denver nuggets": {
+        "team": "Denver Nuggets",
+        "wins": 55,
+        "losses": 27,
+        "off_rating": 119.1,
+        "def_rating": 113.6,
+        "net_rating": 5.5,
+        "pace": 97.2,
+        "profile": "methodical half-court offense built around Jokic actions",
+        "late_game_edge": "high-value two-man actions and elite decision making",
+    },
+    "oklahoma city thunder": {
+        "team": "Oklahoma City Thunder",
+        "wins": 60,
+        "losses": 22,
+        "off_rating": 118.4,
+        "def_rating": 109.2,
+        "net_rating": 9.2,
+        "pace": 100.5,
+        "profile": "drive-heavy offense with aggressive point-of-attack defense",
+        "late_game_edge": "Shai isolation plus five-out spacing",
+    },
+    "boston celtics": {
+        "team": "Boston Celtics",
+        "wins": 58,
+        "losses": 24,
+        "off_rating": 120.2,
+        "def_rating": 111.1,
+        "net_rating": 9.1,
+        "pace": 98.9,
+        "profile": "spacing, three-point volume, and switchable wing size",
+        "late_game_edge": "multiple creators around elite spacing",
+    },
+    "dallas mavericks": {
+        "team": "Dallas Mavericks",
+        "wins": 50,
+        "losses": 32,
+        "off_rating": 117.2,
+        "def_rating": 114.5,
+        "net_rating": 2.7,
+        "pace": 99.4,
+        "profile": "pick-and-roll creation and corner spacing",
+        "late_game_edge": "Doncic advantage creation against switches",
+    },
+    "minnesota timberwolves": {
+        "team": "Minnesota Timberwolves",
+        "wins": 53,
+        "losses": 29,
+        "off_rating": 115.8,
+        "def_rating": 109.8,
+        "net_rating": 6.0,
+        "pace": 98.0,
+        "profile": "rim protection, size, and Edwards downhill creation",
+        "late_game_edge": "defense-to-offense swings and Edwards shot pressure",
+    },
+}
+
+PLAYER_ALIASES = {
+    "jokic": "nikola jokic",
+    "nikola": "nikola jokic",
+    "embiid": "joel embiid",
+    "joel": "joel embiid",
+    "shai": "shai gilgeous-alexander",
+    "sga": "shai gilgeous-alexander",
+    "gilgeous-alexander": "shai gilgeous-alexander",
+    "luka": "luka doncic",
+    "doncic": "luka doncic",
+    "tatum": "jayson tatum",
+    "jayson": "jayson tatum",
+    "edwards": "anthony edwards",
+    "anthony edwards": "anthony edwards",
+}
+
+TEAM_ALIASES = {
+    "nuggets": "denver nuggets",
+    "denver": "denver nuggets",
+    "thunder": "oklahoma city thunder",
+    "okc": "oklahoma city thunder",
+    "celtics": "boston celtics",
+    "boston": "boston celtics",
+    "mavericks": "dallas mavericks",
+    "mavs": "dallas mavericks",
+    "dallas": "dallas mavericks",
+    "timberwolves": "minnesota timberwolves",
+    "wolves": "minnesota timberwolves",
+    "minnesota": "minnesota timberwolves",
+}
+
+
+def _resolve_player(name: str) -> str:
+  key = name.lower().strip()
+  if key in PLAYERS:
+    return key
+  for alias, canonical in PLAYER_ALIASES.items():
+    if alias in key:
+      return canonical
+  raise ValueError(f"Unknown demo player: {name}")
+
+
+def _resolve_team(name: str) -> str:
+  key = name.lower().strip()
+  if key in TEAMS:
+    return key
+  for alias, canonical in TEAM_ALIASES.items():
+    if alias in key:
+      return canonical
+  raise ValueError(f"Unknown demo team: {name}")
+
+
+def lookup_basketball_reference(query: str) -> dict[str, Any]:
+  """Return a broad basketball reference packet for ambiguous questions.
+
+  This tool is intentionally verbose. V1 overuses it, which gives the
+  SDK token analysis a concrete optimization opportunity to find.
+  """
+  return {
+      "query": query,
+      "season": SEASON,
+      "usage_note": (
+          "Broad reference packet. Prefer narrow tools for player, team, "
+          "and comparison questions when possible."
+      ),
+      "league_principles": [
+          "Net rating estimates team strength better than wins alone.",
+          "True shooting percentage helps compare scoring efficiency.",
+          "Usage rate indicates how much offense a player carries.",
+          "Assist rate and turnover context matter for primary creators.",
+          "Pace changes counting stats and should be considered in team reads.",
+          "Late-game offense rewards shot creation, spacing, and low turnovers.",
+          "Playoff defense values rim protection and switchable point-of-attack size.",
+          "Synthetic demo fixtures are stable so trace comparisons are repeatable.",
+      ],
+      "teams": list(TEAMS.values()),
+      "players": list(PLAYERS.values()),
+      "common_matchup_lenses": [
+          "creation burden",
+          "efficiency",
+          "rim pressure",
+          "spacing environment",
+          "defensive matchup flexibility",
+          "late-clock reliability",
+          "transition creation",
+          "bench context",
+      ],
+  }
+
+
+def get_player_stats(player: str, season: str = SEASON) -> dict[str, Any]:
+  """Return compact stats, strengths, and scoring profile for one player."""
+  data = dict(PLAYERS[_resolve_player(player)])
+  data["season"] = season
+  return data
+
+
+def get_team_profile(team: str, season: str = SEASON) -> dict[str, Any]:
+  """Return team profile, strengths, and late-game strategy data."""
+  data = dict(TEAMS[_resolve_team(team)])
+  data["season"] = season
+  return data
+
+
+def compare_players(
+    player_a: str,
+    player_b: str,
+    season: str = SEASON,
+) -> dict[str, Any]:
+  """Compare two demo players with a compact recommendation."""
+  left = get_player_stats(player_a, season)
+  right = get_player_stats(player_b, season)
+  left_score = (
+      left["ppg"] * 0.35
+      + left["apg"] * 0.30
+      + left["ts_pct"] * 20
+      + left["assist_rate"] * 0.10
+  )
+  right_score = (
+      right["ppg"] * 0.35
+      + right["apg"] * 0.30
+      + right["ts_pct"] * 20
+      + right["assist_rate"] * 0.10
+  )
+  winner = left if left_score >= right_score else right
+  return {
+      "season": season,
+      "player_a": left,
+      "player_b": right,
+      "recommended": winner["player"],
+      "reason": (
+          f"{winner['player']} has the stronger creation profile for this "
+          "question because of scoring efficiency plus playmaking load."
+      ),
+  }
+
+
+def compare_teams(
+    team_a: str,
+    team_b: str,
+    season: str = SEASON,
+) -> dict[str, Any]:
+  """Compare two demo teams with a compact recommendation."""
+  left = get_team_profile(team_a, season)
+  right = get_team_profile(team_b, season)
+  winner = left if left["net_rating"] >= right["net_rating"] else right
+  return {
+      "season": season,
+      "team_a": left,
+      "team_b": right,
+      "recommended": winner["team"],
+      "reason": (
+          f"{winner['team']} has the better demo profile by net rating "
+          "and role clarity."
+      ),
+  }
+
+
+DEMO_TOOLS = [
+    lookup_basketball_reference,
+    get_player_stats,
+    get_team_profile,
+    compare_players,
+    compare_teams,
+]
diff --git a/examples/self_evolving_agent_demo/analytics/__init__.py b/examples/self_evolving_agent_demo/analytics/__init__.py
new file mode 100644
index 00000000..fe4bc6bd
--- /dev/null
+++ b/examples/self_evolving_agent_demo/analytics/__init__.py
@@ -0,0 +1 @@
+"""Analytics helpers for the self-evolving agent demo."""
diff --git a/examples/self_evolving_agent_demo/analytics/session_metrics.py b/examples/self_evolving_agent_demo/analytics/session_metrics.py
new file mode 100644
index 00000000..0fa6a2ab
--- /dev/null
+++ b/examples/self_evolving_agent_demo/analytics/session_metrics.py
@@ -0,0 +1,322 @@
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Session metric helpers backed by BigQuery and SDK evaluators."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+import json
+import os
+import time
+from typing import Any
+
+_DEMO_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+_ENV_PATH = os.path.join(_DEMO_DIR, ".env")
+
+
+@dataclass(frozen=True)
+class DemoConfig:
+  project_id: str
+  dataset_id: str
+  table_id: str
+  location: str
+
+  @property
+  def table_ref(self) -> str:
+    return f"{self.project_id}.{self.dataset_id}.{self.table_id}"
+
+
+def load_config() -> DemoConfig:
+  """Load demo BigQuery configuration from ``.env`` and ADC."""
+  if os.path.exists(_ENV_PATH):
+    try:
+      from dotenv import load_dotenv
+
+      load_dotenv(dotenv_path=_ENV_PATH)
+    except ImportError:
+      pass
+  try:
+    import google.auth
+
+    _, auth_project = google.auth.default()
+  except Exception:
+    auth_project = None
+  project_id = (
+      os.getenv("PROJECT_ID")
+      or os.getenv("GOOGLE_CLOUD_PROJECT")
+      or auth_project
+  )
+  if not project_id:
+    raise RuntimeError(
+        "PROJECT_ID is not set and no default Google Cloud project was found."
+    )
+  return DemoConfig(
+      project_id=project_id,
+      dataset_id=os.getenv(
+          "SELF_EVOLVING_DATASET_ID", "self_evolving_agent_demo"
+      ),
+      table_id=os.getenv("SELF_EVOLVING_TABLE_ID", "agent_events"),
+      location=os.getenv("DATASET_LOCATION", "us-central1"),
+  )
+
+
+def load_session_ids(path: str) -> list[str]:
+  """Load non-empty session IDs from a run-agent result file."""
+  with open(path) as f:
+    data = json.load(f)
+  if isinstance(data, dict):
+    data = data.get("sessions", [])
+  return [str(r["session_id"]) for r in data if r.get("session_id")]
+
+
+def load_quality_summary(path: str) -> dict[str, Any]:
+  """Summarize deterministic quality fields from run-agent results."""
+  with open(path) as f:
+    rows = json.load(f)
+  if isinstance(rows, dict):
+    rows = rows.get("sessions", [])
+  total = len(rows)
+  passed = sum(1 for r in rows if r.get("quality_passed"))
+  expected_tool_used = sum(1 for r in rows if r.get("expected_tool_used"))
+  avoid_tool_used = sum(1 for r in rows if r.get("avoid_tool_used"))
+  return {
+      "total": total,
+      "passed": passed,
+      "pass_rate": passed / total if total else 0.0,
+      "expected_tool_used": expected_tool_used,
+      "avoid_tool_used": avoid_tool_used,
+  }
+
+
+def _bq_client(config: DemoConfig) -> Any:
+  from google.cloud import bigquery
+
+  return bigquery.Client(project=config.project_id, location=config.location)
+
+
+def fetch_session_metrics(
+    session_ids: list[str],
+    *,
+    attempts: int = 1,
+    wait_seconds: int = 0,
+) -> list[dict[str, Any]]:
+  """Fetch per-session token/tool metrics from the raw event table."""
+  if not session_ids:
+    return []
+  from google.cloud import bigquery
+
+  config = load_config()
+  client = _bq_client(config)
+  query = f"""
+    SELECT
+      session_id,
+      COUNT(*) AS event_count,
+      COUNTIF(event_type = 'LLM_REQUEST') AS llm_calls,
+      COUNTIF(event_type = 'LLM_RESPONSE') AS llm_responses,
+      COUNTIF(event_type = 'TOOL_STARTING') AS tool_calls,
+      COUNTIF(event_type = 'TOOL_ERROR') AS tool_errors,
+      COUNTIF(
+        event_type = 'TOOL_STARTING'
+        AND JSON_VALUE(content, '$.tool') = 'lookup_basketball_reference'
+      ) AS broad_lookup_calls,
+      SUM(COALESCE(
+        SAFE_CAST(JSON_VALUE(
+          attributes, '$.usage_metadata.prompt_token_count'
+        ) AS INT64),
+        SAFE_CAST(JSON_VALUE(content, '$.usage.prompt') AS INT64),
+        SAFE_CAST(JSON_VALUE(attributes, '$.input_tokens') AS INT64),
+        0
+      )) AS input_tokens,
+      SUM(COALESCE(
+        SAFE_CAST(JSON_VALUE(
+          attributes, '$.usage_metadata.candidates_token_count'
+        ) AS INT64),
+        SAFE_CAST(JSON_VALUE(content, '$.usage.completion') AS INT64),
+        SAFE_CAST(JSON_VALUE(attributes, '$.output_tokens') AS INT64),
+        0
+      )) AS output_tokens,
+      SUM(COALESCE(
+        SAFE_CAST(JSON_VALUE(
+          attributes, '$.usage_metadata.total_token_count'
+        ) AS INT64),
+        SAFE_CAST(JSON_VALUE(content, '$.usage.total') AS INT64),
+        COALESCE(
+          SAFE_CAST(JSON_VALUE(attributes, '$.input_tokens') AS INT64),
+          0
+        ) + COALESCE(
+          SAFE_CAST(JSON_VALUE(attributes, '$.output_tokens') AS INT64),
+          0
+        )
+      )) AS total_tokens
+    FROM `{config.table_ref}`
+    WHERE session_id IN UNNEST(@session_ids)
+    GROUP BY session_id
+    ORDER BY session_id
+  """
+  job_config = bigquery.QueryJobConfig(
+      query_parameters=[
+          bigquery.ArrayQueryParameter("session_ids", "STRING", session_ids),
+      ]
+  )
+  rows: list[dict[str, Any]] = []
+  for attempt in range(1, attempts + 1):
+    if wait_seconds and attempt > 1:
+      time.sleep(wait_seconds)
+    rows = [
+        dict(r) for r in client.query(query, job_config=job_config).result()
+    ]
+    if len(rows) >= len(set(session_ids)):
+      break
+  return rows
+
+
+def require_complete_session_metrics(
+    rows: list[dict[str, Any]],
+    session_ids: list[str],
+    *,
+    label: str,
+) -> None:
+  """Validate that BigQuery returned complete and usable metric rows."""
+  expected = {str(session_id) for session_id in session_ids if session_id}
+  observed = {str(row.get("session_id", "")) for row in rows}
+  missing = sorted(expected - observed)
+  if missing:
+    raise RuntimeError(
+        f"Only found {len(observed)}/{len(expected)} {label} sessions in "
+        "BigQuery after retries. Missing session IDs: " + ", ".join(missing)
+    )
+
+  total_events = sum(int(row.get("event_count") or 0) for row in rows)
+  total_tokens = sum(float(row.get("total_tokens") or 0) for row in rows)
+  if total_events and total_tokens == 0:
+    config = load_config()
+    raise RuntimeError(
+        "Token extraction produced zero total tokens even though trace events "
+        "exist. The analytics plugin schema may have changed; inspect "
+        f"LLM_RESPONSE rows in `{config.table_ref}`."
+    )
+
+
+def fetch_tool_counts(session_ids: list[str]) -> list[dict[str, Any]]:
+  """Fetch aggregate tool-call counts for the selected sessions."""
+  if not session_ids:
+    return []
+  from google.cloud import bigquery
+
+  config = load_config()
+  client = _bq_client(config)
+  query = f"""
+    SELECT
+      JSON_VALUE(content, '$.tool') AS tool_name,
+      COUNT(*) AS calls
+    FROM `{config.table_ref}`
+    WHERE session_id IN UNNEST(@session_ids)
+      AND event_type = 'TOOL_STARTING'
+    GROUP BY tool_name
+    ORDER BY calls DESC, tool_name
+  """
+  job_config = bigquery.QueryJobConfig(
+      query_parameters=[
+          bigquery.ArrayQueryParameter("session_ids", "STRING", session_ids),
+      ]
+  )
+  return [dict(r) for r in client.query(query, job_config=job_config).result()]
+
+
+def summarize(rows: list[dict[str, Any]]) -> dict[str, Any]:
+  """Aggregate per-session metrics into a compact summary."""
+  if not rows:
+    return {
+        "sessions": 0,
+        "avg_total_tokens": 0.0,
+        "avg_input_tokens": 0.0,
+        "avg_output_tokens": 0.0,
+        "avg_tool_calls": 0.0,
+        "avg_llm_calls": 0.0,
+        "total_broad_lookup_calls": 0,
+        "sessions_with_broad_lookup": 0,
+        "broad_lookup_session_rate": 0.0,
+        "total_tool_errors": 0,
+    }
+  count = len(rows)
+
+  def total(name: str) -> float:
+    return sum(float(r.get(name) or 0) for r in rows)
+
+  broad_sessions = sum(1 for r in rows if int(r.get("broad_lookup_calls") or 0))
+  return {
+      "sessions": count,
+      "avg_total_tokens": round(total("total_tokens") / count, 1),
+      "avg_input_tokens": round(total("input_tokens") / count, 1),
+      "avg_output_tokens": round(total("output_tokens") / count, 1),
+      "avg_tool_calls": round(total("tool_calls") / count, 1),
+      "avg_llm_calls": round(total("llm_calls") / count, 1),
+      "total_broad_lookup_calls": int(total("broad_lookup_calls")),
+      "sessions_with_broad_lookup": broad_sessions,
+      "broad_lookup_session_rate": round(broad_sessions / count, 3),
+      "total_tool_errors": int(total("tool_errors")),
+  }
+
+
+def run_sdk_evaluators(
+    session_ids: list[str],
+    *,
+    token_budget: int,
+    max_cost_usd: float,
+    max_turns: int,
+) -> dict[str, Any]:
+  """Run SDK deterministic evaluator gates over the selected sessions."""
+  from bigquery_agent_analytics import Client
+  from bigquery_agent_analytics.trace import TraceFilter
+
+  try:
+    from bigquery_agent_analytics.evaluators import SystemEvaluator
+  except ImportError:
+    from bigquery_agent_analytics.evaluators import CodeEvaluator as SystemEvaluator
+
+  config = load_config()
+  client = Client(
+      project_id=config.project_id,
+      dataset_id=config.dataset_id,
+      table_id=config.table_id,
+      location=config.location,
+  )
+  filters = TraceFilter(session_ids=session_ids)
+  evaluators = {
+      "token_efficiency": SystemEvaluator.token_efficiency(
+          max_tokens=token_budget
+      ),
+      "cost": SystemEvaluator.cost_per_session(max_cost_usd=max_cost_usd),
+      "turn_count": SystemEvaluator.turn_count(max_turns=max_turns),
+      "error_rate": SystemEvaluator.error_rate(max_error_rate=0.0),
+  }
+  reports = {}
+  for name, evaluator in evaluators.items():
+    report = client.evaluate(evaluator=evaluator, filters=filters)
+    observed = []
+    for session_score in report.session_scores:
+      for detail in session_score.details.values():
+        if isinstance(detail, dict) and detail.get("observed") is not None:
+          observed.append(detail["observed"])
+    reports[name] = {
+        "total_sessions": report.total_sessions,
+        "passed_sessions": report.passed_sessions,
+        "failed_sessions": report.failed_sessions,
+        "pass_rate": report.pass_rate,
+        "avg_observed": (
+            round(sum(observed) / len(observed), 4) if observed else None
+        ),
+    }
+  return reports
diff --git a/examples/self_evolving_agent_demo/analyze_and_evolve.py b/examples/self_evolving_agent_demo/analyze_and_evolve.py
new file mode 100755
index 00000000..546a9b44
--- /dev/null
+++ b/examples/self_evolving_agent_demo/analyze_and_evolve.py
@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Analyze baseline traces and promote an evolved prompt when warranted."""
+
+from __future__ import annotations
+
+import argparse
+import difflib
+import json
+import os
+import sys
+from typing import Any
+
+_DEMO_DIR = os.path.dirname(os.path.abspath(__file__))
+if _DEMO_DIR not in sys.path:
+  sys.path.insert(0, _DEMO_DIR)
+
+from agent.prompt_store import read_state
+from agent.prompt_store import write_prompt
+from agent.tools import DEMO_TOOLS
+from analytics.session_metrics import fetch_session_metrics
+from analytics.session_metrics import fetch_tool_counts
+from analytics.session_metrics import load_quality_summary
+from analytics.session_metrics import load_session_ids
+from analytics.session_metrics import require_complete_session_metrics
+from analytics.session_metrics import run_sdk_evaluators
+from analytics.session_metrics import summarize
+
+DEFAULT_MIN_BROAD_LOOKUP_RATE = 0.5
+DEFAULT_MAX_AVG_TOOL_CALLS = 2.0
+MIN_GENERATED_PROMPT_CHARS = 120
+
+
+def _tool_signatures() -> str:
+  lines = []
+  for tool in DEMO_TOOLS:
+    name = getattr(tool, "__name__", "unknown")
+    doc = (getattr(tool, "__doc__", "") or "").strip().partition("\n")[0]
+    lines.append(f"- {name}: {doc}")
+  return "\n".join(lines)
+
+
+def _load_eval_contract(path: str) -> list[dict[str, str]]:
+  """Load the deterministic routing contract from run-agent results."""
+  with open(path) as f:
+    rows = json.load(f)
+  if isinstance(rows, dict):
+    rows = rows.get("sessions", [])
+  contract = []
+  for row in rows:
+    contract.append(
+        {
+            "case_id": str(row.get("case_id", "")),
+            "question": str(row.get("question", "")),
+            "expected_tool": str(row.get("expected_tool", "")),
+            "avoid_tool": str(row.get("avoid_tool", "")),
+        }
+    )
+  return contract
+
+
+def _observations(
+    summary: dict[str, Any],
+    *,
+    token_budget: int,
+    min_broad_lookup_rate: float,
+    max_avg_tool_calls: float,
+) -> list[str]:
+  obs = []
+  if summary["avg_total_tokens"] > token_budget:
+    obs.append("Average total tokens are above the configured session budget.")
+  if summary["broad_lookup_session_rate"] >= min_broad_lookup_rate:
+    obs.append(
+        "Most sessions used the broad basketball reference tool even though each "
+        "eval case has a narrow tool path."
+    )
+  if summary["avg_tool_calls"] > max_avg_tool_calls:
+    obs.append(
+        "Average tool calls are high for one-question single-turn tasks."
+    )
+  if not obs:
+    obs.append("No clear token or tool-use hotspot was detected.")
+  return obs
+
+
+def _generate_candidate_prompt(
+    *,
+    current_prompt: str,
+    observations: list[str],
+    summary: dict[str, Any],
+    tool_counts: list[dict[str, Any]],
+    quality: dict[str, Any],
+    eval_contract: list[dict[str, str]],
+    model_id: str,
+) -> dict[str, str]:
+  """Generate an improved prompt from trace analysis."""
+  prompt = f"""\
+You are improving an ADK basketball analytics agent prompt from its own trace
+analytics. Generate a complete replacement system prompt.
+
+Current prompt:
+```
+{current_prompt}
+```
+
+Available tools:
+{_tool_signatures()}
+
+SDK trace summary:
+{json.dumps(summary, indent=2)}
+
+Tool counts:
+{json.dumps(tool_counts, indent=2)}
+
+Deterministic quality summary:
+{json.dumps(quality, indent=2)}
+
+Deterministic routing contract from the eval run:
+{json.dumps(eval_contract, indent=2)}
+
+Observed issues:
+{json.dumps(observations, indent=2)}
+
+Requirements for the improved prompt:
+- Keep the same agent role and basketball analytics task.
+- Remove the broad-first behavior that caused lookup_basketball_reference overuse.
+- Instruct the agent to choose the narrowest sufficient tool.
+- Preserve every expected_tool / avoid_tool pair in the routing contract.
+- Treat a named-team strategy, strengths, profile, or late-game offense
+  question as a single-team question that calls get_team_profile.
+- Treat a named-player scoring, strengths, profile, or quick-read question
+  as a single-player question that calls get_player_stats.
+- Use lookup_basketball_reference only for league-wide or unsupported ambiguous
+  questions where no narrow player, team, or comparison tool fits.
+- Remove the fixed five-section scouting-report format from the old prompt.
+- Keep final answers to at most four bullets or 120 words.
+- Preserve answer quality and tool grounding.
+- Keep final answers concise.
+- Do not mention trace analytics, BigQuery, SDKs, prompts, or optimization to users.
+
+Return JSON with exactly:
+{{
+  "improved_prompt": "full replacement system prompt",
+  "changes_summary": "one sentence explaining the improvement"
+}}
+"""
+  from google import genai
+  from google.genai.types import GenerateContentConfig
+
+  client = genai.Client()
+  response = client.models.generate_content(
+      model=model_id,
+      contents=prompt,
+      config=GenerateContentConfig(
+          temperature=0.2,
+          response_mime_type="application/json",
+      ),
+  )
+  data = json.loads(response.text or "{}")
+  improved = str(data.get("improved_prompt", "")).strip()
+  changes = str(data.get("changes_summary", "")).strip()
+  # A complete system prompt should at least include role and routing guidance.
+  if len(improved) < MIN_GENERATED_PROMPT_CHARS:
+    raise ValueError("Generated prompt was too short.")
+  return {
+      "source": "model",
+      "improved_prompt": improved,
+      "changes_summary": changes or "Generated from SDK trace analysis.",
+  }
+
+
+def _write_prompt_diff(
+    *,
+    output_dir: str,
+    before_prompt: str,
+    after_prompt: str,
+    observations: list[str],
+    changes_summary: str,
+) -> str:
+  """Write a human-readable V1 -> generated V2 prompt diff."""
+  diff_lines = list(
+      difflib.unified_diff(
+          before_prompt.splitlines(),
+          after_prompt.splitlines(),
+          fromfile="agent_v1_prompt",
+          tofile="generated_agent_v2_prompt",
+          lineterm="",
+      )
+  )
+  path = os.path.join(output_dir, "prompt_diff.md")
+  with open(path, "w") as f:
+    f.write("# Prompt Diff: Agent V1 -> Generated V2\n\n")
+    f.write("## Trace Signal\n\n")
+    for obs in observations:
+      f.write(f"- {obs}\n")
+    f.write("\n## Generated Improvement\n\n")
+    f.write(f"{changes_summary}\n\n")
+    f.write("## Unified Diff\n\n")
+    f.write("```diff\n")
+    f.write("\n".join(diff_lines))
+    f.write("\n```\n")
+  return path
+
+
+def main() -> None:
+  parser = argparse.ArgumentParser(
+      description="Analyze demo sessions and evolve the active prompt."
+  )
+  parser.add_argument("--sessions", required=True)
+  parser.add_argument(
+      "--output-dir", default=os.path.join(_DEMO_DIR, "reports")
+  )
+  parser.add_argument("--token-budget", type=int, default=12000)
+  parser.add_argument("--max-cost-usd", type=float, default=0.05)
+  parser.add_argument("--max-turns", type=int, default=4)
+  parser.add_argument("--min-quality-pass-rate", type=float, default=1.0)
+  parser.add_argument(
+      "--min-broad-lookup-rate",
+      type=float,
+      default=DEFAULT_MIN_BROAD_LOOKUP_RATE,
+  )
+  parser.add_argument(
+      "--max-avg-tool-calls",
+      type=float,
+      default=DEFAULT_MAX_AVG_TOOL_CALLS,
+  )
+  parser.add_argument(
+      "--generator-model",
+      default=os.getenv(
+          "SELF_EVOLVING_PROMPT_GENERATOR_MODEL", "gemini-2.5-flash"
+      ),
+  )
+  parser.add_argument("--wait-seconds", type=int, default=15)
+  parser.add_argument("--attempts", type=int, default=6)
+  args = parser.parse_args()
+
+  os.makedirs(args.output_dir, exist_ok=True)
+  session_ids = load_session_ids(args.sessions)
+  if not session_ids:
+    raise SystemExit(f"No session IDs found in {args.sessions}")
+
+  rows = fetch_session_metrics(
+      session_ids,
+      attempts=args.attempts,
+      wait_seconds=args.wait_seconds,
+  )
+  try:
+    require_complete_session_metrics(rows, session_ids, label="baseline")
+  except RuntimeError as exc:
+    raise SystemExit(str(exc)) from exc
+
+  summary = summarize(rows)
+  tool_counts = fetch_tool_counts(session_ids)
+  quality = load_quality_summary(args.sessions)
+  eval_contract = _load_eval_contract(args.sessions)
+  sdk_reports = run_sdk_evaluators(
+      session_ids,
+      token_budget=args.token_budget,
+      max_cost_usd=args.max_cost_usd,
+      max_turns=args.max_turns,
+  )
+  observations = _observations(
+      summary,
+      token_budget=args.token_budget,
+      min_broad_lookup_rate=args.min_broad_lookup_rate,
+      max_avg_tool_calls=args.max_avg_tool_calls,
+  )
+  current_state = read_state()
+  should_promote = (
+      current_state["version"] == "v1"
+      and quality["pass_rate"] >= args.min_quality_pass_rate
+      and (
+          summary["broad_lookup_session_rate"] >= args.min_broad_lookup_rate
+          or summary["avg_total_tokens"] > args.token_budget
+      )
+  )
+
+  evolution = {
+      "from_version": current_state["version"],
+      "to_version": current_state["version"],
+      "promoted": False,
+      "rationale": "No candidate prompt generated.",
+  }
+  if should_promote:
+    try:
+      candidate = _generate_candidate_prompt(
+          current_prompt=current_state["prompt"],
+          observations=observations,
+          summary=summary,
+          tool_counts=tool_counts,
+          quality=quality,
+          eval_contract=eval_contract,
+          model_id=args.generator_model,
+      )
+    except Exception as exc:
+      raise SystemExit(
+          "Prompt generation failed; no fallback prompt was promoted. "
+          f"Original error: {exc}"
+      ) from exc
+    candidate_path = os.path.join(args.output_dir, "candidate_prompt.json")
+    with open(candidate_path, "w") as f:
+      json.dump(candidate, f, indent=2)
+      f.write("\n")
+    prompt_diff_path = _write_prompt_diff(
+        output_dir=args.output_dir,
+        before_prompt=current_state["prompt"],
+        after_prompt=candidate["improved_prompt"],
+        observations=observations,
+        changes_summary=candidate["changes_summary"],
+    )
+    rationale = (
+        "Generated V2 from SDK trace analysis because baseline quality met "
+        "the configured gate and an operational waste signal was detected."
+    )
+    write_prompt("v2", candidate["improved_prompt"], rationale)
+    evolution = {
+        "from_version": "v1",
+        "to_version": "v2",
+        "promoted": True,
+        "rationale": rationale,
+        "candidate_path": candidate_path,
+        "prompt_diff_path": prompt_diff_path,
+        "changes_summary": candidate["changes_summary"],
+        "candidate_source": candidate.get("source", "model"),
+        "generator_model": args.generator_model,
+    }
+
+  report = {
+      "quality": quality,
+      "session_summary": summary,
+      "tool_counts": tool_counts,
+      "sdk_evaluator_reports": sdk_reports,
+      "observations": observations,
+      "evolution": evolution,
+  }
+  output_path = os.path.join(args.output_dir, "self_evolution_analysis.json")
+  with open(output_path, "w") as f:
+    json.dump(report, f, indent=2)
+    f.write("\n")
+
+  print("")
+  print("  SDK-backed self-evolution analysis")
+  print("  ----------------------------------")
+  print(f"  Sessions:              {summary['sessions']}")
+  print(f"  Avg total tokens:      {summary['avg_total_tokens']}")
+  print(f"  Avg tool calls:        {summary['avg_tool_calls']}")
+  print(
+      "  Broad lookup sessions: "
+      f"{summary['sessions_with_broad_lookup']}/{summary['sessions']}"
+  )
+  print(f"  Quality pass rate:     {quality['pass_rate']:.0%}")
+  print(
+      f"  Evolution:             {evolution['from_version']} -> {evolution['to_version']}"
+  )
+  print(f"  Promoted:              {evolution['promoted']}")
+  if evolution.get("candidate_path"):
+    print(f"  Candidate prompt:      {evolution['candidate_path']}")
+  if evolution.get("prompt_diff_path"):
+    print(f"  Prompt diff:           {evolution['prompt_diff_path']}")
+  print(f"  Report:                {output_path}")
+
+
+if __name__ == "__main__":
+  main()
diff --git a/examples/self_evolving_agent_demo/compare_runs.py b/examples/self_evolving_agent_demo/compare_runs.py
new file mode 100755
index 00000000..41413f25
--- /dev/null
+++ b/examples/self_evolving_agent_demo/compare_runs.py
@@ -0,0 +1,266 @@
+#!/usr/bin/env python3
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Compare baseline and evolved demo runs."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import sys
+from typing import Any
+
+_DEMO_DIR = os.path.dirname(os.path.abspath(__file__))
+if _DEMO_DIR not in sys.path:
+  sys.path.insert(0, _DEMO_DIR)
+
+from analytics.session_metrics import fetch_session_metrics
+from analytics.session_metrics import load_quality_summary
+from analytics.session_metrics import load_session_ids
+from analytics.session_metrics import require_complete_session_metrics
+from analytics.session_metrics import summarize
+
+
+def _load(
+    path: str, attempts: int, wait_seconds: int, label: str
+) -> tuple[dict, dict]:
+  ids = load_session_ids(path)
+  rows = fetch_session_metrics(
+      ids,
+      attempts=attempts,
+      wait_seconds=wait_seconds,
+  )
+  try:
+    require_complete_session_metrics(rows, ids, label=label)
+  except RuntimeError as exc:
+    raise SystemExit(str(exc)) from exc
+  return summarize(rows), load_quality_summary(path)
+
+
+def _pct_delta(before: float, after: float) -> float | None:
+  if before == 0:
+    return 0.0 if after == 0 else None
+  return round((after - before) / before, 4)
+
+
+def _format_pct_delta(value: float | None) -> str:
+  if value is None:
+    return "n/a"
+  return f"{value:+.1%}"
+
+
+def _read_candidate_metadata(output_dir: str) -> dict[str, str]:
+  path = os.path.join(output_dir, "candidate_prompt.json")
+  if not os.path.exists(path):
+    return {}
+  with open(path) as f:
+    data = json.load(f)
+  return {
+      "changes_summary": str(data.get("changes_summary", "")),
+      "source": str(data.get("source", "")),
+  }
+
+
+def _write_markdown_report(
+    *,
+    output_path: str,
+    result: dict[str, Any],
+) -> str:
+  """Write a concise operator-facing before/after report."""
+  output_dir = os.path.dirname(output_path)
+  path = os.path.join(output_dir, "comparison.md")
+  before_quality = result["before"]["quality"]
+  after_quality = result["after"]["quality"]
+  before_metrics = result["before"]["metrics"]
+  after_metrics = result["after"]["metrics"]
+  deltas = result["deltas"]
+  candidate_metadata = _read_candidate_metadata(output_dir)
+  candidate_summary = candidate_metadata.get("changes_summary", "")
+  candidate_source = candidate_metadata.get("source", "")
+  prompt_diff_path = os.path.join(output_dir, "prompt_diff.md")
+  has_prompt_diff = os.path.exists(prompt_diff_path)
+
+  with open(path, "w") as f:
+    f.write("# Agent V1 -> Generated V2 Comparison\n\n")
+    f.write("## What Trace Analysis Changed\n\n")
+    if candidate_summary:
+      f.write(f"{candidate_summary}\n\n")
+    else:
+      f.write(
+          "The generated V2 prompt was created from the baseline trace "
+          "summary, tool counts, quality summary, and available tool "
+          "signatures.\n\n"
+      )
+    if candidate_source:
+      f.write(f"Candidate source: `{candidate_source}`.\n\n")
+    if has_prompt_diff:
+      f.write("See `prompt_diff.md` for the exact prompt-level diff.\n\n")
+
+    f.write("## Before / After Metrics\n\n")
+    f.write("| Metric | V1 | Generated V2 | Delta |\n")
+    f.write("|---|---:|---:|---:|\n")
+    f.write(
+        "| Quality pass rate | "
+        f"{before_quality['pass_rate']:.0%} | "
+        f"{after_quality['pass_rate']:.0%} | "
+        f"{after_quality['pass_rate'] - before_quality['pass_rate']:+.0%} |\n"
+    )
+    f.write(
+        "| Avg total tokens | "
+        f"{before_metrics['avg_total_tokens']} | "
+        f"{after_metrics['avg_total_tokens']} | "
+        f"{_format_pct_delta(deltas['avg_total_tokens_pct'])} |\n"
+    )
+    f.write(
+        "| Avg tool calls | "
+        f"{before_metrics['avg_tool_calls']} | "
+        f"{after_metrics['avg_tool_calls']} | "
+        f"{_format_pct_delta(deltas['avg_tool_calls_pct'])} |\n"
+    )
+    f.write(
+        "| Broad lookup calls | "
+        f"{before_metrics['total_broad_lookup_calls']} | "
+        f"{after_metrics['total_broad_lookup_calls']} | "
+        f"{deltas['broad_lookup_calls']:+d} |\n"
+    )
+    f.write(
+        "| Tool errors | "
+        f"{before_metrics['total_tool_errors']} | "
+        f"{after_metrics['total_tool_errors']} | "
+        f"{after_metrics['total_tool_errors'] - before_metrics['total_tool_errors']:+d} |\n"
+    )
+
+    f.write("\n## Acceptance Gates\n\n")
+    for name, passed in result["gates"].items():
+      f.write(f"- `{name}`: {passed}\n")
+
+    f.write("\n## Why This Demonstrates Self-Evolution\n\n")
+    f.write(
+        "The demo does not just compare two static prompts. It uses the "
+        "baseline BigQuery traces to identify broad-tool overuse and token "
+        "waste, generates a replacement prompt from that evidence, reruns "
+        "the agent, then records whether the generated V2 preserved quality "
+        "while reducing the measured waste.\n"
+    )
+  return path
+
+
+def main() -> None:
+  parser = argparse.ArgumentParser(description="Compare two demo runs.")
+  parser.add_argument("--before", required=True)
+  parser.add_argument("--after", required=True)
+  parser.add_argument("--output", default=None)
+  parser.add_argument("--min-token-reduction", type=float, default=0.05)
+  parser.add_argument("--wait-seconds", type=int, default=15)
+  parser.add_argument("--attempts", type=int, default=6)
+  parser.add_argument(
+      "--fail-on-gate-failure",
+      action="store_true",
+      help="Exit nonzero when acceptance gates fail.",
+  )
+  args = parser.parse_args()
+
+  before_summary, before_quality = _load(
+      args.before, args.attempts, args.wait_seconds, "baseline"
+  )
+  after_summary, after_quality = _load(
+      args.after, args.attempts, args.wait_seconds, "evolved"
+  )
+  token_delta = _pct_delta(
+      before_summary["avg_total_tokens"],
+      after_summary["avg_total_tokens"],
+  )
+  tool_delta = _pct_delta(
+      before_summary["avg_tool_calls"],
+      after_summary["avg_tool_calls"],
+  )
+  broad_delta = (
+      after_summary["total_broad_lookup_calls"]
+      - before_summary["total_broad_lookup_calls"]
+  )
+  gates = {
+      "quality_not_regressed": (
+          after_quality["pass_rate"] >= before_quality["pass_rate"]
+      ),
+      "tokens_reduced": (
+          token_delta is not None and token_delta <= -args.min_token_reduction
+      ),
+      "broad_lookup_reduced": broad_delta < 0,
+      "tool_errors_clear": after_summary["total_tool_errors"] == 0,
+  }
+  result: dict[str, Any] = {
+      "before": {"quality": before_quality, "metrics": before_summary},
+      "after": {"quality": after_quality, "metrics": after_summary},
+      "deltas": {
+          "avg_total_tokens_pct": token_delta,
+          "avg_tool_calls_pct": tool_delta,
+          "broad_lookup_calls": broad_delta,
+      },
+      "gates": gates,
+      "passed": all(gates.values()),
+  }
+
+  if args.output:
+    os.makedirs(os.path.dirname(os.path.abspath(args.output)), exist_ok=True)
+    markdown_path = _write_markdown_report(
+        output_path=args.output,
+        result=result,
+    )
+    result["artifacts"] = {
+        "markdown_report": markdown_path,
+        "prompt_diff": os.path.join(
+            os.path.dirname(args.output), "prompt_diff.md"
+        ),
+    }
+    with open(args.output, "w") as f:
+      json.dump(result, f, indent=2)
+      f.write("\n")
+
+  print("")
+  print("  Before/after self-evolution report")
+  print("  ----------------------------------")
+  print(
+      f"  Quality pass rate:  {before_quality['pass_rate']:.0%}"
+      f" -> {after_quality['pass_rate']:.0%}"
+  )
+  print(
+      f"  Avg total tokens:   {before_summary['avg_total_tokens']}"
+      f" -> {after_summary['avg_total_tokens']}"
+      f" ({_format_pct_delta(token_delta)})"
+  )
+  print(
+      f"  Avg tool calls:     {before_summary['avg_tool_calls']}"
+      f" -> {after_summary['avg_tool_calls']}"
+      f" ({_format_pct_delta(tool_delta)})"
+  )
+  print(
+      "  Broad lookup calls: "
+      f"{before_summary['total_broad_lookup_calls']}"
+      f" -> {after_summary['total_broad_lookup_calls']}"
+  )
+  print("  Gates:")
+  for name, passed in gates.items():
+    print(f"    {name}: {passed}")
+  if args.output:
+    print(f"  Report: {args.output}")
+    print(f"  Markdown: {markdown_path}")
+
+  if args.fail_on_gate_failure and not result["passed"]:
+    sys.exit(1)
+
+
+if __name__ == "__main__":
+  main()
diff --git a/examples/self_evolving_agent_demo/eval/eval_cases.json b/examples/self_evolving_agent_demo/eval/eval_cases.json
new file mode 100644
index 00000000..c3528642
--- /dev/null
+++ b/examples/self_evolving_agent_demo/eval/eval_cases.json
@@ -0,0 +1,28 @@
+{
+  "eval_cases": [
+    {
+      "id": "player_compare_jokic_embiid",
+      "question": "For half-court offense, who is the better hub: Nikola Jokic or Joel Embiid?",
+      "expected_tool": "compare_players",
+      "avoid_tool": "lookup_basketball_reference"
+    },
+    {
+      "id": "team_compare_celtics_thunder",
+      "question": "Which team has the stronger playoff profile, the Celtics or the Thunder?",
+      "expected_tool": "compare_teams",
+      "avoid_tool": "lookup_basketball_reference"
+    },
+    {
+      "id": "single_player_shai",
+      "question": "Give me a quick read on Shai Gilgeous-Alexander's scoring profile.",
+      "expected_tool": "get_player_stats",
+      "avoid_tool": "lookup_basketball_reference"
+    },
+    {
+      "id": "single_team_nuggets",
+      "question": "How should Denver build its late-game offense around the Nuggets' strengths?",
+      "expected_tool": "get_team_profile",
+      "avoid_tool": "lookup_basketball_reference"
+    }
+  ]
+}
diff --git a/examples/self_evolving_agent_demo/reset.sh b/examples/self_evolving_agent_demo/reset.sh
new file mode 100755
index 00000000..c7e1bd1c
--- /dev/null
+++ b/examples/self_evolving_agent_demo/reset.sh
@@ -0,0 +1,26 @@
+#!/usr/bin/env bash
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PYTHON_BIN="${PYTHON_BIN:-python3}"
+
+cd "$SCRIPT_DIR"
+"$PYTHON_BIN" -m agent.prompt_store reset >/dev/null
+rm -rf "$SCRIPT_DIR/reports"
+
+echo "Demo state reset to V1. Reports were removed."
+echo "BigQuery data was left intact. Use setup.sh to recreate .env if needed."
diff --git a/examples/self_evolving_agent_demo/run_agent.py b/examples/self_evolving_agent_demo/run_agent.py
new file mode 100755
index 00000000..09550a83
--- /dev/null
+++ b/examples/self_evolving_agent_demo/run_agent.py
@@ -0,0 +1,201 @@
+#!/usr/bin/env python3
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Run demo eval questions through the ADK agent with BigQuery logging."""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import json
+import os
+import sys
+from typing import Any
+
+_DEMO_DIR = os.path.dirname(os.path.abspath(__file__))
+if _DEMO_DIR not in sys.path:
+  sys.path.insert(0, _DEMO_DIR)
+
+
+def _load_cases(path: str) -> list[dict[str, Any]]:
+  with open(path) as f:
+    return json.load(f)["eval_cases"]
+
+
+def _part_text(part: Any) -> str:
+  text = getattr(part, "text", None)
+  return text or ""
+
+
+def _part_function_name(part: Any) -> str | None:
+  function_call = getattr(part, "function_call", None)
+  if not function_call:
+    return None
+  return getattr(function_call, "name", None)
+
+
+async def _run_case(
+    runner: Any,
+    case: dict[str, Any],
+    *,
+    user_id: str,
+    timeout_seconds: int,
+) -> dict[str, Any]:
+  from google.genai.types import Content
+  from google.genai.types import Part
+
+  session = await runner.session_service.create_session(
+      app_name=runner.app_name,
+      user_id=user_id,
+  )
+  user_message = Content(role="user", parts=[Part(text=case["question"])])
+  response_text = ""
+  tools_called: list[str] = []
+
+  async def _consume() -> None:
+    nonlocal response_text
+    async for event in runner.run_async(
+        user_id=user_id,
+        session_id=session.id,
+        new_message=user_message,
+    ):
+      if not event.content or not event.content.parts:
+        continue
+      for part in event.content.parts:
+        response_text += _part_text(part)
+        tool_name = _part_function_name(part)
+        if tool_name:
+          tools_called.append(tool_name)
+
+  await asyncio.wait_for(_consume(), timeout=timeout_seconds)
+
+  expected_tool = case.get("expected_tool", "")
+  avoid_tool = case.get("avoid_tool", "")
+  expected_tool_used = expected_tool in tools_called if expected_tool else True
+  avoid_tool_used = avoid_tool in tools_called if avoid_tool else False
+  # Quality checks answerability; avoid-tool overuse is the separate
+  # efficiency signal that drives this demo's evolution.
+  quality_passed = bool(response_text.strip()) and expected_tool_used
+  return {
+      "case_id": case["id"],
+      "question": case["question"],
+      "expected_tool": expected_tool,
+      "avoid_tool": avoid_tool,
+      "tools_called": tools_called,
+      "expected_tool_used": expected_tool_used,
+      "avoid_tool_used": avoid_tool_used,
+      "quality_passed": quality_passed,
+      "response": response_text.strip(),
+      "session_id": session.id,
+  }
+
+
+async def _run_all(args: argparse.Namespace) -> list[dict[str, Any]]:
+  from agent.agent import APP_NAME
+  from agent.agent import bq_logging_plugin
+  from agent.agent import PROMPT_VERSION
+  from agent.agent import root_agent
+  from google.adk.runners import InMemoryRunner
+
+  cases = _load_cases(args.eval_cases)
+  runner = InMemoryRunner(
+      agent=root_agent,
+      app_name=APP_NAME,
+      plugins=[bq_logging_plugin],
+  )
+  semaphore = asyncio.Semaphore(args.max_concurrency)
+
+  async def _guarded(i: int, case: dict[str, Any]) -> dict[str, Any]:
+    async with semaphore:
+      print(f"  [{i}/{len(cases)}] {case['id']}: {case['question']}")
+      try:
+        result = await _run_case(
+            runner,
+            case,
+            user_id=f"{args.label}_user",
+            timeout_seconds=args.timeout,
+        )
+      except Exception as exc:
+        result = {
+            "case_id": case["id"],
+            "question": case["question"],
+            "expected_tool": case.get("expected_tool", ""),
+            "avoid_tool": case.get("avoid_tool", ""),
+            "tools_called": [],
+            "expected_tool_used": False,
+            "avoid_tool_used": False,
+            "quality_passed": False,
+            "response": f"ERROR: {type(exc).__name__}: {exc}",
+            "session_id": "",
+        }
+      result["label"] = args.label
+      result["prompt_version"] = PROMPT_VERSION
+      answer = result["response"].replace("\n", " ").strip()
+      if len(answer) > 180:
+        answer = answer[:180] + "..."
+      print(f"        tools: {', '.join(result['tools_called']) or 'none'}")
+      print(f"        pass:  {result['quality_passed']}")
+      print(f"        ans:   {answer}")
+      return result
+
+  return list(
+      await asyncio.gather(
+          *[_guarded(i, case) for i, case in enumerate(cases, 1)]
+      )
+  )
+
+
+def main() -> None:
+  parser = argparse.ArgumentParser(
+      description="Run self-evolving agent demo eval traffic."
+  )
+  parser.add_argument(
+      "--eval-cases",
+      default=os.path.join(_DEMO_DIR, "eval", "eval_cases.json"),
+  )
+  parser.add_argument(
+      "--output-dir", default=os.path.join(_DEMO_DIR, "reports")
+  )
+  parser.add_argument("--label", default="baseline")
+  parser.add_argument("--max-concurrency", type=int, default=2)
+  parser.add_argument("--timeout", type=int, default=180)
+  parser.add_argument(
+      "--allow-failures",
+      action="store_true",
+      help="Write results without exiting nonzero on quality failures.",
+  )
+  args = parser.parse_args()
+
+  os.makedirs(args.output_dir, exist_ok=True)
+  results = asyncio.run(_run_all(args))
+
+  labeled_path = os.path.join(
+      args.output_dir, f"latest_eval_results_{args.label}.json"
+  )
+  latest_path = os.path.join(args.output_dir, "latest_eval_results.json")
+  for path in (labeled_path, latest_path):
+    with open(path, "w") as f:
+      json.dump(results, f, indent=2)
+      f.write("\n")
+  print("")
+  print(f"  Results saved to: {labeled_path}")
+
+  failures = sum(1 for r in results if not r.get("quality_passed"))
+  if failures and not args.allow_failures:
+    sys.exit(1)
+
+
+if __name__ == "__main__":
+  main()
diff --git a/examples/self_evolving_agent_demo/run_e2e_demo.sh b/examples/self_evolving_agent_demo/run_e2e_demo.sh
new file mode 100755
index 00000000..02bc921b
--- /dev/null
+++ b/examples/self_evolving_agent_demo/run_e2e_demo.sh
@@ -0,0 +1,99 @@
+#!/usr/bin/env bash
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PYTHON_BIN="${PYTHON_BIN:-python3}"
+
+if ! "$PYTHON_BIN" - <<'PY' >/dev/null; then
+import sys
+raise SystemExit(0 if sys.version_info >= (3, 10) else 1)
+PY
+  echo "ERROR: Python 3.10+ is required. Set PYTHON_BIN to a 3.10+ interpreter." >&2
+  exit 1
+fi
+
+if [[ -f "$SCRIPT_DIR/.env" ]]; then
+  set -a
+  source "$SCRIPT_DIR/.env"
+  set +a
+else
+  echo "ERROR: .env not found. Run ./setup.sh first." >&2
+  exit 1
+fi
+
+export PYTHONPATH="$SCRIPT_DIR:${PYTHONPATH:-}"
+export TOKEN_BUDGET="${TOKEN_BUDGET:-12000}"
+export MAX_COST_USD="${MAX_COST_USD:-0.05}"
+export SELF_EVOLVING_PROMPT_GENERATOR_MODEL="${SELF_EVOLVING_PROMPT_GENERATOR_MODEL:-gemini-2.5-flash}"
+
+RUN_ID="$(date +%Y%m%d_%H%M%S)"
+REPORTS_DIR="$SCRIPT_DIR/reports/run_${RUN_ID}"
+mkdir -p "$REPORTS_DIR"
+
+echo ""
+echo "============================================"
+echo "  Self-Evolving Agent Demo"
+echo "============================================"
+echo ""
+echo "Reports: $REPORTS_DIR"
+echo "Estimated one-run cloud cost: typically well under \$1 with defaults."
+echo ""
+
+cd "$SCRIPT_DIR"
+"$PYTHON_BIN" -m agent.prompt_store reset >/dev/null
+
+echo "[1/5] Run baseline V1 agent..."
+"$PYTHON_BIN" run_agent.py \
+  --label baseline \
+  --output-dir "$REPORTS_DIR" \
+  --allow-failures
+
+echo ""
+echo "[2/5] Analyze traces and generate evolved prompt..."
+"$PYTHON_BIN" analyze_and_evolve.py \
+  --sessions "$REPORTS_DIR/latest_eval_results_baseline.json" \
+  --output-dir "$REPORTS_DIR" \
+  --token-budget "$TOKEN_BUDGET" \
+  --max-cost-usd "$MAX_COST_USD" \
+  --generator-model "$SELF_EVOLVING_PROMPT_GENERATOR_MODEL"
+
+echo ""
+echo "[3/5] Run evolved agent..."
+"$PYTHON_BIN" run_agent.py \
+  --label evolved \
+  --output-dir "$REPORTS_DIR" \
+  --allow-failures
+
+echo ""
+echo "[4/5] Compare before and after..."
+"$PYTHON_BIN" compare_runs.py \
+  --before "$REPORTS_DIR/latest_eval_results_baseline.json" \
+  --after "$REPORTS_DIR/latest_eval_results_evolved.json" \
+  --output "$REPORTS_DIR/comparison.json"
+
+echo ""
+echo "[5/5] Done."
+echo ""
+echo "Key artifacts:"
+echo "  $REPORTS_DIR/latest_eval_results_baseline.json"
+echo "  $REPORTS_DIR/candidate_prompt.json"
+echo "  $REPORTS_DIR/prompt_diff.md"
+echo "  $REPORTS_DIR/self_evolution_analysis.json"
+echo "  $REPORTS_DIR/latest_eval_results_evolved.json"
+echo "  $REPORTS_DIR/comparison.json"
+echo "  $REPORTS_DIR/comparison.md"
+echo ""
diff --git a/examples/self_evolving_agent_demo/setup.sh b/examples/self_evolving_agent_demo/setup.sh
new file mode 100755
index 00000000..7f12f355
--- /dev/null
+++ b/examples/self_evolving_agent_demo/setup.sh
@@ -0,0 +1,124 @@
+#!/usr/bin/env bash
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
+ENV_FILE="$SCRIPT_DIR/.env"
+
+echo ""
+echo "============================================"
+echo "  Self-Evolving Agent Demo - Setup"
+echo "============================================"
+echo ""
+echo "Estimated one-run cloud cost: typically well under \$1 for the"
+echo "default four-question demo. Setup itself only enables APIs, installs"
+echo "local dependencies, and creates a small BigQuery dataset."
+echo ""
+
+PYTHON_BIN="${PYTHON_BIN:-python3}"
+if ! command -v "$PYTHON_BIN" &>/dev/null; then
+  echo "ERROR: $PYTHON_BIN is required." >&2
+  exit 1
+fi
+if ! "$PYTHON_BIN" - <<'PY' >/dev/null; then
+import sys
+raise SystemExit(0 if sys.version_info >= (3, 10) else 1)
+PY
+  echo "ERROR: Python 3.10+ is required. Set PYTHON_BIN to a 3.10+ interpreter." >&2
+  exit 1
+fi
+if ! command -v gcloud &>/dev/null; then
+  echo "ERROR: gcloud CLI is required." >&2
+  exit 1
+fi
+if ! command -v bq &>/dev/null; then
+  echo "ERROR: bq CLI is required." >&2
+  exit 1
+fi
+
+PROJECT_ID="${PROJECT_ID:-$(gcloud config get-value project 2>/dev/null || true)}"
+if [[ -z "$PROJECT_ID" ]]; then
+  echo "ERROR: No project set. Export PROJECT_ID or run:" >&2
+  echo "  gcloud config set project YOUR_PROJECT_ID" >&2
+  exit 1
+fi
+echo "Project: $PROJECT_ID"
+
+if ! gcloud auth application-default print-access-token &>/dev/null; then
+  echo "Application default credentials not found. Starting login..."
+  gcloud auth application-default login
+fi
+
+echo ""
+echo "Enabling required APIs..."
+gcloud services enable bigquery.googleapis.com --project="$PROJECT_ID" >/dev/null
+gcloud services enable aiplatform.googleapis.com --project="$PROJECT_ID" >/dev/null
+echo "APIs enabled."
+
+echo ""
+echo "Installing local package dependencies..."
+"$PYTHON_BIN" -m pip install -e "$REPO_ROOT[improvement]" --quiet
+echo "Dependencies installed."
+
+DATASET_LOCATION="${DATASET_LOCATION:-${BQ_LOCATION:-us-central1}}"
+SELF_EVOLVING_DATASET_ID="${SELF_EVOLVING_DATASET_ID:-self_evolving_agent_demo}"
+SELF_EVOLVING_TABLE_ID="${SELF_EVOLVING_TABLE_ID:-agent_events}"
+SELF_EVOLVING_AGENT_MODEL="${SELF_EVOLVING_AGENT_MODEL:-gemini-2.5-flash}"
+SELF_EVOLVING_PROMPT_GENERATOR_MODEL="${SELF_EVOLVING_PROMPT_GENERATOR_MODEL:-gemini-2.5-flash}"
+SELF_EVOLVING_AGENT_LOCATION="${SELF_EVOLVING_AGENT_LOCATION:-us-central1}"
+TOKEN_BUDGET="${TOKEN_BUDGET:-12000}"
+MAX_COST_USD="${MAX_COST_USD:-0.05}"
+
+if ! bq show "${PROJECT_ID}:${SELF_EVOLVING_DATASET_ID}" &>/dev/null; then
+  echo ""
+  echo "Creating BigQuery dataset: ${SELF_EVOLVING_DATASET_ID} (${DATASET_LOCATION})"
+  bq mk --dataset --location="$DATASET_LOCATION" \
+    "${PROJECT_ID}:${SELF_EVOLVING_DATASET_ID}" >/dev/null
+else
+  EXISTING_LOCATION="$(
+    bq show --format=prettyjson "${PROJECT_ID}:${SELF_EVOLVING_DATASET_ID}" \
+      | "$PYTHON_BIN" -c 'import json, sys; print(json.load(sys.stdin).get("location", ""))'
+  )"
+  if [[ "$(tr '[:upper:]' '[:lower:]' <<<"$EXISTING_LOCATION")" != "$(tr '[:upper:]' '[:lower:]' <<<"$DATASET_LOCATION")" ]]; then
+    echo "ERROR: Dataset ${SELF_EVOLVING_DATASET_ID} exists in ${EXISTING_LOCATION}," >&2
+    echo "but DATASET_LOCATION is ${DATASET_LOCATION}. Use a matching location or a new dataset ID." >&2
+    exit 1
+  fi
+fi
+
+cat > "$ENV_FILE" <<EOF
+# Self-Evolving Agent Demo Configuration
+PROJECT_ID=$PROJECT_ID
+DATASET_LOCATION=$DATASET_LOCATION
+SELF_EVOLVING_DATASET_ID=$SELF_EVOLVING_DATASET_ID
+SELF_EVOLVING_TABLE_ID=$SELF_EVOLVING_TABLE_ID
+SELF_EVOLVING_AGENT_MODEL=$SELF_EVOLVING_AGENT_MODEL
+SELF_EVOLVING_PROMPT_GENERATOR_MODEL=$SELF_EVOLVING_PROMPT_GENERATOR_MODEL
+SELF_EVOLVING_AGENT_LOCATION=$SELF_EVOLVING_AGENT_LOCATION
+TOKEN_BUDGET=$TOKEN_BUDGET
+MAX_COST_USD=$MAX_COST_USD
+GOOGLE_GENAI_USE_VERTEXAI=true
+EOF
+
+cd "$SCRIPT_DIR"
+"$PYTHON_BIN" -m agent.prompt_store reset >/dev/null
+
+echo ""
+echo "Setup complete."
+echo "Run:"
+echo "  cd $SCRIPT_DIR"
+echo "  ./run_e2e_demo.sh"
diff --git a/tests/test_self_evolving_agent_demo.py b/tests/test_self_evolving_agent_demo.py
new file mode 100644
index 00000000..05a008db
--- /dev/null
+++ b/tests/test_self_evolving_agent_demo.py
@@ -0,0 +1,99 @@
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Tests for pure helpers in the self-evolving agent demo."""
+
+from __future__ import annotations
+
+from pathlib import Path
+import sys
+
+import pytest
+
+_DEMO_DIR = (
+    Path(__file__).resolve().parents[1]
+    / "examples"
+    / "self_evolving_agent_demo"
+)
+sys.path.insert(0, str(_DEMO_DIR))
+
+from analytics.session_metrics import require_complete_session_metrics  # noqa: E402
+from analytics.session_metrics import summarize  # noqa: E402
+import analyze_and_evolve  # noqa: E402
+import compare_runs  # noqa: E402
+
+
+def test_summarize_empty_rows_has_stable_shape():
+  summary = summarize([])
+
+  assert summary == {
+      "sessions": 0,
+      "avg_total_tokens": 0.0,
+      "avg_input_tokens": 0.0,
+      "avg_output_tokens": 0.0,
+      "avg_tool_calls": 0.0,
+      "avg_llm_calls": 0.0,
+      "total_broad_lookup_calls": 0,
+      "sessions_with_broad_lookup": 0,
+      "broad_lookup_session_rate": 0.0,
+      "total_tool_errors": 0,
+  }
+
+
+def test_require_complete_session_metrics_rejects_missing_rows():
+  rows = [{"session_id": "s1", "event_count": 2, "total_tokens": 100}]
+
+  with pytest.raises(RuntimeError, match="Only found 1/2 baseline sessions"):
+    require_complete_session_metrics(rows, ["s1", "s2"], label="baseline")
+
+
+def test_require_complete_session_metrics_rejects_zero_token_schema(
+    monkeypatch: pytest.MonkeyPatch,
+):
+  monkeypatch.setenv("PROJECT_ID", "demo-project")
+  rows = [{"session_id": "s1", "event_count": 2, "total_tokens": 0}]
+
+  with pytest.raises(RuntimeError, match="Token extraction produced zero"):
+    require_complete_session_metrics(rows, ["s1"], label="baseline")
+
+
+def test_pct_delta_marks_zero_baseline_growth_as_not_applicable():
+  assert compare_runs._pct_delta(0, 0) == 0.0
+  assert compare_runs._pct_delta(0, 5) is None
+  assert compare_runs._format_pct_delta(None) == "n/a"
+  assert compare_runs._format_pct_delta(-0.25) == "-25.0%"
+
+
+def test_observations_use_configured_thresholds():
+  summary = {
+      "avg_total_tokens": 1500,
+      "broad_lookup_session_rate": 0.5,
+      "avg_tool_calls": 3.0,
+  }
+
+  observations = analyze_and_evolve._observations(
+      summary,
+      token_budget=1000,
+      min_broad_lookup_rate=0.5,
+      max_avg_tool_calls=2.0,
+  )
+
+  assert observations == [
+      "Average total tokens are above the configured session budget.",
+      (
+          "Most sessions used the broad basketball reference tool even though "
+          "each eval case has a narrow tool path."
+      ),
+      "Average tool calls are high for one-question single-turn tasks.",
+  ]