docs: add Common Gotchas section from session history review

sjarmak · sjarmak · commit 6886d66ec91b · 2026-03-06T22:41:36.000-05:00
Reviewed 8 Claude Code session transcripts (Dec 2025 - Mar 2026)
and extracted actionable learnings into a Common Gotchas section in
docs/ops/ROOT_AGENT_GUIDE.md, AGENTS.md, and CLAUDE.md. Covers gotchas for:
- Documentation generation (never edit generated files directly)
- Daytona/Harbor (image builds, config modes, SDK vs CLI)
- Docker/Build (QEMU segfaults, disk management, path sanitization)
- MCP configuration (config paths, CLI flags, prompt injection)
- Harbor result format (timing fields, trajectory generation)
- Validation/scoring (duplicated validators, install verification)
- Git/auth (scope management, env var export, push protection)
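
A few of the gotchas listed above are mechanical enough to sketch in code. The following is a minimal illustration, not code from this repo: the function names are hypothetical, and the timestamp format (ISO 8601 with an explicit UTC offset) is an assumption about `result.json` rather than something the guide states.

```python
import os
from datetime import datetime


def sanitize_agent_name(agent_name: str) -> str:
    """Replace ':' with '__' so the name is safe in a Docker volume mount path."""
    return agent_name.replace(":", "__")


def run_duration_seconds(result: dict) -> float:
    """Compute run duration from the top-level timing fields of result.json.

    Assumption: timestamps are ISO 8601 strings with an explicit offset.
    """
    started = datetime.fromisoformat(result["started_at"])
    finished = datetime.fromisoformat(result["finished_at"])
    return (finished - started).total_seconds()


def looks_like_agent_never_ran(result: dict, threshold_seconds: float = 2.0) -> bool:
    """Smoke-test heuristic: a run shorter than the threshold is suspicious."""
    return run_duration_seconds(result) < threshold_seconds


def binary_is_usable(path: str) -> bool:
    """Check the file itself instead of trusting an INSTALL_SUCCESS message."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

For example, `sanitize_agent_name("module:ClassName")` yields `module__ClassName`, and a result whose `started_at` and `finished_at` are one second apart is flagged by `looks_like_agent_never_ran`.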

Also regenerated the stale script registry (scripts/registry.json) and
docs/ops/SCRIPT_INDEX.md, which drops nine script entries.
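
The registry cleanup in this commit amounts to dropping entries whose script file is gone and recounting categories. A minimal sketch of that check follows, assuming each registry entry is an object with `path` and `category` fields as shown in the diff. The function names and the `demo` fixture are hypothetical, and this is not how the repo's own generator scripts are known to work.

```python
import tempfile
from collections import Counter
from pathlib import Path


def find_stale_entries(entries: list[dict], repo_root: Path) -> list[dict]:
    """Return registry entries whose 'path' no longer exists under repo_root."""
    return [e for e in entries if not (repo_root / e["path"]).is_file()]


def count_by_category(entries: list[dict]) -> dict[str, int]:
    """Recompute per-category totals from the entries themselves."""
    return dict(Counter(e["category"] for e in entries))


def demo() -> tuple[list[str], dict[str, int]]:
    """Build a throwaway repo with one existing script and one missing script."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "scripts").mkdir()
        (root / "scripts" / "daytona_runner.py").write_text("# placeholder\n")
        entries = [
            {"path": "scripts/daytona_runner.py", "category": "misc"},
            {"path": "scripts/build_conversation_db.py", "category": "infra_mirrors"},
        ]
        stale = find_stale_entries(entries, root)
        kept = [e for e in entries if e not in stale]
        return [e["path"] for e in stale], count_by_category(kept)
```

Recomputing the totals from the surviving entries, rather than decrementing them by hand, keeps the category counts consistent with the entry list.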
diff --git a/AGENTS.md b/AGENTS.md
@@ -51,6 +51,45 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
 - `docs/reference/README.md` - stable specs and reference docs
 - `docs/explanations/README.md` - rationale and context docs
 
+## Common Gotchas (from session history)
+
+### Documentation Generation
+- **NEVER edit root `CLAUDE.md` or `AGENTS.md` directly.** Edit canonical sources under `docs/ops/` and regenerate. Direct edits cause `agent_guides_drift` failures in `repo_health.py`.
+- After removing directories from the repo, also clean references from `scripts/sync_agent_guides.py` (`LOCAL_SOURCES`) and `scripts/docs_consistency_check.py` (`LOCAL_AGENT_TARGET_DIRS`).
+
+### Daytona / Harbor
+- Daytona builds images from Dockerfiles at sandbox creation time (`Image.from_dockerfile()`). Dockerfile fixes pushed to `main` take effect on the next run -- **no manual image rebuild needed**. Exception: pre-built GHCR base images must be rebuilt separately.
+- Harbor+Daytona (`harbor run --environment-type daytona`) is the recommended production approach. The standalone `scripts/daytona_runner.py` is for quick validation only.
+- Use `BASELINE_MCP_TYPE` env var to control MCP configuration: `none`, `sourcegraph`, `deepsearch`.
+- Prefer the Daytona SDK (`daytona_sdk`) over the CLI for sandbox interaction -- the CLI's SSH access is interactive-only.
+- GHCR packages default to **private** for personal accounts and visibility cannot be changed via API. Use the GitHub web UI or push to an org.
+
+### Docker / Build
+- `uv tool install` segfaults on ARM64/QEMU emulation. Use `pip install` instead, or switch to Daytona (native x86_64).
+- Use a build-push-clean pattern when building Docker images with limited disk (~45GB): build one image, push it, then clean up locally before building the next.
+- Colons in agent names (e.g., `module:ClassName`) break Docker volume mounts. Sanitize paths: replace `:` with `__`.
+
+### MCP Configuration (inside sandboxes)
+- `.mcp.json` must be placed at `$CLAUDE_CONFIG_DIR` (typically `/logs/agent/sessions/`), not `/app/` or `/root/`.
+- Claude Code requires the `--mcp-config` CLI flag to load MCP config -- it does not auto-detect.
+- Inject MCP usage instructions into the task prompt. Agents won't use MCP tools just because they're available.
+- Set `NODE_TLS_REJECT_UNAUTHORIZED=0` for Node.js SSL in Docker containers (curl working does not mean Node.js fetch will work).
+
+### Harbor Result Format
+- Timing fields (`started_at`, `finished_at`) live at the **top level** of `result.json`, not nested under `timing`.
+- `trajectory.json` is generated by Harbor's `_convert_events_to_trajectory()` post-processing, NOT by Claude Code CLI directly.
+- SWE-bench `test.sh` redirects stdout to a temp file -- Harbor never sees the parser's `START_TEST_OUTPUT`/`END_TEST_OUTPUT` markers via its normal capture.
+
+### Validation / Scoring
+- `validators.py` is duplicated across `ccb_build` tasks. Changes must be applied to **all copies** (verify with `sha256sum`).
+- Install scripts that print "INSTALL_SUCCESS" regardless of actual outcome are common. Always verify the binary exists and is executable.
+- An agent that completes in **<2 seconds** most likely never installed or ran (smoke test heuristic).
+
+### Git / Auth
+- `gh auth refresh` without `-s <scope>` is a no-op for adding scopes. Must use `gh auth refresh -h github.com -s write:packages` explicitly.
+- Environment variables must be **explicitly exported** for Harbor subprocesses. Use `set -a` before sourcing `.env.local`.
+- GitHub push protection blocks synthetic/fake API keys in test data. Use `git reset --soft origin/main` to squash intermediate commits that contained fake credentials.
+
 ## Maintenance
 - Root and local `AGENTS.md` / `CLAUDE.md` files are generated from sources in `docs/ops/`.
 - `docs/START_HERE_BY_TASK.md` is generated from `docs/ops/task_routes.json`.
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -51,6 +51,45 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
 - `docs/reference/README.md` - stable specs and reference docs
 - `docs/explanations/README.md` - rationale and context docs
 
+## Common Gotchas (from session history)
+
+### Documentation Generation
+- **NEVER edit root `CLAUDE.md` or `AGENTS.md` directly.** Edit canonical sources under `docs/ops/` and regenerate. Direct edits cause `agent_guides_drift` failures in `repo_health.py`.
+- After removing directories from the repo, also clean references from `scripts/sync_agent_guides.py` (`LOCAL_SOURCES`) and `scripts/docs_consistency_check.py` (`LOCAL_AGENT_TARGET_DIRS`).
+
+### Daytona / Harbor
+- Daytona builds images from Dockerfiles at sandbox creation time (`Image.from_dockerfile()`). Dockerfile fixes pushed to `main` take effect on the next run -- **no manual image rebuild needed**. Exception: pre-built GHCR base images must be rebuilt separately.
+- Harbor+Daytona (`harbor run --environment-type daytona`) is the recommended production approach. The standalone `scripts/daytona_runner.py` is for quick validation only.
+- Use `BASELINE_MCP_TYPE` env var to control MCP configuration: `none`, `sourcegraph`, `deepsearch`.
+- Prefer the Daytona SDK (`daytona_sdk`) over the CLI for sandbox interaction -- the CLI's SSH access is interactive-only.
+- GHCR packages default to **private** for personal accounts and visibility cannot be changed via API. Use the GitHub web UI or push to an org.
+
+### Docker / Build
+- `uv tool install` segfaults on ARM64/QEMU emulation. Use `pip install` instead, or switch to Daytona (native x86_64).
+- Use a build-push-clean pattern when building Docker images with limited disk (~45GB): build one image, push it, then clean up locally before building the next.
+- Colons in agent names (e.g., `module:ClassName`) break Docker volume mounts. Sanitize paths: replace `:` with `__`.
+
+### MCP Configuration (inside sandboxes)
+- `.mcp.json` must be placed at `$CLAUDE_CONFIG_DIR` (typically `/logs/agent/sessions/`), not `/app/` or `/root/`.
+- Claude Code requires the `--mcp-config` CLI flag to load MCP config -- it does not auto-detect.
+- Inject MCP usage instructions into the task prompt. Agents won't use MCP tools just because they're available.
+- Set `NODE_TLS_REJECT_UNAUTHORIZED=0` for Node.js SSL in Docker containers (curl working does not mean Node.js fetch will work).
+
+### Harbor Result Format
+- Timing fields (`started_at`, `finished_at`) live at the **top level** of `result.json`, not nested under `timing`.
+- `trajectory.json` is generated by Harbor's `_convert_events_to_trajectory()` post-processing, NOT by Claude Code CLI directly.
+- SWE-bench `test.sh` redirects stdout to a temp file -- Harbor never sees the parser's `START_TEST_OUTPUT`/`END_TEST_OUTPUT` markers via its normal capture.
+
+### Validation / Scoring
+- `validators.py` is duplicated across `ccb_build` tasks. Changes must be applied to **all copies** (verify with `sha256sum`).
+- Install scripts that print "INSTALL_SUCCESS" regardless of actual outcome are common. Always verify the binary exists and is executable.
+- An agent that completes in **<2 seconds** most likely never installed or ran (smoke test heuristic).
+
+### Git / Auth
+- `gh auth refresh` without `-s <scope>` is a no-op for adding scopes. Must use `gh auth refresh -h github.com -s write:packages` explicitly.
+- Environment variables must be **explicitly exported** for Harbor subprocesses. Use `set -a` before sourcing `.env.local`.
+- GitHub push protection blocks synthetic/fake API keys in test data. Use `git reset --soft origin/main` to squash intermediate commits that contained fake credentials.
+
 ## Maintenance
 - Root and local `AGENTS.md` / `CLAUDE.md` files are generated from sources in `docs/ops/`.
 - `docs/START_HERE_BY_TASK.md` is generated from `docs/ops/task_routes.json`.
diff --git a/docs/ops/ROOT_AGENT_GUIDE.md b/docs/ops/ROOT_AGENT_GUIDE.md
@@ -51,6 +51,45 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
 - `docs/reference/README.md` - stable specs and reference docs
 - `docs/explanations/README.md` - rationale and context docs
 
+## Common Gotchas (from session history)
+
+### Documentation Generation
+- **NEVER edit root `CLAUDE.md` or `AGENTS.md` directly.** Edit canonical sources under `docs/ops/` and regenerate. Direct edits cause `agent_guides_drift` failures in `repo_health.py`.
+- After removing directories from the repo, also clean references from `scripts/sync_agent_guides.py` (`LOCAL_SOURCES`) and `scripts/docs_consistency_check.py` (`LOCAL_AGENT_TARGET_DIRS`).
+
+### Daytona / Harbor
+- Daytona builds images from Dockerfiles at sandbox creation time (`Image.from_dockerfile()`). Dockerfile fixes pushed to `main` take effect on the next run -- **no manual image rebuild needed**. Exception: pre-built GHCR base images must be rebuilt separately.
+- Harbor+Daytona (`harbor run --environment-type daytona`) is the recommended production approach. The standalone `scripts/daytona_runner.py` is for quick validation only.
+- Use `BASELINE_MCP_TYPE` env var to control MCP configuration: `none`, `sourcegraph`, `deepsearch`.
+- Prefer the Daytona SDK (`daytona_sdk`) over the CLI for sandbox interaction -- the CLI's SSH access is interactive-only.
+- GHCR packages default to **private** for personal accounts and visibility cannot be changed via API. Use the GitHub web UI or push to an org.
+
+### Docker / Build
+- `uv tool install` segfaults on ARM64/QEMU emulation. Use `pip install` instead, or switch to Daytona (native x86_64).
+- Use a build-push-clean pattern when building Docker images with limited disk (~45GB): build one image, push it, then clean up locally before building the next.
+- Colons in agent names (e.g., `module:ClassName`) break Docker volume mounts. Sanitize paths: replace `:` with `__`.
+
+### MCP Configuration (inside sandboxes)
+- `.mcp.json` must be placed at `$CLAUDE_CONFIG_DIR` (typically `/logs/agent/sessions/`), not `/app/` or `/root/`.
+- Claude Code requires the `--mcp-config` CLI flag to load MCP config -- it does not auto-detect.
+- Inject MCP usage instructions into the task prompt. Agents won't use MCP tools just because they're available.
+- Set `NODE_TLS_REJECT_UNAUTHORIZED=0` for Node.js SSL in Docker containers (curl working does not mean Node.js fetch will work).
+
+### Harbor Result Format
+- Timing fields (`started_at`, `finished_at`) live at the **top level** of `result.json`, not nested under `timing`.
+- `trajectory.json` is generated by Harbor's `_convert_events_to_trajectory()` post-processing, NOT by Claude Code CLI directly.
+- SWE-bench `test.sh` redirects stdout to a temp file -- Harbor never sees the parser's `START_TEST_OUTPUT`/`END_TEST_OUTPUT` markers via its normal capture.
+
+### Validation / Scoring
+- `validators.py` is duplicated across `ccb_build` tasks. Changes must be applied to **all copies** (verify with `sha256sum`).
+- Install scripts that print "INSTALL_SUCCESS" regardless of actual outcome are common. Always verify the binary exists and is executable.
+- An agent that completes in **<2 seconds** most likely never installed or ran (smoke test heuristic).
+
+### Git / Auth
+- `gh auth refresh` without `-s <scope>` is a no-op for adding scopes. Must use `gh auth refresh -h github.com -s write:packages` explicitly.
+- Environment variables must be **explicitly exported** for Harbor subprocesses. Use `set -a` before sourcing `.env.local`.
+- GitHub push protection blocks synthetic/fake API keys in test data. Use `git reset --soft origin/main` to squash intermediate commits that contained fake credentials.
+
 ## Maintenance
 - Root and local `AGENTS.md` / `CLAUDE.md` files are generated from sources in `docs/ops/`.
 - `docs/START_HERE_BY_TASK.md` is generated from `docs/ops/task_routes.json`.
diff --git a/docs/ops/SCRIPT_INDEX.md b/docs/ops/SCRIPT_INDEX.md
@@ -111,7 +111,6 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 
 ## Infra & Mirrors
 
-- `scripts/build_conversation_db.py` - Infrastructure or mirror management script for build conversation db.
 - `scripts/build_daytona_registry.py` - Infrastructure or mirror management script for build daytona registry.
 - `scripts/build_linux_base_images.sh` - Infrastructure or mirror management script for build linux base images.
 - `scripts/create_mcp_expansion_mirrors.sh` - Infrastructure or mirror management script for create mcp expansion mirrors.
@@ -176,9 +175,6 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 - `scripts/backfill_triage_from_manifest.py` [one_off] - Historical one-off script: backfill triage from manifest.
 - `scripts/check_harness_readiness.py` - Utility script for check harness readiness.
 - `scripts/compare_contextbench_results.py` - Utility script for compare contextbench results.
-- `scripts/compare_ir_old_vs_new_gt.py` - Utility script for compare ir old vs new gt.
-- `scripts/compare_old_new_ground_truth.py` - Utility script for compare old new ground truth.
-- `scripts/compute_analysis_ir_metrics.py` - Utility script for compute analysis ir metrics.
 - `scripts/compute_bootstrap_cis.py` - Utility script for compute bootstrap cis.
 - `scripts/context_retrieval_agent.py` - Utility script for context retrieval agent.
 - `scripts/control_plane.py` - Utility script for control plane.
@@ -188,16 +184,13 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 - `scripts/daytona_curator_runner.py` - Utility script for daytona curator runner.
 - `scripts/daytona_poc_runner.py` - Utility script for daytona poc runner.
 - `scripts/daytona_runner.py` - Utility script for daytona runner.
-- `scripts/daytona_snapshot_cleanup.py` - Utility script for daytona snapshot cleanup.
 - `scripts/dependeval_eval_dr.py` - Utility script for dependeval eval dr.
 - `scripts/dependeval_eval_me.py` - Utility script for dependeval eval me.
 - `scripts/docgen_quality_sweep.py` - Utility script for docgen quality sweep.
 - `scripts/doe_power_curves.py` - Utility script for doe power curves.
 - `scripts/doe_select_tasks.py` - Utility script for doe select tasks.
 - `scripts/ds_hybrid_retrieval.py` - Utility script for ds hybrid retrieval.
 - `scripts/ds_wrapper.sh` - Utility script for ds wrapper.
-- `scripts/export_conversation_blog_assets.py` - Utility script for export conversation blog assets.
-- `scripts/export_engineering_diary_assets.py` - Utility script for export engineering diary assets.
 - `scripts/export_official_results.py` - Utility script for export official results.
 - `scripts/extract_analysis_metrics.py` - Utility script for extract analysis metrics.
 - `scripts/extract_build_diary.py` - Utility script for extract build diary.
@@ -221,8 +214,6 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 - `scripts/plot_build_diary.py` - Utility script for plot build diary.
 - `scripts/plot_build_diary_supplementary.py` - Utility script for plot build diary supplementary.
 - `scripts/plot_build_narrative.py` - Utility script for plot build narrative.
-- `scripts/plot_conversation_blog_svgs.py` - Utility script for plot conversation blog svgs.
-- `scripts/plot_csb_mcp_blog_figures.py` - Utility script for plot csb mcp blog figures.
 - `scripts/prepare_analysis_runs.py` - Utility script for prepare analysis runs.
 - `scripts/promote_agent_oracles.py` - Utility script for promote agent oracles.
 - `scripts/promote_blocked.py` - Utility script for promote blocked.
diff --git a/scripts/registry.json b/scripts/registry.json
@@ -162,14 +162,6 @@
       "language": "python",
       "summary": "Historical one-off script: backfill triage from manifest."
     },
-    {
-      "name": "build_conversation_db.py",
-      "path": "scripts/build_conversation_db.py",
-      "category": "infra_mirrors",
-      "status": "maintained",
-      "language": "python",
-      "summary": "Infrastructure or mirror management script for build conversation db."
-    },
     {
       "name": "build_daytona_registry.py",
       "path": "scripts/build_daytona_registry.py",
@@ -218,22 +210,6 @@
       "language": "python",
       "summary": "Utility script for compare contextbench results."
     },
-    {
-      "name": "compare_ir_old_vs_new_gt.py",
-      "path": "scripts/compare_ir_old_vs_new_gt.py",
-      "category": "misc",
-      "status": "maintained",
-      "language": "python",
-      "summary": "Utility script for compare ir old vs new gt."
-    },
-    {
-      "name": "compare_old_new_ground_truth.py",
-      "path": "scripts/compare_old_new_ground_truth.py",
-      "category": "misc",
-      "status": "maintained",
-      "language": "python",
-      "summary": "Utility script for compare old new ground truth."
-    },
     {
       "name": "comprehensive_analysis.py",
       "path": "scripts/comprehensive_analysis.py",
@@ -242,14 +218,6 @@
       "language": "python",
       "summary": "Analysis/comparison script for comprehensive analysis."
     },
-    {
-      "name": "compute_analysis_ir_metrics.py",
-      "path": "scripts/compute_analysis_ir_metrics.py",
-      "category": "misc",
-      "status": "maintained",
-      "language": "python",
-      "summary": "Utility script for compute analysis ir metrics."
-    },
     {
       "name": "compute_bootstrap_cis.py",
       "path": "scripts/compute_bootstrap_cis.py",
@@ -426,14 +394,6 @@
       "language": "python",
       "summary": "Utility script for daytona runner."
     },
-    {
-      "name": "daytona_snapshot_cleanup.py",
-      "path": "scripts/daytona_snapshot_cleanup.py",
-      "category": "misc",
-      "status": "maintained",
-      "language": "python",
-      "summary": "Utility script for daytona snapshot cleanup."
-    },
     {
       "name": "dependeval_eval_dr.py",
       "path": "scripts/dependeval_eval_dr.py",
@@ -530,22 +490,6 @@
       "language": "python",
       "summary": "Helper library/wrapper used by other scripts (eval matrix)."
     },
-    {
-      "name": "export_conversation_blog_assets.py",
-      "path": "scripts/export_conversation_blog_assets.py",
-      "category": "misc",
-      "status": "maintained",
-      "language": "python",
-      "summary": "Utility script for export conversation blog assets."
-    },
-    {
-      "name": "export_engineering_diary_assets.py",
-      "path": "scripts/export_engineering_diary_assets.py",
-      "category": "misc",
-      "status": "maintained",
-      "language": "python",
-      "summary": "Utility script for export engineering diary assets."
-    },
     {
       "name": "export_official_results.py",
       "path": "scripts/export_official_results.py",
@@ -1074,22 +1018,6 @@
       "language": "python",
       "summary": "Utility script for plot build narrative."
     },
-    {
-      "name": "plot_conversation_blog_svgs.py",
-      "path": "scripts/plot_conversation_blog_svgs.py",
-      "category": "misc",
-      "status": "maintained",
-      "language": "python",
-      "summary": "Utility script for plot conversation blog svgs."
-    },
-    {
-      "name": "plot_csb_mcp_blog_figures.py",
-      "path": "scripts/plot_csb_mcp_blog_figures.py",
-      "category": "misc",
-      "status": "maintained",
-      "language": "python",
-      "summary": "Utility script for plot csb mcp blog figures."
-    },
     {
       "name": "prebuild_images.sh",
       "path": "scripts/prebuild_images.sh",
@@ -1648,10 +1576,10 @@
     "core_operations": 13,
     "data_management": 10,
     "generation": 7,
-    "infra_mirrors": 20,
+    "infra_mirrors": 19,
     "library_helpers": 7,
     "migration": 4,
-    "misc": 88,
+    "misc": 80,
     "qa_quality": 10,
     "submission_reporting": 7,
     "task_creation_selection": 13,