Commit 6886d66

committed
docs: add Common Gotchas section from session history review
Reviewed 8 Claude Code session transcripts (Dec 2025 - Mar 2026) and extracted actionable learnings into CLAUDE.md. Covers gotchas for:

- Documentation generation (never edit generated files directly)
- Daytona/Harbor (image builds, config modes, SDK vs CLI)
- Docker/Build (QEMU segfaults, disk management, path sanitization)
- MCP configuration (config paths, CLI flags, prompt injection)
- Harbor result format (timing fields, trajectory generation)
- Validation/scoring (duplicated validators, install verification)
- Git/auth (scope management, env var export, push protection)

Also regenerated stale script registry and SCRIPT_INDEX.md.
1 parent c7d78c5 commit 6886d66

5 files changed (+119 / -83 lines)


AGENTS.md

Lines changed: 39 additions & 0 deletions
@@ -51,6 +51,45 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
- `docs/reference/README.md` - stable specs and reference docs
- `docs/explanations/README.md` - rationale and context docs

## Common Gotchas (from session history)

### Documentation Generation
- **NEVER edit root `CLAUDE.md` or `AGENTS.md` directly.** Edit the canonical sources under `docs/ops/` and regenerate. Direct edits cause `agent_guides_drift` failures in `repo_health.py`.
- After removing directories from the repo, also remove their references from `scripts/sync_agent_guides.py` (`LOCAL_SOURCES`) and `scripts/docs_consistency_check.py` (`LOCAL_AGENT_TARGET_DIRS`).
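
One way to catch leftover references before `repo_health.py` does is a small existence check. This is a minimal sketch, assuming `LOCAL_SOURCES` and `LOCAL_AGENT_TARGET_DIRS` are plain lists of repo-relative directory strings (their real shape in the scripts may differ); `find_stale_entries` is a hypothetical helper, not part of either script.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def find_stale_entries(entries: Iterable[str], repo_root: Path) -> List[str]:
    """Return entries that no longer point at an existing directory.

    Assumes each entry is a repo-relative directory path string.
    """
    return [entry for entry in entries if not (repo_root / entry).is_dir()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "benchmarks" / "kept").mkdir(parents=True)
        # Hypothetical stand-in for LOCAL_SOURCES after a directory was deleted.
        local_sources = ["benchmarks/kept", "benchmarks/removed"]
        print(find_stale_entries(local_sources, root))  # ['benchmarks/removed']
```

Any entry it prints is a reference to delete from the corresponding list before regenerating the guides.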

### Daytona / Harbor
- Daytona builds images from Dockerfiles at sandbox creation time (`Image.from_dockerfile()`), so Dockerfile fixes pushed to `main` take effect on the next run with **no manual image rebuild needed**. Exception: pre-built GHCR base images must be rebuilt separately.
- Harbor+Daytona (`harbor run --environment-type daytona`) is the recommended production approach. The standalone `scripts/daytona_runner.py` is for quick validation only.
- Set the `BASELINE_MCP_TYPE` environment variable to select the MCP configuration: `none`, `sourcegraph`, or `deepsearch`.
- Prefer the Daytona SDK (`daytona_sdk`) over the CLI for sandbox interaction; the CLI's SSH access is interactive-only.
- GHCR packages pushed from personal accounts default to **private**, and their visibility cannot be changed via the API. Change it in the GitHub web UI, or push to an org instead.
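
A minimal sketch of validating `BASELINE_MCP_TYPE` up front so that a typo fails before a run starts. `resolve_mcp_type` is hypothetical, and treating an unset variable as `none` is an assumption of this sketch, not documented harness behavior.

```python
import os
from typing import Mapping, Optional

# The three values documented for BASELINE_MCP_TYPE.
VALID_MCP_TYPES = ("none", "sourcegraph", "deepsearch")


def resolve_mcp_type(env: Optional[Mapping[str, str]] = None) -> str:
    """Read BASELINE_MCP_TYPE and reject unknown values.

    Defaulting to "none" when the variable is unset is an assumption.
    """
    env = os.environ if env is None else env
    value = env.get("BASELINE_MCP_TYPE", "none")
    if value not in VALID_MCP_TYPES:
        raise ValueError(
            f"BASELINE_MCP_TYPE={value!r}; expected one of {', '.join(VALID_MCP_TYPES)}"
        )
    return value


if __name__ == "__main__":
    print(resolve_mcp_type({"BASELINE_MCP_TYPE": "sourcegraph"}))  # sourcegraph
    print(resolve_mcp_type({}))  # none
```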

### Docker / Build
- `uv tool install` segfaults under QEMU emulation on ARM64. Use `pip install` instead, or switch to Daytona (native x86_64).
- When building Docker images with limited disk (~45GB), use a build-push-clean loop: build one image, push it, then delete it locally before building the next.
- Colons in agent names (e.g., `module:ClassName`) break Docker volume mounts. Sanitize the paths derived from them by replacing `:` with `__`.
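
The colon and disk-space gotchas can be sketched together. Docker's `-v` syntax is `host:container[:options]`, which is why a `:` inside the host path breaks the mount. The helper names, the `/jobs` root, and the image tag are hypothetical; the plan only uses the standard `docker build`, `docker push`, and `docker rmi` commands.

```python
import subprocess
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def sanitize_path_component(name: str) -> str:
    """Replace ':' so the host:container[:options] volume syntax stays unambiguous."""
    return name.replace(":", "__")


def volume_arg(jobs_root: str, agent_name: str, container_dir: str) -> str:
    """Build a -v argument whose host side contains no stray colons."""
    host_dir = PurePosixPath(jobs_root) / sanitize_path_component(agent_name)
    return f"{host_dir}:{container_dir}"


def build_push_clean_plan(images: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Order docker commands so only one built image sits on disk at a time.

    `images` is a sequence of (tag, build_context_dir) pairs.
    """
    plan: List[List[str]] = []
    for tag, context_dir in images:
        plan.append(["docker", "build", "-t", tag, context_dir])
        plan.append(["docker", "push", tag])
        plan.append(["docker", "rmi", tag])  # free disk before the next build
    return plan


def run_plan(plan: List[List[str]]) -> None:
    for cmd in plan:
        subprocess.run(cmd, check=True)  # stop at the first failing command


if __name__ == "__main__":
    print(volume_arg("/jobs", "module:ClassName", "/logs/agent"))
    # /jobs/module__ClassName:/logs/agent
    for cmd in build_push_clean_plan([("ghcr.io/example/img-a:latest", "images/a")]):
        print(" ".join(cmd))
```

Because `run_plan` uses `check=True`, a failed push stops the loop before `docker rmi` deletes the unpushed image. `docker rmi` does not clear the build cache; `docker builder prune` reclaims that space if needed.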

### MCP Configuration (inside sandboxes)
- `.mcp.json` must be placed in `$CLAUDE_CONFIG_DIR` (typically `/logs/agent/sessions/`), not in `/app/` or `/root/`.
- Claude Code requires the `--mcp-config` CLI flag to load the MCP config; it does not auto-detect it.
- Inject MCP usage instructions into the task prompt. Agents won't use MCP tools just because they're available.
- Set `NODE_TLS_REJECT_UNAUTHORIZED=0` for Node.js SSL in Docker containers; note that this disables TLS certificate verification. A working `curl` does not mean Node.js `fetch` will work.
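
A sketch of the MCP rules together: write `.mcp.json` into the config directory, pass it explicitly with `--mcp-config`, prepend usage instructions to the prompt, and set the Node.js TLS variable for the child process. The hint wording, server name, and URL are placeholders, and the `{"type": "http", "url": ...}` server entry should be checked against the Claude Code MCP documentation before use.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical wording; the guide only says instructions must be injected.
MCP_HINT = (
    "You have MCP tools available for code search. "
    "Use them before reading files manually."
)


def write_mcp_config(config_dir: Path, servers: Dict[str, dict]) -> Path:
    """Write .mcp.json into the Claude config dir (not /app or /root)."""
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / ".mcp.json"
    path.write_text(json.dumps({"mcpServers": servers}, indent=2))
    return path


def claude_command(task_prompt: str, mcp_config: Path) -> List[str]:
    """Build argv: the prompt carries the MCP hint, the config is passed explicitly."""
    prompt = f"{MCP_HINT}\n\n{task_prompt}"
    return ["claude", "-p", prompt, "--mcp-config", str(mcp_config)]


def child_env() -> Dict[str, str]:
    """Environment for the agent process; disables Node.js TLS verification."""
    return {**os.environ, "NODE_TLS_REJECT_UNAUTHORIZED": "0"}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for $CLAUDE_CONFIG_DIR (typically /logs/agent/sessions/ in a sandbox).
        cfg = write_mcp_config(
            Path(tmp) / "sessions",
            {"example": {"type": "http", "url": "https://mcp.example.invalid/mcp"}},
        )
        print(claude_command("Fix the failing test.", cfg)[-2:])
```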

### Harbor Result Format
- Timing fields (`started_at`, `finished_at`) live at the **top level** of `result.json`, not nested under `timing`.
- `trajectory.json` is generated by Harbor's `_convert_events_to_trajectory()` post-processing, NOT by the Claude Code CLI directly.
- SWE-bench `test.sh` redirects stdout to a temp file, so Harbor's normal output capture never sees the parser's `START_TEST_OUTPUT`/`END_TEST_OUTPUT` markers.
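
A sketch of reading both facts. It assumes `started_at`/`finished_at` are ISO 8601 strings (not confirmed here) and that each parser marker appears once in the temp file `test.sh` writes; `duration_seconds` and `extract_test_output` are hypothetical helpers, not Harbor functions.

```python
import json
from datetime import datetime
from typing import Optional


def _parse_ts(value: str) -> datetime:
    # Assumes ISO 8601; rewrite a trailing "Z" because
    # datetime.fromisoformat only accepts it from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def duration_seconds(result: dict) -> float:
    """Elapsed time from result.json; the fields are top-level, not under "timing"."""
    start = _parse_ts(result["started_at"])
    end = _parse_ts(result["finished_at"])
    return (end - start).total_seconds()


def extract_test_output(text: str) -> Optional[str]:
    """Return the text between the parser markers, or None if either is missing."""
    start, end = "START_TEST_OUTPUT", "END_TEST_OUTPUT"
    i = text.find(start)
    j = text.find(end, i + len(start)) if i != -1 else -1
    if i == -1 or j == -1:
        return None
    return text[i + len(start):j].strip("\n")


if __name__ == "__main__":
    result = json.loads(
        '{"started_at": "2026-03-01T12:00:00Z", "finished_at": "2026-03-01T12:00:01.500000Z"}'
    )
    print(duration_seconds(result))  # 1.5
    print(extract_test_output("noise\nSTART_TEST_OUTPUT\nPASSED test_x\nEND_TEST_OUTPUT\n"))
    # PASSED test_x
```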

### Validation / Scoring
- `validators.py` is duplicated across `ccb_build` tasks. Apply changes to **all copies** and confirm they match with `sha256sum`.
- Install scripts that print `INSTALL_SUCCESS` regardless of the actual outcome are common. Always verify that the binary exists and is executable.
- Smoke-test heuristic: an agent that completes in **under 2 seconds** most likely never installed or ran.
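
The three validation checks can be sketched in plain Python. `group_by_sha256` plays the role of comparing `sha256sum` output across the `validators.py` copies; all function names and the task directory layout in the demo are hypothetical, and the 2-second threshold is the heuristic from the list, not a hard rule.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def group_by_sha256(paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group files by content hash; more than one group means the copies drifted."""
    groups: Dict[str, List[Path]] = {}
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups.setdefault(digest, []).append(path)
    return groups


def binary_is_installed(name_or_path: str) -> bool:
    """Check the binary itself instead of trusting an INSTALL_SUCCESS line."""
    if Path(name_or_path).name != name_or_path:  # contains a directory part
        path = Path(name_or_path)
        return path.is_file() and os.access(path, os.X_OK)
    return shutil.which(name_or_path) is not None  # searches PATH for an executable


SMOKE_MIN_SECONDS = 2.0


def probably_never_ran(duration_seconds: float) -> bool:
    """Smoke-test heuristic: under 2 s means the agent likely never ran."""
    return duration_seconds < SMOKE_MIN_SECONDS


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout; in the repo you would glob the ccb_build task dirs.
        copies = []
        for task, body in [("task_a", "v1"), ("task_b", "v1"), ("task_c", "v2")]:
            p = Path(tmp) / task / "validators.py"
            p.parent.mkdir()
            p.write_text(body)
            copies.append(p)
        print(len(group_by_sha256(copies)))  # 2 (task_c has drifted)
    print(probably_never_ran(1.2), probably_never_ran(45.0))  # True False
```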

### Git / Auth
- `gh auth refresh` without `-s <scope>` does not add any scopes. Request the scope explicitly, e.g. `gh auth refresh -h github.com -s write:packages`.
- Environment variables must be **explicitly exported** to reach Harbor subprocesses. Run `set -a` before sourcing `.env.local` so every variable it assigns is exported, and `set +a` afterwards.
- GitHub push protection blocks synthetic/fake API keys in test data. Once the fake credentials are gone from the working tree, run `git reset --soft origin/main` and recommit to squash away the intermediate commits that contained them.
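
In shell, `set -a; source .env.local; set +a` marks every assigned variable for export. When Harbor is launched from Python instead, the same effect comes from passing the variables explicitly through `env=`. This is a minimal sketch: `parse_env_file` and `run_with_env_file` are hypothetical, the parser only handles simple `KEY=VALUE` lines (no expansion, escapes, inline comments, or multi-line values), and `EXAMPLE_TOKEN` is a placeholder name.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, optionally prefixed with `export`."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            continue  # not an assignment
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop matching surrounding quotes
        env[key.strip()] = value
    return env


def run_with_env_file(cmd: List[str], env_file: Path) -> subprocess.CompletedProcess:
    """Run cmd with the file's variables explicitly added to the child environment."""
    child_env = {**os.environ, **parse_env_file(env_file.read_text())}
    return subprocess.run(cmd, env=child_env, check=True)


if __name__ == "__main__":
    sample = 'export EXAMPLE_TOKEN="dummy"\n# comment\nBASELINE_MCP_TYPE=none\n'
    print(parse_env_file(sample))
    # {'EXAMPLE_TOKEN': 'dummy', 'BASELINE_MCP_TYPE': 'none'}
```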

## Maintenance
- Root and local `AGENTS.md` / `CLAUDE.md` files are generated from sources in `docs/ops/`.
- `docs/START_HERE_BY_TASK.md` is generated from `docs/ops/task_routes.json`.

CLAUDE.md

Lines changed: 39 additions & 0 deletions (the same Common Gotchas hunk as `AGENTS.md` above)

docs/ops/ROOT_AGENT_GUIDE.md

Lines changed: 39 additions & 0 deletions (the same Common Gotchas hunk as `AGENTS.md` above)

docs/ops/SCRIPT_INDEX.md

Lines changed: 0 additions & 9 deletions
```diff
@@ -111,7 +111,6 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 
 ## Infra & Mirrors
 
-- `scripts/build_conversation_db.py` - Infrastructure or mirror management script for build conversation db.
 - `scripts/build_daytona_registry.py` - Infrastructure or mirror management script for build daytona registry.
 - `scripts/build_linux_base_images.sh` - Infrastructure or mirror management script for build linux base images.
 - `scripts/create_mcp_expansion_mirrors.sh` - Infrastructure or mirror management script for create mcp expansion mirrors.
@@ -176,9 +175,6 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 - `scripts/backfill_triage_from_manifest.py` [one_off] - Historical one-off script: backfill triage from manifest.
 - `scripts/check_harness_readiness.py` - Utility script for check harness readiness.
 - `scripts/compare_contextbench_results.py` - Utility script for compare contextbench results.
-- `scripts/compare_ir_old_vs_new_gt.py` - Utility script for compare ir old vs new gt.
-- `scripts/compare_old_new_ground_truth.py` - Utility script for compare old new ground truth.
-- `scripts/compute_analysis_ir_metrics.py` - Utility script for compute analysis ir metrics.
 - `scripts/compute_bootstrap_cis.py` - Utility script for compute bootstrap cis.
 - `scripts/context_retrieval_agent.py` - Utility script for context retrieval agent.
 - `scripts/control_plane.py` - Utility script for control plane.
@@ -188,16 +184,13 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 - `scripts/daytona_curator_runner.py` - Utility script for daytona curator runner.
 - `scripts/daytona_poc_runner.py` - Utility script for daytona poc runner.
 - `scripts/daytona_runner.py` - Utility script for daytona runner.
-- `scripts/daytona_snapshot_cleanup.py` - Utility script for daytona snapshot cleanup.
 - `scripts/dependeval_eval_dr.py` - Utility script for dependeval eval dr.
 - `scripts/dependeval_eval_me.py` - Utility script for dependeval eval me.
 - `scripts/docgen_quality_sweep.py` - Utility script for docgen quality sweep.
 - `scripts/doe_power_curves.py` - Utility script for doe power curves.
 - `scripts/doe_select_tasks.py` - Utility script for doe select tasks.
 - `scripts/ds_hybrid_retrieval.py` - Utility script for ds hybrid retrieval.
 - `scripts/ds_wrapper.sh` - Utility script for ds wrapper.
-- `scripts/export_conversation_blog_assets.py` - Utility script for export conversation blog assets.
-- `scripts/export_engineering_diary_assets.py` - Utility script for export engineering diary assets.
 - `scripts/export_official_results.py` - Utility script for export official results.
 - `scripts/extract_analysis_metrics.py` - Utility script for extract analysis metrics.
 - `scripts/extract_build_diary.py` - Utility script for extract build diary.
@@ -221,8 +214,6 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 - `scripts/plot_build_diary.py` - Utility script for plot build diary.
 - `scripts/plot_build_diary_supplementary.py` - Utility script for plot build diary supplementary.
 - `scripts/plot_build_narrative.py` - Utility script for plot build narrative.
-- `scripts/plot_conversation_blog_svgs.py` - Utility script for plot conversation blog svgs.
-- `scripts/plot_csb_mcp_blog_figures.py` - Utility script for plot csb mcp blog figures.
 - `scripts/prepare_analysis_runs.py` - Utility script for prepare analysis runs.
 - `scripts/promote_agent_oracles.py` - Utility script for promote agent oracles.
 - `scripts/promote_blocked.py` - Utility script for promote blocked.
```

scripts/registry.json

Lines changed: 2 additions & 74 deletions
```diff
@@ -162,14 +162,6 @@
 "language": "python",
 "summary": "Historical one-off script: backfill triage from manifest."
 },
-{
-"name": "build_conversation_db.py",
-"path": "scripts/build_conversation_db.py",
-"category": "infra_mirrors",
-"status": "maintained",
-"language": "python",
-"summary": "Infrastructure or mirror management script for build conversation db."
-},
 {
 "name": "build_daytona_registry.py",
 "path": "scripts/build_daytona_registry.py",
@@ -218,22 +210,6 @@
 "language": "python",
 "summary": "Utility script for compare contextbench results."
 },
-{
-"name": "compare_ir_old_vs_new_gt.py",
-"path": "scripts/compare_ir_old_vs_new_gt.py",
-"category": "misc",
-"status": "maintained",
-"language": "python",
-"summary": "Utility script for compare ir old vs new gt."
-},
-{
-"name": "compare_old_new_ground_truth.py",
-"path": "scripts/compare_old_new_ground_truth.py",
-"category": "misc",
-"status": "maintained",
-"language": "python",
-"summary": "Utility script for compare old new ground truth."
-},
 {
 "name": "comprehensive_analysis.py",
 "path": "scripts/comprehensive_analysis.py",
@@ -242,14 +218,6 @@
 "language": "python",
 "summary": "Analysis/comparison script for comprehensive analysis."
 },
-{
-"name": "compute_analysis_ir_metrics.py",
-"path": "scripts/compute_analysis_ir_metrics.py",
-"category": "misc",
-"status": "maintained",
-"language": "python",
-"summary": "Utility script for compute analysis ir metrics."
-},
 {
 "name": "compute_bootstrap_cis.py",
 "path": "scripts/compute_bootstrap_cis.py",
@@ -426,14 +394,6 @@
 "language": "python",
 "summary": "Utility script for daytona runner."
 },
-{
-"name": "daytona_snapshot_cleanup.py",
-"path": "scripts/daytona_snapshot_cleanup.py",
-"category": "misc",
-"status": "maintained",
-"language": "python",
-"summary": "Utility script for daytona snapshot cleanup."
-},
 {
 "name": "dependeval_eval_dr.py",
 "path": "scripts/dependeval_eval_dr.py",
@@ -530,22 +490,6 @@
 "language": "python",
 "summary": "Helper library/wrapper used by other scripts (eval matrix)."
 },
-{
-"name": "export_conversation_blog_assets.py",
-"path": "scripts/export_conversation_blog_assets.py",
-"category": "misc",
-"status": "maintained",
-"language": "python",
-"summary": "Utility script for export conversation blog assets."
-},
-{
-"name": "export_engineering_diary_assets.py",
-"path": "scripts/export_engineering_diary_assets.py",
-"category": "misc",
-"status": "maintained",
-"language": "python",
-"summary": "Utility script for export engineering diary assets."
-},
 {
 "name": "export_official_results.py",
 "path": "scripts/export_official_results.py",
@@ -1074,22 +1018,6 @@
 "language": "python",
 "summary": "Utility script for plot build narrative."
 },
-{
-"name": "plot_conversation_blog_svgs.py",
-"path": "scripts/plot_conversation_blog_svgs.py",
-"category": "misc",
-"status": "maintained",
-"language": "python",
-"summary": "Utility script for plot conversation blog svgs."
-},
-{
-"name": "plot_csb_mcp_blog_figures.py",
-"path": "scripts/plot_csb_mcp_blog_figures.py",
-"category": "misc",
-"status": "maintained",
-"language": "python",
-"summary": "Utility script for plot csb mcp blog figures."
-},
 {
 "name": "prebuild_images.sh",
 "path": "scripts/prebuild_images.sh",
@@ -1648,10 +1576,10 @@
 "core_operations": 13,
 "data_management": 10,
 "generation": 7,
-"infra_mirrors": 20,
+"infra_mirrors": 19,
 "library_helpers": 7,
 "migration": 4,
-"misc": 88,
+"misc": 80,
 "qa_quality": 10,
 "submission_reporting": 7,
 "task_creation_selection": 13,
```
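
The registry changes above (nine entries removed, `infra_mirrors` 20 to 19, `misc` 88 to 80) are the kind of drift a small check can flag. This sketch assumes the dropped entries point at scripts no longer on disk, and that the registry keeps entries under `scripts` and totals under `category_counts`; both key names are guesses, and none of this is the code of `scripts/generate_script_index.py` or the repo's registry generator.

```python
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict, List


def missing_scripts(entries: List[dict], repo_root: Path) -> List[str]:
    """Paths of registry entries whose script file no longer exists."""
    return [e["path"] for e in entries if not (repo_root / e["path"]).is_file()]


def category_counts(entries: List[dict]) -> Dict[str, int]:
    """Recompute per-category totals from the surviving entries."""
    return dict(sorted(Counter(e["category"] for e in entries).items()))


def prune_registry(registry: dict, repo_root: Path) -> dict:
    # "scripts" and "category_counts" are assumed key names.
    kept = [e for e in registry["scripts"] if (repo_root / e["path"]).is_file()]
    return {**registry, "scripts": kept, "category_counts": category_counts(kept)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "scripts").mkdir()
        (root / "scripts" / "daytona_runner.py").write_text("")
        registry = {
            "scripts": [
                {"path": "scripts/daytona_runner.py", "category": "misc"},
                {"path": "scripts/daytona_snapshot_cleanup.py", "category": "misc"},
            ],
            "category_counts": {"misc": 2},
        }
        print(missing_scripts(registry["scripts"], root))
        # ['scripts/daytona_snapshot_cleanup.py']
        print(prune_registry(registry, root)["category_counts"])  # {'misc': 1}
```

Recomputing the totals from the kept entries, rather than decrementing the old numbers, keeps the counts block consistent with the entry list by construction.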
