Merge remote-tracking branch 'origin/main'

sjarmak · sjarmak · commit 2c0667c03127 · 2026-03-07T14:23:46.000Z
# Conflicts:
#	docs/ops/SCRIPT_INDEX.md
#	scripts/registry.json
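
Two of the gotchas this merge adds to the agent guides can be sketched in a few lines of Python: sanitizing colons in agent names before they end up in a Docker volume mount path, and flagging runs that finish in under 2 seconds using the top-level `started_at` / `finished_at` fields of `result.json`. This is a minimal sketch, not code from the repository: the function names are hypothetical, and it assumes the timestamps are ISO 8601 strings, which the guides do not state.

```python
from datetime import datetime


def sanitize_agent_name(agent_name: str) -> str:
    """Replace ':' with '__' so the name is safe to use in a Docker volume mount path."""
    return agent_name.replace(":", "__")


def _parse_timestamp(value: str) -> datetime:
    # Assumption: timestamps are ISO 8601; a trailing 'Z' is normalized to an
    # explicit UTC offset so older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def run_duration_seconds(result: dict) -> float:
    """Compute the run duration from the top-level timing fields of a result dict."""
    started = _parse_timestamp(result["started_at"])
    finished = _parse_timestamp(result["finished_at"])
    return (finished - started).total_seconds()


def looks_like_no_op_run(result: dict, threshold_seconds: float = 2.0) -> bool:
    """Smoke-test heuristic: a run shorter than the threshold likely never started the agent."""
    return run_duration_seconds(result) < threshold_seconds


if __name__ == "__main__":
    print(sanitize_agent_name("module:ClassName"))
    sample = {
        "started_at": "2026-03-07T14:23:46Z",
        "finished_at": "2026-03-07T14:23:47Z",
    }
    print(looks_like_no_op_run(sample))
```

In practice the `result` dict would come from `json.load()` on a Harbor `result.json`; the 2-second threshold is the heuristic named in the guides and is exposed as a parameter so it can be tuned.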
diff --git a/.gitignore b/.gitignore
@@ -57,6 +57,8 @@ scripts/plot_csb_mcp_blog_figures.py
 ralph/
 ralph-*/
 reports/
+!reports/nightly/
+!reports/nightly/**
 eval_reports/
 tmp/
 *.log
diff --git a/AGENTS.md b/AGENTS.md
@@ -51,6 +51,45 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
 - `docs/reference/README.md` - stable specs and reference docs
 - `docs/explanations/README.md` - rationale and context docs
 
+## Common Gotchas (from session history)
+
+### Documentation Generation
+- **NEVER edit root `CLAUDE.md` or `AGENTS.md` directly.** Edit canonical sources under `docs/ops/` and regenerate. Direct edits cause `agent_guides_drift` failures in `repo_health.py`.
+- After removing directories from the repo, also clean references from `scripts/sync_agent_guides.py` (`LOCAL_SOURCES`) and `scripts/docs_consistency_check.py` (`LOCAL_AGENT_TARGET_DIRS`).
+
+### Daytona / Harbor
+- Daytona builds images from Dockerfiles at sandbox creation time (`Image.from_dockerfile()`). Dockerfile fixes pushed to `main` take effect on the next run -- **no manual image rebuild needed**. Exception: pre-built GHCR base images must be rebuilt separately.
+- Harbor+Daytona (`harbor run --environment-type daytona`) is the recommended production approach. The standalone `scripts/daytona_runner.py` is for quick validation only.
+- Use `BASELINE_MCP_TYPE` env var to control MCP configuration: `none`, `sourcegraph`, `deepsearch`.
+- Prefer the Daytona SDK (`daytona_sdk`) over the CLI for sandbox interaction -- the CLI is interactive-only for SSH.
+- GHCR packages default to **private** for personal accounts and visibility cannot be changed via API. Use the GitHub web UI or push to an org.
+
+### Docker / Build
+- `uv tool install` segfaults on ARM64/QEMU emulation. Use `pip install` instead, or switch to Daytona (native x86_64).
+- Use a build-push-clean pattern when building Docker images with limited disk (~45GB): build one image, push it, then clean up locally before building the next.
+- Colons in agent names (e.g., `module:ClassName`) break Docker volume mounts. Sanitize paths: replace `:` with `__`.
+
+### MCP Configuration (inside sandboxes)
+- `.mcp.json` must be placed at `$CLAUDE_CONFIG_DIR` (typically `/logs/agent/sessions/`), not `/app/` or `/root/`.
+- Claude Code requires the `--mcp-config` CLI flag to load MCP config -- it does not auto-detect.
+- Inject MCP usage instructions into the task prompt. Agents won't use MCP tools just because they're available.
+- Set `NODE_TLS_REJECT_UNAUTHORIZED=0` for Node.js SSL in Docker containers (curl working does not mean Node.js fetch will work).
+
+### Harbor Result Format
+- Timing fields (`started_at`, `finished_at`) live at the **top level** of `result.json`, not nested under `timing`.
+- `trajectory.json` is generated by Harbor's `_convert_events_to_trajectory()` post-processing, NOT by Claude Code CLI directly.
+- SWE-bench `test.sh` redirects stdout to a temp file -- Harbor never sees the parser's `START_TEST_OUTPUT`/`END_TEST_OUTPUT` markers via its normal capture.
+
+### Validation / Scoring
+- `validators.py` is duplicated across `ccb_build` tasks. Changes must be applied to **all copies** (verify with `sha256sum`).
+- Install scripts that print "INSTALL_SUCCESS" regardless of actual outcome are common. Always verify the binary exists and is executable.
+- An agent that completes in **<2 seconds** most likely never installed or ran (smoke test heuristic).
+
+### Git / Auth
+- `gh auth refresh` without `-s <scope>` is a no-op for adding scopes. Must use `gh auth refresh -h github.com -s write:packages` explicitly.
+- Environment variables must be **explicitly exported** for Harbor subprocesses. Use `set -a` before sourcing `.env.local`.
+- GitHub push protection blocks synthetic/fake API keys in test data. Use `git reset --soft origin/main` to squash intermediate commits that contained fake credentials.
+
 ## Maintenance
 - Root and local `AGENTS.md` / `CLAUDE.md` files are generated from sources in `docs/ops/`.
 - `docs/START_HERE_BY_TASK.md` is generated from `docs/ops/task_routes.json`.
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -51,6 +51,45 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
 - `docs/reference/README.md` - stable specs and reference docs
 - `docs/explanations/README.md` - rationale and context docs
 
+## Common Gotchas (from session history)
+
+### Documentation Generation
+- **NEVER edit root `CLAUDE.md` or `AGENTS.md` directly.** Edit canonical sources under `docs/ops/` and regenerate. Direct edits cause `agent_guides_drift` failures in `repo_health.py`.
+- After removing directories from the repo, also clean references from `scripts/sync_agent_guides.py` (`LOCAL_SOURCES`) and `scripts/docs_consistency_check.py` (`LOCAL_AGENT_TARGET_DIRS`).
+
+### Daytona / Harbor
+- Daytona builds images from Dockerfiles at sandbox creation time (`Image.from_dockerfile()`). Dockerfile fixes pushed to `main` take effect on the next run -- **no manual image rebuild needed**. Exception: pre-built GHCR base images must be rebuilt separately.
+- Harbor+Daytona (`harbor run --environment-type daytona`) is the recommended production approach. The standalone `scripts/daytona_runner.py` is for quick validation only.
+- Use `BASELINE_MCP_TYPE` env var to control MCP configuration: `none`, `sourcegraph`, `deepsearch`.
+- Prefer the Daytona SDK (`daytona_sdk`) over the CLI for sandbox interaction -- the CLI is interactive-only for SSH.
+- GHCR packages default to **private** for personal accounts and visibility cannot be changed via API. Use the GitHub web UI or push to an org.
+
+### Docker / Build
+- `uv tool install` segfaults on ARM64/QEMU emulation. Use `pip install` instead, or switch to Daytona (native x86_64).
+- Use a build-push-clean pattern when building Docker images with limited disk (~45GB): build one image, push it, then clean up locally before building the next.
+- Colons in agent names (e.g., `module:ClassName`) break Docker volume mounts. Sanitize paths: replace `:` with `__`.
+
+### MCP Configuration (inside sandboxes)
+- `.mcp.json` must be placed at `$CLAUDE_CONFIG_DIR` (typically `/logs/agent/sessions/`), not `/app/` or `/root/`.
+- Claude Code requires the `--mcp-config` CLI flag to load MCP config -- it does not auto-detect.
+- Inject MCP usage instructions into the task prompt. Agents won't use MCP tools just because they're available.
+- Set `NODE_TLS_REJECT_UNAUTHORIZED=0` for Node.js SSL in Docker containers (curl working does not mean Node.js fetch will work).
+
+### Harbor Result Format
+- Timing fields (`started_at`, `finished_at`) live at the **top level** of `result.json`, not nested under `timing`.
+- `trajectory.json` is generated by Harbor's `_convert_events_to_trajectory()` post-processing, NOT by Claude Code CLI directly.
+- SWE-bench `test.sh` redirects stdout to a temp file -- Harbor never sees the parser's `START_TEST_OUTPUT`/`END_TEST_OUTPUT` markers via its normal capture.
+
+### Validation / Scoring
+- `validators.py` is duplicated across `ccb_build` tasks. Changes must be applied to **all copies** (verify with `sha256sum`).
+- Install scripts that print "INSTALL_SUCCESS" regardless of actual outcome are common. Always verify the binary exists and is executable.
+- An agent that completes in **<2 seconds** most likely never installed or ran (smoke test heuristic).
+
+### Git / Auth
+- `gh auth refresh` without `-s <scope>` is a no-op for adding scopes. Must use `gh auth refresh -h github.com -s write:packages` explicitly.
+- Environment variables must be **explicitly exported** for Harbor subprocesses. Use `set -a` before sourcing `.env.local`.
+- GitHub push protection blocks synthetic/fake API keys in test data. Use `git reset --soft origin/main` to squash intermediate commits that contained fake credentials.
+
 ## Maintenance
 - Root and local `AGENTS.md` / `CLAUDE.md` files are generated from sources in `docs/ops/`.
 - `docs/START_HERE_BY_TASK.md` is generated from `docs/ops/task_routes.json`.
diff --git a/docs/ops/ROOT_AGENT_GUIDE.md b/docs/ops/ROOT_AGENT_GUIDE.md
@@ -51,6 +51,45 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
 - `docs/reference/README.md` - stable specs and reference docs
 - `docs/explanations/README.md` - rationale and context docs
 
+## Common Gotchas (from session history)
+
+### Documentation Generation
+- **NEVER edit root `CLAUDE.md` or `AGENTS.md` directly.** Edit canonical sources under `docs/ops/` and regenerate. Direct edits cause `agent_guides_drift` failures in `repo_health.py`.
+- After removing directories from the repo, also clean references from `scripts/sync_agent_guides.py` (`LOCAL_SOURCES`) and `scripts/docs_consistency_check.py` (`LOCAL_AGENT_TARGET_DIRS`).
+
+### Daytona / Harbor
+- Daytona builds images from Dockerfiles at sandbox creation time (`Image.from_dockerfile()`). Dockerfile fixes pushed to `main` take effect on the next run -- **no manual image rebuild needed**. Exception: pre-built GHCR base images must be rebuilt separately.
+- Harbor+Daytona (`harbor run --environment-type daytona`) is the recommended production approach. The standalone `scripts/daytona_runner.py` is for quick validation only.
+- Use `BASELINE_MCP_TYPE` env var to control MCP configuration: `none`, `sourcegraph`, `deepsearch`.
+- Prefer the Daytona SDK (`daytona_sdk`) over the CLI for sandbox interaction -- the CLI is interactive-only for SSH.
+- GHCR packages default to **private** for personal accounts and visibility cannot be changed via API. Use the GitHub web UI or push to an org.
+
+### Docker / Build
+- `uv tool install` segfaults on ARM64/QEMU emulation. Use `pip install` instead, or switch to Daytona (native x86_64).
+- Use a build-push-clean pattern when building Docker images with limited disk (~45GB): build one image, push it, then clean up locally before building the next.
+- Colons in agent names (e.g., `module:ClassName`) break Docker volume mounts. Sanitize paths: replace `:` with `__`.
+
+### MCP Configuration (inside sandboxes)
+- `.mcp.json` must be placed at `$CLAUDE_CONFIG_DIR` (typically `/logs/agent/sessions/`), not `/app/` or `/root/`.
+- Claude Code requires the `--mcp-config` CLI flag to load MCP config -- it does not auto-detect.
+- Inject MCP usage instructions into the task prompt. Agents won't use MCP tools just because they're available.
+- Set `NODE_TLS_REJECT_UNAUTHORIZED=0` for Node.js SSL in Docker containers (curl working does not mean Node.js fetch will work).
+
+### Harbor Result Format
+- Timing fields (`started_at`, `finished_at`) live at the **top level** of `result.json`, not nested under `timing`.
+- `trajectory.json` is generated by Harbor's `_convert_events_to_trajectory()` post-processing, NOT by Claude Code CLI directly.
+- SWE-bench `test.sh` redirects stdout to a temp file -- Harbor never sees the parser's `START_TEST_OUTPUT`/`END_TEST_OUTPUT` markers via its normal capture.
+
+### Validation / Scoring
+- `validators.py` is duplicated across `ccb_build` tasks. Changes must be applied to **all copies** (verify with `sha256sum`).
+- Install scripts that print "INSTALL_SUCCESS" regardless of actual outcome are common. Always verify the binary exists and is executable.
+- An agent that completes in **<2 seconds** most likely never installed or ran (smoke test heuristic).
+
+### Git / Auth
+- `gh auth refresh` without `-s <scope>` is a no-op for adding scopes. Must use `gh auth refresh -h github.com -s write:packages` explicitly.
+- Environment variables must be **explicitly exported** for Harbor subprocesses. Use `set -a` before sourcing `.env.local`.
+- GitHub push protection blocks synthetic/fake API keys in test data. Use `git reset --soft origin/main` to squash intermediate commits that contained fake credentials.
+
 ## Maintenance
 - Root and local `AGENTS.md` / `CLAUDE.md` files are generated from sources in `docs/ops/`.
 - `docs/START_HERE_BY_TASK.md` is generated from `docs/ops/task_routes.json`.
diff --git a/reports/nightly/2026-03-06-review.md b/reports/nightly/2026-03-06-review.md