feat: Daytona 125-parallel auto-detect + parallelism policy docs

sjarmak · claude · sjarmak · commit 6e7b136160a2 · 2026-03-01T18:07:30.000Z
- run_selected_tasks.sh: auto-detect HARBOR_ENV=daytona and set
  PARALLEL_TASKS=125 (Tier 3 limit: 125 concurrent sandboxes).
  Local Docker stays at 12 slots (3 accounts x 4 sessions).
- _common.sh: add Daytona transient error patterns and retry logic,
  fix token refresh output capture, add User-Agent header.
- Agent guides (source + generated): document parallelism policy:
  Daytona=125 concurrent, local Docker=12, never cap Daytona below 125.
  Update sweap-images count from 21 to 18.
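
A note on the token refresh fix listed above: in the earlier form, `refresh_claude_token` was piped into `sed` inside the `if`, so the branch was decided by the exit status of `sed` rather than of the refresh (unless `pipefail` is on, which the diff does not show). The sketch below reproduces that behaviour with a hypothetical `fake_refresh` stand-in. The function names here are illustrative and are not part of `_common.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for refresh_claude_token: prints a message and fails.
fake_refresh() {
    echo "refresh failed: invalid_grant"
    return 1
}

# Old shape: the `if` sees the exit status of `sed`, the last pipeline stage.
# Runs in a subshell with pipefail forced off so the demo is deterministic.
old_style() (
    set +o pipefail
    if fake_refresh 2>&1 | sed 's/^/    /'; then
        echo "reported: success"
    else
        echo "reported: failure"
    fi
)

# New shape: capture the output, record the status, then indent and branch.
# `|| rc=$?` records the failure and keeps this safe under `set -e`.
new_style() {
    local out rc=0
    out=$(fake_refresh 2>&1) || rc=$?
    echo "$out" | sed 's/^/    /'
    if [ "$rc" -eq 0 ]; then
        echo "reported: success"
    else
        echo "reported: failure"
    fi
}

old_style   # last line: "reported: success" (the failure is masked)
new_style   # last line: "reported: failure"
```

Capturing into a variable keeps the indented output while making the refresh status the thing that is tested.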

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/AGENTS.md b/AGENTS.md
@@ -8,15 +8,16 @@ full operations manual.
 - All work happens on `main`. Do not create feature branches.
 - Every `harbor run` must be gated by interactive confirmation.
 - Before commit/push, run `python3 scripts/repo_health.py` (or `--quick` for docs/config-only changes).
+- **Daytona is the default execution environment.** Do not use local Docker unless a task is Daytona-incompatible (18 sweap-images tasks). See `docs/DAYTONA.md`.
+- **Parallelism**: Daytona runs at 125 concurrent sandboxes (auto-detected when `HARBOR_ENV=daytona`). Local Docker runs at 12 slots (3 accounts x 4 sessions). Never artificially cap Daytona parallelism below 125.
 
 ## Minimal Loading Policy
 - Default load order: this file + one relevant skill + one relevant doc.
 - Do not open broad catalogs (`docs/TASK_CATALOG.md`, large script lists, full reports) unless required.
 - Prefer directory-local `AGENTS.md` / `CLAUDE.md` when working under `scripts/`, `configs/`, `tasks/`, or `docs/`.
 
 ## Fast Routing By Intent
-- Launch or rerun benchmarks: `docs/START_HERE_BY_TASK.md` -> "Launch / Rerun Benchmarks"
-- Run benchmarks on Daytona (cloud, no Docker needed): `docs/DAYTONA.md`
+- Launch or rerun benchmarks: `docs/DAYTONA.md` (Daytona, preferred) or `docs/START_HERE_BY_TASK.md`
 - Monitor / status: `docs/START_HERE_BY_TASK.md` -> "Monitor Active Runs"
 - Triage failures: `docs/START_HERE_BY_TASK.md` -> "Triage Failed Tasks"
 - Compare configs / MCP impact / IR: `docs/START_HERE_BY_TASK.md` -> "Analyze Results"
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -8,15 +8,16 @@ full operations manual.
 - All work happens on `main`. Do not create feature branches.
 - Every `harbor run` must be gated by interactive confirmation.
 - Before commit/push, run `python3 scripts/repo_health.py` (or `--quick` for docs/config-only changes).
+- **Daytona is the default execution environment.** Do not use local Docker unless a task is Daytona-incompatible (18 sweap-images tasks). See `docs/DAYTONA.md`.
+- **Parallelism**: Daytona runs at 125 concurrent sandboxes (auto-detected when `HARBOR_ENV=daytona`). Local Docker runs at 12 slots (3 accounts x 4 sessions). Never artificially cap Daytona parallelism below 125.
 
 ## Minimal Loading Policy
 - Default load order: this file + one relevant skill + one relevant doc.
 - Do not open broad catalogs (`docs/TASK_CATALOG.md`, large script lists, full reports) unless required.
 - Prefer directory-local `AGENTS.md` / `CLAUDE.md` when working under `scripts/`, `configs/`, `tasks/`, or `docs/`.
 
 ## Fast Routing By Intent
-- Launch or rerun benchmarks: `docs/START_HERE_BY_TASK.md` -> "Launch / Rerun Benchmarks"
-- Run benchmarks on Daytona (cloud, no Docker needed): `docs/DAYTONA.md`
+- Launch or rerun benchmarks: `docs/DAYTONA.md` (Daytona, preferred) or `docs/START_HERE_BY_TASK.md`
 - Monitor / status: `docs/START_HERE_BY_TASK.md` -> "Monitor Active Runs"
 - Triage failures: `docs/START_HERE_BY_TASK.md` -> "Triage Failed Tasks"
 - Compare configs / MCP impact / IR: `docs/START_HERE_BY_TASK.md` -> "Analyze Results"
diff --git a/configs/AGENTS.md b/configs/AGENTS.md
@@ -7,11 +7,16 @@ Use this file when working in `configs/` launchers and run orchestration wrapper
 - Do not reintroduce `--yes` for `configs/run_selected_tasks.sh`.
 - Validate config naming and paired-run semantics via shared helpers in `configs/_common.sh`.
 
+## Parallelism Policy
+- **Daytona (default)**: 125 concurrent sandboxes. `run_selected_tasks.sh` auto-detects `HARBOR_ENV=daytona` and sets `--parallel 125`. Daytona's Tier 3 limit is 125 concurrent sandboxes (250 vCPU / 2 per sandbox). The job pool queue (`_wait_for_slot`) ensures we never exceed 125 in-flight processes.
+- **Local Docker**: 12 concurrent slots (3 accounts x 4 sessions). Only for sweap-images tasks (9 ccb_debug + 9 ccb_fix) that cannot run on Daytona.
+- **Do NOT hardcode `--parallel`** unless you have a specific reason. Let `run_selected_tasks.sh` auto-detect from the environment.
+
 ## Navigation Rules
 - Start with `configs/_common.sh` for shared run policy and confirmation behavior.
 - Use `configs/run_selected_tasks.sh` for selected-task execution flows.
 - Use `configs/*_2config.sh` wrappers for paired baseline/MCP runs.
-- For Daytona cloud execution, add `--environment-type daytona` to `harbor run` commands. See `docs/DAYTONA.md` for prerequisites and capacity planning.
+- **Daytona is the default execution environment** — all production and variance runs use `HARBOR_ENV=daytona`. Local Docker is only for the 18 sweap-images tasks that are Daytona-incompatible. See `docs/DAYTONA.md` for prerequisites and capacity planning.
 
 ## When Editing
 - Preserve `confirm_launch()` gating behavior.
diff --git a/configs/CLAUDE.md b/configs/CLAUDE.md
@@ -7,11 +7,16 @@ Use this file when working in `configs/` launchers and run orchestration wrapper
 - Do not reintroduce `--yes` for `configs/run_selected_tasks.sh`.
 - Validate config naming and paired-run semantics via shared helpers in `configs/_common.sh`.
 
+## Parallelism Policy
+- **Daytona (default)**: 125 concurrent sandboxes. `run_selected_tasks.sh` auto-detects `HARBOR_ENV=daytona` and sets `--parallel 125`. Daytona's Tier 3 limit is 125 concurrent sandboxes (250 vCPU / 2 per sandbox). The job pool queue (`_wait_for_slot`) ensures we never exceed 125 in-flight processes.
+- **Local Docker**: 12 concurrent slots (3 accounts x 4 sessions). Only for sweap-images tasks (9 ccb_debug + 9 ccb_fix) that cannot run on Daytona.
+- **Do NOT hardcode `--parallel`** unless you have a specific reason. Let `run_selected_tasks.sh` auto-detect from the environment.
+
 ## Navigation Rules
 - Start with `configs/_common.sh` for shared run policy and confirmation behavior.
 - Use `configs/run_selected_tasks.sh` for selected-task execution flows.
 - Use `configs/*_2config.sh` wrappers for paired baseline/MCP runs.
-- For Daytona cloud execution, add `--environment-type daytona` to `harbor run` commands. See `docs/DAYTONA.md` for prerequisites and capacity planning.
+- **Daytona is the default execution environment** — all production and variance runs use `HARBOR_ENV=daytona`. Local Docker is only for the 18 sweap-images tasks that are Daytona-incompatible. See `docs/DAYTONA.md` for prerequisites and capacity planning.
 
 ## When Editing
 - Preserve `confirm_launch()` gating behavior.
diff --git a/configs/_common.sh b/configs/_common.sh
@@ -300,7 +300,7 @@ payload = json.dumps({
 req = urllib.request.Request(
     "https://console.anthropic.com/api/oauth/token",
     data=payload,
-    headers={"Content-Type": "application/json"},
+    headers={"Content-Type": "application/json", "User-Agent": "ccb-token-refresh/1.0"},
     method="POST"
 )
 
@@ -525,7 +525,11 @@ TOKCHK
         expiring)
             # Token is expired or expiring soon — try refresh
             echo "    Token expiring soon — attempting refresh..."
-            if HOME="$account_home" refresh_claude_token 2>&1 | sed 's/^/    /'; then
+            local _refresh_out
+            _refresh_out=$(HOME="$account_home" refresh_claude_token 2>&1)
+            local _refresh_rc=$?
+            echo "$_refresh_out" | sed 's/^/    /'
+            if [ "$_refresh_rc" -eq 0 ]; then
                 echo "    Token refreshed successfully"
                 return 0
             else
@@ -656,6 +660,13 @@ ensure_fresh_token_all() {
 # If a failed task's log matches any of these, it's eligible for retry on a different account.
 RATE_LIMIT_PATTERNS="rate.limit|429|too many requests|throttl|overloaded|token.*refresh.*fail|credentials.*expired|403.*Forbidden|capacity|resource_exhausted"
 
+# Daytona-specific transient error patterns (sandbox resource contention).
+# These are retried on the SAME account (not an auth issue) with backoff.
+DAYTONA_TRANSIENT_PATTERNS="[Ss]andbox not found|[Ss]andbox.*missing|[Ss]andbox.*does not exist|DaytonaError"
+
+# Maximum number of Daytona retry attempts per task (linear backoff: 15s, 30s, 45s).
+DAYTONA_MAX_RETRIES=${DAYTONA_MAX_RETRIES:-3}
+
 # Check if a task failure looks like a rate-limit / account-exhaustion error.
 # Args: $1 = task_id, $2 = log directory (where ${task_id}.log might be)
 # Returns 0 if rate-limited, 1 otherwise.
@@ -683,6 +694,32 @@ _is_rate_limited() {
     return 1
 }
 
+# Check if a task failure looks like a transient Daytona sandbox error.
+# These are NOT account-related — retry on the same account after a delay.
+# Args: $1 = task_id, $2 = log directory
+# Returns 0 if Daytona transient error, 1 otherwise.
+_is_daytona_transient() {
+    local task_id=$1
+    local log_dir=$2
+
+    local log_file="${log_dir}/${task_id}.log"
+    if [ -f "$log_file" ]; then
+        if grep -qEi "$DAYTONA_TRANSIENT_PATTERNS" "$log_file" 2>/dev/null; then
+            return 0
+        fi
+    fi
+
+    local result_files
+    result_files=$(find "$log_dir" -name "result.json" -newer "$log_dir" -path "*${task_id}*" 2>/dev/null || true)
+    for rf in $result_files; do
+        if grep -qEi "$DAYTONA_TRANSIENT_PATTERNS" "$rf" 2>/dev/null; then
+            return 0
+        fi
+    done
+
+    return 1
+}
+
 # Pick a different account home than the one that failed.
 # Args: $1 = failed account home
 # Prints the alternate account home, or empty if only one account.
@@ -715,11 +752,16 @@ run_tasks_parallel() {
     local account_idx=0
     local num_accounts=${#CLAUDE_HOMES[@]}
 
-    # Retry queue: tasks to retry on a different account
+    # Retry queue: tasks to retry on a different account (rate-limit)
     local retry_tasks=()
     local retry_homes=()
     # Track which tasks already retried (prevent infinite loops)
     declare -A _retried
+    # Daytona retry queue: tasks to retry on same account after backoff
+    local daytona_retry_tasks=()
+    local daytona_retry_homes=()
+    # Track Daytona retry counts per task (up to DAYTONA_MAX_RETRIES)
+    declare -A _daytona_retry_count
 
     # Infer log directory from the calling script's jobs_subdir variable (if set)
     local _log_dir="${jobs_subdir:-}"
@@ -772,6 +814,21 @@ run_tasks_parallel() {
                             echo "WARNING: Task $_task rate-limited but no alternate account available"
                             failed=1
                         fi
+                    # Check if this is a Daytona transient error (sandbox not found)
+                    elif [ -n "$_log_dir" ] && \
+                         _is_daytona_transient "$_task" "$_log_dir"; then
+                        local _count="${_daytona_retry_count[$_task]:-0}"
+                        _count=$((_count + 1))
+                        if [ "$_count" -le "$DAYTONA_MAX_RETRIES" ]; then
+                            local _backoff=$(( 15 * _count ))  # 15s, 30s, 45s
+                            echo "DAYTONA RETRY ($_count/$DAYTONA_MAX_RETRIES): Task $_task sandbox error, retrying in ${_backoff}s on same account"
+                            _daytona_retry_count[$_task]=$_count
+                            daytona_retry_tasks+=("$_task")
+                            daytona_retry_homes+=("$_home")
+                        else
+                            echo "DAYTONA EXHAUSTED: Task $_task failed after $DAYTONA_MAX_RETRIES retries"
+                            failed=1
+                        fi
                     else
                         echo "ERROR: Task $_task (PID $done_pid) exited with code $_exit"
                         failed=1
@@ -833,8 +890,8 @@ run_tasks_parallel() {
         _launch "$task_id" "$task_home"
     done
 
-    # Wait for remaining tasks, then process retry queue
-    while [ ${#pids[@]} -gt 0 ] || [ ${#retry_tasks[@]} -gt 0 ]; do
+    # Wait for remaining tasks, then process retry queues
+    while [ ${#pids[@]} -gt 0 ] || [ ${#retry_tasks[@]} -gt 0 ] || [ ${#daytona_retry_tasks[@]} -gt 0 ]; do
         if [ "$abort" = true ]; then break; fi
 
         # Drain running PIDs
@@ -846,7 +903,7 @@ run_tasks_parallel() {
             fi
         done
 
-        # Launch any queued retries
+        # Launch any queued rate-limit retries (different account)
         if [ ${#retry_tasks[@]} -gt 0 ]; then
             echo "Processing ${#retry_tasks[@]} rate-limit retry task(s)..."
             for ri in "${!retry_tasks[@]}"; do
@@ -863,6 +920,29 @@ run_tasks_parallel() {
             retry_tasks=()
             retry_homes=()
         fi
+
+        # Launch any queued Daytona retries (same account, with backoff)
+        if [ ${#daytona_retry_tasks[@]} -gt 0 ]; then
+            local _dt_count=${#daytona_retry_tasks[@]}
+            echo "Processing $_dt_count Daytona retry task(s) with backoff..."
+            for ri in "${!daytona_retry_tasks[@]}"; do
+                if [ "$abort" = true ]; then break; fi
+                local _task="${daytona_retry_tasks[$ri]}"
+                local _backoff=$(( 15 * ${_daytona_retry_count[$_task]:-1} ))
+                echo "  Waiting ${_backoff}s before retrying $_task..."
+                sleep "$_backoff"
+                while [ ${#pids[@]} -ge $PARALLEL_JOBS ]; do
+                    _reap_one
+                    if [ "$abort" = true ]; then break 2; fi
+                    if [ -z "$done_pid" ]; then
+                        sleep 2
+                    fi
+                done
+                _launch "${daytona_retry_tasks[$ri]}" "${daytona_retry_homes[$ri]}"
+            done
+            daytona_retry_tasks=()
+            daytona_retry_homes=()
+        fi
     done
 
     # Restore real HOME
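
The Daytona retry path in the `_common.sh` hunks above combines two pieces: a pattern match that classifies a failure as transient, and a per-attempt delay of `15 * attempt` seconds, which grows linearly and is capped by `DAYTONA_MAX_RETRIES`. The sketch below isolates both. `is_transient_line` and `backoff_for_attempt` are hypothetical helpers written for illustration; the real code matches against log files and `result.json` rather than a single string.

```shell
#!/usr/bin/env bash
# Pattern copied from the diff; retry budget fixed at the diff's default of 3.
DAYTONA_TRANSIENT_PATTERNS="[Ss]andbox not found|[Ss]andbox.*missing|[Ss]andbox.*does not exist|DaytonaError"
DAYTONA_MAX_RETRIES=3

# Hypothetical helper: classify one log line instead of scanning a log file.
# Uses the same grep flags as _is_daytona_transient (-E extended, -i ignore case).
is_transient_line() {
    grep -qEi "$DAYTONA_TRANSIENT_PATTERNS" <<< "$1"
}

# Hypothetical helper: print the delay in seconds for attempt N,
# or return 1 once the retry budget is spent.
backoff_for_attempt() {
    local attempt=$1
    if [ "$attempt" -le "$DAYTONA_MAX_RETRIES" ]; then
        echo $(( 15 * attempt ))
    else
        return 1
    fi
}

for n in 1 2 3 4; do
    if delay=$(backoff_for_attempt "$n"); then
        echo "attempt $n: wait ${delay}s"
    else
        echo "attempt $n: exhausted"
    fi
done
# attempt 1: wait 15s
# attempt 2: wait 30s
# attempt 3: wait 45s
# attempt 4: exhausted
```

Because the retry loop in the diff sleeps inside the `for` over queued tasks, the delays for several queued tasks add up rather than overlap.
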
diff --git a/configs/run_selected_tasks.sh b/configs/run_selected_tasks.sh
@@ -185,10 +185,16 @@ fi
 
 ensure_fresh_token_all  # also populates CLAUDE_HOMES[] via setup_multi_accounts
 
-# Auto-detect PARALLEL_TASKS from account count when not explicitly set via --parallel
+# Auto-detect PARALLEL_TASKS: Daytona supports 125 concurrent sandboxes,
+# local Docker is limited by account sessions.
 if [ "$PARALLEL_TASKS" -eq 0 ]; then
-    PARALLEL_TASKS=$PARALLEL_JOBS  # inherits SESSIONS_PER_ACCOUNT * num_accounts from _common.sh
-    echo "Parallel tasks auto-set to $PARALLEL_TASKS (from $SESSIONS_PER_ACCOUNT sessions x ${#CLAUDE_HOMES[@]} accounts)"
+    if [ "${HARBOR_ENV:-}" = "daytona" ]; then
+        PARALLEL_TASKS=125
+        echo "Parallel tasks auto-set to $PARALLEL_TASKS (Daytona mode, 125 concurrent sandboxes)"
+    else
+        PARALLEL_TASKS=$PARALLEL_JOBS  # inherits SESSIONS_PER_ACCOUNT * num_accounts from _common.sh
+        echo "Parallel tasks auto-set to $PARALLEL_TASKS (local Docker, $SESSIONS_PER_ACCOUNT sessions x ${#CLAUDE_HOMES[@]} accounts)"
+    fi
 fi
 
 # Derive baseline config and mcp_type values from FULL_CONFIG
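
The auto-detect branch above can be read as a three-way precedence: an explicit `--parallel` value wins, then Daytona mode, then the local Docker slot count. The sketch below restates that as a pure function so the arithmetic from the guides (250 vCPU at 2 per sandbox, and 3 accounts x 4 sessions) is easy to check. `resolve_parallel_tasks` is a hypothetical name and is not defined in `run_selected_tasks.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the auto-detect branch.
# Args: $1 = explicit --parallel value (0 means not set)
#       $2 = HARBOR_ENV value
#       $3 = sessions per account
#       $4 = number of accounts
resolve_parallel_tasks() {
    local explicit=$1 env=$2 sessions=$3 accounts=$4
    if [ "$explicit" -ne 0 ]; then
        echo "$explicit"                  # explicit value is left untouched
    elif [ "$env" = "daytona" ]; then
        echo $(( 250 / 2 ))               # Tier 3: 250 vCPU, 2 vCPU per sandbox
    else
        echo $(( sessions * accounts ))   # local Docker slots
    fi
}

resolve_parallel_tasks 0 daytona 4 3   # 125
resolve_parallel_tasks 0 ""      4 3   # 12
resolve_parallel_tasks 8 daytona 4 3   # 8
```

The last call shows why the guides advise against hardcoding `--parallel`: any non-zero value bypasses the Daytona default.
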
diff --git a/docs/ops/ROOT_AGENT_GUIDE.md b/docs/ops/ROOT_AGENT_GUIDE.md
@@ -8,15 +8,16 @@ full operations manual.
 - All work happens on `main`. Do not create feature branches.
 - Every `harbor run` must be gated by interactive confirmation.
 - Before commit/push, run `python3 scripts/repo_health.py` (or `--quick` for docs/config-only changes).
+- **Daytona is the default execution environment.** Do not use local Docker unless a task is Daytona-incompatible (18 sweap-images tasks). See `docs/DAYTONA.md`.
+- **Parallelism**: Daytona runs at 125 concurrent sandboxes (auto-detected when `HARBOR_ENV=daytona`). Local Docker runs at 12 slots (3 accounts x 4 sessions). Never artificially cap Daytona parallelism below 125.
 
 ## Minimal Loading Policy
 - Default load order: this file + one relevant skill + one relevant doc.
 - Do not open broad catalogs (`docs/TASK_CATALOG.md`, large script lists, full reports) unless required.
 - Prefer directory-local `AGENTS.md` / `CLAUDE.md` when working under `scripts/`, `configs/`, `tasks/`, or `docs/`.
 
 ## Fast Routing By Intent
-- Launch or rerun benchmarks: `docs/START_HERE_BY_TASK.md` -> "Launch / Rerun Benchmarks"
-- Run benchmarks on Daytona (cloud, no Docker needed): `docs/DAYTONA.md`
+- Launch or rerun benchmarks: `docs/DAYTONA.md` (Daytona, preferred) or `docs/START_HERE_BY_TASK.md`
 - Monitor / status: `docs/START_HERE_BY_TASK.md` -> "Monitor Active Runs"
 - Triage failures: `docs/START_HERE_BY_TASK.md` -> "Triage Failed Tasks"
 - Compare configs / MCP impact / IR: `docs/START_HERE_BY_TASK.md` -> "Analyze Results"
diff --git a/docs/ops/local_guides/configs.md b/docs/ops/local_guides/configs.md
@@ -7,11 +7,16 @@ Use this file when working in `configs/` launchers and run orchestration wrapper
 - Do not reintroduce `--yes` for `configs/run_selected_tasks.sh`.
 - Validate config naming and paired-run semantics via shared helpers in `configs/_common.sh`.
 
+## Parallelism Policy
+- **Daytona (default)**: 125 concurrent sandboxes. `run_selected_tasks.sh` auto-detects `HARBOR_ENV=daytona` and sets `--parallel 125`. Daytona's Tier 3 limit is 125 concurrent sandboxes (250 vCPU / 2 per sandbox). The job pool queue (`_wait_for_slot`) ensures we never exceed 125 in-flight processes.
+- **Local Docker**: 12 concurrent slots (3 accounts x 4 sessions). Only for sweap-images tasks (9 ccb_debug + 9 ccb_fix) that cannot run on Daytona.
+- **Do NOT hardcode `--parallel`** unless you have a specific reason. Let `run_selected_tasks.sh` auto-detect from the environment.
+
 ## Navigation Rules
 - Start with `configs/_common.sh` for shared run policy and confirmation behavior.
 - Use `configs/run_selected_tasks.sh` for selected-task execution flows.
 - Use `configs/*_2config.sh` wrappers for paired baseline/MCP runs.
-- For Daytona cloud execution, add `--environment-type daytona` to `harbor run` commands. See `docs/DAYTONA.md` for prerequisites and capacity planning.
+- **Daytona is the default execution environment** — all production and variance runs use `HARBOR_ENV=daytona`. Local Docker is only for the 18 sweap-images tasks that are Daytona-incompatible. See `docs/DAYTONA.md` for prerequisites and capacity planning.
 
 ## When Editing
 - Preserve `confirm_launch()` gating behavior.