Commit 6e7b136

sjarmak and claude committed
feat: Daytona 125-parallel auto-detect + parallelism policy docs

- run_selected_tasks.sh: auto-detect HARBOR_ENV=daytona and set PARALLEL_TASKS=125 (Tier 3 limit: 125 concurrent sandboxes). Local Docker stays at 12 slots (3 accounts x 4 sessions).
- _common.sh: add Daytona transient error patterns and retry logic, fix token refresh output capture, add User-Agent header.
- Agent guides (source + generated): document parallelism policy — Daytona=125 concurrent, local Docker=12, never cap below 125. Updated sweap-images count from 21 to 18.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent cfe7081 commit 6e7b136

8 files changed: +122 -18 lines changed

AGENTS.md

Lines changed: 3 additions & 2 deletions
@@ -8,15 +8,16 @@ full operations manual.
 - All work happens on `main`. Do not create feature branches.
 - Every `harbor run` must be gated by interactive confirmation.
 - Before commit/push, run `python3 scripts/repo_health.py` (or `--quick` for docs/config-only changes).
+- **Daytona is the default execution environment.** Do not use local Docker unless a task is Daytona-incompatible (18 sweap-images tasks). See `docs/DAYTONA.md`.
+- **Parallelism**: Daytona runs at 125 concurrent sandboxes (auto-detected when `HARBOR_ENV=daytona`). Local Docker runs at 12 slots (3 accounts x 4 sessions). Never artificially cap Daytona parallelism below 125.
 
 ## Minimal Loading Policy
 - Default load order: this file + one relevant skill + one relevant doc.
 - Do not open broad catalogs (`docs/TASK_CATALOG.md`, large script lists, full reports) unless required.
 - Prefer directory-local `AGENTS.md` / `CLAUDE.md` when working under `scripts/`, `configs/`, `tasks/`, or `docs/`.
 
 ## Fast Routing By Intent
-- Launch or rerun benchmarks: `docs/START_HERE_BY_TASK.md` -> "Launch / Rerun Benchmarks"
-- Run benchmarks on Daytona (cloud, no Docker needed): `docs/DAYTONA.md`
+- Launch or rerun benchmarks: `docs/DAYTONA.md` (Daytona, preferred) or `docs/START_HERE_BY_TASK.md`
 - Monitor / status: `docs/START_HERE_BY_TASK.md` -> "Monitor Active Runs"
 - Triage failures: `docs/START_HERE_BY_TASK.md` -> "Triage Failed Tasks"
 - Compare configs / MCP impact / IR: `docs/START_HERE_BY_TASK.md` -> "Analyze Results"

CLAUDE.md

Lines changed: 3 additions & 2 deletions
@@ -8,15 +8,16 @@ full operations manual.
 - All work happens on `main`. Do not create feature branches.
 - Every `harbor run` must be gated by interactive confirmation.
 - Before commit/push, run `python3 scripts/repo_health.py` (or `--quick` for docs/config-only changes).
+- **Daytona is the default execution environment.** Do not use local Docker unless a task is Daytona-incompatible (18 sweap-images tasks). See `docs/DAYTONA.md`.
+- **Parallelism**: Daytona runs at 125 concurrent sandboxes (auto-detected when `HARBOR_ENV=daytona`). Local Docker runs at 12 slots (3 accounts x 4 sessions). Never artificially cap Daytona parallelism below 125.
 
 ## Minimal Loading Policy
 - Default load order: this file + one relevant skill + one relevant doc.
 - Do not open broad catalogs (`docs/TASK_CATALOG.md`, large script lists, full reports) unless required.
 - Prefer directory-local `AGENTS.md` / `CLAUDE.md` when working under `scripts/`, `configs/`, `tasks/`, or `docs/`.
 
 ## Fast Routing By Intent
-- Launch or rerun benchmarks: `docs/START_HERE_BY_TASK.md` -> "Launch / Rerun Benchmarks"
-- Run benchmarks on Daytona (cloud, no Docker needed): `docs/DAYTONA.md`
+- Launch or rerun benchmarks: `docs/DAYTONA.md` (Daytona, preferred) or `docs/START_HERE_BY_TASK.md`
 - Monitor / status: `docs/START_HERE_BY_TASK.md` -> "Monitor Active Runs"
 - Triage failures: `docs/START_HERE_BY_TASK.md` -> "Triage Failed Tasks"
 - Compare configs / MCP impact / IR: `docs/START_HERE_BY_TASK.md` -> "Analyze Results"

configs/AGENTS.md

Lines changed: 6 additions & 1 deletion
@@ -7,11 +7,16 @@ Use this file when working in `configs/` launchers and run orchestration wrapper
 - Do not reintroduce `--yes` for `configs/run_selected_tasks.sh`.
 - Validate config naming and paired-run semantics via shared helpers in `configs/_common.sh`.
 
+## Parallelism Policy
+- **Daytona (default)**: 125 concurrent sandboxes. `run_selected_tasks.sh` auto-detects `HARBOR_ENV=daytona` and sets `--parallel 125`. Daytona's Tier 3 limit is 125 concurrent sandboxes (250 vCPU / 2 per sandbox). The job pool queue (`_wait_for_slot`) ensures we never exceed 125 in-flight processes.
+- **Local Docker**: 12 concurrent slots (3 accounts x 4 sessions). Only for sweap-images tasks (9 ccb_debug + 9 ccb_fix) that cannot run on Daytona.
+- **Do NOT hardcode `--parallel`** unless you have a specific reason. Let `run_selected_tasks.sh` auto-detect from the environment.
+
 ## Navigation Rules
 - Start with `configs/_common.sh` for shared run policy and confirmation behavior.
 - Use `configs/run_selected_tasks.sh` for selected-task execution flows.
 - Use `configs/*_2config.sh` wrappers for paired baseline/MCP runs.
-- For Daytona cloud execution, add `--environment-type daytona` to `harbor run` commands. See `docs/DAYTONA.md` for prerequisites and capacity planning.
+- **Daytona is the default execution environment** — all production and variance runs use `HARBOR_ENV=daytona`. Local Docker is only for the 18 sweap-images tasks that are Daytona-incompatible. See `docs/DAYTONA.md` for prerequisites and capacity planning.
 
 ## When Editing
 - Preserve `confirm_launch()` gating behavior.
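
The 125 figure in the Parallelism Policy follows from the Tier 3 budget it quotes: 250 vCPU at 2 vCPU per sandbox. A minimal sketch of that arithmetic; `max_sandboxes` is a hypothetical helper for illustration and is not part of `configs/_common.sh`.

```shell
#!/bin/sh
# Hypothetical helper (not in the repo): number of sandboxes that fit
# in a vCPU budget, using integer division.
max_sandboxes() {
    vcpu_budget=$1
    vcpu_per_sandbox=$2
    echo $(( vcpu_budget / vcpu_per_sandbox ))
}

# Tier 3 numbers quoted in the policy text: 250 vCPU, 2 vCPU per sandbox.
max_sandboxes 250 2   # prints 125
```

If the tier or the per-sandbox vCPU size changes, the same division gives the new ceiling.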

configs/CLAUDE.md

Lines changed: 6 additions & 1 deletion
@@ -7,11 +7,16 @@ Use this file when working in `configs/` launchers and run orchestration wrapper
 - Do not reintroduce `--yes` for `configs/run_selected_tasks.sh`.
 - Validate config naming and paired-run semantics via shared helpers in `configs/_common.sh`.
 
+## Parallelism Policy
+- **Daytona (default)**: 125 concurrent sandboxes. `run_selected_tasks.sh` auto-detects `HARBOR_ENV=daytona` and sets `--parallel 125`. Daytona's Tier 3 limit is 125 concurrent sandboxes (250 vCPU / 2 per sandbox). The job pool queue (`_wait_for_slot`) ensures we never exceed 125 in-flight processes.
+- **Local Docker**: 12 concurrent slots (3 accounts x 4 sessions). Only for sweap-images tasks (9 ccb_debug + 9 ccb_fix) that cannot run on Daytona.
+- **Do NOT hardcode `--parallel`** unless you have a specific reason. Let `run_selected_tasks.sh` auto-detect from the environment.
+
 ## Navigation Rules
 - Start with `configs/_common.sh` for shared run policy and confirmation behavior.
 - Use `configs/run_selected_tasks.sh` for selected-task execution flows.
 - Use `configs/*_2config.sh` wrappers for paired baseline/MCP runs.
-- For Daytona cloud execution, add `--environment-type daytona` to `harbor run` commands. See `docs/DAYTONA.md` for prerequisites and capacity planning.
+- **Daytona is the default execution environment** — all production and variance runs use `HARBOR_ENV=daytona`. Local Docker is only for the 18 sweap-images tasks that are Daytona-incompatible. See `docs/DAYTONA.md` for prerequisites and capacity planning.
 
 ## When Editing
 - Preserve `confirm_launch()` gating behavior.

configs/_common.sh

Lines changed: 86 additions & 6 deletions
@@ -300,7 +300,7 @@ payload = json.dumps({
 req = urllib.request.Request(
     "https://console.anthropic.com/api/oauth/token",
     data=payload,
-    headers={"Content-Type": "application/json"},
+    headers={"Content-Type": "application/json", "User-Agent": "ccb-token-refresh/1.0"},
     method="POST"
 )
 
@@ -525,7 +525,11 @@ TOKCHK
         expiring)
             # Token is expired or expiring soon — try refresh
             echo " Token expiring soon — attempting refresh..."
-            if HOME="$account_home" refresh_claude_token 2>&1 | sed 's/^/ /'; then
+            local _refresh_out
+            _refresh_out=$(HOME="$account_home" refresh_claude_token 2>&1)
+            local _refresh_rc=$?
+            echo "$_refresh_out" | sed 's/^/ /'
+            if [ "$_refresh_rc" -eq 0 ]; then
                 echo " Token refreshed successfully"
                 return 0
             else
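
The removed line tested the exit status of a pipeline. Without `pipefail`, that is the status of the last command (`sed`), so a failed refresh could still take the success branch. The sketch below reproduces both shapes. `fake_refresh` is a hypothetical stand-in for `refresh_claude_token`, and the `|| rc=$?` form is a small variant of the committed code that also survives `set -e`.

```shell
#!/bin/sh
# Hypothetical stand-in for refresh_claude_token: prints a message and fails.
fake_refresh() {
    echo "refresh failed"
    return 1
}

# Old shape: `if` sees sed's exit status (0), not fake_refresh's.
# Assumes pipefail is not set, which is the shell default.
old_shape() {
    if fake_refresh 2>&1 | sed 's/^/  /' >/dev/null; then
        echo "success branch"
    else
        echo "failure branch"
    fi
}

# New shape: capture output and exit status first, then indent the output.
new_shape() {
    rc=0
    out=$(fake_refresh 2>&1) || rc=$?
    echo "$out" | sed 's/^/  /' >/dev/null
    if [ "$rc" -eq 0 ]; then
        echo "success branch"
    else
        echo "failure branch"
    fi
}

old_shape   # prints "success branch" even though the refresh failed
new_shape   # prints "failure branch"
```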
@@ -656,6 +660,13 @@ ensure_fresh_token_all() {
 # If a failed task's log matches any of these, it's eligible for retry on a different account.
 RATE_LIMIT_PATTERNS="rate.limit|429|too many requests|throttl|overloaded|token.*refresh.*fail|credentials.*expired|403.*Forbidden|capacity|resource_exhausted"
 
+# Daytona-specific transient error patterns (sandbox resource contention).
+# These are retried on the SAME account (not an auth issue) with backoff.
+DAYTONA_TRANSIENT_PATTERNS="[Ss]andbox not found|[Ss]andbox.*missing|[Ss]andbox.*does not exist|DaytonaError"
+
+# Maximum number of Daytona retry attempts per task (with exponential backoff).
+DAYTONA_MAX_RETRIES=${DAYTONA_MAX_RETRIES:-3}
+
 # Check if a task failure looks like a rate-limit / account-exhaustion error.
 # Args: $1 = task_id, $2 = log directory (where ${task_id}.log might be)
 # Returns 0 if rate-limited, 1 otherwise.
@@ -683,6 +694,32 @@ _is_rate_limited() {
     return 1
 }
 
+# Check if a task failure looks like a transient Daytona sandbox error.
+# These are NOT account-related — retry on the same account after a delay.
+# Args: $1 = task_id, $2 = log directory
+# Returns 0 if Daytona transient error, 1 otherwise.
+_is_daytona_transient() {
+    local task_id=$1
+    local log_dir=$2
+
+    local log_file="${log_dir}/${task_id}.log"
+    if [ -f "$log_file" ]; then
+        if grep -qEi "$DAYTONA_TRANSIENT_PATTERNS" "$log_file" 2>/dev/null; then
+            return 0
+        fi
+    fi
+
+    local result_files
+    result_files=$(find "$log_dir" -name "result.json" -newer "$log_dir" -path "*${task_id}*" 2>/dev/null || true)
+    for rf in $result_files; do
+        if grep -qEi "$DAYTONA_TRANSIENT_PATTERNS" "$rf" 2>/dev/null; then
+            return 0
+        fi
+    done
+
+    return 1
+}
+
 # Pick a different account home than the one that failed.
 # Args: $1 = failed account home
 # Prints the alternate account home, or empty if only one account.
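
To see which log lines `_is_daytona_transient` would treat as transient, the same pattern can be applied to individual strings. This is a sketch only: `is_transient_line` is a hypothetical helper and the sample log lines are made up for illustration. Because `grep` is called with `-i`, the match is case-insensitive, so the `[Ss]` classes in the pattern are redundant but harmless.

```shell
#!/bin/sh
# Pattern copied from the diff above.
DAYTONA_TRANSIENT_PATTERNS="[Ss]andbox not found|[Ss]andbox.*missing|[Ss]andbox.*does not exist|DaytonaError"

# Hypothetical helper: classify one log line instead of a whole log file.
is_transient_line() {
    printf '%s\n' "$1" | grep -qEi "$DAYTONA_TRANSIENT_PATTERNS"
}

# Invented sample lines; real Daytona messages may differ.
for line in \
    "Sandbox not found: abc123" \
    "daytonaerror: upstream timeout" \
    "sandbox abc123 does not exist" \
    "rate limit exceeded (429)"
do
    if is_transient_line "$line"; then
        echo "transient: $line"
    else
        echo "other:     $line"
    fi
done
```

The last sample matches `RATE_LIMIT_PATTERNS` rather than the Daytona pattern, which is the case the commit routes to a different account instead of a same-account retry.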
@@ -715,11 +752,16 @@ run_tasks_parallel() {
     local account_idx=0
     local num_accounts=${#CLAUDE_HOMES[@]}
 
-    # Retry queue: tasks to retry on a different account
+    # Retry queue: tasks to retry on a different account (rate-limit)
     local retry_tasks=()
     local retry_homes=()
     # Track which tasks already retried (prevent infinite loops)
     declare -A _retried
+    # Daytona retry queue: tasks to retry on same account after backoff
+    local daytona_retry_tasks=()
+    local daytona_retry_homes=()
+    # Track Daytona retry counts per task (up to DAYTONA_MAX_RETRIES)
+    declare -A _daytona_retry_count
 
     # Infer log directory from the calling script's jobs_subdir variable (if set)
     local _log_dir="${jobs_subdir:-}"
@@ -772,6 +814,21 @@ run_tasks_parallel() {
                     echo "WARNING: Task $_task rate-limited but no alternate account available"
                     failed=1
                 fi
+            # Check if this is a Daytona transient error (sandbox not found)
+            elif [ -n "$_log_dir" ] && \
+                 _is_daytona_transient "$_task" "$_log_dir"; then
+                local _count="${_daytona_retry_count[$_task]:-0}"
+                _count=$((_count + 1))
+                if [ "$_count" -le "$DAYTONA_MAX_RETRIES" ]; then
+                    local _backoff=$(( 15 * _count )) # 15s, 30s, 45s
+                    echo "DAYTONA RETRY ($_count/$DAYTONA_MAX_RETRIES): Task $_task sandbox error, retrying in ${_backoff}s on same account"
+                    _daytona_retry_count[$_task]=$_count
+                    daytona_retry_tasks+=("$_task")
+                    daytona_retry_homes+=("$_home")
+                else
+                    echo "DAYTONA EXHAUSTED: Task $_task failed after $DAYTONA_MAX_RETRIES retries"
+                    failed=1
+                fi
             else
                 echo "ERROR: Task $_task (PID $done_pid) exited with code $_exit"
                 failed=1
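
The retry budget and delay in this hunk can be read in isolation. The delay is `15 * _count`, which grows linearly (15s, 30s, 45s with the default of 3 retries), although the comment on `DAYTONA_MAX_RETRIES` describes it as exponential. A minimal sketch of that schedule, where `retry_delay` is a hypothetical helper and not a function in `_common.sh`:

```shell
#!/bin/sh
# Same default as the diff: overridable from the environment.
DAYTONA_MAX_RETRIES=${DAYTONA_MAX_RETRIES:-3}

# Hypothetical helper: print the delay in seconds for a 1-based attempt
# number, or return 1 once the retry budget is spent.
retry_delay() {
    attempt=$1
    if [ "$attempt" -le "$DAYTONA_MAX_RETRIES" ]; then
        echo $(( 15 * attempt ))
    else
        return 1
    fi
}

# Walk the schedule until the budget is exhausted.
attempt=1
while delay=$(retry_delay "$attempt"); do
    echo "attempt $attempt: wait ${delay}s"
    attempt=$(( attempt + 1 ))
done
echo "exhausted after $DAYTONA_MAX_RETRIES retries"
```

A truly exponential schedule would double the delay on each attempt instead of adding 15 seconds; the committed code does not do that.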
@@ -833,8 +890,8 @@ run_tasks_parallel() {
         _launch "$task_id" "$task_home"
     done
 
-    # Wait for remaining tasks, then process retry queue
-    while [ ${#pids[@]} -gt 0 ] || [ ${#retry_tasks[@]} -gt 0 ]; do
+    # Wait for remaining tasks, then process retry queues
+    while [ ${#pids[@]} -gt 0 ] || [ ${#retry_tasks[@]} -gt 0 ] || [ ${#daytona_retry_tasks[@]} -gt 0 ]; do
         if [ "$abort" = true ]; then break; fi
 
         # Drain running PIDs
@@ -846,7 +903,7 @@ run_tasks_parallel() {
             fi
         done
 
-        # Launch any queued retries
+        # Launch any queued rate-limit retries (different account)
         if [ ${#retry_tasks[@]} -gt 0 ]; then
             echo "Processing ${#retry_tasks[@]} rate-limit retry task(s)..."
             for ri in "${!retry_tasks[@]}"; do
@@ -863,6 +920,29 @@ run_tasks_parallel() {
             retry_tasks=()
             retry_homes=()
         fi
+
+        # Launch any queued Daytona retries (same account, with backoff)
+        if [ ${#daytona_retry_tasks[@]} -gt 0 ]; then
+            local _dt_count=${#daytona_retry_tasks[@]}
+            echo "Processing $_dt_count Daytona retry task(s) with backoff..."
+            for ri in "${!daytona_retry_tasks[@]}"; do
+                if [ "$abort" = true ]; then break; fi
+                local _task="${daytona_retry_tasks[$ri]}"
+                local _backoff=$(( 15 * ${_daytona_retry_count[$_task]:-1} ))
+                echo " Waiting ${_backoff}s before retrying $_task..."
+                sleep "$_backoff"
+                while [ ${#pids[@]} -ge $PARALLEL_JOBS ]; do
+                    _reap_one
+                    if [ "$abort" = true ]; then break 2; fi
+                    if [ -z "$done_pid" ]; then
+                        sleep 2
+                    fi
+                done
+                _launch "${daytona_retry_tasks[$ri]}" "${daytona_retry_homes[$ri]}"
+            done
+            daytona_retry_tasks=()
+            daytona_retry_homes=()
+        fi
     done
 
     # Restore real HOME

configs/run_selected_tasks.sh

Lines changed: 9 additions & 3 deletions
@@ -185,10 +185,16 @@ fi
 
 ensure_fresh_token_all # also populates CLAUDE_HOMES[] via setup_multi_accounts
 
-# Auto-detect PARALLEL_TASKS from account count when not explicitly set via --parallel
+# Auto-detect PARALLEL_TASKS: Daytona supports 125 concurrent sandboxes,
+# local Docker is limited by account sessions.
 if [ "$PARALLEL_TASKS" -eq 0 ]; then
-    PARALLEL_TASKS=$PARALLEL_JOBS # inherits SESSIONS_PER_ACCOUNT * num_accounts from _common.sh
-    echo "Parallel tasks auto-set to $PARALLEL_TASKS (from $SESSIONS_PER_ACCOUNT sessions x ${#CLAUDE_HOMES[@]} accounts)"
+    if [ "${HARBOR_ENV:-}" = "daytona" ]; then
+        PARALLEL_TASKS=125
+        echo "Parallel tasks auto-set to $PARALLEL_TASKS (Daytona mode, 125 concurrent sandboxes)"
+    else
+        PARALLEL_TASKS=$PARALLEL_JOBS # inherits SESSIONS_PER_ACCOUNT * num_accounts from _common.sh
+        echo "Parallel tasks auto-set to $PARALLEL_TASKS (local Docker, $SESSIONS_PER_ACCOUNT sessions x ${#CLAUDE_HOMES[@]} accounts)"
+    fi
 fi
 
 # Derive baseline config and mcp_type values from FULL_CONFIG
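
The selection logic in this hunk has three outcomes: an explicit `--parallel` value is kept, Daytona gets 125, and anything else falls back to the account-derived slot count. A sketch of that decision as a pure function; `resolve_parallel` and its argument order are hypothetical, chosen only to make the branches easy to exercise.

```shell
#!/bin/sh
# Hypothetical helper mirroring the branch in the diff above.
# $1 = explicit --parallel value (0 means "not set")
# $2 = value of HARBOR_ENV (may be empty)
# $3 = account-derived slot count (PARALLEL_JOBS in the script)
resolve_parallel() {
    explicit=$1
    env_name=$2
    local_slots=$3
    if [ "$explicit" -ne 0 ]; then
        echo "$explicit"          # explicit value wins, no auto-detect
    elif [ "$env_name" = "daytona" ]; then
        echo 125                  # Daytona Tier 3 ceiling from the commit
    else
        echo "$local_slots"       # local Docker: sessions x accounts
    fi
}

resolve_parallel 0 daytona 12   # prints 125
resolve_parallel 0 "" 12        # prints 12
resolve_parallel 8 daytona 12   # prints 8
```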

docs/ops/ROOT_AGENT_GUIDE.md

Lines changed: 3 additions & 2 deletions
@@ -8,15 +8,16 @@ full operations manual.
 - All work happens on `main`. Do not create feature branches.
 - Every `harbor run` must be gated by interactive confirmation.
 - Before commit/push, run `python3 scripts/repo_health.py` (or `--quick` for docs/config-only changes).
+- **Daytona is the default execution environment.** Do not use local Docker unless a task is Daytona-incompatible (18 sweap-images tasks). See `docs/DAYTONA.md`.
+- **Parallelism**: Daytona runs at 125 concurrent sandboxes (auto-detected when `HARBOR_ENV=daytona`). Local Docker runs at 12 slots (3 accounts x 4 sessions). Never artificially cap Daytona parallelism below 125.
 
 ## Minimal Loading Policy
 - Default load order: this file + one relevant skill + one relevant doc.
 - Do not open broad catalogs (`docs/TASK_CATALOG.md`, large script lists, full reports) unless required.
 - Prefer directory-local `AGENTS.md` / `CLAUDE.md` when working under `scripts/`, `configs/`, `tasks/`, or `docs/`.
 
 ## Fast Routing By Intent
-- Launch or rerun benchmarks: `docs/START_HERE_BY_TASK.md` -> "Launch / Rerun Benchmarks"
-- Run benchmarks on Daytona (cloud, no Docker needed): `docs/DAYTONA.md`
+- Launch or rerun benchmarks: `docs/DAYTONA.md` (Daytona, preferred) or `docs/START_HERE_BY_TASK.md`
 - Monitor / status: `docs/START_HERE_BY_TASK.md` -> "Monitor Active Runs"
 - Triage failures: `docs/START_HERE_BY_TASK.md` -> "Triage Failed Tasks"
 - Compare configs / MCP impact / IR: `docs/START_HERE_BY_TASK.md` -> "Analyze Results"

docs/ops/local_guides/configs.md

Lines changed: 6 additions & 1 deletion
@@ -7,11 +7,16 @@ Use this file when working in `configs/` launchers and run orchestration wrapper
 - Do not reintroduce `--yes` for `configs/run_selected_tasks.sh`.
 - Validate config naming and paired-run semantics via shared helpers in `configs/_common.sh`.
 
+## Parallelism Policy
+- **Daytona (default)**: 125 concurrent sandboxes. `run_selected_tasks.sh` auto-detects `HARBOR_ENV=daytona` and sets `--parallel 125`. Daytona's Tier 3 limit is 125 concurrent sandboxes (250 vCPU / 2 per sandbox). The job pool queue (`_wait_for_slot`) ensures we never exceed 125 in-flight processes.
+- **Local Docker**: 12 concurrent slots (3 accounts x 4 sessions). Only for sweap-images tasks (9 ccb_debug + 9 ccb_fix) that cannot run on Daytona.
+- **Do NOT hardcode `--parallel`** unless you have a specific reason. Let `run_selected_tasks.sh` auto-detect from the environment.
+
 ## Navigation Rules
 - Start with `configs/_common.sh` for shared run policy and confirmation behavior.
 - Use `configs/run_selected_tasks.sh` for selected-task execution flows.
 - Use `configs/*_2config.sh` wrappers for paired baseline/MCP runs.
-- For Daytona cloud execution, add `--environment-type daytona` to `harbor run` commands. See `docs/DAYTONA.md` for prerequisites and capacity planning.
+- **Daytona is the default execution environment** — all production and variance runs use `HARBOR_ENV=daytona`. Local Docker is only for the 18 sweap-images tasks that are Daytona-incompatible. See `docs/DAYTONA.md` for prerequisites and capacity planning.
 
 ## When Editing
 - Preserve `confirm_launch()` gating behavior.
