Commit 8a83f02

sjarmak and claude committed
feat: three-dimensional config naming scheme
Rename config identifiers to encode three independent dimensions:
{agent}-{source}-{verifier}

  baseline-local-direct   = no MCP, full source, git-change verifier
  mcp-remote-direct       = MCP, source deleted, git-change verifier
  baseline-local-artifact = no MCP, full source, artifact verifier
  mcp-remote-artifact     = MCP, source deleted, artifact verifier

Config layer only: internal mcp_type values (sourcegraph_full,
artifact_full, none) and the BASELINE_MCP_TYPE env var are unchanged.

Key changes:
- New scripts/config_utils.py: discover_configs(), is_mcp_config(),
  config_short_name() for auto-discovery from run directories
- configs/_common.sh: config_to_mcp_type() mapping function with
  VERIFIER_MODE/SOURCE_ACCESS globals, baseline_config_for() helper
- configs/sdlc_suite_2config.sh: Dockerfile swap logic uses
  SOURCE_ACCESS/VERIFIER_MODE instead of hardcoded config strings
- 13 analysis scripts: replace hardcoded CONFIGS lists with
  discover_configs() auto-discovery (works with both old and new names)
- 5 agent harness scripts updated (codex, cursor, gemini, copilot, openhands)
- docs/CONFIGS.md: three-dimension naming table and legacy mapping

Backward compatible: legacy names (baseline, sourcegraph_full,
artifact_full) are accepted by all mapping functions and auto-discovery.
Existing run directories are not renamed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
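
The four names listed in the commit message all follow the `{agent}-{source}-{verifier}` pattern, so they can be split mechanically. The following is a minimal sketch of that idea, not code from the commit: `split_config_name` is a hypothetical helper, and it deliberately rejects the legacy names (`baseline`, `sourcegraph_full`, `artifact_full`), which do not have three hyphen-separated fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): split a config name of the
# form {agent}-{source}-{verifier} into its three dimensions.
split_config_name() {
    local name="$1" agent src verifier extra
    IFS='-' read -r agent src verifier extra <<< "$name"
    # Require exactly three non-empty, hyphen-separated fields.
    if [ -z "$agent" ] || [ -z "$src" ] || [ -z "$verifier" ] || [ -n "$extra" ]; then
        echo "not a three-part config name: $name" >&2
        return 1
    fi
    echo "agent=$agent source=$src verifier=$verifier"
}

split_config_name "mcp-remote-artifact"
# prints: agent=mcp source=remote verifier=artifact
```

A legacy name such as `sourcegraph_full` makes the helper return a non-zero status, so a caller would still need the explicit legacy mapping described in the commit message.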
1 parent e15e7bb commit 8a83f02

27 files changed, 375 insertions(+), 165 deletions(-)

AGENTS.md

Lines changed: 7 additions & 6 deletions
@@ -44,12 +44,13 @@ Use these defaults unless there is a task-specific reason not to.
 - Planning/prioritization: `whats-next`
 
 ## Evaluation Configs
-Two configs per task: **Baseline** (full local code, no MCP) and **MCP-Full**
-(local source truncated, Sourcegraph MCP enabled). MCP-Full uses
-`Dockerfile.sg_only` so the agent cannot read source locally and must discover
-code via MCP tools. The verifier restores the full repo before scoring.
-See `docs/CONFIGS.md` for the full environment model, tool lists, and how to
-add sg_only support to new tasks.
+Config names encode three dimensions: `{agent}-{source}-{verifier}`.
+Standard pairing: **baseline-local-direct** (full local code, no MCP) and
+**mcp-remote-direct** (source deleted, Sourcegraph MCP). Artifact evaluation
+uses **baseline-local-artifact** + **mcp-remote-artifact** (review.json output).
+MCP configs use `Dockerfile.sg_only` or `Dockerfile.artifact_only` so the
+agent must discover code via MCP tools. The verifier restores the full repo
+before scoring. See `docs/CONFIGS.md` for the full config matrix.
 
 ## Standard Workflow
 0. **Before commit or push:** Run `python3 scripts/repo_health.py` (or `--quick`). Fix any failures so main stays clean and drift is caught early (see `docs/REPO_HEALTH.md`).

CLAUDE.md

Lines changed: 7 additions & 6 deletions
@@ -47,12 +47,13 @@ Use these defaults unless there is a task-specific reason not to.
 - Planning/prioritization: `whats-next`
 
 ## Evaluation Configs
-Two configs per task: **Baseline** (full local code, no MCP) and **MCP-Full**
-(local source truncated, Sourcegraph MCP enabled). MCP-Full uses
-`Dockerfile.sg_only` so the agent cannot read source locally and must discover
-code via MCP tools. The verifier restores the full repo before scoring.
-See `docs/CONFIGS.md` for the full environment model, tool lists, and how to
-add sg_only support to new tasks.
+Config names encode three dimensions: `{agent}-{source}-{verifier}`.
+Standard pairing: **baseline-local-direct** (full local code, no MCP) and
+**mcp-remote-direct** (source deleted, Sourcegraph MCP). Artifact evaluation
+uses **baseline-local-artifact** + **mcp-remote-artifact** (review.json output).
+MCP configs use `Dockerfile.sg_only` or `Dockerfile.artifact_only` so the
+agent must discover code via MCP tools. The verifier restores the full repo
+before scoring. See `docs/CONFIGS.md` for the full config matrix.
 
 ## Standard Workflow
 0. **Before commit or push:** Run `python3 scripts/repo_health.py` (or `--quick`). Fix any failures so main stays clean and drift is caught early (see `docs/REPO_HEALTH.md`).

configs/_common.sh

Lines changed: 66 additions & 7 deletions
@@ -35,6 +35,59 @@ load_credentials() {
     fi
 }
 
+# ============================================
+# CONFIG NAME MAPPING
+# ============================================
+# Three-dimensional config names: {agent}-{source}-{verifier}
+# agent: baseline (no MCP) | mcp (Sourcegraph MCP)
+# source: local (full source) | remote (source deleted)
+# verifier: direct (git changes) | artifact (review.json)
+#
+# These map to internal Harbor mcp_type values via config_to_mcp_type().
+# Legacy names (baseline, sourcegraph_full, artifact_full) are accepted
+# for backward compatibility with existing run directories.
+
+VERIFIER_MODE="direct"
+SOURCE_ACCESS="local"
+
+# Map composite config name → internal mcp_type for Harbor.
+# Side effects: sets VERIFIER_MODE and SOURCE_ACCESS globals.
+config_to_mcp_type() {
+    local config_name="$1"
+    case "$config_name" in
+        baseline-local-direct)
+            VERIFIER_MODE="direct"; SOURCE_ACCESS="local"; echo "none" ;;
+        mcp-remote-direct)
+            VERIFIER_MODE="direct"; SOURCE_ACCESS="remote"; echo "sourcegraph_full" ;;
+        baseline-local-artifact)
+            VERIFIER_MODE="artifact"; SOURCE_ACCESS="local"; echo "none" ;;
+        mcp-remote-artifact)
+            VERIFIER_MODE="artifact"; SOURCE_ACCESS="remote"; echo "artifact_full" ;;
+        # Legacy names
+        baseline)
+            VERIFIER_MODE="direct"; SOURCE_ACCESS="local"; echo "none" ;;
+        sourcegraph_full)
+            VERIFIER_MODE="direct"; SOURCE_ACCESS="remote"; echo "sourcegraph_full" ;;
+        artifact_full)
+            VERIFIER_MODE="artifact"; SOURCE_ACCESS="remote"; echo "artifact_full" ;;
+        none)
+            VERIFIER_MODE="direct"; SOURCE_ACCESS="local"; echo "none" ;;
+        *)
+            echo "WARNING: Unknown config name: $config_name" >&2
+            VERIFIER_MODE="direct"; SOURCE_ACCESS="local"; echo "$config_name" ;;
+    esac
+}
+
+# Derive the baseline config name that pairs with a given FULL_CONFIG.
+# Artifact full configs pair with artifact baselines.
+baseline_config_for() {
+    local full="$1"
+    case "$full" in
+        *-artifact|artifact_full) echo "baseline-local-artifact" ;;
+        *) echo "baseline-local-direct" ;;
+    esac
+}
+
 # ============================================
 # VERIFIER DEBUG MODE
 # ============================================
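
One point worth checking when using `config_to_mcp_type`: it returns the mcp_type on stdout and also sets `VERIFIER_MODE` and `SOURCE_ACCESS` as a side effect. In bash, command substitution (`$(...)`) runs in a subshell, so a caller that captures the output does not see those assignments. The sketch below demonstrates this with a trimmed stand-in for the function (two of its cases only); it illustrates bash behavior and is not code from the commit.

```shell
#!/usr/bin/env bash
# Trimmed stand-in for config_to_mcp_type (two cases taken from the diff).
VERIFIER_MODE="direct"
SOURCE_ACCESS="local"

config_to_mcp_type() {
    case "$1" in
        baseline-local-direct)
            VERIFIER_MODE="direct"; SOURCE_ACCESS="local"; echo "none" ;;
        mcp-remote-artifact)
            VERIFIER_MODE="artifact"; SOURCE_ACCESS="remote"; echo "artifact_full" ;;
    esac
}

# Command substitution runs the function in a subshell: the echoed value is
# captured, but the global assignments never reach the calling shell.
mcp_type=$(config_to_mcp_type "mcp-remote-artifact")
echo "captured: mcp_type=$mcp_type VERIFIER_MODE=$VERIFIER_MODE SOURCE_ACCESS=$SOURCE_ACCESS"
# prints: captured: mcp_type=artifact_full VERIFIER_MODE=direct SOURCE_ACCESS=local

# A direct call in the current shell (output discarded) does update the globals.
config_to_mcp_type "mcp-remote-artifact" > /dev/null
echo "direct:   mcp_type=$mcp_type VERIFIER_MODE=$VERIFIER_MODE SOURCE_ACCESS=$SOURCE_ACCESS"
# prints: direct:   mcp_type=artifact_full VERIFIER_MODE=artifact SOURCE_ACCESS=remote
```

Code that depends on `VERIFIER_MODE` or `SOURCE_ACCESS` after a `$(config_to_mcp_type ...)` capture would therefore need a second, direct call. Whether any caller in this repository relies on the globals that way is not visible in the hunks shown here.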
@@ -934,8 +987,7 @@ run_canary_then_batch() {
 # run_paired_configs TASK_IDS _my_run_fn "$JOBS_BASE"
 #
 # The run function must accept: $1=task_id $2=task_home $3=config_mode $4=mcp_type $5=jobs_base
-# It is responsible for creating the jobs_subdir (e.g., baseline/ or sourcegraph_full/) and
-# launching harbor.
+# It is responsible for creating the jobs_subdir and launching harbor.
 #
 # This launches 2 containers per task (1 baseline + 1 MCP) simultaneously, so the total
 # concurrent containers is 2x the number of tasks. PARALLEL_JOBS limits total concurrent PIDs.
@@ -952,18 +1004,25 @@ run_paired_configs() {
     echo "Paired execution: $num_tasks tasks x 2 configs"
     echo "========================================"
     echo ""
-    local full_config="${FULL_CONFIG:-sourcegraph_full}"
-    echo "Each task launches baseline + ${full_config} simultaneously."
+    local full_config="${FULL_CONFIG:-mcp-remote-direct}"
+    local bl_config
+    bl_config=$(baseline_config_for "$full_config")
+    echo "Each task launches ${bl_config} + ${full_config} simultaneously."
     echo "Total concurrent containers: up to $((num_tasks * 2)) (limited by PARALLEL_JOBS=$PARALLEL_JOBS)"
     echo ""
 
-    mkdir -p "${jobs_base}/baseline" "${jobs_base}/${full_config}"
+    mkdir -p "${jobs_base}/${bl_config}" "${jobs_base}/${full_config}"
+
+    # Resolve mcp_type values once
+    local bl_mcp full_mcp
+    bl_mcp=$(config_to_mcp_type "$bl_config")
+    full_mcp=$(config_to_mcp_type "$full_config")
 
     # Build paired task list: each task gets two entries
     local paired_ids=()
     for task_id in "${_paired_task_ids[@]}"; do
-        paired_ids+=("${task_id}|baseline|none")
-        paired_ids+=("${task_id}|${full_config}|${full_config}")
+        paired_ids+=("${task_id}|${bl_config}|${bl_mcp}")
+        paired_ids+=("${task_id}|${full_config}|${full_mcp}")
     done
 
     # Wrapper command function that splits the paired ID
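
The wrapper that splits each paired ID lies outside this hunk. As a rough sketch of how the `task_id|config|mcp_type` entries built above could be produced and split again: `baseline_config_for` is copied from the diff, while the task IDs, the hard-coded mcp_type values, and the `split_paired_id` helper are illustrative assumptions.

```shell
#!/usr/bin/env bash
# baseline_config_for is copied from the diff; everything else is illustrative.
baseline_config_for() {
    local full="$1"
    case "$full" in
        *-artifact|artifact_full) echo "baseline-local-artifact" ;;
        *) echo "baseline-local-direct" ;;
    esac
}

task_ids=("task-a" "task-b")            # placeholder task IDs
full_config="mcp-remote-artifact"
bl_config=$(baseline_config_for "$full_config")
bl_mcp="none"                           # mcp_type for the baseline config
full_mcp="artifact_full"                # mcp_type for mcp-remote-artifact

# Each task contributes two entries, one per config.
paired_ids=()
for task_id in "${task_ids[@]}"; do
    paired_ids+=("${task_id}|${bl_config}|${bl_mcp}")
    paired_ids+=("${task_id}|${full_config}|${full_mcp}")
done

# Hypothetical helper: split one entry back into its three fields.
split_paired_id() {
    local id config mcp
    IFS='|' read -r id config mcp <<< "$1"
    echo "task=$id config=$config mcp_type=$mcp"
}

for entry in "${paired_ids[@]}"; do
    split_paired_id "$entry"
done
```

Using `|` as the separator works here because neither the config names nor the mcp_type values shown in the commit contain that character; task IDs containing `|` would break the split.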

configs/codex_2config.sh

Lines changed: 4 additions & 4 deletions
@@ -2,8 +2,8 @@
 # Codex Harness 2-Config Runner
 #
 # Runs selected tasks across 2 configurations:
-# 1. Baseline (BASELINE_MCP_TYPE=none)
-# 2. MCP-Full (BASELINE_MCP_TYPE=sourcegraph_full)
+# 1. baseline-local-direct (BASELINE_MCP_TYPE=none)
+# 2. mcp-remote-direct (BASELINE_MCP_TYPE=sourcegraph_full)
 #
 # Usage:
 # ./configs/codex_2config.sh [OPTIONS]
@@ -201,11 +201,11 @@ run_mode() {
 }
 
 if [ "$RUN_BASELINE" = true ]; then
-    run_mode "baseline" "none"
+    run_mode "baseline-local-direct" "none"
 fi
 
 if [ "$RUN_FULL" = true ]; then
-    run_mode "sourcegraph_full" "sourcegraph_full"
+    run_mode "mcp-remote-direct" "sourcegraph_full"
 fi
 
 print_validation_summary "$JOBS_BASE"

configs/copilot_2config.sh

Lines changed: 4 additions & 4 deletions
@@ -2,8 +2,8 @@
 # Copilot Harness 2-Config Runner
 #
 # Runs selected tasks across 2 configurations:
-# 1. Baseline (BASELINE_MCP_TYPE=none)
-# 2. MCP-Full (BASELINE_MCP_TYPE=sourcegraph_full)
+# 1. baseline-local-direct (BASELINE_MCP_TYPE=none)
+# 2. mcp-remote-direct (BASELINE_MCP_TYPE=sourcegraph_full)
 #
 # Usage:
 # ./configs/copilot_2config.sh [OPTIONS]
@@ -201,11 +201,11 @@ run_mode() {
 }
 
 if [ "$RUN_BASELINE" = true ]; then
-    run_mode "baseline" "none"
+    run_mode "baseline-local-direct" "none"
 fi
 
 if [ "$RUN_FULL" = true ]; then
-    run_mode "sourcegraph_full" "sourcegraph_full"
+    run_mode "mcp-remote-direct" "sourcegraph_full"
 fi
 
 print_validation_summary "$JOBS_BASE"

configs/cursor_2config.sh

Lines changed: 4 additions & 4 deletions
@@ -2,8 +2,8 @@
 # Cursor Harness 2-Config Runner
 #
 # Runs selected tasks across 2 configurations:
-# 1. Baseline (BASELINE_MCP_TYPE=none)
-# 2. MCP-Full (BASELINE_MCP_TYPE=sourcegraph_full)
+# 1. baseline-local-direct (BASELINE_MCP_TYPE=none)
+# 2. mcp-remote-direct (BASELINE_MCP_TYPE=sourcegraph_full)
 #
 # Usage:
 # ./configs/cursor_2config.sh [OPTIONS]
@@ -201,11 +201,11 @@ run_mode() {
 }
 
 if [ "$RUN_BASELINE" = true ]; then
-    run_mode "baseline" "none"
+    run_mode "baseline-local-direct" "none"
 fi
 
 if [ "$RUN_FULL" = true ]; then
-    run_mode "sourcegraph_full" "sourcegraph_full"
+    run_mode "mcp-remote-direct" "sourcegraph_full"
 fi
 
 print_validation_summary "$JOBS_BASE"

configs/gemini_2config.sh

Lines changed: 4 additions & 4 deletions
@@ -2,8 +2,8 @@
 # Gemini Harness 2-Config Runner
 #
 # Runs selected tasks across 2 configurations:
-# 1. Baseline (BASELINE_MCP_TYPE=none)
-# 2. MCP-Full (BASELINE_MCP_TYPE=sourcegraph_full)
+# 1. baseline-local-direct (BASELINE_MCP_TYPE=none)
+# 2. mcp-remote-direct (BASELINE_MCP_TYPE=sourcegraph_full)
 #
 # Usage:
 # ./configs/gemini_2config.sh [OPTIONS]
@@ -201,11 +201,11 @@ run_mode() {
 }
 
 if [ "$RUN_BASELINE" = true ]; then
-    run_mode "baseline" "none"
+    run_mode "baseline-local-direct" "none"
 fi
 
 if [ "$RUN_FULL" = true ]; then
-    run_mode "sourcegraph_full" "sourcegraph_full"
+    run_mode "mcp-remote-direct" "sourcegraph_full"
 fi
 
 print_validation_summary "$JOBS_BASE"

configs/openhands_2config.sh

Lines changed: 4 additions & 4 deletions
@@ -2,8 +2,8 @@
 # OpenHands Harness 2-Config Runner
 #
 # Runs selected tasks across 2 configurations:
-# 1. Baseline (BASELINE_MCP_TYPE=none)
-# 2. MCP-Full (BASELINE_MCP_TYPE=sourcegraph_full)
+# 1. baseline-local-direct (BASELINE_MCP_TYPE=none)
+# 2. mcp-remote-direct (BASELINE_MCP_TYPE=sourcegraph_full)
 #
 # Usage:
 # ./configs/openhands_2config.sh [OPTIONS]
@@ -211,11 +211,11 @@ run_mode() {
 }
 
 if [ "$RUN_BASELINE" = true ]; then
-    run_mode "baseline" "none"
+    run_mode "baseline-local-direct" "none"
 fi
 
 if [ "$RUN_FULL" = true ]; then
-    run_mode "sourcegraph_full" "sourcegraph_full"
+    run_mode "mcp-remote-direct" "sourcegraph_full"
 fi
 
 print_validation_summary "$JOBS_BASE"

configs/run_selected_tasks.sh

Lines changed: 4 additions & 4 deletions
@@ -16,7 +16,7 @@
 # --selection-file PATH Use alternate selection file (default: selected_benchmark_tasks.json)
 # --use-case-category CATEGORY Filter by MCP-unique use case category (A-J), only valid with --selection-file
 # --baseline-only Run only baseline (no MCP)
-# --full-only Run only MCP-Full (sourcegraph_full)
+# --full-only Run only MCP-Full (mcp-remote-direct)
 # --model MODEL Override model (default: claude-opus-4-6)
 # --concurrency N Concurrent tasks (default: 2)
 # --category CATEGORY Run category (default: staging)
@@ -199,7 +199,7 @@ echo "Source: $SELECTION_FILE"
 echo "Model: $MODEL"
 echo "Total tasks: $TOTAL_TASKS"
 echo "Concurrency: $CONCURRENCY"
-echo "Configs: baseline=$RUN_BASELINE sourcegraph_full=$RUN_FULL"
+echo "Configs: baseline-local-direct=$RUN_BASELINE mcp-remote-direct=$RUN_FULL"
 echo "Skip done: $SKIP_COMPLETED"
 [ -n "$USE_CASE_CATEGORY_FILTER" ] && echo "Category: $USE_CASE_CATEGORY_FILTER"
 echo ""
@@ -324,10 +324,10 @@ run_benchmark() {
 # ============================================
 for bm in $(echo "${!BENCHMARK_COUNTS[@]}" | tr ' ' '\n' | sort); do
     if [ "$RUN_BASELINE" = true ]; then
-        run_benchmark "$bm" "baseline" "none"
+        run_benchmark "$bm" "baseline-local-direct" "none"
     fi
     if [ "$RUN_FULL" = true ]; then
-        run_benchmark "$bm" "sourcegraph_full" "sourcegraph_full"
+        run_benchmark "$bm" "mcp-remote-direct" "sourcegraph_full"
     fi
 done
 
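
The call sites in this file and in the harness scripts pass each config name together with its mcp_type as a hard-coded pair. As a sketch of an alternative, not something this commit does, the second argument could be looked up from the first. Both functions below are stand-ins so the example runs on its own: the real `config_to_mcp_type` lives in `configs/_common.sh`, and `example_benchmark` is a placeholder name.

```shell
#!/usr/bin/env bash
# Stand-in mapping with two cases. Unlike the version in the diff, which
# warns and echoes unknown names through, this one fails on unknown names.
config_to_mcp_type() {
    case "$1" in
        baseline-local-direct) echo "none" ;;
        mcp-remote-direct)     echo "sourcegraph_full" ;;
        *) echo "unknown config: $1" >&2; return 1 ;;
    esac
}

# Stand-in for the real run_benchmark; it only reports its arguments.
run_benchmark() { echo "run_benchmark bm=$1 config=$2 mcp_type=$3"; }

RUN_BASELINE=true
RUN_FULL=true

# Collect the enabled config names, then resolve each mcp_type by lookup.
configs=()
[ "$RUN_BASELINE" = true ] && configs+=("baseline-local-direct")
[ "$RUN_FULL" = true ] && configs+=("mcp-remote-direct")

for cfg in "${configs[@]}"; do
    mcp_type=$(config_to_mcp_type "$cfg") || continue
    run_benchmark "example_benchmark" "$cfg" "$mcp_type"
done
```

The trade-off is that the explicit pairs in the commit are easy to grep and need no sourcing order, while a lookup keeps the name-to-mcp_type mapping in one place.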
