Commit a0d7471

sjarmak and claude committed
feat: scaffold 4 DOE rebalance tasks — SDLC 146 → 150 (Neyman-optimal)
Add 3 ccb_feature tasks (C/Rust/TypeScript) and 1 ccb_fix task (Java) to
complete the Neyman-optimal allocation targets:

- postgres-copy-csv-header-feat-001 (C, postgres/postgres)
- servo-css-container-query-feat-001 (Rust, servo/servo)
- vscode-custom-fold-region-feat-001 (TypeScript, microsoft/vscode)
- flink-window-late-data-fix-001 (Java, apache/flink)

Final SDLC counts: fix=26 feature=23 test=18 debug=18 refactor=16
design=14 document=13 secure=12 understand=10 (total=150).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 46b4ce9 commit a0d7471
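
For readers unfamiliar with the term in the commit message: Neyman allocation assigns each stratum a sample size proportional to its population size times its outcome standard deviation. The sketch below only illustrates that textbook formula. The strata names are borrowed from the SDLC categories above, but the sizes and standard deviations are hypothetical, and nothing here is taken from the project's actual allocation script.

```python
from typing import Dict, Tuple


def neyman_allocation(strata: Dict[str, Tuple[float, float]], total: int) -> Dict[str, float]:
    """Fractional Neyman allocation: n_h = total * N_h * S_h / sum_k(N_k * S_k)."""
    weights = {name: size * std for name, (size, std) in strata.items()}
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one stratum needs a positive size and std dev")
    return {name: total * weight / weight_sum for name, weight in weights.items()}


# Hypothetical strata: (population size N_h, outcome standard deviation S_h).
example_strata = {
    "fix": (100, 0.4),
    "feature": (100, 0.3),
    "understand": (100, 0.1),
}
allocation = neyman_allocation(example_strata, total=80)
```

With equal stratum sizes, the noisier strata receive more of the budget. Turning the fractional values into integer task counts needs a separate rounding rule that is not shown here.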

30 files changed (+1804, -5 lines changed)
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# postgres-copy-csv-header-feat-001: COPY FROM WITH HEADER MATCH
+
+## Task Type: Feature Implementation (SQL Command Extension)
+
+Implement HEADER MATCH option for PostgreSQL COPY FROM CSV.
+
+## Key Reference Files
+- `src/backend/parser/gram.y` — SQL grammar (search for `copy_opt_list`, `HEADER`)
+- `src/backend/commands/copy.c` — COPY entry point and option processing
+- `src/backend/commands/copyfrom.c` — COPY FROM reader implementation
+- `src/include/commands/copy.h` — CopyHeaderChoice enum, CopyFormatOptions struct
+- `src/test/regress/sql/copy.sql` — existing regression tests
+
+## Search Strategy
+- Search for `CopyHeaderChoice` to find the enum defining header behavior
+- Search for `header_line` in copyfrom.c to find where the header row is read/skipped
+- Search for `copy_opt_list` in gram.y to find the grammar rule for COPY options
+- Search for `attname` or `NameStr` in copyfrom.c for column name access patterns
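
The search strategy above amounts to looking for a few identifiers in a few known files. As a rough illustration only, the sketch below does a plain substring search over a list of relative paths. The `find_identifier` helper and the stand-in header contents are hypothetical, and an agent with code search tooling would not need this.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple


def find_identifier(root: Path, relpaths: Iterable[str], needle: str) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for each line containing needle."""
    hits = []
    for rel in relpaths:
        path = root / rel
        if not path.is_file():
            continue  # a listed file may be missing in a given checkout
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((rel, lineno, line.strip()))
    return hits


# Demo on a throwaway tree; the header text is a stand-in, not the real copy.h.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    header = root / "src/include/commands/copy.h"
    header.parent.mkdir(parents=True)
    header.write_text(
        "typedef enum CopyHeaderChoice\n{\n\tCOPY_HEADER_FALSE = 0,\n} CopyHeaderChoice;\n"
    )
    hits = find_identifier(
        root,
        ["src/include/commands/copy.h", "src/backend/parser/gram.y"],
        "CopyHeaderChoice",
    )
```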
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+FROM gcc:14-bookworm
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    curl \
+    python3 \
+    python3-pip \
+    bison \
+    flex \
+    libreadline-dev \
+    zlib1g-dev \
+    ca-certificates \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN (adduser --disabled-password --gecos '' claude 2>/dev/null || true)
+
+RUN mkdir -p /workspace && chown claude:claude /workspace
+
+USER claude
+WORKDIR /workspace
+RUN git clone --depth 1 https://github.com/sg-evals/postgres--5a461dc4.git . && \
+    git config user.email "agent@example.com" && \
+    git config user.name "Agent"
+USER root
+
+RUN mkdir -p /logs/agent /logs/verifier && \
+    chown -R claude:claude /logs
+
+ENTRYPOINT []
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# postgres-copy-csv-header-feat-001 — sg_only_env variant (v2: clone-at-verify)
+# Verifier clones mirror at verification time via clone manifest.
+
+FROM gcc:14-bookworm
+
+ENV SOURCEGRAPH_REPO_NAME=sg-evals/postgres--5a461dc4
+ENV DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git ca-certificates python3 curl \
+    bison flex libreadline-dev zlib1g-dev \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /workspace
+
+RUN git init && \
+    git config user.email "agent@example.com" && \
+    git config user.name "Agent"
+
+RUN mkdir -p /logs/agent /logs/verifier
+
+# Clone manifest for verifier (clone-at-verify strategy)
+RUN echo '{"workdir":"/workspace","repos":[{"mirror":"sg-evals/postgres--5a461dc4","target_dir":"."}]}' > /tmp/.sg_only_clone_manifest.json
+
+RUN touch /tmp/.sg_only_mode
+
+RUN (adduser --disabled-password --gecos '' claude 2>/dev/null || true) && \
+    for d in /workspace /logs; do [ -d "$d" ] && chown -R claude:claude "$d"; done || true
+
+ENTRYPOINT []
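
The manifest written by this Dockerfile is read back by the verifier wrapper later in this commit, which derives a clone URL and target directory for each repo. The sketch below mirrors that derivation in one pass, as an illustration only. The `clone_plan` name is hypothetical; the default values and the URL pattern follow the wrapper script.

```python
import json
import posixpath
from typing import List, Tuple

# Same JSON the Dockerfile echoes into /tmp/.sg_only_clone_manifest.json.
MANIFEST_TEXT = (
    '{"workdir":"/workspace","repos":'
    '[{"mirror":"sg-evals/postgres--5a461dc4","target_dir":"."}]}'
)


def clone_plan(manifest_text: str) -> List[Tuple[str, str]]:
    """Return (clone URL, target directory) pairs for each repo in the manifest."""
    manifest = json.loads(manifest_text)
    workdir = manifest.get("workdir", "/workspace")
    plan = []
    for repo in manifest.get("repos", []):
        url = f"https://github.com/{repo['mirror']}.git"
        target_dir = repo.get("target_dir", ".")
        # target_dir "." means the repo is restored directly into the workdir.
        target = workdir if target_dir == "." else posixpath.join(workdir, target_dir)
        plan.append((url, target))
    return plan


plan = clone_plan(MANIFEST_TEXT)
```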
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Task: Implement COPY FROM CSV WITH HEADER MATCH for PostgreSQL
+
+## Objective
+Add a `HEADER MATCH` option to PostgreSQL's `COPY FROM` command that validates the CSV file's header row column names match the target table's column names (in order and spelling). Currently `COPY ... WITH (HEADER)` simply skips the first row; `HEADER MATCH` should verify it matches.
+
+## Requirements
+
+1. **Extend the COPY grammar** (`src/backend/parser/gram.y`):
+   - Add `MATCH` as a valid keyword after `HEADER` in COPY options
+   - The option should be: `HEADER MATCH` (two tokens) as an alternative to `HEADER` / `HEADER TRUE`
+   - Store the distinction in the CopyStmt or DefElem representation
+
+2. **Update COPY option processing** (`src/backend/commands/copy.c` or `src/backend/commands/copyfrom.c`):
+   - Parse the `HEADER MATCH` option and set a flag (e.g., `header_line` enum: OFF, ON, MATCH)
+   - Only valid for CSV format with COPY FROM (not COPY TO)
+   - Raise an error if HEADER MATCH is used with non-CSV format or COPY TO
+
+3. **Implement header validation** in the COPY FROM reader:
+   - After reading the first line of the CSV file, compare each column name to the corresponding column in the target table's column list
+   - Column comparison should be case-insensitive
+   - If extra/missing/mismatched columns are found, raise an ERROR with a descriptive message listing the mismatched column names
+
+4. **Add a regression test** in `src/test/regress/sql/copy.sql` (or a new file):
+   - Test HEADER MATCH with matching headers (should succeed)
+   - Test HEADER MATCH with mismatched headers (should fail with appropriate error)
+   - Test HEADER MATCH with wrong column count (should fail)
+
+## Key Reference Files
+- `src/backend/parser/gram.y` — SQL grammar (search for `copy_opt_list` and `HEADER`)
+- `src/backend/commands/copy.c` — COPY command entry point, option processing
+- `src/backend/commands/copyfrom.c` — COPY FROM implementation
+- `src/include/commands/copy.h` — CopyHeaderChoice enum and CopyFormatOptions
+- `src/include/parser/kwlist.h` — keyword list (MATCH may need adding)
+- `src/test/regress/sql/copy.sql` — existing COPY regression tests
+
+## Success Criteria
+- HEADER MATCH syntax accepted in SQL grammar
+- CopyHeaderChoice enum or equivalent extended with MATCH value
+- Header validation logic reads CSV header and compares to table columns
+- Appropriate error raised on mismatch
+- Regression test file exists with match/mismatch test cases
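
Requirement 3 of the task above describes the comparison the COPY FROM reader has to make. The sketch below illustrates that logic only, in Python rather than the C the task calls for. The `header_mismatches` name and the message wording are hypothetical and are not what a PostgreSQL patch would use.

```python
from typing import List, Sequence


def header_mismatches(header_fields: Sequence[str], table_columns: Sequence[str]) -> List[str]:
    """Describe every way the CSV header differs from the target column list."""
    problems = []
    if len(header_fields) != len(table_columns):
        problems.append(
            f"expected {len(table_columns)} columns, header has {len(header_fields)}"
        )
    # Compare position by position; the task asks for a case-insensitive match.
    for position, (found, expected) in enumerate(zip(header_fields, table_columns), start=1):
        if found.casefold() != expected.casefold():
            problems.append(
                f'column {position}: header "{found}" does not match "{expected}"'
            )
    return problems


# The three cases mirror the regression tests listed in requirement 4.
matching = header_mismatches(["ID", "Name"], ["id", "name"])
renamed = header_mismatches(["id", "title"], ["id", "name"])
short = header_mismatches(["id"], ["id", "name"])
```

An empty list means the header is accepted. A C implementation would raise an ERROR instead of returning the descriptions.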
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+version = "1.0"
+
+[metadata]
+name = "postgres-copy-csv-header-feat-001"
+description = "Implement COPY FROM CSV WITH HEADER MATCH option that validates column headers match target table columns"
+license = "PostgreSQL"
+author_name = "CodeContextBench"
+author_email = "ccb@example.com"
+
+[task]
+id = "postgres-copy-csv-header-feat-001"
+repo = "postgres/postgres"
+category = "feature_implementation"
+language = "c"
+difficulty = "expert"
+time_limit_sec = 1800
+
+[verification]
+type = "test"
+command = "bash /tests/test.sh"
+reward_type = "checklist"
+description = "Checks feature implementation: grammar changes, COPY option parsing, header validation, error handling, regression test"
+
+[environment]
+build_timeout_sec = 900.0
+cpus = 4
+memory = "8G"
+storage = "20G"
+
+[environment.setup_scripts]
+mcp_config = '''
+#!/bin/bash
+echo "Sourcegraph MCP available for code search"
+'''
Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
+#!/bin/bash
+# SG-only verifier wrapper: restore full repo + overlay agent changes
+#
+# Source this at the TOP of test.sh for build-requiring tasks that use
+# sg_only_env mode. It detects /tmp/.sg_only_mode and:
+#
+# PRIMARY PATH (clone manifest):
+#   1. Reads clone manifest from /tmp/.sg_only_clone_manifest.json
+#   2. Backs up agent-written files (non-empty, non-git, non-test)
+#   3. Clones each mirror repo with --depth 1
+#   4. Re-runs inject_defects.sh if specified in manifest
+#   5. Overlays agent changes on top
+#
+# LEGACY FALLBACK (pre-v2 images):
+#   If manifest is missing but /repo_full/ exists, restores from /repo_full/
+#   as before. This ensures unregenerated images still work during rollout.
+#
+# For non-sg_only runs, this script is a no-op.
+#
+# Usage in test.sh:
+#   #!/bin/bash
+#   # Source the sg_only wrapper (no-op if not in sg_only mode)
+#   if [ -f /tests/sgonly_verifier_wrapper.sh ]; then
+#       source /tests/sgonly_verifier_wrapper.sh
+#   fi
+#   # ... rest of test.sh as normal ...
+
+if [ ! -f /tmp/.sg_only_mode ]; then
+    # Not in sg_only mode — nothing to do
+    return 0 2>/dev/null || exit 0
+fi
+
+# Idempotency guard: skip if already sourced (avoids double-clone when
+# test.sh sources this wrapper and then eval.sh sources it again)
+if [ -n "${_SG_ONLY_RESTORED:-}" ]; then
+    return 0 2>/dev/null || exit 0
+fi
+export _SG_ONLY_RESTORED=1
+
+echo "[sg_only_verifier] Detected sg_only mode, restoring full repo..."
+
+# ---------------------------------------------------------------------------
+# Helper: back up agent-written files from a directory
+# ---------------------------------------------------------------------------
+backup_agent_files() {
+    local srcdir="$1"
+    if [ ! -d "$srcdir" ]; then
+        return
+    fi
+    cd "$srcdir"
+    mkdir -p /tmp/agent_work
+    find . -type f -size +0 \
+        ! -path './.git/*' \
+        ! -path './tests/*' \
+        ! -path './.claude/*' \
+        -print0 | while IFS= read -r -d '' f; do
+        mkdir -p "/tmp/agent_work/$(dirname "$f")"
+        cp "$f" "/tmp/agent_work/$f"
+    done
+    echo "[sg_only_verifier] Backed up agent-written files from $srcdir"
+}
+
+# ---------------------------------------------------------------------------
+# Helper: overlay agent-written files back onto a directory
+# ---------------------------------------------------------------------------
+overlay_agent_files() {
+    local targetdir="$1"
+    if [ ! -d /tmp/agent_work ]; then
+        return
+    fi
+    cd /tmp/agent_work
+    find . -type f -print0 | while IFS= read -r -d '' f; do
+        local target="${targetdir}/${f#./}"
+        mkdir -p "$(dirname "$target")"
+        cp "$f" "$target"
+    done
+    echo "[sg_only_verifier] Overlaid agent changes onto $targetdir"
+}
+
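
The two helpers above form a round trip: copy every non-empty file outside the top-level `.git`, `tests`, and `.claude` directories to a side directory, then copy that side directory back over a freshly restored tree. The sketch below approximates the round trip in Python for illustration. It takes the backup directory as a parameter instead of using the fixed `/tmp/agent_work`, and it is not a drop-in replacement for the shell functions.

```python
import shutil
import tempfile
from pathlib import Path

EXCLUDED_TOP_LEVEL = {".git", "tests", ".claude"}


def backup_agent_files(srcdir: Path, backup_dir: Path) -> int:
    """Copy non-empty files outside the excluded top-level dirs; return the count."""
    copied = 0
    for path in srcdir.rglob("*"):
        rel = path.relative_to(srcdir)
        if not path.is_file() or path.stat().st_size == 0:
            continue  # skip directories and empty (truncated) files
        if len(rel.parts) > 1 and rel.parts[0] in EXCLUDED_TOP_LEVEL:
            continue
        dest = backup_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(path, dest)
        copied += 1
    return copied


def overlay_agent_files(backup_dir: Path, targetdir: Path) -> None:
    """Copy every backed-up file onto targetdir, overwriting existing files."""
    for path in backup_dir.rglob("*"):
        if path.is_file():
            dest = targetdir / path.relative_to(backup_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(path, dest)


# Demo on throwaway directories with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    work, backup, restored = (Path(tmp) / name for name in ("work", "backup", "restored"))
    (work / "src").mkdir(parents=True)
    (work / "tests").mkdir()
    (work / "src" / "copy.c").write_text("/* agent edit */\n")
    (work / "src" / "stub.c").write_text("")            # empty, so skipped
    (work / "tests" / "test.sh").write_text("exit 0\n")  # under tests/, so skipped
    backed_up = backup_agent_files(work, backup)
    restored.mkdir()
    overlay_agent_files(backup, restored)
    restored_files = sorted(
        p.relative_to(restored).as_posix() for p in restored.rglob("*") if p.is_file()
    )
```

Skipping empty files is what keeps truncated placeholders in an sg_only workspace from overwriting the real sources after the clone.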
+# ---------------------------------------------------------------------------
+# PRIMARY PATH: clone manifest
+# ---------------------------------------------------------------------------
+MANIFEST="/tmp/.sg_only_clone_manifest.json"
+
+if [ -f "$MANIFEST" ]; then
+    echo "[sg_only_verifier] Found clone manifest, using clone-at-verify strategy"
+
+    # Parse manifest with python3 (always available in our images)
+    WORKDIR=$(python3 -c "import json; m=json.load(open('$MANIFEST')); print(m.get('workdir', '/workspace'))")
+    echo "[sg_only_verifier] Working directory: $WORKDIR"
+
+    # 1. Back up agent-written files
+    backup_agent_files "$WORKDIR"
+
+    # 2. Clone each mirror repo
+    REPO_COUNT=$(python3 -c "import json; m=json.load(open('$MANIFEST')); print(len(m.get('repos', [])))")
+    for i in $(seq 0 $((REPO_COUNT - 1))); do
+        MIRROR=$(python3 -c "import json; m=json.load(open('$MANIFEST')); print(m['repos'][$i]['mirror'])")
+        TARGET_DIR=$(python3 -c "import json; m=json.load(open('$MANIFEST')); print(m['repos'][$i].get('target_dir', '.'))")
+        CLONE_URL="https://github.com/${MIRROR}.git"
+
+        if [ "$TARGET_DIR" = "." ]; then
+            CLONE_TARGET="$WORKDIR"
+        else
+            CLONE_TARGET="${WORKDIR}/${TARGET_DIR}"
+        fi
+
+        echo "[sg_only_verifier] Cloning $MIRROR -> $CLONE_TARGET"
+
+        # Replace the existing (truncated) directory contents with the clone.
+        # For target_dir="." the workspace's own .git, tests, and .claude are kept.
+        if [ "$TARGET_DIR" = "." ]; then
+            # For root workspace: clone into a temp dir first, then swap the files in
+            TMPCLONE=$(mktemp -d)
+            if git clone --depth 1 "$CLONE_URL" "$TMPCLONE" 2>/dev/null; then
+                # Remove old files (except .git, tests, and .claude)
+                find "$CLONE_TARGET" -mindepth 1 -maxdepth 1 \
+                    ! -name '.git' ! -name 'tests' ! -name '.claude' \
+                    -exec rm -rf {} + 2>/dev/null || true
+                # Copy cloned files (except .git)
+                cd "$TMPCLONE"
+                find . -mindepth 1 -maxdepth 1 ! -name '.git' -exec cp -a {} "$CLONE_TARGET/" \;
+                # If workspace has no HEAD (fresh git init with no commits), use
+                # the mirror .git so that git diff HEAD works for diff-based verifiers.
+                if ! git -C "$CLONE_TARGET" rev-parse HEAD >/dev/null 2>&1; then
+                    rm -rf "$CLONE_TARGET/.git"
+                    cp -a "$TMPCLONE/.git" "$CLONE_TARGET/.git"
+                    echo "[sg_only_verifier] Replaced empty .git with mirror .git for diff baseline"
+                fi
+                cd /
+                rm -rf "$TMPCLONE"
+                echo "[sg_only_verifier] Restored $MIRROR to $CLONE_TARGET"
+            else
+                echo "[sg_only_verifier] WARNING: Failed to clone $CLONE_URL"
+                rm -rf "$TMPCLONE"
+            fi
+        else
+            # For subdirectory: remove and re-clone
+            rm -rf "$CLONE_TARGET"
+            if git clone --depth 1 "$CLONE_URL" "$CLONE_TARGET" 2>/dev/null; then
+                echo "[sg_only_verifier] Restored $MIRROR to $CLONE_TARGET"
+            else
+                echo "[sg_only_verifier] WARNING: Failed to clone $CLONE_URL"
+            fi
+        fi
+    done
+
+    # 3. Re-run inject_defects if specified
+    INJECT_SCRIPT=$(python3 -c "import json; m=json.load(open('$MANIFEST')); print(m.get('inject_defects', ''))")
+    if [ -n "$INJECT_SCRIPT" ] && [ -f "$INJECT_SCRIPT" ]; then
+        echo "[sg_only_verifier] Running defect injection: $INJECT_SCRIPT"
+        cd "$WORKDIR"
+        chmod +x "$INJECT_SCRIPT"
+        bash "$INJECT_SCRIPT"
+        echo "[sg_only_verifier] Defect injection complete"
+    fi
+
+    # 4. Overlay agent changes
+    overlay_agent_files "$WORKDIR"
+
+    # Return to working directory
+    cd "$WORKDIR"
+    echo "[sg_only_verifier] Clone-at-verify restore complete, proceeding with tests"
+
+    return 0 2>/dev/null || exit 0
+fi
+
+# ---------------------------------------------------------------------------
+# LEGACY FALLBACK: /repo_full/ restore (for pre-v2 images)
+# ---------------------------------------------------------------------------
+echo "[sg_only_verifier] No clone manifest found, trying legacy /repo_full/ restore..."
+
+# Read the working directory
+WORKDIR="$(cat /tmp/.sg_only_workdir 2>/dev/null || echo '/app')"
+echo "[sg_only_verifier] Working directory: $WORKDIR"
+
+if [ ! -d /repo_full ]; then
+    echo "[sg_only_verifier] WARNING: /repo_full not found, cannot restore"
+    return 0 2>/dev/null || exit 0
+fi
+
+# 1. Find files the agent wrote (non-empty, non-git, non-test files)
+backup_agent_files "$WORKDIR"
+
+# 2. Restore full repo from backup
+rsync -a --delete /repo_full/ "$WORKDIR/"
+echo "[sg_only_verifier] Restored full repo from /repo_full/"
+
+# 3. Overlay agent's changes
+overlay_agent_files "$WORKDIR"
+
+# Return to working directory
+cd "$WORKDIR"
+echo "[sg_only_verifier] Legacy restore complete, proceeding with tests"
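
Read top to bottom, the wrapper makes one decision from four filesystem and environment checks. The sketch below restates that control flow as a pure function, which may help when reasoning about which path a given image will take. The function name and the strategy labels are hypothetical; only the order of the checks follows the script.

```python
def choose_restore_strategy(
    sg_only_mode: bool,
    already_restored: bool,
    has_manifest: bool,
    has_repo_full: bool,
) -> str:
    """Mirror the order of checks in the wrapper script."""
    if not sg_only_mode:
        return "noop"              # /tmp/.sg_only_mode is absent
    if already_restored:
        return "noop"              # _SG_ONLY_RESTORED already set by an earlier source
    if has_manifest:
        return "clone-at-verify"   # primary path: clone mirrors listed in the manifest
    if has_repo_full:
        return "legacy-repo-full"  # pre-v2 images: rsync from /repo_full/
    return "warn-and-continue"     # nothing to restore from; tests run on the workspace as is


v2_image = choose_restore_strategy(True, False, True, False)
legacy_image = choose_restore_strategy(True, False, False, True)
regular_run = choose_restore_strategy(False, False, True, True)
```

The manifest takes precedence over `/repo_full/`, so an image that ships both follows the clone-at-verify path.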
