Commit cf25248

sjarmak and claude committed
Fix no_changes_guard: reward.txt now written by Python (38 verifiers)
The no_changes_guard in write_scored_result correctly set reward=0.0 in Python when git confirmed zero agent changes, but reward.txt was still written by a bash `echo "$score"` that used the original (non-zeroed) shell variable. Harbor reads reward.txt for the final score, so the guard was ineffective: agents that crashed before running could still get non-zero scores from heuristic checks matching pre-existing repo state.

Fix: Python block now writes reward.txt directly (respecting the guard), bash echo removed. Affects 38 test.sh files across 9 SDLC suites. Also adds OH rerun configs for 12 tainted tasks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
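
The failure mode described above comes from two writers disagreeing about one value. The following is a minimal sketch of the fixed pattern, not the repository's actual code: the function signature, the `agent_changed_files` flag (standing in for the git check), the `out_dir` parameter, and the payload keys are all assumptions made for illustration. Only the two output file names and the `:.4f` format come from the diff.

```python
import json
import tempfile
from pathlib import Path


def write_scored_result(score: float, agent_changed_files: bool, out_dir: Path) -> float:
    """Write validation_result.json and reward.txt from one guarded value.

    Hypothetical signature: the real verifier derives the "no changes"
    signal from git; here it is passed in as a boolean.
    """
    # no_changes_guard: zero the reward when the agent changed nothing.
    reward = score if agent_changed_files else 0.0

    # Payload keys are illustrative, not the verifier's real schema.
    payload = {"reward": reward, "no_changes_guard": not agent_changed_files}

    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "validation_result.json", "w") as f:
        json.dump(payload, f, indent=2)

    # Same variable, same process: reward.txt cannot drift from the JSON,
    # which is what happened when bash echoed its own "$score" afterwards.
    with open(out_dir / "reward.txt", "w") as f:
        f.write(f"{reward:.4f}\n")
    return reward


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        # Heuristics scored 0.75, but the agent made no changes.
        write_scored_result(0.75, agent_changed_files=False, out_dir=out)
        print((out / "reward.txt").read_text().strip())  # prints 0.0000
```

Keeping both writes in the Python block means there is a single source of truth for the reward, so a later guard added in Python is automatically reflected in the file the harness reads.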
1 parent d3b3370 commit cf25248

File tree

40 files changed (+288, -38 lines changed)
  • benchmarks
    • csb_sdlc_debug/tidb-query-plan-regression-debug-001/tests
    • csb_sdlc_design/elasticsearch-shard-alloc-design-001/tests
    • csb_sdlc_document
      • godot-gdscript-api-docgen-001/tests
      • grpc-channel-api-docgen-001/tests
    • csb_sdlc_feature
      • cilium-policy-audit-logger-feat-001/tests
      • cilium-policy-quota-feat-001/tests
      • curl-http3-priority-feat-001/tests
      • django-rate-limit-middleware-feat-001/tests
      • envoy-custom-header-filter-feat-001/tests
      • numpy-rolling-median-feat-001/tests
      • pandas-merge-asof-indicator-feat-001/tests
      • postgres-copy-csv-header-feat-001/tests
      • prometheus-silence-bulk-api-feat-001/tests
      • pytorch-gradient-noise-feat-001/tests
      • servo-css-container-query-feat-001/tests
      • terraform-compact-diff-fmt-feat-001/tests
      • vscode-custom-fold-region-feat-001/tests
    • csb_sdlc_refactor
      • beam-pipeline-builder-refac-001/tests
      • cilium-endpoint-manager-refac-001/tests
      • django-request-factory-refac-001/tests
      • envoy-listener-manager-refac-001/tests
      • istio-discovery-server-refac-001/tests
      • kubernetes-scheduler-profile-refac-001/tests
      • numpy-array-dispatch-refac-001/tests
      • pandas-index-engine-refac-001/tests
      • prometheus-query-engine-refac-001/tests
      • pytorch-optimizer-foreach-refac-001/tests
      • roslyn-symbol-resolver-refac-001/tests
      • terraform-eval-context-refac-001/tests
    • csb_sdlc_secure
      • ceph-rgw-auth-secure-001/tests
      • typescript-type-narrowing-secure-001/tests
    • csb_sdlc_test
    • csb_sdlc_understand/clickhouse-mergetree-arch-understand-001/tests
  • configs
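
Because the same one-line bash write was repeated across many verifiers, a quick scan can help confirm that no `test.sh` still carries it. This is a sketch under assumptions: the helper name `find_stale_verifiers` and the regular expression are invented here, and the pattern only targets the exact line removed in this commit (`echo "$score" > /logs/verifier/reward.txt`), so other variants would need their own patterns.

```python
import re
import tempfile
from pathlib import Path

# Matches the removed line: echo "$score" > /logs/verifier/reward.txt
# (also tolerates ${score} and missing quotes; it does not match echo "0.0").
STALE_ECHO = re.compile(
    r'^\s*echo\s+"?\$\{?score\}?"?\s*>\s*/logs/verifier/reward\.txt'
)


def find_stale_verifiers(root: Path) -> list[Path]:
    """Return tests/test.sh files under root that still echo $score to reward.txt."""
    stale = []
    for script in sorted(root.rglob("test.sh")):
        if script.parent.name != "tests":
            continue  # only verifier scripts laid out as <task>/tests/test.sh
        text = script.read_text(errors="replace")
        if any(STALE_ECHO.match(line) for line in text.splitlines()):
            stale.append(script)
    return stale


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tests = Path(tmp) / "demo-task" / "tests"
        tests.mkdir(parents=True)
        (tests / "test.sh").write_text('echo "$score" > /logs/verifier/reward.txt\n')
        print([p.name for p in find_stale_verifiers(Path(tmp))])  # prints ['test.sh']
```

Pointing `find_stale_verifiers` at the `benchmarks` directory after this commit would be expected to return an empty list if every affected file was converted.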


benchmarks/csb_sdlc_debug/tidb-query-plan-regression-debug-001/tests/test.sh

Lines changed: 5 additions & 1 deletion
@@ -52,6 +52,8 @@ payload = {
 }
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
 echo "0.0" > /logs/verifier/reward.txt
 }
@@ -134,8 +136,10 @@ if details:
     payload["details"] = details
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
-echo "$score" > /logs/verifier/reward.txt
+# reward.txt now written by Python (respects no_changes_guard)
 }
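
The added lines write the reward with a fixed `:.4f` format, whereas the removed `echo "$score"` wrote whatever string the shell variable held. A small sketch of what that format produces and how a consumer might parse it back follows; `format_reward` and `read_reward` are hypothetical helpers for illustration, and nothing here describes how Harbor itself reads the file.

```python
import tempfile
from pathlib import Path


def format_reward(reward: float) -> str:
    """Mirror the f-string in the diff: four decimals plus a trailing newline."""
    return f"{reward:.4f}\n"


def read_reward(path: Path) -> float:
    """Hypothetical consumer: parse reward.txt back into a float."""
    return float(path.read_text().strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        reward_file = Path(tmp) / "reward.txt"
        reward_file.write_text(format_reward(0.5))
        print(repr(reward_file.read_text()))  # prints '0.5000\n'
        print(read_reward(reward_file))       # prints 0.5
```

The fixed width makes the file contents predictable regardless of how the score was computed upstream.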

benchmarks/csb_sdlc_design/elasticsearch-shard-alloc-design-001/tests/test.sh

Lines changed: 5 additions & 1 deletion
@@ -52,6 +52,8 @@ payload = {
 }
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
 echo "0.0" > /logs/verifier/reward.txt
 }
@@ -134,8 +136,10 @@ if details:
     payload["details"] = details
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
-echo "$score" > /logs/verifier/reward.txt
+# reward.txt now written by Python (respects no_changes_guard)
 }

benchmarks/csb_sdlc_document/godot-gdscript-api-docgen-001/tests/test.sh

Lines changed: 5 additions & 1 deletion
@@ -52,6 +52,8 @@ payload = {
 }
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
 echo "0.0" > /logs/verifier/reward.txt
 }
@@ -134,8 +136,10 @@ if details:
     payload["details"] = details
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
-echo "$score" > /logs/verifier/reward.txt
+# reward.txt now written by Python (respects no_changes_guard)
 }

benchmarks/csb_sdlc_document/grpc-channel-api-docgen-001/tests/test.sh

Lines changed: 5 additions & 1 deletion
@@ -52,6 +52,8 @@ payload = {
 }
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
 echo "0.0" > /logs/verifier/reward.txt
 }
@@ -134,8 +136,10 @@ if details:
     payload["details"] = details
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
-echo "$score" > /logs/verifier/reward.txt
+# reward.txt now written by Python (respects no_changes_guard)
 }

benchmarks/csb_sdlc_feature/cilium-policy-audit-logger-feat-001/tests/test.sh

Lines changed: 5 additions & 1 deletion
@@ -52,6 +52,8 @@ payload = {
 }
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
 echo "0.0" > /logs/verifier/reward.txt
 }
@@ -134,8 +136,10 @@ if details:
     payload["details"] = details
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
-echo "$score" > /logs/verifier/reward.txt
+# reward.txt now written by Python (respects no_changes_guard)
 }

benchmarks/csb_sdlc_feature/cilium-policy-quota-feat-001/tests/test.sh

Lines changed: 5 additions & 1 deletion
@@ -52,6 +52,8 @@ payload = {
 }
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
 echo "0.0" > /logs/verifier/reward.txt
 }
@@ -134,8 +136,10 @@ if details:
     payload["details"] = details
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
-echo "$score" > /logs/verifier/reward.txt
+# reward.txt now written by Python (respects no_changes_guard)
 }

benchmarks/csb_sdlc_feature/curl-http3-priority-feat-001/tests/test.sh

Lines changed: 5 additions & 1 deletion
@@ -52,6 +52,8 @@ payload = {
 }
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
 echo "0.0" > /logs/verifier/reward.txt
 }
@@ -134,8 +136,10 @@ if details:
     payload["details"] = details
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
-echo "$score" > /logs/verifier/reward.txt
+# reward.txt now written by Python (respects no_changes_guard)
 }

benchmarks/csb_sdlc_feature/django-rate-limit-middleware-feat-001/tests/test.sh

Lines changed: 5 additions & 1 deletion
@@ -52,6 +52,8 @@ payload = {
 }
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
 echo "0.0" > /logs/verifier/reward.txt
 }
@@ -134,8 +136,10 @@ if details:
     payload["details"] = details
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
-echo "$score" > /logs/verifier/reward.txt
+# reward.txt now written by Python (respects no_changes_guard)
 }

benchmarks/csb_sdlc_feature/envoy-custom-header-filter-feat-001/tests/test.sh

Lines changed: 5 additions & 1 deletion
@@ -52,6 +52,8 @@ payload = {
 }
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
 echo "0.0" > /logs/verifier/reward.txt
 }
@@ -134,8 +136,10 @@ if details:
     payload["details"] = details
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
-echo "$score" > /logs/verifier/reward.txt
+# reward.txt now written by Python (respects no_changes_guard)
 }

benchmarks/csb_sdlc_feature/numpy-rolling-median-feat-001/tests/test.sh

Lines changed: 5 additions & 1 deletion
@@ -52,6 +52,8 @@ payload = {
 }
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
 echo "0.0" > /logs/verifier/reward.txt
 }
@@ -134,8 +136,10 @@ if details:
     payload["details"] = details
 with open("/logs/verifier/validation_result.json", "w") as f:
     json.dump(payload, f, indent=2)
+with open("/logs/verifier/reward.txt", "w") as f:
+    f.write(f"{reward:.4f}\n")
 PYEOF
-echo "$score" > /logs/verifier/reward.txt
+# reward.txt now written by Python (respects no_changes_guard)
 }
