review: address all PR #267 review findings (critical / important / suggestions) by sriumcp · Pull Request #268 · AI-native-Systems-Research/agentic-strategy-evolution

sriumcp · 2026-06-01T17:44:20Z

Follow-up to merged PR #267. Addresses every finding from the multi-agent review (code-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer) that wasn't blocking-for-merge but is worth landing now while the context is fresh.

Critical fixes

C1 iteration.py:1272-1287 — drop except ImportError: pass for our own orchestrator.lineage. Hoisted imports to module top. The preceding comment explicitly promised "loud failure"; the silent pass defeated it. A broken intra-package module is a self-inflicted bug, not an optional dependency.
C2 validate.py:298-301 — _validate_locked_workload now surfaces malformed/unreadable workload yamls as deviations instead of silently continue-ing. F20's whole purpose is catching workload drift; silent-skip on malformed input was the inverse of that.

Important fixes

I1 plot_specs.py — drop the wrong campaign_yaml_dir fallback that resolved to parent of work_dir. Threaded campaign_path through setup_work_dir (now recorded as state.json["config_ref"]), with a new _campaign_yaml_dir_from_state reader. Plot script paths now resolve correctly in production, not just tests.
I2 plot_specs.py:108-120 — _pick_interpreter rewritten as _build_command. The previous version returned the script as both interpreter and argv[1], so an executable Rust binary or shell script with no recognized extension would invoke itself with itself as its first argument.
I3 sdk_dispatch.py:180-183 — broad except Exception: pass in aclose cleanup now logs at WARNING. The four documented races stay silent; only the unknown-class fallback gains observability.

Bug surfaced and fixed by F19 tests

When sdk_timeouts.turn_silence_threshold_seconds was unset entirely, __init__ defaulted to silence_threshold_seconds (600) and applied it to every phase via the scalar branch — defeating F19's per-phase split. Now distinguishes:

absent → per-phase defaults stand (design=600, execute_analyze=120, report=240).
explicit scalar → applied uniformly (back-compat).
explicit map → operator values overlay defaults.

Code-reviewer suggestions

S1 attach_to_state is no longer dead code; docstring corrected; RuntimeError on JSON decode failure (was silent return).
S2 nous package stages reproduce.sh / Dockerfile / PACKAGE_README.md to a temp directory and tars from there. The work_dir on disk is unchanged across successive nous package invocations.
S3 Redundant except (OSError, Exception) in summarize_lineage replaced with narrow handlers that record campaign_yaml_error into the summary so nous lineage shows the operator why derived_from couldn't be determined.
S4 campaign.plot_specs schema description corrected — invokes per-iteration during finalize, not as a separate end-of-campaign rollup.
S5 Snapshot guard short-circuit removed; idempotency lives in snapshot_iter_files itself.

Cumulative-patch failure visibility

emit_cumulative_patch now writes a patches/cumulative.patch.error sidecar with git stderr when emission fails. summarize_lineage reads it; nous lineage surfaces the message. Without the sidecar, a failed emission was a single warning line in orchestrator.log that downstream derived_from campaigns would silently miss months later.

Comment errors

reproducibility.attach_to_state 24h re-init claim removed (no such check exists).
_emit_high_build_warning docstring no longer claims an unimplemented OR clause.
compute_campaign_spec_diff docstring says "five sub-keys", not "three".
campaign schema: silence-threshold fallback chain corrected.
nous resume error: dropped the false "re-emit reproducibility metadata" claim (first-capture-wins).
sdk_dispatch path-walk comment expanded to count all four .parent calls; dead except (IndexError, ValueError) replaced with a real boundary check.
--immediate help: "tool-call return" → "event boundary".
README precondition list: principles.json clarification moved out of the numbered list.
BLIS quote in execute_analyze.md gains a "see repo_commit in reproducibility_metadata to reproduce" pointer.
_walk_locked_workload tenant-tracking ternary documented.
_resolve_turn_silence_threshold docstring acknowledges merged steps 3+4 in _phase_silence_thresholds.

New tests (25 added → 1303 total)

test_f4_augment_writes_spec_diff_under_auto_approve — F4 contract end-to-end with auto_approve=True.
test_f4_augment_emits_stub_when_summarizer_failed — stub path emits diff too.
test_f19_resolve_turn_silence_threshold_per_phase_map — each phase returns its declared value.
test_f19_resolve_turn_silence_threshold_partial_map_falls_back_to_default — partial map respects per-phase defaults for unset keys.
test_f19_resolve_turn_silence_threshold_scalar_back_compat — scalar applies uniformly.
test_f19_resolve_turn_silence_threshold_no_config_uses_phase_defaults — caught the bug.
test_f17_attach_to_state_first_capture_wins — idempotency.
test_f17_attach_to_state_writes_when_missing — fresh-state path.
test_f17_attach_to_state_raises_on_malformed_state — loud failure.
test_f17_repo_dirty_false_on_clean_tree / _true_on_modified_tree — porcelain capture.
test_f17_snapshot_iter_files_copies_hardware_config — per-iter snapshot.
test_f11_warning_threshold_boundary (parametrized 3/4/5/6) — boundary pinned.
test_f11_suggestion_formula_matches_120_plus_30_per_file — formula pinned.
test_f11_no_warning_when_operator_already_raised — caller suppression.
test_f12_aclose_runtime_error_is_swallowed — RuntimeError ≠ propagation.
test_f12_aclose_arbitrary_exception_does_not_mask_primary — broad-fallback behavior.
test_f20_locked_workload_diff_passes_with_declared_deviation — fixed tautology, real errors == [] assertion.
test_f20_malformed_workload_yaml_surfaces_as_deviation — C2 regression guard.
test_f21_apply_derived_from_patch_succeeds_with_clean_apply / _fails_when_check_rejects — both branches.
test_f21_full_round_trip — emit → resolve → apply.
test_f21_emit_cumulative_patch_writes_error_sidecar_on_failure — sidecar contract.

Test plan

pytest tests/test_friction_245.py -v — 57 tests pass (was 32, +25).
pytest tests/ -q — 1303 pass, 2 skipped, 0 regressions.
CLAUDE.md compliance verified — no live LLM calls, mocks at the seam.

🤖 Generated with Claude Code

…native-Systems-Research#267 Addresses every finding from the multi-agent PR review: Critical: - C1: drop ``except ImportError: pass`` for orchestrator.lineage in iteration.py. Hoist imports to module top — a broken intra-package module is a self-inflicted bug, not an optional dependency, and the preceding comment promised "loud failure" anyway. - C2: ``_validate_locked_workload`` now surfaces malformed/unreadable workload yamls as deviations instead of silently skipping them. Catching workload drift is the whole point of F20. Important: - I1: ``invoke_plot_specs`` no longer guesses ``campaign_yaml_dir`` from work_dir's parent. Threaded ``campaign_path`` through ``setup_work_dir`` (recorded in state.json[\"config_ref\"]) and a new ``_campaign_yaml_dir_from_state`` reader resolves it for the finalize step. Plot script paths now resolve correctly in production, not just tests. - I2: ``_pick_interpreter`` rewritten as ``_build_command``, returning the proper argv list. The previous version invoked executable scripts with themselves as ``argv[1]``. - I3: aclose cleanup's broad ``except Exception: pass`` now logs at WARNING instead of swallowing silently. The narrow tuple of documented races (TimeoutError, CancelledError, RuntimeError, GeneratorExit) is still silent — only the unknown-class fallback gains observability. - I4-I10: 25 new behavioral tests covering F4 auto-approve diff emission, F19 ``_resolve_turn_silence_threshold`` per-phase + scalar back-compat (uncovered a real bug — fix below), F17 attach_to_state idempotency + repo_dirty capture + snapshot_iter_files, F11 boundary at total_files=4/5 + formula assertion, F12 RuntimeError + non-documented exception handling, F20 declared-deviation pin, F21 apply_derived_from_patch round-trip + cumulative.patch.error sidecar. Bug surfaced and fixed by F19 tests: - When ``sdk_timeouts.turn_silence_threshold_seconds`` was unset entirely, the old init applied 600 to every phase (defeating F19's per-phase split). Now distinguished from the explicit scalar form: absent → per-phase defaults stand; explicit scalar → applied uniformly (back-compat). Code-reviewer suggestions: - S1: ``attach_to_state`` is no longer dead code. Docstring updated; RuntimeError on JSON decode failure (was: silent return). - S2: ``nous package`` stages reproduce.sh / Dockerfile / PACKAGE_README.md to a temp directory and tars from there — the work_dir on disk is unchanged. - S3: redundant ``except (OSError, Exception)`` in ``summarize_lineage`` replaced with narrow handlers that record ``campaign_yaml_error`` into the summary so ``nous lineage`` shows the operator why derived_from couldn't be determined. - S4: ``campaign.plot_specs`` schema description corrected — runs per-iteration during finalize, not as a separate end-of-campaign rollup. - S5: snapshot guard short-circuit removed; idempotency lives in ``snapshot_iter_files`` itself. Comment errors fixed: - ``reproducibility.attach_to_state`` 24h re-init claim removed. - ``_emit_high_build_warning`` docstring no longer claims an unimplemented OR clause. - ``compute_campaign_spec_diff`` docstring says "five sub-keys", not "three". - campaign schema: silence-threshold fallback chain corrected. - ``nous resume`` error: dropped the false "re-emit reproducibility metadata" claim (first-capture-wins). - sdk_dispatch path-walk comment expanded to count all four ``.parent`` calls. Dead ``except (IndexError, ValueError)`` replaced with a real boundary check. - ``--immediate`` help: "tool-call return" → "event boundary". - README precondition list: principles.json clarification moved out of the numbered list (not a precondition). - BLIS quote in execute_analyze.md gains a "see repo_commit in reproducibility_metadata to reproduce the snapshot" pointer. - ``_walk_locked_workload`` tenant-tracking ternary documented. - ``_resolve_turn_silence_threshold`` docstring acknowledges that step 3 and step 4 of the resolution chain are merged in ``_phase_silence_thresholds``. Cumulative.patch failure visibility: - ``emit_cumulative_patch`` now writes a ``patches/cumulative.patch.error`` sidecar with git stderr when emission fails. ``summarize_lineage`` reads it; ``nous lineage`` surfaces the message. Without the sidecar, a failed emission was a single warning line in orchestrator.log that downstream ``derived_from`` campaigns would silently miss months later. Tests: 1303 passing (was 1278), 2 skipped, 0 regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

review: address all PR #267 review findings (critical / important / suggestions)#268

review: address all PR #267 review findings (critical / important / suggestions)#268
sriumcp wants to merge 1 commit into
AI-native-Systems-Research:reflectivefrom
sriumcp:friction-245-review-fixes

sriumcp commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sriumcp commented Jun 1, 2026

Critical fixes

Important fixes

Bug surfaced and fixed by F19 tests

Code-reviewer suggestions

Cumulative-patch failure visibility

Comment errors

New tests (25 added → 1303 total)

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant