Skip to content

review: address all PR #267 review findings (critical / important / suggestions)#268

Open
sriumcp wants to merge 1 commit into
AI-native-Systems-Research:reflectivefrom
sriumcp:friction-245-review-fixes
Open

review: address all PR #267 review findings (critical / important / suggestions)#268
sriumcp wants to merge 1 commit into
AI-native-Systems-Research:reflectivefrom
sriumcp:friction-245-review-fixes

Conversation

@sriumcp
Copy link
Copy Markdown
Collaborator

@sriumcp sriumcp commented Jun 1, 2026

Refs #245

Follow-up to merged PR #267. Addresses every finding from the multi-agent review (code-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer) that wasn't blocking-for-merge but is worth landing now while the context is fresh.

Critical fixes

  • C1 iteration.py:1272-1287 — drop except ImportError: pass for our own orchestrator.lineage. Hoisted imports to module top. The preceding comment explicitly promised "loud failure"; the silent pass defeated it. A broken intra-package module is a self-inflicted bug, not an optional dependency.
  • C2 validate.py:298-301_validate_locked_workload now surfaces malformed/unreadable workload yamls as deviations instead of silently continue-ing. F20's whole purpose is catching workload drift; silent-skip on malformed input was the inverse of that.

Important fixes

  • I1 plot_specs.py — drop the wrong campaign_yaml_dir fallback that resolved to parent of work_dir. Threaded campaign_path through setup_work_dir (now recorded as state.json["config_ref"]), with a new _campaign_yaml_dir_from_state reader. Plot script paths now resolve correctly in production, not just tests.
  • I2 plot_specs.py:108-120_pick_interpreter rewritten as _build_command. The previous version returned the script as both interpreter and argv[1], so an executable Rust binary or shell script with no recognized extension would invoke itself with itself as its first argument.
  • I3 sdk_dispatch.py:180-183 — broad except Exception: pass in aclose cleanup now logs at WARNING. The four documented races stay silent; only the unknown-class fallback gains observability.

Bug surfaced and fixed by F19 tests

When sdk_timeouts.turn_silence_threshold_seconds was unset entirely, __init__ defaulted to silence_threshold_seconds (600) and applied it to every phase via the scalar branch — defeating F19's per-phase split. Now distinguishes:

  • absent → per-phase defaults stand (design=600, execute_analyze=120, report=240).
  • explicit scalar → applied uniformly (back-compat).
  • explicit map → operator values overlay defaults.

Code-reviewer suggestions

  • S1 attach_to_state is no longer dead code; docstring corrected; RuntimeError on JSON decode failure (was silent return).
  • S2 nous package stages reproduce.sh / Dockerfile / PACKAGE_README.md to a temp directory and tars from there. The work_dir on disk is unchanged across successive nous package invocations.
  • S3 Redundant except (OSError, Exception) in summarize_lineage replaced with narrow handlers that record campaign_yaml_error into the summary so nous lineage shows the operator why derived_from couldn't be determined.
  • S4 campaign.plot_specs schema description corrected — invokes per-iteration during finalize, not as a separate end-of-campaign rollup.
  • S5 Snapshot guard short-circuit removed; idempotency lives in snapshot_iter_files itself.

Cumulative-patch failure visibility

emit_cumulative_patch now writes a patches/cumulative.patch.error sidecar with git stderr when emission fails. summarize_lineage reads it; nous lineage surfaces the message. Without the sidecar, a failed emission was a single warning line in orchestrator.log that downstream derived_from campaigns would silently miss months later.

Comment errors

  • reproducibility.attach_to_state 24h re-init claim removed (no such check exists).
  • _emit_high_build_warning docstring no longer claims an unimplemented OR clause.
  • compute_campaign_spec_diff docstring says "five sub-keys", not "three".
  • campaign schema: silence-threshold fallback chain corrected.
  • nous resume error: dropped the false "re-emit reproducibility metadata" claim (first-capture-wins).
  • sdk_dispatch path-walk comment expanded to count all four .parent calls; dead except (IndexError, ValueError) replaced with a real boundary check.
  • --immediate help: "tool-call return" → "event boundary".
  • README precondition list: principles.json clarification moved out of the numbered list.
  • BLIS quote in execute_analyze.md gains a "see repo_commit in reproducibility_metadata to reproduce" pointer.
  • _walk_locked_workload tenant-tracking ternary documented.
  • _resolve_turn_silence_threshold docstring acknowledges merged steps 3+4 in _phase_silence_thresholds.

New tests (25 added → 1303 total)

  • test_f4_augment_writes_spec_diff_under_auto_approve — F4 contract end-to-end with auto_approve=True.
  • test_f4_augment_emits_stub_when_summarizer_failed — stub path emits diff too.
  • test_f19_resolve_turn_silence_threshold_per_phase_map — each phase returns its declared value.
  • test_f19_resolve_turn_silence_threshold_partial_map_falls_back_to_default — partial map respects per-phase defaults for unset keys.
  • test_f19_resolve_turn_silence_threshold_scalar_back_compat — scalar applies uniformly.
  • test_f19_resolve_turn_silence_threshold_no_config_uses_phase_defaults — caught the bug.
  • test_f17_attach_to_state_first_capture_wins — idempotency.
  • test_f17_attach_to_state_writes_when_missing — fresh-state path.
  • test_f17_attach_to_state_raises_on_malformed_state — loud failure.
  • test_f17_repo_dirty_false_on_clean_tree / _true_on_modified_tree — porcelain capture.
  • test_f17_snapshot_iter_files_copies_hardware_config — per-iter snapshot.
  • test_f11_warning_threshold_boundary (parametrized 3/4/5/6) — boundary pinned.
  • test_f11_suggestion_formula_matches_120_plus_30_per_file — formula pinned.
  • test_f11_no_warning_when_operator_already_raised — caller suppression.
  • test_f12_aclose_runtime_error_is_swallowed — RuntimeError ≠ propagation.
  • test_f12_aclose_arbitrary_exception_does_not_mask_primary — broad-fallback behavior.
  • test_f20_locked_workload_diff_passes_with_declared_deviation — fixed tautology, real errors == [] assertion.
  • test_f20_malformed_workload_yaml_surfaces_as_deviation — C2 regression guard.
  • test_f21_apply_derived_from_patch_succeeds_with_clean_apply / _fails_when_check_rejects — both branches.
  • test_f21_full_round_trip — emit → resolve → apply.
  • test_f21_emit_cumulative_patch_writes_error_sidecar_on_failure — sidecar contract.

Test plan

  • pytest tests/test_friction_245.py -v — 57 tests pass (was 32, +25).
  • pytest tests/ -q — 1303 pass, 2 skipped, 0 regressions.
  • CLAUDE.md compliance verified — no live LLM calls, mocks at the seam.

🤖 Generated with Claude Code

…native-Systems-Research#267

Addresses every finding from the multi-agent PR review:

Critical:
- C1: drop ``except ImportError: pass`` for orchestrator.lineage in
  iteration.py. Hoist imports to module top — a broken intra-package
  module is a self-inflicted bug, not an optional dependency, and the
  preceding comment promised "loud failure" anyway.
- C2: ``_validate_locked_workload`` now surfaces malformed/unreadable
  workload yamls as deviations instead of silently skipping them.
  Catching workload drift is the whole point of F20.

Important:
- I1: ``invoke_plot_specs`` no longer guesses ``campaign_yaml_dir``
  from work_dir's parent. Threaded ``campaign_path`` through
  ``setup_work_dir`` (recorded in state.json[\"config_ref\"]) and a
  new ``_campaign_yaml_dir_from_state`` reader resolves it for the
  finalize step. Plot script paths now resolve correctly in
  production, not just tests.
- I2: ``_pick_interpreter`` rewritten as ``_build_command``,
  returning the proper argv list. The previous version invoked
  executable scripts with themselves as ``argv[1]``.
- I3: aclose cleanup's broad ``except Exception: pass`` now logs
  at WARNING instead of swallowing silently. The narrow tuple of
  documented races (TimeoutError, CancelledError, RuntimeError,
  GeneratorExit) is still silent — only the unknown-class fallback
  gains observability.
- I4-I10: 25 new behavioral tests covering F4 auto-approve diff
  emission, F19 ``_resolve_turn_silence_threshold`` per-phase +
  scalar back-compat (uncovered a real bug — fix below), F17
  attach_to_state idempotency + repo_dirty capture +
  snapshot_iter_files, F11 boundary at total_files=4/5 +
  formula assertion, F12 RuntimeError + non-documented exception
  handling, F20 declared-deviation pin, F21 apply_derived_from_patch
  round-trip + cumulative.patch.error sidecar.

Bug surfaced and fixed by F19 tests:
- When ``sdk_timeouts.turn_silence_threshold_seconds`` was unset
  entirely, the old init applied 600 to every phase (defeating
  F19's per-phase split). Now distinguished from the explicit
  scalar form: absent → per-phase defaults stand;
  explicit scalar → applied uniformly (back-compat).

Code-reviewer suggestions:
- S1: ``attach_to_state`` is no longer dead code. Docstring updated;
  RuntimeError on JSON decode failure (was: silent return).
- S2: ``nous package`` stages reproduce.sh / Dockerfile /
  PACKAGE_README.md to a temp directory and tars from there —
  the work_dir on disk is unchanged.
- S3: redundant ``except (OSError, Exception)`` in
  ``summarize_lineage`` replaced with narrow handlers that record
  ``campaign_yaml_error`` into the summary so ``nous lineage``
  shows the operator why derived_from couldn't be determined.
- S4: ``campaign.plot_specs`` schema description corrected — runs
  per-iteration during finalize, not as a separate end-of-campaign
  rollup.
- S5: snapshot guard short-circuit removed; idempotency lives in
  ``snapshot_iter_files`` itself.

Comment errors fixed:
- ``reproducibility.attach_to_state`` 24h re-init claim removed.
- ``_emit_high_build_warning`` docstring no longer claims an
  unimplemented OR clause.
- ``compute_campaign_spec_diff`` docstring says "five sub-keys",
  not "three".
- campaign schema: silence-threshold fallback chain corrected.
- ``nous resume`` error: dropped the false "re-emit reproducibility
  metadata" claim (first-capture-wins).
- sdk_dispatch path-walk comment expanded to count all four
  ``.parent`` calls. Dead ``except (IndexError, ValueError)``
  replaced with a real boundary check.
- ``--immediate`` help: "tool-call return" → "event boundary".
- README precondition list: principles.json clarification moved
  out of the numbered list (not a precondition).
- BLIS quote in execute_analyze.md gains a "see repo_commit in
  reproducibility_metadata to reproduce the snapshot" pointer.
- ``_walk_locked_workload`` tenant-tracking ternary documented.
- ``_resolve_turn_silence_threshold`` docstring acknowledges that
  step 3 and step 4 of the resolution chain are merged in
  ``_phase_silence_thresholds``.

Cumulative.patch failure visibility:
- ``emit_cumulative_patch`` now writes a
  ``patches/cumulative.patch.error`` sidecar with git stderr when
  emission fails. ``summarize_lineage`` reads it; ``nous lineage``
  surfaces the message. Without the sidecar, a failed emission
  was a single warning line in orchestrator.log that downstream
  ``derived_from`` campaigns would silently miss months later.

Tests: 1303 passing (was 1278), 2 skipped, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant