review: address all PR #267 review findings (critical / important / suggestions)#268
Open
sriumcp wants to merge 1 commit into
Open
Conversation
…native-Systems-Research#267 Addresses every finding from the multi-agent PR review: Critical: - C1: drop ``except ImportError: pass`` for orchestrator.lineage in iteration.py. Hoist imports to module top — a broken intra-package module is a self-inflicted bug, not an optional dependency, and the preceding comment promised "loud failure" anyway. - C2: ``_validate_locked_workload`` now surfaces malformed/unreadable workload yamls as deviations instead of silently skipping them. Catching workload drift is the whole point of F20. Important: - I1: ``invoke_plot_specs`` no longer guesses ``campaign_yaml_dir`` from work_dir's parent. Threaded ``campaign_path`` through ``setup_work_dir`` (recorded in state.json[\"config_ref\"]) and a new ``_campaign_yaml_dir_from_state`` reader resolves it for the finalize step. Plot script paths now resolve correctly in production, not just tests. - I2: ``_pick_interpreter`` rewritten as ``_build_command``, returning the proper argv list. The previous version invoked executable scripts with themselves as ``argv[1]``. - I3: aclose cleanup's broad ``except Exception: pass`` now logs at WARNING instead of swallowing silently. The narrow tuple of documented races (TimeoutError, CancelledError, RuntimeError, GeneratorExit) is still silent — only the unknown-class fallback gains observability. - I4-I10: 25 new behavioral tests covering F4 auto-approve diff emission, F19 ``_resolve_turn_silence_threshold`` per-phase + scalar back-compat (uncovered a real bug — fix below), F17 attach_to_state idempotency + repo_dirty capture + snapshot_iter_files, F11 boundary at total_files=4/5 + formula assertion, F12 RuntimeError + non-documented exception handling, F20 declared-deviation pin, F21 apply_derived_from_patch round-trip + cumulative.patch.error sidecar. Bug surfaced and fixed by F19 tests: - When ``sdk_timeouts.turn_silence_threshold_seconds`` was unset entirely, the old init applied 600 to every phase (defeating F19's per-phase split). Now distinguished from the explicit scalar form: absent → per-phase defaults stand; explicit scalar → applied uniformly (back-compat). Code-reviewer suggestions: - S1: ``attach_to_state`` is no longer dead code. Docstring updated; RuntimeError on JSON decode failure (was: silent return). - S2: ``nous package`` stages reproduce.sh / Dockerfile / PACKAGE_README.md to a temp directory and tars from there — the work_dir on disk is unchanged. - S3: redundant ``except (OSError, Exception)`` in ``summarize_lineage`` replaced with narrow handlers that record ``campaign_yaml_error`` into the summary so ``nous lineage`` shows the operator why derived_from couldn't be determined. - S4: ``campaign.plot_specs`` schema description corrected — runs per-iteration during finalize, not as a separate end-of-campaign rollup. - S5: snapshot guard short-circuit removed; idempotency lives in ``snapshot_iter_files`` itself. Comment errors fixed: - ``reproducibility.attach_to_state`` 24h re-init claim removed. - ``_emit_high_build_warning`` docstring no longer claims an unimplemented OR clause. - ``compute_campaign_spec_diff`` docstring says "five sub-keys", not "three". - campaign schema: silence-threshold fallback chain corrected. - ``nous resume`` error: dropped the false "re-emit reproducibility metadata" claim (first-capture-wins). - sdk_dispatch path-walk comment expanded to count all four ``.parent`` calls. Dead ``except (IndexError, ValueError)`` replaced with a real boundary check. - ``--immediate`` help: "tool-call return" → "event boundary". - README precondition list: principles.json clarification moved out of the numbered list (not a precondition). - BLIS quote in execute_analyze.md gains a "see repo_commit in reproducibility_metadata to reproduce the snapshot" pointer. - ``_walk_locked_workload`` tenant-tracking ternary documented. - ``_resolve_turn_silence_threshold`` docstring acknowledges that step 3 and step 4 of the resolution chain are merged in ``_phase_silence_thresholds``. Cumulative.patch failure visibility: - ``emit_cumulative_patch`` now writes a ``patches/cumulative.patch.error`` sidecar with git stderr when emission fails. ``summarize_lineage`` reads it; ``nous lineage`` surfaces the message. Without the sidecar, a failed emission was a single warning line in orchestrator.log that downstream ``derived_from`` campaigns would silently miss months later. Tests: 1303 passing (was 1278), 2 skipped, 0 regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Refs #245
Follow-up to merged PR #267. Addresses every finding from the multi-agent review (code-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer) that wasn't blocking-for-merge but is worth landing now while the context is fresh.
Critical fixes
iteration.py:1272-1287— dropexcept ImportError: passfor our ownorchestrator.lineage. Hoisted imports to module top. The preceding comment explicitly promised "loud failure"; the silent pass defeated it. A broken intra-package module is a self-inflicted bug, not an optional dependency.validate.py:298-301—_validate_locked_workloadnow surfaces malformed/unreadable workload yamls as deviations instead of silentlycontinue-ing. F20's whole purpose is catching workload drift; silent-skip on malformed input was the inverse of that.Important fixes
plot_specs.py— drop the wrongcampaign_yaml_dirfallback that resolved to parent of work_dir. Threadedcampaign_paththroughsetup_work_dir(now recorded asstate.json["config_ref"]), with a new_campaign_yaml_dir_from_statereader. Plot script paths now resolve correctly in production, not just tests.plot_specs.py:108-120—_pick_interpreterrewritten as_build_command. The previous version returned the script as both interpreter andargv[1], so an executable Rust binary or shell script with no recognized extension would invoke itself with itself as its first argument.sdk_dispatch.py:180-183— broadexcept Exception: passin aclose cleanup now logs at WARNING. The four documented races stay silent; only the unknown-class fallback gains observability.Bug surfaced and fixed by F19 tests
When
sdk_timeouts.turn_silence_threshold_secondswas unset entirely,__init__defaulted tosilence_threshold_seconds(600) and applied it to every phase via the scalar branch — defeating F19's per-phase split. Now distinguishes:Code-reviewer suggestions
attach_to_stateis no longer dead code; docstring corrected;RuntimeErroron JSON decode failure (was silent return).nous packagestages reproduce.sh / Dockerfile / PACKAGE_README.md to a temp directory and tars from there. The work_dir on disk is unchanged across successivenous packageinvocations.except (OSError, Exception)insummarize_lineagereplaced with narrow handlers that recordcampaign_yaml_errorinto the summary sonous lineageshows the operator whyderived_fromcouldn't be determined.campaign.plot_specsschema description corrected — invokes per-iteration during finalize, not as a separate end-of-campaign rollup.snapshot_iter_filesitself.Cumulative-patch failure visibility
emit_cumulative_patchnow writes apatches/cumulative.patch.errorsidecar with git stderr when emission fails.summarize_lineagereads it;nous lineagesurfaces the message. Without the sidecar, a failed emission was a single warning line inorchestrator.logthat downstreamderived_fromcampaigns would silently miss months later.Comment errors
reproducibility.attach_to_state24h re-init claim removed (no such check exists)._emit_high_build_warningdocstring no longer claims an unimplemented OR clause.compute_campaign_spec_diffdocstring says "five sub-keys", not "three".nous resumeerror: dropped the false "re-emit reproducibility metadata" claim (first-capture-wins).sdk_dispatchpath-walk comment expanded to count all four.parentcalls; deadexcept (IndexError, ValueError)replaced with a real boundary check.--immediatehelp: "tool-call return" → "event boundary".execute_analyze.mdgains a "seerepo_commitin reproducibility_metadata to reproduce" pointer._walk_locked_workloadtenant-tracking ternary documented._resolve_turn_silence_thresholddocstring acknowledges merged steps 3+4 in_phase_silence_thresholds.New tests (25 added → 1303 total)
test_f4_augment_writes_spec_diff_under_auto_approve— F4 contract end-to-end withauto_approve=True.test_f4_augment_emits_stub_when_summarizer_failed— stub path emits diff too.test_f19_resolve_turn_silence_threshold_per_phase_map— each phase returns its declared value.test_f19_resolve_turn_silence_threshold_partial_map_falls_back_to_default— partial map respects per-phase defaults for unset keys.test_f19_resolve_turn_silence_threshold_scalar_back_compat— scalar applies uniformly.test_f19_resolve_turn_silence_threshold_no_config_uses_phase_defaults— caught the bug.test_f17_attach_to_state_first_capture_wins— idempotency.test_f17_attach_to_state_writes_when_missing— fresh-state path.test_f17_attach_to_state_raises_on_malformed_state— loud failure.test_f17_repo_dirty_false_on_clean_tree/_true_on_modified_tree— porcelain capture.test_f17_snapshot_iter_files_copies_hardware_config— per-iter snapshot.test_f11_warning_threshold_boundary(parametrized 3/4/5/6) — boundary pinned.test_f11_suggestion_formula_matches_120_plus_30_per_file— formula pinned.test_f11_no_warning_when_operator_already_raised— caller suppression.test_f12_aclose_runtime_error_is_swallowed— RuntimeError ≠ propagation.test_f12_aclose_arbitrary_exception_does_not_mask_primary— broad-fallback behavior.test_f20_locked_workload_diff_passes_with_declared_deviation— fixed tautology, realerrors == []assertion.test_f20_malformed_workload_yaml_surfaces_as_deviation— C2 regression guard.test_f21_apply_derived_from_patch_succeeds_with_clean_apply/_fails_when_check_rejects— both branches.test_f21_full_round_trip— emit → resolve → apply.test_f21_emit_cumulative_patch_writes_error_sidecar_on_failure— sidecar contract.Test plan
pytest tests/test_friction_245.py -v— 57 tests pass (was 32, +25).pytest tests/ -q— 1303 pass, 2 skipped, 0 regressions.🤖 Generated with Claude Code