fix: resolve all 21 sub-issues of friction-report tracker #245 #267
Merged
sriumcp merged 1 commit into AI-native-Systems-Research:reflective on Jun 1, 2026
…ystems-Research#245

Closes AI-native-Systems-Research#246
Closes AI-native-Systems-Research#247
Closes AI-native-Systems-Research#248
Closes AI-native-Systems-Research#249
Closes AI-native-Systems-Research#250
Closes AI-native-Systems-Research#251
Closes AI-native-Systems-Research#252
Closes AI-native-Systems-Research#253
Closes AI-native-Systems-Research#254
Closes AI-native-Systems-Research#255
Closes AI-native-Systems-Research#256
Closes AI-native-Systems-Research#257
Closes AI-native-Systems-Research#258
Closes AI-native-Systems-Research#259
Closes AI-native-Systems-Research#260
Closes AI-native-Systems-Research#261
Closes AI-native-Systems-Research#262
Closes AI-native-Systems-Research#263
Closes AI-native-Systems-Research#264
Closes AI-native-Systems-Research#265
Closes AI-native-Systems-Research#266
Closes AI-native-Systems-Research#245

External campaign-author friction report from running the paper-memorytime-mirage campaign on nous against BLIS surfaced 21 distinct points of friction, clustered around five themes:

A. Spec-fidelity (F1, F2, F3, F4, F10, F13, F20): nous validated *self-consistency* (executor matches bundle) but not *spec-fidelity* (bundle matches campaign) under --auto-approve. Headline architectural primitive: campaign.locked_parameters (AI-native-Systems-Research#246/F1) hard-fails any deviation, regardless of --auto-approve. Adoption: locked_workload (AI-native-Systems-Research#265/F20), unlocked_parameters_audit (AI-native-Systems-Research#261/F16), methodology hierarchy (AI-native-Systems-Research#247/F2), depth_overrides+invalidates (AI-native-Systems-Research#248/F3), gate-summary diff (AI-native-Systems-Research#249/F4), auto-approve safety docs (AI-native-Systems-Research#255/F10), create-campaign scaffold + authoring guide (AI-native-Systems-Research#258/F13).

B. Apparatus discipline (F7, F14, F16): invariants must validate ATTRIBUTION, not upstream totals. Methodology prompt sections in design.md/execute_analyze.md cover the bug-class question with the BLIS runningBatch vs RequestMap worked example. Authoring guide covers rehearsal-as-instrument and pre-lock unit checks.

C. Lifecycle / portability (F5, F11, F12, F19, F21): per-phase silence threshold (AI-native-Systems-Research#264/F19) closes the active stall where DESIGN's heavy reasoning trips an EXECUTE_ANALYZE-tuned watchdog. F21 lands cross-campaign code reuse via cumulative.patch + derived_from + nous lineage. F5 stop --immediate, F11 high-BUILD warning, F12 asyncio race fix.

D. Reproducibility (F17, F18): reproducibility_metadata auto-captured at INIT (target repo commit, hardware-config sha, language versions, latency-config snapshots). nous package tarballs work_dir + reproduce.sh + Dockerfile + README for paper artifact evaluation.

E. Hygiene (F6, F8, F9, F15): F6 worktree_extras tracked-path warning at campaign load; F8 nous resume diagnostic for work_dir confusion; F9 nous clean --orphaned; F15 physical_realism_check schema + soft warning.

See docs/friction-245-resolution.md for the per-F-entry → file map. Every change is tagged in code with (#NNN / F<n>) so git blame + issue tracker form a complete audit trail.

New modules:
- orchestrator/reproducibility.py: F17 capture
- orchestrator/lineage.py: F21 cumulative patches + derived_from
- orchestrator/plot_specs.py: F18 figure pipeline

New CLI subcommands: nous lineage, nous clean, nous package, nous stop --immediate

New schema fields:
- campaign: locked_parameters, locked_workload, derived_from, plot_specs, reproducibility_metadata, sdk_timeouts.turn_silence_threshold_seconds (per-phase map)
- bundle: physical_realism_check, unlocked_parameters_audit, workload_changes_from_canonical, rehearsal_subset.depth_overrides, timing_observations.recommended_turn_silence_threshold_seconds (per-phase map)

Tests: 32 new in tests/test_friction_245.py, 1278 total passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
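The headline rule in theme A of the commit message (campaign.locked_parameters hard-fails any deviation, regardless of --auto-approve) can be illustrated with a minimal Python sketch. This is not the code from orchestrator/validate.py: the function names, the exception class, the flat `parameters` mapping on the bundle, and the numeric values are all assumptions made for illustration. Treating a locked key that is missing from the bundle as a deviation is also an assumption.

```python
class LockedParameterDeviation(Exception):
    """Hypothetical error raised when a bundle changes a value the campaign locked."""


def find_locked_deviations(locked: dict, bundle_params: dict) -> list[str]:
    """Return one message per locked key whose bundle value is missing or different."""
    deviations = []
    for key, expected in locked.items():
        if key not in bundle_params:
            # Assumption: a missing locked key counts as a deviation.
            deviations.append(f"{key}: missing from bundle (campaign locks {expected!r})")
        elif bundle_params[key] != expected:
            deviations.append(
                f"{key}: bundle has {bundle_params[key]!r}, campaign locks {expected!r}"
            )
    return deviations


def validate_bundle(campaign: dict, bundle: dict, auto_approve: bool = False) -> None:
    """Hard-fail on locked-parameter drift.

    auto_approve is accepted but deliberately never consulted: the point of
    the rule is that auto-approval cannot waive a locked value.
    """
    # Legacy campaigns without locked_parameters lock nothing and always pass.
    locked = campaign.get("locked_parameters") or {}
    deviations = find_locked_deviations(locked, bundle.get("parameters", {}))
    if deviations:
        raise LockedParameterDeviation("; ".join(deviations))


if __name__ == "__main__":
    campaign = {"locked_parameters": {"total_kv_blocks": 2048}}  # illustrative value
    drifted_bundle = {"parameters": {"total_kv_blocks": 4096}}
    try:
        validate_bundle(campaign, drifted_bundle, auto_approve=True)
    except LockedParameterDeviation as exc:
        print(f"bundle rejected: {exc}")
```

The design choice worth noting is that the check raises instead of returning a warning, so an unattended run cannot proceed past a spec-fidelity violation, while a campaign that declares no locked_parameters behaves exactly as before.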
a30f09c into AI-native-Systems-Research:reflective
2 checks passed
sriumcp added a commit to sriumcp/agentic-strategy-evolution that referenced this pull request on Jun 1, 2026
…native-Systems-Research#267

Addresses every finding from the multi-agent PR review:

Critical:
- C1: drop ``except ImportError: pass`` for orchestrator.lineage in iteration.py. Hoist imports to module top: a broken intra-package module is a self-inflicted bug, not an optional dependency, and the preceding comment promised "loud failure" anyway.
- C2: ``_validate_locked_workload`` now surfaces malformed/unreadable workload yamls as deviations instead of silently skipping them. Catching workload drift is the whole point of F20.

Important:
- I1: ``invoke_plot_specs`` no longer guesses ``campaign_yaml_dir`` from work_dir's parent. Threaded ``campaign_path`` through ``setup_work_dir`` (recorded in ``state.json["config_ref"]``) and a new ``_campaign_yaml_dir_from_state`` reader resolves it for the finalize step. Plot script paths now resolve correctly in production, not just tests.
- I2: ``_pick_interpreter`` rewritten as ``_build_command``, returning the proper argv list. The previous version invoked executable scripts with themselves as ``argv[1]``.
- I3: aclose cleanup's broad ``except Exception: pass`` now logs at WARNING instead of swallowing silently. The narrow tuple of documented races (TimeoutError, CancelledError, RuntimeError, GeneratorExit) is still silent; only the unknown-class fallback gains observability.
- I4-I10: 25 new behavioral tests covering F4 auto-approve diff emission, F19 ``_resolve_turn_silence_threshold`` per-phase + scalar back-compat (uncovered a real bug; fix below), F17 attach_to_state idempotency + repo_dirty capture + snapshot_iter_files, F11 boundary at total_files=4/5 + formula assertion, F12 RuntimeError + non-documented exception handling, F20 declared-deviation pin, F21 apply_derived_from_patch round-trip + cumulative.patch.error sidecar.

Bug surfaced and fixed by F19 tests:
- When ``sdk_timeouts.turn_silence_threshold_seconds`` was unset entirely, the old init applied 600 to every phase (defeating F19's per-phase split). Now distinguished from the explicit scalar form: absent → per-phase defaults stand; explicit scalar → applied uniformly (back-compat).

Code-reviewer suggestions:
- S1: ``attach_to_state`` is no longer dead code. Docstring updated; RuntimeError on JSON decode failure (was: silent return).
- S2: ``nous package`` stages reproduce.sh / Dockerfile / PACKAGE_README.md to a temp directory and tars from there, so the work_dir on disk is unchanged.
- S3: redundant ``except (OSError, Exception)`` in ``summarize_lineage`` replaced with narrow handlers that record ``campaign_yaml_error`` into the summary so ``nous lineage`` shows the operator why derived_from couldn't be determined.
- S4: ``campaign.plot_specs`` schema description corrected: runs per-iteration during finalize, not as a separate end-of-campaign rollup.
- S5: snapshot guard short-circuit removed; idempotency lives in ``snapshot_iter_files`` itself.

Comment errors fixed:
- ``reproducibility.attach_to_state`` 24h re-init claim removed.
- ``_emit_high_build_warning`` docstring no longer claims an unimplemented OR clause.
- ``compute_campaign_spec_diff`` docstring says "five sub-keys", not "three".
- campaign schema: silence-threshold fallback chain corrected.
- ``nous resume`` error: dropped the false "re-emit reproducibility metadata" claim (first-capture-wins).
- sdk_dispatch path-walk comment expanded to count all four ``.parent`` calls. Dead ``except (IndexError, ValueError)`` replaced with a real boundary check.
- ``--immediate`` help: "tool-call return" → "event boundary".
- README precondition list: principles.json clarification moved out of the numbered list (not a precondition).
- BLIS quote in execute_analyze.md gains a "see repo_commit in reproducibility_metadata to reproduce the snapshot" pointer.
- ``_walk_locked_workload`` tenant-tracking ternary documented.
- ``_resolve_turn_silence_threshold`` docstring acknowledges that step 3 and step 4 of the resolution chain are merged in ``_phase_silence_thresholds``.

Cumulative.patch failure visibility:
- ``emit_cumulative_patch`` now writes a ``patches/cumulative.patch.error`` sidecar with git stderr when emission fails. ``summarize_lineage`` reads it; ``nous lineage`` surfaces the message. Without the sidecar, a failed emission was a single warning line in orchestrator.log that downstream ``derived_from`` campaigns would silently miss months later.

Tests: 1303 passing (was 1278), 2 skipped, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
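The F19 bug fix described in this commit message distinguishes three shapes of ``sdk_timeouts.turn_silence_threshold_seconds``: absent, explicit scalar, and per-phase map. The Python sketch below shows one way that distinction could look. It is not the real ``_resolve_turn_silence_threshold`` (whose resolution chain has more steps than shown here); the function name, the default table, and every numeric value except the old uniform 600 are assumptions for illustration.

```python
# Hypothetical per-phase defaults in seconds. The commit message names the
# phases but does not give the real default values.
PHASE_DEFAULTS_SKETCH = {"DESIGN": 1800, "EXECUTE_ANALYZE": 600}

# The old uniform value mentioned in the commit message, used here only as a
# fallback for phases missing from the sketch table (an assumption).
FALLBACK_SECONDS = 600


def resolve_threshold_sketch(sdk_timeouts: dict, phase: str) -> float:
    """Pick the silence threshold for one phase from the campaign's sdk_timeouts."""
    default = PHASE_DEFAULTS_SKETCH.get(phase, FALLBACK_SECONDS)
    configured = sdk_timeouts.get("turn_silence_threshold_seconds")

    if configured is None:
        # Absent: per-phase defaults stand. The bug was applying 600 to every
        # phase here, which erased the per-phase split.
        return default
    if isinstance(configured, dict):
        # Per-phase map: use the entry for this phase, else its default.
        return configured.get(phase, default)
    # Explicit scalar: applied uniformly to every phase (back-compat).
    return configured


if __name__ == "__main__":
    for cfg in ({}, {"turn_silence_threshold_seconds": 900},
                {"turn_silence_threshold_seconds": {"DESIGN": 2400}}):
        resolved = {p: resolve_threshold_sketch(cfg, p) for p in PHASE_DEFAULTS_SKETCH}
        print(cfg, "->", resolved)
```

Checking for ``None`` before checking the type is what keeps "unset" and "explicit scalar" apart; collapsing the two is the failure mode the F19 tests uncovered.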
Summary
External campaign-author friction report from running the
paper-memorytime-mirage campaign on nous against BLIS surfaced
21 distinct points of friction. This PR resolves the entire tracker
in a single coherent change, organized by theme.
- Spec-fidelity: nous validated self-consistency (executor matches bundle) but not spec-fidelity (bundle matches campaign) under --auto-approve. campaign.locked_parameters ([F1] Add locked_parameters schema field to enforce campaign spec-fidelity in bundles #246) hard-fails bundle deviations regardless of --auto-approve. locked_workload ([F20] Schema: workload yaml deviations need locked_workload + workload_changes_from_canonical #265) does the same for workload yamls. Methodology prompt now declares the campaign > target-repo-docs hierarchy.
- Apparatus discipline: the bug-class question (BLIS runningBatch/RequestMap worked example), the unlocked-parameters audit, and the rehearsal-as-instrument positive pattern.
- Lifecycle / portability: cumulative.patch + campaign.derived_from + nous lineage for cross-campaign code reuse.
- Reproducibility: reproducibility_metadata auto-captured at INIT (target repo commit, hardware-config sha, language versions, latency-config snapshots). nous package tarballs work_dir + reproduce.sh + Dockerfile + README.
- Hygiene: worktree_extras tracked-path warning, nous resume diagnostic, nous clean --orphaned, physical_realism_check schema.

The full per-F-entry → file map is in docs/friction-245-resolution.md. Every change is tagged in code with (#NNN / F<n>) so git blame + the issue tracker form a complete audit trail.

Why this PR is correct
A newcomer (or AI agent) navigating cold should be able to verify each F-entry's resolution in three steps:
- … docs/friction-245-resolution.md.
- … (#NNN / F<n>) for cross-reference.

The architectural primitives are deliberately small and orthogonal:
… orchestrator/validate.py).

No refactors. Every behavior change is opt-in (legacy campaigns without locked_parameters keep working unchanged), and every schema addition is backward-compatible (oneOf'd into the existing fields where shape changed).

New modules
- orchestrator/reproducibility.py: F17 capture + per-iter snapshots
- orchestrator/lineage.py: F21 cumulative.patch + derived_from resolution
- orchestrator/plot_specs.py: F18 figure pipeline

New CLI surface
- nous stop --immediate: F5 event-boundary halt
- nous lineage <run_id>: F21 inheritance inspection
- nous clean --orphaned: F9 stale-worktree cleanup
- nous package <run_id>: F18 paper artifact tarball

New schema fields
- campaign: locked_parameters, locked_workload, derived_from, plot_specs, reproducibility_metadata (auto-populated), sdk_timeouts.turn_silence_threshold_seconds (now accepts per-phase map)
- bundle: experiment_spec.physical_realism_check, experiment_spec.unlocked_parameters_audit, workload_changes_from_canonical, experiment_spec.rehearsal_subset.depth_overrides, timing_observations.recommended_turn_silence_threshold_seconds (per-phase map)
- reproducibility_metadata

Test plan
- pytest tests/test_friction_245.py: 32 new tests covering F1, F3, F4, F11, F15, F17, F18, F19, F20, F21 (the F-entries with behavioral changes; F2/F7/F16 are prompt-text changes; F5/F6/F8/F9/F10/F12/F13/F14 are CLI/docs/runtime side effects).
- pytest tests/: 1278 tests passing, 2 skipped, 0 regressions.
- python -c "import yaml, jsonschema; ...": schema additions parse and validate against jsonschema.

Authoring discipline (for posterity)
The PR introduces docs/campaign-authoring-guide.md, which captures:
- … total_kv_blocks mattered in paper-memorytime-mirage).

CLAUDE.md and docs/architecture.md link to it.

Out of scope
uv.lock is untracked at session start (pre-existing). I deliberately did not commit it; that's a separate concern from #245 resolution.

🤖 Generated with Claude Code