Preserve pre-edit proof in post-tool telemetry #4
Merged
NagyVikt merged 1 commit on Apr 30, 2026
Colony needs to correlate one tool call across PreToolUse and PostToolUse without treating after-the-fact edit data as pre-edit proof. The hook now derives a stable trace_id from session/tool_use_id, records successful pre emissions locally, marks missing_pre_tool_use when a mutating post event lacks that marker, and attaches compact failure summaries to failed post events.

Constraint: Post-tool data must not count as claim-before-edit proof
Rejected: Infer pre safety from PostToolUse file paths | that would fake claim-before-edit coverage
Confidence: high
Scope-risk: moderate
Tested: npm run build
Tested: node --test dist/scripts/__tests__/codex-native-hook.test.js
Tested: node --test dist/colony/__tests__/bridge.test.js
Tested: ./node_modules/.bin/biome lint src/scripts/codex-native-hook.ts src/scripts/__tests__/codex-native-hook.test.ts
Tested: npm run check:no-unused
Not-tested: full npm test
Co-authored-by: OmX <omx@oh-my-codex.dev>
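The correlation rule above can be sketched in a few lines. This is a minimal, hypothetical model and not the code in codex-native-hook.ts. The payload shape, the `MUTATING_TOOLS` list, the FNV-1a hash, `PreEmissionLedger`, and `buildPostTelemetry` are all assumptions made for illustration. Only the field names `trace_id` and `missing_pre_tool_use` come from the description. The key property is that a post event can report a missing pre marker but can never create one.

```typescript
// Hypothetical, simplified hook payload; real field names/types may differ.
interface HookPayload {
  session_id: string;
  tool_use_id?: string;
  tool_name: string;
}

// Assumed set of edit-family tools that mutate files (illustrative only).
const MUTATING_TOOLS = new Set<string>(["Edit", "Write", "MultiEdit", "apply_patch"]);

// 32-bit FNV-1a, used here only as a small stand-in hash.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Same session_id + tool_use_id always gives the same trace_id, so the
// PreToolUse and PostToolUse events for one tool call can be joined.
function deriveTraceId(payload: HookPayload): string | undefined {
  if (!payload.tool_use_id) return undefined; // no stable identity available
  return fnv1a32(`${payload.session_id}\u0000${payload.tool_use_id}`);
}

// In-memory stand-in for the local record of successful pre emissions.
// Real hooks run as separate processes per event, so the actual record
// would have to persist locally (hypothetically under .omx); a Set only
// models the lookup.
class PreEmissionLedger {
  private readonly emitted = new Set<string>();
  markEmitted(traceId: string): void {
    this.emitted.add(traceId);
  }
  has(traceId: string): boolean {
    return this.emitted.has(traceId);
  }
}

interface PostTelemetry {
  trace_id?: string;
  tool_name: string;
  missing_pre_tool_use?: boolean;
  failure_summary?: string;
}

// Builds a post-tool row. It only *reports* a missing pre marker and never
// writes to the ledger, so post data cannot become claim-before-edit proof.
function buildPostTelemetry(
  payload: HookPayload,
  ledger: PreEmissionLedger,
  error?: string,
  maxSummaryChars = 120,
): PostTelemetry {
  const traceId = deriveTraceId(payload);
  const row: PostTelemetry = { tool_name: payload.tool_name };
  if (traceId !== undefined) row.trace_id = traceId;
  if (MUTATING_TOOLS.has(payload.tool_name) && (traceId === undefined || !ledger.has(traceId))) {
    row.missing_pre_tool_use = true;
  }
  if (error !== undefined) {
    // Compact summary: first line only, truncated; never the full output.
    const [firstLine = ""] = error.split("\n");
    row.failure_summary =
      firstLine.length > maxSummaryChars ? `${firstLine.slice(0, maxSummaryChars)}...` : firstLine;
  }
  return row;
}
```

Deriving the id rather than storing a random one means neither hook invocation needs to know about the other to produce the same key. The ledger lookup is the only place where "pre happened" is decided.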
NagyVikt added a commit that referenced this pull request on May 6, 2026
* Avoid model-backed team status inspect by default

Team status pane inspection currently points users directly at `omx sparkshell`, which can spend Spark/model quota every time long pane output is summarized. Default status output now prints raw `tmux capture-pane` inspect commands and requires `--model-inspect` for model-backed sparkshell summaries.

Constraint: Keep JSON status metadata backward-compatible while making human-facing text hints quota-safe by default.
Rejected: Disabling sparkshell globally | sparkshell remains useful when explicitly requested.
Confidence: high
Scope-risk: narrow
Directive: Preserve raw inspect as the safe default for any future team status inspection hints.
Tested: npm run build; node --test dist/cli/__tests__/team.test.js; npx tsc --noEmit; npm run check:no-unused

* Keep default status summaries quota-free

Replace the human inspect_summary command field at render time so default text output mirrors raw tmux guidance while --model-inspect keeps the sparkshell command.

Constraint: PR #1981 blocker requires default human status output to avoid omx sparkshell outside the --model-inspect hint.
Rejected: Appending an additional default command | left the original sparkshell command visible in inspect_summary.
Confidence: high
Scope-risk: narrow
Directive: Keep JSON/model-inspect sparkshell guidance intact when changing pane status presentation.
Tested: npm run build; node --test dist/cli/__tests__/team.test.js; npm run lint -- src/cli/team.ts src/cli/__tests__/team.test.ts
Not-tested: full npm test

* Give Codex workers longer startup evidence windows

Codex-backed interactive team workers can take longer than five seconds to move from hook notification to observable task progress. Treating that slow startup as no evidence causes false worker-dead classifications and failed teams even when the pane is alive.
Increase the default startup evidence floor and launch cap, while keeping the explicit OMX_TEAM_STARTUP_EVIDENCE_TIMEOUT_MS override for tests and operators. Add a regression test that only succeeds when the default window waits long enough for delayed worker progress evidence.

Validation: npm run build; node --test --test-name-pattern='uses a production startup evidence window that can tolerate slow Codex startup|startTeam rejects interactive startup when tmux fallback never produces worker startup evidence' dist/team/__tests__/runtime.test.js

* Make Codex edits visible before mutation

Codex native hooks only watched Bash, so Colony saw most file edits after the mutation. The hook generator now covers edit-family tools, and native PreToolUse dispatch forwards a best-effort Colony pre-edit bridge before local OMX checks continue.

Constraint: Claim-before-edit telemetry requires a signal before file mutation
Rejected: Count PostToolUse late claims as claim-before-edit | late telemetry is useful audit data but cannot prove pre-edit safety
Confidence: high
Scope-risk: moderate
Directive: Keep Colony bridge warnings allow-only unless policy explicitly changes to blocking edits
Tested: npm ci; npm run build; npm run lint; node --test dist/scripts/__tests__/codex-native-hook.test.js dist/config/__tests__/codex-hooks.test.js; npm run verify:native-agents; npm run verify:plugin-bundle; npm test
Not-tested: live Colony CLI against production task binding

* Keep Colony bridge storage writable

Codex native hooks now run the Colony pre-edit bridge with a repo-local writable COLONY_HOME by default. This prevents sandboxed hook processes from trying to write the user-level Colony database while keeping Colony advisory output best-effort.

Constraint: Codex hook sandbox can read home config but cannot write ~/.colony/data.db reliably.
Rejected: Keep direct Colony hooks in hooks.json | they hard-fail on PostToolUse before OMX can degrade gracefully.
Confidence: high
Scope-risk: narrow
Directive: Keep Colony bridge failures advisory; do not let Colony storage errors fail native hooks.
Tested: npm run build
Tested: node --test dist/scripts/__tests__/codex-native-hook.test.js

* Centralize Colony hook transport behind OMX bridge

OMX native hooks now call a dedicated ColonyBridge for lifecycle telemetry instead of owning CLI spawn details inline. The bridge sends a local-first lifecycle envelope to the Colony CLI, records local failure telemetry, and keeps hook execution advisory by default when Colony is missing, slow, or failing.

Constraint: Colony coordination must not become an MCP dependency for hook safety.
Constraint: Plain OMX usage must continue when the Colony CLI is unavailable.
Rejected: Keep direct pre-tool-use spawn logic in codex-native-hook.ts | lifecycle transport would remain scattered and hard to extend.
Confidence: high
Scope-risk: moderate
Directive: Keep Colony bridge failures warn-only unless a future task explicitly changes default policy.
Tested: npm run build
Tested: node --test dist/colony/__tests__/bridge.test.js dist/scripts/__tests__/codex-native-hook.test.js
Tested: npm run lint

* Make Autopilot loopback explicit through review-gated planning

Autopilot now treats planning, execution, and review as one bounded loop: ralplan produces the contract, ralph implements and verifies it, and code-review either approves or sends findings back into planning with persisted handoff state.

Constraint: Issue #2000 requires Autopilot to use existing ralplan, ralph, code-review, hook, state, and pipeline primitives instead of a broad lifecycle.
Rejected: Preserving expansion/QA/validation as primary Autopilot phases | it conflicts with the strict review loop.
Confidence: high
Scope-risk: moderate
Directive: Keep Autopilot phases limited to ralplan, ralph, code-review, complete/failed/cancelled unless issue #2000 is intentionally superseded.
Tested: npm run build
Tested: npm run sync:plugin:check
Tested: node dist/scripts/generate-catalog-docs.js --check
Tested: node --test dist/hooks/__tests__/autopilot-skill-contract.test.js dist/hooks/__tests__/keyword-detector.test.js dist/hooks/__tests__/notify-hook-auto-nudge.test.js dist/hooks/__tests__/prompt-guidance-contract.test.js dist/hooks/__tests__/skill-guidance-contract.test.js dist/hooks/__tests__/deep-interview-contract.test.js dist/state/__tests__/workflow-transition.test.js dist/mcp/__tests__/state-server.test.js dist/modes/__tests__/base-autoresearch-contract.test.js dist/pipeline/__tests__/orchestrator.test.js dist/pipeline/__tests__/stages.test.js
Tested: git diff --check
Not-tested: Full npm test after final targeted fixes; live end-to-end GitHub workflow execution through a real Autopilot run.
Co-authored-by: OmX <omx@oh-my-codex.dev>

* Retire broken direct Colony hook wrappers

Setup refresh previously preserved direct Colony hook commands as user-owned hooks, so old PostToolUse entries could survive in fresh Codex agents even after the native OMX bridge was installed. This classifies those old direct Colony hook wrappers as retired OMX-managed hook commands for setup/uninstall stripping, while active coverage checks still require codex-native-hook.js.

Constraint: Existing user hook commands must remain preserved during setup refresh.
Rejected: Remove all non-native PostToolUse hooks | would delete legitimate user-owned hooks.
Confidence: high
Scope-risk: narrow
Directive: Do not count retired direct Colony wrappers as active native hook coverage; they are cleanup targets only.
Tested: npm run build
Tested: node --test dist/config/__tests__/codex-hooks.test.js
Tested: node --test dist/cli/__tests__/setup-hooks-shared-ownership.test.js
Tested: node --test dist/scripts/__tests__/codex-native-hook.test.js
Tested: setup refresh of fixture ~/.codex/hooks.json.bak-colony-direct-hooks-1777456691802 removed the direct wrappers (0 left) and kept 5 native wrappers

* Keep Colony bridge transport failures quiet

The native bridge is best-effort, but transport failures were being converted into allow-only hook output. In Codex this made fresh PostToolUse runs print noisy failures like spawnSync colony EPERM even though the hook continued safely. This keeps real Colony advisory output visible while treating bridge transport failures as telemetry-only events under .omx/logs.

Constraint: Valid Colony hook advisories must still merge with native hook output.
Rejected: Disable the Colony bridge entirely | would lose auto-claim payload delivery when the CLI transport is available.
Confidence: high
Scope-risk: narrow
Directive: Bridge transport warnings are operational telemetry, not user-facing PostToolUse feedback.
Tested: npm run build
Tested: npm run check:no-unused
Tested: node --test dist/scripts/__tests__/codex-native-hook.test.js
Tested: node --test dist/config/__tests__/codex-hooks.test.js
Tested: PostToolUse /dev/null smoke with spawnSync colony EPERM produced no stdout and exit 0

* Preserve strict autopilot review-loop handoffs

Strict Ralph now starts from ralplan output while legacy ralph-verify keeps team execution input, and review-loop state reaches ralplan without polluting persisted handoff keys.

Constraint: PR #2001 review requested fixes for ralplan consumption, non-clean review replanning, and code_review handoff contract alignment.
Rejected: Persisting stage-name code-review handoffs | contract and hook seed already use code_review.
Confidence: high
Scope-risk: narrow
Directive: Keep handoff_artifacts keyed by contract names when adding pipeline phases.
Tested: npm run build; node --test dist/pipeline/__tests__/stages.test.js dist/pipeline/__tests__/orchestrator.test.js; npm run lint; npm run verify:native-agents; npm run sync:plugin:check; npm run verify:plugin-bundle; npx tsc --noEmit --pretty false --project tsconfig.json; git diff --check
Not-tested: full npm test suite

* add

* Expose Colony claim edit trace identity

OMX forwards Codex hook payloads to Colony, but claim and edit rows can only correlate if both sides expose the same compact identity fields. This adds a trace id derived from Codex tool_use_id when available, records normalized file paths for edit and patch transports, and writes OMX_COLONY_TRACE=1 JSONL rows for claim/edit comparison without changing normal hook behavior.

Constraint: Diagnostic trace must not include full patch or edit bodies.
Rejected: Add more agent prompt nudges | claims already exist, the defect is correlation identity.
Confidence: high
Scope-risk: narrow
Directive: Keep trace rows compact and audit-only; do not count PostToolUse recovery as pre-edit success.
Tested: npm run build; node --test dist/colony/__tests__/bridge.test.js; node --test --test-name-pattern 'bridges PreToolUse Edit payloads to Colony before the edit proceeds' dist/scripts/__tests__/codex-native-hook.test.js; npm run check:no-unused; npm run lint -- src/colony/bridge.ts src/colony/__tests__/bridge.test.ts
Not-tested: live Colony database correlation against production sessions

* Avoid invalid Codex config environment table

Codex CLI 0.125.0 rejects top-level env as an unknown config property, so OMX should seed explore routing through shell_environment_policy.set and migrate legacy env entries during setup refresh.

Constraint: Codex config schema accepts shell_environment_policy.set for environment injection, not a root env table.
Rejected: Keep writing [env] and rely on Codex tolerating unknown tables | current schema diagnostics reject it.
Confidence: high
Scope-risk: narrow
Directive: Keep future setup-managed environment variables under shell_environment_policy.set unless Codex introduces a documented replacement.
Tested: npm run build
Tested: node --test dist/config/__tests__/generator-idempotent.test.js dist/cli/__tests__/doctor-warning-copy.test.js dist/cli/__tests__/setup-scope.test.js
Tested: npm run lint
Tested: npm run test:ci:compiled
Tested: git diff --check
Not-tested: live Codex App schema UI beyond local Codex CLI config parsing evidence

* Preserve TOML env entries during schema migration

Codex review found that the legacy [env] migration copied only assignment lines, which could truncate multiline TOML values. This keeps migrated and stripped table keys as full TOML entries instead of single lines.

Constraint: Existing setup refresh must migrate legacy [env] into [shell_environment_policy.set] without parsing and rewriting the whole user config.
Rejected: Keep line-only extraction | it corrupts multiline strings and arrays during migration.
Confidence: high
Scope-risk: narrow
Directive: Preserve full TOML entry ranges when moving or removing table keys, not just the first assignment line.
Tested: npm run build
Tested: node --test dist/config/__tests__/generator-idempotent.test.js
Tested: npm run lint
Tested: npm run test:ci:compiled
Tested: git diff --check

* Fix tmux session isolation: pane-scoped tags (#2005)

* Isolate tmux nudges by instance ownership

Tag OMX-managed tmux sessions with the launching session id and make notify-hook resolution prefer matching session options before stale pane targets. Reject panes whose tagged tmux session belongs to another OMX instance.

Constraint: concurrent OMX sessions can reuse static pane ids, so hook routing must verify tmux session ownership before send-keys.
Rejected: trusting .omx/tmux-hook.json pane ids alone | pane ids can point at unrelated sessions after concurrent launches or stale config healing.
Confidence: high
Scope-risk: moderate
Directive: keep tmux nudge paths fail-closed when @omx_instance_id is present and mismatched.
Tested: npm run build; npm run lint; npm run check:no-unused; node --test dist/hooks/__tests__/notify-hook-tmux-heal.test.js dist/cli/__tests__/index.test.js dist/team/__tests__/tmux-session.test.js
Not-tested: full npm test completion in final commit pass; skipped per supervisor request.

* Fix P1 regression: switch to pane-scoped isolation tags

Switched from session-scoped @omx_instance_id to pane-scoped @omx_pane_instance_id to prevent last-writer-wins conflicts when multiple OMX instances share a single tmux session.
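The last-writer-wins problem with session-scoped tags is easiest to see in a toy model. The sketch below is hypothetical and is not the code in managed-tmux.ts or tmux-session.ts. `FakeTmux`, `checkOwnership`, and the instance ids are invented for illustration. Two details come from the commit log: the reason string `pane_instance_mismatch`, and the rule to fail closed only when a tag is present and mismatched. The assumption that untagged panes pass through is mine.

```typescript
// Stable reason string taken from the commit log; hook-state consumers assert it.
const PANE_MISMATCH = "pane_instance_mismatch";

type OwnershipCheck = { ok: true } | { ok: false; reason: typeof PANE_MISMATCH };

// Toy model of tmux user options at two scopes. In real tmux these would be
// @omx_instance_id (session option) and @omx_pane_instance_id (pane option).
class FakeTmux {
  private readonly sessionTags = new Map<string, string>();
  private readonly paneTags = new Map<string, string>();

  spawnPane(session: string, pane: string, instanceId: string): void {
    this.sessionTags.set(session, instanceId); // one slot per session: last writer wins
    this.paneTags.set(pane, instanceId); // one slot per pane: no cross-instance overwrite
  }
  sessionTag(session: string): string | undefined {
    return this.sessionTags.get(session);
  }
  paneTag(pane: string): string | undefined {
    return this.paneTags.get(pane);
  }
}

// Fail closed only when a tag is present and names another instance.
// Untagged targets pass through (assumed legacy behaviour).
function checkOwnership(tag: string | undefined, ownInstanceId: string): OwnershipCheck {
  if (tag === undefined) return { ok: true };
  return tag === ownInstanceId ? { ok: true } : { ok: false, reason: PANE_MISMATCH };
}
```

Suppose instance A spawns pane %1 and instance B spawns pane %2 in the same session. With the session tag, A's check on its own pane now fails, because B wrote last. With the pane tag, A keeps %1 and B is refused with the unchanged `pane_instance_mismatch` reason. Against a live tmux, pane options require tmux 3.0 or later. There the tag would presumably be set with `tmux set-option -p -t %1 @omx_pane_instance_id omx-A` and read with `tmux show-options -pqv -t %1 @omx_pane_instance_id`. This PR does not show how OMX actually issues those calls.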
Changes:
- src/scripts/notify-hook/managed-tmux.ts: use pane-scoped tags
- src/team/tmux-session.ts: set pane tags on spawn
- Tests updated for pane-scoped logic

Constraint: Multi-instance isolation within same session
Rejected: Session-scoped tags (last-writer-wins regression)
Confidence: high
Scope-risk: moderate
Directive: future isolation work should prefer pane-level or PID-level scope
Tested: lint and typecheck
Not-tested: live multi-instance tmux validation
Co-authored-by: Hermes Agent <hermes@nousresearch.com>

* Preserve pane mismatch reason for hook consumers

Keep pane-scoped ownership while preserving the existing pane_instance_mismatch reason consumed by notify-hook state and tests.

Constraint: hook state consumers assert stable last_reason values
Rejected: expose new tmux_pane_instance_mismatch reason | breaks existing pane mismatch contract
Confidence: high
Scope-risk: narrow
Directive: new tmux ownership scopes must preserve public hook-state reason strings unless migrations update consumers
Tested: npm run build; node --test dist/hooks/__tests__/notify-hook-managed-tmux.test.js dist/hooks/__tests__/notify-hook-tmux-heal.test.js dist/team/__tests__/tmux-session.test.js; npm run lint; npm run check:no-unused
Not-tested: live multi-instance tmux manual validation

---------

Co-authored-by: Hermes Agent <hermes@nousresearch.com>

* Align prompt surfaces with GPT-5.5 guidance

Refresh the shared prompt-guidance contract around outcome-first goals, concise preambles, validation expectations, and explicit stop rules while preserving OMX lifecycle and exact-model invariants.

Constraint: Issue #2007 requested aligning skills, agents, AGENTS.md, hook messages, and prompt guidance with the official OpenAI GPT-5.5 prompt guidance.
Rejected: Full workflow-contract rewrite | too broad and risked weakening OMX runtime gates.
Rejected: Updating gpt-5.4-mini seams | exact-model adaptation remains an intentional invariant outside this prompt-behavior refresh.
Confidence: high
Scope-risk: moderate
Directive: Keep future prompt edits outcome-first, but do not widen exact gpt-5.4-mini model adaptation without a separate scoped issue.
Tested: npm run build; node --test prompt-guidance/skill/terminal/native-config/generator-notify set; npm run prompt:inventory; git diff --check
Not-tested: Full npm test suite

* Keep plugin skill mirror aligned with prompt guidance

Refresh the checked-in plugin skill copies after the canonical prompt-guidance rename so CI validates one source of truth rather than stale bundled skill text.

Constraint: PR #2008 changed canonical root skill guidance headings from GPT-5.4 to GPT-5.5.
Rejected: Relaxing plugin mirror checks | the CI failure correctly caught bundled skill drift.
Confidence: high
Scope-risk: narrow
Directive: Run npm run sync:plugin after canonical skill prompt edits.
Tested: npm run build; npm run sync:plugin:check; npm run test:plugin-boundaries:compiled; Node 20 cli-core-rest lane command; npm run prompt:inventory; git diff --check
Not-tested: Remaining unrelated CI jobs still pending before this push

* Teach reviews to reject masking workaround fixes

Add code-reviewer-only guidance that treats fallback/workaround patches as review blockers when they hide failures or avoid the primary contract, while preserving a narrow compatibility exception with documentation, tests, and visible failure behavior.

Constraint: Issue #2009 requested a focused code-reviewer ruleset change with no runtime behavior changes.
Rejected: Runtime enforcement or code-review skill changes | prompt guidance is the requested surface and keeps behavior unchanged.
Confidence: high
Scope-risk: narrow
Directive: Keep fallback rejection scoped to masking workarounds; do not reject explicitly justified compatibility boundaries.
Tested: git diff --check; npm run build; node --test dist/hooks/__tests__/prompt-guidance-wave-two.test.js; npm run lint; npm run verify:native-agents; npm run verify:plugin-bundle; lsp_diagnostics prompt-guidance-wave-two.test.ts

* Keep plugin MCP servers alive through omx mcp-serve (#2013)

* Keep plugin-launched MCP servers alive

The installable Codex plugin exposes OMX MCP servers through the `omx mcp-serve` wrapper. That wrapper was exiting immediately after dispatch, which terminated stdio MCP servers before clients could complete initialization. The wrapper now preserves normal one-shot CLI exit behavior while letting `mcp-serve` own its stdio lifecycle. A behavioral packaging contract initializes the packaged wrapper through stdio so the plugin MCP discovery path stays protected without hardwired config entries.

Constraint: Codex plugin MCP discovery should work from plugin metadata without adding explicit OMX MCP tables to config.toml
Rejected: Add [mcp_servers.omx_*] entries during setup | would bypass plugin auto-discovery and reintroduce hardwired config
Rejected: Source-regex-only regression test | too brittle and weaker than proving stdio initialization works
Confidence: high
Scope-risk: narrow
Tested: npm run build
Tested: node --test dist/cli/__tests__/package-bin-contract.test.js dist/cli/__tests__/mcp-serve.test.js dist/cli/__tests__/codex-plugin-layout.test.js
Tested: Manual plugin MCP initialize/write smoke tests on three independent systems
Not-tested: Full npm test suite

* Prove every plugin MCP target initializes through the bin wrapper

Keep the contributor's lifecycle fix intact while widening the regression from the single state server to the full first-party plugin MCP roster used by .mcp.json.

Constraint: Issue #2011 reports all plugin-scoped first-party MCP servers failing through omx mcp-serve.
Rejected: Replacing plugin metadata with direct node dist/mcp entrypoints | this would bypass the intended public omx mcp-serve contract.
Confidence: high
Scope-risk: narrow
Directive: Keep omx mcp-serve covered as a bin-equivalent stdio handshake, not only as a command-dispatch unit test.
Tested: npm run build; node --test dist/cli/__tests__/package-bin-contract.test.js; node --test dist/cli/__tests__/package-bin-contract.test.js dist/cli/__tests__/mcp-serve.test.js dist/cli/__tests__/codex-plugin-layout.test.js; npm run test:plugin-boundaries:compiled; bounded timeout/head initialize probes for state, memory, code-intel, trace, wiki.
Not-tested: full npm run test:ci:compiled was interrupted after substantial progress because the broad team runtime lane is long-running in this tmux session.

---------

Co-authored-by: aeyeopsdev <aeyeopsdev@users.noreply.github.com>

* Make omx question feel like a blocking popup wizard (#2014)

Batch questions now share one normalized record contract, render in a large adaptive leader-targeted split, support arrow/back review flow, and return all answers through both JSON and the tmux side-channel.

Constraint: no external Claude integration, no new runtime dependencies, no persistent UI after answer, and no exact visual clone.
Rejected: fixed-height single-question pane | could not cover multi-question content or leader-pane popup expectation.
Rejected: first-answer-only return injection | silently drops batch answers after the first question.
Confidence: high
Scope-risk: moderate
Directive: Keep questions/answers as the primary payload contract; prompt/answer are legacy projections only.
Tested: npm run build; targeted compiled tests 85/85; npx tsc --noEmit --pretty false; git diff --check; npm run lint; npm run check:no-unused; live omx question batch panel demo.
Not-tested: full test:ci:compiled after final fixes; earlier full run had an unrelated notify-fallback watcher flake that passed focused rerun.
Co-authored-by: bellman <bellman@bellmanui-MacBookAir.local>

* docs: document .omx-config.json schema and effective model routing (#2016)

* Clarify supported OMX config routing

Document the supported .omx-config.json schema surfaces and model routing precedence so users can choose safe cost-saving or max-quality defaults without inventing unsupported keys.

Constraint: Docs-only fix for #2015; schema claims are grounded in current config, notification, agent, and team routing code.
Rejected: Runtime schema validation changes | Issue scope requests docs-only guidance.
Confidence: high
Scope-risk: narrow
Directive: Keep future examples limited to keys recognized by the installed OMX version.
Tested: JSON fences parsed; local markdown links checked; key coverage grep; git diff --check.
Not-tested: No runtime behavior changed.

* Clarify config routing precedence after review

Constraint: PR #2016 requested a docs-only follow-up that preserves issue #2015 schema-routing scope.
Rejected: Changing source behavior or adding new config surfaces | review blockers only require correcting documentation wording.
Confidence: high
Scope-risk: narrow
Directive: Keep this reference tied to current source precedence instead of generalizing all .omx-config.json readers as one path.
Tested: markdown fence sanity; JSON fenced-block parse; local relative link existence; git diff --check
Not-tested: full test suite, because this is a docs-only wording fix.

* Align guidance with the omx question response contract

Update prompt and skill surfaces so structured-question guidance follows the enhanced popup renderer and primary answers[] payload while keeping deep-interview one round at a time.

Constraint: Follow-up PR only updates guidance and matching hook tests; no runtime feature changes or new dependencies.
Rejected: Batch multiple deep-interview rounds | violates the Socratic one-question ambiguity gate.
Confidence: high
Scope-risk: narrow
Directive: Keep root skills and plugin mirrors synchronized when changing workflow guidance.
Tested: npm run build; npm run sync:plugin:check; node --test dist/scripts/__tests__/codex-native-hook.test.js dist/config/__tests__/generator-idempotent.test.js dist/catalog/__tests__/plugin-bundle-ssot.test.js dist/cli/__tests__/codex-plugin-layout.test.js; npm run lint; npm run check:no-unused; npx tsc --noEmit --pretty false; git diff --check
Not-tested: Full npm test suite was not run; targeted hook, generator, plugin, lint, type, and mirror checks covered the changed surfaces.
Co-authored-by: OmX <omx@oh-my-codex.dev>

* Warn when gpt-5.5 context settings exceed OMX guidance

Add a doctor-only diagnostic so oversized context settings are visible without changing explicit user configuration.

Constraint: Issue #2018 requires preserving config semantics and avoiding unsupported Codex/API limit claims.
Rejected: Clamp oversized context settings | would rewrite user config and overclaim an authoritative hard limit.
Confidence: high
Scope-risk: narrow
Directive: Keep this warning framed as an OMX setup recommendation unless authoritative runtime limit evidence is added.
Tested: npm run build; node --test dist/cli/__tests__/doctor-context-window-warning.test.js dist/cli/__tests__/doctor-invalid-config.test.js dist/config/__tests__/generator-idempotent.test.js dist/config/__tests__/generator-notify.test.js; npx biome lint affected files; npx tsc -p tsconfig.json --noEmit; manual doctor smoke.
Not-tested: Full npm test suite.
Co-authored-by: OmX <omx@oh-my-codex.dev>

* Preserve question guidance while satisfying legacy contract gates

Restore the literal legacy contract phrases that the hook guidance tests enforce without reverting the PR's primary answers[] guidance.

Constraint: PR #2019 follow-up must stay limited to guidance wording and preserve the structured-question answers[] contract.
Rejected: Updating the tests to accept only new wording | would broaden the contract instead of fixing the PR's compatibility regression.
Confidence: high
Scope-risk: narrow
Directive: Keep root skills and plugin mirrors synchronized when changing workflow guidance.
Tested: npm run build; node --test dist/hooks/__tests__/consensus-execution-handoff.test.js dist/hooks/__tests__/deep-interview-contract.test.js; node dist/scripts/run-test-files.js dist/hooks/__tests__ dist/hooks/code-simplifier/__tests__ dist/hooks/extensibility/__tests__ dist/notifications/__tests__ dist/mcp/__tests__ dist/hud/__tests__ dist/verification/__tests__ dist/openclaw/__tests__; npm run sync:plugin:check; npm run lint; npm run check:no-unused; git diff --check
Not-tested: Full npm test suite was not run; the failed CI shard and focused contract tests passed locally.

* Keep deep interview legacy guidance wording

Restore the structured-question equivalent wording and legacy selected_values phrase while preserving the answers[] source-of-truth guidance.

Tested: npm run build; node --test dist/hooks/__tests__/deep-interview-contract.test.js

* Scope Ralph Stop to its owning session

Native Stop now treats active Ralph state as authoritative only when the available OMX, Codex/native, thread, and pane identifiers do not contradict the current Stop payload. This fails closed for question-only panes that happen to share a project state directory while preserving same-session continuation.

Constraint: Fix #2023 without broad deletion of user state across Zellij/tmux/Codex surfaces.
Rejected: Scanning sibling/root Ralph state after a session-bound Stop payload | it can revive stale or unrelated panes.
Confidence: high
Scope-risk: narrow
Directive: Keep Ralph Stop ownership checks additive and mismatch-based; do not reintroduce cross-session fallback for session-bound Stop payloads.
Tested: npm run build; npm run lint; node --test dist/scripts/__tests__/codex-native-hook.test.js
Not-tested: Live Zellij pane integration.
Co-authored-by: OmX <omx@oh-my-codex.dev>

* Harden reviewed 0.15.2 release edges

Fail closed when review evidence is absent, keep managed tmux pane verification on preferred-mode paths, preserve batch question payload shape, and align release metadata for 0.15.2 after QA.

Constraint: Release requested only if code review and ultraqa gates were clean from v0.15.1 diffs.
Rejected: Keep silent approve defaults or bypass preferred pane verification | review found those paths could mask unsafe releases.
Confidence: high
Scope-risk: moderate
Directive: Keep code-review stage defaults fail-closed unless explicit review evidence is supplied.
Tested: npm test (4346/4346 pass plus catalog check); targeted patched notify/question/pipeline/team tests passed.
Not-tested: Remote release workflow execution after tag push.

* Avoid double-feeding generated AGENTS instructions

Suppress only OMX-generated AGENTS content when composing the session-scoped model instructions file so Codex can keep real user/project AGENTS guidance without repeating the generated orchestration brain.

Constraint: issue #2025 requires exactly one authoritative generated OMX AGENTS surface per session/turn while preserving distinct real AGENTS.md discovery.
Rejected: disabling model_instructions_file injection entirely | would drop runtime overlay and rely on host behavior beyond this repo.
Rejected: omitting every marker-containing AGENTS source | merged AGENTS files can contain user guidance around OMX-managed blocks.
Confidence: high
Scope-risk: narrow
Directive: Keep suppression marker-bounded; do not broaden to all host-discoverable AGENTS without launch-mode evidence.
Tested: npm install; npm run build; npm run lint; node --test dist/hooks/__tests__/agents-overlay.test.js; node --test dist/hooks/__tests__/agents-overlay.test.js dist/cli/__tests__/index.test.js
Not-tested: full npm test
Co-authored-by: OmX <omx@oh-my-codex.dev>

* Expose bridge proof through OMX CLI (#2)

* Avoid model-backed team status inspect by default

Team status pane inspection currently points users directly at `omx sparkshell`, which can spend Spark/model quota every time long pane output is summarized. Default status output now prints raw `tmux capture-pane` inspect commands and requires `--model-inspect` for model-backed sparkshell summaries.

Constraint: Keep JSON status metadata backward-compatible while making human-facing text hints quota-safe by default.
Rejected: Disabling sparkshell globally | sparkshell remains useful when explicitly requested.
Confidence: high
Scope-risk: narrow
Directive: Preserve raw inspect as the safe default for any future team status inspection hints.
Tested: npm run build; node --test dist/cli/__tests__/team.test.js; npx tsc --noEmit; npm run check:no-unused

* Keep default status summaries quota-free

Replace the human inspect_summary command field at render time so default text output mirrors raw tmux guidance while --model-inspect keeps the sparkshell command.

Constraint: PR #1981 blocker requires default human status output to avoid omx sparkshell outside the --model-inspect hint.
Rejected: Appending an additional default command | left the original sparkshell command visible in inspect_summary.
Confidence: high
Scope-risk: narrow
Directive: Keep JSON/model-inspect sparkshell guidance intact when changing pane status presentation.
Tested: npm run build; node --test dist/cli/__tests__/team.test.js; npm run lint -- src/cli/team.ts src/cli/__tests__/team.test.ts
Not-tested: full npm test

* Make Autopilot loopback explicit through review-gated planning

Autopilot now treats planning, execution, and review as one bounded loop: ralplan produces the contract, ralph implements and verifies it, and code-review either approves or sends findings back into planning with persisted handoff state.

Constraint: Issue #2000 requires Autopilot to use existing ralplan, ralph, code-review, hook, state, and pipeline primitives instead of a broad lifecycle.
Rejected: Preserving expansion/QA/validation as primary Autopilot phases | it conflicts with the strict review loop.
Confidence: high
Scope-risk: moderate
Directive: Keep Autopilot phases limited to ralplan, ralph, code-review, complete/failed/cancelled unless issue #2000 is intentionally superseded.
Tested: npm run build Tested: npm run sync:plugin:check Tested: node dist/scripts/generate-catalog-docs.js --check Tested: node --test dist/hooks/__tests__/autopilot-skill-contract.test.js dist/hooks/__tests__/keyword-detector.test.js dist/hooks/__tests__/notify-hook-auto-nudge.test.js dist/hooks/__tests__/prompt-guidance-contract.test.js dist/hooks/__tests__/skill-guidance-contract.test.js dist/hooks/__tests__/deep-interview-contract.test.js dist/state/__tests__/workflow-transition.test.js dist/mcp/__tests__/state-server.test.js dist/modes/__tests__/base-autoresearch-contract.test.js dist/pipeline/__tests__/orchestrator.test.js dist/pipeline/__tests__/stages.test.js Tested: git diff --check Not-tested: Full npm test after final targeted fixes; live end-to-end GitHub workflow execution through a real Autopilot run. Co-authored-by: OmX <omx@oh-my-codex.dev> * Preserve strict autopilot review-loop handoffs Strict Ralph now starts from ralplan output while legacy ralph-verify keeps team execution input, and review-loop state reaches ralplan without polluting persisted handoff keys. Constraint: PR #2001 review requested fixes for ralplan consumption, non-clean review replanning, and code_review handoff contract alignment. Rejected: Persisting stage-name code-review handoffs | contract and hook seed already use code_review. Confidence: high Scope-risk: narrow Directive: Keep handoff_artifacts keyed by contract names when adding pipeline phases. Tested: npm run build; node --test dist/pipeline/__tests__/stages.test.js dist/pipeline/__tests__/orchestrator.test.js; npm run lint; npm run verify:native-agents; npm run sync:plugin:check; npm run verify:plugin-bundle; npx tsc --noEmit --pretty false --project tsconfig.json; git diff --check Not-tested: full npm test suite * Fix tmux session isolation: pane-scoped tags (#2005) * Isolate tmux nudges by instance ownership Tag OMX-managed tmux sessions with the launching session id and make notify-hook resolution prefer
matching session options before stale pane targets. Reject panes whose tagged tmux session belongs to another OMX instance. Constraint: concurrent OMX sessions can reuse static pane ids, so hook routing must verify tmux session ownership before send-keys. Rejected: trusting .omx/tmux-hook.json pane ids alone | pane ids can point at unrelated sessions after concurrent launches or stale config healing. Confidence: high Scope-risk: moderate Directive: keep tmux nudge paths fail-closed when @omx_instance_id is present and mismatched. Tested: npm run build; npm run lint; npm run check:no-unused; node --test dist/hooks/__tests__/notify-hook-tmux-heal.test.js dist/cli/__tests__/index.test.js dist/team/__tests__/tmux-session.test.js Not-tested: full npm test completion in final commit pass; skipped per supervisor request. * Fix P1 regression: switch to pane-scoped isolation tags Switched from session-scoped @omx_instance_id to pane-scoped @omx_pane_instance_id to prevent last-writer-wins conflicts when multiple OMX instances share a single tmux session. Changes: - src/scripts/notify-hook/managed-tmux.ts: use pane-scoped tags - src/team/tmux-session.ts: set pane tags on spawn - Tests updated for pane-scoped logic Constraint: Multi-instance isolation within same session Rejected: Session-scoped tags (last-writer-wins regression) Confidence: high Scope-risk: moderate Directive: future isolation work should prefer pane-level or PID-level scope Tested: lint and typecheck Not-tested: live multi-instance tmux validation Co-authored-by: Hermes Agent <hermes@nousresearch.com> * Preserve pane mismatch reason for hook consumers Keep pane-scoped ownership while preserving the existing pane_instance_mismatch reason consumed by notify-hook state and tests. 
Constraint: hook state consumers assert stable last_reason values Rejected: expose new tmux_pane_instance_mismatch reason | breaks existing pane mismatch contract Confidence: high Scope-risk: narrow Directive: new tmux ownership scopes must preserve public hook-state reason strings unless migrations update consumers Tested: npm run build; node --test dist/hooks/__tests__/notify-hook-managed-tmux.test.js dist/hooks/__tests__/notify-hook-tmux-heal.test.js dist/team/__tests__/tmux-session.test.js; npm run lint; npm run check:no-unused Not-tested: live multi-instance tmux manual validation --------- Co-authored-by: Hermes Agent <hermes@nousresearch.com> * Align prompt surfaces with GPT-5.5 guidance Refresh the shared prompt-guidance contract around outcome-first goals, concise preambles, validation expectations, and explicit stop rules while preserving OMX lifecycle and exact-model invariants. Constraint: Issue #2007 requested aligning skills, agents, AGENTS.md, hook messages, and prompt guidance with the official OpenAI GPT-5.5 prompt guidance. Rejected: Full workflow-contract rewrite | too broad and risked weakening OMX runtime gates. Rejected: Updating gpt-5.4-mini seams | exact-model adaptation remains an intentional invariant outside this prompt-behavior refresh. Confidence: high Scope-risk: moderate Directive: Keep future prompt edits outcome-first, but do not widen exact gpt-5.4-mini model adaptation without a separate scoped issue. Tested: npm run build; node --test prompt-guidance/skill/terminal/native-config/generator-notify set; npm run prompt:inventory; git diff --check Not-tested: Full npm test suite * Keep plugin skill mirror aligned with prompt guidance Refresh the checked-in plugin skill copies after the canonical prompt-guidance rename so CI validates one source of truth rather than stale bundled skill text. Constraint: PR #2008 changed canonical root skill guidance headings from GPT-5.4 to GPT-5.5. 
Rejected: Relaxing plugin mirror checks | the CI failure correctly caught bundled skill drift. Confidence: high Scope-risk: narrow Directive: Run npm run sync:plugin after canonical skill prompt edits. Tested: npm run build; npm run sync:plugin:check; npm run test:plugin-boundaries:compiled; Node 20 cli-core-rest lane command; npm run prompt:inventory; git diff --check Not-tested: Remaining unrelated CI jobs still pending before this push * Teach reviews to reject masking workaround fixes Add code-reviewer-only guidance that treats fallback/workaround patches as review blockers when they hide failures or avoid the primary contract, while preserving a narrow compatibility exception with documentation, tests, and visible failure behavior. Constraint: Issue #2009 requested a focused code-reviewer ruleset change with no runtime behavior changes. Rejected: Runtime enforcement or code-review skill changes | prompt guidance is the requested surface and keeps behavior unchanged. Confidence: high Scope-risk: narrow Directive: Keep fallback rejection scoped to masking workarounds; do not reject explicitly justified compatibility boundaries. Tested: git diff --check; npm run build; node --test dist/hooks/__tests__/prompt-guidance-wave-two.test.js; npm run lint; npm run verify:native-agents; npm run verify:plugin-bundle; lsp_diagnostics prompt-guidance-wave-two.test.ts * Keep plugin MCP servers alive through omx mcp-serve (#2013) * Keep plugin-launched MCP servers alive The installable Codex plugin exposes OMX MCP servers through the `omx mcp-serve` wrapper. That wrapper was exiting immediately after dispatch, which terminated stdio MCP servers before clients could complete initialization. The wrapper now preserves normal one-shot CLI exit behavior while letting `mcp-serve` own its stdio lifecycle. A behavioral packaging contract initializes the packaged wrapper through stdio so the plugin MCP discovery path stays protected without hardwired config entries. 
Constraint: Codex plugin MCP discovery should work from plugin metadata without adding explicit OMX MCP tables to config.toml Rejected: Add [mcp_servers.omx_*] entries during setup | would bypass plugin auto-discovery and reintroduce hardwired config Rejected: Source-regex-only regression test | too brittle and weaker than proving stdio initialization works Confidence: high Scope-risk: narrow Tested: npm run build Tested: node --test dist/cli/__tests__/package-bin-contract.test.js dist/cli/__tests__/mcp-serve.test.js dist/cli/__tests__/codex-plugin-layout.test.js Tested: Manual plugin MCP initialize/write smoke tests on three independent systems Not-tested: Full npm test suite * Prove every plugin MCP target initializes through the bin wrapper Keep the contributor's lifecycle fix intact while widening the regression from the single state server to the full first-party plugin MCP roster used by .mcp.json. Constraint: Issue #2011 reports all plugin-scoped first-party MCP servers failing through omx mcp-serve. Rejected: Replacing plugin metadata with direct node dist/mcp entrypoints | this would bypass the intended public omx mcp-serve contract. Confidence: high Scope-risk: narrow Directive: Keep omx mcp-serve covered as a bin-equivalent stdio handshake, not only as a command-dispatch unit test. Tested: npm run build; node --test dist/cli/__tests__/package-bin-contract.test.js; node --test dist/cli/__tests__/package-bin-contract.test.js dist/cli/__tests__/mcp-serve.test.js dist/cli/__tests__/codex-plugin-layout.test.js; npm run test:plugin-boundaries:compiled; bounded timeout/head initialize probes for state, memory, code-intel, trace, wiki. Not-tested: full npm run test:ci:compiled was interrupted after substantial progress because the broad team runtime lane is long-running in this tmux session. 
--------- Co-authored-by: aeyeopsdev <aeyeopsdev@users.noreply.github.com> * Make omx question feel like a blocking popup wizard (#2014) Batch questions now share one normalized record contract, render in a large adaptive leader-targeted split, support arrow/back review flow, and return all answers through both JSON and the tmux side-channel. Constraint: no external Claude integration, no new runtime dependencies, no persistent UI after answer, and no exact visual clone. Rejected: fixed-height single-question pane | could not cover multi-question content or leader-pane popup expectation. Rejected: first-answer-only return injection | silently drops batch answers after the first question. Confidence: high Scope-risk: moderate Directive: Keep questions/answers as the primary payload contract; prompt/answer are legacy projections only. Tested: npm run build; targeted compiled tests 85/85; npx tsc --noEmit --pretty false; git diff --check; npm run lint; npm run check:no-unused; live omx question batch panel demo. Not-tested: full test:ci:compiled after final fixes; earlier full run had an unrelated notify-fallback watcher flake that passed focused rerun. Co-authored-by: bellman <bellman@bellmanui-MacBookAir.local> * docs: document .omx-config.json schema and effective model routing (#2016) * Clarify supported OMX config routing Document the supported .omx-config.json schema surfaces and model routing precedence so users can choose safe cost-saving or max-quality defaults without inventing unsupported keys. Constraint: Docs-only fix for #2015; schema claims are grounded in current config, notification, agent, and team routing code. Rejected: Runtime schema validation changes | Issue scope requests docs-only guidance. Confidence: high Scope-risk: narrow Directive: Keep future examples limited to keys recognized by the installed OMX version. Tested: JSON fences parsed; local markdown links checked; key coverage grep; git diff --check. 
Not-tested: No runtime behavior changed. * Clarify config routing precedence after review Constraint: PR #2016 requested a docs-only follow-up that preserves issue #2015 schema-routing scope. Rejected: Changing source behavior or adding new config surfaces | review blockers only require correcting documentation wording. Confidence: high Scope-risk: narrow Directive: Keep this reference tied to current source precedence instead of generalizing all .omx-config.json readers as one path. Tested: markdown fence sanity; JSON fenced-block parse; local relative link existence; git diff --check Not-tested: full test suite, because this is a docs-only wording fix. * Align guidance with the omx question response contract Update prompt and skill surfaces so structured-question guidance follows the enhanced popup renderer and primary answers[] payload while keeping deep-interview one round at a time. Constraint: Follow-up PR only updates guidance and matching hook tests; no runtime feature changes or new dependencies. Rejected: Batch multiple deep-interview rounds | violates the Socratic one-question ambiguity gate. Confidence: high Scope-risk: narrow Directive: Keep root skills and plugin mirrors synchronized when changing workflow guidance. Tested: npm run build; npm run sync:plugin:check; node --test dist/scripts/__tests__/codex-native-hook.test.js dist/config/__tests__/generator-idempotent.test.js dist/catalog/__tests__/plugin-bundle-ssot.test.js dist/cli/__tests__/codex-plugin-layout.test.js; npm run lint; npm run check:no-unused; npx tsc --noEmit --pretty false; git diff --check Not-tested: Full npm test suite was not run; targeted hook, generator, plugin, lint, type, and mirror checks covered the changed surfaces. Co-authored-by: OmX <omx@oh-my-codex.dev> * Preserve question guidance while satisfying legacy contract gates Restore the literal legacy contract phrases that the hook guidance tests enforce without reverting the PR's primary answers[] guidance. 
Constraint: PR #2019 follow-up must stay limited to guidance wording and preserve the structured-question answers[] contract. Rejected: Updating the tests to accept only new wording | would broaden the contract instead of fixing the PR's compatibility regression. Confidence: high Scope-risk: narrow Directive: Keep root skills and plugin mirrors synchronized when changing workflow guidance. Tested: npm run build; node --test dist/hooks/__tests__/consensus-execution-handoff.test.js dist/hooks/__tests__/deep-interview-contract.test.js; node dist/scripts/run-test-files.js dist/hooks/__tests__ dist/hooks/code-simplifier/__tests__ dist/hooks/extensibility/__tests__ dist/notifications/__tests__ dist/mcp/__tests__ dist/hud/__tests__ dist/verification/__tests__ dist/openclaw/__tests__; npm run sync:plugin:check; npm run lint; npm run check:no-unused; git diff --check Not-tested: Full npm test suite was not run; the failed CI shard and focused contract tests passed locally. * Keep deep interview legacy guidance wording Restore the structured-question equivalent wording and legacy selected_values phrase while preserving the answers[] source-of-truth guidance. Tested: npm run build; node --test dist/hooks/__tests__/deep-interview-contract.test.js * Harden reviewed 0.15.2 release edges Fail closed when review evidence is absent, keep managed tmux pane verification on preferred-mode paths, preserve batch question payload shape, and align release metadata for 0.15.2 after QA. Constraint: Release requested only if code review and ultraqa gates were clean from v0.15.1 diffs. Rejected: Keep silent approve defaults or bypass preferred pane verification | review found those paths could mask unsafe releases. Confidence: high Scope-risk: moderate Directive: Keep code-review stage defaults fail-closed unless explicit review evidence is supplied. Tested: npm test (4346/4346 pass plus catalog check); targeted patched notify/question/pipeline/team tests passed. 
Not-tested: Remote release workflow execution after tag push. * Expose bridge proof through OMX CLI Users needed a local way to verify the OMX to Colony bridge without inferring state from Colony health. The new colony command reads local bridge telemetry, reports pending spooled lifecycle events, and runs a smoke event through the Colony CLI transport without using MCP or hosted services. Constraint: Smoke must not fake claim-before-edit or emit edit metadata Constraint: Bridge failure paths must keep OMX running and leave warning telemetry plus a spooled event Rejected: Call Colony MCP directly from OMX | hook safety must not depend on MCP availability Confidence: high Scope-risk: narrow Tested: npm run build Tested: node --test dist/cli/__tests__/colony.test.js Tested: ./node_modules/.bin/biome lint src/cli/colony.ts src/cli/__tests__/colony.test.ts src/cli/index.ts Tested: node dist/cli/omx.js colony smoke --json Not-tested: full npm test suite Co-authored-by: OmX <omx@oh-my-codex.dev> --------- Co-authored-by: HaD0Yun <HaD0Yun@users.noreply.github.com> Co-authored-by: Yeachan-Heo <54757707+Yeachan-Heo@users.noreply.github.com> Co-authored-by: OmX <omx@oh-my-codex.dev> Co-authored-by: Dean <31391056+deanpress@users.noreply.github.com> Co-authored-by: Hermes Agent <hermes@nousresearch.com> Co-authored-by: aeyeopsdev <aeyeopsdev@users.noreply.github.com> Co-authored-by: bellman <bellman@bellmanui-MacBookAir.local> Co-authored-by: NagyVikt <nagy.viktordp@gmail.com> * add * Guard sloppy fallback PreToolUse framing Add a native Bash PreToolUse advisory that catches vague quick-hack fallback/workaround framing only when the command is write-like and lacks concrete test or architecture grounding. Cover warning, suppression, and existing git enforcement priority with native hook tests. Constraint: Advisory-only; existing hard blocks and concrete compatibility/fail-safe code remain allowed. 
Tested: npm run build && npm run lint && node --test dist/scripts/__tests__/codex-native-hook.test.js Related: #2028 Co-authored-by: OmX <omx@oh-my-codex.dev> * Fix chained PreToolUse slop detection Keep read-only suppression from short-circuiting commands that later perform write-like mutations. Add a regression for rg followed by cat heredoc writing sloppy fallback code so the advisory fires on the risky segment while pure read-only inspections stay silent. Constraint: Preserve advisory-only behavior and existing read-only false-positive suppression. Tested: npm run build && npm run lint && node --test dist/scripts/__tests__/codex-native-hook.test.js Related: #2028 Co-authored-by: OmX <omx@oh-my-codex.dev> * Keep Colony bridge proof durable across outages Lands the Agent6, Agent7, and Agent3 OMX bridge slices on the current local dev base without carrying unrelated upstream drift. The hook now sends lifecycle and runtime summary payloads through the Colony bridge CLI, keeps edit-capable pre-tool coverage, records compact path metadata before mutation, and spools failed bridge sends for later replay. 
Constraint: source branches were authored on mixed bases and direct finish would pull unrelated upstream changes Rejected: finish the original Agent6 branch directly | it was based on origin/dev rather than the local fork/dev target Rejected: keep append-only dated spool files | replay needs dedupe, retry metadata, and bounded retention Confidence: high Scope-risk: moderate Directive: do not send raw edit bodies through Colony; keep tool_input compact and path-focused Directive: replay spooled records with the current Colony CLI executable, not the executable path captured during outage Tested: npm run build Tested: node --test dist/colony/__tests__/bridge.test.js dist/scripts/__tests__/codex-native-hook.test.js dist/config/__tests__/codex-hooks.test.js Related: edc01eac Related: 8eccca30 Related: 1cda8881 * Stop installing legacy Codex notify hooks (#3) Codex invokes the legacy notify command with the turn payload in argv. Large turns can fail before notify-hook starts, producing noisy legacy_notify and hook-transport errors even though native hooks already own the useful lifecycle surfaces. This keeps setup stripping old OMX notify entries but no longer emits a replacement notify key. Native hooks.json remains the managed hook path for SessionStart, PreToolUse, PostToolUse, UserPromptSubmit, and Stop. Constraint: Current Codex native hooks already cover managed lifecycle behavior. Rejected: Keep notify and truncate inside notify-hook | the process can fail before the script starts because argv is too large. Confidence: high Scope-risk: narrow Directive: Do not re-add Codex notify without proving large turn payloads cannot hit argv limits. 
Tested: npm run build Tested: node --test dist/config/__tests__/generator-notify.test.js dist/config/__tests__/generator-idempotent.test.js Tested: node --test dist/config/__tests__/codex-hooks.test.js dist/scripts/__tests__/codex-native-hook.test.js Tested: direct PostToolUse smoke via node dist/scripts/codex-native-hook.js Not-tested: Global install/config refresh blocked by approval quota Co-authored-by: NagyVikt <nagy.viktordp@gmail.com> * Preserve pre-edit proof in post-tool telemetry (#4) Colony needs to correlate one tool call across PreToolUse and PostToolUse without treating after-the-fact edit data as pre-edit proof. The hook now derives a stable trace_id from session/tool_use_id, records successful pre emissions locally, marks missing_pre_tool_use when a mutating post event lacks that marker, and attaches compact failure summaries to failed post events. Constraint: Post-tool data must not count as claim-before-edit proof Rejected: Infer pre safety from PostToolUse file paths | that would fake claim-before-edit coverage Confidence: high Scope-risk: moderate Tested: npm run build Tested: node --test dist/scripts/__tests__/codex-native-hook.test.js Tested: node --test dist/colony/__tests__/bridge.test.js Tested: ./node_modules/.bin/biome lint src/scripts/codex-native-hook.ts src/scripts/__tests__/codex-native-hook.test.ts Tested: npm run check:no-unused Not-tested: full npm test Co-authored-by: NagyVikt <nagy.viktordp@gmail.com> Co-authored-by: OmX <omx@oh-my-codex.dev> * Reduce team startup assignment latency after pane split (#2032) Merged after OMX review approval and green CI. * Fix detached tmux postLaunch tearing down live OMX sessions (#2031) Merged after OMX review approval and green CI. * Fix project Codex NUX config pollution (#2034) Merged after OMX review approval and green CI. 
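The Colony bridge commit above rejects append-only dated spool files because replay needs dedupe, retry metadata, and bounded retention, and it directs replay to use the current Colony CLI executable rather than one captured during an outage. The TypeScript sketch below illustrates those properties under stated assumptions: `BridgeSpool`, the `SpoolRecord` fields, and the in-memory `Map` are hypothetical stand-ins, not the hook's actual on-disk spool format.

```typescript
// Hypothetical shape of a spooled bridge event; field names are illustrative.
interface SpoolRecord {
  key: string; // dedupe key derived from the event's identity
  payload: unknown; // compact, path-focused payload (no raw edit bodies)
  attempts: number; // retry metadata
  lastError?: string;
  firstSpooledAt: number;
}

type Sender = (payload: unknown) => Promise<void>;

class BridgeSpool {
  private records = new Map<string, SpoolRecord>();

  constructor(private readonly maxEntries: number) {}

  // Spool a failed send. Re-spooling an existing key bumps its retry
  // metadata instead of appending a duplicate record.
  add(key: string, payload: unknown, error: string, now = Date.now()): void {
    const existing = this.records.get(key);
    if (existing) {
      existing.attempts += 1;
      existing.lastError = error;
      return;
    }
    this.records.set(key, { key, payload, attempts: 1, lastError: error, firstSpooledAt: now });
    // Bounded retention: a Map iterates in insertion order, so evict the oldest.
    while (this.records.size > this.maxEntries) {
      const oldest = this.records.keys().next().value as string;
      this.records.delete(oldest);
    }
  }

  get size(): number {
    return this.records.size;
  }

  // The sender is supplied at replay time (e.g. built from the current CLI
  // path), so nothing captured during the outage is reused.
  async replay(send: Sender): Promise<{ sent: number; failed: number }> {
    let sent = 0;
    let failed = 0;
    for (const record of [...this.records.values()]) {
      try {
        await send(record.payload);
        this.records.delete(record.key);
        sent += 1;
      } catch (err) {
        record.attempts += 1;
        record.lastError = err instanceof Error ? err.message : String(err);
        failed += 1;
      }
    }
    return { sent, failed };
  }
}
```

Passing the sender into `replay()` instead of storing it on each record is the design choice that keeps a stale executable path out of the replay path.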
* Keep native hook startup alive through symlinked installs (#5) Codex invokes the globally installed hook through an npm symlink, while Node resolves import.meta.url to the real local checkout. The hook main guard now accepts the realpath, and the stdin reader keeps the native hook process alive while Codex streams hook payloads. PostToolUse installation no longer subscribes to Bash, so large test output stays out of post-hook stdin while write-tool and MCP review paths remain covered. Constraint: Global oh-my-codex is symlinked into /home/deadpool/Documents/recodee/oh-my-codex Constraint: Bash PostToolUse payloads can be very large after test runs Rejected: Keep PostToolUse on all tools | large Bash payloads can still produce broken-pipe hook writes Confidence: high Scope-risk: narrow Tested: npm run build Tested: node --test dist/scripts/__tests__/codex-native-hook.test.js Tested: live hooks.json PostToolUse matchers exclude Bash Not-tested: Full npm test Co-authored-by: NagyVikt <nagy.viktordp@gmail.com> Co-authored-by: OmX <omx@oh-my-codex.dev> * fix: tolerate tmux without extended keys option (#2036) Merged after OMX review approval comment and green CI. * fix(plugin): detect mirror CLI calls in spaced paths (#2037) Current upstream/dev still gates direct script execution on an exact file:// string comparison. When the repo path contains spaces, Node encodes import.meta.url while process.argv[1] stays unescaped, so the equality check fails and the mirror CLI never enters the sync path. This change only touches the direct-invocation junction in src/scripts/sync-plugin-mirror.ts and the focused regression coverage in src/catalog/__tests__/plugin-bundle-ssot.test.ts. Mirror contents, metadata policy, sync and check semantics, and user-facing log output stay unchanged. The compatibility choice is to compare normalized filesystem paths instead of raw file URLs. 
Converting import.meta.url with fileURLToPath() and resolving argvPath keeps space-containing and relative launch paths working without changing non-CLI imports. The explicit helper also keeps the top-level launch contract testable without a broader refactor. Verification: - npm run build - node --test dist/catalog/__tests__/plugin-bundle-ssot.test.js - node dist/scripts/sync-plugin-mirror.js --check * test(codex-home): isolate ambient defaults in env-sensitive suites (#2038) Current upstream/dev test suites still inherit ambient CODEX_HOME and default model env from the operator shell. Under a poisoned CODEX_HOME plus default model env, native-config, explore, model-contract, and runtime assertions change their expected fallback models without any source change. This patch only hardens test harnesses and test-local env setup in src/agents/__tests__/native-config.test.ts, src/cli/__tests__/exec.test.ts, src/cli/__tests__/explore.test.ts, src/team/__tests__/model-contract.test.ts, and src/team/__tests__/runtime.test.ts. Production model resolution, runtime behavior, and config parsing stay unchanged. The compatibility choice is to blank or replace ambient defaults at the test boundary instead of stubbing production code. CLI helper tests now clear CODEX_HOME unless a case explicitly overrides it, and suites that assert default model lanes run inside isolated empty CODEX_HOME roots with frontier, standard, and spark env cleared. The async runtime helper extends that boundary to startup flows that materialize worker launch args. 
Verification: - npm run build - node --test dist/agents/__tests__/native-config.test.js dist/cli/__tests__/explore.test.js dist/team/__tests__/model-contract.test.js dist/team/__tests__/runtime.test.js dist/cli/__tests__/exec.test.js - hostile-env spot check with poisoned CODEX_HOME plus default-model env for native-config, explore, model-contract, and focused runtime cases * Preserve approved handoff context narrowly (#2040) Merged after OMX review approval comment and green CI. * Fix prompt worker startup death handling (#2041) * fix(team): treat ready prompts as startup evidence (#2042) On current upstream/dev, interactive Codex workers can miss the initial startup-direct trigger while the pane is still bootstrapping, later reach a real ready prompt, and still be downgraded to codex_startup_no_evidence_after_fallback after a notified startup dispatch. That records a recoverable startup issue even though tmux already observed the worker ready to accept input. This patch keeps the scope on the interactive startup-evidence junctions in src/team/runtime.ts and src/team/__tests__/runtime.test.ts. startTeam now records when the existing ready-wait path has already observed a ready prompt, and dispatchCriticalInboxInstruction treats that prompt as settled startup evidence for notified receipts and direct-fallback confirmation paths. The new regression test forces the startup-direct capture to see a bootstrapping pane first, then verifies that a later ready prompt prevents the false startup issue. Compatibility stays narrow by synthesizing ready_prompt evidence only after the current tmux readiness wait has already succeeded. readWorkerStartupEvidence, worker state files, prompt-mode workers, and direct-trigger success paths stay unchanged. The implicit design choice is to reuse the already-observed ready prompt instead of broadening persisted startup evidence semantics. 
Verification: - npm run build - Focused interactive repro against dist/team/runtime.js confirmed a bootstrapping-to-ready worker now stays neutral and keeps its startup dispatch request in notified state - CODEX_HOME=<empty temp dir> node --test dist/team/__tests__/runtime.test.js (113/114 passing; unchanged unrelated sandbox failure: shutdownTeam reaps detached prompt-worker descendants) * test: make HUD watch coalescing deterministic (#2043) * fix(planning): exact-match approved launch hint selection Split approved launch-hint lookup into exact selection and ambiguity-aware outcome flows so callers can bind to a specific PRD, task, or command instead of inheriting the last visible launch line from the latest approved PRD. Junction points: - explicit prdPath selection now scopes artifact binding to the requested approved PRD - explicit task and command selection now fail closed when one PRD exposes multiple matching launch hints - short omx team team follow-ups can distinguish ambiguous from absent approved hints - explicit Ralph task reuse now resolves against the matching approved hint instead of an unrelated latest hint Design decisions: - preserve the existing nullable readApprovedExecutionLaunchHint API and add an outcome helper for ambiguity-aware callers - keep the patch limited to exact-match and ambiguity handling on current upstream/dev - do not port same-task lineage fallback, context-pack ref materialization, or new lifecycle states Validation: - npm run build - node --test dist/planning/__tests__/artifacts.test.js dist/cli/__tests__/team.test.js dist/cli/__tests__/ralph.test.js Document-refresh: not-needed | the patch narrows approved launch-hint selection behavior in code and tests without changing operator-facing docs or setup flows. 
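The #5 and #2037 commits above fix the same junction: deciding whether a module is the process entry point when `import.meta.url` and `process.argv[1]` disagree as raw strings (percent-encoded spaces, relative launch paths, or an npm bin symlink). A minimal sketch of the normalized-path comparison, assuming a POSIX filesystem; `isDirectInvocation` is a hypothetical helper name, and the realpath fallback reflects the symlinked-install case described for #5 rather than the exact upstream code.

```typescript
import { realpathSync } from "node:fs";
import { resolve } from "node:path";
import { fileURLToPath } from "node:url";

// Hypothetical helper: true when `moduleUrl` names the script in `argvPath`.
export function isDirectInvocation(moduleUrl: string, argvPath: string | undefined): boolean {
  if (!argvPath) return false;
  const modulePath = fileURLToPath(moduleUrl); // decodes %20 and similar escapes
  const launchPath = resolve(argvPath); // absolutizes relative launch paths
  if (modulePath === launchPath) return true;
  try {
    // Follow symlinks, e.g. a global npm bin link pointing at a local checkout.
    return realpathSync(modulePath) === realpathSync(launchPath);
  } catch {
    // A path that cannot be resolved on disk cannot be the entry point.
    return false;
  }
}

// Usage inside an ES module:
// if (isDirectInvocation(import.meta.url, process.argv[1])) { main(); }
```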
* Keep OMX MCP serve process alive (#2044) Co-authored-by: Peter Gagarinov <4868370+pgagarinov@users.noreply.github.com> * fix(team): make runtime launch instruction portable Route team-exec launch output through dist/team/runtime-cli.js instead of embedding descriptor.task into a raw omx team shell command. Junction points: - buildTeamInstruction previously serialized the full task directly into shell text, which is brittle for quoted content and long plan payloads - runtime-cli only accepted stdin JSON, so pipeline launches could not pass a portable inline payload Design decisions: - keep the runtime-cli payload limited to current upstream fields: teamName, workerCount, agentTypes, tasks, and cwd - derive teamName and task fanout with existing team planning helpers so the runtime launch stays aligned with current omx team behavior - use base64url transport plus platform-specific quoting so POSIX keeps staffing comments while Windows omits shell-style comment suffixes - leave approved-execution binding and task-routing changes for later rows Validation: - npm run build - node --test dist/pipeline/__tests__/stages.test.js dist/team/__tests__/runtime-cli.test.js Document-refresh: not-needed | this changes pipeline launch transport and runtime-cli input parsing only, without altering operator-facing docs or setup flow. * feat(planning): timestamp canonical handoff artifacts Latest-plan selection on upstream/dev still orders planning artifacts by filename and still treats the full PRD suffix as the binding key for adjacent sidecars. That leaves two gaps once canonical PRDs start using prd-<timestamp>-<slug>.md names: newer plans can lose newest-first selection, and timestamped PRDs can fall back to stale slug-only test specs. Junction points: - Ralph canonical migration still writes prd-<slug>.md names and keeps the first sorted canonical PRD, so migrated artifacts never enter a timestamp-aware ordering. 
- Planning artifact selection needs timestamp parsing for PRD/test-spec pairing, but upstream/dev sidecars still bind by the exact latest-plan suffix and would become stale if row 3 canonicalized those lookups. Design decisions: - add a shared artifact-name helper for timestamp parsing, canonical slug extraction, timestamp-aware ordering, and exact timestamped test-spec lookup - require exact test-spec-<timestamp>-<slug>.md matches for timestamped PRDs while keeping legacy non-timestamped test-spec aliases compatible - keep deep-interview selection limited to canonical non-autoresearch artifacts, but preserve upstream/dev exact latest-plan suffix matching for repo-context and team-dag sidecars - leave context-pack status, context-tool, and extra timestamped diagnostics out of this row Validation: - npm run build - node --test dist/planning/__tests__/artifacts.test.js dist/ralph/__tests__/persistence.test.js Document-refresh: not-needed | this updates planning and Ralph artifact selection internals and their regression tests only; no operator-facing workflow or setup documentation changed. * fix(team): preserve runtime launch inheritance semantics The runtime-cli transport on upstream/dev introduced two regressions that the old omx team command path did not have. Junction points: - buildTeamInstruction hard-coded agentTypes=["codex"], so runtime-cli always overwrote OMX_TEAM_WORKER_CLI_MAP and lost caller-selected claude or gemini worker mappings. - buildTeamInstruction serialized tasks with buildTeamExecutionPlan directly, which skipped the same parseTeamArgs plus buildRepoAwareTeamExecutionPlan path that omx team uses when an approved launch hint opts into DAG handoff. Design decisions: - make agentTypes optional in the runtime-cli payload and preserve the inherited OMX_TEAM_WORKER_CLI_MAP when the payload does not set an explicit provider override. 
- build the serialized launch payload from parseTeamArgs plus buildRepoAwareTeamExecutionPlan so approved DAG tasks and decomposition metadata survive the transport rewrite. - keep the payload on current upstrea…
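The runtime-cli commits above replace a task embedded directly in shell text with a base64url-transported payload. A sketch of that encode/decode round trip, assuming the payload carries the fields the commit lists (teamName, workerCount, optional agentTypes, tasks, cwd); the `tasks` element shape and both helper names are hypothetical.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical payload shape; the real runtime-cli contract may differ.
interface RuntimeLaunchPayload {
  teamName: string;
  workerCount: number;
  agentTypes?: string[]; // omitted => receiver keeps the inherited worker CLI map
  tasks: Array<{ subject: string; description: string }>;
  cwd: string;
}

// base64url output uses only [A-Za-z0-9_-] (no padding), so quotes,
// newlines, or `$` in task text cannot break shell quoting.
export function encodeLaunchPayload(payload: RuntimeLaunchPayload): string {
  return Buffer.from(JSON.stringify(payload), "utf8").toString("base64url");
}

export function decodeLaunchPayload(encoded: string): RuntimeLaunchPayload {
  const parsed: unknown = JSON.parse(Buffer.from(encoded, "base64url").toString("utf8"));
  // Minimal shape check; a real parser would validate each field.
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("invalid launch payload");
  }
  return parsed as RuntimeLaunchPayload;
}
```

Leaving `agentTypes` out of the JSON, rather than defaulting it to `["codex"]`, is what lets the receiver preserve an inherited `OMX_TEAM_WORKER_CLI_MAP`, as the follow-up fix describes.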
Summary: link Colony PreToolUse and PostToolUse payloads by trace_id, flag missing pre-tool proof, and surface failed PostToolUse telemetry. Verification: npm run build; node --test dist/scripts/__tests__/codex-native-hook.test.js dist/colony/__tests__/bridge.test.js.
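A sketch of the correlation rule this PR describes: derive a stable trace_id from the session id and tool_use_id, remember which PreToolUse emissions succeeded, and flag mutating PostToolUse events that lack that marker instead of treating post-tool data as pre-edit proof. Assumptions: the hash (truncated SHA-256), the `MUTATING_TOOLS` names, the event field names, and the 200-character failure summary are hypothetical, and an in-memory `Set` stands in for the hook's local record of pre emissions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical event shape; field names are illustrative.
interface ToolEvent {
  sessionId: string;
  toolUseId: string;
  toolName: string;
  failed?: boolean;
  error?: string;
}

// Same inputs on PreToolUse and PostToolUse yield the same id. JSON-encoding
// the pair avoids collisions such as ("a:b", "c") vs ("a", "b:c").
export function deriveTraceId(sessionId: string, toolUseId: string): string {
  return createHash("sha256").update(JSON.stringify([sessionId, toolUseId])).digest("hex").slice(0, 16);
}

// Hypothetical set of tools that mutate files.
const MUTATING_TOOLS = new Set(["apply_patch", "Edit", "Write"]);

export class TraceLedger {
  // Markers for PreToolUse emissions that reached the bridge successfully.
  private readonly preEmitted = new Set<string>();

  recordPreEmitted(event: ToolEvent): string {
    const traceId = deriveTraceId(event.sessionId, event.toolUseId);
    this.preEmitted.add(traceId);
    return traceId;
  }

  // Post data never backfills pre-edit proof: a mutating call with no
  // pre marker is flagged, not inferred safe from its file paths.
  annotatePost(event: ToolEvent): Record<string, unknown> {
    const traceId = deriveTraceId(event.sessionId, event.toolUseId);
    const out: Record<string, unknown> = { trace_id: traceId, tool_name: event.toolName };
    if (MUTATING_TOOLS.has(event.toolName) && !this.preEmitted.has(traceId)) {
      out.missing_pre_tool_use = true;
    }
    if (event.failed) {
      // Compact failure summary: first line of the error, truncated.
      const firstLine = (event.error ?? "unknown error").split("\n")[0] ?? "";
      out.failure_summary = firstLine.slice(0, 200);
    }
    return out;
  }
}
```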