Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 12 additions & 15 deletions codex/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -937,23 +937,21 @@ TMPERR=$(mktemp "$TMP_ROOT/codex-err-XXXXXX.txt")

2. Run the review (5-minute timeout). **Codex CLI ≥ 0.130.0 rejects passing a
custom prompt and `--base <branch>` together** (the two arguments are mutually
exclusive at argv level), so the previously-prefixed filesystem boundary cannot
be carried in review mode. Two paths:
exclusive at argv level), so put the base diff scope in the prompt instead of
passing `--base`. Two paths:

**Default path (no custom user instructions):** call `codex review --base` bare.
Codex's review prompt template is internally diff-scoped, so the model focuses on
the changes against the base branch. The filesystem boundary that previously
prefixed every review call is no longer carried in bare review mode; the skill
files under `.claude/` and `agents/` are public, so this is a token-efficiency
concern, not a safety concern. If a future diff happens to include skill files,
Codex may spend a few extra tokens reading them. Acceptable trade-off:
**Default path (no custom user instructions):** call `codex review` with the
filesystem boundary and explicit diff-scope instructions in the prompt. This
preserves the boundary while avoiding the prompt-plus-`--base` argv shape:

```bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
# 330s (5.5min) is slightly longer than the Bash 300s so the shell wrapper
# only fires if Bash's own timeout doesn't.
_gstack_codex_timeout_wrapper 330 codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
_gstack_codex_timeout_wrapper 330 codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.

Review the changes on this branch against the base branch <base>. Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD to see the diff and review only those changes." -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
_CODEX_EXIT=$?
if [ "$_CODEX_EXIT" = "124" ]; then
_gstack_codex_log_event "codex_timeout" "330"
Expand Down Expand Up @@ -994,11 +992,10 @@ if [ "$_CODEX_EXIT" = "124" ]; then
fi
```

**Why the dual path:** Bare `codex review` preserves Codex's built-in review
prompt tuning (the CLI scopes the model to the diff and asks for severity-marked
findings). The exec route loses that tuning but gains custom-instructions
support; the prompt explicitly demands `[P1]` / `[P2]` markers so the gate logic
in step 4 still works.
**Why the dual path:** The default `codex review` path keeps Codex's review
prompt tuning while scoping the diff in prompt text. The `codex exec` route loses
that tuning but gains custom-instructions support; the prompt explicitly demands
`[P1]` / `[P2]` markers so the gate logic in step 4 still works.

Use `timeout: 300000` on the Bash call for either path.

Expand Down
27 changes: 12 additions & 15 deletions codex/SKILL.md.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -163,23 +163,21 @@ TMPERR=$(mktemp "$TMP_ROOT/codex-err-XXXXXX.txt")

2. Run the review (5-minute timeout). **Codex CLI ≥ 0.130.0 rejects passing a
custom prompt and `--base <branch>` together** (the two arguments are mutually
exclusive at argv level), so the previously-prefixed filesystem boundary cannot
be carried in review mode. Two paths:
exclusive at argv level), so put the base diff scope in the prompt instead of
passing `--base`. Two paths:

**Default path (no custom user instructions):** call `codex review --base` bare.
Codex's review prompt template is internally diff-scoped, so the model focuses on
the changes against the base branch. The filesystem boundary that previously
prefixed every review call is no longer carried in bare review mode; the skill
files under `.claude/` and `agents/` are public, so this is a token-efficiency
concern, not a safety concern. If a future diff happens to include skill files,
Codex may spend a few extra tokens reading them. Acceptable trade-off:
**Default path (no custom user instructions):** call `codex review` with the
filesystem boundary and explicit diff-scope instructions in the prompt. This
preserves the boundary while avoiding the prompt-plus-`--base` argv shape:

```bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
# 330s (5.5min) is slightly longer than the Bash 300s so the shell wrapper
# only fires if Bash's own timeout doesn't.
_gstack_codex_timeout_wrapper 330 codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
_gstack_codex_timeout_wrapper 330 codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.

Review the changes on this branch against the base branch <base>. Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD to see the diff and review only those changes." -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
_CODEX_EXIT=$?
if [ "$_CODEX_EXIT" = "124" ]; then
_gstack_codex_log_event "codex_timeout" "330"
Expand Down Expand Up @@ -220,11 +218,10 @@ if [ "$_CODEX_EXIT" = "124" ]; then
fi
```

**Why the dual path:** Bare `codex review` preserves Codex's built-in review
prompt tuning (the CLI scopes the model to the diff and asks for severity-marked
findings). The exec route loses that tuning but gains custom-instructions
support; the prompt explicitly demands `[P1]` / `[P2]` markers so the gate logic
in step 4 still works.
**Why the dual path:** The default `codex review` path keeps Codex's review
prompt tuning while scoping the diff in prompt text. The `codex exec` route loses
that tuning but gains custom-instructions support; the prompt explicitly demands
`[P1]` / `[P2]` markers so the gate logic in step 4 still works.

Use `timeout: 300000` on the Bash call for either path.

Expand Down
2 changes: 1 addition & 1 deletion review/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -1633,7 +1633,7 @@ If `DIFF_TOTAL >= 200` AND Codex is available AND `OLD_CFG` is NOT `disabled`:
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the diff against the base branch." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch <base>. Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD to see the diff and review only those changes." -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
```

Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. Present output under `CODEX SAYS (code review):` header.
Expand Down
2 changes: 1 addition & 1 deletion scripts/resolvers/review.ts
Original file line number Diff line number Diff line change
Expand Up @@ -505,7 +505,7 @@ If \`DIFF_TOTAL >= 200\` AND Codex is available AND \`OLD_CFG\` is NOT \`disable
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review "${CODEX_BOUNDARY}Review the diff against the base branch." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
codex review "${CODEX_BOUNDARY}Review the changes on this branch against the base branch <base>. Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD to see the diff and review only those changes." -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
\`\`\`

Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. Present output under \`CODEX SAYS (code review):\` header.
Expand Down
2 changes: 1 addition & 1 deletion ship/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -2379,7 +2379,7 @@ If `DIFF_TOTAL >= 200` AND Codex is available AND `OLD_CFG` is NOT `disabled`:
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the diff against the base branch." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch <base>. Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD to see the diff and review only those changes." -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
```

Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. Present output under `CODEX SAYS (code review):` header.
Expand Down
2 changes: 1 addition & 1 deletion test/fixtures/golden-ship-claude.md
Original file line number Diff line number Diff line change
Expand Up @@ -2050,7 +2050,7 @@ If `DIFF_TOTAL >= 200` AND Codex is available AND `OLD_CFG` is NOT `disabled`:
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the diff against the base branch." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch <base>. Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD to see the diff and review only those changes." -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
```

Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. Present output under `CODEX SAYS (code review):` header.
Expand Down
2 changes: 1 addition & 1 deletion test/fixtures/golden/claude-ship-SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -2379,7 +2379,7 @@ If `DIFF_TOTAL >= 200` AND Codex is available AND `OLD_CFG` is NOT `disabled`:
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the diff against the base branch." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch <base>. Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD to see the diff and review only those changes." -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
```

Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. Present output under `CODEX SAYS (code review):` header.
Expand Down
2 changes: 1 addition & 1 deletion test/fixtures/golden/factory-ship-SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -2370,7 +2370,7 @@ If `DIFF_TOTAL >= 200` AND Codex is available AND `OLD_CFG` is NOT `disabled`:
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .factory/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the diff against the base branch." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .factory/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch <base>. Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD to see the diff and review only those changes." -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
```

Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. Present output under `CODEX SAYS (code review):` header.
Expand Down
16 changes: 16 additions & 0 deletions test/gen-skill-docs.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2701,6 +2701,22 @@ describe('codex commands must not use inline $(git rev-parse --show-toplevel) fo
}
expect(violations).toEqual([]);
});

test('codex review commands pass diff scope through prompt, not --base', () => {
const checkedFiles = [
'codex/SKILL.md.tmpl',
'codex/SKILL.md',
'scripts/resolvers/review.ts',
'review/SKILL.md',
'ship/SKILL.md',
];

for (const rel of checkedFiles) {
const content = fs.readFileSync(path.join(ROOT, rel), 'utf-8');
expect(content).not.toContain('--base <base> -c \'model_reasoning_effort="high"\'');
expect(content).toContain('Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD');
}
});
});

// ─── Learnings + Confidence Resolver Tests ─────────────────────
Expand Down
8 changes: 8 additions & 0 deletions test/skill-validation.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -1421,6 +1421,14 @@ describe('Codex skill', () => {
expect(content).toContain('codex exec');
});

test('codex review invocations avoid the prompt plus --base argument shape', () => {
for (const rel of ['codex/SKILL.md', 'review/SKILL.md', 'ship/SKILL.md']) {
const content = fs.readFileSync(path.join(ROOT, rel), 'utf-8');
expect(content).not.toContain('--base <base> -c \'model_reasoning_effort="high"\'');
expect(content).toContain('Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD');
}
});

test('/review persists a review-log entry for ship readiness', () => {
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
expect(content).toContain('"skill":"review"');
Expand Down
Loading