Codex CLI HELIX Integration Tests

This harness exercises the HELIX skill routing behavior with the OpenAI Codex CLI runtime.

What it tests

Skill installation — HELIX can be placed in Codex's skill discovery paths
Skill activation — Codex invokes HELIX when asked for bootstrap, routing, or verification
Behavioral correctness — responses contain HELIX-specific terminology and modes
Negative control — uninstalled skill produces different behavior (no skill invocation)

Install location ambiguity

Codex CLI supports two skill installation paths depending on installer version:

Path A (newer): ~/.codex/skills/helix/ (primary, agentskills.io standard)
Path B (older): ~/.agents/skills/helix/ (legacy)

The test harness checks both locations during skill installation and verifies the skill is discoverable at whichever path the installer chose. See docs/install/codex.md:36 for version history.

Running the tests

Prerequisites

Docker (builds the test image)
jq (optional, for JSON parsing if available)
OPENAI_API_KEY or CODEX_API_KEY environment variable

Basic run

export OPENAI_API_KEY="sk-..."
bash tests/workflows/codex-cli/run-scenarios.sh

Exit 0 on all scenarios passing; nonzero if any fail.

Negative control (--no-skill)

bash tests/workflows/codex-cli/run-scenarios.sh --no-skill

Verifies that HELIX behavior is absent when the skill is not installed. This test currently runs as a side effect of the main run and documents any gaps.

Test scenarios

Scenario	Purpose	Observable
`install-verify`	Confirms skill can be discovered	Response mentions SKILL.md, helix, workflows
`skill-list`	Lists routing modes	Response contains: input, frame, align, evolve, design, review
`bootstrap`	Runs a framing scenario	Response names product-vision, prd, and uses frame/input modes

Expected output

→ Building Docker image: helix-int-test:codex-cli
→ Test temp directory: /tmp/...
→ Loaded routing evaluations from: .../evals/routing.jsonl
→ Running scenario: install-verify
→   Input: Confirm HELIX is installed...
→   ✓ Scenario passed: install-verify
→ Running scenario: skill-list
→   Input: List the HELIX routing modes...
→   ✓ Scenario passed: skill-list
→ Running scenario: bootstrap
→   Input: I want to build a TODO list app...
→   ✓ Scenario passed: bootstrap
→ Running negative control: skill uninstalled
→   ✓ Negative control: Skill behavior absent when skill uninstalled

→ Test Results
→ ============
→ Passed: 3/3
→   ✓ install-verify
→   ✓ skill-list
→   ✓ bootstrap
→ All scenarios passed

Activation signal

Note on structured output: Codex CLI versions differ in whether they expose tool/skill invocation events in machine-readable format. The harness uses Codex's behavioral signal (response contains HELIX-specific terminology and modes) as the activation indicator, as this is:

Runtime-neutral: works across Codex versions regardless of output format
User-observable: reflects what the user would actually see
Highest-confidence: asserts end-to-end routing behavior, not just tool-call metadata

If a future Codex version exposes structured tool_use events (like Claude Code's stream-json format), the harness can evolve to validate both the structured signal AND the behavioral signal for increased confidence.

Recording

Generate a screencast with vhs:

cd tests/workflows/codex-cli/recordings
vhs < INT-CX.tape

Output: INT-CX.gif (committed to the repo, linked in docs/install/codex.md)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Codex CLI HELIX Integration Tests

What it tests

Install location ambiguity

Running the tests

Prerequisites

Basic run

Negative control (--no-skill)

Test scenarios

Expected output

Activation signal

Recording

See also

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Codex CLI HELIX Integration Tests

What it tests

Install location ambiguity

Running the tests

Prerequisites

Basic run

Negative control (--no-skill)

Test scenarios

Expected output

Activation signal

Recording

See also