When a Codex turn stops, speak the full final assistant message through the local Kokoro-FastAPI container.
The first version should be boring and reliable:
- Codex runs a `SessionStart` hook on startup or resume.
- The startup hook reads config and checks that the Kokoro API is available.
- Codex runs a `Stop` hook when a turn completes.
- The stop hook reads JSON from stdin.
- The stop hook extracts `last_assistant_message`.
- The stop hook sends text to the configured Kokoro speech endpoint.
- The stop hook plays the returned audio through the host audio stack.
- Hook scripts print valid hook JSON to stdout and write diagnostics elsewhere.
Official Codex hook docs describe hooks in hooks.json or inline config.toml.
SessionStart can match startup|resume. Stop runs at turn scope. Command
hooks receive JSON on stdin. Stop must emit JSON on stdout when it exits 0;
plain text is invalid for that event. The Stop input includes
last_assistant_message.
For repo-local hooks, the docs recommend resolving paths from the git root because Codex may start from a subdirectory.
Reference: https://developers.openai.com/codex/hooks
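The stdin/stdout contract above can be sketched as two small helpers. This is a minimal sketch, not the project's implementation: the names `parse_hook_input` and `render_hook_output` are hypothetical, and the only fields used are `continue` and `systemMessage`, both of which appear elsewhere in this document.

```python
import json
from typing import Optional


def parse_hook_input(raw: str) -> dict:
    """Parse the hook payload; return {} for empty, invalid, or non-object JSON."""
    if not raw.strip():
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return data if isinstance(data, dict) else {}


def render_hook_output(system_message: Optional[str] = None) -> str:
    """Build the single JSON object the hook prints to stdout."""
    result = {"continue": True}
    if system_message:
        result["systemMessage"] = system_message
    return json.dumps(result)
```

A hook script would call `parse_hook_input(sys.stdin.read())`, print `render_hook_output()` to stdout, and send every diagnostic to stderr or the log file so stdout stays valid JSON.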
The running local API exposes an OpenAI-compatible speech endpoint:
POST http://localhost:8880/v1/audio/speech
Content-Type: application/json

Useful request fields:

{
  "model": "kokoro",
  "input": "Text to speak",
  "voice": "am_michael",
  "response_format": "wav",
  "stream": false,
  "speed": 1.0
}

Supported response formats in the local schema are mp3, opus, aac,
flac, wav, and pcm. The schema notes that pcm is raw 16-bit audio
without a header. Kokoro examples use 24 kHz mono PCM for streaming playback.
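As an illustration of the request described above, the following sketch builds the POST with only the standard library. The function name `build_speech_request` and its default arguments are assumptions for this example; the payload fields and URL are the ones listed in this section.

```python
import json
import urllib.request


def build_speech_request(text, voice="am_michael", speed=1.0,
                         base_url="http://localhost:8880"):
    """Return a urllib Request for POST /v1/audio/speech (nothing is sent yet)."""
    payload = {
        "model": "kokoro",
        "input": text,
        "voice": voice,
        "response_format": "wav",
        "stream": False,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would look like `urllib.request.urlopen(request, timeout=20).read()`, which should return the WAV bytes when the server is running.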
The local API also exposes:
GET /health
GET /v1/audio/voices

The startup hook should use /health to detect API availability and
/v1/audio/voices to validate the configured voice when possible.
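A health probe along these lines could look like the sketch below. It assumes that a healthy server answers `GET /health` with HTTP 200; the response body is not inspected because its shape is not described here. The name `is_healthy` is hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(health_url: str, timeout: float = 2.0) -> bool:
    """Return True only when GET <health_url> answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused connections, timeouts, HTTP errors, and malformed URLs
        # all count as "not available" rather than crashing the hook.
        return False
```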
codex/
  .codex-plugin/
    plugin.json
  hooks/
    hooks.json
  scripts/
    codex_session_start_tts_check.py
    codex_stop_tts.py
  src/
    tts_hook/
      __init__.py
      config.py
      kokoro.py
      playback.py
  config.example.toml
  README.md
config.py should be shared by both hook scripts. That keeps URL construction,
defaults, timeouts, and voice settings identical between startup validation and
turn-end speech.
The Codex-facing implementation should live under codex/, with codex/ as
the plugin root. The packageable unit needs a plugin manifest:
codex/.codex-plugin/plugin.json
Initial manifest shape:
{
  "name": "tts-hook",
  "version": "0.1.0",
  "description": "Speak Codex final responses through a local Kokoro-FastAPI server.",
  "hooks": "./hooks/hooks.json",
  "interface": {
    "displayName": "Kokoro TTS Hook",
    "shortDescription": "Speak Codex responses with local Kokoro TTS.",
    "longDescription": "Uses Codex hooks to check Kokoro availability on startup and play assistant responses when a turn stops.",
    "developerName": "Local",
    "category": "Productivity",
    "capabilities": ["Read"]
  }
}

Lifecycle hook config should live at:
codex/hooks/hooks.json
Assumption: commands in plugin-bundled hooks/hooks.json are resolved relative
to the plugin root. Use relative commands like python3 ./scripts/... and
iterate if this assumption turns out to be incorrect in Codex.
Use TOML for the config file. Python 3.11+ can parse TOML with the standard
library tomllib, so this avoids a dependency for normal Fedora installs.
Expose only values that are expected to vary for a normal local install:
- `kokoro.host`: needed when the API is not on the same host, or when `localhost` resolves differently under a future runtime.
- `kokoro.port`: needed because the Kokoro container port can be remapped.
- `speech.voice`: personal preference and the most likely thing to tune. If omitted, the implementation should use `am_liam`.
- `speech.speed`: personal preference and accessibility tuning.
- `playback.player`: needed if `auto` chooses the wrong host audio tool.
- `playback.blocking`: defaults to `false`; keep it configurable in case a later user explicitly wants synchronous playback for debugging.
- `timeouts.connect_seconds` and `timeouts.read_seconds`: useful for machines with slower first-token TTS generation or network hiccups.
- `logging.path`: useful for debugging hook behavior without polluting stdout.
Do not expose stable integration constants in the user config:
- `model`: always `kokoro` for this project.
- `scheme`: always `http` for localhost development.
- `api_prefix`: always `/v1`.
- `health_path`: always `/health`.
- `speech_path`: always `/audio/speech`.
- `voices_path`: always `/audio/voices`.
- `response_format`: use `wav` for the MVP.
- `stream`: use non-streaming for the MVP.
- `startup_health_seconds`: use the normal connect/read timeout pair.
- `max_chars`: no truncation for the first implementation; speak the full `last_assistant_message`.
These can still be code constants. If one needs to vary later, promote it to config after the actual need appears.
Config resolution:
<plugin-root>/tts-hook.toml
The plugin should ship config.example.toml. The user-owned config should be
plugin-local at codex/tts-hook.toml and ignored by git. If that file is
absent, use code defaults.
Initial config shape:
[kokoro]
host = "localhost"
port = 8880
[speech]
voice = "am_liam"
speed = 1.0
[playback]
player = "auto"
blocking = false
[timeouts]
connect_seconds = 2.0
read_seconds = 20.0
[logging]
path = "~/.codex/tts-hook.log"

Code constants:
KOKORO_SCHEME = "http"
KOKORO_API_PREFIX = "/v1"
KOKORO_MODEL = "kokoro"
HEALTH_PATH = "/health"
SPEECH_PATH = "/audio/speech"
VOICES_PATH = "/audio/voices"
RESPONSE_FORMAT = "wav"
STREAM = False
Default URL construction, using configured host and port plus code
constants:
base_url = "http://{host}:{port}"
health_url = "{base_url}/health"
speech_url = "{base_url}/v1/audio/speech"
voices_url = "{base_url}/v1/audio/voices"
With defaults, those resolve to:
http://localhost:8880/health
http://localhost:8880/v1/audio/speech
http://localhost:8880/v1/audio/voices
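The URL construction above can be expressed as one helper. The constant names are taken from this document; the function name `build_urls` and its dictionary return shape are assumptions made for this sketch.

```python
KOKORO_SCHEME = "http"
KOKORO_API_PREFIX = "/v1"
HEALTH_PATH = "/health"
SPEECH_PATH = "/audio/speech"
VOICES_PATH = "/audio/voices"


def build_urls(host: str = "localhost", port: int = 8880) -> dict:
    """Combine configured host/port with the baked-in scheme and paths."""
    base_url = f"{KOKORO_SCHEME}://{host}:{port}"
    return {
        # /health sits at the server root, outside the /v1 prefix.
        "health": f"{base_url}{HEALTH_PATH}",
        "speech": f"{base_url}{KOKORO_API_PREFIX}{SPEECH_PATH}",
        "voices": f"{base_url}{KOKORO_API_PREFIX}{VOICES_PATH}",
    }
```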
Do not add environment-variable config overrides in the first implementation.
Build one Python startup script:
scripts/codex_session_start_tts_check.py
Initial behavior:
- Read hook JSON from stdin.
- Load config.
- Build `health_url` from configured `host` and `port` plus the baked-in `/health` path.
- Send `GET /health` with a short timeout.
- If healthy, optionally call `GET /v1/audio/voices` and verify the configured voice exists. If no voice is configured, use `am_liam`.
- Write a concise `systemMessage` if Kokoro is unavailable or the voice is invalid.
- Return `{"continue": true}` so startup check failures do not block Codex while we iterate.
- Write detailed diagnostics to stderr or the configured log file.
The startup hook should not start containers in the MVP. Starting containers from a hook adds lifecycle and failure-mode complexity. The first version should only verify that the API is already reachable and tell the user what is wrong.
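The warn-only decision logic of the startup hook can be sketched as a pure function, which keeps it testable without a running server. The name `startup_result`, its parameters, and the warning wording are hypothetical; the list of available voice names is assumed to have been extracted from the voices response already.

```python
from typing import Optional, Sequence


def startup_result(healthy: bool,
                   available_voices: Optional[Sequence[str]],
                   voice: str = "am_liam") -> dict:
    """Map check outcomes to hook output; never blocks Codex."""
    result = {"continue": True}
    if not healthy:
        result["systemMessage"] = (
            "Kokoro TTS is not reachable; responses will not be spoken."
        )
    elif available_voices is not None and voice not in available_voices:
        # available_voices is None when the voices lookup was skipped or failed.
        result["systemMessage"] = (
            f"Kokoro voice '{voice}' was not found; speech may fail."
        )
    return result
```

Passing `None` for the voice list models the "validate when possible" rule: a failed or skipped voices lookup produces no warning on its own.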
Possible later behavior:
- Optional `auto_start_command`, disabled by default.
- Optional warmup speech phrase to verify audio playback, disabled by default.
Build one Python stop script:
scripts/codex_stop_tts.py
Initial behavior:
- Read all stdin as JSON.
- Load config.
- Extract `last_assistant_message`.
- Ignore empty messages.
- Lightly strip markdown fences and excessive whitespace.
- Speak the full message. Do not truncate in the first implementation.
- POST to Kokoro using configured host, port, voice, and speed. The model, endpoint path, response format, and non-streaming mode are baked in for the MVP.
- Write audio to a temporary WAV file.
- Spawn playback in the background and return immediately.
- Play the temp WAV with the first available host command:
  - `pw-play`
  - `paplay`
  - `ffplay -nodisp -autoexit`
  - `aplay`
- Return hook JSON on stdout even if TTS fails.
- Write errors to stderr or a log file, never stdout.
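Player selection and background playback from the list above could be sketched like this. The function names and the injectable `which` parameter are assumptions for illustration; the command order matches the list. `start_new_session=True` is POSIX-only, which fits the Fedora target.

```python
import shutil
import subprocess

# Candidate commands in preference order, as listed above.
PLAYER_COMMANDS = [
    ["pw-play"],
    ["paplay"],
    ["ffplay", "-nodisp", "-autoexit"],
    ["aplay"],
]


def pick_player(which=shutil.which):
    """Return the argv prefix of the first installed player, or None."""
    for command in PLAYER_COMMANDS:
        if which(command[0]):
            return list(command)
    return None


def spawn_playback(wav_path: str, which=shutil.which):
    """Start playback detached from the hook; return the Popen or None."""
    command = pick_player(which)
    if command is None:
        return None  # caller logs a warning and still emits hook JSON
    return subprocess.Popen(
        command + [wav_path],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,   # keep player output off the hook's stdout
        stderr=subprocess.DEVNULL,
        start_new_session=True,      # let playback outlive the hook process
    )
```

Redirecting the player's stdout matters here: anything the child writes to the inherited stdout would corrupt the JSON the hook must emit.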
MVP stdout shape:
{"continue": true}

MVP hook config:

{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "startup|resume",
        "hooks": [
          {
            "type": "command",
            "command": "python3 ./scripts/codex_session_start_tts_check.py",
            "timeout": 5,
            "statusMessage": "Checking Kokoro TTS"
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python3 ./scripts/codex_stop_tts.py",
            "timeout": 5,
            "statusMessage": "Speaking final response"
          }
        ]
      }
    ]
  }
}

Feature flag needed in ~/.codex/config.toml or project config:

[features]
codex_hooks = true

WAV avoids adding Python audio dependencies in the first pass. The hook can rely on the desktop audio tools already present on Fedora. It has higher latency than streaming PCM, but the failure modes are easier to understand.
After the MVP works, a streaming mode can request response_format: "pcm" and
feed 24 kHz mono int16 chunks directly to an audio process or a Python audio
library.
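For a sense of the numbers involved in a later PCM mode: 24 kHz mono int16 audio is 24,000 samples × 2 bytes × 1 channel = 48,000 bytes per second. The sketch below computes clip duration from a byte count and wraps raw PCM in a WAV header using the standard `wave` module. The function names are hypothetical, and the format constants are the ones stated above.

```python
import io
import wave

SAMPLE_RATE = 24_000   # Hz
CHANNELS = 1           # mono
SAMPLE_WIDTH = 2       # bytes per sample (int16)


def pcm_duration_seconds(num_bytes: int) -> float:
    """Duration of a raw PCM buffer in the format above."""
    return num_bytes / (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH)


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Prepend a WAV header so header-expecting players can read the audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```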
The config file is plugin-local only:
<plugin-root>/tts-hook.toml
Defaults should be usable with no config file beyond the Kokoro API running on
http://localhost:8880.
Do not add a hotkey or runtime enable toggle for the first version. If the plugin hook is enabled, playback should occur.
- Hook failure should not block Codex unless explicitly configured later.
- Startup failures should warn only. They should not stop Codex.
- Network timeout should be short: about 2 seconds to connect, 20 seconds to generate.
- Playback failure should not fail the hook.
- The hook should be re-entrant because multiple hooks can run concurrently.
- Temporary files should be unique and cleaned after playback when possible.
- Logs should redact or truncate content because assistant messages may contain sensitive project text.
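Two of the rules above, unique temporary files and truncated log content, can be illustrated with the standard library. This is a sketch with hypothetical names (`write_temp_wav`, `redact_for_log`) and an arbitrary 80-character preview limit; the file prefix is also an assumption.

```python
import os
import tempfile


def write_temp_wav(audio: bytes) -> str:
    """Write audio to a uniquely named temp file and return its path."""
    # mkstemp creates the file atomically, so concurrent hooks never collide.
    fd, path = tempfile.mkstemp(prefix="codex-tts-", suffix=".wav")
    with os.fdopen(fd, "wb") as handle:
        handle.write(audio)
    return path


def redact_for_log(text: str, limit: int = 80) -> str:
    """Return a short single-line preview plus the original length."""
    flat = text.replace("\n", " ")
    ellipsis = "..." if len(flat) > limit else ""
    return f"{flat[:limit]}{ellipsis} [{len(text)} chars]"
```

The caller remains responsible for deleting the file after playback, for example with `os.remove(path)` once the player process exits.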
The simplest text source is the entire last_assistant_message. That may be too
verbose for long implementation summaries.
Potential later modes:
- `full`: speak the whole message.
- `summary`: speak the first paragraph plus verification status.
- `final_only`: speak only final responses, not intermediate updates, if the hook event provides enough context.
- `notify`: speak a fixed phrase like "Codex finished" and optionally the first sentence.
For the MVP, use full with no truncation. If long responses become annoying,
add a later speech policy after observing real use.
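A light cleanup pass for the `full` mode might look like the sketch below. It assumes that "stripping fences" means dropping only the fence marker lines while keeping the code between them, which is one reading of the rule, and the name `prepare_speech_text` is hypothetical. Nothing is truncated.

```python
import re

FENCE = "`" * 3  # markdown code fence marker


def prepare_speech_text(message):
    """Return lightly cleaned text to speak, or None if there is nothing to say."""
    if not isinstance(message, str):
        return None
    # Drop fence marker lines only; the fenced content itself is kept.
    lines = [line for line in message.splitlines()
             if not line.strip().startswith(FENCE)]
    text = "\n".join(lines)
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line
    text = text.strip()
    return text or None
```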
The Kokoro container is exposed on host port 8880, so the hook does not need
container networking or volume mounts. It only needs HTTP access to localhost.
Audio playback happens on the host. That avoids bridging PipeWire/PulseAudio into the container.
- Do not solve hook installation yet. Keep implementation under `codex/` and include setup guidance for Codex.
- Use a packageable Codex plugin shape with `.codex-plugin/plugin.json`.
- Startup failures warn only.
- Speak the full assistant message with no truncation.
- Playback is non-blocking. The hook spawns playback and returns immediately.
- If no voice is configured, default to `am_liam`.
- Do not add a hotkey or runtime toggle yet. If the hook is enabled, playback occurs.
- Assume hook command paths in `hooks/hooks.json` are relative to the plugin root.
- Use plugin-local config only: `<plugin-root>/tts-hook.toml`.
Multi-phase implementation plan derived from this design document. Phases are sequential; tasks within a phase are independent.
Goal: Establish the Codex plugin unit described in Proposed Files and Codex Plugin Shape, with valid metadata and hook declarations under codex/.
Acceptance Criteria:
- `codex/` is the plugin root and contains `.codex-plugin/plugin.json`, `hooks/hooks.json`, `config.example.toml`, and setup notes.
- Plugin metadata and hook JSON parse successfully with standard JSON tooling.
- Hook commands assume paths are relative to the plugin root, matching the decision in Decisions.
Depends on: None
Tasks:
- Validate Plugin Manifest
  - Description: Ensure `codex/.codex-plugin/plugin.json` contains the plugin name, version, description, interface metadata, and `hooks` pointer described in Codex Plugin Shape.
  - Acceptance Criteria: `python3 -m json.tool codex/.codex-plugin/plugin.json` succeeds, and the manifest points to `./hooks/hooks.json`.
- Validate Lifecycle Hook Config
  - Description: Ensure `codex/hooks/hooks.json` declares `SessionStart` with `startup|resume` and `Stop` with relative commands for `./scripts/codex_session_start_tts_check.py` and `./scripts/codex_stop_tts.py`, as shown in MVP hook config.
  - Acceptance Criteria: `python3 -m json.tool codex/hooks/hooks.json` succeeds, and both hook commands are plugin-root relative.
- Finalize Plugin-Local Config Example
  - Description: Keep `codex/config.example.toml` aligned with Config File and Configuration Surface: host, port, voice, speed, playback, timeouts, and logging only.
  - Acceptance Criteria: The example contains no configurable model, scheme, API paths, response format, stream flag, truncation setting, or environment-variable override.
- Document Plugin Setup Assumptions
  - Description: Update `codex/README.md` so it states that hook command paths are assumed plugin-root relative and `tts-hook.toml` is the only default config location.
  - Acceptance Criteria: The README describes the plugin shape, runtime assumptions, config location, and no longer recommends absolute hook command paths.
Goal: Implement reusable Python modules under codex/src/tts_hook/ for config loading, URL construction, hook I/O, logging, and Kokoro HTTP access. This phase supports both hooks without implementing hook-specific behavior yet.
Acceptance Criteria:
- Shared modules can be imported by scripts under `codex/scripts/` without installing a package.
- Config loading uses only plugin-local `tts-hook.toml` plus code defaults, as required by Config File.
- URL construction always bakes in `http`, `/health`, `/v1/audio/speech`, `/v1/audio/voices`, model `kokoro`, WAV response format, and non-streaming mode.
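One way to satisfy the first criterion, importing shared modules without installing a package, is to put `<plugin-root>/src` on `sys.path` from each script. The helper names below are hypothetical, and the layout assumption is that scripts live in `<plugin-root>/scripts/` as in the proposed file tree.

```python
import sys
from pathlib import Path


def plugin_src_dir(script_path: str) -> Path:
    """Return <plugin-root>/src for a script in <plugin-root>/scripts/."""
    return Path(script_path).parent.parent / "src"


def add_src_to_path(script_path: str) -> None:
    """Make the tts_hook package importable without installation."""
    src = str(plugin_src_dir(script_path))
    if src not in sys.path:
        sys.path.insert(0, src)
```

A hook script would call `add_src_to_path(__file__)` before `import tts_hook`. Resolving `__file__` to an absolute path first may be safer if Codex launches the script through a symlink or from a different working directory.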
Depends on: Phase 1
Tasks:
- Implement Config Loader
  - Description: Create `codex/src/tts_hook/config.py` with defaults matching `codex/config.example.toml`, TOML parsing via `tomllib`, plugin-root config resolution for `tts-hook.toml`, and fallback to code defaults when the file is absent.
  - Acceptance Criteria: Loading succeeds with no config file, with a partial plugin-local config, and with all supported keys; unsupported stable integration constants are not read from config.
- Implement URL and Payload Constants
  - Description: Add code constants and helpers for the Kokoro endpoints described in Kokoro API contract and Config File: health URL, speech URL, voices URL, model `kokoro`, response format `wav`, and `stream = false`.
  - Acceptance Criteria: Helpers produce `http://localhost:8880/health`, `http://localhost:8880/v1/audio/speech`, and `http://localhost:8880/v1/audio/voices` with default config.
- Implement Hook I/O Helpers
  - Description: Add helpers for reading hook JSON from stdin and writing valid hook JSON to stdout, following Codex hook contract and the MVP stdout shape.
  - Acceptance Criteria: Helpers parse valid JSON, tolerate empty or invalid stdin by returning a safe warning result, and never write diagnostics to stdout.
- Implement Logging Helper
  - Description: Add a logging utility that writes diagnostics to the configured log path from Config File and can also write concise details to stderr.
  - Acceptance Criteria: Logs are written without creating stdout noise, parent directories are created when needed, and logged assistant content is omitted or kept brief enough to satisfy Reliability Rules.
- Implement Kokoro HTTP Client
  - Description: Create `codex/src/tts_hook/kokoro.py` with functions for `GET /health`, `GET /v1/audio/voices`, and `POST /v1/audio/speech` using the shared config, URL helpers, and timeout settings.
  - Acceptance Criteria: Client functions expose clear success/error results, use configured connect/read timeouts, and build the speech request with full input text, configured voice, configured speed, model `kokoro`, WAV format, and non-streaming mode.
Goal: Implement SessionStart behavior from Startup Hook: warn when Kokoro is unavailable or the voice is invalid, but never block Codex.
Acceptance Criteria:
- `codex/scripts/codex_session_start_tts_check.py` reads hook JSON, loads plugin-local config, checks health, validates the configured or default voice, and returns valid JSON.
- Startup failures produce a warning-oriented `systemMessage` and still continue.
- The hook does not start containers or play audio.
Depends on: Phase 2
Tasks:
- Create Startup Script Entrypoint
  - Description: Add `codex/scripts/codex_session_start_tts_check.py` that imports shared modules from `codex/src/tts_hook`, reads stdin JSON, and always emits hook-compatible JSON.
  - Acceptance Criteria: Running the script with a minimal `SessionStart` fixture exits `0` and prints valid JSON to stdout.
- Add Health Check Behavior
  - Description: Call Kokoro `GET /health` using the configured host and port, as required by Startup Hook and Kokoro API contract.
  - Acceptance Criteria: When health is reachable and returns healthy, no warning is emitted; when unavailable, the hook emits a concise warning and continues.
- Add Voice Validation Behavior
  - Description: Use `GET /v1/audio/voices` to validate the configured voice when health succeeds, defaulting to `am_liam` when no voice is specified.
  - Acceptance Criteria: Valid voices pass silently; invalid voices produce a warning that names the configured voice and default behavior without blocking Codex.
- Add Startup Fixtures
  - Description: Add local JSON fixtures for startup, resume, unavailable API, and invalid voice scenarios.
  - Acceptance Criteria: Fixtures can be piped into the startup script for deterministic local testing without Codex.
Goal: Implement Stop behavior from Stop Hook: speak the full last_assistant_message through Kokoro, spawn host playback, and return immediately with valid hook JSON.
Acceptance Criteria:
- `codex/scripts/codex_stop_tts.py` extracts `last_assistant_message`, sends the full message to Kokoro, writes a unique temp WAV, spawns playback, and exits without waiting for playback when `blocking = false`.
- Empty messages and Kokoro/playback failures never break the Codex hook contract.
- No truncation, summarization, hotkey, or runtime enable toggle is implemented.
Depends on: Phase 2
Tasks:
- Create Stop Script Entrypoint
  - Description: Add `codex/scripts/codex_stop_tts.py` that imports shared modules, reads `Stop` JSON, and always emits `{"continue": true}` on stdout.
  - Acceptance Criteria: Running the script with an empty or minimal `Stop` fixture exits `0`, writes valid JSON to stdout, and writes diagnostics only to stderr/log.
- Extract Full Assistant Message
  - Description: Extract `last_assistant_message` exactly as the speech source, applying only light whitespace cleanup described in Stop Hook and no truncation.
  - Acceptance Criteria: Multi-paragraph messages are preserved, empty messages are skipped, and no `max_chars` policy exists in code or config.
- Generate WAV From Kokoro
  - Description: POST the full message to `/v1/audio/speech` with configured voice and speed, baked-in model `kokoro`, response format `wav`, and `stream = false`.
  - Acceptance Criteria: A successful Kokoro response is written to a unique temporary `.wav` file, and request failures are logged without failing the hook.
- Implement Non-Blocking Playback
  - Description: Add `codex/src/tts_hook/playback.py` to choose `pw-play`, `paplay`, `ffplay -nodisp -autoexit`, or `aplay` when `player = "auto"`, then spawn playback in the background by default.
  - Acceptance Criteria: The hook returns before playback completes when `blocking = false`; if no player exists, it logs a warning and still returns valid hook JSON.
- Add Stop Fixtures
  - Description: Add local JSON fixtures covering a normal assistant response, an empty message, a long multi-paragraph message, and malformed input.
  - Acceptance Criteria: Fixtures can be piped into the stop script to validate stdout, logging, Kokoro call behavior, and playback spawning.
Goal: Prove the hooks work against the running local Kokoro-FastAPI container and Fedora host audio tools before testing inside Codex.
Acceptance Criteria:
- Startup and stop scripts work from the plugin root with relative paths, matching the Codex Plugin Shape assumption.
- Kokoro health, voice validation, speech generation, and non-blocking playback all work locally.
- Failures are logged and surfaced as warnings without blocking.
Depends on: Phase 3 and Phase 4
Tasks:
- Run Static Validation
  - Description: Validate JSON files, Python syntax, and import paths for all plugin files.
  - Acceptance Criteria: JSON validation succeeds, Python files compile, and both scripts can import `tts_hook` modules when run from `codex/`.
- Verify Startup Hook Locally
  - Description: Pipe startup fixtures into `python3 ./scripts/codex_session_start_tts_check.py` from `codex/`.
  - Acceptance Criteria: Healthy Kokoro produces a continue result, unavailable Kokoro produces a warning continue result, and invalid voice produces a warning continue result.
- Verify Stop Hook Locally
  - Description: Pipe stop fixtures into `python3 ./scripts/codex_stop_tts.py` from `codex/` with the local Kokoro API running.
  - Acceptance Criteria: The hook returns immediately with valid JSON, generates audio for the full message, and starts host playback.
- Verify Failure Handling
  - Description: Test stopped Kokoro, invalid host/port, invalid voice, and missing playback command scenarios.
  - Acceptance Criteria: Each failure logs a useful diagnostic, returns valid hook JSON, and does not pollute the terminal's stdout.
Goal: Exercise the packageable plugin scaffold inside Codex with SessionStart and Stop lifecycle events, then document any adjustments needed for actual plugin behavior.
Acceptance Criteria:
- Codex can load the plugin metadata and hook configuration from `codex/`.
- `SessionStart` warnings and `Stop` playback behavior work in a real Codex session.
- If plugin-root relative hook command paths are wrong, the command wrapper is adjusted and documented.
Depends on: Phase 5
Tasks:
- Enable Plugin for Local Trial
  - Description: Follow Codex plugin setup guidance for a local plugin unit using `codex/.codex-plugin/plugin.json` and `codex/hooks/hooks.json`; do not broaden the design to solve long-term installation mechanics.
  - Acceptance Criteria: Codex attempts to load the plugin and recognizes its hook configuration.
- Test SessionStart in Codex
  - Description: Start or resume a Codex session with Kokoro running and then with Kokoro unavailable.
  - Acceptance Criteria: Healthy startup is quiet; unavailable startup produces a warning and continues.
- Test Stop Playback in Codex
  - Description: Complete a Codex turn and verify the final assistant response is sent to Kokoro and played through host audio.
  - Acceptance Criteria: Playback starts after `Stop`, Codex is not blocked by full audio playback, and stdout contract errors do not occur.
- Resolve Relative Path Assumption if Needed
  - Description: If Codex does not execute hook commands relative to the plugin root, add the smallest wrapper or command adjustment that preserves the plugin shape.
  - Acceptance Criteria: The plugin works in Codex without requiring absolute user-specific paths in `hooks/hooks.json`.
- Update Setup Guide
  - Description: Update `codex/README.md` with verified setup steps, assumptions, troubleshooting, and expected Kokoro startup state.
  - Acceptance Criteria: A reader can reproduce the working local Codex trial using the documented steps and plugin-local config file.