Problem
OpenKaren relay execution is much more stable now, but the final Telegram-facing summary is still unreliable for Codex PTY workers.
The remaining failure mode is no longer primarily broker/session reuse. The narrow issue is final-answer capture.
In live relay runs, worker roles often finish with waitStatus: "idle" or waitStatus: "exited", but OpenKaren still cannot consistently recover a clean final answer. The resulting Telegram summary often degrades into one of these:
- raw Codex PTY chrome
- verifier finished without relay output.
- noisy partial fragments
What is already fixed
This issue is not about redoing the earlier relay reuse work from scratch. The following work has already materially improved the runtime:
- stale in-memory RelaySession reuse was removed
- new turns rebuild session state from persisted broker state
- repeated-turn hangs became much less frequent
- relay worker model normalization now avoids unsupported ChatGPT-account models for Codex workers by falling back to gpt-5.4
- focused validation has been passing:
  - npm run test:agent-runner
  - npm run typecheck
Current diagnosis
The remaining problem is that Codex PTY workers do not provide OpenKaren a dependable structured final-answer surface.
Right now OpenKaren tries to infer the final answer from:
- PTY worker log tails
- worker_stream output chunks
- idle / exited worker lifecycle states
That inference is still weak.
Concrete live symptoms observed
- workers can finish as idle or exited
- verifier sometimes leaves no clean natural-language summary block
- PTY logs are dominated by terminal chrome, MCP boot lines, watchdog noise, and other Codex UI fragments
- useful text can exist, but is often buried in noisy PTY output
Concrete evidence already seen in live runs
- unsupported relay worker model path previously showed up as:
  The 'opencode/gpt-5-nano' model is not supported when using Codex with a ChatGPT account.
- after relay model normalization, live workers began using gpt-5.4, which improved execution viability but did not fully solve final summary extraction
- some live artifacts still ended with:
  verifier finished without relay output.
- some other artifacts captured noisy but clearly useful text inside a larger PTY blob, for example summary-like content such as:
  normalize unsupported Codex account models to gpt-5.4, and filter Codex UI noise so downstream roles keep real worker output. Fresh proof: npm run test:agent-runner passed ...
Partial mitigation already added
Verifier prompting has already been tightened to ask for an explicit final marker:
OPENKAREN_FINAL: <final reply>
The extractor also now prefers that marker when present.
However, live behavior still suggests that some workers exit before leaving a dependable recoverable final line.
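For illustration, marker-first extraction could look roughly like the sketch below. The function name extractFinalMarker and the choice to skip the echoed "<final reply>" placeholder are assumptions, not OpenKaren's actual extractor; only the marker string itself comes from this issue.

```typescript
// Hypothetical helper: only the marker text is taken from the issue.
const FINAL_MARKER = "OPENKAREN_FINAL:";

function extractFinalMarker(output: string): string | null {
  // Walk the lines from the end so the most recent marker wins over earlier echoes.
  const lines = output.split(/\r?\n/).reverse();
  for (const line of lines) {
    const idx = line.indexOf(FINAL_MARKER);
    if (idx === -1) continue;
    const text = line.slice(idx + FINAL_MARKER.length).trim();
    // Skip empty markers and the prompt placeholder echoed back by the PTY.
    if (text === "" || text === "<final reply>") continue;
    return text;
  }
  return null;
}
```

Scanning from the end is a deliberate choice here: the prompt that asks for the marker is often echoed early in the PTY log, so the last occurrence is more likely to be the worker's real answer.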
What needs to happen for a real fix
1. Treat clean exited as a first-class completion candidate
Do not assume exited means unusable output.
If a worker exits cleanly and leaves a valid final summary marker or a valid final natural-language completion block, OpenKaren should treat that as a successful completion path.
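One way to express this rule is sketched below. The WorkerResult shape, the exitCode field, the "timeout" status, and the idea that a clean exit means exit code 0 are all assumptions made for illustration; only the waitStatus values "idle" and "exited" appear in the live runs described here.

```typescript
// Hypothetical shapes: only "idle" and "exited" come from the issue.
type WaitStatus = "idle" | "exited" | "timeout"; // "timeout" is an assumed extra state

interface WorkerResult {
  waitStatus: WaitStatus;
  exitCode: number | null; // assumed: null while the process is still alive
  finalText: string | null; // assumed: result of an earlier extraction step
}

type Completion =
  | { ok: true; summary: string }
  | { ok: false; reason: string };

function classifyCompletion(result: WorkerResult): Completion {
  // Assumption: "clean" means the process exited with code 0.
  const cleanExit = result.waitStatus === "exited" && result.exitCode === 0;
  const settled = result.waitStatus === "idle" || cleanExit;
  if (!settled) {
    return { ok: false, reason: `worker not settled (${result.waitStatus})` };
  }
  if (result.finalText === null || result.finalText.trim() === "") {
    return { ok: false, reason: "no recoverable final answer" };
  }
  // idle and clean exited are treated identically once a final answer exists.
  return { ok: true, summary: result.finalText.trim() };
}
```

The point of the sketch is that exited takes the same success path as idle; the status alone never disqualifies output that is otherwise valid.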
2. Capture the final answer from a deterministic surface, not just noisy PTY logs
Preferred order should be something like:
- explicit OPENKAREN_FINAL: marker
- last meaningful natural-language block near completion/exit
- fallback to shaped worker stream/log delta only if it looks like a real answer
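The ordering above can be sketched as a chain of extractors where the first non-null result wins. Everything here is hypothetical: the Extractor type, the function names, and especially the "meaningful block" heuristic (at least five words and sentence-ending punctuation) are assumptions. The third tier, the shaped stream/log delta fallback, is left out because its shape is not described in this issue.

```typescript
// Hypothetical strategy chain; names and heuristics are illustrative only.
type Extractor = (output: string) => string | null;

// Tier 1: explicit marker. The last line that carries it wins.
const fromMarker: Extractor = (output) => {
  const marker = "OPENKAREN_FINAL:";
  const hits = output.split(/\r?\n/).filter((line) => line.includes(marker));
  const last = hits.pop();
  if (last === undefined) return null;
  const text = last.slice(last.indexOf(marker) + marker.length).trim();
  return text === "" ? null : text;
};

// Tier 2: last blank-line-separated block that reads like prose.
// Assumed heuristic: five or more words and a sentence-ending character.
const fromLastBlock: Extractor = (output) => {
  const blocks = output.split(/\r?\n\s*\r?\n/).map((b) => b.trim());
  for (const block of blocks.reverse()) {
    const words = block.split(/\s+/).filter((w) => /[A-Za-z]/.test(w));
    if (words.length >= 5 && /[.!?]$/.test(block)) return block;
  }
  return null;
};

// Ordered from most to least deterministic.
const chain: Extractor[] = [fromMarker, fromLastBlock];

function extractFinalAnswer(output: string): string | null {
  for (const extract of chain) {
    const answer = extract(output);
    if (answer !== null) return answer;
  }
  return null;
}
```

Keeping each tier as a separate function makes the precedence explicit and lets each tier be tested on its own.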
3. Tighten prompt contracts for orchestrated roles, especially verifier
The verifier should be required to emit exactly one final-answer marker at the end.
If needed, planner/reviewer/implementer can also be asked to avoid UI-heavy chatter and keep handoff blocks minimal.
4. Avoid treating Codex UI noise as useful completion text
The extractor should continue excluding things like:
- Starting MCP servers
- Booting MCP server
- terminal chrome
- status spinners
- watchdog idle logs
- /model to change
- https://chatgpt.com/codex...
- handoff prompt scaffolding that was echoed back
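A line-level filter for the exclusions listed above might look like the following sketch. The regular expressions are assumptions about what the noise looks like in a PTY log (for example, that watchdog lines contain the word "watchdog"), and echoed handoff scaffolding is not handled because its format is not specified here.

```typescript
// Patterns mirror the exclusions listed above; the exact regexes are assumptions.
const NOISE_PATTERNS: RegExp[] = [
  /Starting MCP servers/i,
  /Booting MCP server/i,
  /\/model to change/i,
  /https:\/\/chatgpt\.com\/codex/i,
  /watchdog/i, // assumed keyword; would also drop real text that mentions it
  /^[\s\u2800-\u28FF|\/\\-]+$/, // spinner-only lines (braille or ASCII frames)
  /^[\s\u2500-\u257F]+$/, // box-drawing terminal chrome
];

// Strip ANSI CSI escape sequences such as colors and cursor movement.
function stripAnsi(text: string): string {
  return text.replace(/\u001b\[[0-9;?]*[ -\/]*[@-~]/g, "");
}

// Returns the trimmed, non-empty lines that match none of the noise patterns.
function dropNoise(raw: string): string[] {
  return stripAnsi(raw)
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !NOISE_PATTERNS.some((p) => p.test(line)));
}
```

Running the filter before any final-answer heuristic keeps the later steps from having to reason about terminal chrome at all.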
5. Add tests for the real end-state, not just plumbing
We need focused tests that prove:
- idle + explicit OPENKAREN_FINAL: returns the final summary
- exited + explicit OPENKAREN_FINAL: also returns the final summary
- noisy PTY blobs with one buried useful summary extract the useful summary
- noisy PTY blobs with no useful summary fall back to the current safe degraded text
- handoff prompts / echoed context are not mistaken for final answers
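The cases above could be captured as a small table, sketched below under stated assumptions. The Summarize signature and the case inputs are hypothetical, and the function under test is injected so that no real OpenKaren API is assumed; the degraded text is the one quoted in this issue.

```typescript
// Hypothetical case table; the summarizer under test is passed in by the caller.
type EndStatus = "idle" | "exited";
type Summarize = (waitStatus: EndStatus, output: string) => string;

interface FinalAnswerCase {
  name: string;
  waitStatus: EndStatus;
  output: string;
  expected: string;
}

const DEGRADED = "verifier finished without relay output.";
const BURIED = "All focused tests passed and typecheck is clean.";

const CASES: FinalAnswerCase[] = [
  { name: "idle + marker", waitStatus: "idle",
    output: "Booting MCP server\nOPENKAREN_FINAL: checks passed", expected: "checks passed" },
  { name: "exited + marker", waitStatus: "exited",
    output: "OPENKAREN_FINAL: checks passed\n", expected: "checks passed" },
  { name: "buried summary", waitStatus: "exited",
    output: `Starting MCP servers\n${BURIED}\n/model to change`, expected: BURIED },
  { name: "noise only", waitStatus: "exited",
    output: "Starting MCP servers\n/model to change", expected: DEGRADED },
  { name: "echoed prompt", waitStatus: "idle",
    output: "End with OPENKAREN_FINAL: <final reply>", expected: DEGRADED },
];

// Returns the names of failing cases, so any test runner can assert on an empty list.
function failingCases(summarize: Summarize): string[] {
  return CASES
    .filter((c) => summarize(c.waitStatus, c.output) !== c.expected)
    .map((c) => c.name);
}
```

A test in the existing suite would then call failingCases with the real summarizer and expect an empty array, which keeps the end-state expectations in one place.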
Suggested acceptance criteria
- a worker that finishes with an explicit OPENKAREN_FINAL: marker is treated as successful and the marker text is returned
- the fallback text "verifier finished without relay output." is only used when there is truly no recoverable final answer
- tests cover idle and exited completion with explicit summary markers and noisy-log fallbacks
Nice-to-have
If the SDK exposes a better structured final-message/event surface in the future, OpenKaren should switch to that instead of relying so heavily on PTY logs.