
Fix relay worker final-summary capture for Codex PTY runs #1

@miyaontherelay

Problem

OpenKaren relay execution is much more stable now, but the final Telegram-facing summary is still unreliable for Codex PTY workers.

The remaining failure mode is no longer primarily about broker/session reuse. It is narrowly about final-answer capture.

In live relay runs, worker roles often finish with waitStatus: "idle" or waitStatus: "exited", but OpenKaren still cannot consistently recover a clean final answer. The resulting Telegram summary often degrades into one of these:

  • raw Codex PTY chrome
  • the fallback text "verifier finished without relay output."
  • noisy partial fragments

What is already fixed

This issue is not about redoing the earlier relay reuse work from scratch. The following work has already materially improved the runtime:

  • stale in-memory RelaySession reuse was removed
  • new turns rebuild session state from persisted broker state
  • repeated-turn hangs became much less frequent
  • relay worker model normalization now avoids unsupported ChatGPT-account models for Codex workers by falling back to gpt-5.4
  • focused validation has been passing:
    • npm run test:agent-runner
    • npm run typecheck

Current diagnosis

The remaining problem is that Codex PTY workers do not provide OpenKaren a dependable structured final-answer surface.

Right now OpenKaren tries to infer the final answer from:

  • PTY worker log tails
  • worker_stream output chunks
  • idle / exited worker lifecycle states

That inference is still weak, because none of these sources marks where the final answer begins or ends.

Concrete live symptoms observed

  • workers can finish as idle or exited
  • verifier sometimes leaves no clean natural-language summary block
  • PTY logs are dominated by terminal chrome, MCP boot lines, watchdog noise, and other Codex UI fragments
  • useful text can exist, but is often buried in noisy PTY output

Concrete evidence already seen in live runs

  • unsupported relay worker model path previously showed up as:
    • The 'opencode/gpt-5-nano' model is not supported when using Codex with a ChatGPT account.
  • after relay model normalization, live workers began using gpt-5.4, which improved execution viability but did not fully solve final summary extraction
  • some live artifacts still ended with:
    • "verifier finished without relay output."
  • other artifacts contained clearly useful text buried inside a larger, noisy PTY blob, for example this summary-like content:
    • normalize unsupported Codex account models to gpt-5.4, and filter Codex UI noise so downstream roles keep real worker output. Fresh proof: npm run test:agent-runner passed ...

Partial mitigation already added

Verifier prompting has already been tightened to ask for an explicit final marker:

  • OPENKAREN_FINAL: <final reply>

The extractor also now prefers that marker when present.

However, live behavior still suggests that some workers exit before leaving a dependable, recoverable final line.
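A minimal sketch of marker-first extraction, assuming the raw PTY text is available as one string. The names `extractFinalMarker` and `stripAnsi`, and the ANSI pattern, are hypothetical and not OpenKaren's actual extractor. It scans from the end so the worker's last marker wins, and it skips the `<final reply>` placeholder in case the prompt was echoed back. It only captures the rest of the marker line; a multi-line final reply would need a different rule.

```typescript
// Hypothetical helper for illustration; OpenKaren's real extractor may differ.
const FINAL_MARKER = "OPENKAREN_FINAL:";

// Removes ANSI CSI sequences (colors, cursor moves) that PTY output is full of.
// OSC sequences and other escape types are not handled by this pattern.
function stripAnsi(text: string): string {
  return text.replace(/\x1b\[[0-9;?]*[ -\/]*[@-~]/g, "");
}

// Returns the text after the LAST usable OPENKAREN_FINAL: marker, or null.
// Scanning from the end means an echoed prompt that quotes the marker earlier
// in the log cannot shadow a real answer emitted later.
function extractFinalMarker(rawLog: string): string | null {
  const lines = stripAnsi(rawLog).split(/\r?\n/);
  for (let i = lines.length - 1; i >= 0; i--) {
    const idx = lines[i].indexOf(FINAL_MARKER);
    if (idx === -1) continue;
    const answer = lines[i].slice(idx + FINAL_MARKER.length).trim();
    // Skip empty markers and the template placeholder echoed from the prompt.
    if (answer === "" || answer === "<final reply>") continue;
    return answer;
  }
  return null;
}
```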

What needs to happen for a real fix

1. Treat a clean exit as a first-class completion candidate

Do not assume that an exited worker left unusable output.

If a worker exits cleanly and leaves a valid final summary marker or a valid final natural-language completion block, OpenKaren should treat that as a successful completion path.
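Item 1 can be sketched as a small classifier that accepts idle and clean exited runs on equal terms. The `WaitStatus` union, the `exitCode` field, and treating a missing exit code as clean are assumptions for illustration, not OpenKaren's real worker result shape.

```typescript
// Hypothetical types; OpenKaren's real worker result shape may differ.
type WaitStatus = "idle" | "exited" | "timeout" | "error";

interface WorkerEndState {
  waitStatus: WaitStatus;
  exitCode?: number;          // assumed present when the PTY process exited
  finalAnswer: string | null; // whatever the extractor recovered, if anything
}

type Completion = { ok: true; answer: string } | { ok: false; reason: string };

// A clean exit is classified exactly like idle: success iff an answer was recovered.
function classifyCompletion(state: WorkerEndState): Completion {
  // Assumption: a missing exit code is treated as a clean exit.
  if (state.waitStatus === "exited" && (state.exitCode ?? 0) !== 0) {
    return { ok: false, reason: `worker exited with code ${state.exitCode}` };
  }
  if (state.waitStatus !== "idle" && state.waitStatus !== "exited") {
    return { ok: false, reason: `worker ended with status ${state.waitStatus}` };
  }
  const answer = state.finalAnswer?.trim() ?? "";
  return answer !== ""
    ? { ok: true, answer }
    : { ok: false, reason: "no recoverable final answer" };
}
```

Whether a non-zero exit that still printed a marker should count as success is a policy question; this sketch answers it conservatively.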

2. Capture the final answer from a deterministic surface, not just noisy PTY logs

The preferred order should be roughly:

  1. explicit OPENKAREN_FINAL: marker
  2. last meaningful natural-language block near completion/exit
  3. fallback to shaped worker stream/log delta only if it looks like a real answer
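The fallback order above can be sketched as an ordered list of strategies where the first non-null result wins. It also reports which surface produced the answer, so logs can show why a given reply was chosen. `CapturedOutput`, its `streamDelta` field, and the `looksLikeAnswer` thresholds are hypothetical and would need tuning against real runs.

```typescript
// Hypothetical shape of what OpenKaren has captured when a worker stops.
interface CapturedOutput {
  ptyLog: string;      // PTY log tail, ideally after noise filtering
  streamDelta: string; // worker_stream text received since the handoff (assumed)
}

type Source = "marker" | "lastBlock" | "streamDelta";
const MARKER = "OPENKAREN_FINAL:";

// Illustrative heuristic: several words, mostly letters, ends like a sentence.
function looksLikeAnswer(text: string): boolean {
  const t = text.trim();
  const words = t.split(/\s+/).filter(Boolean);
  const letters = (t.match(/[A-Za-z]/g) ?? []).length;
  return words.length >= 4 && letters / t.length > 0.6 && /[.!?]$/.test(t);
}

// 1. Explicit marker: rest of the last line that contains it.
function fromMarker(o: CapturedOutput): string | null {
  const hits = o.ptyLog.split(/\r?\n/).filter((l) => l.includes(MARKER));
  const last = hits[hits.length - 1];
  return last ? last.slice(last.indexOf(MARKER) + MARKER.length).trim() || null : null;
}

// 2. Last blank-line-separated block that reads like prose.
function fromLastBlock(o: CapturedOutput): string | null {
  const blocks = o.ptyLog.split(/\n\s*\n/).map((b) => b.trim());
  for (let i = blocks.length - 1; i >= 0; i--) {
    if (looksLikeAnswer(blocks[i])) return blocks[i];
  }
  return null;
}

// 3. Stream delta, accepted only if it looks like a real answer.
function fromStreamDelta(o: CapturedOutput): string | null {
  return looksLikeAnswer(o.streamDelta) ? o.streamDelta.trim() : null;
}

const CHAIN: Array<[Source, (o: CapturedOutput) => string | null]> = [
  ["marker", fromMarker],
  ["lastBlock", fromLastBlock],
  ["streamDelta", fromStreamDelta],
];

// Returns the first answer in priority order, or null so the caller can degrade safely.
function extractWithFallback(o: CapturedOutput): { source: Source; answer: string } | null {
  for (const [source, run] of CHAIN) {
    const answer = run(o);
    if (answer !== null) return { source, answer };
  }
  return null;
}
```

Returning null rather than a default string keeps the "verifier finished without relay output." decision in one place, at the caller.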

3. Tighten prompt contracts for orchestrated roles, especially verifier

The verifier should be required to emit exactly one final-answer marker at the end.

If needed, the planner, reviewer, and implementer roles can also be asked to avoid UI-heavy chatter and keep handoff blocks minimal.
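A sketch of the verifier contract as a prompt suffix plus a checker for it. The prompt wording in `VERIFIER_CONTRACT` and the `checkVerifierContract` helper are hypothetical; the check assumes the echoed prompt has already been removed from the reply, otherwise the quoted template line would count as a marker.

```typescript
const MARKER = "OPENKAREN_FINAL:";

// Illustrative prompt suffix; the exact wording OpenKaren uses may differ.
const VERIFIER_CONTRACT = [
  "Finish with exactly one final line of the form:",
  `${MARKER} <final reply>`,
  `Do not write ${MARKER} anywhere else.`,
].join("\n");

type ContractResult = { ok: true; answer: string } | { ok: false; violation: string };

// Checks a verifier reply (echoed prompt already removed) against the contract:
// exactly one marker, on the last non-blank line, with non-empty text.
function checkVerifierContract(reply: string): ContractResult {
  const lines = reply.split(/\r?\n/).map((l) => l.trim()).filter((l) => l !== "");
  const markers = lines.filter((l) => l.startsWith(MARKER));
  if (markers.length === 0) return { ok: false, violation: "missing final marker" };
  if (markers.length > 1) {
    return { ok: false, violation: `expected 1 marker, found ${markers.length}` };
  }
  if (!lines[lines.length - 1].startsWith(MARKER)) {
    return { ok: false, violation: "marker is not on the last line" };
  }
  const answer = markers[0].slice(MARKER.length).trim();
  return answer !== "" ? { ok: true, answer } : { ok: false, violation: "empty final marker" };
}
```

One possible use of a violation is to re-prompt the verifier once before falling back; that retry policy is a suggestion, not existing behavior.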

4. Avoid treating Codex UI noise as useful completion text

The extractor should continue excluding things like:

  • Starting MCP servers
  • Booting MCP server
  • terminal chrome
  • status spinners
  • watchdog idle logs
  • /model to change
  • https://chatgpt.com/codex...
  • handoff prompt scaffolding that was echoed back
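The exclusions above can be sketched as a line filter. The regular expressions are illustrative guesses at each noise category (exact Codex UI strings vary between versions), and the spinner glyph set is an assumption. Echoed handoff scaffolding is handled by dropping lines that appear verbatim in the prompt that was sent.

```typescript
// Illustrative patterns for the noise categories above. Exact Codex UI strings
// vary between versions, so these are assumptions to tune against real logs.
const NOISE_PATTERNS: RegExp[] = [
  /^Starting MCP servers/i,
  /^Booting MCP server/i,
  /^[│╭╮╰╯─┃]+[\s│╭╮╰╯─┃]*$/, // lines made only of box-drawing chrome
  /^[⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏]\s/,        // braille spinner frames (assumed glyph set)
  /watchdog.*idle/i,
  /\/model to change/,
  /https:\/\/chatgpt\.com\/codex/,
];

// Drops noise lines and lines echoed verbatim from the handoff prompt.
// Blank lines are kept so paragraph boundaries survive for later block extraction.
function filterNoise(log: string, handoffPrompt = ""): string {
  const echoed = new Set(
    handoffPrompt.split(/\r?\n/).map((l) => l.trim()).filter((l) => l !== ""),
  );
  return log
    .split(/\r?\n/)
    .filter((line) => {
      const t = line.trim();
      if (t === "") return true;
      if (echoed.has(t)) return false;
      return !NOISE_PATTERNS.some((re) => re.test(t));
    })
    .join("\n");
}
```

Matching echoed lines verbatim is deliberately conservative: it only removes text the worker was given, so a summary that paraphrases the prompt still survives.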

5. Add tests for the real end-state, not just plumbing

We need focused tests that prove:

  • idle + explicit OPENKAREN_FINAL: returns the final summary
  • exited + explicit OPENKAREN_FINAL: also returns the final summary
  • noisy PTY blobs with one buried useful summary extract the useful summary
  • noisy PTY blobs with no useful summary fall back to the current safe degraded text
  • handoff prompts / echoed context are not mistaken for final answers
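These cases could be expressed as a table-driven check that any candidate extractor must pass. The `Case` shape, the sample logs, and `referenceExtract` are hypothetical stand-ins. What matters is that idle and exited share the same expectations, and that the echoed-prompt and noise-only cases expect the safe degraded text.

```typescript
const MARKER = "OPENKAREN_FINAL:";
const FALLBACK = "verifier finished without relay output.";
const PROMPT = "Verify the change.\nEnd with: OPENKAREN_FINAL: <final reply>";

interface Case {
  name: string;
  waitStatus: "idle" | "exited"; // expectations must not depend on this
  log: string;
  expected: string;
}

const CASES: Case[] = [
  { name: "idle + marker", waitStatus: "idle",
    log: "Booting MCP server\nOPENKAREN_FINAL: Tests pass.", expected: "Tests pass." },
  { name: "exited + marker", waitStatus: "exited",
    log: "OPENKAREN_FINAL: Tests pass.\n", expected: "Tests pass." },
  { name: "buried summary", waitStatus: "exited",
    log: "Starting MCP servers (1/2)\n╭────╮\nNormalized the model and filtered UI noise.\n› /model to change",
    expected: "Normalized the model and filtered UI noise." },
  { name: "noise only", waitStatus: "exited",
    log: "Starting MCP servers (1/2)\n╭────╮\n› /model to change", expected: FALLBACK },
  { name: "echoed prompt", waitStatus: "idle",
    log: `${PROMPT}\nBooting MCP server`, expected: FALLBACK },
];

// Deliberately tiny stand-in extractor: marker first, then the last prose-like line.
function referenceExtract(c: Case): string {
  const echoed = new Set(PROMPT.split("\n"));
  const lines = c.log.split("\n").map((l) => l.trim()).filter((l) => l !== "" && !echoed.has(l));
  const marker = lines.filter((l) => l.startsWith(MARKER)).pop();
  if (marker !== undefined) return marker.slice(MARKER.length).trim();
  const prose = lines.filter((l) => /^[A-Z].{15,}[.!?]$/.test(l) && !/MCP|watchdog/i.test(l)).pop();
  return prose ?? FALLBACK;
}

// Returns the names of failing cases; an empty array means every case passed.
function runCases(extract: (c: Case) => string): string[] {
  return CASES.filter((c) => extract(c) !== c.expected).map((c) => c.name);
}
```

`runCases(referenceExtract)` returns an empty array, while a naive "last line of the log" extractor fails every case, including both marker cases. That is the kind of regression these tests are meant to catch.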

Suggested acceptance criteria

  • A live relay run with Codex workers returns a clean Telegram-ready summary instead of PTY chrome
  • A live verifier that exits after emitting OPENKAREN_FINAL: is treated as successful and the marker text is returned
  • The fallback text "verifier finished without relay output." is only used when there is truly no recoverable final answer
  • PTY UI noise is not surfaced as the final Telegram reply
  • Focused tests cover idle and exited completion with explicit summary markers and noisy-log fallbacks

Nice-to-have

If the SDK exposes a better structured final-message/event surface in the future, OpenKaren should switch to that instead of relying so heavily on PTY logs.
