Fix relay worker final-summary capture for Codex PTY runs

## Problem

OpenKaren relay execution is much more stable now, but the final Telegram-facing summary is still unreliable for Codex PTY workers.

The remaining failure mode is no longer primarily broker/session reuse. The narrow issue is **final-answer capture**.

In live relay runs, worker roles often finish with `waitStatus: "idle"` or `waitStatus: "exited"`, but OpenKaren still cannot consistently recover a clean final answer. The resulting Telegram summary often degrades into one of these:

- raw Codex PTY chrome
- `verifier finished without relay output.`
- noisy partial fragments

## What is already fixed

This issue is **not** about redoing the earlier relay reuse work from scratch. The following work has already materially improved the runtime:

- stale in-memory `RelaySession` reuse was removed
- new turns rebuild session state from persisted broker state
- repeated-turn hangs became much less frequent
- relay worker model normalization now avoids unsupported ChatGPT-account models for Codex workers by falling back to `gpt-5.4`
- focused validation has been passing:
  - `npm run test:agent-runner`
  - `npm run typecheck`

## Current diagnosis

The remaining problem is that Codex PTY workers do not provide OpenKaren a dependable structured final-answer surface.

Right now OpenKaren tries to infer the final answer from:

- PTY worker log tails
- `worker_stream` output chunks
- `idle` / `exited` worker lifecycle states

That inference is still unreliable, because none of these surfaces marks where the final answer begins or ends.

### Concrete live symptoms observed

- workers can finish as `idle` or `exited`
- verifier sometimes leaves no clean natural-language summary block
- PTY logs are dominated by terminal chrome, MCP boot lines, watchdog noise, and other Codex UI fragments
- useful text can exist, but is often buried in noisy PTY output

### Concrete evidence already seen in live runs

- unsupported relay worker model path previously showed up as:
  - `The 'opencode/gpt-5-nano' model is not supported when using Codex with a ChatGPT account.`
- after relay model normalization, live workers began using `gpt-5.4`, which improved execution viability but **did not fully solve final summary extraction**
- some live artifacts still ended with:
  - `verifier finished without relay output.`
- some other artifacts captured noisy but clearly useful text inside a larger PTY blob, for example summary-like content such as:
  - `normalize unsupported Codex account models to gpt-5.4, and filter Codex UI noise so downstream roles keep real worker output. Fresh proof: npm run test:agent-runner passed ...`

## Partial mitigation already added

Verifier prompting has already been tightened to ask for an explicit final marker:

- `OPENKAREN_FINAL: <final reply>`

The extractor also now prefers that marker when present.

However, live behavior still suggests that some workers exit before leaving a dependable recoverable final line.

## What needs to happen for a real fix

### 1. Treat clean `exited` as a first-class completion candidate

Do not assume `exited` means unusable output.

If a worker exits cleanly and leaves a valid final summary marker or a valid final natural-language completion block, OpenKaren should treat that as a successful completion path.
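As a rough sketch of that rule, the completion check could treat `idle` and a clean `exited` the same way. Everything here is hypothetical: the `WorkerOutcome` shape, the `exitCode` field, the `"running"` status, and the reading of "clean" as exit code 0 are assumptions for illustration, not OpenKaren's actual types.

```typescript
// Hypothetical shapes for illustration; OpenKaren's real types may differ.
type WaitStatus = "idle" | "exited" | "running";

interface WorkerOutcome {
  waitStatus: WaitStatus;
  exitCode: number | null; // assumption: null while the process is still alive
  finalText: string | null; // recovered marker text or prose block, if any
}

type Completion =
  | { ok: true; summary: string }
  | { ok: false; reason: string };

function classifyCompletion(outcome: WorkerOutcome, role: string): Completion {
  const settled =
    outcome.waitStatus === "idle" ||
    // Assumption: a "clean" exit means exit code 0.
    (outcome.waitStatus === "exited" && outcome.exitCode === 0);

  const text = outcome.finalText === null ? "" : outcome.finalText.trim();
  if (settled && text !== "") {
    return { ok: true, summary: text };
  }
  // Degraded text is only used when nothing recoverable was found.
  return { ok: false, reason: `${role} finished without relay output.` };
}
```

With this shape, `exited` is rejected only when the exit was not clean or no final text was recovered, never just because the worker exited.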

### 2. Capture the final answer from a deterministic surface, not just noisy PTY logs

Preferred order should be something like:

1. explicit `OPENKAREN_FINAL:` marker
2. last meaningful natural-language block near completion/exit
3. fallback to shaped worker stream/log delta only if it looks like a real answer
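The preference order above might look roughly like the following. This is a minimal sketch, not the existing extractor: the function names, the `logTail` / `streamDelta` parameters, and the `looksLikeAnswer` heuristic (at least four words, mostly letters) are all assumptions chosen for illustration.

```typescript
const FINAL_MARKER = "OPENKAREN_FINAL:";

// Heuristic (assumption): real prose has several words and is mostly letters.
function looksLikeAnswer(text: string): boolean {
  const words = text.split(/\s+/).filter((w) => /[A-Za-z]{2,}/.test(w));
  const letters = text.replace(/[^A-Za-z\s]/g, "").length;
  return words.length >= 4 && letters / text.length >= 0.7;
}

// 1. Explicit marker: scan from the end so the last marker wins.
function findMarker(text: string): string | null {
  const lines = text.split(/\r?\n/).reverse();
  for (const line of lines) {
    const at = line.indexOf(FINAL_MARKER);
    if (at === -1) continue;
    const value = line.slice(at + FINAL_MARKER.length).trim();
    // Skip the echoed prompt template itself.
    if (value !== "" && value !== "<final reply>") return value;
  }
  return null;
}

// 2. Last blank-line-separated block that reads like natural language.
function lastProseBlock(text: string): string | null {
  const blocks = text.split(/\r?\n\s*\r?\n/).map((b) => b.trim()).reverse();
  for (const block of blocks) {
    if (block !== "" && looksLikeAnswer(block)) return block.replace(/\s+/g, " ");
  }
  return null;
}

// Returns null when nothing recoverable exists; the caller then degrades.
function captureFinalAnswer(logTail: string, streamDelta: string): string | null {
  const marker = findMarker(logTail) ?? findMarker(streamDelta);
  if (marker !== null) return marker;
  const prose = lastProseBlock(logTail);
  if (prose !== null) return prose;
  // 3. Shaped stream delta, only if it looks like a real answer.
  const shaped = streamDelta.trim();
  return shaped !== "" && looksLikeAnswer(shaped) ? shaped : null;
}
```

Returning `null` instead of a degraded string keeps the decision about `verifier finished without relay output.` in one place, at the call site.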

### 3. Tighten prompt contracts for orchestrated roles, especially verifier

The verifier should be required to emit exactly one final-answer marker at the end.

If needed, planner/reviewer/implementer can also be asked to avoid UI-heavy chatter and keep handoff blocks minimal.
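One way to make the "exactly one marker at the end" contract checkable is a small validator run over the verifier's reply. The sketch below is an assumption about how such a check could look; `checkVerifierContract` and `ContractCheck` are hypothetical names, not existing OpenKaren code.

```typescript
// A marker line must start with the marker and carry some text after it.
const FINAL_MARKER_LINE = /^OPENKAREN_FINAL:\s*\S/;

interface ContractCheck {
  ok: boolean;
  problem?: string;
}

function checkVerifierContract(reply: string): ContractCheck {
  const lines = reply
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "");
  const markers = lines.filter((line) => FINAL_MARKER_LINE.test(line));

  if (markers.length !== 1) {
    return { ok: false, problem: `expected exactly one marker, found ${markers.length}` };
  }
  // The single marker must also be the last non-empty line of the reply.
  if (lines[lines.length - 1] !== markers[0]) {
    return { ok: false, problem: "marker is not the last non-empty line" };
  }
  return { ok: true };
}
```

A failed check could be logged or fed back into a retry prompt rather than silently degrading the summary.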

### 4. Avoid treating Codex UI noise as useful completion text

The extractor should continue excluding things like:

- `Starting MCP servers`
- `Booting MCP server`
- terminal chrome
- status spinners
- watchdog idle logs
- `/model to change`
- `https://chatgpt.com/codex...`
- handoff prompt scaffolding that was echoed back
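A line-level filter for the exclusions listed above could be sketched as follows. The pattern list mirrors the items in this issue, but the exact regular expressions are assumptions: in particular the watchdog log format is unknown, so `/\bwatchdog\b/i` is a guess, and echoed handoff scaffolding is not covered because its shape is not specified here.

```typescript
// ANSI CSI escape sequences (colors, cursor movement).
const ANSI_ESCAPE = /\u001b\[[0-9;?]*[A-Za-z]/g;

// Lines made only of box-drawing, braille spinner frames, or ASCII spinner glyphs.
const CHROME_ONLY = /^[\u2500-\u257f\u2800-\u28ff|\/\\\-\s.]+$/;

const NOISE_PATTERNS: RegExp[] = [
  /Starting MCP servers/i,
  /Booting MCP server/i,
  /\/model to change/i,
  /https:\/\/chatgpt\.com\/codex/i,
  /\bwatchdog\b/i, // assumption: the real watchdog log format may differ
];

function isNoiseLine(line: string): boolean {
  const text = line.trim();
  if (text === "") return false; // keep blank lines, they separate blocks
  if (CHROME_ONLY.test(text)) return true;
  return NOISE_PATTERNS.some((pattern) => pattern.test(text));
}

function stripNoise(raw: string): string {
  return raw
    .replace(ANSI_ESCAPE, "")
    .split(/\r\n|\r|\n/) // bare \r is common in spinner redraws
    .filter((line) => !isNoiseLine(line))
    .join("\n")
    .trim();
}
```

Dropping whole lines keeps the filter simple, at the cost of losing a real sentence that happens to mention one of these strings. That trade-off would need checking against live artifacts.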

### 5. Add tests for the real end-state, not just plumbing

We need focused tests that prove:

- `idle` + explicit `OPENKAREN_FINAL:` returns the final summary
- `exited` + explicit `OPENKAREN_FINAL:` also returns the final summary
- noisy PTY blobs with one buried useful summary extract the useful summary
- noisy PTY blobs with no useful summary fall back to the current safe degraded text
- handoff prompts / echoed context are not mistaken for final answers
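These end-state cases could be expressed as a fixture table that any candidate extractor is run against. The sketch below is illustrative only: the `Extractor` signature, the fixture strings, and the `failingCases` helper are assumptions, and a real suite would plug into whatever runner backs `npm run test:agent-runner`.

```typescript
// Hypothetical extractor signature: status plus raw PTY output in, reply text out.
type Extractor = (waitStatus: "idle" | "exited", ptyOutput: string) => string;

const DEGRADED = "verifier finished without relay output.";

interface EndStateCase {
  name: string;
  waitStatus: "idle" | "exited";
  ptyOutput: string;
  expected: string;
}

// Fixture text is invented for illustration, not copied from live artifacts.
const CASES: EndStateCase[] = [
  {
    name: "idle + marker",
    waitStatus: "idle",
    ptyOutput: "Booting MCP server\nOPENKAREN_FINAL: Relay fix verified.",
    expected: "Relay fix verified.",
  },
  {
    name: "exited + marker",
    waitStatus: "exited",
    ptyOutput: "Starting MCP servers\nOPENKAREN_FINAL: Relay fix verified.",
    expected: "Relay fix verified.",
  },
  {
    name: "buried summary",
    waitStatus: "exited",
    ptyOutput: "\u280b Working\n\nFresh proof: npm run test:agent-runner passed.\n\n/model to change",
    expected: "Fresh proof: npm run test:agent-runner passed.",
  },
  {
    name: "noise only",
    waitStatus: "exited",
    ptyOutput: "Starting MCP servers\n/model to change",
    expected: DEGRADED,
  },
  {
    name: "echoed handoff prompt",
    waitStatus: "idle",
    ptyOutput: "End your reply with OPENKAREN_FINAL: <final reply>",
    expected: DEGRADED,
  },
];

// Returns the names of the cases the extractor gets wrong, in table order.
function failingCases(extract: Extractor): string[] {
  return CASES.filter((c) => extract(c.waitStatus, c.ptyOutput) !== c.expected).map((c) => c.name);
}
```

An extractor that only greps for the marker would fail the "buried summary" and "echoed handoff prompt" cases, which is exactly the gap these tests are meant to expose.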

## Suggested acceptance criteria

- [ ] A live relay run with Codex workers returns a clean Telegram-ready summary instead of PTY chrome
- [ ] A live verifier that exits after emitting `OPENKAREN_FINAL:` is treated as successful and the marker text is returned
- [ ] `verifier finished without relay output.` is only used when there is truly no recoverable final answer
- [ ] PTY UI noise is not surfaced as the final Telegram reply
- [ ] Focused tests cover `idle` and `exited` completion with explicit summary markers and noisy-log fallbacks

## Nice-to-have

If the SDK exposes a better structured final-message/event surface in the future, OpenKaren should switch to that instead of relying so heavily on PTY logs.

