This file captures validation nuances that are useful for agents and maintainers. It is not a list of release blockers.
For feature-to-test ownership, see docs/testing/regression-map.md.
The public docs describe the intended observer/UI contract:
/observe allmoves one global observer target./observe offdisables passive monitoring.- foreign GUI/CLI runs render as
New run -> [User] -> [commentary] -> [Tool] -> [Output]. - completed runs send a new
[Final]card with Details, leave status on[commentary]while active and[Final]after completion, then delete transient live cards best-effort. - Telegram notifications are intentionally low-noise: only
New runwhen enabled,[Plan], and[Final]are audible. Tools fileandGet full logare explicit on-demand exports.
When changing this behavior, update the ADRs and run both unit tests and a live Telegram E2E path if a bot token is available.
codex app-server is the integration surface, but its thread/read payloads and live notifications can drift across Codex versions. Tests should cover snapshot normalization, plan prompts, tool rendering from thread/read, and fallback parsing.
Important areas to re-check after Codex upgrades:
agentMessage.phaseclassification for commentary versus final answers.status.activeFlagsvalues such aswaitingOnUserInputandwaitingOnInput.- stale lifecycle snapshots where a latest turn has
final_answerbut still reportsinProgress. turn/steererror text for stale turns, especiallyno active turn to steer.- tool call shape in live notifications and
thread/readsnapshots. - availability of
thread.raw_json.thread.pathfor full-log exports. - missing or null command/request/status fields that could render as literal
"<nil>".
thread/read status alone is not sufficient to decide whether Telegram input would start a parallel turn. The bridge normalizes a latest final_answer to completed state and clears the active turn before routing. If a steer attempt returns no active turn to steer, the bridge re-reads the thread and may start a new turn. Errors that imply an active or not-steerable turn still block fallback.
Regression tests for this area should cover:
- final-answer normalization from stale
inProgress. - no resurrection of SQLite
ActiveTurnIDafter a terminal snapshot. - fallback
turn/startafterno active turn to steer. - no fallback for active-but-not-steerable failures.
- no duplicate global-observer panel for a marked Telegram-origin turn.
Live Telegram-origin turns can expose two independent forms of App Server drift:
- duplicate live event loops after daemon startup or repair when session lifecycle is not serialized.
- transient
interruptedsnapshots, including final-bearing snapshots, that later recover toinProgressorcompletedfor the same turn.
Regression tests should cover:
- startup/reconcile/repair cannot create duplicate live subscriptions.
- stale old live loops cannot clear newer session state or trigger repair loops.
- Telegram-origin
interrupteddoes not compact or render terminal during the grace window, even when partial tool/output or final evidence is already present. - Telegram-origin active turns use App Server live
item/*events for authoritative current tool visibility;thread/readremains the reconciliation source for completed/foreign state. - explicit
/stopacceptsinterruptedimmediately. - recovered turns clear the defer marker and continue normal live panel rendering.
The product contract remains:
- explicit thread id in a command.
- reply-to routed message.
- armed one-shot steer or answer state.
- bound thread for the current chat/topic.
Route correctness must be tested through persisted message routes and callback tokens, not by parsing rendered Telegram headers.
The runtime depends on SQLite through modernc.org/sqlite. Broad go test ./... coverage expects go.sum to be committed and the local Go toolchain to be available.
Validation commands:
go test ./...
go build -buildvcs=false ./...
git diff --checkBot API send/edit success is not enough for user-facing changes. For Telegram UI, routing, callbacks, Plan Mode, Details, Markdown formatting, or observer behavior, verify the rendered result with a real Telegram readback when possible.
Literal "<nil>" in Telegram is treated as a rendering bug. It usually means App Server or rollout data omitted a field that Go code stringified with %v.
Validation expectations:
- unit tests cover nil-like map/slice extraction, command rendering, stale session-tail command suppression, summary Markdown rendering, App Server RPC id stringification, and snapshot string normalization.
- live validation must read edited Telegram messages, not only newly delivered messages.
- the checked-in public-safe harness
tests/live_e2e/telegram_readback_e2e.pyexercises sequentialpwd,date,printf, a dedicated sleep-20 timing run, and a multi-command math run against a dedicated private test thread configured only through local env. - when validating stale-command regressions manually, prefer one private test-thread turn that runs several safe shell commands sequentially as separate tool calls; watch Telegram
MessageEditedupdates and verify[commentary]shows whole-run elapsed time while Telegram-origin[Tool]shows liveCurrent tool:and then settles toLast completed tool:. - current command visibility is part of the live Telegram-origin contract only when it comes from App Server live events for the matching
thread_id + turn_id; foreign GUI/CLI observer panels remain completed-tool only. - while a run is active,
[commentary]must showRun active for: ...; after Final Card collapse,[Final]must showRun duration: .... - after a later turn completes, delayed live tool notifications from an earlier turn must not create a new observer panel or re-render stale tool/output content.
[Tool]may showCurrent tool:for a matching Telegram-origin live event. Otherwise it should showLast completed tool:when available, orNo completed tool yet..[Output]should showLast completed output:for the same completed tool when output is available.- Details view should be checked during E2E when the button is available. For multiple completed runs in one thread, opening
Detailsand pressingBackon an older Final Card must keep that older card bound to its original turn; stale or mismatched Details callbacks must not render the latest run. - Tool-only turns with no commentary and no output must still show completed command/status in Details under
Tool activity. - do not commit Telegram sessions, target thread ids, raw message ids, logs, env files, or screenshots.
Latest local validation note:
- 2026-05-07 PDT macOS v0.4.0 service installer validation built the CLI and tray app, built a local darwin/arm64
.pkgdry-run, verified package contents, and confirmed admin installation is blocked in this non-interactive session without sudo credentials. A non-interactivectr-go service installtemp-home smoke wroteconfig.envwith0600, generated a LaunchAgent with onlyCTR_GO_CONFIG, and did not leak the dummy token. The real service setup path then ran the friendly wizard over the existing private config, preserved proxy runtime env inconfig.env, installed the user LaunchAgent, and started the daemon.launchctlshowed the plist environment limited toCTR_GO_CONFIG; Telegram/statusreadback showed the daemon ready with live and poll App Server sessions connected. Livenewchat_folderandtool_only_sleep_detailsE2E passed through the service-managed daemon, including real Chat folder creation,/newthreadno-cwd behavior, liveCurrent tool:duringsleep 10, DetailsTool activity, and Tools file export. Service lifecycle smoke covered start/stop/start plus login enable/disable, and stderr token URL redaction was verified after an early Telegram startup timeout exposed the risk. - 2026-05-07 PDT macOS v0.3.0 distribution validation built local release archives for macOS, Linux, and Windows through
scripts/build-release.sh, then extracted the darwin/arm64 archive and ranctr-go initplusctr-go statusagainst a temporary config/home path. The binary reportedv0.3.0, wrote the config with0600permissions, read config-file values without token env vars, and did not print the dummy Telegram token in init/status output. Targeted config/CLI tests, full repository tests, andgo build -buildvcs=false ./...passed before final release checks. - 2026-05-06 PDT macOS Plan Mode reset E2E rebuilt and restarted the LaunchAgent daemon, then ran the checked-in
plan_mode_resetlive case through MTProto readback. The case created a disposable thread, started Plan Mode, observed a Plan-like Final Card withTurn off Plan, clicked it, and verified the next normalsleep 5request executed withCurrent tool:before reaching[Final]. It repeated the reset path with/stop <thread>while the thread was idle;/stopreturned an idle response and the following normalsleep 5request also executed. No private thread/message ids, raw logs, or local env values are recorded here. - 2026-05-06 macOS v0.2.5 release validation rebuilt and restarted the LaunchAgent daemon after adding the notification contract, tool-only Details preservation, and explicit Default Mode escape hatch. Targeted App Server/daemon/Telegram tests, full repository tests, build, diff check, Python harness compile, and targeted secret/local scan passed before tagging. The release uses the prior checked-in
tool_only_sleep_detailsandnotification_contractlive Telegram readback evidence for the user-visible message lifecycle. - 2026-05-05 PDT macOS tool-only sleep Details E2E rebuilt and restarted the LaunchAgent daemon, then ran the checked-in
tool_only_sleep_detailslive case through MTProto readback against a private execution-mode test thread.[Tool]showedCurrent tool:for a singlesleep 10command before completion, thenLast completed tool:before Final cleanup. Opening Details on the new Final Card showedTool activitywith the completed command/status,Tool onkept the same tool visible, andTools fileexported the command despite empty output. A follow-upnotification_contractsmoke passed on the same execution-mode contour. - 2026-05-05 PDT macOS notification-contract E2E rebuilt and restarted the LaunchAgent daemon, then ran the checked-in
notification_contractlive case through MTProto readback. The case observedNew run, live[commentary]/[Tool]/[Output], a new[Final]card with a different message id, deletion of the old live commentary card, and Details/Back editing the new Final card. The Plan step reached a routeable[Plan]card; the live App Server shape did not expose structured choice buttons, so the harness recorded the documented fallback and stopped the waiting Plan turn. - 2026-05-05 PDT macOS newchat folder E2E rebuilt and restarted the LaunchAgent daemon, then ran the checked-in
newchat_folderlive case through MTProto readback./newchatcreated a dated Chat folder under the configured Chats root, reached[Final], and/contextwas bound to that generated cwd./newthreadreached[Final]without creating a Chat folder under the configured Chats root; App Server reported the daemon default cwd for that thread. No literal"<nil>", visible interrupted state, or false parallel-turn rejection appeared. - 2026-05-05 PDT macOS command-menu smoke rebuilt and restarted the LaunchAgent daemon, then read Telegram Bot API
getMyCommandsfor the private operator bot without printing the token. The registered command list includednewchatalongside the existing menu commands. - 2026-05-05 PDT macOS Details binding regression E2E rebuilt and restarted the LaunchAgent daemon, then used MTProto readback against a private test thread. The checked-in
details_bindinglive case created two completed Telegram-origin/replyruns in the same Codex thread. OpeningDetailson the older Final Card showed only the older run commentary,Tool onshowed only the older tool/output,Tools filedownloaded older-run details,Backrestored the older Final Card in the same message, and the newer Final Card remained unchanged. No latest-run duplication, stale Details fallback, or literal"<nil>"appeared. - 2026-05-04 PDT macOS Projects/Chats label live readback rebuilt and restarted the LaunchAgent daemon, then used MTProto readback against the private operator bot.
/projectsrendered named project buttons inN. Project nameformat, named Chat preview buttons inChat N. Thread nameformat, project rows withlast thread:, and no visiblekey:rows. Opening a named project showed the project menu withNew thread;Open Chatsshowed named Chat buttons with noNew threadaction; selecting a Chat opened/bound it as confirmed by/context. - 2026-05-04 PDT macOS Projects/Chats live readback rebuilt and restarted the LaunchAgent daemon, then used MTProto readback against the private operator bot.
/projectsrendered normal projects newest-first withLatest Chatspreviews and noDocuments/Codexcwd entries as normal project rows.Open Chatsedited the same menu into a paginatedChatslist with noNew threadaction. Selecting the first Chat opened/bound its single thread as confirmed by/context./newchat <prompt>started a new App Server thread without a cwd parameter, reached[Final], and a plain follow-up routed to the newly bound thread. No literal"<nil>"appeared. - 2026-05-04 PDT macOS live current-tool priority regression E2E rebuilt and restarted the LaunchAgent daemon, then used MTProto readback against a private test thread. The new
current_tool_prioritycase asked a Telegram-origin/replyrun to execute two separate long-running shell commands that print progress lines while they work.[Tool]showed the first command as current, later the first completed progress output, then the second command asCurrent tool:; after the second command became current,[Tool]did not revert to the older completed first command before settling on the second completed command.[Final]includedRun duration; no literal"<nil>", stale-command revert, false parallel-turn rejection, or visible interrupted state appeared. - 2026-05-04 PDT macOS v0.2.0 live-event refactor E2E rebuilt and restarted the LaunchAgent daemon, then used MTProto readback against a private test thread.
sleep20_timingshowedCurrent tool:forsleep 20; printf ...before completion, thenLast completed tool:,[Output], and[Final]with run duration.multi_tool_currentshowed two separate live current tool transitions and all three completed tool/output transitions. A follow-up lifecycle smoke exposed an upstream App Server/tool-run failure (write_stdin failed: stdin is closed for this session) on one intentionally strict sequential-command prompt; after preserving live current state across tool-less same-turnthread/read, a freshsleep20_timingretry passed with no<nil>, stale command, false parallel-turn rejection, or visible non-final interrupted state. - 2026-05-03 PDT macOS create-thread live readback verified
/projects -> project menu -> New thread -> first prompt -> [Final]and a plain follow-up routed to the newly bound thread. The same pass exposed a synthetic Plan fallback bug: a later Plan turn with no structured question rendered the previous turn preview as the[Plan]prompt. After the normalizer fix, the Plan-only live retry renderedInput required.with no stale previous prompt and no choice buttons outside[Plan]. - 2026-05-03 PDT macOS live readback found a stale Plan button regression: a newer
[commentary]card for turn...efdadisplayed structured answer buttons whose callback routes still targeted older Plan turn...eea2/ item...a08f. Sanitized daemon evidence showed the older turn repeatedly alternatinginterruptedandinProgresswithwaiting_reply=true, then a newer turn enteringinProgress; the pendinguser_inputrow remained attached to the older turn and must be filtered by turn before rendering summary buttons. - 2026-04-30 PDT macOS live complex
/replyE2E reproduced a transient partialinterruptedsnapshot with tool/output evidence before final completion. After extending the terminal gate to defer non-final Telegram-origininterrupted, the same E2E passed: sequential command updates stayed visible, then a multi-command number-theory task created a temporary helper and ran four Python range commands before reaching[Final]withCOUNT=2034 SUM=115514223; no visible interrupted state, literal"<nil>", stale command, or input rejection appeared. - 2026-04-30 PDT macOS live logging-flags E2E used MTProto readback against a private test thread after rebuilding and restarting the daemon. It verified sequential
pwd,date,printf 'alpha\nbeta\n', andsleep 20; printf 'slow-command-done\n'tool updates, with the slow command visible in[Tool]about 20 seconds before[Output]; a separate/replymath run reached[Final]with the expected answer and no visible interrupted state. Daemon diagnostics remained present with default logging flags. - 2026-04-30 PDT macOS live stale-command E2E used MTProto readback of
MessageEditedupdates for one private test-thread turn withpwd,date,printf 'alpha\nbeta\n', andsleep 20; printf 'slow-command-done\n'.[Tool]showed the slow command in progress about 20 seconds before[Output]containedslow-command-done; no literal"<nil>"or stale session-tail command appeared. - 2026-04-29 macOS live nil-guard E2E completed all three scenarios and found no literal
"<nil>"in edited New run, summary/Final, Tool, Output, or Details messages after the sanitizer change.
The checked-in public-safe harness tests/live_e2e/telegram_readback_e2e.py validates Telegram-origin lifecycle behavior. It must use MTProto readback, not Bot API polling, and it must require CODEX_TG_LIVE_E2E=1 plus CODEX_TG_E2E_THREAD_ID so it never defaults to the current operator thread.
Acceptance checks:
- sequential command run reaches
[Final]. - Telegram-origin slow command becomes visible as
Current tool:before[Output]contains its completion text. - complex
/replymath run uses multiple shell commands and reaches[Final]with the expected aggregate answer. - Telegram readback contains no false parallel-turn warning, literal
"<nil>", stale known command, or false visibleStatus: interrupted. - optional daemon log correlation for the scenario window contains no premature interrupted terminal before recovery/expiry, no input rejection, and no
telegram_render_contains_nil.
Do not commit Telegram user sessions, chat/thread ids, raw message ids, raw logs, env files, or screenshots.