Add Thoth plugin #132
Code Review
This pull request introduces the Thoth plugin, a dashboard-first orchestration runtime for autoresearch, by adding the necessary configuration files, documentation, and command micro-prompts. The review identified several critical issues regarding file path resolutions in the plugin manifest and the execution logic within the command micro-prompts. Specifically, the skills and composerIcon paths in plugin.json require adjustment to correctly reference the bundle root, and the execution strings in the command files contain incorrect path segments for the plugin cache and are missing necessary runtime files for standalone functionality.
| "runtime", | ||
| "dashboard" | ||
| ], | ||
| "skills": "./plugins/thoth/skills", |
The skills path appears to be incorrect. Since the manifest is located within the .codex-plugin/ directory, the relative path to the skills directory at the bundle root should start with ../ to be correctly resolved by the Codex runtime.
| "skills": "./plugins/thoth/skills", | |
| "skills": "../plugins/thoth/skills", |
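To make the path concern concrete, here is a minimal sketch of the two resolutions. It assumes the runtime resolves relative paths against the directory that contains the manifest, which is the premise of this comment rather than something confirmed here; `bundle` is a hypothetical bundle root and `resolve_from_manifest` is an illustrative helper, not a Codex API.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_from_manifest(manifest_path: str, relative: str) -> str:
    """Resolve `relative` against the directory that holds the manifest.

    Assumption: relative paths are interpreted from the manifest's own
    directory, not from the bundle root.
    """
    manifest_dir = PurePosixPath(manifest_path).parent
    return posixpath.normpath(str(manifest_dir / relative))


# Hypothetical bundle layout: bundle/.codex-plugin/plugin.json
manifest = "bundle/.codex-plugin/plugin.json"

print(resolve_from_manifest(manifest, "./plugins/thoth/skills"))
# bundle/.codex-plugin/plugins/thoth/skills
print(resolve_from_manifest(manifest, "../plugins/thoth/skills"))
# bundle/plugins/thoth/skills
```

Under that assumption only the `../` form leaves `.codex-plugin/` and reaches a directory at the bundle root.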
| [English](./README.md) | [简体中文](./README.zh-CN.md) | ||
|
|
||
| <div align="center"> | ||
| <h1>🐦 Thoth — Dashboard-First Runtime for Autoresearch</h1> | ||
| <img src="assets/thoth.png" width="80%" alt="Thoth logo" /> | ||
| <p><strong>Dashboard-first orchestration runtime for autoresearch.</strong></p> | ||
| <p>Turn drifting agent work into durable runs, locked work items, and reviewable verdicts.</p> | ||
| <p> | ||
| <img alt="Runtime Dashboard First" src="https://img.shields.io/badge/runtime-dashboard--first-4B5563?style=for-the-badge&labelColor=3F3F46&color=0F766E" /> | ||
| <img alt="Mode Autoresearch" src="https://img.shields.io/badge/mode-autoresearch-4B5563?style=for-the-badge&labelColor=3F3F46&color=B45309" /> | ||
| <img alt="Engine Orchestration" src="https://img.shields.io/badge/engine-orchestration-4B5563?style=for-the-badge&labelColor=3F3F46&color=2563EB" /> | ||
| <img alt="Trust Work Locked" src="https://img.shields.io/badge/trust-work--locked-4B5563?style=for-the-badge&labelColor=3F3F46&color=6D28D9" /> | ||
| </p> | ||
| <p> | ||
| <img alt="Claude Code Plugin" src="https://img.shields.io/badge/Claude%20Code-plugin-4B5563?style=flat-square&labelColor=3F3F46&color=0284C7" /> | ||
| <img alt="Codex Plugin" src="https://img.shields.io/badge/Codex-plugin-4B5563?style=flat-square&labelColor=3F3F46&color=65A30D" /> | ||
| <img alt="Ready Work --work-id" src="https://img.shields.io/badge/work-strict%20--work--id-4B5563?style=flat-square&labelColor=3F3F46&color=7C3AED" /> | ||
| <img alt="Version 0.2.0" src="https://img.shields.io/badge/version-0.2.0-4B5563?style=flat-square&labelColor=3F3F46&color=0369A1" /> | ||
| <img alt="License MIT" src="https://img.shields.io/badge/license-MIT-4B5563?style=flat-square&labelColor=3F3F46&color=84CC16" /> | ||
| </p> | ||
| <h2>🚀 What's New</h2> | ||
| <p><strong>v0.2.0 stable release</strong> · compact work-item authority with <code>work_id</code> · Claude Code and Codex plugin parity</p> | ||
| <img src="assets/thoth-teaser-figure-v2.png" width="100%" alt="Thoth concept banner" /> | ||
| </div> | ||
|
|
||
| ## Control Plane At A Glance | ||
|
|
||
| ```text | ||
| THOTH CONTROL PLANE | ||
|
|
||
| Claude Code surfaces Codex surfaces | ||
| /thoth:* command set $thoth command set | ||
| \ / | ||
| \ / | ||
| +-------------+ | ||
| | | ||
| v | ||
|
|
||
| +----------------------------------------------------------------------------+ | ||
| | Layer 1. Host Surface | | ||
| | | | ||
| | init discuss run loop review auto status | | ||
| | doctor dashboard | | ||
| +----------------------------------------------------------------------------+ | ||
| | | ||
| v | ||
| +----------------------------------------------------------------------------+ | ||
| | Layer 2. Planning Authority | | ||
| | | | ||
| | init -> bootstrap, migrate, or resync .thoth authority | | ||
| | discuss -> record discussions, decisions, and work items | | ||
| | | | ||
| | Discuss -> Decision -> Work Item Object Graph | | ||
| | | | | ||
| | v | | ||
| | Ready Work (--work-id) | | ||
| +----------------------------------------------------------------------------+ | ||
| | | ||
| v | ||
| +----------------------------------------------------------------------------+ | ||
| | Layer 3. Execution Runtime | | ||
| | | | ||
| | run -> one durable execution packet | | ||
| | loop -> one durable recoverable loop packet | | ||
| | review -> structured findings through the same protocol | | ||
| | auto -> priority-driven child loops for actionable work | | ||
| | | | ||
| | +---------------------------+ | | ||
| | | Ready Work (--work-id) | | | ||
| | +-------------+-------------+ | | ||
| | | | | ||
| | +----------+----------+ | | ||
| | | | | | ||
| | v v | | ||
| | Run Loop | | ||
| | | | | | ||
| | +----------+----------+ | | ||
| | | | | ||
| | v | | ||
| | Run Ledger / Events / Artifacts / Result | | ||
| | | | | ||
| | v | | ||
| | Mechanical Validation / Acceptance | | ||
| | | | ||
| | attach watch resume stop | | ||
| +----------------------------------------------------------------------------+ | ||
| | | ||
| v | ||
| +----------------------------------------------------------------------------+ | ||
| | Layer 4. Read Surfaces | | ||
| | | | ||
| | dashboard -> human-visible runtime workbench | | ||
| | status -> active / stale / attachable run summaries | | ||
| | doctor -> strict health, projection, and runtime-shape audit | | ||
| | report -> available through status --report | | ||
| | | | ||
| | +-----------+-----------+-----------+-----------+ | | ||
| | | | | | | | ||
| | v v v v | | ||
| | Dashboard Status Report Doctor | | ||
| +----------------------------------------------------------------------------+ | ||
|
|
||
| Key invariants: | ||
| - .thoth is the shared machine/runtime authority | ||
| - .agent-os is the human governance layer | ||
| - run and loop are strict --work-id surfaces | ||
| - auto executes only actionable ready/active/failed work; blocked and draft work require human decisions | ||
| - dashboard, status, report, and doctor are read surfaces, not authority writers | ||
| - run, loop, and auto progress through the RuntimeDriver until terminal or paused | ||
| ``` | ||
|
|
||
| ## Why Thoth | ||
|
|
||
| Thoth is a dashboard-first orchestration runtime for autoresearch. It assumes chat alone is not an operating system: truth must survive the session, progress must stay visible, and completion must be mechanically testable. | ||
|
|
||
| ## Failure Modes Table | ||
|
|
||
| | Problem | Why it matters | | ||
| | --- | --- | | ||
| | Work is not persistent | Long-running work dies with the session, so the agent cannot keep working while you sleep and there is no durable state to resume or audit. | | ||
| | Parallel work is invisible | Multiple threads or delegated runs drift apart, and humans cannot see what is actually active. | | ||
| | Agents can claim completion too early | A fluent summary can hide that nothing mechanical passed. | | ||
| | Docs and state rot over time | Discussions, decisions, work items, and runtime facts drift until nobody knows which layer is authoritative. | | ||
|
|
||
| ## Thoth Response Table | ||
|
|
||
| | Mechanism | What it does | Counters | | ||
| | --- | --- | --- | | ||
| | Hooks + watchdog + runtime | Keep execution attached to durable ledgers and observable lifecycle events. | Work is not persistent | | ||
| | Dashboard-first visibility | Show live, stale, attachable, and host-specific runtime truth in one read surface. | Parallel work is invisible | | ||
| | Mechanical yes/no acceptance | Force validators, ledgers, and result payloads to decide whether work really passed. | Agents can claim completion too early | | ||
| | Object graph + execution system + locked work items | Freeze what is allowed, compile it into runnable work items, and keep authority layers from drifting. | Docs and state rot over time | | ||
|
|
||
| ## System At A Glance | ||
|
|
||
| Humans should not spend their attention tracking every grain of sand in the funnel. Thoth lets AI own the middle of the hourglass, while the dashboard shows the gold that survives: decisions, work items, runs, results, and the current verdict. | ||
|
|
||
| ## Architecture Flow Table | ||
|
|
||
| | Stage | Purpose | Input | Output | | ||
| | --- | --- | --- | --- | | ||
| | Intent | Capture the user request and operating boundary. | Human goals, constraints, repo context | Direction for planning | | ||
| | Decision | Lock key choices before execution drifts. | Intent, open questions, policy constraints | Recorded decisions | | ||
| | Work Item | Freeze goal, constraints, execution plan, eval, runtime policy, and decisions. | Discussion, decisions, requirements, acceptance rules | Ready or blocked work item | | ||
| | Run | Execute one frozen `work_id@revision` through phase results. | Work item, controller policy, host surface | `.thoth/objects/run` plus `.thoth/runs/<run_id>` ledger | | ||
| | Result | Produce a mechanical verdict instead of narration alone. | Validator outputs, artifacts, runtime checks | Structured result and acceptance evidence | | ||
| | Dashboard | Let humans read the final state without replaying the chat. | Ledgers, read models, derived summaries | Inspectable project truth | | ||
|
|
||
| ## Quick Start | ||
|
|
||
| 1. Install Thoth on the host surfaces you use. | ||
|
|
||
| ```bash | ||
| claude plugin marketplace add SeeleAI/Thoth --scope user | ||
| claude plugin install thoth@thoth --scope user | ||
| codex plugin marketplace add SeeleAI/Thoth | ||
| ``` | ||
|
|
||
| For Codex, adding the marketplace is the source step. Then install or enable the `thoth` plugin from the Codex plugin directory. | ||
|
|
||
| After the plugin is installed, two different entry layers exist on purpose: | ||
|
|
||
| - Public plugin surface: `Claude /thoth:*`, `Codex $thoth <command>`, and the plugin-provided shell wrapper `thoth <command>` | ||
| - Source-repo development fallback: `python -m thoth.cli <command>` | ||
|
|
||
| Use the plugin-installed `thoth` wrapper in fresh repos or empty directories. Use `python -m thoth.cli` only when you are intentionally running against a checked-out Thoth source tree and want execution pinned to that exact checkout. | ||
|
|
||
| 2. Initialize the repository you want Thoth to manage. | ||
|
|
||
| ```text | ||
| /thoth:init | ||
| $thoth init | ||
| ``` | ||
|
|
||
| 3. Start the first strict run from a compiled task. | ||
|
|
||
| ```text | ||
| /thoth:run --work-id task-1 | ||
| $thoth run --work-id task-1 | ||
| ``` | ||
|
|
||
| 4. Open the read surface. | ||
|
|
||
| ```text | ||
| /thoth:dashboard | ||
| $thoth dashboard | ||
| ``` | ||
|
|
||
| ## Host Install And Upgrade | ||
|
|
||
| | Host | First install | Stable upgrade | Important note | | ||
| | --- | --- | --- | --- | | ||
| | Claude Code | `claude plugin marketplace add SeeleAI/Thoth --scope user` then `claude plugin install thoth@thoth --scope user` | `claude plugin marketplace update thoth` then `claude plugin update thoth@thoth --scope user` | Restart Claude Code after `plugin update` so the new version is applied. | | ||
| | Codex | `codex plugin marketplace add SeeleAI/Thoth`, then install or enable `thoth` from the Codex plugin directory | `codex plugin marketplace upgrade thoth` | `add` takes a source such as `SeeleAI/Thoth`; `upgrade` takes the configured marketplace name, which is `thoth` in this repo. | | ||
|
|
||
| ## Verification | ||
|
|
||
| Default development verification is intentionally targeted-only. Broad or full sweeps are not the normal workflow. | ||
|
|
||
| ### Atomic selftest | ||
|
|
||
| - The public selftest entrypoint is now atomic-only: | ||
|
|
||
| ```bash | ||
| python -m thoth.selftest --case plan.discuss.compile --case runtime.run.live | ||
| ``` | ||
|
|
||
| - `python -m thoth.selftest` without any `--case` fails on purpose and prints the available case catalog. | ||
| - Every case runs in its own workdir and artifact directory, writes a per-case report entry keyed by `case_id`, and must not depend on side effects from an earlier case. | ||
| - Release, regression, and closeout gates must record explicit case IDs instead of broad aliases such as `hard` or `heavy`. | ||
| - The current catalog is split into repo-local capability probes such as `plan.discuss.compile`, `runtime.run.live`, `runtime.loop.sleep`, `review.exact_match`, `observe.dashboard`, `hooks.codex`, plus host-surface probes such as `surface.codex.run.live_prepare` and `surface.claude.loop.stop`. | ||
|
|
||
| ### Targeted pytest | ||
|
|
||
| - Allowed developer entrypoints: | ||
|
|
||
| ```bash | ||
| python -m pytest -q tests/unit/test_selftest_registry.py | ||
| python -m pytest -q tests/unit/test_selftest_helpers.py::test_validate_pytest_invocation | ||
| python -m pytest -q --thoth-target selftest-core | ||
| ``` | ||
|
|
||
| - Blocked by default: bare `pytest`, directory-wide invocations such as `pytest tests/unit`, and broad tier sweeps such as `pytest --thoth-tier heavy`. | ||
| - Broad runs are reserved for explicit release or CI situations and require `--thoth-allow-broad` or `THOTH_ALLOW_BROAD_TESTS=1`. | ||
| - `--thoth-tier` is retained only as an explicit override path for those exempted broad runs; it is not the default development interface. | ||
| - The target manifest lives in `thoth/test_targets.py`. | ||
| - Use the helper below to translate changed paths into recommended pytest targets and atomic selftest cases: | ||
|
|
||
| ```bash | ||
| python scripts/recommend_tests.py thoth/observe/selftest/runner.py tests/conftest.py | ||
| ``` | ||
|
|
||
| ## Command Matrix | ||
|
|
||
| | Command | Host Surface | Purpose | Input | Result | | ||
| | --- | --- | --- | --- | --- | | ||
| | `init` | `Claude: /thoth:init`<br>`Codex: $thoth init` | Audit, initialize, migrate, or resync canonical Thoth authority. | `--sync`, `--migrate preview`, `--migrate apply`, `--migrate --preview`, `--migrate --apply`, or optional config payload | `.thoth` authority, migration ledger, generated projections, dashboard scaffolding, scripts, and tests | | ||
| | `discuss` | `Claude: /thoth:discuss`<br>`Codex: $thoth discuss` | Record planning decisions without entering code execution. | Topic, decision payload, or work payload | Updated discussion, decision, or work_item objects plus generated docs view | | ||
| | `run` | `Claude: /thoth:run`<br>`Codex: $thoth run` | Execute one ready work item through a durable runtime packet. | `--work-id`, optional host or executor controls, optional attach/watch/stop | Durable run ledger with state, events, phase results, artifacts, and terminal result | | ||
| | `loop` | `Claude: /thoth:loop`<br>`Codex: $thoth loop` | Iterate on one ready work item through a controller service. | `--work-id`, optional resume or sleep controls | Controller object, child run lineage, and bounded iteration history | | ||
| | `review` | `Claude: /thoth:review`<br>`Codex: $thoth review` | Produce structured findings without modifying source code. | Review target, optional `--work-id`, optional executor controls | Structured review result recorded through the shared protocol | | ||
| | `auto` | `Claude: /thoth:auto`<br>`Codex: $thoth auto` | Run the priority queue while the user is away. | Optional `--sleep`, `--rounds`, `--scope`, or explicit `--work-id` | Auto controller, child loop lineage, monitor events, and terminal or paused summary | | ||
| | `status` | `Claude: /thoth:status`<br>`Codex: $thoth status` | Show project health, active durable runs, doctor, report, or dashboard views. | Optional `--json`, `--doctor`, `--report`, or `--dashboard` | Shared status snapshot and read-only derived views | | ||
| | `doctor` | `Claude: /thoth:doctor`<br>`Codex: $thoth doctor` | Alias for `status --doctor`; strictly audit health and runtime shape. | Optional `--quick` or `--json` | Health report with validation findings | | ||
| | `dashboard` | `Claude: /thoth:dashboard`<br>`Codex: $thoth dashboard` | Alias for `status --dashboard`; manage the local dashboard runtime. | Optional action: `start`, `stop`, or `rebuild` | Local dashboard process and read endpoints backed by `.thoth` ledgers | | ||
|
|
||
| ## Why Trust It | ||
|
|
||
| | Signal | What you can inspect | | ||
| | --- | --- | | ||
| | Durable runtime truth | `.thoth/runs/*` keeps run, state, events, artifacts, and result payloads. | | ||
| | Locked planning authority | ``.thoth/objects/discussion/`, `.thoth/objects/decision/`, and `.thoth/objects/work_item/` define what execution is allowed to do. | | ||
| | Script-backed verification | Validators, doctor checks, and selftests decide pass or fail mechanically. | | ||
| | Shared read model | `status`, `report`, and `dashboard` all read from the same authority instead of chat memory. | | ||
|
|
||
| ## Who It Is For | ||
|
|
||
| | Good fit | Why | | ||
| | --- | --- | | ||
| | Research and experimentation repos | They need durable memory, replayable results, and visible long-running work. | | ||
| | Engineering teams using AI for real changes | They need code execution, review, and acceptance to stay auditable. | | ||
| | Teams that want Claude Code and Codex parity | They need one host-neutral command model rather than two drifting workflows. | | ||
|
|
||
| ## Current Limitations | ||
|
|
||
| | Current boundary | Implication | | ||
| | --- | --- | | ||
| | `run` and `loop` are strict `--work-id` surfaces | Free-form execution is intentionally rejected. | | ||
| | Host parity is semantic, not identical UX | Claude and Codex still need their own install and local runtime wiring. | | ||
| | Dashboard is a local service, not a hosted control plane | Operators need a machine that can run the backend and frontend assets. | | ||
| | The hero logo currently ships as a raster PNG | A clean SVG and icon-family refinement is still useful for smaller surfaces and plugin packaging. | | ||
|
|
||
| --- | ||
|
|
||
| ## Contributors | ||
|
|
||
| Built in public by contributors who want AI work to remain inspectable. | ||
|
|
||
| [](https://github.com/SeeleAI/Thoth/graphs/contributors) | ||
|
|
||
| Contribution path: [open a pull request](https://github.com/SeeleAI/Thoth/pulls) or [start a discussion](https://github.com/SeeleAI/Thoth/discussions). | ||
|
|
||
| ## License | ||
|
|
||
| MIT. See [LICENSE](LICENSE). |
The Execution String in the command micro-prompts (located in plugins/thoth/skills/thoth/commands/) relies on finding a runtime at $candidate/bin/thoth or $candidate/scripts/thoth-cli-entry.py. However, these files are not included in the mirrored bundle added in this PR. Without these files, the plugin will fail to execute unless the thoth command is already available on the user's system PATH. Please ensure the necessary runtime files are included in the plugins/SeeleAI/Thoth/ directory to ensure the plugin is functional upon installation from this repository.
| "Initialize Thoth in this repository and render project layers.", | ||
| "Start a durable Thoth run for the current task." | ||
| ], | ||
| "composerIcon": "./assets/thoth-icon.svg", |
|
|
||
| ## Execution String | ||
|
|
||
| Operate only on this repo. Use the installed skill named thoth. The Codex public surface is `$thoth auto`, but in the workspace shell you must execute it literally as `bash -lc 'set -euo pipefail; if [ -n "${THOTH_SELFTEST_RUNTIME_ROOT:-}" ]; then if [ -x "$THOTH_SELFTEST_RUNTIME_ROOT/bin/thoth" ]; then exec "$THOTH_SELFTEST_RUNTIME_ROOT/bin/thoth" "$@"; fi; if [ -f "$THOTH_SELFTEST_RUNTIME_ROOT/scripts/thoth-cli-entry.py" ]; then exec python3 "$THOTH_SELFTEST_RUNTIME_ROOT/scripts/thoth-cli-entry.py" "$@"; fi; fi; if command -v thoth >/dev/null 2>&1; then exec thoth "$@"; fi; candidates="$(ls -td "$HOME"/.codex/plugins/cache/thoth/thoth/* 2>/dev/null || true)"; marketplace="$HOME/.codex/.tmp/marketplaces/thoth"; if [ -d "$marketplace" ]; then candidates="$candidates |
The path "$HOME"/.codex/plugins/cache/thoth/thoth/* in the Execution String contains an extra thoth/ segment. Standard Codex plugin cache structure is ~/.codex/plugins/cache/<plugin-name>/<version>/. It should likely be "$HOME"/.codex/plugins/cache/thoth/* to correctly locate the versioned installation directories. This issue is present in all command files in this directory (auto.md, dashboard.md, discuss.md, doctor.md, init.md, loop.md, review.md, run.md, status.md).
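A small sketch of why the extra segment matters, assuming the cache really is laid out as `<cache>/<plugin-name>/<version>/` as described above. The temporary directory, the `0.2.0` version folder, and `find_candidates` are illustrative stand-ins, and the newest-first sort only approximates `ls -td`.

```python
import glob
import os
import tempfile


def find_candidates(cache_root: str, patterns: list[str]) -> list[str]:
    """Return directories matching any pattern, newest first."""
    found = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(cache_root, pattern)):
            if os.path.isdir(path):
                found.add(path)
    return sorted(found, key=os.path.getmtime, reverse=True)


with tempfile.TemporaryDirectory() as cache_root:
    # Assumed layout: <cache>/<plugin-name>/<version>/
    os.makedirs(os.path.join(cache_root, "thoth", "0.2.0"))

    # The pattern with the extra segment finds nothing in this layout.
    print(find_candidates(cache_root, ["thoth/thoth/*"]))
    # The shorter pattern finds the versioned directory.
    print(find_candidates(cache_root, ["thoth/*"]))
```

If the real cache nests a marketplace name above the plugin name, the longer pattern would be the correct one, so probing both patterns is a safe option while the layout is unconfirmed.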
| - [Tartiner Labs](https://github.com/tartinerlabs/skills) - Agent skills for git workflows, GitHub automation, security audits, code refactoring, and project tooling. | ||
| - [Team Skills Platform](https://github.com/Colin4k1024/tsp) - Role-based team delivery framework — Tech Lead-orchestrated 8-role system with 195+ skills, 27 specialist agents, 80+ commands, hooks, and ECC harness for Claude Code, Codex, and OpenCode. | ||
| - [Test Gap](./plugins/mturac/test-gap) - Find lines in your diff lacking test coverage (Cobertura, lcov, coverage.json). | ||
| - [Thoth](https://github.com/SeeleAI/Thoth) - Dashboard-first Claude Code and Codex runtime for autoresearch, turning drifting agent work into durable runs, locked work items, visible ledgers, and reviewable verdicts. |
[SUGGESTION]: Add a note near the new Thoth table row about a backtick issue in the Thoth plugin README
The Thoth entry is placed between "Team Skills Platform" and "TODO Harvest" — the alphabetical position is correct.
The plugins/SeeleAI/Thoth/README.md file being introduced in this same PR contains a broken Markdown code-span in its "Why Trust It" table at line 252:
| Locked planning authority | ``.thoth/objects/discussion/`, `.thoth/objects/decision/`, and `.thoth/objects/work_item/` define what execution is allowed to do. |
The opening double backtick is a typo; each of the three path references should be wrapped in a single-backtick pair. This is flagged in a separate inline comment on plugins/SeeleAI/Thoth/README.md line 252.
| | Signal | What you can inspect | | ||
| | --- | --- | | ||
| | Durable runtime truth | `.thoth/runs/*` keeps run, state, events, artifacts, and result payloads. | | ||
| | Locked planning authority | ``.thoth/objects/discussion/`, `.thoth/objects/decision/`, and `.thoth/objects/work_item/` define what execution is allowed to do. | |
[WARNING]: Broken Markdown inline-code backticks — three paths render as bold text instead of inline code
The "Why Trust It" table row contains a malformed code-span sequence:
| Locked planning authority | ``.thoth/objects/discussion/`, `.thoth/objects/decision/`, and `.thoth/objects/work_item/` define what execution is allowed to do. |
The sequence opens with two backticks instead of one, so the single backticks that follow no longer pair up around the three paths, and the Markdown output is wrong. The fix is to wrap each path in a matching single-backtick pair:
| | Locked planning authority | ``.thoth/objects/discussion/`, `.thoth/objects/decision/`, and `.thoth/objects/work_item/` define what execution is allowed to do. | | |
| | Locked planning authority | `.thoth/objects/discussion/`, `.thoth/objects/decision/`, and `.thoth/objects/work_item/` define what execution is allowed to do. | |
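A check like the one below could catch this class of typo before merge. It is a rough sketch that pairs each backtick run with the next run of the same length, which mirrors how CommonMark matches code-span delimiters; `unmatched_backtick_runs` is an illustrative helper, not part of any existing linter, and it ignores escapes and fenced blocks.

```python
import re


def unmatched_backtick_runs(line: str) -> list[str]:
    """Return backtick runs that have no later run of the same length."""
    runs = [m.group(0) for m in re.finditer(r"`+", line)]
    unmatched = []
    i = 0
    while i < len(runs):
        # A run of n backticks only closes on the next run of exactly n.
        closer = next(
            (j for j in range(i + 1, len(runs)) if len(runs[j]) == len(runs[i])),
            None,
        )
        if closer is None:
            unmatched.append(runs[i])
            i += 1
        else:
            i = closer + 1
    return unmatched


broken = (
    "``.thoth/objects/discussion/`, `.thoth/objects/decision/`, "
    "and `.thoth/objects/work_item/`"
)
fixed = broken.replace("``", "`", 1)

print(unmatched_backtick_runs(broken))  # ['``', '`']
print(unmatched_backtick_runs(fixed))   # []
```

The broken row reports two stray runs: the double backtick has no partner, which shifts every later pairing by one and leaves the final single backtick unmatched as well.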
| ## Hard Stops | ||
|
|
||
| - Do not invent or compile a new work item when --work-id is missing. | ||
| - Do not stop after starting the runtime; monitor RuntimeDriver events until terminal. |
[WARNING]: Hard Stop text is an instruction note, not a prohibition — does not prevent the actual failure mode
The objective is "Finish the current strict task through the four-phase RuntimeDriver." A key premise in the runtime body is that the RuntimeDriver must not be allowed to drift beyond terminal before the agent reports.
The prohibition on line 18 reads: "Do not stop after starting the runtime; monitor RuntimeDriver events until terminal."
This instruction note only describes what to do — it lacks an absolute blocking clause. An agent can satisfy the literal reading by logging the intent to monitor before deferring output or handing off the session. The prohibition should instead read:
- "Do not exit the monitoring session before the RuntimeDriver signals a terminal state."
Without a blocking clause that targets the exact failure mode, the instruction is not self-enforcing.
| ## Hard Stops | ||
|
|
||
| - Do not narrate the whole UI. | ||
| - Do not restate healthy panels. |
[WARNING]: Hard Stop is a negative instruction, not an absolute prohibition
The dashboard objective is "Report endpoint, failure point, and one notable runtime delta only." A clean result from this command requires that the failure point be explicitly listed even if empty.
Line 18 reads: "Do not restate healthy panels." This only restricts one undesirable output pattern — omission of the failure point and fabrication of a delta when none exists are not explicitly forbidden. The prohibition should read:
- "Do not omit the failure point even when the result is clean; report the absence of failure as the finding."
| ## Hard Stops | ||
|
|
||
| - Do not decide extra iterations outside the recorded loop budget. | ||
| - Do not skip validator output when judging success. |
[WARNING]: Hard Stop conflicts with the objective in the same way as run.md — imprecise prohibition
The loop objective is "Advance the current bounded loop through foreground or sleeping RuntimeDriver monitoring." The governing principle is that each child run must produce a terminal result before the loop advances.
Line 18 reads: "Do not skip validator output when judging success." As a prohibition, this only constrains one of several failure modes: the validator can be called but its verdict ignored, the output observed but the loop still advanced, or the verdict used incorrectly without any explicit guard against those cases. The Hard Stops section for each of the phase-controller command surfaces (run.md, loop.md, auto.md) should be reviewed in unison to use the same absolute-negative format: "Do not proceed to the next loop iteration before the validator signals terminal."
Code Review Summary
Status: 7 Issues Found | Recommendation: Address before merge
Reviewed by nemotron-3-super-120b-a12b-20230311:free · 61,131 tokens
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff1090817e
| Operate only on this repo. Use the installed skill named thoth. The Codex public surface is `$thoth init`, but in the workspace shell you must execute it literally as `bash -lc 'set -euo pipefail; if [ -n "${THOTH_SELFTEST_RUNTIME_ROOT:-}" ]; then if [ -x "$THOTH_SELFTEST_RUNTIME_ROOT/bin/thoth" ]; then exec "$THOTH_SELFTEST_RUNTIME_ROOT/bin/thoth" "$@"; fi; if [ -f "$THOTH_SELFTEST_RUNTIME_ROOT/scripts/thoth-cli-entry.py" ]; then exec python3 "$THOTH_SELFTEST_RUNTIME_ROOT/scripts/thoth-cli-entry.py" "$@"; fi; fi; if command -v thoth >/dev/null 2>&1; then exec thoth "$@"; fi; candidates="$(ls -td "$HOME"/.codex/plugins/cache/thoth/thoth/* 2>/dev/null || true)"; marketplace="$HOME/.codex/.tmp/marketplaces/thoth"; if [ -d "$marketplace" ]; then candidates="$candidates | ||
| $marketplace"; fi; for candidate in $candidates; do if [ -x "$candidate/bin/thoth" ]; then exec "$candidate/bin/thoth" "$@"; fi; if [ -f "$candidate/scripts/thoth-cli-entry.py" ]; then if command -v python3 >/dev/null 2>&1; then exec python3 "$candidate/scripts/thoth-cli-entry.py" "$@"; else exec python "$candidate/scripts/thoth-cli-entry.py" "$@"; fi; fi; done; echo '"'"'thoth installed runtime not found'"'"' >&2; exit 127' thoth init`. Execute that shell command immediately as your first meaningful action. Do not explain the command before executing it. Do not replace execution with prose. If neither PATH nor the installed Codex plugin cache or marketplace root contains the runtime entrypoint, report host install drift instead of inventing another entrypoint. route_class=mechanical_fast. intelligence_tier=none. packet_authority_mode=result_envelope. Objective: Report audit-first adopt/init outcome, generated artifacts, blockers, and user decisions required before continuing. Hard stop: Do not assume the repo is blank. Hard stop: Do not assume goals, project identity, migration intent, work priority, unblock policy, or acceptance criteria. Hard stop: Do not launch broad Explore, Task, plugin-cache/source scans, or background investigation after the init result. Hard stop: If extra evidence is required, inspect only the smallest artifact explicitly named by the init payload. Hard stop: If the preview/apply result leaves blocked work or unresolved migration choices, ask with AskUserQuestion and stop. Hard stop: Do not narrate the full migration procedure. Reply with `THOTH_DONE` only after the command path reaches its terminal outcome. |
Include the runtime files the commands execute
When Thoth is installed from this new local marketplace bundle, every $thoth command will fall through to thoth installed runtime not found unless the user already has a separate thoth binary on PATH: the mirrored directory contains only the manifest, skills, README/license, and icon, but this execution string looks for bin/thoth, scripts/thoth-cli-entry.py, and the Python package under the plugin cache/marketplace root. I checked plugins/SeeleAI/Thoth for those entrypoints and none are present, so a fresh install from .agents/plugins/marketplace.json exposes commands that cannot run.
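A packaging check along these lines could guard against shipping a bundle without a runnable entrypoint. It is only a sketch: the two relative paths are the ones the execution string probes, while `present_entrypoints` and the temporary bundle directory are hypothetical, and the check tests file existence only, whereas the shell string also requires `bin/thoth` to be executable.

```python
import tempfile
from pathlib import Path

# Entrypoints the execution string looks for under each candidate root.
ENTRYPOINTS = ("bin/thoth", "scripts/thoth-cli-entry.py")


def present_entrypoints(bundle_root: str) -> list[str]:
    """Return the runtime entrypoints that exist as files under bundle_root."""
    root = Path(bundle_root)
    return [rel for rel in ENTRYPOINTS if (root / rel).is_file()]


with tempfile.TemporaryDirectory() as bundle:
    # A bundle holding only docs and manifests has nothing to execute.
    (Path(bundle) / "README.md").write_text("placeholder\n")
    print(present_entrypoints(bundle))  # []

    # Adding the Python entry script makes the fallback branch reachable.
    scripts = Path(bundle) / "scripts"
    scripts.mkdir()
    (scripts / "thoth-cli-entry.py").write_text("print('stub')\n")
    print(present_entrypoints(bundle))  # ['scripts/thoth-cli-entry.py']
```

An empty result for the mirrored directory would correspond to the `thoth installed runtime not found` path described above.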
Thanks for the automated reviews. I pushed
I intentionally kept Validation run after the update:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f0de97a4cc
| if getattr(args, "migrate", False) and not getattr(args, "apply", False): | ||
| result = preview_project_migration(config, project_root) | ||
| print_envelope(command="init", status="ok", summary=f"Migration preview written for {project_root}", body={"result": result}, refs=output_refs(project_root / ".thoth" / "migrations" / result["migration_id"] / "preview.json")) | ||
| return 0 | ||
| result = initialize_project(config, project_root) |
Honor init --preview before applying
When a user runs thoth init --preview, the parser sets args.preview=True but args.migrate remains false, so this branch is skipped and the command falls through to initialize_project(). In that dry-run scenario Thoth still writes .thoth, root AGENTS.md/CLAUDE.md, scripts, tests, and dashboard files, which is exactly the mutation the preview flag is meant to avoid; route --preview to preview_project_migration() unless --apply is explicitly selected.
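One possible shape for the routing, sketched with a stand-in parser. The three `store_true` flags and `choose_init_action` are assumptions for illustration; the real Thoth parser and handlers may be structured differently, so the function returns a label instead of calling `preview_project_migration()` or `initialize_project()`.

```python
import argparse


def choose_init_action(args: argparse.Namespace) -> str:
    """Pick the init branch; any preview-style request stays read-only
    unless --apply is explicitly set."""
    wants_preview = getattr(args, "preview", False) or getattr(args, "migrate", False)
    if wants_preview and not getattr(args, "apply", False):
        return "preview"
    return "initialize"


# Hypothetical parser: flag names follow the comment above, not the real CLI.
parser = argparse.ArgumentParser(prog="thoth init")
parser.add_argument("--preview", action="store_true")
parser.add_argument("--migrate", action="store_true")
parser.add_argument("--apply", action="store_true")

print(choose_init_action(parser.parse_args(["--preview"])))             # preview
print(choose_init_action(parser.parse_args(["--migrate"])))             # preview
print(choose_init_action(parser.parse_args(["--migrate", "--apply"])))  # initialize
print(choose_init_action(parser.parse_args([])))                        # initialize
```

The point of the sketch is only that `--preview` alone reaches the preview branch, which the current condition on `args.migrate` does not guarantee.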
| PYTHON_BIN="python" | ||
| fi | ||
|
|
||
| exec "${PYTHON_BIN}" "${PLUGIN_ROOT}/scripts/thoth-cli-entry.py" "$@" |
Run the CLI with its Python dependencies
For a marketplace/plugin-cache install this wrapper executes `scripts/thoth-cli-entry.py` directly after only adding the plugin root to `PYTHONPATH`, so dependencies declared in `pyproject.toml` are never installed or isolated. In a clean environment without PyYAML, even `bin/thoth status` crashes during import (`thoth.observe.read_model` imports `yaml`) before any command handler runs. Use an installed entry point, a venv, or a uv path, or otherwise bundle or fall back for the required runtime deps.
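As one possible fallback, the entry script could probe for its required modules before importing the package, so a clean environment gets an actionable message instead of an import traceback. The sketch below uses only the standard library. The helper name `missing_modules` and the list of required modules are assumptions, not Thoth's actual dependency set.

```python
import importlib.util


def missing_modules(required: tuple[str, ...]) -> list[str]:
    """Return the top-level module names that cannot be found for import.

    Hypothetical helper; pass import names (e.g. "yaml" for PyYAML),
    not distribution names.
    """
    return [name for name in required if importlib.util.find_spec(name) is None]
```

A wrapper might call `missing_modules(("yaml",))` first and, if the list is non-empty, bootstrap a venv or exit with an install hint rather than continuing to the import that would crash.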
| ## Execution String | ||
|
|
||
| Operate only on this repo. Use the installed skill named thoth. The Codex public surface is `$thoth run`, but in the workspace shell you must execute it literally as `bash -lc 'set -euo pipefail; if [ -n "${THOTH_SELFTEST_RUNTIME_ROOT:-}" ]; then if [ -x "$THOTH_SELFTEST_RUNTIME_ROOT/bin/thoth" ]; then exec "$THOTH_SELFTEST_RUNTIME_ROOT/bin/thoth" "$@"; fi; if [ -f "$THOTH_SELFTEST_RUNTIME_ROOT/scripts/thoth-cli-entry.py" ]; then exec python3 "$THOTH_SELFTEST_RUNTIME_ROOT/scripts/thoth-cli-entry.py" "$@"; fi; fi; if command -v thoth >/dev/null 2>&1; then exec thoth "$@"; fi; candidates="$(ls -td "$HOME"/.codex/plugins/cache/thoth/thoth/* "$HOME"/.codex/plugins/cache/thoth/* "$HOME"/.codex/plugins/cache/*/thoth/* "$HOME"/.codex/plugins/cache/*/Thoth/* 2>/dev/null || true)"; marketplace="$HOME/.codex/.tmp/marketplaces/thoth"; if [ -d "$marketplace" ]; then candidates="$candidates | ||
| $marketplace"; fi; for candidate in $candidates; do if [ -x "$candidate/bin/thoth" ]; then exec "$candidate/bin/thoth" "$@"; fi; if [ -f "$candidate/scripts/thoth-cli-entry.py" ]; then if command -v python3 >/dev/null 2>&1; then exec python3 "$candidate/scripts/thoth-cli-entry.py" "$@"; else exec python "$candidate/scripts/thoth-cli-entry.py" "$@"; fi; fi; done; echo '"'"'thoth installed runtime not found'"'"' >&2; exit 127' thoth run`. Execute that shell command immediately as your first meaningful action. Do not explain the command before executing it. Do not replace execution with prose. If neither PATH nor the installed Codex plugin cache or marketplace root contains the runtime entrypoint, report host install drift instead of inventing another entrypoint. route_class=live_intelligent. intelligence_tier=high. packet_authority_mode=phase_controller. Objective: Finish the current strict task through the four-phase RuntimeDriver. Hard stop: Do not invent or compile a new work item when --work-id is missing. Hard stop: Do not exit the monitoring session before the RuntimeDriver signals a terminal state. Hard stop: Do not hand-edit .thoth ledgers. If the command streams runtime events, report progress and risks from those events only. Stay in the same session until the RuntimeDriver reaches terminal state unless --sleep was requested. Runtime lifecycle is plan -> execute -> validate -> reflect; auto advances selected work through child loops. Do not hand-edit `.thoth`; let the Thoth runtime driver advance phases. Plan must prove coverage of strict_task.authority_context before execute; needs_input routes back to discuss. Reply with `THOTH_DONE` only after the command path reaches its terminal outcome. |
[WARNING]: Hard Stop inconsistency: the Hard Stops § now bans exiting before terminal state, but the Execution String still contains the old clause: "Hard stop: Do not stop after starting the runtime; monitor RuntimeDriver events until terminal." The Execution String is re-read before every action, so an agent following the embedded instruction can comply with the stale text without reaching terminal state before reporting. Update the Execution String's embedded clause to match the Hard Stops section: "Hard stop: Do not exit the monitoring session before the RuntimeDriver signals a terminal state."
|
Follow-up pushed in b4d802a. Fixed the valid second-round runtime findings: |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b4d802a60b
| "queue": [], | ||
| } | ||
| ) | ||
| controller = _update_controller(project_root, controller, status="idle", payload_updates=payload) |
Use a valid controller status for idle auto queues
When `thoth auto` starts with no actionable work, or empties its queue before the default 8-hour minimum runtime, this branch writes controller status `idle`. `Store.update` validates controller objects against `CONTROLLER_STATUSES` in `thoth/objects.py`, which does not include `idle`, so the auto worker raises a schema error instead of idling and emitting heartbeats. Use an allowed status for this state or add `idle` to the controller schema.
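The failure mode can be illustrated with a minimal sketch of set-based status validation. The members of `CONTROLLER_STATUSES` below are placeholders, since the real set in `thoth/objects.py` is not shown in this thread. `validate_controller_status` is likewise a hypothetical stand-in for the check `Store.update` performs.

```python
# Placeholder members: the real CONTROLLER_STATUSES is defined in
# thoth/objects.py and its contents are not shown in this review.
CONTROLLER_STATUSES = frozenset({"running", "paused", "stopped"})


def validate_controller_status(status: str, allowed: frozenset = CONTROLLER_STATUSES) -> str:
    """Return status unchanged if allowed, otherwise raise ValueError."""
    if status not in allowed:
        raise ValueError(f"invalid controller status: {status!r}")
    return status
```

Under this model, writing `idle` raises until either the caller switches to a member of the allowed set or the schema gains `idle`, which are the two fixes suggested above.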
| value = runtime.get("python_bin") | ||
| if isinstance(value, str) and value.strip(): | ||
| return value.strip() | ||
| return "python" |
Launch the dashboard with the bootstrapped interpreter
When the plugin wrapper bootstraps dependencies into its runtime venv because global Python lacks `fastapi`/`uvicorn`, `thoth dashboard start` still launches the server with bare `python`. That child process is outside the venv used by the CLI, so the dashboard can fail with a missing `uvicorn` module (or no `python` executable) even though the wrapper installed the required packages. Default this to the current interpreter unless an explicit project override is configured.
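A minimal sketch of that default, assuming the `runtime` mapping and `python_bin` key from the excerpt above. The function name `resolve_python_bin` is hypothetical.

```python
import sys


def resolve_python_bin(runtime: dict) -> str:
    """Prefer an explicit project override, else the running interpreter."""
    value = runtime.get("python_bin")
    if isinstance(value, str) and value.strip():
        return value.strip()
    # sys.executable can be empty or None when Python cannot determine its
    # own path, so keep bare "python" as a last resort.
    return sys.executable or "python"
```

Because the CLI itself runs inside the bootstrapped venv, `sys.executable` points at the interpreter that already has the installed packages, which is what the dashboard child process needs.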
Summary
Adds Thoth to the Development & Workflow section and mirrors its Codex plugin bundle under plugins/SeeleAI/Thoth.
Thoth is a dashboard-first Claude Code and Codex runtime for autoresearch, turning drifting agent work into durable runs, locked work items, visible ledgers, and reviewable verdicts.
Related upstream request: SeeleAI/Thoth#6
Validation
Notes