---
name: self-critique
description: Audit a completed task against the user's original request before declaring done. Catches omitted constraints, misread scope, partially-met requirements, and unsupported "it's done" claims after long multi-tool loops. Opt-in quality gate; run it, report the verdict, never silently re-loop.
version: 1.0.0
author: "Nous Research (proposed by @dimokru, issue #372)"
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
  hermes:
    tags: [quality, self-critique, reflection, verification, audit, review, post-task]
    requires_toolsets: [terminal]
---

# Self-Critique

Audit a finished task against what was **originally asked**, not against how
good the final answer looks. Fluent, well-formatted responses routinely omit a
requested constraint, misread scope, or stop one step short, especially after
long multi-tool loops where the final message drifts from the initial ask.

## When to use

- Right before telling the user a non-trivial task is complete.
- After a long multi-tool run, a hand-off, or a plan with several steps.
- When invoked explicitly (CLI / cron / a `/self-critique` style request).

Skip it for routine short answers; it adds noise without value there.

## Output shape

Always one JSON object:

```json
{
  "verdict": "satisfied | partial | missing | unknown",
  "missing_items": ["concise unmet requirement", "..."],
  "suggested_follow_up": "one short actionable sentence, or empty"
}
```

- `satisfied`: every explicit requirement met (`missing_items` empty).
- `partial`: some requirements met, some not.
- `missing`: the core ask is unmet.
- `unknown`: the audit could not run (no auditor available); never a guess.

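A caller that consumes the verdict may want to check it against the shape described above before acting on it. The following is a minimal sketch, not part of the skill's script: the name `validate_audit` and the strictness of each check are assumptions made for illustration.

```python
import json

# The four verdict values listed in the output shape.
ALLOWED_VERDICTS = {"satisfied", "partial", "missing", "unknown"}


def validate_audit(raw: str) -> dict:
    """Parse auditor output and check it against the documented JSON shape.

    Hypothetical helper: raises ValueError on any deviation.
    """
    result = json.loads(raw)
    if not isinstance(result, dict):
        raise ValueError("audit output must be a single JSON object")

    verdict = result.get("verdict")
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")

    items = result.get("missing_items")
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError("missing_items must be a list of strings")

    if not isinstance(result.get("suggested_follow_up"), str):
        raise ValueError("suggested_follow_up must be a string (possibly empty)")

    # `satisfied` is documented as implying an empty missing_items list.
    if verdict == "satisfied" and items:
        raise ValueError("a satisfied verdict must have no missing_items")

    return result
```

Rejecting a `satisfied` verdict that still lists missing items keeps a contradictory audit from being reported as a pass.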
## How to run

Feed the original request, the final response, and (optionally) the tool trace
to the script. It uses Hermes' shared auxiliary client for a cheap audit:

```bash
echo '{"original_request": "<the user ask>", "final_response": "<your answer>", "tool_trace": "<optional>"}' \
  | python optional-skills/quality/self-critique/scripts/self_critique.py
```

Or from a file:

```bash
python optional-skills/quality/self-critique/scripts/self_critique.py --input audit.json
```

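Hand-writing the JSON payload in a shell string gets fragile once the request or response contains quotes or newlines. One way around that is to build the `--input` file from Python and let `json.dumps` handle the escaping. This is a sketch under assumptions: `write_audit_input` is a hypothetical helper, the trace is treated as a plain string, and whether the script accepts a payload with no `tool_trace` key is not confirmed by this document.

```python
import json
from pathlib import Path
from typing import Optional


def write_audit_input(
    path: str,
    original_request: str,
    final_response: str,
    tool_trace: Optional[str] = None,
) -> dict:
    """Write an audit input file using the three keys shown in the CLI example."""
    payload = {
        "original_request": original_request,
        "final_response": final_response,
    }
    # Assumption: the optional trace is simply left out when not provided.
    if tool_trace is not None:
        payload["tool_trace"] = tool_trace

    # json.dumps escapes quotes and newlines that would break a shell one-liner.
    Path(path).write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return payload
```

The resulting file can then be passed to the script with `--input`, as in the command above.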
You can also call it in-process and inject your own LLM function (used in
tests and when embedding). Put the `scripts/` dir on `sys.path` first (or run
from inside it):

```python
import sys

# Make the skill's scripts/ directory importable; the leading "..." stands for your checkout path.
sys.path.insert(0, ".../optional-skills/quality/self-critique/scripts")
from self_critique import critique

result = critique(original_request, final_response, tool_trace_json)
```

## Hard rules

- **Report only.** This skill never edits conversation history and never
  re-enters the agent loop on its own. Surface the verdict; let the user
  decide whether to act.
- **Opt-in.** It is not part of the default tool schema and must not run on
  every short turn.
- **No false confidence.** If the auditor is unavailable, return `unknown`;
  do not fabricate a `satisfied`.

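The "no false confidence" rule can be pictured as a thin wrapper around whatever performs the audit. The sketch below is illustrative only and does not describe how `self_critique.py` is implemented: `audit_or_unknown` and the two-argument `auditor` callable are hypothetical names.

```python
from typing import Callable

ALLOWED_VERDICTS = {"satisfied", "partial", "missing", "unknown"}


def unknown_result() -> dict:
    """Build a fresh `unknown` verdict in the documented output shape."""
    return {"verdict": "unknown", "missing_items": [], "suggested_follow_up": ""}


def audit_or_unknown(
    auditor: Callable[[str, str], dict],
    original_request: str,
    final_response: str,
) -> dict:
    """Run a hypothetical auditor callable; fall back to `unknown` on any failure."""
    try:
        result = auditor(original_request, final_response)
    except Exception:
        # Auditor unavailable or crashed: report that, never guess `satisfied`.
        return unknown_result()

    # A malformed answer is treated the same as no answer.
    if not isinstance(result, dict) or result.get("verdict") not in ALLOWED_VERDICTS:
        return unknown_result()

    # Report only: hand the verdict back; nothing here re-enters the agent loop.
    return result
```

Returning the verdict without retrying or editing anything matches the "report only" rule: the caller surfaces the result and the user decides what happens next.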
## What it checks

- Omitted or partially-satisfied explicit constraints.
- Scope drift (answered a narrower/different question than asked).
- Completion claims unsupported by the tool trace.