Commit 417a68c

Merge pull request #394 from Lexus2016/evolution/issue-372-self-critique-skill
feat(skills): self-critique quality optional-skill (#372)
2 parents a222b41 + ab33fb3 commit 417a68c

3 files changed

Lines changed: 622 additions & 0 deletions

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
---
name: self-critique
description: Audit a completed task against the user's original request before declaring done. Catches omitted constraints, misread scope, partially-met requirements, and unsupported "it's done" claims after long multi-tool loops. Opt-in quality gate; run it, report the verdict, never silently re-loop.
version: 1.0.0
author: "Nous Research (proposed by @dimokru, issue #372)"
license: Apache-2.0
platforms: [linux, macos, windows]
metadata:
  hermes:
    tags: [quality, self-critique, reflection, verification, audit, review, post-task]
    requires_toolsets: [terminal]
---

# Self-Critique

Audit a finished task against what was **originally asked**, not against how
good the final answer looks. Fluent, well-formatted responses routinely omit a
requested constraint, misread scope, or stop one step short, especially after
long multi-tool loops where the final message drifts from the initial ask.

## When to use

- Right before telling the user a non-trivial task is complete.
- After a long multi-tool run, a hand-off, or a plan with several steps.
- When invoked explicitly (CLI / cron / a `/self-critique`-style request).

Skip it for routine short answers; it adds noise without value there.

## Output shape

Always one JSON object:

```json
{
  "verdict": "satisfied | partial | missing | unknown",
  "missing_items": ["concise unmet requirement", "..."],
  "suggested_follow_up": "one short actionable sentence, or empty"
}
```

- `satisfied`: every explicit requirement met (`missing_items` empty).
- `partial`: some requirements met, some not.
- `missing`: the core ask is unmet.
- `unknown`: the audit could not run (no auditor available); never a guess.
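
A caller that consumes this object may want to check it before acting on it. The following is a minimal sketch of such a check, based only on the shape documented above; `validate_verdict` and `ALLOWED_VERDICTS` are illustrative names and are not part of `self_critique.py`.

```python
import json

# The four verdict values documented in "Output shape".
ALLOWED_VERDICTS = {"satisfied", "partial", "missing", "unknown"}


def validate_verdict(raw: str) -> dict:
    """Parse a verdict JSON string and check it against the documented shape."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("expected a single JSON object")

    verdict = obj.get("verdict")
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")

    items = obj.get("missing_items", [])
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError("missing_items must be a list of strings")

    # `satisfied` is documented as implying an empty missing_items list.
    if verdict == "satisfied" and items:
        raise ValueError("a satisfied verdict must have no missing_items")

    if not isinstance(obj.get("suggested_follow_up", ""), str):
        raise ValueError("suggested_follow_up must be a string")
    return obj
```

Rejecting a `satisfied` verdict that still lists missing items keeps a contradictory result from being reported as a pass.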

## How to run

Feed the original request, the final response, and (optionally) the tool trace
to the script. It uses Hermes' shared auxiliary client for a cheap audit:

```bash
echo '{"original_request": "<the user ask>", "final_response": "<your answer>", "tool_trace": "<optional>"}' \
  | python optional-skills/quality/self-critique/scripts/self_critique.py
```

Or from a file:

```bash
python optional-skills/quality/self-critique/scripts/self_critique.py --input audit.json
```
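
Hand-writing the JSON payload in a shell gets awkward once the request or response contains quotes or newlines. One way to sidestep that, sketched here under the assumption that the input file uses the same three keys as the `echo` example, is to build `audit.json` with Python's `json` module. The helper name `write_audit_input` is hypothetical.

```python
import json
from pathlib import Path


def write_audit_input(path, original_request, final_response, tool_trace=""):
    """Write an audit input file using the keys shown in the echo example."""
    payload = {
        "original_request": original_request,
        "final_response": final_response,
        "tool_trace": tool_trace,  # optional; empty string when there is no trace
    }
    # json.dumps handles quoting and escaping of embedded quotes and newlines.
    Path(path).write_text(
        json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return payload
```

For example, `write_audit_input("audit.json", user_ask, answer)` produces a file that can then be passed with `--input audit.json`.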

You can also call it in-process and inject your own LLM function (used in
tests and when embedding). Put the `scripts/` dir on `sys.path` first (or run
from inside it):

```python
import sys

# Make the skill's scripts/ directory importable (path elided in the original).
sys.path.insert(0, ".../optional-skills/quality/self-critique/scripts")
from self_critique import critique

result = critique(original_request, final_response, tool_trace_json)
```

## Hard rules

- **Report only.** This skill never edits conversation history and never
  re-enters the agent loop on its own. Surface the verdict; let the user
  decide whether to act.
- **Opt-in.** It is not part of the default tool schema and must not run on
  every short turn.
- **No false confidence.** If the auditor is unavailable, return `unknown`;
  do not fabricate a `satisfied`.
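
The "no false confidence" rule can be illustrated with a small wrapper around an injected auditor. This is a sketch only: `safe_audit` and the `llm` callable (prompt text in, raw JSON text out) are hypothetical and do not reflect the actual signature used by `self_critique.py`. Treating a failed call or malformed output as `unknown` is also an assumption here; the rule above only names the unavailable-auditor case.

```python
import json
from typing import Callable, Optional

VERDICTS = {"satisfied", "partial", "missing", "unknown"}


def _unknown() -> dict:
    """Build a fresh `unknown` result in the documented output shape."""
    return {"verdict": "unknown", "missing_items": [], "suggested_follow_up": ""}


def safe_audit(prompt: str, llm: Optional[Callable[[str], str]]) -> dict:
    """Return the auditor's verdict, or `unknown` when no usable verdict exists."""
    if llm is None:  # no auditor available
        return _unknown()
    try:
        result = json.loads(llm(prompt))
    except Exception:  # assumption: a failed or non-JSON reply is never guessed at
        return _unknown()
    if not isinstance(result, dict) or result.get("verdict") not in VERDICTS:
        return _unknown()
    return result
```

The wrapper only reports; it never retries or re-enters a loop, which keeps it in line with the report-only rule.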

## What it checks

- Omitted or partially-satisfied explicit constraints.
- Scope drift (answered a narrower/different question than asked).
- Completion claims unsupported by the tool trace.
