fix(evolve-lite): tighten learn skill to only extract high-signal guidelines (#122)
* fix(evolve-lite): tighten learn skill to only extract high-signal guidelines
Narrow guideline extraction to three categories: shortcuts (reducing
wasted steps), error prevention, and user corrections. Add explicit
exclusion list, quality gate checklist, and enforce that zero entities
is a valid output when no corrections occurred.
Also restrict entity_io.py to an allowlist of types (guideline,
preference) so the LLM can no longer invent types like "observation"
that store codebase facts instead of reasoning chain corrections.
* fix(bob): clarify entity count rule in learn skill
Addresses CodeRabbit review finding: Clarify entity count rule to avoid ambiguity
* fix(evolve-lite): guard against non-string entity type values
Add isinstance check before the allowlist membership test so
non-string or unhashable types fall back to "guideline" instead
of raising.
Addresses CodeRabbit review finding: Handle non-string type values before allowlist check
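The allowlist and the `isinstance` guard described in the commit message could look roughly like the sketch below. This is not the actual `entity_io.py` code: `ALLOWED_ENTITY_TYPES`, `DEFAULT_ENTITY_TYPE`, and `normalize_entity_type` are hypothetical names, and coercing a disallowed string such as "observation" to "guideline" (rather than rejecting it) is an assumption.

```python
# Hypothetical sketch; names and coercion behavior are assumptions,
# not the real entity_io.py implementation.
ALLOWED_ENTITY_TYPES = frozenset({"guideline", "preference"})
DEFAULT_ENTITY_TYPE = "guideline"


def normalize_entity_type(raw_type):
    """Return raw_type if it is an allowlisted string, else the default."""
    # The isinstance check runs first: testing an unhashable value
    # (for example a list) for membership in a set raises TypeError.
    if isinstance(raw_type, str) and raw_type in ALLOWED_ENTITY_TYPES:
        return raw_type
    return DEFAULT_ENTITY_TYPE
```

Because `and` short-circuits, the membership test is only evaluated for strings, so lists, dicts, and `None` fall through to the default without raising.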
-description: Analyze the current conversation to extract actionable entities — proactive recommendations derived from errors, failures, and successful patterns.
+description: Analyze the current conversation to extract guidelines that correct reasoning chains — reducing wasted steps, preventing errors, and capturing user preferences.
 ---

 # Entity Generator

 ## Overview

-This skill analyzes the current conversation to extract actionable entities that would help on similar tasks in the future. It **prioritizes errors** — tool failures, exceptions, wrong approaches, retry loops — and transforms them into proactive recommendations that prevent those errors from recurring.
+This skill analyzes the current conversation to extract guidelines that **correct the agent's reasoning chain**. A good guideline is one that, if known beforehand, would have led to a shorter or more correct execution. Only extract guidelines that fall into one of these three categories:

-## Workflow
-
-### Step 1: Analyze the Conversation
+1. **Shortcuts** — The agent took unnecessary steps or tried an approach that didn't work before finding the right one. The guideline encodes the direct path so future runs skip the detour.
+2. **Error prevention** — The agent hit an error (tool failure, exception, wrong output) that could be avoided with upfront knowledge. The guideline prevents the error from happening at all.
+3. **User corrections** — The user explicitly corrected, redirected, or stated a preference during the conversation. The guideline captures what the user said so the agent gets it right next time without being told.

-Identify from the current conversation:
+**Do NOT extract guidelines that are:**
+- General best practices the agent already knows (e.g., "use descriptive variable names")
+- Observations about the codebase that can be derived by reading the code
+- Restatements of what the agent did successfully without any detour or correction
+- Vague advice that wouldn't change the agent's behavior on a concrete task

-- **Task/Request**: What was the user asking for?
-- **What Worked**: Which approaches succeeded?
-- **What Failed**: Which approaches didn't work and why?
6. **Silent failures**: Actions that appeared to succeed but produced wrong results
+- **Wasted steps**: Where did the agent go down a path that turned out to be unnecessary? What would have been the direct route?
+- **Errors hit**: What errors occurred? What knowledge would have prevented them?
+- **User corrections**: Where did the user say "no", "not that", "actually", "I want", or otherwise redirect the agent?

-If no errors are found, extract entities from successful patterns instead.
+If none of these occurred, **output zero entities**. Not every conversation produces guidelines.

-### Step 3: Extract Entities
+### Step 2: Extract Entities

-Extract 3-5 proactive entities. **Prioritize entities derived from errors.**
+For each identified shortcut, error, or user correction, create one entity — up to 5 entities; output 0 when none qualify. If more candidates exist, keep only the highest-impact ones.

 Principles:

-1. **Reframe failures as proactive recommendations** — recommend what worked, not what to avoid
-   - Bad: "If exiftool fails, use PIL instead"
+1. **State what to do, not what to avoid** — frame as proactive recommendations
+   - Bad: "Don't use exiftool in sandboxes"
    - Good: "In sandboxed environments, use Python libraries (PIL/Pillow) for image metadata extraction"

 2. **Triggers should be situational context, not failure conditions**
    - Bad trigger: "When apt-get fails"
    - Good trigger: "When working in containerized/sandboxed environments"

-3. **For retry loops, recommend the final working approach directly** — eliminate trial-and-error by encoding the answer
+3. **For shortcuts, recommend the final working approach directly** — eliminate trial-and-error by encoding the answer
+
+4. **For user corrections, use the user's own words** — preserve the specific preference rather than generalizing it

-### Step 4: Save Entities
+### Step 3: Save Entities

-Output entities as JSON and pipe to the save script:
+Output entities as JSON and pipe to the save script. The `type` field must always be `"guideline"` — no other types are accepted.

 ```bash
 echo '{
@@ -72,12 +72,12 @@ The script will:
 - Deduplicate against existing entities
 - Display confirmation with the total count

-## Best Practices
+## Quality Gate
+
+Before saving, review each entity against this checklist:
+
+- [ ] Does it fall into one of the three categories (shortcut, error prevention, user correction)?
+- [ ] Would knowing this guideline beforehand have changed the agent's behavior in a concrete way?
+- [ ] Is it specific enough that another agent could act on it without further context?

-1. **Prioritize error-derived entities**: Errors are the highest-signal source of learnings
-2. **One error, one entity**: Each distinct error should produce one prevention entity
-3. **Be specific and actionable**: State what to do, not what to avoid
-4. **Include rationale**: Explain why the approach works
-5. **Use situational triggers**: Context-based, not failure-based
-6. **Limit to 3-5 entities**: Focus on the most impactful learnings
-7. **When more than 5 errors exist**: Merge errors with the same root cause, rank by severity > frequency > user impact, then keep the top 3-5
+If any answer is no, drop the entity. **Zero entities is a valid output.**
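The quality gate and the cap of 5 entities are written as instructions for the model, not as code. Read as a filter, they might behave like the sketch below. Everything here is hypothetical and for illustration only: the `Candidate` fields, the category labels, and the numeric `impact` score used to rank candidates do not appear in the skill.

```python
from dataclasses import dataclass

# Hypothetical labels for the three categories named in the skill.
CATEGORIES = {"shortcut", "error_prevention", "user_correction"}
MAX_ENTITIES = 5  # "up to 5 entities" from Step 2


@dataclass
class Candidate:
    content: str
    category: str           # assumed label, one of CATEGORIES if valid
    changes_behavior: bool  # checklist item 2
    is_specific: bool       # checklist item 3
    impact: int = 0         # assumed ranking score, higher is better


def apply_quality_gate(candidates):
    """Drop candidates that fail any checklist item, then keep the top 5."""
    passed = [
        c for c in candidates
        if c.category in CATEGORIES and c.changes_behavior and c.is_specific
    ]
    # Keep only the highest-impact candidates when more than 5 pass.
    passed.sort(key=lambda c: c.impact, reverse=True)
    return passed[:MAX_ENTITIES]
```

An empty result is a normal outcome here, matching the rule that zero entities is a valid output.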