```python
coordinator = AgenticCoordinator(
    ...,
    audit_interval=3,  # Run mid-refinement audit every N iterations (default: 3, 0 to disable)
)
chef = RuleChef(task, client, coordinator=coordinator)
chef.learn_rules()
```
A safety net re-evaluates after pruning and **reverts all changes** if F1 drops.

In the CLI: `learn --agentic --prune`.

### Critic Agent

Enable `enable_critic` to run an LLM critic before refinement. The critic acts like a human domain expert: it reviews the entire ruleset with per-rule metrics, false positive/negative examples, and per-class performance, then provides actionable feedback:
```python
coordinator = AgenticCoordinator(
    client,
    model="gpt-4o-mini",
    prune_after_learn=True,
    enable_critic=True,
    critic_interval=4,  # Run critic every N iterations (default: 4, 0 to disable)
)
```
The critic runs periodically during refinement (controlled by `critic_interval`) and writes feedback using the same mechanism as `add_feedback()`:

- **Rule-level feedback**: Specific advice per rule (e.g., "Narrow `\d+` by adding context for dollar amounts")
- **Task-level feedback**: Strategic guidance about class disambiguation and priority ordering

This feedback is automatically picked up by the patch prompt in subsequent refinement iterations. Critic feedback is tagged with `source="critic"` and refreshed each learning cycle.
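The interval gating described above can be sketched as a simple predicate (illustrative only; `should_run_critic` is a hypothetical helper, not part of the library's API):

```python
def should_run_critic(iteration: int, critic_interval: int = 4) -> bool:
    """Decide whether the critic runs on this refinement iteration."""
    if critic_interval == 0:  # 0 disables the critic entirely
        return False
    return iteration % critic_interval == 0
```

With the default interval of 4, the critic fires on iterations 4, 8, 12, and so on; `critic_interval=0` turns it off, matching the comment in the config example.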
## Custom Coordinators
Implement the `CoordinatorProtocol`:
```python
class MyCoordinator(CoordinatorProtocol):
    def audit_rules(self, rules, rule_metrics):
        """Return AuditResult with merge/remove actions. Default: no-op."""
```
From `docs/guide/learning.md`:
Incremental patching:

- Generates targeted rules for known failures
- Merges new rules into the existing ruleset
- **Deletes** underperforming rules when better replacements are provided
- Prunes weak rules that don't contribute
- Preserves stable rules that are working
During patching, the LLM can list rules in a `"deleted_rules"` field to remove them from the ruleset. This is used when a rule is too broad (high false positives) and the LLM provides narrower replacement rules in the same response.
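For illustration, a patch response using that field might look like the following (the exact schema and rule attributes are assumptions for this sketch, not the library's wire format):

```python
# Hypothetical patch payload: the LLM deletes an over-broad rule and
# supplies narrower replacements in the same response.
patch = {
    "rules": [
        {"id": "money_usd", "pattern": r"\$\d+(?:\.\d{2})?", "label": "MONEY"},
        {"id": "percent", "pattern": r"\d+(?:\.\d+)?%", "label": "PERCENT"},
    ],
    "deleted_rules": ["number_generic"],  # too broad: high false positives
}
```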
A patch is accepted if micro F1 stays within 0.5%, or if precision improves (higher precision at the cost of some recall is considered a net quality win). Otherwise the patch is rejected and the previous rules are kept.
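That acceptance rule reduces to a small predicate (a sketch that interprets "within 0.5%" as an absolute tolerance on F1; `accept_patch` is illustrative, not the library's function):

```python
def accept_patch(old_f1, new_f1, old_precision, new_precision, tol=0.005):
    """Accept if micro F1 stays within tolerance, or precision improves."""
    if new_f1 >= old_f1 - tol:
        return True
    # Trading some recall for higher precision still counts as a win.
    return new_precision > old_precision
```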
## Persistence
Rules and datasets are automatically saved to disk:
prompt=f"""You are an expert Rule Critic acting as a human domain expert. You are reviewing a rule-based {task.type.value} system and providing actionable feedback.
714
+
715
+
{task_section}
716
+
{perf_section}
717
+
{class_section}
718
+
{rules_section}
719
+
{fp_section}
720
+
{fn_section}
721
+
ANALYZE HOLISTICALLY:
722
+
1. Which rules cause the most harm and WHY? Show your reasoning.
723
+
2. Are there inter-class conflicts? (same text matched by rules for different types)
4. What patterns are MISSING for classes with low recall?
726
+
5. What would a human regex expert change about these patterns?
727
+
728
+
PROVIDE FEEDBACK:
729
+
- rule_feedback: For EACH problematic rule, provide SPECIFIC, ACTIONABLE advice.
730
+
Bad: "This rule is too broad" (vague)
731
+
Good: "Narrow \\d+ by adding word-boundary context: use (\\d+)\\s*(?:million|billion) for large numbers, and let MONEY/PERCENT rules handle $-prefixed and %-suffixed numbers by giving them higher priority"
732
+
- task_guidance: Strategic advice about the ENTIRE ruleset — class disambiguation strategy, priority ordering, what kinds of rules are missing.
733
+
734
+
Return JSON:
735
+
{{
736
+
"analysis": "1-2 sentence summary of the main issues",
737
+
"rule_feedback": {{
738
+
"rule_id": "Specific actionable advice for this rule..."
739
+
}},
740
+
"task_guidance": "Strategic guidance about the full ruleset..."
0 commit comments
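The "Return JSON" contract above means the critic's reply can be consumed with a plain `json.loads`; a sketch (the reply content here is invented for illustration):

```python
import json

# Example reply shaped like the prompt's "Return JSON" contract.
reply = """{
  "analysis": "NUMBER rules over-match money and percent spans.",
  "rule_feedback": {"number_generic": "Add word-boundary context around digits."},
  "task_guidance": "Give MONEY/PERCENT rules higher priority than NUMBER."
}"""

feedback = json.loads(reply)
rule_advice = feedback["rule_feedback"]  # maps rule_id -> actionable advice
```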