Commit d4322e3

Add eval fixture pack, rule-level metrics, and PR rule-hit analytics
1 parent b85b4c2 commit d4322e3

6 files changed: +515 -18 lines changed
README.md

Lines changed: 5 additions & 1 deletion
@@ -133,13 +133,17 @@ expect:
       contains: missing auth check
       severity: error
       category: security
-      rule_id: sec.auth.guard
+      rule_id: sec.auth.guard # label for per-rule precision/recall
+      require_rule_id: false # set true to require model-emitted RULE id
   must_not_find:
     - contains: style
   min_total: 1
   max_total: 8
 ```
 
+`diffscope eval` now reports per-rule precision/recall/F1 (micro and macro), and includes top rule-level TP/FP/FN counts in CLI and JSON output.
+Starter fixtures live in `eval/fixtures/repo_regressions`.
+
 ### Smart Review (Enhanced Analysis)
 ```bash
 # Get professional-grade analysis with confidence scoring
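
Note on the metrics named in the README change above: the following is a minimal Python sketch of how micro and macro precision/recall/F1 can be derived from per-rule TP/FP/FN counts. It is not the `diffscope` implementation; `RuleCounts`, `prf`, and `micro_macro` are hypothetical names, the counts are invented for illustration, and macro F1 is assumed to be the mean of per-rule F1 scores (one common convention).

```python
from dataclasses import dataclass


@dataclass
class RuleCounts:
    """True-positive, false-positive, and false-negative counts for one rule."""
    tp: int = 0
    fp: int = 0
    fn: int = 0


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1); an undefined ratio is reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def micro_macro(per_rule: dict[str, RuleCounts]) -> dict[str, tuple]:
    """Micro pools counts across rules; macro averages the per-rule scores."""
    micro = prf(
        sum(c.tp for c in per_rule.values()),
        sum(c.fp for c in per_rule.values()),
        sum(c.fn for c in per_rule.values()),
    )
    scores = [prf(c.tp, c.fp, c.fn) for c in per_rule.values()]
    n = len(scores) or 1  # avoid dividing by zero when no rules are labelled
    macro = tuple(sum(s[i] for s in scores) / n for i in range(3))
    return {"micro": micro, "macro": macro}


# Illustrative counts only, not measured results.
counts = {
    "sec.shell.injection": RuleCounts(tp=3, fp=1, fn=0),
    "reliability.unwrap_panic": RuleCounts(tp=1, fp=0, fn=1),
}
report = micro_macro(counts)
print(report["micro"], report["macro"])
```

Micro averaging weights every finding equally, so frequently triggered rules dominate; macro averaging weights every rule equally, which makes a weak but rare rule more visible.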

eval/fixtures/README.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Eval Fixtures
+
+Starter fixture set for `diffscope eval`.
+
+- `repo_regressions/` contains regression-style diffs based on realistic mistakes in this codebase.
+- Each fixture can include `rule_id` as a label for rule-level precision/recall metrics.
+- Set `require_rule_id: true` on a pattern if the rule id must be emitted by the model for a match.
+
+Run:
+
+```bash
+diffscope eval --fixtures eval/fixtures --output eval-report.json
+```
+
+Notes:
+- Fixtures call the configured model and API provider; they are not deterministic unit tests.
+- Treat this set as a baseline and tighten `must_find`/`must_not_find` thresholds over time.
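
Note on `require_rule_id`, as described in the fixtures README above: the sketch below shows one plausible reading of the flag, in Python. It is an assumption, not the project's matcher. `Pattern`, `Comment`, and `matches` are hypothetical names, and the case-insensitive substring test for `contains` is also assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pattern:
    """One must_find / must_not_find entry from a fixture (hypothetical shape)."""
    contains: Optional[str] = None
    file: Optional[str] = None
    rule_id: Optional[str] = None
    require_rule_id: bool = False


@dataclass
class Comment:
    """One review comment emitted by the model (hypothetical shape)."""
    file: str
    content: str
    rule_id: Optional[str] = None


def matches(pattern: Pattern, comment: Comment) -> bool:
    """Return True when the comment satisfies every constraint on the pattern."""
    if pattern.file is not None and pattern.file != comment.file:
        return False
    if pattern.contains is not None:
        # Assumed: substring match that ignores case.
        if pattern.contains.lower() not in comment.content.lower():
            return False
    if pattern.require_rule_id:
        # Strict mode: the model must have emitted the same rule id.
        return pattern.rule_id is not None and comment.rule_id == pattern.rule_id
    # Default mode: rule_id on the pattern is only a label for metrics.
    return True


label_only = Pattern(file="src/main.rs", contains="injection",
                     rule_id="sec.shell.injection")
strict = Pattern(file="src/main.rs", contains="injection",
                 rule_id="sec.shell.injection", require_rule_id=True)
untagged = Comment(file="src/main.rs", content="Possible shell injection here")
tagged = Comment(file="src/main.rs", content="Possible shell injection here",
                 rule_id="sec.shell.injection")

print(matches(label_only, untagged))  # True
print(matches(strict, untagged))      # False
print(matches(strict, tagged))        # True
```

Under this reading, leaving `require_rule_id: false` keeps a fixture tolerant of models that describe the problem correctly but omit the rule id.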
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+name: repo regression - missing RawComment rule_id initializer
+repo_path: ../../..
+diff_file: ./raw_comment_missing_field.patch
+expect:
+  must_find:
+    - file: src/main.rs
+      contains: rule_id
+      rule_id: compile.rawcomment.rule_id
+  min_total: 1
+  max_total: 8
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+name: repo regression - shell injection in debug helper
+repo_path: ../../..
+diff_file: ./shell_injection.patch
+expect:
+  must_find:
+    - file: src/main.rs
+      contains: injection
+      rule_id: sec.shell.injection
+  must_not_find:
+    - contains: style
+  min_total: 1
+  max_total: 12
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+name: repo regression - unwrap panic in confidence parser
+repo_path: ../../..
+diff_file: ./smart_confidence_unwrap.patch
+expect:
+  must_find:
+    - file: src/main.rs
+      contains: unwrap
+      rule_id: reliability.unwrap_panic
+  must_not_find:
+    - contains: style
+  min_total: 1
+  max_total: 10
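
Note on the `expect` blocks in the fixtures above: a rough Python sketch of how such a block could be checked against a list of review comments. This is an illustration under assumptions, not the `diffscope` evaluator. `check_fixture` is a hypothetical helper, only the `contains` field is modelled, and `min_total`/`max_total` are assumed to bound the total number of comments produced.

```python
def check_fixture(comments: list[str], must_find: list[str],
                  must_not_find: list[str],
                  min_total: int, max_total: int) -> list[str]:
    """Return human-readable failures; an empty list means the fixture passed."""
    failures = []
    lowered = [c.lower() for c in comments]

    # Every must_find substring has to appear in at least one comment.
    for needle in must_find:
        if not any(needle.lower() in c for c in lowered):
            failures.append(f"missing expected finding: {needle}")

    # No must_not_find substring may appear in any comment.
    for needle in must_not_find:
        if any(needle.lower() in c for c in lowered):
            failures.append(f"unexpected finding: {needle}")

    # Assumed meaning: bounds on the total number of comments.
    if not min_total <= len(comments) <= max_total:
        failures.append(
            f"total {len(comments)} outside [{min_total}, {max_total}]")
    return failures


# Values mirror the unwrap fixture; the comments themselves are invented.
ok = check_fixture(
    ["Calling unwrap() on the parsed confidence can panic"],
    must_find=["unwrap"], must_not_find=["style"], min_total=1, max_total=10)
bad = check_fixture(
    ["Style: prefer snake_case here"],
    must_find=["unwrap"], must_not_find=["style"], min_total=1, max_total=10)
print(ok)
print(bad)
```

Returning a list of failures rather than a single boolean keeps every violated expectation visible in one run, which helps when tightening thresholds over time.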
