Skip to content

Commit d3bcd79

Browse files
committed
refactor: expand carving backlog and split pattern matching
Add a broader refactor roadmap and separate eval pattern matching predicates from the top-level matcher so the next carving passes can stay smaller and easier to verify. Made-with: Cursor
1 parent bccc282 commit d3bcd79

5 files changed

Lines changed: 408 additions & 228 deletions

File tree

TODO.md

Lines changed: 143 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,3 +42,146 @@
4242
- [x] Keep `ReviewSession` focused on per-review state in `session.rs`.
4343
- [x] Update imports in `pipeline.rs`, `prepare.rs`, and `postprocess.rs`.
4444
- [x] Validate, commit, and push.
45+
46+
## Wave 3+ — ongoing carving backlog
47+
48+
### Working rules
49+
50+
- [x] Keep refactors behavior-preserving.
51+
- [x] Validate every checkpoint with `cargo fmt --check`, `cargo clippy --all-targets -- -D warnings`, `cargo test`, and `bash scripts/check-workflows.sh`.
52+
- [x] Commit and push after each validated slice.
53+
- [ ] Prefer files that still mix orchestration + parsing/formatting + persistence.
54+
- [ ] Prefer files that remain large after the first round of carving or that keep attracting unrelated edits.
55+
56+
### Recently completed checkpoints
57+
58+
- [x] Split `src/commands/eval/report.rs`.
59+
- [x] Split `src/commands/misc/lsp_check.rs`.
60+
- [x] Split `src/commands/smart_review.rs`.
61+
- [x] Split `src/commands/eval/thresholds/evaluation.rs`.
62+
- [x] Split `src/commands/eval/runner/execute.rs`.
63+
- [x] Split `src/commands/feedback_eval/report/build/aggregate.rs`.
64+
65+
### Immediate queue
66+
67+
- [x] `src/commands/eval/pattern/matching.rs`: split normalized rule-id helpers, matcher predicates, and focused matcher tests.
68+
- [ ] `src/commands/eval/metrics/rules.rs`: separate aggregate math, rule counting, and summary reduction helpers.
69+
- [ ] `src/commands/doctor/endpoint/inference.rs`: split request building, HTTP execution/error handling, and response parsing.
70+
- [ ] `src/commands/feedback_eval/report/build/stats.rs`: split threshold confusion-matrix scoring from bucket primitives.
71+
- [ ] `src/commands/doctor/command/display.rs`: separate header/config output, endpoint listing, and inference result rendering.
72+
- [ ] `src/commands/doctor/command/run.rs`: separate endpoint discovery, recommendation flow, and test helpers.
73+
- [ ] `src/commands/eval/runner/matching.rs`: split required-match search, unexpected-match detection, and rule metric assembly.
74+
- [ ] `src/commands/eval/runner/execute/loading.rs`: separate diff resolution from repo-path resolution if it grows again.
75+
- [ ] `src/commands/feedback_eval/report/examples.rs`: split ranking helpers from example builders.
76+
- [ ] `src/commands/doctor/system.rs`: carve environment probes vs output helpers.
77+
78+
### Commands backlog
79+
80+
- [ ] `src/commands/eval/types.rs`: split fixture, pattern, report, and run-option types if churn keeps touching unrelated structs.
81+
- [ ] `src/commands/feedback_eval/types.rs`: separate input payload types from report/output types.
82+
- [ ] `src/commands/feedback_eval/input/loading.rs`: split format detection from JSON parsing/loading.
83+
- [ ] `src/commands/feedback_eval/input/conversion.rs`: split review-session conversion from label normalization helpers.
84+
- [ ] `src/commands/pr.rs`: separate summary-only flow, full review flow, and comment-posting orchestration.
85+
- [ ] `src/commands/pr/gh.rs`: carve PR resolution, diff fetching, and metadata fetching.
86+
- [ ] `src/commands/git/suggest.rs`: split commit-message prompting from PR-title prompting and response extraction.
87+
- [ ] `src/commands/review/command.rs`: split review/check/compare entrypoints if they keep diverging.
88+
- [ ] `src/commands/misc/feedback/command.rs`: separate file loading/ID normalization from store persistence.
89+
- [ ] `src/commands/misc/feedback/apply.rs`: split acceptance/rejection counters from store mutation helpers.
90+
- [ ] `src/commands/misc/discussion/command.rs`: separate the interactive loop from single-shot execution.
91+
- [ ] `src/commands/misc/discussion/selection.rs`: split file loading/ID repair from selection rules.
92+
- [ ] `src/commands/misc/changelog.rs`: evaluate splitting changelog collection from output formatting.
93+
- [ ] `src/commands/eval/command.rs`: separate CLI option prep, fixture execution, and report lifecycle.
94+
- [ ] `src/commands/feedback_eval/command.rs`: separate input loading from report/output orchestration.
95+
96+
### Review pipeline backlog
97+
98+
- [ ] `src/review/pipeline/execution/responses/processing.rs`: split raw response normalization, comment extraction, and merge logic.
99+
- [ ] `src/review/pipeline/execution/responses/validation.rs`: separate schema validation from per-comment sanitization.
100+
- [ ] `src/review/pipeline/prepare/runner.rs`: split per-diff orchestration, pre-analysis/triage decisions, and progress updates.
101+
- [ ] `src/review/pipeline/context/symbols.rs`: split symbol search, snippet selection, and fallback behavior.
102+
- [ ] `src/review/pipeline/context/related.rs`: separate related-file discovery from ranking/selection.
103+
- [ ] `src/review/pipeline/guidance.rs`: carve guidance assembly, repo-support guidance, and prompt-facing formatting.
104+
- [ ] `src/review/pipeline/session.rs`: split session construction from runtime state transitions.
105+
- [ ] `src/review/pipeline/services.rs`: separate service wiring from optional feature initialization.
106+
- [ ] `src/review/pipeline/file_context/sources.rs`: split repo sources, symbol sources, and supplemental context sources.
107+
- [ ] `src/review/pipeline/comments.rs`: separate comment assembly, filtering, and metadata stamping.
108+
- [ ] `src/review/pipeline/postprocess/dedup.rs`: split duplicate detection, scoring, and merge/rewrite behavior.
109+
- [ ] `src/review/pipeline/postprocess/feedback.rs`: separate store lookups from suppression/annotation decisions.
110+
- [ ] `src/review/pipeline/execution/dispatcher.rs`: carve request scheduling, concurrency control, and result collection.
111+
- [ ] `src/review/pipeline.rs`: keep trimming top-level orchestration as helpers mature.
112+
113+
### Review helper backlog
114+
115+
- [ ] `src/review/rule_helpers/reporting.rs`: separate rendering/formatting from score/rationale helpers.
116+
- [ ] `src/review/rule_helpers/runtime.rs`: split runtime state, caching, and dispatch helpers.
117+
- [ ] `src/review/context_helpers/ranking.rs`: separate scoring inputs from final ranking/selection.
118+
- [ ] `src/review/context_helpers/pattern_repositories.rs`: split pattern loading, matching, and repo fallback logic.
119+
- [ ] `src/review/filters.rs`: carve severity/category filters, suppression filters, and dedup-like passes.
120+
- [ ] `src/review/feedback.rs`: split persistence, semantic examples, and suppression statistics.
121+
- [ ] `src/review/triage.rs`: separate heuristics, explanations, and scoring/reporting.
122+
- [ ] `src/review/compression.rs`: split chunking, summarization, and token-budget planning.
123+
- [ ] `src/review/verification/parser.rs`: separate parser stages and error handling.
124+
- [ ] `src/review/verification/prompt.rs`: split prompt assembly from example selection.
125+
126+
### Core backlog
127+
128+
- [ ] `src/config.rs`: split defaulting, loading, validation, migration, and path-resolution logic.
129+
- [ ] `src/core/comment.rs`: separate model types, ID generation, formatting helpers, and feedback-related transforms.
130+
- [ ] `src/core/symbol_index.rs`: carve command detection, indexing, retrieval, and language-map handling.
131+
- [ ] `src/core/symbol_graph.rs`: separate graph construction, traversal, and serialization helpers.
132+
- [ ] `src/core/semantic.rs`: split semantic extraction, matching, and persistence boundaries.
133+
- [ ] `src/core/pr_summary.rs`: carve stats calculation, prompt generation, response parsing, and diagram support.
134+
- [ ] `src/core/enhanced_review.rs`: split orchestration, prompt building, and response handling.
135+
- [ ] `src/core/eval_benchmarks.rs`: separate benchmark fixtures, thresholds, scoring, and aggregation.
136+
- [ ] `src/core/prompt.rs`: split prompt fragments, model-specific tuning, and reusable builders.
137+
- [ ] `src/core/context.rs`: separate context assembly, provenance handling, and formatting.
138+
- [ ] `src/core/offline.rs`: split endpoint/model probing, metadata parsing, and recommendation helpers.
139+
- [ ] `src/core/function_chunker.rs`: separate parsing, chunk planning, and scoring heuristics.
140+
- [ ] `src/core/agent_tools.rs`: carve tool registry, schema building, and execution adapters.
141+
- [ ] `src/core/agent_loop.rs`: separate loop orchestration, state transitions, and tool/result handling.
142+
- [ ] `src/core/code_summary.rs`: split summary planning, extraction, and formatting.
143+
- [ ] `src/core/changelog.rs`: separate git/history ingestion from final changelog rendering.
144+
- [ ] `src/core/multi_pass.rs`: split pass planning, execution, and result merging.
145+
- [ ] `src/core/composable_pipeline.rs`: separate stage wiring from execution semantics.
146+
- [ ] `src/core/convention_learner.rs`: split store persistence, scoring, and feedback ingestion.
147+
- [ ] `src/core/git_history.rs`: carve log collection, parsing, and summarization.
148+
- [ ] `src/core/diff_parser.rs`: separate unified/text diff parsing, hunk tracking, and post-processing.
149+
- [ ] `src/core/interactive.rs`: split REPL/input loop, commands, and output formatting.
150+
151+
### Server and storage backlog
152+
153+
- [ ] `src/server/api.rs`: split route handlers by domain plus shared request/response helpers.
154+
- [ ] `src/server/state.rs`: separate session state, queueing, and persistence coordination.
155+
- [ ] `src/server/storage_json.rs`: carve file I/O, indexing, migrations, and query helpers.
156+
- [ ] `src/server/storage_pg.rs`: separate SQL-backed persistence domains and query grouping.
157+
- [ ] `src/server/github.rs`: split webhook parsing, API interactions, and review-session orchestration.
158+
- [ ] `src/server/metrics.rs`: separate metric registration from event emission helpers.
159+
- [ ] `src/server/mod.rs`: keep top-level wiring thin as submodules mature.
160+
161+
### Adapters, parsing, and plugins backlog
162+
163+
- [ ] `src/adapters/llm.rs`: split request shaping, retry/policy logic, and response normalization.
164+
- [ ] `src/adapters/openai.rs`: carve request builders, streaming handling, and schema/response parsing.
165+
- [ ] `src/adapters/anthropic.rs`: carve request conversion, retries, and response parsing.
166+
- [ ] `src/adapters/ollama.rs`: separate local model capabilities, request building, and response parsing.
167+
- [ ] `src/adapters/common.rs`: split shared retry/auth/http helpers.
168+
- [ ] `src/parsing/llm_response.rs`: separate fenced-block parsing, comment extraction, and validation.
169+
- [ ] `src/parsing/smart_response.rs`: split structured smart-review parsing from fallbacks.
170+
- [ ] `src/plugins/builtin/secret_scanner.rs`: carve rule loading, scanning, and finding shaping.
171+
- [ ] `src/plugins/builtin/supply_chain.rs`: separate manifest parsing, registry lookups, and finding generation.
172+
- [ ] `src/plugins/builtin/eslint.rs`: split command execution, parser helpers, and finding conversion.
173+
- [ ] `src/plugins/builtin/semgrep.rs`: split command assembly, result parsing, and finding mapping.
174+
- [ ] `src/plugins/builtin/duplicate_filter.rs`: separate fingerprinting from suppression heuristics.
175+
- [ ] `src/plugins/plugin.rs`: split plugin traits/types from execution helpers.
176+
177+
### Output and entrypoint backlog
178+
179+
- [ ] `src/output/format.rs`: separate smart review formatting, patch output, and walkthrough generation.
180+
- [ ] `src/main.rs`: carve CLI wiring by command group and shared config/bootstrap helpers.
181+
- [ ] `src/vault.rs`: split vault discovery, parsing, and maintenance operations.
182+
183+
### Nice-to-have / monitor
184+
185+
- [ ] Revisit freshly split files once they cross roughly 150 LOC again, especially `src/commands/eval/pattern/matching.rs`, `src/commands/eval/metrics/rules.rs`, `src/commands/doctor/endpoint/inference.rs`, and `src/commands/feedback_eval/report/build/stats.rs`.
186+
- [ ] Keep module roots thin: if a root file only re-exports helpers, leave it alone until child files grow again.
187+
- [ ] Favor extracting pure helpers and test-only builders before moving async orchestration.
Lines changed: 6 additions & 228 deletions
Original file line numberDiff line numberDiff line change
@@ -1,228 +1,6 @@
1-
use regex::Regex;
2-
3-
use crate::core;
4-
use crate::review::normalize_rule_id;
5-
6-
use super::super::EvalPattern;
7-
8-
impl EvalPattern {
9-
pub(in super::super) fn matches(&self, comment: &core::Comment) -> bool {
10-
if self.is_empty() {
11-
return false;
12-
}
13-
14-
let content_lower = comment.content.to_ascii_lowercase();
15-
16-
if let Some(file) = &self.file {
17-
let file = file.trim();
18-
if !file.is_empty() {
19-
let candidate = comment.file_path.to_string_lossy();
20-
if !(candidate == file || candidate.ends_with(file)) {
21-
return false;
22-
}
23-
}
24-
}
25-
26-
if let Some(line) = self.line {
27-
if comment.line_number != line {
28-
return false;
29-
}
30-
}
31-
32-
if let Some(contains) = &self.contains {
33-
let needle = contains.trim().to_ascii_lowercase();
34-
if !needle.is_empty() && !content_lower.contains(&needle) {
35-
return false;
36-
}
37-
}
38-
39-
let contains_any: Vec<String> = self
40-
.contains_any
41-
.iter()
42-
.map(|value| value.trim().to_ascii_lowercase())
43-
.filter(|value| !value.is_empty())
44-
.collect();
45-
if !contains_any.is_empty()
46-
&& !contains_any
47-
.iter()
48-
.any(|needle| content_lower.contains(needle))
49-
{
50-
return false;
51-
}
52-
53-
let tags_any: Vec<&str> = self
54-
.tags_any
55-
.iter()
56-
.map(String::as_str)
57-
.map(str::trim)
58-
.filter(|value| !value.is_empty())
59-
.collect();
60-
if !tags_any.is_empty()
61-
&& !tags_any.iter().any(|expected| {
62-
comment
63-
.tags
64-
.iter()
65-
.any(|tag| tag.eq_ignore_ascii_case(expected))
66-
})
67-
{
68-
return false;
69-
}
70-
71-
if let Some(pattern) = self.matches_regex.as_deref().map(str::trim) {
72-
if !pattern.is_empty()
73-
&& !Regex::new(pattern)
74-
.map(|regex| regex.is_match(&comment.content))
75-
.unwrap_or(false)
76-
{
77-
return false;
78-
}
79-
}
80-
81-
if let Some(severity) = &self.severity {
82-
if !comment
83-
.severity
84-
.to_string()
85-
.eq_ignore_ascii_case(severity.trim())
86-
{
87-
return false;
88-
}
89-
}
90-
91-
if let Some(category) = &self.category {
92-
if !comment
93-
.category
94-
.to_string()
95-
.eq_ignore_ascii_case(category.trim())
96-
{
97-
return false;
98-
}
99-
}
100-
101-
if let Some(min_confidence) = self.confidence_at_least {
102-
if comment.confidence < min_confidence {
103-
return false;
104-
}
105-
}
106-
107-
if let Some(max_confidence) = self.confidence_at_most {
108-
if comment.confidence > max_confidence {
109-
return false;
110-
}
111-
}
112-
113-
if let Some(fix_effort) = &self.fix_effort {
114-
let expected = fix_effort.trim();
115-
if !expected.is_empty()
116-
&& !format!("{:?}", comment.fix_effort).eq_ignore_ascii_case(expected)
117-
{
118-
return false;
119-
}
120-
}
121-
122-
if let Some(rule_id) = &self.rule_id {
123-
if self.require_rule_id {
124-
let expected = rule_id.trim().to_ascii_lowercase();
125-
let actual = comment
126-
.rule_id
127-
.as_deref()
128-
.map(|value| value.trim().to_ascii_lowercase())
129-
.unwrap_or_default();
130-
if expected != actual {
131-
return false;
132-
}
133-
}
134-
}
135-
136-
true
137-
}
138-
139-
pub(in super::super) fn normalized_rule_id(&self) -> Option<String> {
140-
normalize_rule_id(self.rule_id.as_deref())
141-
}
142-
143-
fn is_empty(&self) -> bool {
144-
self.file.as_deref().map(str::trim).unwrap_or("").is_empty()
145-
&& self.line.is_none()
146-
&& self
147-
.contains
148-
.as_deref()
149-
.map(str::trim)
150-
.unwrap_or("")
151-
.is_empty()
152-
&& self
153-
.contains_any
154-
.iter()
155-
.all(|value| value.trim().is_empty())
156-
&& self
157-
.matches_regex
158-
.as_deref()
159-
.map(str::trim)
160-
.unwrap_or("")
161-
.is_empty()
162-
&& self
163-
.severity
164-
.as_deref()
165-
.map(str::trim)
166-
.unwrap_or("")
167-
.is_empty()
168-
&& self
169-
.category
170-
.as_deref()
171-
.map(str::trim)
172-
.unwrap_or("")
173-
.is_empty()
174-
&& self.tags_any.iter().all(|value| value.trim().is_empty())
175-
&& self.confidence_at_least.is_none()
176-
&& self.confidence_at_most.is_none()
177-
&& self
178-
.fix_effort
179-
.as_deref()
180-
.map(str::trim)
181-
.unwrap_or("")
182-
.is_empty()
183-
&& (!self.require_rule_id
184-
|| self
185-
.rule_id
186-
.as_deref()
187-
.map(str::trim)
188-
.unwrap_or("")
189-
.is_empty())
190-
}
191-
}
192-
193-
#[cfg(test)]
194-
mod tests {
195-
use super::*;
196-
use crate::core::comment::{Category, FixEffort, Severity};
197-
use std::path::PathBuf;
198-
199-
#[test]
200-
fn test_eval_pattern_matches_regex_tags_and_confidence() {
201-
let comment = core::Comment {
202-
id: "comment-1".to_string(),
203-
file_path: PathBuf::from("src/lib.rs"),
204-
line_number: 12,
205-
content: "Calling panic!(user_input) here can crash the request path".to_string(),
206-
rule_id: Some("panic.user-input".to_string()),
207-
severity: Severity::Warning,
208-
category: Category::Bug,
209-
suggestion: Some("Return an error instead of panicking".to_string()),
210-
confidence: 0.91,
211-
code_suggestion: None,
212-
tags: vec!["reliability".to_string(), "panic".to_string()],
213-
fix_effort: FixEffort::Low,
214-
feedback: None,
215-
};
216-
217-
let pattern = EvalPattern {
218-
contains_any: vec!["panic".to_string(), "unwrap".to_string()],
219-
matches_regex: Some("panic!\\([^)]*user_input[^)]*\\)".to_string()),
220-
tags_any: vec!["security".to_string(), "reliability".to_string()],
221-
confidence_at_least: Some(0.9),
222-
fix_effort: Some("low".to_string()),
223-
..Default::default()
224-
};
225-
226-
assert!(pattern.matches(&comment));
227-
}
228-
}
1+
#[path = "matching/predicates.rs"]
2+
mod predicates;
3+
#[path = "matching/rule_id.rs"]
4+
mod rule_id;
5+
#[path = "matching/run.rs"]
6+
mod run;

0 commit comments

Comments
 (0)