Commit a4d574c

feat: surface learning-loop analytics and rule sources
Turn the Greptile research into a concrete 100-item roadmap and expose the first productized workflow surfaces so feedback coverage and shared rule sources are visible without manual config edits.

Made-with: Cursor
1 parent 17d0976 commit a4d574c

3 files changed (+472, -143 lines)

TODO.md

Lines changed: 154 additions & 142 deletions
```diff
@@ -1,146 +1,158 @@
-# Deep Refactor TODO
+# Deep Research Improvement Roadmap
+
+This roadmap is derived from deep research into Greptile's public docs, blog, MCP surface, self-hosted architecture, and GitHub repos, then mapped onto DiffScope's current architecture and gaps.
+
+## Research Signals
+
+- Greptile treats review as a full-codebase intelligence product, not just a PR comment bot.
+- Their learning loop is explicit: thumbs, replies, and addressed/not-addressed outcomes reshape future comments.
+- Their `v3` review flow is agentic and tool-using, not a rigid single-pass flowchart.
+- They productize workflow state: unresolved comments, review completeness, weekly reports, merge readiness.
+- They pull in external intent via Jira/Notion/Docs and cross-repo context via pattern repositories.
+- They expose review operations back into IDE/agent workflows through MCP and skills.
+- They sell an operational platform: self-hosted, queued workflows, analytics, and enterprise controls.
 
 ## Working Rules
 
-- Keep refactors behavior-preserving.
-- Validate every checkpoint with `cargo fmt --check`, `cargo clippy --all-targets -- -D warnings`, `cargo test`, and `bash scripts/check-workflows.sh`.
+- Keep changes additive and behavior-preserving unless an item explicitly requires workflow changes.
+- Validate each checkpoint with `cargo fmt --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test`, `bash scripts/check-workflows.sh`, `npm --prefix web run lint`, `npm --prefix web run build`, and `npm --prefix web run test` when frontend code changes.
 - Commit and push after each validated slice.
-- Prefer extracting pure helpers and formatter/parsing boundaries before moving async orchestration.
-- Keep module roots thin; if a root becomes mostly re-exports, let children carry the logic.
-
-## Improvement Queue
-
-- [ ] `src/commands/eval/`
-- Add suite/category/language baseline comparisons so regressions are gated by dimension, not only whole-run totals.
-- Add model-matrix and repeat execution support so the same suite can be compared across frontier models and flake-checked.
-- Capture failed-run artifacts, including emitted comments, verifier warnings, and per-fixture mismatch details.
-- Reduce fixture brittleness with semantic/alias expectation matching instead of exact wording dependence.
-- Extend trend history with suite/category/language series plus verifier-health counters and model/provider labels.
-- Expand `review-depth-core` with authz, supply-chain, and async-correctness benchmark packs.
-- [ ] `src/commands/feedback_eval/`
-- Correlate feedback calibration with eval-suite category performance and rule-level precision/recall.
-- Surface high-confidence but frequently rejected categories/rules so review quality gaps are obvious.
-
-## Immediate Queue
-
-- [ ] `src/core/semantic.rs`
-- Split source-file discovery and excerpt/query builders from index refresh bookkeeping.
-- Split semantic diff retrieval and feedback-example matching from feedback-store maintenance.
-- [ ] `src/core/symbol_index.rs`
-- Split LSP command detection and extension scanning from index-building entry points.
-- Split regex-based symbol extraction and dependency-hint parsing from graph/file-summary registration.
-- Split `LspClient` protocol transport from symbol-result decoding and path/URI utilities.
-- Keep `build()` and `build_with_lsp()` as thin orchestration entry points.
-
-## Core Backlog
-
-- [ ] `src/core/semantic.rs`
-- Split semantic chunk hashing/key generation from summary/excerpt assembly.
-- Split changed-range filtering and per-query match scoring from context chunk rendering.
-- Split feedback fingerprint helpers from feedback-store reconciliation.
-- [ ] `src/config.rs`
-- Split defaults/model-role conversion from load/deserialize paths.
-- Split env/path resolution from validation/migration logic.
-- Split serialization-focused test helpers from production config code.
-- [ ] `src/core/symbol_index.rs`
-- Split language-pattern tables and path candidate expansion from dependency resolution.
-- Split file collection and byte-size filtering from index population.
-- Split symbol graph and reverse-dependency registration from symbol storage.
-- Split LSP symbol collection/range extraction from request/notification plumbing.
-- [ ] `src/core/symbol_graph.rs`
-- Split graph construction from traversal/query helpers.
-- Split serialization/persistence helpers from graph algorithms.
-- [ ] `src/core/pr_summary.rs`
-- Split stats aggregation, prompt generation, response parsing, and diagram helpers.
-- [ ] `src/core/enhanced_review.rs`
-- Split context construction, guidance generation, and response handling.
-- [ ] `src/core/eval_benchmarks.rs`
-- Split fixture loading, threshold selection, scoring, and aggregation/reporting.
-- [ ] `src/core/prompt.rs`
-- Split prompt fragments, model-specific tuning, and reusable prompt builders.
-- [ ] `src/core/context.rs`
-- Split context chunk construction, provenance helpers, and formatting/rendering.
-- [ ] `src/core/offline.rs`
-- Split endpoint/model probing, metadata parsing, and recommendation helpers.
-- [ ] `src/core/function_chunker.rs`
-- Split parsing, chunk planning, and scoring heuristics.
-- [ ] `src/core/agent_tools.rs`
-- Split tool registry/definitions from execution adapters and tool-context helpers.
-- [ ] `src/core/agent_loop.rs`
-- Split loop orchestration, state transitions, and tool/result handling.
-- [ ] `src/core/code_summary.rs`
-- Split summary planning, extraction, cache helpers, and formatting.
-- [ ] `src/core/changelog.rs`
-- Split git/history ingestion from final changelog rendering.
-- [ ] `src/core/multi_pass.rs`
-- Split pass planning, execution bookkeeping, and result merging.
-- [ ] `src/core/composable_pipeline.rs`
-- Split stage wiring from execution semantics and result transport.
-- [ ] `src/core/convention_learner.rs`
-- Split store persistence, scoring, and feedback ingestion helpers.
-- [ ] `src/core/git_history.rs`
-- Split log collection, parsing, and summarization.
-- [ ] `src/core/diff_parser.rs`
-- Split unified diff parsing, text diff parsing, hunk assembly, and post-processing helpers.
-- [ ] `src/core/interactive.rs`
-- Split REPL/input loop, commands, and output formatting.
-
-## Server and Storage Backlog
-
-- [ ] `src/server/api.rs`
-- Split route handlers by domain plus shared request/response and error helpers.
-- [ ] `src/server/state.rs`
-- Split session state, queueing, and persistence coordination.
-- [ ] `src/server/storage_json.rs`
-- Split file I/O, indexing, migrations, and query helpers.
-- [ ] `src/server/storage_pg.rs`
-- Split SQL-backed persistence by domain and query grouping.
-- [ ] `src/server/github.rs`
-- Split webhook parsing, API interactions, and review-session orchestration.
-- [ ] `src/server/metrics.rs`
-- Split metric registration from event emission helpers.
-- [ ] `src/server/mod.rs`
-- Keep top-level wiring thin as submodules mature.
-
-## Adapters, Parsing, and Plugins Backlog
-
-- [ ] `src/adapters/llm.rs`
-- Split request shaping, retry/policy logic, and response normalization.
-- [ ] `src/adapters/openai.rs`
-- Split request builders, streaming handling, and schema/response parsing.
-- [ ] `src/adapters/anthropic.rs`
-- Split request conversion, retries, and response parsing.
-- [ ] `src/adapters/ollama.rs`
-- Split local model capabilities, request building, and response parsing.
-- [ ] `src/adapters/common.rs`
-- Split shared retry/auth/http helpers.
-- [ ] `src/parsing/llm_response.rs`
-- Split fenced-block parsing, comment extraction, structured JSON handling, and validation.
-- [ ] `src/parsing/smart_response.rs`
-- Split structured smart-review parsing from fallback parsing paths.
-- [ ] `src/plugins/builtin/secret_scanner.rs`
-- Split rule loading, scanning, and finding shaping.
-- [ ] `src/plugins/builtin/supply_chain.rs`
-- Split manifest parsing, registry lookups, and finding generation.
-- [ ] `src/plugins/builtin/eslint.rs`
-- Split command execution, parser helpers, and finding conversion.
-- [ ] `src/plugins/builtin/semgrep.rs`
-- Split command assembly, result parsing, and finding mapping.
-- [ ] `src/plugins/builtin/duplicate_filter.rs`
-- Split fingerprinting from suppression heuristics.
-- [ ] `src/plugins/plugin.rs`
-- Split plugin traits/types from execution helpers.
-
-## Output and Entrypoint Backlog
-
-- [ ] `src/output/format.rs`
-- Split smart review formatting, patch output, and walkthrough generation.
-- [ ] `src/main.rs`
-- Split CLI wiring by command group and shared config/bootstrap helpers.
-- [ ] `src/vault.rs`
-- Split vault discovery, parsing, and maintenance operations.
-
-## Ongoing Watchlist
-
-- [ ] Revisit freshly split files once they cross roughly 150 LOC again, especially `src/review/pipeline/execution/dispatcher/job.rs`, `src/review/pipeline/session/build.rs`, `src/review/pipeline/services/support.rs`, and `src/review/pipeline/postprocess/feedback/lookup.rs`.
-- [ ] Keep module roots thin; if a root becomes only re-exports plus tests, leave it alone until children regrow.
+- Prefer turning existing primitives into first-class product surfaces before inventing brand new subsystems.
+- Optimize for independent validation, tight feedback loops, and high-signal comments over superficial feature parity.
+
+## 1. Feedback, Memory, and Outcomes
+
+1. [ ] Add first-class comment outcome states beyond thumbs: `new`, `accepted`, `rejected`, `addressed`, `stale`, `auto_fixed`.
+2. [ ] Infer "addressed by later commit" by diffing follow-up pushes against the original commented lines.
+3. [ ] Feed addressed/not-addressed outcomes into the reinforcement store alongside thumbs.
+4. [ ] Separate false-positive rejections from "valid but won't fix" dismissals in stored feedback.
+5. [ ] Weight reinforcement by reviewer role or trust level when GitHub identity is available.
+6. [ ] Add rule-level reinforcement decay so old team preferences do not dominate forever.
+7. [ ] Add path-scoped reinforcement buckets so teams can prefer different standards in `tests/`, `scripts/`, and production code.
+8. [ ] Persist explanation text from follow-up feedback replies and mine it into reusable review guidance.
+9. [ ] Learn "preferred phrasing" for accepted comments so comment tone and specificity improve over time.
+10. [ ] Backfill existing stored reviews into the new outcome-aware feedback store for cold-start reduction.
+
```
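Items 1, 3, and 6 imply a reinforcement scoring model without fixing one. A minimal sketch, assuming hypothetical names throughout (`CommentOutcome`, `signal`, `decay_weight`, and `rule_score` are illustrations, not DiffScope code): each outcome maps to an assumed +1/-1/0 signal, and an exponential half-life decay keeps old team preferences from dominating a rule's score.

```rust
/// Hypothetical lifecycle states mirroring roadmap item 1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommentOutcome {
    New,
    Accepted,
    Rejected,
    Addressed,
    Stale,
    AutoFixed,
}

impl CommentOutcome {
    /// Assumed reinforcement signal: +1 when the finding proved useful,
    /// -1 when it was rejected, 0 while undecided or stale.
    pub fn signal(self) -> f64 {
        match self {
            CommentOutcome::Accepted | CommentOutcome::Addressed | CommentOutcome::AutoFixed => 1.0,
            CommentOutcome::Rejected => -1.0,
            CommentOutcome::New | CommentOutcome::Stale => 0.0,
        }
    }
}

/// Exponential decay: an outcome loses half its weight every `half_life_days`.
pub fn decay_weight(age_days: f64, half_life_days: f64) -> f64 {
    0.5_f64.powf(age_days / half_life_days)
}

/// Decayed reinforcement score for one rule, given (outcome, age in days) events.
pub fn rule_score(events: &[(CommentOutcome, f64)], half_life_days: f64) -> f64 {
    events
        .iter()
        .map(|&(outcome, age_days)| outcome.signal() * decay_weight(age_days, half_life_days))
        .sum()
}
```

With a 30-day half-life, an acceptance today and a rejection from 30 days ago net a score of 0.5, so recent feedback outweighs older disagreement without erasing it.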
```diff
+## 2. Review Lifecycle and Merge Readiness
+
+11. [ ] Track unresolved vs resolved findings for PR reviews as a first-class lifecycle state.
+12. [ ] Add review completeness metrics: total findings, acknowledged findings, fixed findings, stale findings.
+13. [ ] Compute merge-readiness summaries for GitHub PR reviews using severity, unresolved count, and verification state.
+14. [ ] Add stale-review detection when new commits land after the latest completed review.
+15. [ ] Show "needs re-review" state in review detail and history pages for incremental PR workflows.
+16. [ ] Distinguish informational findings from blocking findings in lifecycle and readiness calculations.
+17. [ ] Add "critical blockers" summary cards for unresolved `Error` and `Warning` comments.
+18. [ ] Add per-PR readiness timelines showing when a review became mergeable.
+19. [ ] Store resolution timestamps for findings so mean-time-to-fix can be measured.
+20. [ ] Add CLI and API surfaces to query PR readiness without opening the web UI.
+
```
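Items 13, 14, 16, and 17 together describe a readiness rule. One possible reading, sketched with hypothetical types (`Finding`, `Readiness`, `merge_readiness`) and an assumed precedence: a stale head wins over everything, then unresolved blockers, then a pending verification pass. The `Info` severity is an assumption; only `Error` and `Warning` appear in the roadmap.

```rust
/// Hypothetical severities: `Error` and `Warning` come from item 17, `Info` is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Error,
    Warning,
    Info,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Finding {
    pub severity: Severity,
    pub resolved: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Readiness {
    /// New commits landed after the reviewed head (item 14).
    NeedsReReview,
    /// Unresolved blocking findings remain (items 16 and 17).
    Blocked { blockers: usize },
    /// No blockers, but the verification pass has not completed (item 13).
    NeedsVerification,
    Ready,
}

pub fn merge_readiness(findings: &[Finding], verified: bool, reviewed_sha: &str, head_sha: &str) -> Readiness {
    // Findings computed against an older head may no longer describe the PR.
    if reviewed_sha != head_sha {
        return Readiness::NeedsReReview;
    }
    // Informational findings never block; only unresolved Error/Warning do.
    let blockers = findings
        .iter()
        .filter(|f| !f.resolved && matches!(f.severity, Severity::Error | Severity::Warning))
        .count();
    if blockers > 0 {
        Readiness::Blocked { blockers }
    } else if !verified {
        Readiness::NeedsVerification
    } else {
        Readiness::Ready
    }
}
```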
```diff
+## 3. Agentic Validation Loops
+
+21. [ ] Build a first-class `fix until clean` loop that can run review, apply fixes, rerun review, and stop on convergence.
+22. [ ] Reuse the existing DAG runtime to model iterative review/fix loops as resumable workflow nodes.
+23. [ ] Add a max-iteration policy and loop budget controls for autonomous review convergence.
+24. [ ] Add "issue replay" prompts that hand unresolved findings back to a coding agent with file-local context.
+25. [ ] Add a handoff contract from reviewer findings to fix agents with rule IDs, evidence, and suggested diffs.
+26. [ ] Persist loop-level telemetry: iterations, fixes attempted, findings cleared, findings reopened.
+27. [ ] Add "challenge the finding" verification loops where a validator tries to falsify a suspected issue before keeping it.
+28. [ ] Add caching between iterations so repeated codebase retrieval and verification runs are cheaper.
+29. [ ] Allow loop policies to differ by profile: conservative auditor, high-autonomy fixer, or report-only.
+30. [ ] Add eval fixtures specifically for loop convergence and reopened-issue regressions.
+
```
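Items 21 and 23 call for a loop that stops on convergence or on budget. A minimal sketch, assuming the review and fix steps can be modeled as plain closures and that "the finding count did not drop" is an acceptable convergence signal; `fix_until_clean` and `LoopOutcome` are hypothetical, and the DAG runtime from item 22 is not modeled here.

```rust
/// Hypothetical result of a bounded review/fix loop (items 21 and 23).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LoopOutcome {
    /// A review came back with no findings.
    Clean { fix_rounds: usize },
    /// A fix round did not reduce the finding count, so more rounds are unlikely to help.
    Stalled { fix_rounds: usize, remaining: usize },
    /// The fix-round budget ran out before convergence.
    BudgetExhausted { fix_rounds: usize, remaining: usize },
}

/// Alternates `review` and `fix` until the review is clean, progress stalls,
/// or `max_fix_rounds` fix rounds have been spent.
pub fn fix_until_clean<R, F>(mut review: R, mut fix: F, max_fix_rounds: usize) -> LoopOutcome
where
    R: FnMut() -> Vec<String>,
    F: FnMut(&[String]),
{
    let mut previous: Option<usize> = None;
    let mut fix_rounds = 0;
    loop {
        let findings = review();
        let remaining = findings.len();
        if remaining == 0 {
            return LoopOutcome::Clean { fix_rounds };
        }
        // Convergence guard: the last fix round cleared nothing on net.
        if matches!(previous, Some(prev) if remaining >= prev) {
            return LoopOutcome::Stalled { fix_rounds, remaining };
        }
        if fix_rounds == max_fix_rounds {
            return LoopOutcome::BudgetExhausted { fix_rounds, remaining };
        }
        previous = Some(remaining);
        fix(&findings);
        fix_rounds += 1;
    }
}
```

Comparing raw counts is a coarse guard: a round that fixes one finding and reopens another looks stalled, which is why item 26's reopened-findings telemetry would be the better long-term signal.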
```diff
+## 4. Code Graph and Repository Intelligence
+
+31. [ ] Turn the current symbol graph into a persisted repository graph with durable storage and reload support.
+32. [ ] Add caller/callee expansion APIs for multi-hop impact analysis from changed symbols.
+33. [ ] Add contract edges between interfaces, implementations, and API endpoints.
+34. [ ] Add "similar implementation" lookup so repeated patterns and divergences are explicit.
+35. [ ] Add cross-file blast-radius summaries to findings when a change affects many callers.
+36. [ ] Add graph freshness/version metadata so reviews know whether they are using stale repository intelligence.
+37. [ ] Add graph-backed ranking of related files before semantic RAG retrieval.
+38. [ ] Add graph query traces to `dag_traces` or review artifacts for explainability and debugging.
+39. [ ] Add graph-aware eval fixtures that require multi-hop code understanding to pass.
+40. [ ] Split `src/core/symbol_graph.rs` into construction, persistence, traversal, and ranking modules as it grows.
+
```
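Items 32 and 35 describe multi-hop impact analysis from changed symbols. A sketch, assuming a hypothetical in-memory reverse call graph (callee to callers) rather than the persisted graph of item 31: a breadth-first search with a hop limit yields every transitively affected caller and its distance, which is the raw material for a blast-radius summary. `CallerGraph` and `blast_radius` are illustrative names.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical reverse call graph: each symbol maps to the symbols that call it.
pub type CallerGraph = HashMap<String, Vec<String>>;

/// Breadth-first expansion from the changed symbols. Returns every transitively
/// affected caller with its hop distance, stopping after `max_hops` hops.
/// The changed symbols themselves are not included in the result.
pub fn blast_radius(graph: &CallerGraph, changed: &[&str], max_hops: usize) -> HashMap<String, usize> {
    let mut distance: HashMap<String, usize> = HashMap::new();
    // `seen` prevents revisiting symbols, which also makes call cycles safe.
    let mut seen: HashSet<&str> = changed.iter().copied().collect();
    let mut queue: VecDeque<(&str, usize)> = changed.iter().map(|s| (*s, 0)).collect();

    while let Some((symbol, hops)) = queue.pop_front() {
        if hops >= max_hops {
            continue;
        }
        for caller in graph.get(symbol).into_iter().flatten() {
            if seen.insert(caller.as_str()) {
                distance.insert(caller.clone(), hops + 1);
                queue.push_back((caller.as_str(), hops + 1));
            }
        }
    }
    distance
}
```

Because BFS visits symbols in order of distance, each caller is recorded at its shortest hop count, so a summary can rank direct callers above distant ones.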
```diff
+## 5. External Context and Pattern Repositories
+
+41. [x] Surface pattern repository sources in the Settings UI with validation and defaults.
+42. [x] Surface review rule file sources in the Settings UI instead of requiring config edits by hand.
+43. [ ] Add structured UI editing for custom context notes, files, and scopes.
+44. [ ] Add per-path scoped review instructions in the Settings UI for common repo areas.
+45. [ ] Support Jira/Linear issue context ingestion for PR-linked reviews.
+46. [ ] Support document-backed context ingestion for design docs, RFCs, and runbooks.
+47. [ ] Add explicit "intent mismatch" review checks comparing PR changes to ticket acceptance criteria.
+48. [ ] Add review artifacts that show which external context sources influenced a finding.
+49. [ ] Add tests for pattern repository resolution across local paths, Git URLs, and broken sources.
+50. [ ] Add analytics on which context sources actually improve acceptance and fix rates.
+
+## 6. Review UX and Workflow Integration
+
+51. [ ] Add visible accepted/rejected/dismissed badges to comments throughout the UI, not just icon state.
+52. [ ] Add comment grouping by unresolved, fixed, stale, and informational sections in `ReviewView`.
+53. [ ] Add a "show only blockers" mode for large reviews.
+54. [ ] Add keyboard actions for thumbs, resolve, and jump-to-next-finding workflows.
+55. [ ] Add file-level readiness summaries in the diff sidebar.
+56. [ ] Add lifecycle-aware PR summaries that explain what still blocks merge.
+57. [ ] Add a "train the reviewer" callout when thumbs coverage on a review is low.
+58. [ ] Add review-change comparisons so users can diff one review run against the next on the same PR.
+59. [ ] Add better surfacing for incremental PR reviews so users know when only the delta was reviewed.
+60. [ ] Add discussion workflows that can convert repeated human comments into candidate rules or context snippets.
+
+## 7. Analytics, Reporting, and Quality Dashboards
+
+61. [x] Add feedback coverage metrics: percent of findings with thumbs or explicit disposition.
+62. [x] Add acceptance/rejection trend lines over time for recent reviews.
+63. [x] Add top accepted categories/rules and top rejected categories/rules to Analytics.
+64. [ ] Add unresolved blocker counts per repository and per PR.
+65. [ ] Add review completeness and mean-time-to-resolution charts.
+66. [ ] Add feedback-learning effectiveness metrics: did reranked findings get higher acceptance after rollout?
+67. [ ] Add pattern-repository utilization analytics showing when extra context actually affected findings.
+68. [ ] Add eval-vs-production dashboards comparing benchmark strength against real-world acceptance.
+69. [ ] Add drill-downs from trend charts directly into the affected reviews, findings, and rules.
+70. [ ] Add exportable JSON/CSV reports for review quality, lifecycle, and reinforcement metrics.
+
```
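The metrics marked done in items 61 and 62 can be expressed as simple ratios. This sketch uses hypothetical types (`Reaction`, `FindingFeedback`) and is not the code shipped in this commit: coverage is the share of findings with any reaction or disposition, and acceptance is thumbs-up over all thumbs, bucketed into fixed windows of days. Treating dismissals as coverage but not as rejections is an assumption, in the spirit of item 4.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-finding reaction; `Dismissed` stands in for an explicit disposition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reaction {
    NoReaction,
    ThumbsUp,
    ThumbsDown,
    Dismissed,
}

#[derive(Debug, Clone, Copy)]
pub struct FindingFeedback {
    /// Days since some fixed epoch; used only for bucketing.
    pub day: u32,
    pub reaction: Reaction,
}

/// Item 61: percent of findings with thumbs or an explicit disposition.
pub fn feedback_coverage_pct(findings: &[FindingFeedback]) -> Option<f64> {
    if findings.is_empty() {
        return None;
    }
    let covered = findings.iter().filter(|f| f.reaction != Reaction::NoReaction).count();
    Some(100.0 * covered as f64 / findings.len() as f64)
}

/// Item 62: acceptance rate per `bucket_days` window, thumbs-up / (thumbs-up + thumbs-down).
/// Buckets with no thumbs at all are omitted rather than reported as 0.
pub fn acceptance_trend(findings: &[FindingFeedback], bucket_days: u32) -> BTreeMap<u32, f64> {
    assert!(bucket_days > 0, "bucket_days must be positive");
    let mut counts: BTreeMap<u32, (u32, u32)> = BTreeMap::new();
    for f in findings {
        let (up, down) = counts.entry(f.day / bucket_days).or_default();
        match f.reaction {
            Reaction::ThumbsUp => *up += 1,
            Reaction::ThumbsDown => *down += 1,
            Reaction::NoReaction | Reaction::Dismissed => {}
        }
    }
    counts
        .into_iter()
        .filter(|&(_, (up, down))| up + down > 0)
        .map(|(bucket, (up, down))| (bucket, f64::from(up) / f64::from(up + down)))
        .collect()
}
```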
```diff
+## 8. APIs, Automation, and MCP-Like Surfaces
+
+71. [ ] Expose unresolved/resolved comment search through the HTTP API.
+72. [ ] Expose PR readiness through the HTTP API for CI and agent integrations.
+73. [ ] Add API endpoints to fetch learned rules, attention gaps, and top rejected patterns.
+74. [ ] Add machine-friendly APIs to fetch findings grouped by severity, file, and lifecycle state.
+75. [ ] Add a "trigger re-review" API that reuses existing PR metadata and loop policy.
+76. [ ] Add APIs for comment resolution and lifecycle updates, not just thumbs.
+77. [ ] Add an MCP server for DiffScope with review, analytics, and rule-management tools.
+78. [ ] Add reusable agent skills/workflows for checking PR readiness and running fix loops.
+79. [ ] Add signed webhook or event-stream integration for downstream automation consumers.
+80. [ ] Add rate-limited API auth and audit trails for automation-heavy deployments.
+
```
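Item 80 asks for rate-limited API auth without naming an algorithm. A token bucket per API key is one common choice, swapped in here rather than taken from the roadmap; time is passed in explicitly so the sketch stays deterministic and testable. `TokenBucket` and `RateLimiter` are hypothetical names, and audit trails are not covered.

```rust
use std::collections::HashMap;

/// Token bucket: allows bursts up to `capacity`, then `refill_per_sec` requests per second.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_secs: f64,
}

impl TokenBucket {
    pub fn new(capacity: f64, refill_per_sec: f64, now_secs: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_secs: now_secs }
    }

    /// Refills for the elapsed time (ignoring clock jumps backward),
    /// then spends one token if one is available.
    pub fn try_acquire(&mut self, now_secs: f64) -> bool {
        let elapsed = (now_secs - self.last_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_secs = self.last_secs.max(now_secs);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// One bucket per API key, created full on first use.
pub struct RateLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<String, TokenBucket>,
}

impl RateLimiter {
    pub fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    pub fn allow(&mut self, api_key: &str, now_secs: f64) -> bool {
        let (capacity, refill) = (self.capacity, self.refill_per_sec);
        self.buckets
            .entry(api_key.to_string())
            .or_insert_with(|| TokenBucket::new(capacity, refill, now_secs))
            .try_acquire(now_secs)
    }
}
```

Keying buckets by API key isolates automation clients from each other: a chatty CI bot exhausts its own budget without throttling an agent integration.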
```diff
+## 9. Infra, Self-Hosting, and Enterprise Operations
+
+81. [ ] Split `src/server/api.rs` by domain so the growing platform API stays maintainable.
+82. [ ] Split `src/server/state.rs` into session lifecycle, persistence, progress, and GitHub coordination modules.
+83. [ ] Add queue depth and worker saturation metrics for long-running review and eval jobs.
+84. [ ] Add retention policies for review artifacts, eval artifacts, and trend histories.
+85. [ ] Add storage migrations for richer comment lifecycle and reinforcement schemas.
+86. [ ] Add deployment docs for self-hosted review + analytics + trend retention setups.
+87. [ ] Add secret-management guidance and validation for multi-provider enterprise installs.
+88. [ ] Add background jobs for recomputing analytics after schema or scoring changes.
+89. [ ] Add cost dashboards by provider/model/role for review, verification, and eval workloads.
+90. [ ] Add failure forensics bundles for self-hosted users when review or eval jobs degrade.
+
```
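Item 84 leaves the retention rule open. A sketch of one possible policy, assuming each artifact carries a creation day: drop anything older than a maximum age, but always keep the newest few so a quiet repository does not lose all of its history. `Artifact` and `apply_retention` are hypothetical.

```rust
/// Hypothetical stored artifact (review, eval, or trend snapshot).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Artifact {
    pub id: String,
    /// Creation time in whole days since some fixed epoch.
    pub created_day: u32,
}

/// Keeps artifacts at most `max_age_days` old, but never drops the
/// `keep_latest` newest ones. Returns the survivors, newest first.
pub fn apply_retention(
    mut artifacts: Vec<Artifact>,
    today: u32,
    max_age_days: u32,
    keep_latest: usize,
) -> Vec<Artifact> {
    // Newest first, so the protected entries are the first `keep_latest`.
    artifacts.sort_by(|a, b| b.created_day.cmp(&a.created_day));
    artifacts
        .into_iter()
        .enumerate()
        .filter(|(rank, a)| *rank < keep_latest || today.saturating_sub(a.created_day) <= max_age_days)
        .map(|(_, a)| a)
        .collect()
}
```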
```diff
+## 10. Eval, Benchmarking, and Model Governance
+
+91. [ ] Add eval fixtures for external-context alignment, not just diff-local correctness.
+92. [ ] Add eval fixtures for merge-readiness judgments and unresolved-blocker classification.
+93. [ ] Add eval fixtures for addressed-vs-stale finding lifecycle inference.
+94. [ ] Add eval fixtures for multi-hop graph reasoning across call chains and contract edges.
+95. [ ] Add eval runs that compare single-pass review against agentic loop review.
+96. [ ] Add production replay evals using anonymized accepted/rejected review outcomes.
+97. [ ] Add leaderboard reporting for reviewer usefulness metrics, not just precision/recall.
+98. [ ] Add regression gates for feedback coverage, verifier health, and lifecycle-state accuracy.
+99. [ ] Add model-routing policies that explicitly separate generation, verification, and auditing roles.
+100. [ ] Publish a repeatable "independent auditor" benchmark story in the UI and CLI so DiffScope's differentiation is measurable.
+
```
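Item 98's regression gates could compare each tracked metric against a stored baseline. A minimal sketch, assuming metrics are named ratios held in a `BTreeMap` and a single absolute tolerance applies to all of them; `regression_gate` and the metric names in the usage are illustrative, not DiffScope's.

```rust
use std::collections::BTreeMap;

/// Returns the baseline metrics that dropped by more than `max_drop`
/// or are missing from the current run, in name order.
pub fn regression_gate(
    baseline: &BTreeMap<&str, f64>,
    current: &BTreeMap<&str, f64>,
    max_drop: f64,
) -> Vec<String> {
    baseline
        .iter()
        .filter_map(|(name, &base)| match current.get(name) {
            // Within tolerance (or improved): this metric passes.
            Some(&now) if base - now <= max_drop => None,
            // Regressed too far, or the metric was not reported at all.
            _ => Some(name.to_string()),
        })
        .collect()
}
```

Treating a missing metric as a failure keeps a silently dropped counter, such as a verifier-health series that stopped being emitted, from passing the gate by omission.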
```diff
+## Current Execution Slice
+
+- [x] Rewrite this roadmap into the active backlog and keep it updated as slices ship.
+- [x] Productize the learning loop in Analytics with reaction coverage and acceptance trends.
+- [x] Surface repository rule sources and pattern repository sources in Settings.
+- [ ] Commit and push each validated checkpoint before moving to the next epic.
```
