Commit 393432e

feat(ops): add self-hosted diagnostics and replay evals
1 parent e436bd7 commit 393432e

15 files changed, +1638 -74 lines changed
README.md

Lines changed: 1 addition & 0 deletions
@@ -194,6 +194,7 @@ git diff | diffscope review --model custom-model
 ### Self-Hosted / Local Models
 
 Run DiffScope against a local LLM with zero cloud dependencies. No API key required.
+For the server deployment path with persistent analytics, retention, secret-management guidance, and forensics bundles, see [`docs/self-hosting.md`](docs/self-hosting.md).
 
 #### Ollama (Recommended)
 ```bash

TODO.md

Lines changed: 4 additions & 4 deletions
@@ -131,11 +131,11 @@ This roadmap is derived from deep research into Greptile's public docs, blog, MC
 83. [x] Add queue depth and worker saturation metrics for long-running review and eval jobs.
 84. [x] Add retention policies for review artifacts, eval artifacts, and trend histories.
 85. [x] Add storage migrations for richer comment lifecycle and reinforcement schemas.
-86. [ ] Add deployment docs for self-hosted review + analytics + trend retention setups.
-87. [ ] Add secret-management guidance and validation for multi-provider enterprise installs.
+86. [x] Add deployment docs for self-hosted review + analytics + trend retention setups.
+87. [x] Add secret-management guidance and validation for multi-provider enterprise installs.
 88. [x] Add background jobs for recomputing analytics after schema or scoring changes.
 89. [x] Add cost dashboards by provider/model/role for review, verification, and eval workloads.
-90. [ ] Add failure forensics bundles for self-hosted users when review or eval jobs degrade.
+90. [x] Add failure forensics bundles for self-hosted users when review or eval jobs degrade.
 
 ## 10. Eval, Benchmarking, and Model Governance
 
@@ -144,7 +144,7 @@ This roadmap is derived from deep research into Greptile's public docs, blog, MC
 93. [x] Add eval fixtures for addressed-vs-stale finding lifecycle inference.
 94. [x] Add eval fixtures for multi-hop graph reasoning across call chains and contract edges.
 95. [x] Add eval runs that compare single-pass review against agentic loop review.
-96. [ ] Add production replay evals using anonymized accepted/rejected review outcomes.
+96. [x] Add production replay evals using anonymized accepted/rejected review outcomes.
 97. [x] Add leaderboard reporting for reviewer usefulness metrics, not just precision/recall.
 98. [x] Add regression gates for feedback coverage, verifier health, and lifecycle-state accuracy.
 99. [x] Add model-routing policies that explicitly separate generation, verification, and auditing roles.

docs/self-hosting.md

Lines changed: 187 additions & 0 deletions
@@ -0,0 +1,187 @@
+# Self-Hosting DiffScope
+
+This guide covers the server deployment path for DiffScope when you want persistent reviews, analytics artifacts, trend history, and operational diagnostics.
+
+## Recommended deployment path
+
+- Use the Helm chart in `charts/diffscope/` for long-running server deployments.
+- Use `docker-compose.yml` for local CLI + Ollama workflows only. The compose file runs `diffscope review`, not `diffscope serve`.
+- The container image itself defaults to `serve`, so Kubernetes/Helm is the intended self-hosted server path.
+
+## Minimum server checklist
+
+1. Mount a persistent volume at `/home/diffscope/.local/share/diffscope`.
+2. Keep the working tree at `/workspace` when you want server-side Git and PR workflows.
+3. Set a shared API key with `DIFFSCOPE_SERVER_API_KEY` for protected mutation routes.
+4. Point every persisted analytics artifact into the mounted data directory.
+5. Schedule `diffscope eval` and `diffscope feedback-eval` if you want trend charts to stay fresh.
+
+## Persistent artifact layout
+
+The server persists and reads these files during normal operation:
+
+| Purpose | Default path | Recommendation for Helm |
+| --- | --- | --- |
+| Review/event storage | `~/.local/share/diffscope/reviews.json` | keep default |
+| Learned conventions | `~/.local/share/diffscope/conventions.json` | keep default |
+| Feedback store | `.diffscope.feedback.json` | move under `/home/diffscope/.local/share/diffscope/feedback.json` |
+| Eval trend history | `.diffscope.eval-trend.json` | move under `/home/diffscope/.local/share/diffscope/eval-trend.json` |
+| Feedback-eval trend history | `.diffscope.feedback-eval-trend.json` | move under `/home/diffscope/.local/share/diffscope/feedback-eval-trend.json` |
+| Production replay pack | `~/.local/share/diffscope/eval/production_replay/replay.json` | keep default |
+| Failure forensics bundles | `~/.local/share/diffscope/forensics/...` | keep default |
+
+The Helm PVC only covers `/home/diffscope/.local/share/diffscope`, so the relative default paths for `feedback_path`, `eval_trend_path`, and `feedback_eval_trend_path` are not durable unless you override them.
+
+## Example Helm config
+
+```yaml
+diffscope:
+  model: claude-opus-4-6
+  adapter: anthropic
+  baseUrl: https://api.anthropic.com
+
+gitRepo:
+  enabled: true
+  repository: https://github.com/your-org/your-repo.git
+  branch: main
+
+persistence:
+  enabled: true
+  size: 20Gi
+
+config:
+  configFile: |
+    model: claude-opus-4-6
+    model_reasoning: openai/o3
+    providers:
+      anthropic:
+        enabled: true
+      openrouter:
+        enabled: true
+    feedback_path: /home/diffscope/.local/share/diffscope/feedback.json
+    eval_trend_path: /home/diffscope/.local/share/diffscope/eval-trend.json
+    feedback_eval_trend_path: /home/diffscope/.local/share/diffscope/feedback-eval-trend.json
+    retention:
+      review_max_age_days: 30
+      review_max_count: 1000
+      eval_artifact_max_age_days: 30
+      trend_history_max_entries: 200
+
+extraEnv:
+  DIFFSCOPE_SERVER_API_KEY: ${DIFFSCOPE_SERVER_API_KEY}
+  ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
+  OPENROUTER_API_KEY: ${OPENROUTER_API_KEY}
+```
+
+Notes:
+
+- `providers.<name>.api_key` and provider-specific environment variables are the recommended way to run mixed-provider installs.
+- `openai/o3` and other non-Anthropic `vendor/model` ids route through OpenRouter unless you explicitly force another adapter.
+- `config.configFile` is mounted at `/workspace/.diffscope.yml`; it is easiest to use with `gitRepo.enabled: true` so the working directory already matches `/workspace`.
+
+## Secret-management guidance
+
+For enterprise installs, prefer external secret injection over storing credentials in `.diffscope.yml`.
+
+### Recommended secret sources
+
+- `ANTHROPIC_API_KEY`
+- `OPENAI_API_KEY`
+- `OPENROUTER_API_KEY`
+- `DIFFSCOPE_SERVER_API_KEY`
+- `GITHUB_TOKEN`
+- `DIFFSCOPE_GITHUB_APP_ID`
+- `DIFFSCOPE_GITHUB_PRIVATE_KEY`
+- `DIFFSCOPE_JIRA_BASE_URL`
+- `DIFFSCOPE_JIRA_EMAIL`
+- `DIFFSCOPE_JIRA_API_TOKEN`
+- `DIFFSCOPE_LINEAR_API_KEY`
+
+### Validation behavior
+
+`diffscope doctor`, the server doctor endpoint, and startup warnings now surface configuration issues for:
+
+- mixed-provider installs that still rely on legacy top-level `api_key` / `base_url` / `adapter`
+- missing provider-specific API keys for selected cloud providers
+- incomplete GitHub App config (`github_app_id` + `github_private_key` must be paired)
+- incomplete Jira config (`jira_base_url`, `jira_email`, `jira_api_token` must be paired)
+- incomplete Vault config (`vault_addr`, `vault_path`, `vault_token`)
+
+### Vault caveat
+
+Vault currently resolves only the legacy top-level `api_key`. For multi-provider installs, use provider-specific secrets from your runtime environment or secret store injection.
+
+## Analytics and retention
+
+### What populates Analytics
+
+- `/api/events` and `/api/events/stats` come from stored reviews and wide events.
+- `/api/analytics/learned-rules` reads the convention store.
+- `/api/analytics/rejected-patterns` reads the feedback store.
+- `/api/analytics/trends` and `/api/analytics/attention-gaps` read the eval and feedback-eval trend JSON files.
+
+### How to keep trend data fresh
+
+Run these on a schedule against a persisted artifact directory:
+
+```bash
+diffscope eval --fixtures eval/fixtures --artifact-dir /var/lib/diffscope/eval
+diffscope feedback-eval --input /home/diffscope/.local/share/diffscope/reviews.json --eval-report /var/lib/diffscope/eval/report.json
+```
+
+### Retention controls
+
+- `retention.review_max_age_days`
+- `retention.review_max_count`
+- `retention.eval_artifact_max_age_days`
+- `retention.trend_history_max_entries`
+
+Completed reviews apply review retention automatically. Eval artifact pruning happens when you run `diffscope eval --artifact-dir ...`.
+
+### Analytics recompute job
+
+Use these protected endpoints after scoring or schema changes:
+
+```text
+POST /api/analytics/recompute
+GET /api/analytics/recompute/{job_id}
+```
+
+The recompute job refreshes stored review summaries and event aggregates. It does not rebuild eval or feedback-eval trend files.
+
+## Operations and diagnostics
+
+### Health checks
+
+- `diffscope doctor`
+- `GET /api/status`
+- `GET /api/doctor`
+- `GET /metrics`
+- Helm `test-connection`
+
+### Failure forensics bundles
+
+DiffScope now writes JSON forensics bundles for degraded review and eval runs.
+
+- Review bundles: `~/.local/share/diffscope/forensics/reviews/<review-id>/`
+- Eval bundles: `~/.local/share/diffscope/forensics/eval/...` or `<artifact-dir>/forensics/...`
+
+Review manifests are available from the protected endpoint:
+
+```text
+GET /api/review/{id}/forensics
+```
+
+### Production replay evals
+
+Accepted and rejected review outcomes are captured into an anonymized replay pack at:
+
+```text
+~/.local/share/diffscope/eval/production_replay/replay.json
+```
+
+Run it with the normal eval command:
+
+```bash
+diffscope eval --fixtures ~/.local/share/diffscope/eval/production_replay
+```

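The persistence caveat in `docs/self-hosting.md` above (the Helm PVC only covers `/home/diffscope/.local/share/diffscope`, so the relative defaults for `feedback_path`, `eval_trend_path`, and `feedback_eval_trend_path` are not durable) can be illustrated with a small check. This is a minimal sketch and not code from this commit: `is_durable_path` and `PVC_MOUNT` are hypothetical names, and it assumes "durable" simply means an absolute path that sits lexically under the mount point.

```rust
use std::path::Path;

/// Mount point of the Helm PVC, as stated in docs/self-hosting.md.
const PVC_MOUNT: &str = "/home/diffscope/.local/share/diffscope";

/// Hypothetical helper (not part of DiffScope): returns true when `configured`
/// is an absolute path located under the PVC mount. Relative paths resolve
/// against the working directory, so they are treated as not durable.
/// The comparison is lexical and component-wise; `..` segments and symlinks
/// are not resolved.
fn is_durable_path(configured: &str) -> bool {
    let path = Path::new(configured);
    path.is_absolute() && path.starts_with(PVC_MOUNT)
}
```

Under these assumptions, the overridden `feedback_path` from the example Helm config passes the check, while the relative default `.diffscope.feedback.json` does not.
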
src/commands/doctor/command/display/config.rs

Lines changed: 22 additions & 0 deletions
@@ -29,6 +29,28 @@ pub(in super::super) fn print_configuration(config: &Config) {
     if let Some(cw) = config.context_window {
         println!(" Context: {cw} tokens");
     }
+    println!(
+        " Primary Provider: {}",
+        config
+            .resolved_provider_for_role(crate::config::ModelRole::Primary)
+            .provider
+            .as_deref()
+            .unwrap_or("(auto-detect)")
+    );
+    let validation_issues = config.validation_issues();
+    if !validation_issues.is_empty() {
+        println!(" Validation:");
+        for issue in validation_issues {
+            println!(
+                " - {}: {}",
+                match issue.level {
+                    crate::config::ConfigValidationIssueLevel::Warning => "warning",
+                    crate::config::ConfigValidationIssueLevel::Error => "error",
+                },
+                issue.message,
+            );
+        }
+    }
     println!();
 }


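The doctor hunk above prints each validation issue as a level label followed by its message. The sketch below shows how one such issue might be produced and rendered, using the GitHub App pairing rule from `docs/self-hosting.md` as the example. It is an illustration under stated assumptions, not the commit's implementation: `IssueLevel`, `Issue`, `check_github_app`, and `render_issue` are hypothetical stand-ins for `ConfigValidationIssueLevel` and the real issue type, and the choice of `Warning` as the severity and the message wording are guesses.

```rust
/// Mirrors the two severity levels matched in the doctor output.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IssueLevel {
    Warning,
    Error,
}

/// Hypothetical stand-in for the config validation issue type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Issue {
    level: IssueLevel,
    message: String,
}

/// Assumed pairing rule: the GitHub App id and private key must be set
/// together. Returns an issue only when exactly one of the two is present.
fn check_github_app(app_id: Option<&str>, private_key: Option<&str>) -> Option<Issue> {
    match (app_id, private_key) {
        (Some(_), None) | (None, Some(_)) => Some(Issue {
            // Severity is an assumption; the commit does not show which level is used.
            level: IssueLevel::Warning,
            message: "github_app_id and github_private_key must be set together".to_string(),
        }),
        _ => None,
    }
}

/// Renders one issue in the same "- level: message" shape the doctor command prints.
fn render_issue(issue: &Issue) -> String {
    let level = match issue.level {
        IssueLevel::Warning => "warning",
        IssueLevel::Error => "error",
    };
    format!("- {}: {}", level, issue.message)
}
```

Returning `Option<Issue>` from each check keeps the rules independent, so a caller can collect the results of several checks into one list before printing.
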
src/commands/eval/command.rs

Lines changed: 1 addition & 0 deletions
@@ -66,6 +66,7 @@ pub async fn eval_command(
         options.artifact_dir.as_deref(),
     );
     emit_eval_report(
+        &config,
         execution.results,
         report_output_path.as_deref(),
         prepared_options,

src/commands/eval/command/batch.rs

Lines changed: 1 addition & 0 deletions
@@ -201,6 +201,7 @@ pub(super) async fn run_eval_batch(
         .as_ref()
         .map(|dir| dir.join("report.json"));
     let report = materialize_eval_report(
+        &run_config,
         execution.results,
         report_output_path.as_deref(),
         prepared_options.clone(),

src/commands/eval/command/report.rs

Lines changed: 39 additions & 2 deletions
@@ -9,13 +9,21 @@ use super::super::{EvalFixtureResult, EvalReport, EvalRunMetadata};
 use super::options::PreparedEvalOptions;
 
 pub(super) async fn emit_eval_report(
+    config: &crate::config::Config,
     results: Vec<EvalFixtureResult>,
     output_path: Option<&Path>,
     prepared_options: PreparedEvalOptions,
     run_metadata: EvalRunMetadata,
 ) -> Result<()> {
-    let report =
-        materialize_eval_report(results, output_path, prepared_options, run_metadata, true).await?;
+    let report = materialize_eval_report(
+        config,
+        results,
+        output_path,
+        prepared_options,
+        run_metadata,
+        true,
+    )
+    .await?;
 
     if let Some(message) = evaluation_failure_message(&report) {
         anyhow::bail!("{}", message);
@@ -25,6 +33,7 @@ pub(super) async fn emit_eval_report(
 }
 
 pub(super) async fn materialize_eval_report(
+    config: &crate::config::Config,
     results: Vec<EvalFixtureResult>,
     output_path: Option<&Path>,
     prepared_options: PreparedEvalOptions,
@@ -48,5 +57,33 @@ pub(super) async fn materialize_eval_report(
         update_eval_quality_trend(&report, path, prepared_options.trend_max_entries).await?;
     }
 
+    if report.fixtures_failed > 0
+        || !report.warnings.is_empty()
+        || !report.threshold_failures.is_empty()
+    {
+        let trigger = if report.fixtures_failed > 0 {
+            "eval_failures"
+        } else if !report.threshold_failures.is_empty() {
+            "eval_threshold_failures"
+        } else {
+            "eval_warnings"
+        };
+        let artifact_dir = report
+            .run
+            .artifact_dir
+            .as_ref()
+            .map(std::path::PathBuf::from);
+        let manifest = crate::forensics::write_eval_forensics_bundle(
+            config,
+            crate::forensics::EvalForensicsBundleInput {
+                trigger: trigger.to_string(),
+                report: report.clone(),
+                artifact_dir,
+            },
+        )
+        .await?;
+        println!("Forensics bundle: {}", manifest.root_path);
+    }
+
     Ok(report)
 }

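The last hunk in `report.rs` above decides whether to write a forensics bundle and which trigger label to attach. That precedence (failed fixtures first, then threshold failures, then warnings) can be isolated as a pure function, which makes it easy to test without touching the filesystem. This is a sketch, not a refactor taken from the commit: `forensics_trigger` is a hypothetical name, and the `&[String]` parameter types are an assumption, since the field types of `EvalReport` are not shown in the diff.

```rust
/// Returns the forensics trigger label for an eval run, or `None` when the
/// run is clean and no bundle should be written. Precedence follows the
/// report.rs hunk: failed fixtures, then threshold failures, then warnings.
fn forensics_trigger(
    fixtures_failed: usize,
    threshold_failures: &[String],
    warnings: &[String],
) -> Option<&'static str> {
    if fixtures_failed > 0 {
        Some("eval_failures")
    } else if !threshold_failures.is_empty() {
        Some("eval_threshold_failures")
    } else if !warnings.is_empty() {
        Some("eval_warnings")
    } else {
        None
    }
}
```

A run that has both failed fixtures and warnings is labeled `eval_failures`, matching the order of the `if` / `else if` chain in the hunk.
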
src/commands/mod.rs

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ mod smart_review;
 
 pub(crate) use dag::{build_dag_catalog, describe_dag_graph, plan_dag_graph, DagGraphSelection};
 pub use doctor::doctor_command;
+pub(crate) use eval::EvalReport;
 pub use eval::{eval_command, EvalRunOptions};
 pub use feedback_eval::feedback_eval_command;
 pub use git::{git_command, GitCommands};
