evalops
diff --git a/‎README.md‎
Lines changed: 1 addition & 0 deletions b/‎README.md‎
diff --git a/‎TODO.md‎
Lines changed: 4 additions & 4 deletions b/‎TODO.md‎
diff --git a/‎docs/self-hosting.md‎
Lines changed: 187 additions & 0 deletions b/‎docs/self-hosting.md‎
diff --git a/‎src/commands/doctor/command/display/config.rs‎
Lines changed: 22 additions & 0 deletions b/‎src/commands/doctor/command/display/config.rs‎
diff --git a/‎src/commands/eval/command.rs‎
Lines changed: 1 addition & 0 deletions b/‎src/commands/eval/command.rs‎
diff --git a/‎src/commands/eval/command/batch.rs‎
Lines changed: 1 addition & 0 deletions b/‎src/commands/eval/command/batch.rs‎
diff --git a/‎src/commands/eval/command/report.rs‎
Lines changed: 39 additions & 2 deletions b/‎src/commands/eval/command/report.rs‎
diff --git a/‎src/commands/mod.rs‎
Lines changed: 1 addition & 0 deletions b/‎src/commands/mod.rs‎
@@ -194,6 +194,7 @@ git diff | diffscope review --model custom-model
 ### Self-Hosted / Local Models
 
 Run DiffScope against a local LLM with zero cloud dependencies. No API key required.
+For the server deployment path with persistent analytics, retention, secret-management guidance, and forensics bundles, see [`docs/self-hosting.md`](docs/self-hosting.md).
 
 #### Ollama (Recommended)
 ```bash
 
@@ -131,11 +131,11 @@ This roadmap is derived from deep research into Greptile's public docs, blog, MC
 83. [x] Add queue depth and worker saturation metrics for long-running review and eval jobs.
 84. [x] Add retention policies for review artifacts, eval artifacts, and trend histories.
 85. [x] Add storage migrations for richer comment lifecycle and reinforcement schemas.
-86. [ ] Add deployment docs for self-hosted review + analytics + trend retention setups.
-87. [ ] Add secret-management guidance and validation for multi-provider enterprise installs.
+86. [x] Add deployment docs for self-hosted review + analytics + trend retention setups.
+87. [x] Add secret-management guidance and validation for multi-provider enterprise installs.
 88. [x] Add background jobs for recomputing analytics after schema or scoring changes.
 89. [x] Add cost dashboards by provider/model/role for review, verification, and eval workloads.
-90. [ ] Add failure forensics bundles for self-hosted users when review or eval jobs degrade.
+90. [x] Add failure forensics bundles for self-hosted users when review or eval jobs degrade.
 
 ## 10. Eval, Benchmarking, and Model Governance
 
@@ -144,7 +144,7 @@ This roadmap is derived from deep research into Greptile's public docs, blog, MC
 93. [x] Add eval fixtures for addressed-vs-stale finding lifecycle inference.
 94. [x] Add eval fixtures for multi-hop graph reasoning across call chains and contract edges.
 95. [x] Add eval runs that compare single-pass review against agentic loop review.
-96. [ ] Add production replay evals using anonymized accepted/rejected review outcomes.
+96. [x] Add production replay evals using anonymized accepted/rejected review outcomes.
 97. [x] Add leaderboard reporting for reviewer usefulness metrics, not just precision/recall.
 98. [x] Add regression gates for feedback coverage, verifier health, and lifecycle-state accuracy.
 99. [x] Add model-routing policies that explicitly separate generation, verification, and auditing roles.
 
@@ -0,0 +1,187 @@
+# Self-Hosting DiffScope
+
+This guide covers the server deployment path for DiffScope when you want persistent reviews, analytics artifacts, trend history, and operational diagnostics.
+
+## Recommended deployment path
+
+- Use the Helm chart in `charts/diffscope/` for long-running server deployments.
+- Use `docker-compose.yml` for local CLI + Ollama workflows only. The compose file runs `diffscope review`, not `diffscope serve`.
+- The container image itself defaults to `serve`, so Kubernetes/Helm is the intended self-hosted server path.
+
+## Minimum server checklist
+
+1. Mount a persistent volume at `/home/diffscope/.local/share/diffscope`.
+2. Keep the working tree at `/workspace` when you want server-side Git and PR workflows.
+3. Set a shared API key with `DIFFSCOPE_SERVER_API_KEY` for protected mutation routes.
+4. Point every persisted analytics artifact path (`feedback_path`, `eval_trend_path`, `feedback_eval_trend_path`) at the mounted data directory.
+5. Schedule `diffscope eval` and `diffscope feedback-eval` if you want trend charts to stay fresh.
+
+## Persistent artifact layout
+
+The server persists and reads these files during normal operation:
+
+| Purpose | Default path | Recommendation for Helm |
+| --- | --- | --- |
+| Review/event storage | `~/.local/share/diffscope/reviews.json` | keep default |
+| Learned conventions | `~/.local/share/diffscope/conventions.json` | keep default |
+| Feedback store | `.diffscope.feedback.json` | set to `/home/diffscope/.local/share/diffscope/feedback.json` |
+| Eval trend history | `.diffscope.eval-trend.json` | set to `/home/diffscope/.local/share/diffscope/eval-trend.json` |
+| Feedback-eval trend history | `.diffscope.feedback-eval-trend.json` | set to `/home/diffscope/.local/share/diffscope/feedback-eval-trend.json` |
+| Production replay pack | `~/.local/share/diffscope/eval/production_replay/replay.json` | keep default |
+| Failure forensics bundles | `~/.local/share/diffscope/forensics/...` | keep default |
+
+The Helm PVC only covers `/home/diffscope/.local/share/diffscope`, so the relative default paths for `feedback_path`, `eval_trend_path`, and `feedback_eval_trend_path` are not durable unless you override them.
+
+## Example Helm config
+
+```yaml
+diffscope:
+  model: claude-opus-4-6
+  adapter: anthropic
+  baseUrl: https://api.anthropic.com
+
+gitRepo:
+  enabled: true
+  repository: https://github.com/your-org/your-repo.git
+  branch: main
+
+persistence:
+  enabled: true
+  size: 20Gi
+
+config:
+  configFile: |
+    model: claude-opus-4-6
+    model_reasoning: openai/o3
+    providers:
+      anthropic:
+        enabled: true
+      openrouter:
+        enabled: true
+    feedback_path: /home/diffscope/.local/share/diffscope/feedback.json
+    eval_trend_path: /home/diffscope/.local/share/diffscope/eval-trend.json
+    feedback_eval_trend_path: /home/diffscope/.local/share/diffscope/feedback-eval-trend.json
+    retention:
+      review_max_age_days: 30
+      review_max_count: 1000
+      eval_artifact_max_age_days: 30
+      trend_history_max_entries: 200
+
+  extraEnv:
+    DIFFSCOPE_SERVER_API_KEY: ${DIFFSCOPE_SERVER_API_KEY}
+    ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
+    OPENROUTER_API_KEY: ${OPENROUTER_API_KEY}
+```
+
+Notes:
+
+- `providers.<name>.api_key` and provider-specific environment variables are the recommended way to run mixed-provider installs.
+- `openai/o3` and other non-Anthropic `vendor/model` ids route through OpenRouter unless you explicitly force another adapter.
+- `config.configFile` is mounted at `/workspace/.diffscope.yml`; it is easiest to use with `gitRepo.enabled: true` so the working directory already matches `/workspace`.
+
+## Secret-management guidance
+
+For enterprise installs, prefer external secret injection over storing credentials in `.diffscope.yml`.
+
+### Recommended secret sources
+
+- `ANTHROPIC_API_KEY`
+- `OPENAI_API_KEY`
+- `OPENROUTER_API_KEY`
+- `DIFFSCOPE_SERVER_API_KEY`
+- `GITHUB_TOKEN`
+- `DIFFSCOPE_GITHUB_APP_ID`
+- `DIFFSCOPE_GITHUB_PRIVATE_KEY`
+- `DIFFSCOPE_JIRA_BASE_URL`
+- `DIFFSCOPE_JIRA_EMAIL`
+- `DIFFSCOPE_JIRA_API_TOKEN`
+- `DIFFSCOPE_LINEAR_API_KEY`
+
+### Validation behavior
+
+`diffscope doctor`, the server doctor endpoint, and startup warnings now surface configuration issues for:
+
+- mixed-provider installs that still rely on legacy top-level `api_key` / `base_url` / `adapter`
+- missing provider-specific API keys for selected cloud providers
+- incomplete GitHub App config (`github_app_id` + `github_private_key` must be paired)
+- incomplete Jira config (`jira_base_url`, `jira_email`, and `jira_api_token` must all be set together)
+- incomplete Vault config (`vault_addr`, `vault_path`, `vault_token`)
+
+### Vault caveat
+
+Vault currently resolves only the legacy top-level `api_key`. For multi-provider installs, use provider-specific secrets from your runtime environment or secret store injection.
+
+## Analytics and retention
+
+### What populates Analytics
+
+- `/api/events` and `/api/events/stats` come from stored reviews and wide events.
+- `/api/analytics/learned-rules` reads the convention store.
+- `/api/analytics/rejected-patterns` reads the feedback store.
+- `/api/analytics/trends` and `/api/analytics/attention-gaps` read the eval and feedback-eval trend JSON files.
+
+### How to keep trend data fresh
+
+Run these on a schedule against a persisted artifact directory:
+
+```bash
+diffscope eval --fixtures eval/fixtures --artifact-dir /var/lib/diffscope/eval
+diffscope feedback-eval --input /home/diffscope/.local/share/diffscope/reviews.json --eval-report /var/lib/diffscope/eval/report.json
+```
+
+### Retention controls
+
+- `retention.review_max_age_days`
+- `retention.review_max_count`
+- `retention.eval_artifact_max_age_days`
+- `retention.trend_history_max_entries`
+
+Review retention is applied automatically each time a review completes. Eval artifact pruning happens when you run `diffscope eval --artifact-dir ...`.
+
+### Analytics recompute job
+
+Use these protected endpoints after scoring or schema changes:
+
+```text
+POST /api/analytics/recompute
+GET  /api/analytics/recompute/{job_id}
+```
+
+The recompute job refreshes stored review summaries and event aggregates. It does not rebuild eval or feedback-eval trend files.
+
+## Operations and diagnostics
+
+### Health checks
+
+- `diffscope doctor`
+- `GET /api/status`
+- `GET /api/doctor`
+- `GET /metrics`
+- Helm `test-connection`
+
+### Failure forensics bundles
+
+DiffScope now writes JSON forensics bundles for degraded review and eval runs.
+
+- Review bundles: `~/.local/share/diffscope/forensics/reviews/<review-id>/`
+- Eval bundles: `~/.local/share/diffscope/forensics/eval/...` or `<artifact-dir>/forensics/...`
+
+Review manifests are available from the protected endpoint:
+
+```text
+GET /api/review/{id}/forensics
+```
+
+### Production replay evals
+
+Accepted and rejected review outcomes are captured into an anonymized replay pack at:
+
+```text
+~/.local/share/diffscope/eval/production_replay/replay.json
+```
+
+Run it with the normal eval command:
+
+```bash
+diffscope eval --fixtures ~/.local/share/diffscope/eval/production_replay
+```
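The retention controls described in `docs/self-hosting.md` combine an age limit with a count limit. The following is a minimal sketch of how such a policy could be applied, not DiffScope's actual implementation: `StoredReview`, `apply_review_retention`, and the "filter by age, then keep the newest N" ordering are all assumptions made for illustration.

```rust
/// Hypothetical stored-review record; only the age matters for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct StoredReview {
    id: String,
    age_days: u32,
}

/// Assumed semantics for `retention.review_max_age_days` and
/// `retention.review_max_count`: drop reviews older than `max_age_days`,
/// then keep at most `max_count` of the newest survivors.
fn apply_review_retention(
    mut reviews: Vec<StoredReview>,
    max_age_days: u32,
    max_count: usize,
) -> Vec<StoredReview> {
    // Age limit first, so expired reviews never count toward the cap.
    reviews.retain(|review| review.age_days <= max_age_days);
    // Sort newest first (smallest age), then cut the list at the cap.
    reviews.sort_by_key(|review| review.age_days);
    reviews.truncate(max_count);
    reviews
}
```

Under these assumptions, with `review_max_age_days: 30` and `review_max_count: 1000` as in the example Helm config, a 45-day-old review is dropped regardless of how few reviews are stored.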
@@ -29,6 +29,28 @@ pub(in super::super) fn print_configuration(config: &Config) {
     if let Some(cw) = config.context_window {
         println!("  Context:  {cw} tokens");
     }
+    println!(
+        "  Primary Provider: {}",
+        config
+            .resolved_provider_for_role(crate::config::ModelRole::Primary)
+            .provider
+            .as_deref()
+            .unwrap_or("(auto-detect)")
+    );
+    let validation_issues = config.validation_issues();
+    if !validation_issues.is_empty() {
+        println!("  Validation:");
+        for issue in validation_issues {
+            println!(
+                "    - {}: {}",
+                match issue.level {
+                    crate::config::ConfigValidationIssueLevel::Warning => "warning",
+                    crate::config::ConfigValidationIssueLevel::Error => "error",
+                },
+                issue.message,
+            );
+        }
+    }
     println!();
 }
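The hunk above prints each validation issue with a `warning` or `error` label. As a rough illustration of the kind of check that could feed that list, here is a sketch of the "GitHub App fields must be paired" rule from the self-hosting guide. The types `IssueLevel`, `Issue`, and `GithubAppSettings`, the function names, and the choice of `Warning` as the level are hypothetical; the real `Config::validation_issues` is not shown in this diff.

```rust
/// Hypothetical stand-in for `ConfigValidationIssueLevel`.
#[derive(Debug, PartialEq)]
enum IssueLevel {
    Warning,
    Error,
}

/// Hypothetical stand-in for a validation issue (level plus message).
#[derive(Debug, PartialEq)]
struct Issue {
    level: IssueLevel,
    message: String,
}

/// Hypothetical subset of the config holding the two GitHub App fields.
#[derive(Default)]
struct GithubAppSettings {
    github_app_id: Option<String>,
    github_private_key: Option<String>,
}

/// Reports an issue when exactly one of the two paired fields is set.
/// The `Warning` level is an assumption made for this sketch.
fn check_github_app(settings: &GithubAppSettings) -> Vec<Issue> {
    let missing = match (&settings.github_app_id, &settings.github_private_key) {
        (Some(_), None) => "github_private_key",
        (None, Some(_)) => "github_app_id",
        // Both set or both unset: the pair is consistent.
        _ => return Vec::new(),
    };
    vec![Issue {
        level: IssueLevel::Warning,
        message: format!("incomplete GitHub App config: {missing} is not set"),
    }]
}

/// Maps a level to the label used in the doctor output.
fn level_label(level: &IssueLevel) -> &'static str {
    match level {
        IssueLevel::Warning => "warning",
        IssueLevel::Error => "error",
    }
}
```

Returning a list rather than failing fast lets the doctor output show every problem in one pass, which matches how the hunk iterates over `validation_issues`.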
 
 
@@ -66,6 +66,7 @@ pub async fn eval_command(
         options.artifact_dir.as_deref(),
     );
     emit_eval_report(
+        &config,
         execution.results,
         report_output_path.as_deref(),
         prepared_options,
 
@@ -201,6 +201,7 @@ pub(super) async fn run_eval_batch(
                     .as_ref()
                     .map(|dir| dir.join("report.json"));
                 let report = materialize_eval_report(
+                    &run_config,
                     execution.results,
                     report_output_path.as_deref(),
                     prepared_options.clone(),
 
@@ -9,13 +9,21 @@ use super::super::{EvalFixtureResult, EvalReport, EvalRunMetadata};
 use super::options::PreparedEvalOptions;
 
 pub(super) async fn emit_eval_report(
+    config: &crate::config::Config,
     results: Vec<EvalFixtureResult>,
     output_path: Option<&Path>,
     prepared_options: PreparedEvalOptions,
     run_metadata: EvalRunMetadata,
 ) -> Result<()> {
-    let report =
-        materialize_eval_report(results, output_path, prepared_options, run_metadata, true).await?;
+    let report = materialize_eval_report(
+        config,
+        results,
+        output_path,
+        prepared_options,
+        run_metadata,
+        true,
+    )
+    .await?;
 
     if let Some(message) = evaluation_failure_message(&report) {
         anyhow::bail!("{}", message);
@@ -25,6 +33,7 @@ pub(super) async fn emit_eval_report(
 }
 
 pub(super) async fn materialize_eval_report(
+    config: &crate::config::Config,
     results: Vec<EvalFixtureResult>,
     output_path: Option<&Path>,
     prepared_options: PreparedEvalOptions,
@@ -48,5 +57,33 @@ pub(super) async fn materialize_eval_report(
         update_eval_quality_trend(&report, path, prepared_options.trend_max_entries).await?;
     }
 
+    if report.fixtures_failed > 0
+        || !report.warnings.is_empty()
+        || !report.threshold_failures.is_empty()
+    {
+        let trigger = if report.fixtures_failed > 0 {
+            "eval_failures"
+        } else if !report.threshold_failures.is_empty() {
+            "eval_threshold_failures"
+        } else {
+            "eval_warnings"
+        };
+        let artifact_dir = report
+            .run
+            .artifact_dir
+            .as_ref()
+            .map(std::path::PathBuf::from);
+        let manifest = crate::forensics::write_eval_forensics_bundle(
+            config,
+            crate::forensics::EvalForensicsBundleInput {
+                trigger: trigger.to_string(),
+                report: report.clone(),
+                artifact_dir,
+            },
+        )
+        .await?;
+        println!("Forensics bundle: {}", manifest.root_path);
+    }
+
     Ok(report)
 }
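The trigger selection in the hunk above has a fixed precedence: failed fixtures win over threshold failures, which win over warnings. The sketch below isolates that precedence as a pure function so it can be checked on its own. `ReportSummary` and `forensics_trigger` are hypothetical names, and the field types are assumed; the real `EvalReport` carries more fields than these.

```rust
/// Hypothetical, trimmed-down stand-in for the `EvalReport` fields the check reads.
/// The field types are assumptions.
struct ReportSummary {
    fixtures_failed: usize,
    warnings: Vec<String>,
    threshold_failures: Vec<String>,
}

/// Returns the trigger label for a degraded run, or `None` for a clean run.
/// Precedence mirrors the hunk above: failures, then threshold failures, then warnings.
fn forensics_trigger(report: &ReportSummary) -> Option<&'static str> {
    if report.fixtures_failed > 0 {
        Some("eval_failures")
    } else if !report.threshold_failures.is_empty() {
        Some("eval_threshold_failures")
    } else if !report.warnings.is_empty() {
        Some("eval_warnings")
    } else {
        None
    }
}
```

Folding the outer "is the run degraded" condition and the label choice into one `Option` keeps the two from drifting apart if a new trigger is added later.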
@@ -11,6 +11,7 @@ mod smart_review;
 
 pub(crate) use dag::{build_dag_catalog, describe_dag_graph, plan_dag_graph, DagGraphSelection};
 pub use doctor::doctor_command;
+pub(crate) use eval::EvalReport;
 pub use eval::{eval_command, EvalRunOptions};
 pub use feedback_eval::feedback_eval_command;
 pub use git::{git_command, GitCommands};