Skip to content

Commit c794c25

Browse files
committed
Mutation testing, cost visibility, CI, and docs
- storage_json: tests for refresh_summary, get_event_stats (empty/single/by_repo), prune boundary - docs/mutation-testing.md: how to run, equivalent mutants, pre-push vs CI, file globs - CI: mutation job on *storage_json* with timeout and missed-count baseline (15) - README/CLAUDE: pre-push vs CI mutation note; optional --no-verify - Analytics: show est. cost (reviews) from useEventStats when available - web: cost.test.ts for formatCost, estimateCost, totalCost Made-with: Cursor
1 parent 2f33f20 commit c794c25

File tree

7 files changed

+322
-5
lines changed

7 files changed

+322
-5
lines changed

.github/workflows/ci.yml

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -66,3 +66,27 @@ jobs:
6666
- uses: Swatinem/rust-cache@v2
6767
- name: Test
6868
run: cargo test
69+
70+
mutation:
71+
runs-on: ubuntu-latest
72+
steps:
73+
- uses: actions/checkout@v5
74+
- uses: dtolnay/rust-toolchain@1.88.0
75+
- uses: Swatinem/rust-cache@v2
76+
- name: Install cargo-mutants
77+
run: cargo install cargo-mutants --locked
78+
- name: Mutation test (storage_json)
79+
run: |
80+
timeout 900 cargo mutants -f '*storage_json*' 2>&1 | tee mutation.log || true
81+
MISSED=$(grep -E '[0-9]+ missed' mutation.log | tail -1 | grep -oE '[0-9]+' | head -1 || echo "0")
82+
echo "missed=$MISSED" >> "$GITHUB_OUTPUT"
83+
id: mutation
84+
- name: Check mutation baseline
85+
run: |
86+
MISSED="${{ steps.mutation.outputs.missed }}"
87+
BASELINE=15
88+
if [ -n "$MISSED" ] && [ "$MISSED" -gt "$BASELINE" ]; then
89+
echo "Mutation missed count $MISSED exceeds baseline $BASELINE. Update docs/mutation-testing.md and this baseline if intentional."
90+
exit 1
91+
fi
92+
if: always() && steps.mutation.outputs.missed != ''

CLAUDE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ cargo run -- --help # CLI usage
2222
- `examples/` — Usage examples
2323

2424
## Development
25-
- Run `bash scripts/install-hooks.sh` once to install pre-commit and pre-push hooks (merge conflict check, trailing newline, conditional Rust/web checks, shellcheck; pre-push adds version sync, cargo audit, full web build+test).
25+
- Run `bash scripts/install-hooks.sh` once to install pre-commit and pre-push hooks (merge conflict check, trailing newline, conditional Rust/web checks, shellcheck; pre-push adds version sync, cargo audit, full web build+test). Mutation testing runs in CI only; see `docs/mutation-testing.md`.
2626

2727
## Conventions
2828
- Use frontier models for reviews — never default to smaller models

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1017,7 +1017,7 @@ bash scripts/install-hooks.sh
10171017
- When `web/` is staged: `npm run lint`, `tsc -b`, and `npm run test` in `web/`
10181018
- When `scripts/` or `.githooks/` change: `shellcheck` (if installed)
10191019

1020-
**Pre-push** runs the full gate: workflow check, version sync (`Cargo.toml` vs git tags), `cargo fmt`, `cargo clippy`, `cargo audit` (if installed), `npm ci && npm run build && npm run test` in `web/`, and `cargo test`. The first push after clone may take longer due to `npm ci`.
1020+
**Pre-push** runs the full gate: workflow check, version sync (`Cargo.toml` vs git tags), `cargo fmt`, `cargo clippy`, `cargo audit` (if installed), `npm ci && npm run build && npm run test` in `web/`, and `cargo test`. The first push after clone may take longer due to `npm ci`. Mutation testing runs in CI only (see `docs/mutation-testing.md`), not on pre-push, to keep pushes fast. For a quick push without full checks use `git push --no-verify` (use sparingly).
10211021

10221022
## Supported Platforms
10231023

docs/mutation-testing.md

Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
# Mutation testing
2+
3+
We use [cargo-mutants](https://github.com/sourcefrog/cargo-mutants) to find gaps in test coverage. Mutants are small code changes (e.g. `*``/`, `>``>=`); if tests still pass, the mutant is "missed" and the behavior is under-tested.
4+
5+
## Running
6+
7+
```bash
8+
# All mutants (slow; 6000+)
9+
cargo mutants
10+
11+
# One file / path (faster; glob)
12+
cargo mutants -f '*storage_json*'
13+
cargo mutants -f '*state*'
14+
cargo mutants -f '*storage_pg*'
15+
cargo mutants -f '*cost*'
16+
```
17+
18+
CI runs mutation on `storage_json` with a timeout; see [CI job](#ci) below.
19+
20+
## Known equivalent / accepted mutants
21+
22+
These mutants are either equivalent (same behavior) or accepted by design. CI does not fail on them.
23+
24+
### cost.rs
25+
26+
| Location | Mutation | Reason |
27+
|----------|----------|--------|
28+
| ~line 43 | `* FALLBACK_PRICE_PER_M``/ FALLBACK_PRICE_PER_M` | With `FALLBACK_PRICE_PER_M = 1.0`, `* 1` and `/ 1` are equivalent. No test can distinguish without changing the constant. |
29+
30+
### storage_json.rs
31+
32+
After adding targeted tests (refresh_summary, get_event_stats exact aggregates/single-event/by_repo, prune boundary), many previously missed mutants are now killed. Any remaining misses in these areas are documented here after a run:
33+
34+
- **refresh_summary**: Condition `summary.is_some() \|\| !comments.is_empty()` — tests now assert synthesized summary when comments exist and no summary.
35+
- **get_event_stats**: Percentile index, avg formulas, by_model/by_repo — tests assert exact values (e.g. p50/p95/p99, avg_score).
36+
- **prune**: Boundary `now - started_at > max_age_secs` — test asserts review exactly at boundary is not pruned; max_count test asserts oldest removed.
37+
38+
If new mutants appear in these regions, add assertions that would fail on the wrong operator/formula, or add them to this table with a one-line rationale.
39+
40+
## Pre-push vs CI
41+
42+
- **Pre-push** (`.githooks/pre-push`): Runs unit tests, `cargo audit`, web build+test. Does *not* run mutation (too slow for every push).
43+
- **CI mutation job**: Runs `cargo mutants -f storage_json` (or one crate) on a schedule or on PRs to main. Fails if "missed" count increases beyond the baseline in this doc.
44+
- For a quick local push without full checks: `git push --no-verify` (use sparingly; CI will still run).
45+
46+
## CI
47+
48+
The `mutation` job in `.github/workflows/ci.yml` runs mutation on the `storage_json` crate with a timeout. Update the "allowed missed" baseline in the job when you intentionally accept new equivalent mutants (and add them to the table above).

src/server/storage_json.rs

Lines changed: 168 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1099,6 +1099,79 @@ mod tests {
10991099
assert!(stats.avg_score.is_none());
11001100
assert_eq!(stats.error_rate, 0.0);
11011101
assert_eq!(stats.p50_latency_ms, 0);
1102+
assert_eq!(stats.p95_latency_ms, 0);
1103+
assert_eq!(stats.p99_latency_ms, 0);
1104+
assert!(stats.by_model.is_empty());
1105+
assert!(stats.by_source.is_empty());
1106+
assert!(stats.by_repo.is_empty());
1107+
assert!(stats.daily_counts.is_empty());
1108+
assert_eq!(stats.total_cost_estimate, 0.0);
1109+
}
1110+
1111+
/// Single event: percentiles and avg must equal that single value (catches percentile/division mutants).
1112+
#[tokio::test]
1113+
async fn test_get_event_stats_single_event_exact_values() {
1114+
let dir = tempfile::tempdir().unwrap();
1115+
let backend = JsonStorageBackend::new(&dir.path().join("reviews.json"));
1116+
1117+
let mut s = make_session_with_event(
1118+
"r1",
1119+
now_ts(),
1120+
ReviewStatus::Complete,
1121+
"review.completed",
1122+
"gpt-4o",
1123+
"head",
1124+
150,
1125+
);
1126+
s.event.as_mut().unwrap().overall_score = Some(7.0);
1127+
backend.save_review(&s).await.unwrap();
1128+
1129+
let stats = backend
1130+
.get_event_stats(&EventFilters::default())
1131+
.await
1132+
.unwrap();
1133+
1134+
assert_eq!(stats.total_reviews, 1);
1135+
assert_eq!(stats.avg_duration_ms, 150.0);
1136+
assert_eq!(stats.p50_latency_ms, 150, "single value is p50/p95/p99");
1137+
assert_eq!(stats.p95_latency_ms, 150);
1138+
assert_eq!(stats.p99_latency_ms, 150);
1139+
assert_eq!(stats.avg_score, Some(7.0));
1140+
}
1141+
1142+
/// By-repo avg_score: two events same repo (4.0 + 6.0) / 2 = 5.0.
1143+
#[tokio::test]
1144+
async fn test_get_event_stats_by_repo_avg_score_exact() {
1145+
let dir = tempfile::tempdir().unwrap();
1146+
let backend = JsonStorageBackend::new(&dir.path().join("reviews.json"));
1147+
1148+
let now = now_ts();
1149+
for (i, &score) in [4.0_f32, 6.0_f32].iter().enumerate() {
1150+
let mut s = make_session_with_event(
1151+
&format!("r{}", i),
1152+
now + i as i64,
1153+
ReviewStatus::Complete,
1154+
"review.completed",
1155+
"gpt-4o",
1156+
"head",
1157+
100,
1158+
);
1159+
if let Some(ref mut e) = s.event {
1160+
e.github_repo = Some("org/same-repo".to_string());
1161+
e.overall_score = Some(score);
1162+
}
1163+
backend.save_review(&s).await.unwrap();
1164+
}
1165+
1166+
let stats = backend
1167+
.get_event_stats(&EventFilters::default())
1168+
.await
1169+
.unwrap();
1170+
1171+
assert_eq!(stats.by_repo.len(), 1);
1172+
assert_eq!(stats.by_repo[0].repo, "org/same-repo");
1173+
assert_eq!(stats.by_repo[0].count, 2);
1174+
assert_eq!(stats.by_repo[0].avg_score, Some(5.0), " (4+6)/2 ");
11021175
}
11031176

11041177
#[tokio::test]
@@ -1346,6 +1419,101 @@ mod tests {
13461419
assert_eq!(stats.by_model[0].avg_duration_ms, 200.0);
13471420
}
13481421

1422+
/// refresh_summary: get_review with comments but no summary produces synthesized summary.
1423+
#[tokio::test]
1424+
async fn test_get_review_refreshes_summary_when_comments_no_summary() {
1425+
let dir = tempfile::tempdir().unwrap();
1426+
let backend = JsonStorageBackend::new(&dir.path().join("reviews.json"));
1427+
1428+
let mut session = make_session("ref-sum", now_ts(), ReviewStatus::Complete);
1429+
session.comments = vec![
1430+
make_comment("c1", "src/a.rs"),
1431+
make_comment("c2", "src/b.rs"),
1432+
];
1433+
session.summary = None;
1434+
backend.save_review(&session).await.unwrap();
1435+
1436+
let loaded = backend.get_review("ref-sum").await.unwrap().unwrap();
1437+
assert!(
1438+
loaded.summary.is_some(),
1439+
"refresh_summary should synthesize summary when comments exist"
1440+
);
1441+
let sum = loaded.summary.unwrap();
1442+
assert_eq!(sum.total_comments, 2);
1443+
}
1444+
1445+
/// refresh_summary: list_reviews returns sessions with refreshed summary when comments exist.
1446+
#[tokio::test]
1447+
async fn test_list_reviews_refreshes_summary_for_sessions_with_comments() {
1448+
let dir = tempfile::tempdir().unwrap();
1449+
let backend = JsonStorageBackend::new(&dir.path().join("reviews.json"));
1450+
1451+
let mut with_comments = make_session("with-c", now_ts(), ReviewStatus::Complete);
1452+
with_comments.comments = vec![make_comment("c1", "src/main.rs")];
1453+
with_comments.summary = None;
1454+
backend.save_review(&with_comments).await.unwrap();
1455+
1456+
backend
1457+
.save_review(&make_session("no-c", now_ts() - 1, ReviewStatus::Complete))
1458+
.await
1459+
.unwrap();
1460+
1461+
let list = backend.list_reviews(10, 0).await.unwrap();
1462+
let with_c = list.iter().find(|s| s.id == "with-c").unwrap();
1463+
assert!(
1464+
with_c.summary.is_some(),
1465+
"list_reviews should refresh summary for session with comments"
1466+
);
1467+
}
1468+
1469+
/// Prune age boundary: review with started_at exactly (now - max_age_secs) must NOT be expired (> not >=).
1470+
#[tokio::test]
1471+
async fn test_prune_age_boundary_not_expired() {
1472+
let dir = tempfile::tempdir().unwrap();
1473+
let backend = JsonStorageBackend::new(&dir.path().join("reviews.json"));
1474+
1475+
let now = now_ts();
1476+
let max_age = 100_i64;
1477+
let session = make_session("boundary", now - max_age, ReviewStatus::Complete);
1478+
backend.save_review(&session).await.unwrap();
1479+
1480+
let removed = backend.prune(max_age, 1000).await.unwrap();
1481+
assert_eq!(
1482+
removed, 0,
1483+
"exactly at boundary (now - max_age) should not be pruned"
1484+
);
1485+
1486+
let list = backend.list_reviews(10, 0).await.unwrap();
1487+
assert_eq!(list.len(), 1);
1488+
assert_eq!(list[0].id, "boundary");
1489+
}
1490+
1491+
/// Prune max_count: keep newest max_count completed, remove oldest.
1492+
#[tokio::test]
1493+
async fn test_prune_max_count_removes_oldest() {
1494+
let dir = tempfile::tempdir().unwrap();
1495+
let backend = JsonStorageBackend::new(&dir.path().join("reviews.json"));
1496+
1497+
let base = now_ts() - 10_000;
1498+
for i in 0..5 {
1499+
let s = make_session(&format!("r{}", i), base + i as i64, ReviewStatus::Complete);
1500+
backend.save_review(&s).await.unwrap();
1501+
}
1502+
1503+
let removed = backend.prune(1_000_000, 3).await.unwrap();
1504+
assert_eq!(removed, 2, "5 - 3 = 2 removed by count limit");
1505+
1506+
let list = backend.list_reviews(10, 0).await.unwrap();
1507+
assert_eq!(list.len(), 3);
1508+
let ids: Vec<_> = list.iter().map(|s| s.id.as_str()).collect();
1509+
let want = ["r2", "r3", "r4"];
1510+
assert!(
1511+
want.iter().all(|w| ids.contains(w)),
1512+
"newest 3 (r2,r3,r4) kept; r0,r1 removed, got {:?}",
1513+
ids
1514+
);
1515+
}
1516+
13491517
/// time_to filter: event with created_at == time_to must be included (<=).
13501518
#[tokio::test]
13511519
async fn test_list_events_time_to_inclusive() {

web/src/lib/__tests__/cost.test.ts

Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
import { describe, expect, it } from 'vitest'
2+
import { estimateCost, formatCost, totalCost } from '../cost'
3+
import type { ReviewEvent } from '../../api/types'
4+
5+
function makeEvent(overrides: Partial<ReviewEvent> = {}): ReviewEvent {
6+
return {
7+
review_id: 'e1',
8+
event_type: 'review.completed',
9+
model: 'gpt-4o',
10+
diff_source: 'head',
11+
duration_ms: 1000,
12+
diff_bytes: 0,
13+
diff_files_total: 0,
14+
diff_files_reviewed: 0,
15+
diff_files_skipped: 0,
16+
comments_total: 0,
17+
comments_by_severity: {},
18+
comments_by_category: {},
19+
hotspots_detected: 0,
20+
high_risk_files: 0,
21+
github_posted: false,
22+
tokens_total: 100_000,
23+
...overrides,
24+
}
25+
}
26+
27+
describe('formatCost', () => {
28+
it('formats zero as $0', () => {
29+
expect(formatCost(0)).toBe('$0')
30+
})
31+
it('formats small values with enough precision', () => {
32+
expect(formatCost(0.0001)).toBe('<$0.001')
33+
expect(formatCost(0.005)).toBe('$0.0050')
34+
expect(formatCost(0.5)).toBe('$0.500')
35+
})
36+
it('formats dollars with two decimals', () => {
37+
expect(formatCost(1.5)).toBe('$1.50')
38+
expect(formatCost(10)).toBe('$10.00')
39+
})
40+
})
41+
42+
describe('estimateCost', () => {
43+
it('prefers server cost_estimate_usd when present', () => {
44+
const e = makeEvent({ cost_estimate_usd: 0.42 })
45+
expect(estimateCost(e)).toBe(0.42)
46+
})
47+
it('uses client estimate when cost_estimate_usd is absent', () => {
48+
const e = makeEvent({ cost_estimate_usd: undefined, tokens_total: 1_000_000 })
49+
expect(estimateCost(e)).toBeGreaterThan(0)
50+
})
51+
it('returns 0 when tokens_total is 0 and no server cost', () => {
52+
const e = makeEvent({ tokens_total: 0, cost_estimate_usd: undefined })
53+
expect(estimateCost(e)).toBe(0)
54+
})
55+
})
56+
57+
describe('totalCost', () => {
58+
it('sums server cost when present', () => {
59+
const events = [
60+
makeEvent({ cost_estimate_usd: 0.1 }),
61+
makeEvent({ cost_estimate_usd: 0.2 }),
62+
]
63+
expect(totalCost(events)).toBeCloseTo(0.3)
64+
})
65+
it('returns 0 for empty list', () => {
66+
expect(totalCost([])).toBe(0)
67+
})
68+
})

web/src/pages/Analytics.tsx

Lines changed: 12 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ import {
44
AreaChart, Area, BarChart, Bar,
55
ResponsiveContainer, XAxis, YAxis, Tooltip, CartesianGrid,
66
} from 'recharts'
7-
import { useAnalyticsTrends, useReviews } from '../api/hooks'
7+
import { useAnalyticsTrends, useEventStats, useReviews } from '../api/hooks'
88
import { AlertTriangle, Download, Loader2 } from 'lucide-react'
99
import type { FeedbackEvalTrendGap } from '../api/types'
1010
import {
@@ -17,6 +17,7 @@ import {
1717
formatDurationHours,
1818
formatPercent,
1919
} from '../lib/analytics'
20+
import { formatCost } from '../lib/cost'
2021
import { scoreColorClass } from '../lib/scores'
2122
import { SEV_COLORS, CHART_THEME } from '../lib/constants'
2223

@@ -120,6 +121,7 @@ export function Analytics() {
120121
const navigate = useNavigate()
121122
const { data: reviews, isLoading } = useReviews()
122123
const { data: trends, isLoading: trendsLoading } = useAnalyticsTrends()
124+
const { data: eventStats } = useEventStats()
123125
const [drilldownSelection, setDrilldownSelection] = useState<
124126
{ type: 'review'; reviewId: string }
125127
| { type: 'category'; category: string }
@@ -290,8 +292,15 @@ export function Analytics() {
290292

291293
{hasReviewAnalytics && (
292294
<>
293-
<div className="text-[10px] font-semibold text-text-muted tracking-[0.08em] font-code mb-3">
294-
REVIEW ANALYTICS
295+
<div className="flex items-center gap-4 mb-3">
296+
<div className="text-[10px] font-semibold text-text-muted tracking-[0.08em] font-code">
297+
REVIEW ANALYTICS
298+
</div>
299+
{eventStats != null && (
300+
<div className="text-[11px] text-text-muted font-code">
301+
Est. cost (reviews): <span className="text-text-primary font-medium">{formatCost(eventStats.total_cost_estimate)}</span>
302+
</div>
303+
)}
295304
</div>
296305

297306
<div className="grid grid-cols-2 gap-3 mb-6">

0 commit comments

Comments
 (0)