Commit 1bed9ea
fix(cache): atomic writes to prevent torn-read deletion of valid cache entries (#1056)
## Summary
Fixes the failing `main` build caused by a ~10% flaky unit test
(`response-cache.test.ts` →
`--fresh round-trip: stale entry is replaced by fresh response`)
that exposed a **real production bug** in the HTTP response cache.
## Root cause
In `src/lib/response-cache.ts`, each cache write fires `cleanupCache()`
**fire-and-forget** at 10% probability (`CLEANUP_PROBABILITY = 0.1`).
A second write to the **same key** overwrote the file with a
**non-atomic** `writeFile`. The first write's async cleanup could then
read the file **mid-overwrite** (torn read), fail to `JSON.parse` the
truncated content, and **delete it** as "corrupted" in
`collectEntryMetadata`, silently losing a valid cache entry.
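
To illustrate why a torn read is indistinguishable from corruption at the
parse step, here is a minimal sketch. The payload shape and the
`looksCorrupted` helper are hypothetical and not taken from
`response-cache.ts`; the only point is that a prefix of valid JSON fails
`JSON.parse` exactly like a damaged file would.

```typescript
// Hypothetical cache payload; the real entry format is not shown in this commit.
const full: string = JSON.stringify({ status: 200, body: "x".repeat(32) });

// Simulate a torn read: the reader sees only the first half of the bytes
// because the writer has not finished overwriting the file yet.
const torn: string = full.slice(0, Math.floor(full.length / 2));

// Hypothetical helper mirroring a "parse failed, so treat as corrupted" check.
function looksCorrupted(raw: string): boolean {
  try {
    JSON.parse(raw);
    return false;
  } catch {
    return true;
  }
}

console.log(looksCorrupted(full)); // false
console.log(looksCorrupted(torn)); // true
```

A cleanup pass that deletes on parse failure therefore cannot tell a
half-written valid entry from a genuinely damaged one unless writes are atomic.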
**Proof:** A 300-iteration repro failed ~10% of the time (matching
`CLEANUP_PROBABILITY`), and on every miss the cache directory was
**empty**: the valid entry was deleted, not expired.
This is a genuine correctness bug beyond tests: any two rapid writes to
the same cache key (paginated fetches, concurrent CLI invocations sharing
the cache dir) can race so that a torn read during cleanup deletes a
valid entry.
## Why PR checks didn't catch it
The flake is ~10%, so any single CI "Unit Tests" run passes ~90% of the
time. PR #1051's Unit Tests check landed in the lucky 90%; the post-merge
`main` run hit the unlucky 10%. The bug is **pre-existing**: the cleanup
code predates #1051, which never touched `response-cache.ts`.
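
The arithmetic behind this can be sketched as follows, assuming each run
fails independently with the same probability (an assumption; the commit
only reports the observed ~10% rate). The function names are hypothetical.

```typescript
// Probability that `runs` independent runs all pass when each run
// fails with probability `flakeRate`.
function probAllPass(flakeRate: number, runs: number): number {
  return Math.pow(1 - flakeRate, runs);
}

// Smallest number of runs needed to see at least one failure with the
// given confidence, under the same independence assumption.
function runsToDetect(flakeRate: number, confidence: number): number {
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - flakeRate));
}

console.log(probAllPass(0.1, 1).toFixed(2)); // 0.90
console.log(runsToDetect(0.1, 0.95));        // 29
console.log(probAllPass(0.1, 300) < 1e-13);  // true
```

Under that assumption, a single PR check is a weak signal for a 10% flake,
while 300 consecutive passes would be extremely unlikely if the flake were
still present.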
## Fix
1. **Atomic writes**: new `atomicWriteCacheFile()` writes to a unique
   temp file (`<key>.<pid>.<uuid>.tmp`) then `rename`s it into place.
   `rename` is atomic on POSIX (same filesystem) and near-atomic on
   Windows (same volume), so a concurrent reader sees a complete old or
   new file, never a torn one.
2. **Cleanup hardening**: `collectEntryMetadata()` now separates failure
   modes:
   - **transient read failure** (locking, AV scanner, ENOENT from a
     concurrent sweep) → **skip**, never delete.
   - **fully read but unparseable** → genuine corruption (atomic writes
     preclude torn reads), so the file is marked expired and reclaimed by
     `deleteExpiredEntries`. This keeps corrupt files visible to
     `MAX_CACHE_ENTRIES` eviction instead of leaking past the count.
3. **Temp-file sweep**: `cleanupCache()` removes orphaned `.tmp` files
   older than 60s, so a crash between `writeFile` and `rename` can't leak
   temp files.
4. **Diagnostics**: previously-silent catch blocks now `log.debug()` the
   suppressed error (per AGENTS.md no-silent-catch rule).
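
The first three items can be sketched roughly as below. This is not the
code from the commit: the function names (`atomicWriteSketch`,
`classifyEntrySketch`, `sweepStaleTempFilesSketch`, `demo`), the
`EntryState` type, and the use of `console.debug` in place of the
project's `log.debug()` are all assumptions made for illustration. Only
the temp-file naming pattern and the 60s threshold come from the
description above.

```typescript
import { randomUUID } from "node:crypto";
import { mkdtemp, readFile, readdir, rename, rm, stat, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed constant mirroring the 60s orphan threshold described above.
const TMP_MAX_AGE_MS = 60_000;

// Write to a unique temp file in the same directory, then rename it into
// place so readers see either the old file or the new one in full.
async function atomicWriteSketch(dir: string, key: string, data: string): Promise<void> {
  const tmpPath = join(dir, `${key}.${process.pid}.${randomUUID()}.tmp`);
  try {
    await writeFile(tmpPath, data, "utf8");
    await rename(tmpPath, join(dir, key));
  } catch (err) {
    await rm(tmpPath, { force: true }); // best-effort cleanup of the temp file
    throw err;
  }
}

type EntryState = "valid" | "skip" | "corrupt";

// Separate "could not read" (skip, never delete) from "read fully but
// unparseable" (corruption that cleanup may reclaim).
async function classifyEntrySketch(path: string): Promise<EntryState> {
  let raw: string;
  try {
    raw = await readFile(path, "utf8");
  } catch (err) {
    console.debug("read failed, skipping", path, err);
    return "skip";
  }
  try {
    JSON.parse(raw);
    return "valid";
  } catch (err) {
    console.debug("unparseable entry", path, err);
    return "corrupt";
  }
}

// Remove `.tmp` files whose mtime is older than the threshold; returns the count.
async function sweepStaleTempFilesSketch(dir: string, now: number = Date.now()): Promise<number> {
  let removed = 0;
  for (const name of await readdir(dir)) {
    if (!name.endsWith(".tmp")) continue;
    const path = join(dir, name);
    try {
      const { mtimeMs } = await stat(path);
      if (now - mtimeMs > TMP_MAX_AGE_MS) {
        await rm(path, { force: true });
        removed++;
      }
    } catch (err) {
      console.debug("temp file vanished during sweep", path, err);
    }
  }
  return removed;
}

// Usage: two rapid writes to the same key leave exactly one complete file.
async function demo(): Promise<{ dir: string; names: string[] }> {
  const dir = await mkdtemp(join(tmpdir(), "cache-sketch-"));
  await atomicWriteSketch(dir, "entry.json", JSON.stringify({ v: 1 }));
  await atomicWriteSketch(dir, "entry.json", JSON.stringify({ v: 2 }));
  return { dir, names: await readdir(dir) };
}
```

The temp file is created in the same directory as the target on purpose:
`rename` is only atomic within a single filesystem or volume, so writing
the temp file elsewhere (for example under the OS temp dir) would weaken
the guarantee.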
## Verification
- Amplified repro: **300/300 pass** (was ~10% failing before).
- Actual test file: **10/10 runs green** (36 tests, +2 new regression
  tests).
- `pnpm run lint`: clean. `pnpm exec tsc --noEmit`: exit 0.
- Full unit suite: 325 files, all passing (one unrelated pre-existing
  property flake noted below).
## Out of scope (follow-up)
During full-suite runs I observed an unrelated, pre-existing flaky
**property** test: `auto-paginate.property.test.ts:94` (counterexample
`[393,392,98]`, an `autoPaginate` `nextCursor` trim bug when
`total > limit`). It passes in isolation and is independent of this
change. Tracked as a follow-up.

1 parent 41f8aa2 commit 1bed9ea
2 files changed
Lines changed: 151 additions & 8 deletions