Commit 41164a2

Partition replay cache by day
1 parent 2df29d7 commit 41164a2

17 files changed

Lines changed: 1361 additions & 283 deletions

docs/backend/replay.md

Lines changed: 55 additions & 51 deletions
@@ -18,60 +18,66 @@ rebuilding trails from replay has to infer leg boundaries from the recorded data
 
 `identd` takes a snapshot of the aircraft present at most once per sample
 interval and appends it to an in-memory block that covers a fixed span of time,
-five minutes by default. A snapshot holds a timestamp and the set of aircraft
-visible at that moment. When a sample arrives that belongs to a later span, the
-open block is finalized and written, and a new one starts.
+five minutes. A snapshot holds a timestamp and the set of aircraft visible at
+that moment. Empty snapshots still matter because they prove the receiver was
+being sampled even when no aircraft were visible. When a sample arrives that
+belongs to a later span, the open block is finalized and written, and a new one
+starts.
 
 The block currently being filled is not listed and not served until it rolls
-over. The smallest thing a viewer can load is therefore one finalized block. Both
-the block length and the sample interval are configurable, with the block length
-bounded below at one minute so a block always spans more than a single sample.
+over. The smallest thing a viewer can load is therefore one finalized block. The
+sample interval is configurable; the block duration is fixed so storage paths,
+cache metadata, and frontend loading all agree about the same time grid.
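
As a rough illustration of the fixed time grid described in the added lines above, the sketch below maps a sample timestamp to its block span and UTC day. Only the five-minute length comes from the text; the function names, the millisecond unit, and the `YYYY-MM-DD` day key are assumptions for illustration and are not taken from `identd`.

```typescript
// Hypothetical sketch: names, units (milliseconds), and the day-key format
// are assumptions, not identd's actual code or path layout.
const BLOCK_MS = 5 * 60 * 1000; // fixed five-minute block span

interface BlockSpan {
  startMs: number; // inclusive start of the block
  endMs: number; // exclusive end of the block
  utcDay: string; // UTC calendar day of the block start, e.g. "2024-01-01"
}

// Floor the timestamp to the grid so every component derives the same span.
function blockSpanFor(timestampMs: number): BlockSpan {
  const startMs = Math.floor(timestampMs / BLOCK_MS) * BLOCK_MS;
  return {
    startMs,
    endMs: startMs + BLOCK_MS,
    utcDay: new Date(startMs).toISOString().slice(0, 10),
  };
}

// A sample finalizes the open block when it falls into a later span.
function rollsOver(open: BlockSpan, sampleMs: number): boolean {
  return blockSpanFor(sampleMs).startMs > open.startMs;
}
```

Because 24 hours is a whole multiple of five minutes, a block on this grid never straddles a UTC day boundary, which is one reason a fixed duration fits day-based grouping.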
 
 ## On-disk layout
 
-Blocks live in a single flat directory. Each finalized block is one
-zstd-compressed JSON file whose name encodes the time range it covers. An index
-file sits beside that directory and caches the list of blocks between restarts.
-
-At startup `identd` reads the index, scans the directory, and merges the two,
-preferring what the scan actually finds on disk. It does not decompress every
-block to validate it; the file name and size are enough to build the in-memory
-list, and decompressing the whole corpus on a cold boot would dominate startup
-time on modest hardware. A block is only read from disk when a viewer asks for
-it. If the index is missing, unreadable, or written in a version this build does
-not recognize, `identd` falls back to the directory scan and records a diagnostic
-rather than refusing to start.
-
-The block format carries its own version. A block whose version this build does
-not support is skipped, not deleted. Earlier behavior deleted mismatched blocks
-and could silently destroy recorded history across an upgrade or downgrade, so
-the current code never deletes a block on the basis of its version.
+Blocks are grouped by UTC day instead of all living in one directory. Each
+finalized block is one zstd-compressed JSON file whose name encodes the time
+range it covers. The grouping is for filesystem fanout and static serving; it is
+not a time-retention policy.
+
+Replay keeps cache manifests next to the blocks. The root cache is intentionally
+small: it records the covered days and the overall range, not every block. Each
+day cache records the blocks for that day. A valid cache lets startup avoid
+walking the full tree and statting every historical file, which matters on small
+receiver hosts. If the cache is missing or unreadable, an operator-controlled
+reindex setting decides whether `identd` scans filenames to rebuild the cache or
+starts with replay unavailable and records a diagnostic.
+
+The normal startup path trusts cache metadata. It does not decompress every block
+to validate it; the file name and size are enough to publish availability, and
+decompressing the whole corpus on a cold boot would dominate startup time on
+modest hardware. A block is only read from disk when a viewer asks for it. If a
+cached block is missing or a viewer reports that it could not be decoded, the
+cache is corrected for that day and a diagnostic is recorded instead of leaving
+the stale coverage in place.
 
 ## Retention
 
-Two limits bound disk use, and both are required when replay is enabled:
+Replay is bounded by a byte budget. The operator sets the high watermark for
+finalized blocks. When the estimated size rises above that watermark, `identd`
+removes the oldest cached blocks until usage falls below a lower target.
 
-- A byte budget caps the total size of finalized blocks. When the total would
-  exceed it, the oldest blocks are removed first until the total fits. This is
-  checked both before writing a new block and after.
-- An age cap sets the oldest a block may be. Blocks past that age are removed
-  regardless of how much room the byte budget has left.
-
-The two cover different failure modes. A byte budget alone does not bound how old
-data gets: on a quiet receiver the budget might never fill, leaving stale history
-around indefinitely. An age cap alone does not protect against disk exhaustion
-when traffic is unexpectedly heavy. Together the byte budget is the hard ceiling
-on space and the age cap sets the history window.
+Using two watermarks avoids deleting a single old block every time a new block
+rolls over near the limit. The tradeoff is that a cleanup pass can remove a
+batch of history at once. That is deliberate: it reduces metadata churn on
+storage that may be SD-card-backed. There is no separate age cap in this storage
+version, so a quiet receiver can keep old history as long as it fits inside the
+byte budget.
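
The two-watermark cleanup in the added Retention text can be sketched roughly as below. This is a minimal illustration under stated assumptions, not `identd`'s implementation: the `CachedBlock` shape, the `planCleanup` name, and expressing the lower target as a ratio of the high watermark (mirroring the `IDENT_REPLAY_CLEANUP_LOW_WATERMARK=0.90` example in the configuration doc of this commit) are all hypothetical.

```typescript
// Hypothetical sketch of high/low watermark cleanup; field and function
// names are assumptions, not identd's implementation.
interface CachedBlock {
  startMs: number; // start of the time range the block covers
  sizeBytes: number; // estimated on-disk size
}

// Returns the blocks a cleanup pass would remove, oldest first.
function planCleanup(
  blocks: CachedBlock[],
  highWatermarkBytes: number,
  lowWatermarkRatio: number,
): CachedBlock[] {
  let total = blocks.reduce((sum, b) => sum + b.sizeBytes, 0);
  // Nothing happens until usage rises above the high watermark.
  if (total <= highWatermarkBytes) return [];

  const target = highWatermarkBytes * lowWatermarkRatio;
  const oldestFirst = [...blocks].sort((a, b) => a.startMs - b.startMs);
  const toRemove: CachedBlock[] = [];
  for (const block of oldestFirst) {
    if (total < target) break; // stop once usage is below the lower target
    toRemove.push(block);
    total -= block.sizeBytes;
  }
  return toRemove;
}
```

With the configuration example's numbers, 524288000 bytes is 500 MiB and a 0.90 ratio gives a lower target of 471859200 bytes (450 MiB), so one pass frees a batch of blocks rather than one block per rollover.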
 
 ## Serving blocks
 
-Two endpoints make replay available to the frontend. One returns a manifest: the
-enabled flag, the time range covered, the block length, and the list of finalized
-blocks with their URLs and sizes. The other serves a single block file by name,
-after checking the requested name against the expected pattern so a request
-cannot reach outside the blocks directory. Finalized blocks are served as
-cacheable and immutable, since a block's contents are fixed once its time range
-has passed.
+Replay exposes a dynamic manifest endpoint plus a static artifact subtree. The
+manifest tells the frontend which finalized blocks are available for playback.
+The artifact subtree contains immutable block files and cache manifests that can
+be served directly by a reverse proxy. Dynamic repair endpoints live outside
+that subtree so a deployment can hand static replay artifacts to the proxy
+without hiding the `identd` APIs that still need application logic.
+
+When `identd` serves a block itself, it checks that the requested name has the
+date-partitioned shape that replay writes and that the block is present in its
+current cache. A name that does not match replay's own storage shape is rejected
+before it can become a filesystem lookup.
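
The two checks described above (name shape first, then cache membership) might look like the following sketch. The exact name pattern is not shown in this diff, so the `<YYYY-MM-DD>/<start>-<end>.json.zst` shape, the `BLOCK_NAME` regular expression, and the `resolveBlock` helper are assumptions made purely for illustration.

```typescript
// Hypothetical name shape: "<YYYY-MM-DD>/<startMs>-<endMs>.json.zst".
// The real pattern identd uses is not shown in this diff.
const BLOCK_NAME = /^\d{4}-\d{2}-\d{2}\/\d+-\d+\.json\.zst$/;

// Returns the validated name, or null when the request should be a not-found.
function resolveBlock(
  requestedName: string,
  knownBlocks: ReadonlySet<string>,
): string | null {
  // Check 1: the whole name must match the storage shape, so "..", extra
  // separators, and other extensions never reach a filesystem lookup.
  if (!BLOCK_NAME.test(requestedName)) return null;
  // Check 2: a well-formed name must also be a block the cache knows about.
  if (!knownBlocks.has(requestedName)) return null;
  return requestedName;
}
```

Anchoring the pattern at both ends is what makes the first check strict: a traversal attempt fails on shape alone, before the cache is consulted.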
 
 ### Why blocks are negotiated, not decompressed
 
@@ -89,15 +95,13 @@ that common case.
 The server resolves this by content negotiation. When the request says it accepts
 zstd, the server sets the encoding header and ships the raw bytes; the browser
 decompresses them natively and JavaScript receives JSON. When the request does
-not say so — the plain-HTTP browser being the case that matters — the server
-ships the same raw bytes with no encoding header, and the frontend decompresses
-them itself.
-It decides which path applies by inspecting the first few bytes of
-the body for the zstd frame signature rather than trusting a response header,
-because a browser strips the encoding header once it has decoded a response and a
-cache may surface either form for the same URL. The frontend caps the size it
-will expand a block to, and a decode failure shows up as a diagnostic in the
-notification area rather than a silent blank.
+not say so, the server ships the same raw bytes with no encoding header, and the
+frontend decompresses them itself. The frontend decides which path applies by
+inspecting the first few bytes of the body for the zstd frame signature rather
+than trusting a response header, because a browser strips the encoding header
+once it has decoded a response and a cache may surface either form for the same
+URL. The frontend caps the size it will expand a block to, and a decode failure
+shows up as a diagnostic in the notification area rather than a silent blank.
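
The byte-sniffing step described above can be sketched as follows. The Zstandard frame magic number is part of the format; the `classifyBody` name and the `BodyKind` type are hypothetical, and this sketch only recognizes standard frames, not skippable ones.

```typescript
// Zstandard frames begin with the magic number 0xFD2FB528, stored
// little-endian, so the first four bytes on the wire are 28 B5 2F FD.
const ZSTD_MAGIC = [0x28, 0xb5, 0x2f, 0xfd];

// Hypothetical result type: "zstd" means the frontend must decompress the
// body itself; "json" means the browser already decoded it.
type BodyKind = "zstd" | "json";

function classifyBody(body: Uint8Array): BodyKind {
  const hasMagic =
    body.length >= ZSTD_MAGIC.length &&
    ZSTD_MAGIC.every((byte, i) => body[i] === byte);
  return hasMagic ? "zstd" : "json";
}
```

Checking the body instead of `Content-Encoding` gives the same answer whether the bytes came straight from the server, through a decoding browser, or out of a cache.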
 
 Treating a wildcard or an explicit request for no encoding as "does not accept
 zstd" is deliberate: a wildcard only says unlisted encodings are acceptable, not

docs/frontend/trails-replay.md

Lines changed: 9 additions & 4 deletions
@@ -37,6 +37,11 @@ immutable once written, so they are cached aggressively in the browser; the
 manifest itself is fetched without caching so a freshly recorded block becomes
 visible.
 
+The scrubber treats availability as coverage, not progress. A highlighted
+segment means Ident has a finalized block for that part of the selected window.
+Gaps can appear inside the same replay window when old blocks were cleaned up or
+when a cache repair removed a bad block.
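
To make "availability as coverage" concrete, here is a minimal sketch of finding uncovered spans inside a selected window. The `BlockRange` shape and the `findGaps` helper are hypothetical and not taken from the Ident frontend; the sketch sorts its input defensively instead of assuming the manifest arrives in time order.

```typescript
// Hypothetical shape: a finalized block covering [startMs, endMs).
interface BlockRange {
  startMs: number;
  endMs: number;
}

// Returns the parts of [windowStart, windowEnd) that no block covers.
function findGaps(
  blocks: BlockRange[],
  windowStart: number,
  windowEnd: number,
): BlockRange[] {
  const sorted = [...blocks].sort((a, b) => a.startMs - b.startMs);
  const gaps: BlockRange[] = [];
  let cursor = windowStart; // everything before the cursor is accounted for
  for (const block of sorted) {
    if (block.endMs <= cursor) continue; // ends before the cursor
    if (block.startMs >= windowEnd) break; // starts past the window
    if (block.startMs > cursor) {
      gaps.push({ startMs: cursor, endMs: block.startMs });
    }
    cursor = Math.max(cursor, block.endMs);
  }
  if (cursor < windowEnd) gaps.push({ startMs: cursor, endMs: windowEnd });
  return gaps;
}
```

A scrubber built this way would highlight everything except the returned ranges, so a cleaned-up or repaired block shows up as a hole rather than as unfinished loading.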
+
 The block list is held in time order, and the code that finds which blocks
 cover a requested range, and that later detects gaps between them, depends on
 that ordering. Rather than trust the manifest to arrive sorted, the list is
@@ -52,10 +57,10 @@ up fetches that will land too late to matter.
 
 Block failures are handled differently depending on the cause. Bytes that
 arrive but cannot be decoded as a valid block surface an error to the user and
-are not retried, because re-fetching the same bad bytes would not help. A
-failed or rejected request, by contrast, refreshes the manifest and retries,
-since the backend may have rotated the file and a newer manifest can point at a
-working URL.
+are reported back to `identd` so stale coverage can be repaired. A failed or
+rejected request, by contrast, refreshes the manifest and retries, since the
+backend may have rotated the file and a newer manifest can point at a working
+URL.
 
 ## Reconstructing replay trails
 
docs/getting-started/configuration.md

Lines changed: 24 additions & 18 deletions
@@ -83,41 +83,47 @@ Disabling the restart cache keeps trails memory-only; they will be lost when
 
 Replay is opt-in because it writes longer-lived history blocks. When enabled,
 `identd` samples live `aircraft.json`, closes one compressed block every five
-minutes, writes an index, and prunes old blocks by both age and byte budget.
-The byte budget is mandatory so a misconfigured receiver cannot fill the host
-disk.
+minutes, writes cache manifests, and prunes old blocks by byte budget. The byte
+budget is mandatory so a misconfigured receiver cannot fill the host disk.
 
 ```sh
 IDENT_REPLAY_ENABLE=true
 IDENT_REPLAY_DIR=/var/lib/ident/replay
-IDENT_REPLAY_RETENTION_SEC=259200
 IDENT_REPLAY_MAX_BYTES=524288000
-IDENT_REPLAY_BLOCK_SEC=300
+IDENT_REPLAY_CLEANUP_LOW_WATERMARK=0.90
+IDENT_REPLAY_CACHE_REINDEX=true
 IDENT_REPLAY_SAMPLE_INTERVAL_SEC=5
 ```
 
-With the example above, Ident keeps up to three days of replay data and never
-keeps more than 500 MiB of finalized blocks. The currently open block is not
-listed or served until it rolls over, so the smallest replay unit is five
-minutes. See [Replay history](/backend/replay) for how blocks are recorded and
-served.
+With the example above, Ident treats 500 MiB as the high watermark. When the
+estimated finalized replay size exceeds that value, it may delete oldest cached
+blocks until the estimate falls below 90% of the byte budget. The currently open
+block is not listed or served until it rolls over, so the smallest replay unit is
+five minutes. See [Replay history](/backend/replay) for how blocks are recorded
+and served.
 
 ## Serving replay blocks through a reverse proxy
 
-`identd` can serve replay blocks itself through `/api/replay/blocks/*`. Replay
-blocks are JSON compressed with zstd. For busy public displays, put the replay
-directory behind the reverse proxy and let the proxy serve finalized
-`.json.zst` files directly:
+`identd` can serve replay artifacts itself through `/api/replay/blocks/*`.
+Replay blocks are JSON compressed with zstd, while `manifest.cache.json` files
+are ordinary JSON. For busy public displays, put the replay `blocks` directory
+behind the reverse proxy and let the proxy serve finalized artifacts directly:
 
 ```text
-@accepts_zstd header Accept-Encoding *zstd*
-
 handle_path /api/replay/blocks/* {
     root * /var/lib/ident/replay/blocks
-    header Content-Type application/octet-stream
+
+    @zstd_block path *.zst
+    header @zstd_block Content-Type application/octet-stream
+    header @zstd_block Cache-Control "public, max-age=31536000, immutable"
+
+    @accepts_zstd {
+        path *.zst
+        header Accept-Encoding *zstd*
+    }
     header @accepts_zstd Content-Type application/json
     header @accepts_zstd Content-Encoding zstd
-    header Cache-Control "public, max-age=31536000, immutable"
+
     file_server
 }
 
docs/operations/deployment.md

Lines changed: 5 additions & 4 deletions
@@ -72,10 +72,11 @@ works at the root or behind a prefix without further configuration.
 
 ## Serving replay blocks from the proxy
 
-Ident can serve finalized replay blocks itself. For a busy public display, those
-files can instead be served straight from disk by the reverse proxy, taking that
-I/O off Ident. This works because finalized blocks are immutable files on disk,
-but it carries a constraint that is easy to miss.
+Ident can serve finalized replay artifacts itself. For a busy public display,
+those files can instead be served straight from disk by the reverse proxy,
+taking that I/O off Ident. This works because finalized blocks are immutable
+files on disk and cache manifests are ordinary JSON, but it carries a constraint
+that is easy to miss.
 
 The blocks are stored as raw zstd-compressed JSON, and Ident's own handler
 negotiates how to deliver them. When a client advertises that it accepts zstd,

docs/operations/security.md

Lines changed: 6 additions & 7 deletions
@@ -45,13 +45,12 @@ cannot alter what the receiver produces.
 When replay is enabled, finalized blocks are served from disk by name under a
 fixed endpoint prefix. Two checks stand between a request and the filesystem.
 The requested name must match the exact shape `identd` gives its own blocks (a
-plain numeric pattern with a fixed extension, no path separators or relative
-segments), and a name that passes that check must also be present in the
-in-memory index of blocks `identd` has actually written. A crafted name aimed at
-escaping the blocks directory fails the first check; a well-formed name for a
-file `identd` never produced fails the second. Both paths return a not-found
-result before any filesystem path is built from caller input. Tests cover a
-traversal attempt against this endpoint.
+UTC day path and a fixed extension, with no relative segments), and a name that
+passes that check must also be present in the in-memory cache of finalized
+blocks. A crafted name aimed at escaping the blocks directory fails the first
+check; a well-formed name for a file `identd` never produced fails the second.
+Both paths return a not-found result before caller input can choose an arbitrary
+filesystem path.
 
 ## Outbound network access
 
ident/src/data/replay.test.ts

Lines changed: 11 additions & 1 deletion
@@ -574,7 +574,17 @@ describe("replay data loading", () => {
     await refreshReplayManifest();
     await ensureReplayRange(120_000, 180_000, { background: true });
 
-    expect(globalThis.fetch).toHaveBeenCalledTimes(3);
+    expect(globalThis.fetch).toHaveBeenCalledTimes(4);
+    expect(globalThis.fetch).toHaveBeenCalledWith(
+      "/ident/api/replay/block-failure",
+      expect.objectContaining({
+        method: "POST",
+        body: JSON.stringify({
+          url: "/api/replay/blocks/120000-180000.json.zst",
+          reason: "decode_failed",
+        }),
+      }),
+    );
     expect(warn).not.toHaveBeenCalledWith(
       "[ident replay] background block load failed",
       expect.any(Error),

ident/src/data/replay.ts

Lines changed: 19 additions & 0 deletions
@@ -212,6 +212,7 @@ function loadReplayBlock(block: ReplayBlockIndex): Promise<void> | null {
         throw err;
       }
       if (err instanceof ReplayBlockFormatError) {
+        void reportReplayBlockFailure(block.url, "decode_failed");
         emitFrontendDiagnostic({
           severity: "warning",
           channel: "frontend.replay",
@@ -221,6 +222,7 @@ function loadReplayBlock(block: ReplayBlockIndex): Promise<void> | null {
         throw err;
       }
       if (err instanceof ReplayBlockBodyError) {
+        void reportReplayBlockFailure(block.url, "decode_failed");
         emitFrontendDiagnostic({
           severity: "warning",
           channel: "frontend.replay",
@@ -249,6 +251,23 @@ function loadReplayBlock(block: ReplayBlockIndex): Promise<void> | null {
   return load;
 }
 
+async function reportReplayBlockFailure(
+  url: string,
+  reason: "decode_failed" | "missing",
+): Promise<void> {
+  try {
+    await fetch(appPath("api/replay/block-failure"), {
+      method: "POST",
+      headers: { "Content-Type": "application/json" },
+      cache: "no-store",
+      body: JSON.stringify({ url, reason }),
+    });
+  } catch {
+    // Best-effort cache repair signal only; replay loading already reports
+    // the user-visible diagnostic above.
+  }
+}
+
 function abortStaleBlockLoads(
   manifestBlocks: ReplayBlockIndex[],
   requestedBlocks: ReplayBlockIndex[],
