Commit 203f248

Fix Netdata query-wide anchors
1 parent f987eba commit 203f248

13 files changed

Lines changed: 1077 additions & 288 deletions

.agents/sow/SOW-status.md

Lines changed: 15 additions & 10 deletions
@@ -4,16 +4,7 @@ Last updated: 2026-06-25
 
 ## Current
 
-- SOW-0124 - Netdata Query-Wide Anchor Regression: in-progress. Test-first
-reproduction is active for the regression between the Netdata UI's single
-ordered query-level anchor contract and the SDK Netdata wrappers' current
-per-file anchor application plus after-the-fact timestamp merge. The current
-surgical SDK repair plan is reviewer-verified READY TO IMPLEMENT by glm,
-minimax, kimi, mimo, deepseek, and qwen after round-2 review. The plan keeps
-the scalar anchor API and per-file batched retrieval, removes early
-display-time timestamp mutation, makes non-tail backward anchors exclusive,
-and makes final retention include the whole cross-file boundary timestamp
-group.
+- None.
 
 ## Pending
 
@@ -50,6 +41,20 @@ Last updated: 2026-06-25
 decisions. Not executable until the user explicitly resumes it.
 ## Recently Closed Or Completed
 
+- SOW-0124 - Netdata Query-Wide Anchor Regression: completed. Rust and Go
+Netdata SDK wrappers now preserve the current scalar `anchor` plus
+`pagination.column = "timestamp"` wire contract while applying anchors
+query-wide across selected journal files. The repair removes early
+display-time timestamp mutation, makes non-tail backward anchors exclusive
+for data-only and non-data-only requests, preserves per-file batched
+retrieval plus global merge, retains the full equal-timestamp boundary group,
+and fixes `items.after` / `items.before` reporting for boundary expansion.
+Option 1B is documented as future reference only, not implemented. Validation
+passed Rust library tests, full Go module tests, Python Netdata helper tests,
+rebuilt Rust and Go wrappers, the shared anchor regression runner across
+Rust/Go and four scenarios, whitespace checks, SOW audit, and second-round
+read-only reviewer votes from glm, kimi, mimo, deepseek, and qwen all voting
+`READY TO COMPLETE: YES`.
 - SOW-0121 - File-Backed Journalctl Full Parity And Ship Decision: completed
 after strict P0/P1/P2 reviewer-gate repair. Rust and Go portable
 `journalctl` now recognize the full official systemd v260.1 option/action

.agents/sow/current/SOW-0124-20260625-netdata-query-wide-anchor-regression.md renamed to .agents/sow/done/SOW-0124-20260625-netdata-query-wide-anchor-regression.md

Lines changed: 358 additions & 71 deletions
Large diffs are not rendered by default.

.agents/sow/specs/systemd-journal-plugin-facets.md

Lines changed: 67 additions & 0 deletions
@@ -404,6 +404,73 @@ Duplicate visible timestamps are made unique inside one query:
 - backward scans decrement a duplicate timestamp (`systemd-journal-execute.h:173-179`);
 - forward scans increment a duplicate timestamp (`systemd-journal-execute.h:282-288`).
 
+## Query-Wide Anchor Contract (SDK Netdata Boundary)
+
+SOW-0124 (2026-06-25) records the SDK Netdata function paging contract that
+matches the Netdata UI one-anchor assumption
+(`netdata/cloud-frontend @ b0f9c41cfc36`):
+
+- `anchor` is a microsecond timestamp scalar across all selected journal files,
+not a per-file offset. The UI stores one scalar `anchorAfter` /
+`anchorBefore` for the whole merged table and sends one scalar `anchor` on
+every load-more and tail poll. Consumers must treat it as an ordered scalar;
+they must not infer a per-file cursor from it.
+- The current wire shape exposes the anchor through `pagination.column =
+"timestamp"`. In the SDK this value is the Explorer row realtime, before the
+final same-file duplicate display adjustment: journal commit realtime unless
+an older `_SOURCE_REALTIME_TIMESTAMP` is present and selected as the effective
+row time. There is no separate hidden internal cursor or opaque page token in
+this contract.
+- Forward paging is strict: rows must satisfy `realtime_usec > anchor` to be
+eligible for the next page.
+- Non-tail backward paging is exclusive for all row queries, including
+`data_only = true` and `data_only = false`. The SDK converts a non-tail
+backward realtime anchor into an exclusive upper bound
+(`anchor - 1` microsecond) before Explorer range filtering, then clears
+the Explorer anchor slot. This prevents page 2 from re-fetching the
+boundary group that page 1 already returned.
+- Tail paging is exclusive on the next microsecond (`anchor + 1`).
+- When a page boundary lands inside an equal scalar-timestamp group across
+files, the SDK retains the full boundary group from the merged per-file
+batches even though that may return more than the requested `last`. Returning
+more rows in that case is preferred over making the scalar anchor skip rows
+that share the boundary value, because the current wire contract cannot
+represent a compound cursor.
+- The SDK sorts combined per-file rows by pre-deduplication
+`row.realtime_usec` / `Row.RealtimeUsec` in the requested direction, with
+deterministic tie-breakers by file path and cursor. The combined retention
+uses the pre-deduplication timestamp at the `limit - 1` index as the
+boundary. Backward pages retain every row whose timestamp is greater than or
+equal to the boundary. Forward pages retain every row whose timestamp is less
+than or equal to the boundary. Any same-file duplicate timestamp adjustment
+runs only after that boundary retention, so the cross-file boundary group
+stays intact.
+- Duplicate scalar timestamps across different journal files are preserved as
+equal in the response. Cross-file equal timestamps are the anchor collisions
+the scalar query-wide anchor must represent. Same-file duplicate timestamps
+still receive the existing direction-specific increment/decrement display
+adjustment as a small compatibility tweak inside one file only.
+- `items.after` / `items.before` semantics continue to indicate whether more
+rows are available; a page that returns more than `last` rows because of
+boundary-group retention must still report the correct remaining count
+rather than a false "no more rows" signal.
+- The implementation keeps per-file batched retrieval and a global merge.
+It does not switch to row-by-row k-way multi-file traversal. The user
+measured row-by-row k-way multi-file traversal as significantly slower
+than per-file batched query plus merge for many large sources, and the
+per-file batched shape remains the SDK performance contract.
+- The implementation does not introduce a new request key or response
+cursor token. The existing scalar `anchor` request shape and the existing
+response columns remain the wire contract.
+
+The plugin's same-file duplicate display adjustment
+(`systemd-journal-execute.h:173-179`,
+`systemd-journal-execute.h:282-288`) is still preserved as a small cosmetic
+display tweak when the boundary group is fully retained. The change is only
+that the SDK no longer mutates row timestamps before range/anchor filtering,
+no longer mutates duplicate timestamps across files, and no longer relies on
+per-file exclusive anchor semantics for non-tail backward queries.
+
 ## File Selection And Traversal Order
 
 The plugin queries one journal file at a time. It never opens a multi-file
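
The boundary-group retention rule described in the Query-Wide Anchor Contract section can be illustrated with a small Go sketch. This is not the SDK's implementation: the `Row` type, its field names other than `RealtimeUsec`, and the `retainBoundaryGroup` function are hypothetical names chosen for the example. It assumes rows from all per-file batches have already been collected into one slice and still carry their pre-deduplication timestamps.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical merged row. Only RealtimeUsec is named in the spec;
// File and Cursor stand in for the "file path and cursor" tie-breakers.
type Row struct {
	RealtimeUsec uint64
	File         string
	Cursor       string
}

// retainBoundaryGroup sorts rows in the requested direction and keeps the
// first limit rows plus every following row that shares the timestamp found
// at index limit-1, so an equal-timestamp group is never split.
func retainBoundaryGroup(rows []Row, limit int, backward bool) []Row {
	if limit <= 0 || len(rows) == 0 {
		return nil
	}
	sorted := append([]Row(nil), rows...) // do not mutate the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.RealtimeUsec != b.RealtimeUsec {
			if backward {
				return a.RealtimeUsec > b.RealtimeUsec
			}
			return a.RealtimeUsec < b.RealtimeUsec
		}
		if a.File != b.File {
			return a.File < b.File
		}
		return a.Cursor < b.Cursor
	})
	if len(sorted) <= limit {
		return sorted
	}
	boundary := sorted[limit-1].RealtimeUsec
	end := limit
	// Because the slice is sorted, rows equal to the boundary are contiguous.
	for end < len(sorted) && sorted[end].RealtimeUsec == boundary {
		end++
	}
	return sorted[:end]
}

func main() {
	rows := []Row{
		{100, "a.journal", "c1"},
		{90, "a.journal", "c2"},
		{80, "a.journal", "c3"},
		{90, "b.journal", "c1"},
		{70, "b.journal", "c2"},
	}
	page := retainBoundaryGroup(rows, 2, true)
	fmt.Println(len(page)) // 3: the limit of 2 is exceeded to keep both rows at 90
}
```

In this sketch the page exceeds the requested limit whenever the cut would fall inside an equal-timestamp group, which is the trade-off the spec accepts because a scalar anchor cannot distinguish rows that share a timestamp.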

go/journal/explorer.go

Lines changed: 12 additions & 26 deletions
@@ -278,18 +278,17 @@ type ExplorerProgress struct {
 }
 
 type ExplorerControl struct {
-	deadline       *time.Time
-	cancellation   func() bool
-	progress       func(ExplorerProgress)
-	candidateRow   func(uint64) bool
-	adjustRealtime func(uint64) uint64
-	matchedRow     func(uint64, uint64) bool
-	sampling       *explorerSamplingState
-	progressEvery  time.Duration
-	started        time.Time
-	lastProgress   time.Time
-	nextCheckRows  uint64
-	stopReason     ExplorerStopReason
+	deadline      *time.Time
+	cancellation  func() bool
+	progress      func(ExplorerProgress)
+	candidateRow  func(uint64) bool
+	matchedRow    func(uint64, uint64) bool
+	sampling      *explorerSamplingState
+	progressEvery time.Duration
+	started       time.Time
+	lastProgress  time.Time
+	nextCheckRows uint64
+	stopReason    ExplorerStopReason
 }
 
 func NewExplorerControl() *ExplorerControl {
@@ -322,10 +321,6 @@ func (c *ExplorerControl) setCandidateRowCallback(callback func(uint64) bool) {
 	c.candidateRow = callback
 }
 
-func (c *ExplorerControl) setRealtimeAdjustCallback(callback func(uint64) uint64) {
-	c.adjustRealtime = callback
-}
-
 func (c *ExplorerControl) SetMatchedRowCallback(callback func(uint64, uint64) bool) {
 	c.matchedRow = callback
 }
@@ -394,13 +389,6 @@ func (c *ExplorerControl) emitMatchedRow(realtimeUsec, rowsMatched uint64) bool
 	return c != nil && c.matchedRow != nil && c.matchedRow(realtimeUsec, rowsMatched)
 }
 
-func (c *ExplorerControl) adjust(realtimeUsec uint64) uint64 {
-	if c != nil && c.adjustRealtime != nil {
-		return c.adjustRealtime(realtimeUsec)
-	}
-	return realtimeUsec
-}
-
 type explorerSamplingDecisionKind int
 
 const (
@@ -2122,9 +2110,7 @@ func handleRowValueClass(valueIndex int, acc *explorerAccumulator, rowID uint64,
 func acceptedEffectiveRealtime(query ExplorerQuery, scan rowScan, commitRealtime uint64, stats *ExplorerStats, control *ExplorerControl) (uint64, bool) {
 	effective := effectiveRealtimeFromScan(scan.timestamp, commitRealtime)
 	recordSourceRealtimeDelta(stats, scan.timestamp, commitRealtime)
-	if control != nil {
-		effective = control.adjust(effective)
-	}
+	_ = control
 	return effective, timestampInRange(query, effective) && !rowRejectedByFTS(query, scan)
 }
 
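
With the `adjustRealtime` callback removed, `acceptedEffectiveRealtime` hands the unmodified effective timestamp to `timestampInRange`. The anchor rules in the spec part of this commit then reduce to plain comparisons on that value. The sketch below is only an illustration of those three rules: `anchorMode`, its constants, and `eligible` are hypothetical names, not part of `go/journal/explorer.go`, and treating `anchor == 0` as "no anchor" is an assumption made for the example.

```go
package main

import "fmt"

// anchorMode is a hypothetical enum for the three paging modes in the spec.
type anchorMode int

const (
	forward  anchorMode = iota // forward paging
	backward                   // non-tail backward paging
	tail                       // tail polling
)

// eligible reports whether a row with the given unadjusted realtime may
// appear on the next page for a scalar, query-wide anchor.
func eligible(mode anchorMode, anchor, realtimeUsec uint64) bool {
	if anchor == 0 {
		return true // assumption for this sketch: zero means no anchor was sent
	}
	switch mode {
	case backward:
		// Exclusive upper bound: the spec converts the anchor to anchor - 1.
		return realtimeUsec <= anchor-1
	case tail:
		// Same as realtimeUsec >= anchor + 1, written to avoid overflow.
		return realtimeUsec > anchor
	default:
		// Forward paging is strict.
		return realtimeUsec > anchor
	}
}

func main() {
	const anchor = 1000
	fmt.Println(eligible(backward, anchor, 1000)) // false: the boundary row is not re-fetched
	fmt.Println(eligible(backward, anchor, 999))  // true
	fmt.Println(eligible(forward, anchor, 1001))  // true
}
```

Because every file is filtered against the same scalar with the same comparison, rows that share the anchor timestamp in different files are excluded together on the next page, which is why the previous page has to return the whole boundary group.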
21302116
