Skip to content

Commit 6e8c02f

Browse files
committed
Fix Go lookup dispatch ceilings
1 parent c9eb329 commit 6e8c02f

7 files changed

Lines changed: 58 additions & 78 deletions

File tree

.agents/sow/current/SOW-0021-20260613-netipc-at-scale.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1044,6 +1044,11 @@ Tests or equivalent validation:
10441044
- Focused POSIX validation passed: `cmake --build build-coverage --target test_service -j12 && /usr/bin/ctest --test-dir build-coverage --output-on-failure -R '^test_service$'`; `cd src/go && go test -count=1 -timeout=300s ./pkg/netipc/service/raw -run '^TestLookupLargeResponseSplit$'`; `cargo test --manifest-path src/crates/netipc/Cargo.toml large_response_split -- --nocapture`.
10451045
- Focused Windows validation passed on `win11`: `NIPC_TEST_FILTER=large_response_split timeout 600 build-windows-focused/bin/test_win_service_extra.exe`; `cd src/go && "/c/Program Files/Go/bin/go.exe" test -count=1 -timeout=300s ./pkg/netipc/service/raw -run "^TestWinLookupLargeResponseSplit$"`; `/c/Users/costa/.cargo/bin/cargo.exe test --manifest-path src/crates/netipc/Cargo.toml large_response_split -- --nocapture`.
10461046
- Protocol-level validation also passed: C `test_protocol`, Go `./pkg/netipc/protocol`, and Rust `protocol::lookup` tests.
1047+
- Post-push Go reviewer fixes:
1048+
- External review found that Go zero-config raw servers treated the learned `MaxPayloadDefault` starting capacity as the growth ceiling, so default request/response payload learning stopped at `1024` bytes instead of the documented `MaxPayloadCap`. Fixed `src/go/pkg/netipc/service/raw/server.go` to compute growth ceilings from the original initialization config before replacing zero budgets with starting defaults.
1049+
- External review found that Go raw APPS_LOOKUP and CGROUPS_LOOKUP dispatch duplicated the protocol dispatch path and missed the protocol layer's `Finish()==0` overflow guard. Fixed `src/go/pkg/netipc/service/raw/apps_lookup.go` and `src/go/pkg/netipc/service/raw/cgroups_lookup.go` to delegate to protocol dispatchers while preserving raw-service `errHandlerFailed` semantics for typed handler failure.
1050+
- Removed the now-unused Go raw-service `lookupMinRequired()` helper after the protocol dispatchers became the single production path for lookup response minimum-size checks.
1051+
- Validation passed: POSIX `cd src/go && go test -count=1 -timeout=300s ./pkg/netipc/service/raw`; POSIX `cd src/go && go test -count=1 -timeout=300s ./pkg/netipc/protocol`; Windows `cd src/go && "/c/Program Files/Go/bin/go.exe" test -count=1 -timeout=300s ./pkg/netipc/service/raw`; Windows focused raw lookup/defaults tests also passed on `win11`.
10471052
- Note: `ctest` on `$PATH` resolved to a broken local Python wrapper missing the `cmake` module; validation used `/usr/bin/ctest`.
10481053

10491054
Real-use evidence:
@@ -1059,6 +1064,9 @@ Reviewer findings:
10591064
- External reviewer finding: local oversized request-key synthesis could run before proving the client is connected. Handled by enforcing `READY` before logical lookup work in C, Go, and Rust.
10601065
- External reviewer finding: cgroups all-local oversized handling could skip the zero-item probe path. Handled by continuing after a local oversized item only when more request items remain; otherwise the client sends the zero-item request and validates endpoint/generation behavior.
10611066
- External reviewer concern: hidden fixed payload ceilings would contradict initialization-tunable budgets. Reviewed against code and docs. `NIPC_MAX_PAYLOAD_CAP` / `MaxPayloadCap` remains a named default growth ceiling; request/response payload budgets are exposed through initialization config. Server learned-capacity growth now also honors those configured ceilings.
1067+
- External reviewer finding: Go zero-config raw-server growth ceilings were derived after applying `MaxPayloadDefault`, effectively preventing default servers from learning above `1024` bytes. Handled by deriving growth ceilings from the original config and validating default versus explicit ceiling behavior.
1068+
- External reviewer finding: Go raw lookup dispatchers duplicated protocol dispatcher logic and could diverge from protocol overflow handling. Handled by delegating raw APPS_LOOKUP and CGROUPS_LOOKUP dispatch to protocol dispatchers, preserving raw typed-handler failure semantics, and deleting the now-dead local minimum-size helper.
1069+
- One requested external review run timed out before returning a usable final report. The completed reviewer findings above were triaged against the codebase; concrete defects were fixed and validated, while non-blocking style or broader design suggestions remain subject to the existing SOW close gates.
10621070
- External reviewer finding: coverage scripts and broader adversarial matrix still need updates before SOW completion. Coverage script expansion is now implemented; C/Rust/Go coverage gates pass; representative `8192` and `32768` logical-call tests now pass in C/Rust/Go on POSIX and Windows; POSIX and Windows baseline/SHM lookup-scale interop now pass across all C/Rust/Go directed pairs, including mixed-status lookup cases and heavier `65536` stress-only runs; lookup status codec interop now proves `PAYLOAD_EXCEEDED` and `OVERSIZED_ITEM` wire parity across C/Rust/Go; Rust malformed typed lookup response parity is now covered on POSIX and Windows; malformed follow-up responses after partial progress are now covered in C/Rust/Go on POSIX and Windows; reordered and duplicate response-item corruption is now covered in C/Rust/Go on POSIX and Windows; invalid status enum, invalid status-dependent field, and invalid label-table corruption are now covered in C/Rust/Go on POSIX and Windows; huge valid label isolation is now covered in C/Rust/Go on POSIX and Windows; endpoint absence before call, endpoint disappearance after partial progress, and endpoint disappearance before the first subcall are now covered in C/Rust/Go on POSIX and Windows; zero-item typed lookup calls are now covered in C/Rust/Go on POSIX and Windows; duplicate and unsorted request keys under request splitting are now covered in C/Rust/Go on POSIX and Windows; full POSIX and Windows benchmark regenerations now pass; downstream topology-containers post-vendor validation now passes. Broader adversarial matrix review remains open.
10631071

10641072
Same-failure scan:

src/go/pkg/netipc/service/raw/apps_lookup.go

Lines changed: 10 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -204,34 +204,17 @@ func AppsLookupDispatch(handle AppsLookupHandler) DispatchHandler {
204204
return nil
205205
}
206206
return func(request []byte, responseBuf []byte) (int, error) {
207-
req, err := protocol.DecodeAppsLookupRequest(request)
208-
if err != nil {
209-
return 0, err
210-
}
211-
minRequired, err := lookupMinRequired(protocol.AppsLookupRespHdr, req.ItemCount)
212-
if err != nil {
213-
return 0, err
214-
}
215-
if len(responseBuf) < minRequired {
216-
return 0, protocol.ErrOverflow
217-
}
218-
builder := protocol.NewAppsLookupBuilder(responseBuf, req.ItemCount, 0)
219-
if req.ItemCount > 0 {
220-
lens := make([]int, req.ItemCount)
221-
for i := range lens {
222-
lens[i] = protocol.AppsLookupItemHdr + 3
223-
}
224-
builder.SetPayloadExceededItemLens(lens)
225-
}
226-
if !handle(req, builder) {
207+
handlerFailed := false
208+
n, err := protocol.DispatchAppsLookup(request, responseBuf, func(req *protocol.AppsLookupRequestView, builder *protocol.AppsLookupBuilder) bool {
209+
ok := handle(req, builder)
210+
if !ok && builder.Error() == nil {
211+
handlerFailed = true
212+
}
213+
return ok
214+
})
215+
if handlerFailed && errors.Is(err, protocol.ErrBadLayout) {
227216
return 0, errHandlerFailed
228217
}
229-
if builder.Error() != nil {
230-
return 0, builder.Error()
231-
}
232-
if builder.ItemCount() != req.ItemCount {
233-
return 0, protocol.ErrBadItemCount
234-
}
235-
return builder.Finish(), nil
218+
return n, err
236219
}
237220
}

src/go/pkg/netipc/service/raw/cgroups_lookup.go

Lines changed: 10 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@ package raw
22

33
import (
44
"bytes"
5+
"errors"
56

67
"github.com/netdata/plugin-ipc/go/pkg/netipc/protocol"
78
)
@@ -268,39 +269,17 @@ func CgroupsLookupDispatch(handle CgroupsLookupHandler) DispatchHandler {
268269
return nil
269270
}
270271
return func(request []byte, responseBuf []byte) (int, error) {
271-
req, err := protocol.DecodeCgroupsLookupRequest(request)
272-
if err != nil {
273-
return 0, err
274-
}
275-
minRequired, err := lookupMinRequired(protocol.CgroupsLookupRespHdr, req.ItemCount)
276-
if err != nil {
277-
return 0, err
278-
}
279-
if len(responseBuf) < minRequired {
280-
return 0, protocol.ErrOverflow
281-
}
282-
builder := protocol.NewCgroupsLookupBuilder(responseBuf, req.ItemCount, 0)
283-
if req.ItemCount > 0 {
284-
lens := make([]int, req.ItemCount)
285-
for i := range lens {
286-
item, err := req.Item(uint32(i)) // #nosec G115 -- i is bounded by req.ItemCount from the decoded uint32 header.
287-
if err != nil {
288-
return 0, err
289-
}
290-
size := protocol.CgroupsLookupItemHdr + len(item.Bytes()) + 2
291-
lens[i] = size
272+
handlerFailed := false
273+
n, err := protocol.DispatchCgroupsLookup(request, responseBuf, func(req *protocol.CgroupsLookupRequestView, builder *protocol.CgroupsLookupBuilder) bool {
274+
ok := handle(req, builder)
275+
if !ok && builder.Error() == nil {
276+
handlerFailed = true
292277
}
293-
builder.SetPayloadExceededItemLens(lens)
294-
}
295-
if !handle(req, builder) {
278+
return ok
279+
})
280+
if handlerFailed && errors.Is(err, protocol.ErrBadLayout) {
296281
return 0, errHandlerFailed
297282
}
298-
if builder.Error() != nil {
299-
return 0, builder.Error()
300-
}
301-
if builder.ItemCount() != req.ItemCount {
302-
return 0, protocol.ErrBadItemCount
303-
}
304-
return builder.Finish(), nil
283+
return n, err
305284
}
306285
}

src/go/pkg/netipc/service/raw/lookup_common.go

Lines changed: 0 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -147,18 +147,6 @@ func checkedLookupU32(value int) (uint32, error) {
147147
return uint32(value), nil // #nosec G115 -- value is bounded by the uint32 maximum above.
148148
}
149149

150-
func lookupMinRequired(hdrSize int, itemCount uint32) (int, error) {
151-
if hdrSize < 0 {
152-
return 0, protocol.ErrOverflow
153-
}
154-
dirSize64 := uint64(itemCount) * uint64(protocol.LookupDirEntrySize)
155-
min64 := uint64(hdrSize) + dirSize64
156-
if min64 > uint64(int(^uint(0)>>1)) {
157-
return 0, protocol.ErrOverflow
158-
}
159-
return int(min64), nil
160-
}
161-
162150
func cloneLookupRawItem(item []byte) []byte {
163151
cloned := make([]byte, len(item))
164152
copy(cloned, item)

src/go/pkg/netipc/service/raw/lookup_common_test.go

Lines changed: 0 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -68,12 +68,6 @@ func TestLookupCommonSizingHelpers(t *testing.T) {
6868
if _, err := checkedLookupU32(int(uint64(^uint32(0)) + 1)); !errors.Is(err, protocol.ErrOverflow) {
6969
t.Fatalf("checkedLookupU32 overflow error = %v", err)
7070
}
71-
if _, err := lookupMinRequired(-1, 1); !errors.Is(err, protocol.ErrOverflow) {
72-
t.Fatalf("lookupMinRequired negative error = %v", err)
73-
}
74-
if _, err := lookupMinRequired(maxInt, 1); !errors.Is(err, protocol.ErrOverflow) {
75-
t.Fatalf("lookupMinRequired overflow error = %v", err)
76-
}
7771
}
7872

7973
func TestLookupNextRequestSizingEdges(t *testing.T) {

src/go/pkg/netipc/service/raw/server.go

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,8 @@ func (s *Server) initCommon(
2929
if workerCount < 1 {
3030
workerCount = 1
3131
}
32+
requestPayloadGrowthCeiling := payloadGrowthCeiling(maxRequestPayloadBytes)
33+
responsePayloadGrowthCeiling := payloadGrowthCeiling(maxResponsePayloadBytes)
3234
if maxRequestPayloadBytes == 0 {
3335
maxRequestPayloadBytes = protocol.MaxPayloadDefault
3436
}
@@ -41,8 +43,8 @@ func (s *Server) initCommon(
4143
s.expectedMethodCode = expectedMethodCode
4244
s.handler = handler
4345
s.workerCount = workerCount
44-
s.requestPayloadGrowthCeiling = payloadGrowthCeiling(maxRequestPayloadBytes)
45-
s.responsePayloadGrowthCeiling = payloadGrowthCeiling(maxResponsePayloadBytes)
46+
s.requestPayloadGrowthCeiling = requestPayloadGrowthCeiling
47+
s.responsePayloadGrowthCeiling = responsePayloadGrowthCeiling
4648
s.learnedRequestPayloadBytes.Store(maxRequestPayloadBytes)
4749
s.learnedResponsePayloadBytes.Store(maxResponsePayloadBytes)
4850
}

src/go/pkg/netipc/service/raw/server_unit_test.go

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,32 @@ func TestServerCommonInitDefaultsAndWorkerSlotGuards(t *testing.T) {
9393
protocol.MaxPayloadDefault,
9494
protocol.MaxPayloadDefault)
9595
}
96+
if server.requestPayloadGrowthCeiling != protocol.MaxPayloadCap ||
97+
server.responsePayloadGrowthCeiling != protocol.MaxPayloadCap {
98+
t.Fatalf("default growth ceilings = %d/%d, want %d/%d",
99+
server.requestPayloadGrowthCeiling,
100+
server.responsePayloadGrowthCeiling,
101+
protocol.MaxPayloadCap,
102+
protocol.MaxPayloadCap)
103+
}
104+
105+
var configured Server
106+
configured.initCommon("run", "svc", protocol.MethodIncrement, nil, 4, 4096, 8192)
107+
if configured.workerCount != 4 {
108+
t.Fatalf("configured worker count = %d, want 4", configured.workerCount)
109+
}
110+
if configured.learnedRequestPayloadBytes.Load() != 4096 ||
111+
configured.learnedResponsePayloadBytes.Load() != 8192 {
112+
t.Fatalf("configured learned capacities = %d/%d, want 4096/8192",
113+
configured.learnedRequestPayloadBytes.Load(),
114+
configured.learnedResponsePayloadBytes.Load())
115+
}
116+
if configured.requestPayloadGrowthCeiling != 4096 ||
117+
configured.responsePayloadGrowthCeiling != 8192 {
118+
t.Fatalf("configured growth ceilings = %d/%d, want 4096/8192",
119+
configured.requestPayloadGrowthCeiling,
120+
configured.responsePayloadGrowthCeiling)
121+
}
96122

97123
var cleanupCalled atomic.Bool
98124
if server.retryAcceptAfter(func() { cleanupCalled.Store(true) }) {

0 commit comments

Comments
 (0)