Commit f46913e

Add huge-label lookup scale coverage
1 parent 0d712e9 commit f46913e

7 files changed

Lines changed: 253 additions & 62 deletions

.agents/sow/current/SOW-0021-20260613-netipc-at-scale.md

Lines changed: 19 additions & 2 deletions
@@ -881,6 +881,18 @@ Recorded user decisions:
   - C Windows on `/tmp/plugin-ipc-sow0021-20260614052244`: `cmake --build build-windows-focused --target test_win_service_extra -j4` passed and `NIPC_TEST_FILTER=malformed_first_response timeout 600 build-windows-focused/bin/test_win_service_extra.exe` passed with `84 passed, 0 failed`.
   - Go Windows on `/tmp/plugin-ipc-sow0021-20260614052244`: `cd src/go && "/c/Program Files/Go/bin/go.exe" test -count=1 -timeout=180s ./pkg/netipc/service/raw -run '^TestWin(Apps|Cgroups)LookupRejectsMalformedTypedResponses$'` passed.
   - Rust Windows on `/tmp/plugin-ipc-sow0021-20260614052244`: `/c/Users/costa/.cargo/bin/cargo.exe test --manifest-path src/crates/netipc/Cargo.toml test_lookup_rejects_malformed_typed_responses_windows -- --nocapture` passed.
+- Closed the huge-valid-metadata oversized-item gap:
+  - Existing Level 2 transparent overflow tests covered a huge APPS_LOOKUP cgroup path and a huge CGROUPS_LOOKUP name, but not a huge valid label.
+  - Extended C, Rust, and Go POSIX/Windows transparent `PAYLOAD_EXCEEDED` retry tests so each logical request contains a normal item, an oversized path/name item, an oversized label item, and a trailing normal item.
+  - Expected result in every language/platform: both huge items return `OVERSIZED_ITEM`, the trailing item still returns `KNOWN`, and the logical call hides intermediate `PAYLOAD_EXCEEDED` outcomes from Level 2 consumers.
+  - The cgroups test response budget is `256` bytes. This remains far below the 512-byte huge name/label payloads but leaves enough room for compact `PAYLOAD_EXCEEDED` and `OVERSIZED_ITEM` control records, so the test exercises scale handling rather than an impossible control-response buffer.
+- Validated huge-valid-metadata oversized-item isolation:
+  - C POSIX: `cmake --build build-coverage --target test_service -j12 && /usr/bin/ctest --test-dir build-coverage --output-on-failure -R '^test_service$'` passed.
+  - Go POSIX: `cd src/go && go test -count=1 -timeout=180s ./pkg/netipc/service/raw -run 'Test(Cgroups|Apps)LookupTransparentPayloadExceededRetry'` passed.
+  - Rust POSIX: `cargo test --manifest-path src/crates/netipc/Cargo.toml transparent_payload_exceeded -- --nocapture` passed.
+  - C Windows on `/tmp/plugin-ipc-sow0021-20260614052244`: `cmake --build build-windows-focused --target test_win_service_extra -j4` passed and `NIPC_TEST_FILTER=test_lookup_payload_exceeded_retry timeout 600 build-windows-focused/bin/test_win_service_extra.exe` passed with `36 passed, 0 failed`.
+  - Go Windows on `/tmp/plugin-ipc-sow0021-20260614052244`: `cd src/go && "/c/Program Files/Go/bin/go.exe" test -count=1 -timeout=180s ./pkg/netipc/service/raw -run "^TestWin(Cgroups|Apps)LookupTransparentPayloadExceededRetry$"` passed.
+  - Rust Windows on `/tmp/plugin-ipc-sow0021-20260614052244`: `/c/Users/costa/.cargo/bin/cargo.exe test --manifest-path src/crates/netipc/Cargo.toml transparent_payload_exceeded -- --nocapture` passed.

 ## Validation

@@ -986,6 +998,11 @@ Tests or equivalent validation:
   - C, Rust, and Go POSIX tests validate APPS_LOOKUP and CGROUPS_LOOKUP fail the whole logical call when a response item advertises more labels than its encoded item contains.
   - C, Rust, and Go Windows tests validate the same APPS_LOOKUP and CGROUPS_LOOKUP malformed status/table cases.
   - Latest focused evidence: C POSIX `ctest` `test_service` passed; C Windows focused filter `84 passed, 0 failed`; Go/Rust POSIX and Windows focused malformed-response tests passed.
+- Oversized valid metadata isolation validation:
+  - C, Rust, and Go POSIX tests validate APPS_LOOKUP and CGROUPS_LOOKUP transparent `PAYLOAD_EXCEEDED` retry when a logical response contains two different oversized valid items: one huge name/path item and one huge label item.
+  - C, Rust, and Go Windows tests validate the same APPS_LOOKUP and CGROUPS_LOOKUP huge valid label isolation case.
+  - The final logical response keeps both oversized items as explicit `OVERSIZED_ITEM` outcomes and still returns the trailing normal item as `KNOWN`.
+  - Latest focused evidence: C POSIX `ctest` `test_service` passed; C Windows focused filter `36 passed, 0 failed`; Go/Rust POSIX and Windows focused transparent-overflow tests passed.
 - Endpoint-disappears-after-partial-progress validation:
   - C, Rust, and Go POSIX tests validate APPS_LOOKUP and CGROUPS_LOOKUP fail the whole logical call when a valid first partial response is followed by endpoint disappearance before follow-up completion.
   - C, Rust, and Go Windows tests validate the same APPS_LOOKUP and CGROUPS_LOOKUP endpoint-disappears-after-partial-progress case.
@@ -1035,7 +1052,7 @@ Reviewer findings:
 - External reviewer finding: local oversized request-key synthesis could run before proving the client is connected. Handled by enforcing `READY` before logical lookup work in C, Go, and Rust.
 - External reviewer finding: cgroups all-local oversized handling could skip the zero-item probe path. Handled by continuing after a local oversized item only when more request items remain; otherwise the client sends the zero-item request and validates endpoint/generation behavior.
 - External reviewer concern: hidden fixed payload ceilings would contradict initialization-tunable budgets. Reviewed against code and docs. `NIPC_MAX_PAYLOAD_CAP` / `MaxPayloadCap` remains a named default growth ceiling; request/response payload budgets are exposed through initialization config. Server learned-capacity growth now also honors those configured ceilings.
-- External reviewer finding: coverage scripts and broader adversarial matrix still need updates before SOW completion. Coverage script expansion is now implemented; C/Rust/Go coverage gates pass; representative `8192` and `32768` logical-call tests now pass in C/Rust/Go on POSIX and Windows; POSIX and Windows baseline/SHM lookup-scale interop now pass across all C/Rust/Go directed pairs, including mixed-status lookup cases and heavier `65536` stress-only runs; lookup status codec interop now proves `PAYLOAD_EXCEEDED` and `OVERSIZED_ITEM` wire parity across C/Rust/Go; Rust malformed typed lookup response parity is now covered on POSIX and Windows; malformed follow-up responses after partial progress are now covered in C/Rust/Go on POSIX and Windows; reordered and duplicate response-item corruption is now covered in C/Rust/Go on POSIX and Windows; invalid status enum, invalid status-dependent field, and invalid label-table corruption are now covered in C/Rust/Go on POSIX and Windows; endpoint absence before call, endpoint disappearance after partial progress, and endpoint disappearance before the first subcall are now covered in C/Rust/Go on POSIX and Windows; zero-item typed lookup calls are now covered in C/Rust/Go on POSIX and Windows; duplicate and unsorted request keys under request splitting are now covered in C/Rust/Go on POSIX and Windows; full POSIX and Windows benchmark regenerations now pass; downstream topology-containers post-vendor validation now passes. Broader adversarial matrix review remains open.
+- External reviewer finding: coverage scripts and broader adversarial matrix still need updates before SOW completion. Coverage script expansion is now implemented; C/Rust/Go coverage gates pass; representative `8192` and `32768` logical-call tests now pass in C/Rust/Go on POSIX and Windows; POSIX and Windows baseline/SHM lookup-scale interop now pass across all C/Rust/Go directed pairs, including mixed-status lookup cases and heavier `65536` stress-only runs; lookup status codec interop now proves `PAYLOAD_EXCEEDED` and `OVERSIZED_ITEM` wire parity across C/Rust/Go; Rust malformed typed lookup response parity is now covered on POSIX and Windows; malformed follow-up responses after partial progress are now covered in C/Rust/Go on POSIX and Windows; reordered and duplicate response-item corruption is now covered in C/Rust/Go on POSIX and Windows; invalid status enum, invalid status-dependent field, and invalid label-table corruption are now covered in C/Rust/Go on POSIX and Windows; huge valid label isolation is now covered in C/Rust/Go on POSIX and Windows; endpoint absence before call, endpoint disappearance after partial progress, and endpoint disappearance before the first subcall are now covered in C/Rust/Go on POSIX and Windows; zero-item typed lookup calls are now covered in C/Rust/Go on POSIX and Windows; duplicate and unsorted request keys under request splitting are now covered in C/Rust/Go on POSIX and Windows; full POSIX and Windows benchmark regenerations now pass; downstream topology-containers post-vendor validation now passes. Broader adversarial matrix review remains open.

 Same-failure scan:

@@ -1095,7 +1112,7 @@ Lessons:
 Follow-up mapping:

 - Still open inside this active SOW:
-  - add any remaining broader adversarial tests from the planned matrix beyond the now-covered representative `8192`/`32768` logical-call cases, now-covered mid-logical timeout/abort cases, now-covered malformed follow-up responses after partial progress, now-covered reordered/duplicate response-item corruption, now-covered invalid status/status-dependent/label-table response corruption, now-covered endpoint absence before call, now-covered endpoint disappearance after partial progress, now-covered endpoint disappearance before the first subcall, now-covered zero-item typed lookup calls, now-covered stale request-capacity reconnect cases, now-covered duplicate/unsorted request keys under request splitting, now-covered request cap-minus-one/exact/plus-one boundaries, now-covered exact response-fit plus/minus-one boundaries, now-covered no-progress overflow cases, now-covered raw no-growth overflow cases, now-covered logical response-byte ceilings, now-covered mixed-status cross-language interop cases, and now-covered lookup status codec interop cases;
+  - add any remaining broader adversarial tests from the planned matrix beyond the now-covered representative `8192`/`32768` logical-call cases, now-covered mid-logical timeout/abort cases, now-covered malformed follow-up responses after partial progress, now-covered reordered/duplicate response-item corruption, now-covered invalid status/status-dependent/label-table response corruption, now-covered huge valid label isolation cases, now-covered endpoint absence before call, now-covered endpoint disappearance after partial progress, now-covered endpoint disappearance before the first subcall, now-covered zero-item typed lookup calls, now-covered stale request-capacity reconnect cases, now-covered duplicate/unsorted request keys under request splitting, now-covered request cap-minus-one/exact/plus-one boundaries, now-covered exact response-fit plus/minus-one boundaries, now-covered no-progress overflow cases, now-covered raw no-growth overflow cases, now-covered logical response-byte ceilings, now-covered mixed-status cross-language interop cases, and now-covered lookup status codec interop cases;
   - keep lookup-scale interop green across POSIX baseline, POSIX SHM, Windows Named Pipe, and Windows SHM; all four profiles now cover both all-known scale and mixed-status C/Rust/Go directed tests.

 ## Downstream Vendoring Plan
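
The SOW bullets above describe the behavior the extended tests expect: a logical call that spans several subrequests, reports each huge item as `OVERSIZED_ITEM`, and never exposes an intermediate `PAYLOAD_EXCEEDED` status to the caller. The following is a minimal toy model of that behavior, not the netipc client or wire format. The `Status` enum, `serve`, `logical_lookup`, and the encoded sizes are all hypothetical names and numbers chosen for illustration, and the model ignores the bytes that control records themselves consume.

```rust
// Toy model only: names, sizes, and retry policy are illustrative assumptions,
// not the netipc wire format or client implementation.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Known,
    OversizedItem,
    PayloadExceeded,
}

/// Hypothetical server: answers one subrequest within `budget` bytes.
/// `encoded_sizes[i]` is the size item `i` would need if fully encoded.
fn serve(encoded_sizes: &[usize], budget: usize) -> Vec<Status> {
    let mut used = 0;
    encoded_sizes
        .iter()
        .map(|&size| {
            if size > budget {
                // Can never fit, even alone in a response: final outcome.
                Status::OversizedItem
            } else if used + size <= budget {
                used += size;
                Status::Known
            } else {
                // Would fit in an emptier response: ask the client to retry.
                Status::PayloadExceeded
            }
        })
        .collect()
}

/// Hypothetical client: repeats subrequests until every item has a final
/// status, so callers never observe `PayloadExceeded`.
/// Returns the final statuses and the number of subrequests sent.
fn logical_lookup(encoded_sizes: &[usize], budget: usize) -> (Vec<Status>, usize) {
    let mut results: Vec<Option<Status>> = vec![None; encoded_sizes.len()];
    let mut pending: Vec<usize> = (0..encoded_sizes.len()).collect();
    let mut subcalls = 0;

    while !pending.is_empty() {
        let sizes: Vec<usize> = pending.iter().map(|&i| encoded_sizes[i]).collect();
        let statuses = serve(&sizes, budget);
        subcalls += 1;

        let mut still_pending = Vec::new();
        for (&index, status) in pending.iter().zip(statuses) {
            if status == Status::PayloadExceeded {
                still_pending.push(index);
            } else {
                results[index] = Some(status);
            }
        }
        // The first pending item that fits the budget is always answered,
        // and oversized items are final, so each round makes progress.
        pending = still_pending;
    }

    let finals = results
        .into_iter()
        .map(|s| s.expect("every item resolved"))
        .collect();
    (finals, subcalls)
}

fn main() {
    // normal item, huge name/path item, huge label item, trailing normal item
    let (statuses, subcalls) = logical_lookup(&[200, 600, 600, 200], 256);
    println!("{statuses:?} after {subcalls} subrequests");
}
```

In this model the trailing normal item does not fit next to the first one, so it is deferred to a second subrequest. That mirrors the tests' assertion that the handler runs at least twice while the final view still holds four items.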

src/crates/netipc/src/service/raw_unix_tests.rs

Lines changed: 42 additions & 13 deletions
@@ -1182,7 +1182,7 @@ fn test_lookup_zero_item_calls() {
 fn test_cgroups_lookup_transparent_payload_exceeded_retry() {
     let svc = unique_service("rs_svc_cgroups_lookup_scale");
     let mut cfg = server_config();
-    cfg.max_response_payload_bytes = 160;
+    cfg.max_response_payload_bytes = 256;
     let calls = Arc::new(std::sync::atomic::AtomicU32::new(0));
     let handler_calls = calls.clone();
     let handler = cgroups_lookup_dispatch(Arc::new(move |req, builder| {
@@ -1200,13 +1200,22 @@ fn test_cgroups_lookup_transparent_payload_exceeded_retry() {
             } else {
                 b"ok"
             };
+            let label_value;
+            let labels;
+            let labels_ref: &[(&[u8], &[u8])] = if item.as_bytes() == b"/huge-label" {
+                label_value = vec![b'l'; 512];
+                labels = [(b"huge".as_slice(), label_value.as_slice())];
+                &labels
+            } else {
+                &[]
+            };
             if builder
                 .add(
                     CGROUP_LOOKUP_KNOWN,
                     ORCHESTRATOR_K8S,
                     item.as_bytes(),
                     name_ref,
-                    &[],
+                    labels_ref,
                 )
                 .is_err()
             {
@@ -1221,13 +1230,18 @@ fn test_cgroups_lookup_transparent_payload_exceeded_retry() {
     connect_ready(&mut client);

     let view = client
-        .call_cgroups_lookup(&[b"/a".as_slice(), b"/huge".as_slice(), b"/b".as_slice()])
+        .call_cgroups_lookup(&[
+            b"/a".as_slice(),
+            b"/huge".as_slice(),
+            b"/huge-label".as_slice(),
+            b"/b".as_slice(),
+        ])
         .expect("cgroups lookup scale call");
     assert!(
         calls.load(Ordering::SeqCst) >= 2,
         "handler should be called for at least two subrequests"
     );
-    assert_eq!(view.item_count, 3);
+    assert_eq!(view.item_count, 4);
     assert_eq!(view.generation, 7);
     let item0 = view.item(0).expect("item 0");
     assert_eq!(item0.status, CGROUP_LOOKUP_KNOWN);
@@ -1236,9 +1250,12 @@ fn test_cgroups_lookup_transparent_payload_exceeded_retry() {
     assert_eq!(item1.status, CGROUP_LOOKUP_OVERSIZED_ITEM);
     assert_eq!(item1.path.as_bytes(), b"/huge");
     let item2 = view.item(2).expect("item 2");
-    assert_eq!(item2.status, CGROUP_LOOKUP_KNOWN);
-    assert_eq!(item2.path.as_bytes(), b"/b");
-    assert_eq!(item2.name.as_bytes(), b"ok");
+    assert_eq!(item2.status, CGROUP_LOOKUP_OVERSIZED_ITEM);
+    assert_eq!(item2.path.as_bytes(), b"/huge-label");
+    let item3 = view.item(3).expect("item 3");
+    assert_eq!(item3.status, CGROUP_LOOKUP_KNOWN);
+    assert_eq!(item3.path.as_bytes(), b"/b");
+    assert_eq!(item3.name.as_bytes(), b"ok");

     client.close();
     server.stop();
@@ -1267,6 +1284,15 @@ fn test_apps_lookup_transparent_payload_exceeded_retry() {
             } else {
                 b"/ok"
             };
+            let label_value;
+            let labels;
+            let labels_ref: &[(&[u8], &[u8])] = if pid == 44 {
+                label_value = vec![b'l'; 512];
+                labels = [(b"huge".as_slice(), label_value.as_slice())];
+                &labels
+            } else {
+                &[]
+            };
             if builder
                 .add(
                     PID_LOOKUP_KNOWN,
@@ -1279,7 +1305,7 @@ fn test_apps_lookup_transparent_payload_exceeded_retry() {
                     b"ok",
                     cgroup_path_ref,
                     b"name",
-                    &[],
+                    labels_ref,
                 )
                 .is_err()
             {
@@ -1294,13 +1320,13 @@ fn test_apps_lookup_transparent_payload_exceeded_retry() {
     connect_ready(&mut client);

     let view = client
-        .call_apps_lookup(&[11, 22, 33])
+        .call_apps_lookup(&[11, 22, 44, 33])
         .expect("apps lookup scale call");
     assert!(
         calls.load(Ordering::SeqCst) >= 2,
         "handler should be called for at least two subrequests"
     );
-    assert_eq!(view.item_count, 3);
+    assert_eq!(view.item_count, 4);
     assert_eq!(view.generation, 9);
     let item0 = view.item(0).expect("item 0");
     assert_eq!(item0.status, PID_LOOKUP_KNOWN);
@@ -1310,9 +1336,12 @@ fn test_apps_lookup_transparent_payload_exceeded_retry() {
     assert_eq!(item1.status, PID_LOOKUP_OVERSIZED_ITEM);
     assert_eq!(item1.pid, 22);
     let item2 = view.item(2).expect("item 2");
-    assert_eq!(item2.status, PID_LOOKUP_KNOWN);
-    assert_eq!(item2.pid, 33);
-    assert_eq!(item2.comm.as_bytes(), b"ok");
+    assert_eq!(item2.status, PID_LOOKUP_OVERSIZED_ITEM);
+    assert_eq!(item2.pid, 44);
+    let item3 = view.item(3).expect("item 3");
+    assert_eq!(item3.status, PID_LOOKUP_KNOWN);
+    assert_eq!(item3.pid, 33);
+    assert_eq!(item3.comm.as_bytes(), b"ok");

     client.close();
     server.stop();
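
Both handlers in this diff declare `label_value` and `labels` before the `if` and assign them only inside the huge-label branch. Declaring the bindings in the outer scope lets the borrowed slice `labels_ref` outlive the branch that builds it, while the other branch borrows an empty array and leaves the bindings uninitialized. A standalone sketch of the same pattern follows; `total_label_bytes` and `label_bytes_for` are hypothetical helpers that stand in for the builder call and are not part of the netipc crate.

```rust
/// Hypothetical stand-in for a consumer of the label slice (for example a
/// builder's `add` call): sums the lengths of all label values.
fn total_label_bytes(labels: &[(&[u8], &[u8])]) -> usize {
    labels.iter().map(|(_, value)| value.len()).sum()
}

/// Builds a 512-byte label only for the "/huge-label" path and passes a
/// borrowed slice onward in both cases.
fn label_bytes_for(path: &[u8]) -> usize {
    // Declared before the `if` so that, when initialized, they live until the
    // end of the function rather than the end of the branch.
    let label_value;
    let labels;
    let labels_ref: &[(&[u8], &[u8])] = if path == b"/huge-label" {
        label_value = vec![b'l'; 512];
        labels = [(b"huge".as_slice(), label_value.as_slice())];
        &labels
    } else {
        // No allocation on the common path; the bindings stay uninitialized.
        &[]
    };
    total_label_bytes(labels_ref)
}

fn main() {
    println!("/huge-label -> {} label bytes", label_bytes_for(b"/huge-label"));
    println!("/b          -> {} label bytes", label_bytes_for(b"/b"));
}
```

Defining `label_value` inside the branch instead would be rejected by the borrow checker, because the vector would be dropped at the end of the branch while `labels_ref` still borrows from it.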
