
API: Admin endpoint cluster id filter for listing nodes #2641

Merged
sitole merged 3 commits into main from feat/get-nodes-filters-by-cluster
May 13, 2026

Conversation

@sitole
Member

@sitole sitole commented May 13, 2026

Adds support to query nodes in non-default clusters

sitole added 3 commits May 13, 2026 14:09
Previously, the endpoint for listing nodes returned only Nomad-managed nodes
(the local cluster). Now it can return any nodes the API handles for a
pre-selected cluster.
@cursor

cursor Bot commented May 13, 2026

PR Summary

Medium Risk
Medium risk because it changes the GET /nodes handler and generated client/server interfaces, which can break callers, and it changes node filtering semantics based on clusterID.

Overview
This introduces an optional clusterID query parameter for GET /nodes and propagates it through the generated API client/server code into handlers.GetNodes and orchestrator.AdminNodes.
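As a rough illustration of what a caller sees, the sketch below builds the request URL for GET /nodes with and without the optional clusterID query parameter. It uses only the Go standard library rather than the generated client; the helper name nodesURL, the base URL, and the sample UUID are assumptions made for the example and do not come from the PR.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// nodesURL is a hypothetical helper that builds the URL for GET /nodes.
// An empty clusterID omits the query parameter, leaving the choice of
// cluster to the server-side default.
func nodesURL(baseURL, clusterID string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/nodes"
	if clusterID != "" {
		q := u.Query()
		q.Set("clusterID", clusterID)
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

func main() {
	// Placeholder host and UUID, for illustration only.
	local, _ := nodesURL("https://api.example.test", "")
	remote, _ := nodesURL("https://api.example.test", "3f2b8c1e-5d4a-4c6b-9e7f-0a1b2c3d4e5f")
	fmt.Println(local)
	fmt.Println(remote)
}
```

Because the parameter is optional, requests that omit it keep the same URL shape as before the change.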

Potential issues: this is a breaking signature change for any internal callers/tests using the generated GetNodes client/server interfaces, and the filtering logic now strictly matches n.ClusterID (defaulting to the local cluster when omitted), which may exclude nodes that were previously returned under the old IsNomadManaged()-based behavior.
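The filtering semantics described above can be sketched in a few lines. This is a simplified model, not the code from orchestrator/admin.go: the ClusterID type, the Node struct, localClusterID, and filterByCluster are all hypothetical names, and a 16-byte array stands in for a real UUID type so the example needs no external packages.

```go
package main

import "fmt"

// ClusterID is a stand-in for a UUID (hypothetical type for this sketch).
type ClusterID [16]byte

// localClusterID models the local cluster as the all-zero ID (assumption).
var localClusterID ClusterID

// Node is a minimal, hypothetical node record.
type Node struct {
	ID        string
	ClusterID ClusterID
}

// filterByCluster keeps only nodes whose ClusterID equals the requested one.
// A nil request falls back to the local cluster.
func filterByCluster(nodes []Node, requested *ClusterID) []Node {
	want := localClusterID
	if requested != nil {
		want = *requested
	}
	out := make([]Node, 0, len(nodes))
	for _, n := range nodes {
		if n.ClusterID == want {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	remote := ClusterID{1}
	nodes := []Node{{ID: "a"}, {ID: "b", ClusterID: remote}}
	fmt.Println(len(filterByCluster(nodes, nil)), len(filterByCluster(nodes, &remote)))
}
```

Under this model a node is returned only on an exact ID match, so any node that the old predicate accepted but whose ClusterID differs from the local ID would drop out of the default response.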

Reviewed by Cursor Bugbot for commit b0f323e.

@codecov

codecov Bot commented May 13, 2026

❌ 9 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| 2616 | 9 | 2607 | 7 |
View the full list of 12 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig

Flake rate in main: 76.54% (Passed 156 times, Failed 509 times)

Stack Traces | 45s run time
=== RUN   TestUpdateNetworkConfig
=== PAUSE TestUpdateNetworkConfig
=== CONT  TestUpdateNetworkConfig
Executing command curl in sandbox iql0vbqd50u6pq04slgy0
--- FAIL: TestUpdateNetworkConfig (45.05s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false

Flake rate in main: 77.03% (Passed 150 times, Failed 503 times)

Stack Traces | 4.51s run time
=== RUN   TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
Executing command curl in sandbox imgahvtmsw6vngrvwiazn
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1354}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35  exited:true  status:"exit status 35"  error:"exit status 35"}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1355}}
Executing command curl in sandbox ii30e66it3ftlaj408rjp
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35  exited:true  status:"exit status 35"  error:"exit status 35"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{start:{pid:1356}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{data:{stdout:"HTTP/2 302 \r\nx-content-type-options: nosniff\r\nlocation: https://dns.google/\r\ndate: Wed, 13 May 2026 14:24:51 GMT\r\ncontent-type: text/html; charset=UTF-8\r\nserver: HTTP server (unknown)\r\ncontent-length: 216\r\nx-xss-protection: 0\r\nx-frame-options: SAMEORIGIN\r\nalt-svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000\r\n\r\n"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_network_update_test.go:391: Command [curl] completed successfully in sandbox imgahvtmsw6vngrvwiazn
    sandbox_network_update_test.go:391: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:74
        	            				.../api/sandboxes/sandbox_network_update_test.go:60
        	            				.../api/sandboxes/sandbox_network_update_test.go:391
        	Error:      	An error is expected but got nil.
        	Test:       	TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
        	Messages:   	https://8.8.8.8 should be blocked
--- FAIL: TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false (4.51s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost

Flake rate in main: 57.14% (Passed 258 times, Failed 344 times)

Stack Traces | 0s run time
=== RUN   TestBindLocalhost
=== PAUSE TestBindLocalhost
=== CONT  TestBindLocalhost
--- FAIL: TestBindLocalhost (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_::1

Flake rate in main: 64.73% (Passed 146 times, Failed 268 times)

Stack Traces | 12.5s run time
=== RUN   TestBindLocalhost/bind_::1
=== PAUSE TestBindLocalhost/bind_::1
=== CONT  TestBindLocalhost/bind_::1
Executing command python in sandbox isw49jy4w3ipe6fjzo3xz
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1259}}
Executing command /bin/bash in sandbox ipqbbx1kaa4jpjkk1sdqb
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_::1
        	Messages:   	Unexpected status code 502 for bind address ::1
--- FAIL: TestBindLocalhost/bind_::1 (12.54s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestListDir

Flake rate in main: 53.81% (Passed 188 times, Failed 219 times)

Stack Traces | 0.86s run time
=== RUN   TestListDir
=== PAUSE TestListDir
=== CONT  TestListDir
Executing command findmnt in sandbox irimls1neeza9fb1b6ib7 (user: root)
--- FAIL: TestListDir (0.86s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestListDir/depth_0_lists_only_root_directory

Flake rate in main: 57.93% (Passed 146 times, Failed 201 times)

Stack Traces | 0.02s run time
=== RUN   TestListDir/depth_0_lists_only_root_directory
=== PAUSE TestListDir/depth_0_lists_only_root_directory
=== CONT  TestListDir/depth_0_lists_only_root_directory
    filesystem_test.go:97: 
        	Error Trace:	.../tests/envd/filesystem_test.go:97
        	Error:      	Received unexpected error:
        	            	unavailable: 502 Bad Gateway
        	Test:       	TestListDir/depth_0_lists_only_root_directory
--- FAIL: TestListDir/depth_0_lists_only_root_directory (0.02s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestListDir/depth_1_lists_root_directory

Flake rate in main: 57.93% (Passed 146 times, Failed 201 times)

Stack Traces | 0.01s run time
=== RUN   TestListDir/depth_1_lists_root_directory
=== PAUSE TestListDir/depth_1_lists_root_directory
=== CONT  TestListDir/depth_1_lists_root_directory
    filesystem_test.go:97: 
        	Error Trace:	.../tests/envd/filesystem_test.go:97
        	Error:      	Received unexpected error:
        	            	unavailable: 502 Bad Gateway
        	Test:       	TestListDir/depth_1_lists_root_directory
--- FAIL: TestListDir/depth_1_lists_root_directory (0.01s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestListDir/depth_2_lists_first_level_of_subdirectories_(in_this_case_the_root_directory)

Flake rate in main: 57.93% (Passed 146 times, Failed 201 times)

Stack Traces | 0.01s run time
=== RUN   TestListDir/depth_2_lists_first_level_of_subdirectories_(in_this_case_the_root_directory)
=== PAUSE TestListDir/depth_2_lists_first_level_of_subdirectories_(in_this_case_the_root_directory)
=== CONT  TestListDir/depth_2_lists_first_level_of_subdirectories_(in_this_case_the_root_directory)
    filesystem_test.go:97: 
        	Error Trace:	.../tests/envd/filesystem_test.go:97
        	Error:      	Received unexpected error:
        	            	unavailable: 502 Bad Gateway
        	Test:       	TestListDir/depth_2_lists_first_level_of_subdirectories_(in_this_case_the_root_directory)
--- FAIL: TestListDir/depth_2_lists_first_level_of_subdirectories_(in_this_case_the_root_directory) (0.01s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestListDir/depth_3_lists_all_directories_and_files

Flake rate in main: 57.93% (Passed 146 times, Failed 201 times)

Stack Traces | 0.01s run time
=== RUN   TestListDir/depth_3_lists_all_directories_and_files
=== PAUSE TestListDir/depth_3_lists_all_directories_and_files
=== CONT  TestListDir/depth_3_lists_all_directories_and_files
    filesystem_test.go:97: 
        	Error Trace:	.../tests/envd/filesystem_test.go:97
        	Error:      	Received unexpected error:
        	            	unavailable: 502 Bad Gateway
        	Test:       	TestListDir/depth_3_lists_all_directories_and_files
--- FAIL: TestListDir/depth_3_lists_all_directories_and_files (0.01s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 66.23% (Passed 156 times, Failed 306 times)

Stack Traces | 73.2s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:26: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (73.15s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 67.26% (Passed 146 times, Failed 300 times)

Stack Traces | 42.4s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1256}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 185 MB\nFree memory before tmpfs mount: 799 MB\nMemory to use in integrity test (80% of free, min 64MB): 639 MB\n"}}
Executing command bash in sandbox i2y4f3wu9mkm8997m8jlx (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"639+0 records in\n639+0 records out\n670040064 bytes (670 MB, 639 MiB) copied, 4.71395 s, 142 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=639\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 4.65\n\tPercent of CPU this job got: 98%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:04.72\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 831 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2596\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 3\n\tMinor (reclaiming a frame) page faults: 341\n\tVoluntary context switches: 4\n\tInvoluntary context switches: 14\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox icy4quxsuy2j8uozi480f
Executing command bash in sandbox icy4quxsuy2j8uozi480f (user: root)
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{start:{pid:1272}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{data:{stdout:"725bfcec31a7fc22c890ee8e4adf49efea307d50e80c9616e7643cfbb6f63ff1\n"}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:74: Command [bash] completed successfully in sandbox icy4quxsuy2j8uozi480f
Executing command bash in sandbox icy4quxsuy2j8uozi480f (user: root)
    sandbox_memory_integrity_test.go:99: Command [bash] output: event:{start:{pid:1276}}
    sandbox_memory_integrity_test.go:100: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:100
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox icy4quxsuy2j8uozi480f: invalid_argument: protocol error: incomplete envelope: unexpected EOF
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (42.42s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestSandboxProxyWorkingPort

Flake rate in main: 54.85% (Passed 149 times, Failed 181 times)

Stack Traces | 11.8s run time
=== RUN   TestSandboxProxyWorkingPort
=== PAUSE TestSandboxProxyWorkingPort
=== CONT  TestSandboxProxyWorkingPort
Executing command ls in sandbox itfa7uhjdwmujoy933evl
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Executing command ls in sandbox ikqqi31bx99ijp2ev27wg
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Executing command ls in sandbox ikqqi31bx99ijp2ev27wg
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:64: 
        	Error Trace:	.../tests/proxies/closed_port_test.go:64
        	Error:      	Expected value not to be nil.
        	Test:       	TestSandboxProxyWorkingPort
--- FAIL: TestSandboxProxyWorkingPort (11.79s)


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the GetNodes endpoint to support filtering by cluster ID, including necessary OpenAPI specification changes, generated client code updates, and logic modifications in the API handler and orchestrator. I have no feedback to provide as the changes correctly implement the requested functionality and follow the existing patterns.

@sitole sitole assigned jakubno and unassigned bchalios May 13, 2026

@claude claude Bot left a comment


Straightforward admin filter addition following the existing cluster-fallback pattern used by sibling node endpoints.

Extended reasoning...

Overview

This PR adds an optional clusterID query parameter to the admin GET /nodes endpoint and replaces the prior IsNomadManaged() filter in Orchestrator.AdminNodes with a direct ClusterID comparison. The bulk of the diff is regenerated client/server scaffolding in api.gen.go and tests/integration/internal/api/generated.go, produced from the OpenAPI spec change; the only hand-written changes are a few lines in handlers/admin.go and orchestrator/admin.go, plus the spec itself.

Security risks

None significant. The endpoint is already gated by AdminTokenAuth. The new parameter is a UUID parsed via the generated runtime binder, so malformed input yields a 400. Behavior preservation: WithClusterFallback(nil) returns consts.LocalClusterID = uuid.Nil, which matches the ClusterID assigned to Nomad-managed nodes in nodemanager/node.go — so the default response when no clusterID is supplied is equivalent to the previous IsNomadManaged() filter.
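The input-validation behavior mentioned above (malformed clusterID yields a 400, omitted clusterID falls back to the local cluster) can be modeled with the standard library alone. This is a hand-rolled stand-in, not the generated runtime binder or the real handler: parseUUID, getNodes, and statusFor are hypothetical names, and the parser accepts only the canonical 8-4-4-4-12 form.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// parseUUID is a simplified, hypothetical parser for the canonical
// 8-4-4-4-12 textual form. It is not the binder used by the generated code.
func parseUUID(s string) ([16]byte, error) {
	var id [16]byte
	if len(s) != 36 || s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-' {
		return id, fmt.Errorf("invalid UUID %q", s)
	}
	raw, err := hex.DecodeString(strings.ReplaceAll(s, "-", ""))
	if err != nil || len(raw) != 16 {
		return id, fmt.Errorf("invalid UUID %q", s)
	}
	copy(id[:], raw)
	return id, nil
}

// getNodes rejects a malformed clusterID with 400. Otherwise it reports
// which cluster it would list; the zero value models the local fallback.
func getNodes(w http.ResponseWriter, r *http.Request) {
	var cluster [16]byte
	if raw := r.URL.Query().Get("clusterID"); raw != "" {
		id, err := parseUUID(raw)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		cluster = id
	}
	fmt.Fprintf(w, "cluster=%x", cluster)
}

// statusFor runs the handler against an in-memory request and returns
// the HTTP status code it produced.
func statusFor(target string) int {
	rec := httptest.NewRecorder()
	getNodes(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/nodes"), statusFor("/nodes?clusterID=not-a-uuid"))
}
```

Rejecting bad input before any lookup keeps the filter itself simple: by the time it runs, it only ever compares well-formed IDs.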

Level of scrutiny

Low. Admin-only endpoint, small surface area, follows an established pattern already used by GetNodesNodeID and PostNodesNodeID in the same file. The generated code is mechanical oapi-codegen output. No auth, crypto, or data-mutation logic is touched.

Other factors

No outstanding reviewer comments. The bug hunting system found no issues. The only timeline entry is a Cursor bot summary placeholder.

@sitole sitole enabled auto-merge (squash) May 13, 2026 14:31
@sitole sitole merged commit 66daf3b into main May 13, 2026
54 of 55 checks passed
@sitole sitole deleted the feat/get-nodes-filters-by-cluster branch May 13, 2026 14:33
ValentaTomas pushed a commit that referenced this pull request May 13, 2026
Adds support to query nodes in non-default clusters

3 participants