Skip to content

fix(api): re-register api-grpc Consul service as alias of api-internal-grpc#2631

Merged
jakubno merged 1 commit into
mainfrom
fix/api-grpc-consul-alias
May 12, 2026
Merged

fix(api): re-register api-grpc Consul service as alias of api-internal-grpc#2631
jakubno merged 1 commit into
mainfrom
fix/api-grpc-consul-alias

Conversation

@jakubno
Copy link
Copy Markdown
Member

@jakubno jakubno commented May 12, 2026

#2470 renamed the API internal gRPC Consul service from api-grpc to api-internal-grpc. Old client-proxy allocations still have API_GRPC_ADDRESS=api-grpc.service.consul: baked in and break when the old name disappears. Re-register api-grpc on the same internal port so legacy client proxies keep working during the migration. Remove once no pre-#2470 client-proxy allocations remain.

…l-grpc

#2470 renamed the API internal gRPC Consul service from api-grpc to
api-internal-grpc. Old client-proxy allocations still have
API_GRPC_ADDRESS=api-grpc.service.consul:<port> baked in and break when
the old name disappears. Re-register api-grpc on the same internal port
so legacy client proxies keep working during the migration. Remove once
no pre-#2470 client-proxy allocations remain.
@cla-bot cla-bot Bot added the cla-signed label May 12, 2026
@cursor
Copy link
Copy Markdown

cursor Bot commented May 12, 2026

PR Summary

Low Risk
Low risk config-only change, but it adds a second Consul service/check on the same port which could cause unexpected discovery or health-check behavior if consumers assume a single service registration.

Overview
Adds a new Consul service stanza named api-grpc that points to the existing api_internal_grpc port, restoring the pre-#2470 service name for legacy clients. The only concern is that this duplicates service registration and TCP health checks on the same endpoint, which may confuse consumers or tooling that assumes a single internal gRPC service name.

Reviewed by Cursor Bugbot for commit 0b2448e. Bugbot is set up for automated code reviews on this repo. Configure here.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

❌ 6 Tests Failed:

Tests completed Failed Passed Skipped
2609 6 2603 7
View the full list of 7 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig

Flake rate in main: 76.73% (Passed 141 times, Failed 465 times)

Stack Traces | 42s run time
=== RUN   TestUpdateNetworkConfig
=== PAUSE TestUpdateNetworkConfig
=== CONT  TestUpdateNetworkConfig
--- FAIL: TestUpdateNetworkConfig (42.01s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false

Flake rate in main: 77.14% (Passed 136 times, Failed 459 times)

Stack Traces | 4.4s run time
=== RUN   TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1352}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
Executing command curl in sandbox iv8alls0g14gr2kij46of
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1353}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
Executing command /bin/sh in sandbox itriko0kwvtumxasy0ke3
    sandbox_network_update_test.go:391: Command [curl] output: event:{start:{pid:1354}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{data:{stdout:"HTTP/2 302 \r\nx-content-type-options: nosniff\r\nlocation: https://dns.google/\r\ndate: Tue, 12 May 2026 11:43:59 GMT\r\ncontent-type: text/html; charset=UTF-8\r\nserver: HTTP server (unknown)\r\ncontent-length: 216\r\nx-xss-protection: 0\r\nx-frame-options: SAMEORIGIN\r\nalt-svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000\r\n\r\n"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_network_update_test.go:391: Command [curl] completed successfully in sandbox iv8alls0g14gr2kij46of
    sandbox_network_update_test.go:391: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:74
        	            				.../api/sandboxes/sandbox_network_update_test.go:60
        	            				.../api/sandboxes/sandbox_network_update_test.go:391
        	Error:      	An error is expected but got nil.
        	Test:       	TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
        	Messages:   	https://8.8.8.8 should be blocked
--- FAIL: TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false (4.40s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost

Flake rate in main: 57.50% (Passed 238 times, Failed 322 times)

Stack Traces | 0s run time
=== RUN   TestBindLocalhost
=== PAUSE TestBindLocalhost
=== CONT  TestBindLocalhost
--- FAIL: TestBindLocalhost (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_0_0_0_0

Flake rate in main: 63.33% (Passed 132 times, Failed 228 times)

Stack Traces | 7.76s run time
=== RUN   TestBindLocalhost/bind_0_0_0_0
=== PAUSE TestBindLocalhost/bind_0_0_0_0
=== CONT  TestBindLocalhost/bind_0_0_0_0
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1258}}
Executing command python in sandbox i79bwas52rjuw59v5ppfg
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_0_0_0_0
        	Messages:   	Unexpected status code 502 for bind address 0.0.0.0
--- FAIL: TestBindLocalhost/bind_0_0_0_0 (7.76s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_127_0_0_1

Flake rate in main: 58.39% (Passed 134 times, Failed 188 times)

Stack Traces | 7.17s run time
=== RUN   TestBindLocalhost/bind_127_0_0_1
=== PAUSE TestBindLocalhost/bind_127_0_0_1
=== CONT  TestBindLocalhost/bind_127_0_0_1
Executing command python in sandbox i4qd7qgvacscqlq9jg8v2
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1258}}
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_127_0_0_1
        	Messages:   	Unexpected status code 502 for bind address 127.0.0.1
--- FAIL: TestBindLocalhost/bind_127_0_0_1 (7.17s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_::1

Flake rate in main: 65.26% (Passed 132 times, Failed 248 times)

Stack Traces | 10.5s run time
=== RUN   TestBindLocalhost/bind_::1
=== PAUSE TestBindLocalhost/bind_::1
=== CONT  TestBindLocalhost/bind_::1
Executing command python in sandbox ik7nwnyp63jat5ustcqts
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1258}}
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_::1
        	Messages:   	Unexpected status code 502 for bind address ::1
--- FAIL: TestBindLocalhost/bind_::1 (10.54s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_localhost

Flake rate in main: 65.17% (Passed 132 times, Failed 247 times)

Stack Traces | 7.26s run time
=== RUN   TestBindLocalhost/bind_localhost
=== PAUSE TestBindLocalhost/bind_localhost
=== CONT  TestBindLocalhost/bind_localhost
Executing command python in sandbox i5bnrgf4dr6lh3qgfd7a9
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1258}}
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_localhost
        	Messages:   	Unexpected status code 502 for bind address localhost
--- FAIL: TestBindLocalhost/bind_localhost (7.26s)

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request adds a compatibility alias for the api-grpc service in the Nomad job configuration to support legacy client-proxy allocations. I have no feedback to provide.

@jakubno jakubno marked this pull request as ready for review May 12, 2026 11:42
@jakubno jakubno requested a review from sitole May 12, 2026 11:43
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — straightforward backwards-compatibility Consul service alias with clear documentation for removal.

Extended reasoning...

Overview

This PR adds a single Consul service stanza to iac/modules/job-api/jobs/api.hcl re-registering the api-grpc name on the existing api_internal_grpc port. The block is well-commented with the rationale (legacy client-proxy allocations still hold API_GRPC_ADDRESS=api-grpc.service.consul:<port> bindings) and an explicit removal condition.

Security risks

None. This is a Nomad job spec change that re-exposes an existing internal port under an additional Consul service name on the same network surface; no new ports, credentials, or auth surfaces are introduced.

Level of scrutiny

Low. This is a config-only HCL change scoped to a single file, follows the exact pattern of the adjacent api-internal-grpc service stanza (same port, same TCP health-check shape), and is intentionally a temporary migration shim. The duplicate health check on the same port is a deliberate trade-off, not a bug — health-checking the same TCP endpoint twice every 3s is negligible cost.

Other factors

The Cursor bugbot flagged the duplicate service registration as a theoretical discovery-confusion concern, but that is precisely the point of the change (give legacy callers a working name during migration). Codecov reports clean. Author has stated the removal criterion explicitly in both the description and an inline comment, which is the right hygiene for this kind of transitional shim.

@jakubno jakubno merged commit bb6891e into main May 12, 2026
54 checks passed
@jakubno jakubno deleted the fix/api-grpc-consul-alias branch May 12, 2026 11:52
ValentaTomas pushed a commit that referenced this pull request May 13, 2026
…l-grpc (#2631)

#2470 renamed the API internal gRPC Consul service from api-grpc to
api-internal-grpc. Old client-proxy allocations still have
API_GRPC_ADDRESS=api-grpc.service.consul:<port> baked in and break when
the old name disappears. Re-register api-grpc on the same internal port
so legacy client proxies keep working during the migration. Remove once
no pre-#2470 client-proxy allocations remain.
AdaAibaby pushed a commit to AdaAibaby/infra that referenced this pull request May 14, 2026
…l-grpc (e2b-dev#2631)

e2b-dev#2470 renamed the API internal gRPC Consul service from api-grpc to
api-internal-grpc. Old client-proxy allocations still have
API_GRPC_ADDRESS=api-grpc.service.consul:<port> baked in and break when
the old name disappears. Re-register api-grpc on the same internal port
so legacy client proxies keep working during the migration. Remove once
no pre-e2b-dev#2470 client-proxy allocations remain.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants