fix(api): downgrade team-limit build errors to warnings and dedupe logs #2635

Merged: jakubno merged 3 commits into main from fix/api-team-limits-warning-log on May 13, 2026

@jakubno (Member) commented May 13, 2026

Client-side validation failures (e.g. memory exceeds team limits) were logged twice as errors: once inside RegisterBuild and again in the handlers. Drop the inner log and route the outer log through ReportErrorByCode so 4xx responses emit a single Warn while 5xx responses still emit an Error.

@cla-bot cla-bot Bot added the cla-signed label May 13, 2026

cursor Bot commented May 13, 2026

PR Summary

Low Risk
Low risk because it only changes telemetry/logging behavior; main risk is reduced visibility for certain 4xx failures if handlers don’t log in all paths.

Overview
This changes template build failure logging to use telemetry.ReportErrorByCode in handlers so 4xx failures log as warnings and aren’t double-reported.

It removes a few ReportCriticalError calls inside template.RegisterBuild for expected 4xx cases (resource-limit validation and alias conflicts), which could make those failures harder to trace if a caller returns the error without logging.

Reviewed by Cursor Bugbot for commit b39dd89.


codecov Bot commented May 13, 2026

❌ 9 Tests Failed:

Tests completed: 2616, Failed: 9, Passed: 2607, Skipped: 7
Full list of 11 ❄️ flaky test(s):
github.com/e2b-dev/infra/tests/integration/internal/tests/api/metrics::TestSandboxMetrics

Flake rate in main: 57.23% (Passed 142 times, Failed 190 times)

Stack Traces | 10.2s run time
=== RUN   TestSandboxMetrics
=== PAUSE TestSandboxMetrics
=== CONT  TestSandboxMetrics
    sandbox_metrics_test.go:47: 
        	Error Trace:	.../api/metrics/sandbox_metrics_test.go:47
        	Error:      	Should NOT be empty, but was 0
        	Test:       	TestSandboxMetrics
--- FAIL: TestSandboxMetrics (10.22s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/metrics::TestTeamMetrics

Flake rate in main: 70.21% (Passed 143 times, Failed 337 times)

Stack Traces | 1.39s run time
=== RUN   TestTeamMetrics
=== PAUSE TestTeamMetrics
=== CONT  TestTeamMetrics
    team_metrics_test.go:61: 
        	Error Trace:	.../api/metrics/team_metrics_test.go:61
        	Error:      	Should be true
        	Test:       	TestTeamMetrics
        	Messages:   	MaxConcurrentSandboxes should be >= 0
--- FAIL: TestTeamMetrics (1.39s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig

Flake rate in main: 76.52% (Passed 151 times, Failed 492 times)

Stack Traces | 46.6s run time
=== RUN   TestUpdateNetworkConfig
=== PAUSE TestUpdateNetworkConfig
=== CONT  TestUpdateNetworkConfig
Executing command curl in sandbox iezsk7weuq78blotrrpxs
--- FAIL: TestUpdateNetworkConfig (46.58s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/3_replace_allowed_ip

Flake rate in main: 55.88% (Passed 135 times, Failed 171 times)

Stack Traces | 7.49s run time
=== RUN   TestUpdateNetworkConfig/3_replace_allowed_ip
Executing command curl in sandbox iflccpjg0ozouymad3z1e
    sandbox_network_update_test.go:328: Command [curl] output: event:{start:{pid:1327}}
    sandbox_network_update_test.go:328: Command [curl] output: event:{end:{exit_code:28 exited:true status:"exit status 28" error:"exit status 28"}}
    sandbox_network_update_test.go:328: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:67
        	            				.../api/sandboxes/sandbox_network_update_test.go:58
        	            				.../api/sandboxes/sandbox_network_update_test.go:328
        	Error:      	Received unexpected error:
        	            	command curl in sandbox iflccpjg0ozouymad3z1e failed with exit code 28
        	Test:       	TestUpdateNetworkConfig/3_replace_allowed_ip
        	Messages:   	https://1.1.1.1 should be reachable
--- FAIL: TestUpdateNetworkConfig/3_replace_allowed_ip (7.49s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false

Flake rate in main: 77.02% (Passed 145 times, Failed 486 times)

Stack Traces | 2.84s run time
=== RUN   TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
Executing command curl in sandbox iflccpjg0ozouymad3z1e
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1362}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
Executing command curl in sandbox iflccpjg0ozouymad3z1e
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1363}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{start:{pid:1364}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{data:{stdout:"HTTP/2 302 \r\nx-content-type-options: nosniff\r\nlocation: https://dns.google/\r\ndate: Wed, 13 May 2026 08:47:17 GMT\r\ncontent-type: text/html; charset=UTF-8\r\nserver: HTTP server (unknown)\r\ncontent-length: 216\r\nx-xss-protection: 0\r\nx-frame-options: SAMEORIGIN\r\nalt-svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000\r\n\r\n"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_network_update_test.go:391: Command [curl] completed successfully in sandbox iflccpjg0ozouymad3z1e
    sandbox_network_update_test.go:391: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:74
        	            				.../api/sandboxes/sandbox_network_update_test.go:60
        	            				.../api/sandboxes/sandbox_network_update_test.go:391
        	Error:      	An error is expected but got nil.
        	Test:       	TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
        	Messages:   	https://8.8.8.8 should be blocked
--- FAIL: TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false (2.84s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/templates::TestTemplateBuildENV

Flake rate in main: 59.30% (Passed 140 times, Failed 204 times)

Stack Traces | 0s run time
=== RUN   TestTemplateBuildENV
=== PAUSE TestTemplateBuildENV
=== CONT  TestTemplateBuildENV
--- FAIL: TestTemplateBuildENV (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/templates::TestTemplateBuildENV/ENV_with_multiline_value

Flake rate in main: 60.18% (Passed 133 times, Failed 201 times)

Stack Traces | 8.16s run time
=== RUN   TestTemplateBuildENV/ENV_with_multiline_value
=== PAUSE TestTemplateBuildENV/ENV_with_multiline_value
=== CONT  TestTemplateBuildENV/ENV_with_multiline_value
    build_template_test.go:134: test-ubuntu-env-multiline: [info] Building template k5i4wm8fyimdkb7k4vze/9e908c8c-a4bb-4eb1-b3ef-1cd2ecc552e6
    build_template_test.go:134: test-ubuntu-env-multiline: [info] CACHED [base] FROM ubuntu:22.04 [ffd709f131f42dfab282de47a91dd2c139e900c1c11fc574b49b517a05ef0a32]
    build_template_test.go:134: test-ubuntu-env-multiline: [info] CACHED [base] DEFAULT USER user [90bdd4afa342293c931373351bf578872dec9179214ba3e8bf9edba311466213]
    build_template_test.go:134: test-ubuntu-env-multiline: [info] [builder 1/2] ENV MULTILINE line1
        line2
        line3 [e93da3f3765f20eb6407c336b9e4e0b9321d994ec5f6cb547743a2a4070eed23]
    build_template_test.go:134: test-ubuntu-env-multiline: [info] [builder 2/2] RUN [[ $(echo "$MULTILINE" | wc -l) -eq 3 ]] || exit 1 [477610d61cdf858776262d3331809539bcbcf16f706aac18515a57337bae1786]
    build_template_test.go:134: test-ubuntu-env-multiline: [error] Build failed: failed to run command '[[ $(echo "$MULTILINE" | wc -l) -eq 3 ]] || exit 1': exit status 1
    build_template_test.go:374: Build failed: {<nil> failed to run command '[[ $(echo "$MULTILINE" | wc -l) -eq 3 ]] || exit 1': exit status 1 0xc000306290}
--- FAIL: TestTemplateBuildENV/ENV_with_multiline_value (8.16s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost

Flake rate in main: 57.12% (Passed 253 times, Failed 337 times)

Stack Traces | 0s run time
=== RUN   TestBindLocalhost
=== PAUSE TestBindLocalhost
=== CONT  TestBindLocalhost
--- FAIL: TestBindLocalhost (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_0_0_0_0

Flake rate in main: 62.99% (Passed 141 times, Failed 240 times)

Stack Traces | 9.62s run time
=== RUN   TestBindLocalhost/bind_0_0_0_0
=== PAUSE TestBindLocalhost/bind_0_0_0_0
=== CONT  TestBindLocalhost/bind_0_0_0_0
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1257}}
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_0_0_0_0
        	Messages:   	Unexpected status code 502 for bind address 0.0.0.0
--- FAIL: TestBindLocalhost/bind_0_0_0_0 (9.62s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_::1

Flake rate in main: 64.93% (Passed 141 times, Failed 261 times)

Stack Traces | 6.77s run time
=== RUN   TestBindLocalhost/bind_::1
=== PAUSE TestBindLocalhost/bind_::1
=== CONT  TestBindLocalhost/bind_::1
Executing command python in sandbox ic0tuxnintkxhl4jsl5a8
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1257}}
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_::1
        	Messages:   	Unexpected status code 502 for bind address ::1
--- FAIL: TestBindLocalhost/bind_::1 (6.77s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_localhost

Flake rate in main: 64.84% (Passed 141 times, Failed 260 times)

Stack Traces | 9.3s run time
=== RUN   TestBindLocalhost/bind_localhost
=== PAUSE TestBindLocalhost/bind_localhost
=== CONT  TestBindLocalhost/bind_localhost
Executing command python in sandbox inugo6t037786nb4ymr3l
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1258}}
Executing command python in sandbox inhm90w83n4o3ixp5zc3j
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_localhost
        	Messages:   	Unexpected status code 502 for bind address localhost
--- FAIL: TestBindLocalhost/bind_localhost (9.30s)


@gemini-code-assist gemini-code-assist Bot (Contributor) left a comment

Code Review

The telemetry call in template_request_build_v3.go is missing the template ID attribute, which is essential for correlating logs and maintaining observability parity between API versions.

Comment thread packages/api/internal/handlers/template_request_build_v3.go Outdated
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@jakubno jakubno marked this pull request as ready for review May 13, 2026 06:49
@claude claude Bot left a comment

Small telemetry-only change that correctly downgrades client-side validation failures to Warn and de-duplicates logging; the only bug noted is a pre-existing inconsistency in adjacent code.

Extended reasoning...

Overview

Three files are touched, all in the API package. packages/api/internal/template/register_build.go removes one inner ReportCriticalError for the team-limit branch. The two handler files, deprecated_template_request_build.go and template_request_build_v3.go, switch the outer log from ReportCriticalError to ReportErrorByCode, which delegates to Warn for 4xx and Error for 5xx. Net effect: client-side validation failures (e.g. memory exceeds team limits) emit one Warn instead of two Errors; 500-class failures still log as Error.

Security risks

None. This is purely an observability/logging severity change — no auth, crypto, request-handling, or data-path logic is modified. Worst plausible regression is a missed Error log if RegisterBuild is invoked from a future non-handler caller without its own logging (noted by cursor[bot]); not a concern for the current code.

Level of scrutiny

Low. The change is mechanical, the diff is ~5 lines of meaningful change, and the semantics of the new function (ReportErrorByCode) are already well-established in the codebase. The earlier gemini-code-assist comment about missing WithTemplateID on the v3 handler was already addressed in commit ebb13a1.

Other factors

The bug-hunter flagged a co-located nit: two other 4xx paths inside RegisterBuild (the 409 alias-conflict at line 258 and the 403 alias-already-used at line 320) still call ReportCriticalError internally, so the dedupe goal is only partially achieved. This is a pre-existing inconsistency not introduced by this PR, and the inline comment already conveys it to the author as a follow-up. Not a blocker.

Comment thread packages/api/internal/template/register_build.go
@jakubno jakubno merged commit c5af930 into main May 13, 2026
54 checks passed
@jakubno jakubno deleted the fix/api-team-limits-warning-log branch May 13, 2026 12:39
ValentaTomas pushed a commit that referenced this pull request May 13, 2026