Skip to content

fix(workloadmanager): map missing public key to 503#336

Open
mrinalchaturvedi27 wants to merge 1 commit into
volcano-sh:mainfrom
mrinalchaturvedi27:public-key-missing
Open

fix(workloadmanager): map missing public key to 503#336
mrinalchaturvedi27 wants to merge 1 commit into
volcano-sh:mainfrom
mrinalchaturvedi27:public-key-missing

Conversation

@mrinalchaturvedi27
Copy link
Copy Markdown
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

This fixes workload-manager so ErrPublicKeyMissing returns HTTP 503 instead of HTTP 500. This tells clients that the Router public key is not ready yet and the request can be retried later.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

The focused Go test could not finish locally because the machine ran out of disk space during compilation. I added a table test for the new 503 mapping.

Does this PR introduce a user-facing change?:

NONE

What I fixed

I fixed sandbox creation in workload-manager for the case where the Router public key is not ready yet. Earlier, ErrPublicKeyMissing returned HTTP 500 with internal server error. Now it returns HTTP 503.

How I found it

I checked handleSandboxCreate in pkg/workloadmanager/handlers.go. It already handled missing AgentRuntime and CodeInterpreter errors, but it did not handle api.ErrPublicKeyMissing.

That error comes from pkg/workloadmanager/workload_builder.go when PicoD auth needs the Router public key and the key has not been loaded yet.

What I changed

  • I added a check for errors.Is(err, api.ErrPublicKeyMissing).
  • I return http.StatusServiceUnavailable for that error.
  • I added a test case for this error path in TestHandleSandboxCreate.

Why this is correct

This matches the issue goal directly. A missing public key during startup is not a real server bug. It is a temporary state, so HTTP 503 is a better response than HTTP 500.

Validation

All tests passed

Copilot AI review requested due to automatic review settings May 14, 2026 12:24
@volcano-sh-bot volcano-sh-bot added the kind/bug Something isn't working label May 14, 2026
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yaozengzeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Maps api.ErrPublicKeyMissing (raised by workload_builder.go when the Router public key has not been loaded yet) to HTTP 503 in handleSandboxCreate, so clients can distinguish a transient startup condition from a true internal server error and safely retry.

Changes:

  • In handleSandboxCreate, add an errors.Is(err, api.ErrPublicKeyMissing) branch that responds with http.StatusServiceUnavailable and the error message.
  • Add a corresponding table-test case in TestHandleSandboxCreate verifying the 503 response and message body.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
pkg/workloadmanager/handlers.go Adds branch mapping ErrPublicKeyMissing to HTTP 503 instead of falling through to 500.
pkg/workloadmanager/handlers_test.go New table-driven test case asserting 503 + error message for the public-key-missing path.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the handleSandboxCreate function in pkg/workloadmanager/handlers.go to explicitly handle the api.ErrPublicKeyMissing error by returning an HTTP 503 Service Unavailable status. Additionally, a new test case has been added to pkg/workloadmanager/handlers_test.go to verify this behavior. I have no feedback to provide as there were no review comments to evaluate.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 14, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 88.88889% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 49.21%. Comparing base (524e55e) to head (dba5efd).
⚠️ Report is 54 commits behind head on main.

Files with missing lines Patch % Lines
pkg/workloadmanager/handlers.go 88.88% 1 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #336      +/-   ##
==========================================
+ Coverage   47.57%   49.21%   +1.64%     
==========================================
  Files          30       30              
  Lines        2819     2863      +44     
==========================================
+ Hits         1341     1409      +68     
+ Misses       1338     1301      -37     
- Partials      140      153      +13     
Flag Coverage Δ
unittests 49.21% <88.88%> (+1.64%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Mrinal Chaturvedi <mrinal.chaturvedi27@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working size/S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants