Skip to content

Handle locks better in the pantry.#1949

Closed
leftwo wants to merge 2 commits into
mainfrom
alan/lock-the-pantry
Closed

Handle locks better in the pantry.#1949
leftwo wants to merge 2 commits into
mainfrom
alan/lock-the-pantry

Conversation

@leftwo

@leftwo leftwo commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Improve the lock handling around pantry actions that require outside activity to make progress. We were holding a lock and then activating or deactivating a volume, and this can take unknown amount of time.

Now we take a lock and leave a state to indicate someone is working on the activation but give back the lock so other actions can continue. A new PantryEntryState enum tracks whether a volume is Attaching (construct/activate in progress) or Attached (ready for use). This prevents duplicate concurrent attach attempts while allowing other volumes or other actions to proceed independently.

entry_get() now returns 503 for Attaching volumes so callers retry.
This might require some updates on the Nexus/Omicron side.

detach() rejects Attaching volumes until attach completes, but this was the same behavior as before, but just because we held the lock for the entire attach.

More unit tests added, as well as a specific pantry test, but I'm not sure if this tools/test_pantry_hang.sh is worth keeping or not. I think growing the pantest into a more functional/stress test of the pantry might be a better way to go. I also might consider putting something into the CI tests for the pantry, maybe this crude script is good enough?

Improve the lock handling around pantry actions that require outside
activity to make progress.  We were holding a lock and then activating
or deactivating a volume, and this can take unknown amount of time.

Now we take a lock and leave a state to indicate someone is working
on the activation but give back the lock so other actions can continue.
A new PantryEntryState enum tracks whether a volume is Attaching
(construct/activate in progress) or Attached (ready for use). This
prevents duplicate concurrent attach attempts while allowing other
volumes or other actions to proceed independently.

entry_get() now returns 503 for Attaching volumes so callers retry.
This might require some updates on the Nexus/Omicron side.

detach() rejects Attaching volumes until attach completes, but this
was the same behavior as before, but just because we held the lock
for the entire attach.

More unit tests added, as well as a specific pantry test, but I'm
not sure if this tools/test_pantry_hang.sh is worth keeping or not.
@leftwo

leftwo commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

Fix for #1945

@leftwo leftwo requested a review from jmpesp June 12, 2026 20:14
@jmpesp

jmpesp commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

I disagree with having GET for entry return a 503 during activation, it seems wrong to me. Previously the endpoint was returning a 200 and the caller would look at the return struct to see what the activation status is - a 503 could mean anything.

I agree with the shape of the changes here, in that we hold the entries lock for as short a time as we can, and would like to chat next week: I think we can get to the same spot without the new enum.

@leftwo

leftwo commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

This PR obsoleted by #1952

@leftwo leftwo closed this Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants