-
Notifications
You must be signed in to change notification settings - Fork 1
fix(rest): retry on store conflict in read-modify-write handlers #149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -573,7 +573,16 @@ func (s *Server) upsertNodeAndDiskless(w http.ResponseWriter, r *http.Request, n | |
| case err == nil: | ||
| // fresh create — normal path | ||
| case errors.Is(err, store.ErrAlreadyExists): | ||
| err = s.Store.Nodes().Update(r.Context(), n) | ||
| // Bug 205 shape: the re-register replace is routed through | ||
| // PatchNodeSpec so a satellite Hello / reconciler write | ||
| // landing between the AlreadyExists probe and the persist | ||
| // re-applies the wire snapshot under RetryOnConflict | ||
| // instead of leaking a 409 into the registration loop. | ||
| err = s.Store.Nodes().PatchNodeSpec(r.Context(), n.Name, func(live *apiv1.Node) error { | ||
| *live = *n | ||
|
|
||
| return nil | ||
| }) | ||
|
Comment on lines
+581
to
+585
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Completely overwriting Since satellite re-registrations or operator re-runs of Consider preserving the existing flags and merging the properties to prevent data loss. err = s.Store.Nodes().PatchNodeSpec(r.Context(), n.Name, func(live *apiv1.Node) error {
existingFlags := live.Flags
existingProps := live.Props
*live = *n
// Preserve existing flags (like EVICTED) if not explicitly overwritten
for _, f := range existingFlags {
if !slices.Contains(live.Flags, f) {
live.Flags = append(live.Flags, f)
}
}
// Merge properties so custom metadata (like topology labels or Aux/*) is not wiped
if len(live.Props) == 0 {
live.Props = existingProps
} else if len(existingProps) > 0 {
mergedProps := make(map[string]string, len(existingProps)+len(live.Props))
for k, v := range existingProps {
mergedProps[k] = v
}
for k, v := range live.Props {
mergedProps[k] = v
}
live.Props = mergedProps
}
return nil
})
Comment on lines
+576
to
+585
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Avoid full-replacing the live node inside the retry patch. Line 582 assigns 🤖 Prompt for AI Agents |
||
| if err != nil { | ||
| writeStoreError(w, err) | ||
|
|
||
|
|
@@ -913,9 +922,14 @@ func (s *Server) handleNodeUpdate(w http.ResponseWriter, r *http.Request) { | |
| } | ||
|
|
||
| if patch.NodeType != "" && patch.NodeType != existing.Type { | ||
| existing.Type = patch.NodeType | ||
| // Bug 205 shape: typed-Patch with retry-on-conflict so a | ||
| // concurrent satellite Hello / reconciler write can't bounce | ||
| // the rare node_type toggle with a 409. | ||
| err = s.Store.Nodes().PatchNodeSpec(r.Context(), name, func(live *apiv1.Node) error { | ||
| live.Type = patch.NodeType | ||
|
|
||
| err = s.Store.Nodes().Update(r.Context(), &existing) | ||
| return nil | ||
| }) | ||
| if err != nil { | ||
| writeStoreError(w, err) | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't swallow the RD patch failure on explicit create.
If
layer_listis present and this patch fails, the handler still continues withcreateOrPromoteResource()and returns success even though the parent RD never recorded the requestedLayerStack. That can leave the new Resource out of sync with its definition, and it also discards concrete failures likeErrNotFoundafter a concurrent RD delete.Suggested fix
if len(env.LayerList) > 0 { - _ = s.Store.ResourceDefinitions().PatchResourceDefinitionSpec(r.Context(), rdName, - func(rd *apiv1.ResourceDefinition) error { - if len(rd.LayerStack) == 0 { - rd.LayerStack = append([]string(nil), env.LayerList...) - } - - return nil - }) + err := s.Store.ResourceDefinitions().PatchResourceDefinitionSpec(r.Context(), rdName, + func(rd *apiv1.ResourceDefinition) error { + if len(rd.LayerStack) == 0 { + rd.LayerStack = append([]string(nil), env.LayerList...) + } + + return nil + }) + if err != nil { + writeStoreError(w, err) + + return nil, false + } }📝 Committable suggestion
🤖 Prompt for AI Agents