sitrep concurrent insert/delete can leak rows #10131

@smklein

I believe the following situation can leak database rows:

  • Nexus A starts inserting sitrep S. Note that this starts with insertion of the fm_sitrep row, because it's "how orphaned sitreps are found". Nexus A then gets super throttled, and runs very slowly.
  • Nexus B comes along, and does other stuff which makes sitrep S no longer viable (e.g., picking a different sitrep, S2, to be the new current sitrep).
  • Nexus B sees some of sitrep S. It can see the fm_sitrep record, and maybe some of the early-inserted rows, but let's say that it doesn't see any cases -- because Nexus A hasn't finished inserting them.
  • Nexus B lists orphans, sees sitrep S (not in history, parent is not viable) and deletes all the rows that were inserted by Nexus A. This includes deleting the metadata record for sitrep S!
  • Nexus A finally resumes - it was most of the way done with insertion, so it wraps it up: it INSERTs a bunch of cases for "sitrep S", which has unfortunately otherwise been GC'd by Nexus B.
  • Nexus A won't be able to make sitrep S active - insert_sitrep_version_query will fail, because sitrep S got GC'd - so Nexus A will bail out.

The end result: Nexus A will have leaked a chunk of sitrep S. Because Nexus B garbage-collected the metadata record of sitrep S, those "leaked chunks" (e.g., the "cases", which are currently inserted last) can no longer be found or cleaned up.
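
To make the interleaving concrete, here is a toy in-memory model of the sequence above. It is only a sketch and not the actual Nexus or database code: the `Db` class and all helper functions are hypothetical, the only child table modeled is "cases", and "orphan" is simplified to "has an fm_sitrep row but is not in history" (the parent-viability check is omitted). `make_current` stands in for the failure of insert_sitrep_version_query.

```python
from dataclasses import dataclass, field


@dataclass
class Db:
    """Toy stand-in for the database (hypothetical, not the real schema)."""
    fm_sitrep: set = field(default_factory=set)   # sitrep IDs with a metadata row
    cases: list = field(default_factory=list)     # (sitrep_id, case_name) child rows
    history: list = field(default_factory=list)   # sitrep IDs that were made current


def insert_metadata(db, sitrep_id):
    db.fm_sitrep.add(sitrep_id)


def insert_cases(db, sitrep_id, names):
    db.cases.extend((sitrep_id, name) for name in names)


def list_orphans(db):
    # Orphans are discovered only through their fm_sitrep metadata row.
    return {s for s in db.fm_sitrep if s not in db.history}


def delete_sitrep(db, sitrep_id):
    # Delete the child rows visible right now, then the metadata row.
    db.cases = [(s, n) for (s, n) in db.cases if s != sitrep_id]
    db.fm_sitrep.discard(sitrep_id)


def make_current(db, sitrep_id):
    # Assumed behavior: activation fails if the metadata row is gone.
    if sitrep_id not in db.fm_sitrep:
        return False
    db.history.append(sitrep_id)
    return True


def unreachable_cases(db):
    # Child rows whose sitrep no longer has a metadata row.
    return [(s, n) for (s, n) in db.cases if s not in db.fm_sitrep]


def simulate_race():
    db = Db()
    # Nexus A: inserts the metadata row for S first, then stalls.
    insert_metadata(db, "S")
    # Nexus B: makes a different sitrep, S2, current.
    insert_metadata(db, "S2")
    insert_cases(db, "S2", ["c0"])
    make_current(db, "S2")
    # Nexus B: garbage-collects orphans, including S's metadata row.
    for orphan in list_orphans(db):
        delete_sitrep(db, orphan)
    # Nexus A: resumes and inserts the cases for S.
    insert_cases(db, "S", ["c1", "c2"])
    # Nexus A: tries to activate S; this fails because S was GC'd.
    activated = make_current(db, "S")
    return db, activated


if __name__ == "__main__":
    db, activated = simulate_race()
    print("activated:", activated)
    print("orphans visible to GC:", list_orphans(db))
    print("unreachable case rows:", unreachable_cases(db))
```

In this model, the case rows for S survive the run while `list_orphans` no longer reports S, so a GC pass that starts from fm_sitrep has no way to reach them.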

Labels: fault-management (Everything related to the fault-management initiative, RFD480 and others)
