bug: db migration + deployment race condition

### Version

main

### Describe the bug.

Migrations run once against the database, but Kubernetes keeps old carbide-api pods alive until the rollout finishes. During that window, old code can run against the new schema (or new code against the old schema) and behave incorrectly.

#### Example

As a consequence of deploying https://github.com/NVIDIA/infra-controller/pull/1610, many explored endpoints were deleted. This reset the preingestion state of those endpoints, which in turn caused mutating actions to be performed on their assigned hosts.

The database migration sets `machine_id` on BMC `machine_interfaces` rows, but during a deployment, an old carbide-api pod still builds the site explorer underlay list with “only interfaces where machine_id is null.” After migration, those BMC rows no longer qualify, so the BMC IP drops out of the index. On the next tick, that pod treats the existing `explored_endpoints` row as orphaned, deletes it, and a later tick re-inserts it as a fresh endpoint with `preingestion_state: initial`.
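The sequence above can be reproduced with a small in-memory model. This is only a sketch, not carbide-api code: `Interface`, `old_index`, `new_index`, `tick`, the IP address, and the `complete` state are hypothetical names chosen for illustration, and the new pod's filter is an assumption (the issue only describes the old filter).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interface:
    ip: str
    is_bmc: bool
    machine_id: Optional[str] = None


def old_index(interfaces):
    # Old code, as described in the issue: only interfaces where machine_id is null.
    return {i.ip for i in interfaces if i.machine_id is None}


def new_index(interfaces):
    # Assumption: new code still recognises BMC rows after machine_id is set.
    return {i.ip for i in interfaces if i.is_bmc or i.machine_id is None}


def apply_migration(interfaces):
    # Stand-in for the migration: set machine_id on BMC rows.
    for i in interfaces:
        if i.is_bmc:
            i.machine_id = "machine-" + i.ip


def tick(index, explored):
    # Simplified explorer tick: drop endpoints missing from the index,
    # then insert unseen IPs as fresh endpoints in the initial state.
    for ip in list(explored):
        if ip not in index:
            del explored[ip]
    for ip in index:
        explored.setdefault(ip, "initial")


def simulate():
    interfaces = [Interface("10.0.0.5", is_bmc=True)]
    explored = {"10.0.0.5": "complete"}  # endpoint already past preingestion
    history = []
    apply_migration(interfaces)            # schema/data changes first
    tick(old_index(interfaces), explored)  # old pod is still running
    history.append(dict(explored))
    tick(new_index(interfaces), explored)  # a later tick with a working filter
    history.append(dict(explored))
    return history
```

In this model the old pod's tick empties `explored`, and the later tick re-creates the endpoint with state `initial`, so the progress recorded before the deployment is lost.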

#### Next Steps

A potential mitigation is to adjust the Kubernetes rollout settings so that old and new carbide-api pods never run simultaneously during a deployment. However, this requires understanding how the timing of the database migration aligns with the startup of the new pod.
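One way to express that rollout setting is the Deployment `Recreate` strategy, which terminates all existing pods before new ones are created. The sketch below builds a merge patch for it. The deployment name and namespace passed in the usage note are assumptions, and the helper names are hypothetical. `rollingUpdate` is set to null because Kubernetes rejects a `Recreate` strategy that still carries rolling-update parameters.

```python
import json


def recreate_strategy_patch():
    # Merge patch switching a Deployment to the Recreate strategy.
    # rollingUpdate must be cleared, or the API server rejects the change.
    return {"spec": {"strategy": {"type": "Recreate", "rollingUpdate": None}}}


def kubectl_patch_command(deployment, namespace):
    # Build the argument list for `kubectl patch`; nothing is executed here.
    patch = json.dumps(recreate_strategy_patch(), separators=(",", ":"))
    return [
        "kubectl", "patch", "deployment", deployment,
        "-n", namespace,
        "--type", "merge",
        "-p", patch,
    ]
```

For example, `kubectl_patch_command("carbide-api", "default")` returns an argument list that could be passed to `subprocess.run`. `Recreate` trades the overlap window for downtime during each rollout, and on its own it does not settle whether the migration runs before or after the old pods stop, which is the open question noted above.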

### Code of Conduct

- [x] I agree to follow NCX Infra Controller's Code of Conduct
- [x] I have searched the [open bugs](https://github.com/NVIDIA/ncx-infra-controller-core/issues?q=is%3Aopen+is%3Aissue+type:Bug) and have found no duplicates for this bug report

bug: db migration + deployment race condition #1824
