Commit 45e6d77

kvaps and claude authored
chore(release): v0.1.14 (#169)
BUG-048 concurrent late-vd fix (#164 converge) + resize-deadlock guard (#168), release-gate GO + completed 24h ZFS-thick burn-in. Also documents the #167 auto-release pipeline that now cuts the GitHub Release on tag.

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
1 parent 825b6e5 commit 45e6d77

2 files changed

Lines changed: 15 additions & 2 deletions

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -4,6 +4,19 @@ All notable changes to blockstor are recorded here. The format follows
 [Keep a Changelog](https://keepachangelog.com/), and the project follows
 [Semantic Versioning](https://semver.org/).
 
+## v0.1.14 — 2026-06-17
+
+Bugfix release. Completes the BUG-048 concurrent late-volume-add fix (noted as a known issue in v0.1.13): late volume-definition adds now converge, and a kernel-DRBD resize deadlock that the first fix attempt introduced is guarded out. Validated by an independent release gate on the live Talos+QEMU stand plus a completed 24-hour ZFS-thick endurance burn-in. Primary backend focus: ZFS thick.
+
+### Fixed
+
+- **Concurrent / rapid late `vd c` no longer drops or wedges the second volume (#164, BUG-048)** — two back-to-back manual `volume-definition create` calls on an existing multi-replica resource-definition could silently drop the second volume-definition, or leave the second volume `Inconsistent` with no SyncSource. The lost-update race in auto-numbered volume-definition creation is closed (the smallest-free VolumeNumber is allocated under an optimistic-locked store write and the create is verified live against the apiserver), and the satellite seeds the late-added volume race-free and elects a deterministic SyncSource. Both the operator-CLI concurrent path and the linstor-csi single-VD path converge. Pinned at L1 + L6 cli-matrix (`multi-volume-late-vd-create`) + L7 replay (`vd-late-concurrent-no-drop`).
+- **No resize deadlock from the late-add self-heal (#168, BUG-048)** — the late-add metadata/self-heal pass added by #164 was gated to fire on every diskful reconcile, so its per-volume metadata probe and cluster-wide self-heals contended for the DRBD metadata buffer with an in-flight `vd s` resize. On the ZFS-thin 2-diskful + 1-diskless-client shape this could spiral into a never-converging cluster-wide size-change loop that held the metadata buffer indefinitely and deadlocked DRBD state changes (down / disconnect) cluster-wide — unrecoverable without a coordinated rebuild. The pass now fires only when a desired volume is genuinely not yet attached in the kernel (a race-free `drbdsetup status` check) and is skipped during an in-flight resize, removing the contention while preserving late-add convergence. Pinned at L1 (`reconciler_bug048_resize_deadlock_test`, which fails on the pre-fix gate) + L6 (`bug-048-resize-deadlock-zfs-thin`) + L7 (`bug048-resize`); validated over a 24-hour ZFS-thick burn-in with zero metadata-buffer waits.
+
+### Testing & infrastructure
+
+- **Releases are cut automatically on a version tag (#167)** — pushing a `vX.Y.Z` tag now builds and publishes the three ghcr.io images and creates the GitHub Release from the matching CHANGELOG section (pre-release tags marked accordingly; idempotent on re-run), so the Release object is no longer a manual step.
+
 ## v0.1.13 — 2026-06-15
 
 Release-gate hardening release. A full independent acceptance gate (default NO-GO, re-verify-everything against the live Talos+QEMU stand plus a completed 24-hour ZFS-thick endurance burn-in) was run against this candidate; the fixes below were mined and validated over that campaign. Primary backend focus is ZFS thick. Every fix is pinned at L1 unit and, where operator-CLI-reachable, L6 cli-matrix + L7 replay, and was exercised on the live stand.
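
The v0.1.14 entry for #164 says the smallest-free VolumeNumber is allocated under an optimistic-locked store write. As a rough illustration of that general pattern only (not blockstor's code), the Python sketch below retries a read, pick, conditional-write cycle against an in-memory store. `VersionedStore`, `ConflictError`, `allocate_volume_number`, and `allocate_concurrently` are hypothetical names introduced here; the real store, its version field, and the live verification against the apiserver are not shown in this commit and are not modelled.

```python
import threading


class ConflictError(Exception):
    """Raised when the store changed since the caller's read (hypothetical)."""


class VersionedStore:
    """In-memory stand-in for a store with optimistic locking (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._taken = set()

    def read(self):
        """Return a snapshot of taken numbers and the version it was read at."""
        with self._lock:
            return set(self._taken), self._version

    def write_if_unchanged(self, number, read_version):
        """Record `number` only if no other write landed since `read_version`."""
        with self._lock:
            if self._version != read_version:
                raise ConflictError(
                    f"read at v{read_version}, store is at v{self._version}"
                )
            self._taken.add(number)
            self._version += 1


def smallest_free(taken):
    """Return the lowest non-negative integer not present in `taken`."""
    n = 0
    while n in taken:
        n += 1
    return n


def allocate_volume_number(store, max_retries=100):
    """Read, pick the smallest free number, write; retry on conflict."""
    for _ in range(max_retries):
        taken, version = store.read()
        candidate = smallest_free(taken)
        try:
            store.write_if_unchanged(candidate, version)
        except ConflictError:
            continue  # another writer won; re-read and pick again
        return candidate
    raise RuntimeError(f"gave up after {max_retries} conflicting writes")


def allocate_concurrently(store, workers):
    """Run `workers` allocations in parallel threads; return sorted results."""
    results = []
    results_lock = threading.Lock()

    def worker():
        number = allocate_volume_number(store)
        with results_lock:
            results.append(number)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

Under these assumptions, concurrent callers never receive the same number: a writer that loses the race sees a conflict, re-reads the store, and picks the next free slot instead of overwriting the winner.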

docs/known-issues.md

Lines changed: 2 additions & 2 deletions
@@ -290,7 +290,7 @@ Idempotency of `provider.DeleteVolume` was already guaranteed by Bug 33's contra
 
 ## Bug 50: concurrent rapid late `vd c` wedges or drops the second volume
 
-**Status**: open (campaign tracking id: BUG-048)
+**Status**: closed (#164 converge + #168 resize-deadlock guard; v0.1.14) (campaign tracking id: BUG-048)
 **Severity**: P1 (availability; NOT data-loss, NOT a node-reboot deadlock; recoverable)
 **Scenario reference**: tests/e2e/cli-matrix/multi-volume-late-vd-create.sh
 **Surfaced by**: release-gate validation campaign
@@ -317,7 +317,7 @@ Idempotency of `provider.DeleteVolume` was already guaranteed by Bug 33's contra
 2. **Bug 42** (P1, piraeus pod-CIDR drift) — blocks the iptables-mode e2e lane.
 3. **Bug 36 + 37** (P1, VD props merge) — fix together; 37 depends on 36's merge plumbing.
 4. **Bug 39 + 40** (P1, toggle-disk retry/cancel) — fix together; together they unlock Bug 34's Option B state machine.
-5. **Bug 50** (P1, concurrent late `vd c`) — serialise VolumeDefinition create on the RD and seed GI before peer attach; operator-only path, recoverable, not data-loss. A corrected fix (superseding the deferred PR #164) is tracked.
+5. **Bug 50** (FIXED in v0.1.14, #164+#168) — serialise VolumeDefinition create on the RD and seed GI before peer attach; operator-only path, recoverable, not data-loss. Closed by #164 (converge) + #168 (resize-deadlock guard).
 6. **Bug 34 Option B** (P1 follow-up) — wrap migrate-disk in the new state machine once 39/40 land.
 7. **Bug 38** (P2, cosmetic STATE_INFO on shrink) — pure UX, do last.
 8. **Bug 32** (P2, observation) — document only; no code fix needed.
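
The #168 resize-deadlock guard named in the Bug 50 status line is described in the changelog as a narrower gate: the late-add pass runs only when a desired volume is not yet attached in the kernel, and never while a resize is in flight. A minimal sketch of that decision, assuming the caller has already derived the set of attached volume numbers (for example from `drbdsetup status`, whose parsing is deliberately left out) and knows whether a resize is running. The function names and parameters are hypothetical and do not come from the blockstor source.

```python
def should_run_late_add_pass(desired_volumes, attached_volumes, resize_in_flight):
    """Decide whether the late-add self-heal pass should run on this reconcile.

    desired_volumes:  volume numbers the resource is supposed to have
    attached_volumes: volume numbers the kernel currently reports as attached
    resize_in_flight: True while a resize of this resource is running
    """
    if resize_in_flight:
        # Stay away from the metadata buffer while a resize is using it.
        return False
    # Run only if at least one desired volume is genuinely missing.
    return any(v not in attached_volumes for v in desired_volumes)


def should_run_pre_fix(is_diskful):
    """The earlier gate as the changelog describes it: every diskful reconcile."""
    return is_diskful
```

With these assumptions, a fully attached resource skips the pass on every reconcile, which is the case where the earlier gate would still have run it on each diskful reconcile.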
