backports: for v1.13.3 #1564

Merged
talos-bot merged 9 commits into siderolabs:release-1.13 from smira:backports/v1.13.3
May 22, 2026
lukaszraczylo and others added 8 commits May 22, 2026 13:38
Update the three net-macb silent-TX-stall patches from RFC v1 to
PATCH net-next v2.  The v2 series is on lore at:

  https://lore.kernel.org/netdev/20260514215459.36109-1-lukasz@raczylo.com/T/

v2 changes from v1 (already merged in siderolabs#1526):

  * 0001 (PCIe posted-write flush after TSTART doorbell) - now gated
    behind a new MACB_CAPS_PCIE_POSTED_WRITES capability, set only on
    raspberrypi_rp1_config.  v1 applied the readback to every macb
    variant; SoC-integrated parts (Atmel, Microchip, SiFive, Xilinx)
    have no fabric posted-write concern and were paying the
    non-posted-read latency for nothing.

  * 0002 (PCIe read barrier before TX completion descriptor check) -
    replaces the v1 form, which was a regression on read-clear ISR
    silicon.  v1 read ISR with a TCOMP mask in macb_tx_poll(); on
    raspberrypi_rp1_config (where MACB_CAPS_ISR_CLEAR_ON_WRITE is
    not set) that read consumes every bit set in ISR, but the
    use-site masks down to TCOMP and discards the rest -- any
    RCOMP / ROVR / TXUBR bit at that instant is silently consumed.
    v2 replaces it with (void)queue_readl(queue, IMR), the read-only
    mask mirror -- non-destructive, same PCIe-barrier effect.

  * 0003 (TX stall watchdog) - tracks tail movement via a bool flag
    set by macb_tx_complete() instead of a tx_tail snapshot
    (form suggested by Phil Elwell on raspberrypi/linux#7340).
    Adds a netif_carrier_ok() gate.  Wraps netdev_warn in
    printk_ratelimit() so operators can count occurrences while
    bounding log noise.  (An earlier draft used the macro
    netdev_warn_ratelimited(), which does not exist in this
    kernel -- caught by John Laur's build test on the v2 patches.)
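
The read-clear hazard described for patch 0002 can be illustrated with a small toy model. This is a sketch under stated assumptions, not driver code: the class `ReadClearQueue`, the helper names, and the bit values are all hypothetical and do not match the real macb register layout. It models only the side effect of the read on ISR, not the PCIe ordering effect of the barrier.

```python
# Illustrative bit values only; they are NOT the real macb register layout.
RCOMP = 0x02   # receive complete
TXUBR = 0x10   # TX used-bit read
TCOMP = 0x80   # transmit complete


class ReadClearQueue:
    """Toy model of a queue whose ISR clears on read (no clear-on-write cap)."""

    def __init__(self, imr=0):
        self.isr = 0      # pending interrupt status bits
        self.imr = imr    # read-only mask mirror; reading has no side effect

    def raise_irq(self, bits):
        self.isr |= bits

    def read_isr(self):
        # Read-to-clear: the caller receives every pending bit and the
        # register is left empty.
        value, self.isr = self.isr, 0
        return value

    def read_imr(self):
        return self.imr


def v1_barrier(queue):
    # v1 form: read ISR, keep only TCOMP. Any other pending bit is lost.
    return queue.read_isr() & TCOMP


def v2_barrier(queue):
    # v2 form: read IMR and discard the value; ISR is untouched.
    queue.read_imr()


def pending_after(barrier):
    """Return the ISR bits still pending after running one barrier."""
    queue = ReadClearQueue()
    queue.raise_irq(RCOMP | TCOMP)
    barrier(queue)
    return queue.isr
```

With RCOMP and TCOMP both pending, `pending_after(v1_barrier)` returns 0, so the RCOMP event is gone before the interrupt handler can see it, while `pending_after(v2_barrier)` still holds both bits.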

Production runtime so far: 24-node Pi 5 fleet on v2 patch-2
IMR-barrier form since 2026-05-14 14:00 UTC, ~190 cumulative
node-hours, zero mid-runtime TX stalls.  Pre-patch baseline
(~0.5 stall/node-hour) would have predicted ~95 stalls; observed 0.
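
The arithmetic behind the predicted stall count can be sketched as follows. The function names are hypothetical, and the Poisson model used for the zero-stall probability is an assumption added here for illustration; the commit message itself only states the expected count.

```python
import math


def expected_stalls(node_hours, stalls_per_node_hour):
    """Expected stall count if the pre-patch rate still applied."""
    return node_hours * stalls_per_node_hour


def prob_zero_stalls(expected):
    """P(0 events) under an assumed Poisson model with the given mean."""
    return math.exp(-expected)


# Figures quoted in the commit message: ~190 node-hours, ~0.5 stall/node-hour.
predicted = expected_stalls(190, 0.5)
chance_of_zero = prob_zero_stalls(predicted)
```

Here `predicted` is 95.0, matching the figure in the text. Under the assumed Poisson model, `chance_of_zero` is far below 1e-40, so observing zero stalls would be very unlikely if the patches had no effect.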

Related:
  * netdev v1 RFC thread:    https://lore.kernel.org/netdev/cover.1777064117.git.lukasz@raczylo.com/T/
  * netdev v2 series:        https://lore.kernel.org/netdev/20260514215459.36109-1-lukasz@raczylo.com/T/
  * raspberrypi/linux merge: raspberrypi/linux#7340
  * raspberrypi/linux v2 PR: raspberrypi/linux#7369

Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 02bcfce)

Latest LTS.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 9fff943)

ZFS: 2.4.2
NVIDIA LTS: 580.159.04

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 12ca698)

Bumping to the latest LTS.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit d616f6c)

Bump kernel 6.18.32

Signed-off-by: Noel Georgi <git@frezbo.dev>
(cherry picked from commit eac5f86)

linux_firmware_version: `20260410` -> `20260519`

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
(cherry picked from commit 270f9f8)

Preserve System.map, so that we can use that to verify that module set and
the kernel don't have unresolved symbols.
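
The idea behind the check can be sketched as below. This is a conceptual illustration only: in the actual build the verification is done by depmod against the preserved System.map, and the function names and sample data here are hypothetical. The sketch assumes the usual System.map line layout of address, type, and symbol name.

```python
def defined_symbols(system_map_text):
    """Collect symbol names from System.map lines ('address type name')."""
    symbols = set()
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.add(parts[2])
    return symbols


def unresolved(module_imports, system_map_text, module_exports=()):
    """Map each module to the sorted list of symbols nothing provides.

    module_imports: dict of module name -> iterable of symbols it needs.
    module_exports: symbols exported by other modules in the set.
    """
    available = defined_symbols(system_map_text) | set(module_exports)
    report = {}
    for module, needed in module_imports.items():
        missing = sorted(set(needed) - available)
        if missing:
            report[module] = missing
    return report


# Hypothetical sample data, for illustration only.
SAMPLE_MAP = "ffffffff81000000 T _text\nffffffff81001000 T printk\n"
SAMPLE_IMPORTS = {"good.ko": ["printk"], "bad.ko": ["printk", "gone_symbol"]}
```

With the sample data, `unresolved(SAMPLE_IMPORTS, SAMPLE_MAP)` reports only `bad.ko`, with `gone_symbol` as the missing symbol. An empty report is the condition the commit wants to be able to verify.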

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit adeaafc)

Fixes siderolabs/talos#13373

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 4f7feb4)

Copilot AI review requested due to automatic review settings May 22, 2026 09:47
@github-project-automation github-project-automation Bot moved this to To Do in Planning May 22, 2026
@talos-bot talos-bot moved this from To Do to In Review in Planning May 22, 2026
Copilot AI left a comment

Pull request overview

Backport bundle for the v1.13.3 release line. It primarily updates core system components (kernel, firmware, ZFS, NVIDIA) and aligns kernel build/config behavior with the referenced upstream/backported PR set (CRIU enablement, additional kernel modules, and the updated macb silent-TX-stall patch series).

Changes:

  • Bump Linux to 6.18.32 and update linux-firmware, ZFS, and NVIDIA LTS driver versions/checksums in Pkgfile.
  • Adjust kernel packaging to preserve /boot/System.map and run depmod using it (with unresolved-symbol reporting).
  • Refresh kernel patch set and kernel configs (PPP, diag options for CRIU, bnxt_re module), plus update libarchive source URL to GitHub releases.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| Pkgfile | Updates pinned versions/checksums for Linux, linux-firmware, ZFS, and NVIDIA LTS driver. |
| libarchive/pkg.yaml | Switches libarchive source fetch URL to the GitHub releases asset path. |
| kernel/kernel/pkg.yaml | Preserves System.map in /boot and updates depmod invocation to use it with extra symbol checking. |
| kernel/build/patches/README.md | Updates documentation to match the macb v2 patch series filenames/status/links. |
| kernel/build/patches/0001-net-macb-flush-PCIe-posted-write-after-TSTART-doorbe.patch | Updates macb TSTART posted-write flush logic with a PCIe-only capability gate. |
| kernel/build/patches/0002-net-macb-insert-PCIe-read-barrier-before-TX-completi.patch | Replaces the prior ISR read approach with a non-destructive IMR read barrier. |
| kernel/build/patches/0003-net-macb-add-TX-stall-watchdog-to-recover-from-lost-.patch | Updates TX stall watchdog logic (tail-moved flag + carrier gate + rate limiting). |
| kernel/build/config-arm64 | Regenerated kernel config for 6.18.32 enabling PPP, diag options, and INFINIBAND_BNXT_RE=m. |
| kernel/build/config-amd64 | Regenerated kernel config for 6.18.32 enabling CRIU-related options (e.g., SOFT_DIRTY, diag) plus PPP and INFINIBAND_BNXT_RE=m. |

Member Author

smira commented May 22, 2026

let's also get the memorymodule fix, i'll put a PR soon

this is pkgs 🙄

@github-project-automation github-project-automation Bot moved this from In Review to Approved in Planning May 22, 2026
Fixes:

* siderolabs#1557
* siderolabs/talos#13397

New modules:

```
kernel/drivers/infiniband/hw/bnxt_re/bnxt_re.ko
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit c0ec8f3)
@smira smira force-pushed the backports/v1.13.3 branch from 5c904a0 to 993d4a6 Compare May 22, 2026 10:50
Member Author

smira commented May 22, 2026

/m

@talos-bot talos-bot merged commit 993d4a6 into siderolabs:release-1.13 May 22, 2026
13 checks passed
@github-project-automation github-project-automation Bot moved this from Approved to Done in Planning May 22, 2026
Projects

Status: Done

6 participants