Skip to content

kernel: macb silent TX stall — v2 patch series#1546

Merged
talos-bot merged 1 commit into
siderolabs:mainfrom
lukaszraczylo:macb-v2-followup
May 15, 2026
Merged

kernel: macb silent TX stall — v2 patch series#1546
talos-bot merged 1 commit into
siderolabs:mainfrom
lukaszraczylo:macb-v2-followup

Conversation

@lukaszraczylo
Copy link
Copy Markdown
Contributor

Update the three net-macb-silent-tx-stall patches from RFC v1 to PATCH net-next v2. Follow-up to #1526.

Background

The v1 RFC drew constructive review on netdev (Andrea della Porta @ SUSE on 2026-05-05; Théo Lebrun @ Bootlin / new macb maintainer on 2026-05-14) and on the rpi vendor fork (Phil Elwell on raspberrypi/linux#7340). v2 incorporates that feedback plus a regression I caught in v1 patch 2 during self-audit.

v2 changes from v1

0001 (PCIe posted-write flush after TSTART doorbell)

v1 applied the NCR readback to every macb variant. SoC-integrated parts (Atmel, Microchip, SiFive, Xilinx) have no fabric posted-write concern and were paying the non-posted-read latency for nothing.

v2 gates the readback behind a new MACB_CAPS_PCIE_POSTED_WRITES capability, set only on raspberrypi_rp1_config.

0002 (PCIe read barrier before TX completion descriptor check) ← the regression fix

v1's if ((queue_readl(queue, ISR) & MACB_BIT(TCOMP)) || macb_tx_complete_pending(queue)) is destructive on read-clear ISR silicon — which is what raspberrypi_rp1_config defaults to (MACB_CAPS_ISR_CLEAR_ON_WRITE not set). The read consumes every bit set in the register, but the use-site masks down to TCOMP and discards the rest. Any RCOMP / ROVR / TXUBR set at that instant is silently consumed; on a level-triggered IRQ the consumed bit drops the line before GIC delivery, so the IRQ for that event vanishes.

Race window is ~200-500 ns per macb_tx_poll completion. At moderate-to-heavy RX load it works out to non-negligible RX-completion loss. Disclosed in the v2 cover letter and reply.

v2 replaces the destructive read with (void)queue_readl(queue, IMR). IMR is the read-only mask mirror; the read has no side effects on either read-clear or W1C ISR silicon and still serves as an architected PCIe read barrier for prior peripheral DMA writes.

0003 (TX stall watchdog)

Three changes:

  • Tracks tail movement via a bool tx_stall_tail_moved set by macb_tx_complete() rather than a tx_tail snapshot. Form requested by Phil Elwell on net: macb: candidate fixes for silent TX stall on BCM2712/RP1 raspberrypi/linux#7340 (merged 2026-05-08).
  • Adds netif_carrier_ok() gate at the top of the watchdog tick — eliminates a boot-time false positive that fires when tx_head advances from kernel-queued packets before link autoneg completes (observed at ~25% rate during a 24-node fleet rolling reboot).
  • Swaps netdev_warn_oncenetdev_warn_ratelimited so operators can count real events across the lifetime of the netdev (Andrea della Porta's ask).

Production runtime so far

The v2 patch-2 IMR-barrier form has been running in production on a 24-node Raspberry Pi 5 fleet since 2026-05-14 14:00 UTC. ~190 cumulative node-hours so far:

Signal Result
Mid-runtime TX stalls (kernel watchdog) 0
User-space watchdog RECOVER events 0
Fleet-wide RX errors 0
Fleet-wide TX errors 0
Boot-time false positives 5 (all tail=0 head=N — eliminated by v2 patch 3's carrier gate)

Pre-patch baseline (~0.5 stall/node-hour at fleet level, observed in our 2026-04-24 reference window) would have predicted ~95 mid-runtime stalls in 190 node-hours; observed is 0.

Related

Files

Old (in main) New (in this PR)
0001-...flush-PCIe-posted-write-after-TSTART-doorbe.patch 0001-...flush-PCIe-posted-write-after-TSTART-doorbe.patch (v2: cap-gated)
0002-...re-check-ISR-after-IER-re-enable-in-macb_tx.patch 0002-...insert-PCIe-read-barrier-before-TX-completi.patch (v2: IMR barrier instead of destructive ISR read)
0003-...add-TX-stall-watchdog-as-defence-in-depth-s.patch 0003-...add-TX-stall-watchdog-to-recover-from-lost-.patch (v2: bool flag + carrier_ok + ratelimited)

README.md updated accordingly.

Happy to split / squash / rebase on feedback.

@smira
Copy link
Copy Markdown
Member

smira commented May 15, 2026

Thank you, we will get them in

@github-project-automation github-project-automation Bot moved this from In Review to Approved in Planning May 15, 2026
@smira
Copy link
Copy Markdown
Member

smira commented May 15, 2026

Looks like the patch doesn't apply:

 > linux/arm64 kernel-build:prepare-0:
0.031 patching file drivers/net/ethernet/cadence/macb.h
0.031 Hunk #1 FAILED at 791.
0.031 1 out of 1 hunk FAILED -- saving rejects to file drivers/net/ethernet/cadence/macb.h.rej
0.032 patching file drivers/net/ethernet/cadence/macb_main.c
0.032 Hunk #1 succeeded at 1807 (offset -115 lines).
0.032 Hunk #2 succeeded at 2489 with fuzz 2 (offset -79 lines).
0.032 Hunk #3 FAILED at 5688.
0.032 1 out of 3 hunks FAILED -- saving rejects to file drivers/net/ethernet/cadence/macb_main.c.rej
0.033 Failed to apply patch /pkg/patches/0001-net-macb-flush-PCIe-posted-write-after-TSTART-doorbe.patch

lukaszraczylo added a commit to lukaszraczylo/pkgs that referenced this pull request May 15, 2026
CI on this PR (siderolabs#1546) reported all three patches fail
to apply against the source tree pkgs actually builds.  Hunk siderolabs#1
FAILED at 791 in macb.h and Hunk siderolabs#3 FAILED at 5688 in macb_main.c.

Root cause: the v2 patches in 8296039 were produced from a tree based
on net-next HEAD (6.19-rc1+), where the macb cap constants and
raspberrypi_rp1_config sit at different line numbers than in linux
6.18.29.  Pkgfile pins linux to 6.18.29.

Patches re-generated against tag v6.18.29 (commit d31a849ff).  All
three apply cleanly with -p1; no source code changes versus the v2
series on lore -- only context-line shifts and updated index hashes.

Anchor differences vs 8296039:

  macb.h:
    @@ -791  ->  @@ -769     (no QUEUE_NEEDS_TSUCLK_FLUSH cap in 6.18.y)

  macb_main.c (macb_tx_restart):
    @@ -1922 ->  @@ -1807

  macb_main.c (macb_start_xmit):
    @@ -2560 ->  @@ -2481    (still gated behind MACB_CAPS_PCIE_POSTED_WRITES)

  macb_main.c (raspberrypi_rp1_config):
    @@ -5674 ->  @@ -5474

  macb_main.c (macb_tx_poll IMR barrier):
    @@ context unchanged, offsets only

  macb_main.c (macb_tx_complete tail-moved bool + watchdog hooks):
    @@ context unchanged, offsets only

Patch 3 (TX stall watchdog) keeps the v2 build fix sent to netdev as
Message-ID <20260515095336.92237-1-lukasz@raczylo.com> -- the
stall-warn is `if (printk_ratelimit()) netdev_warn(...)` rather than
the non-existent netdev_warn_ratelimited() macro that the original v2
series referenced.  Caught by John Laur on an independent Talos
build (siderolabs/sbc-raspberrypi#91).

Link: https://lore.kernel.org/netdev/20260514215459.36109-1-lukasz@raczylo.com/T/
Link: https://lore.kernel.org/netdev/20260515095336.92237-1-lukasz@raczylo.com/T/
Link: siderolabs/sbc-raspberrypi#91

Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
@lukaszraczylo
Copy link
Copy Markdown
Contributor Author

Thanks for catching this and for the apply output — that was the missing context I needed.

Root cause: I produced the v2 patches in 8296039 from a tree based on net-next HEAD (6.19-rc1+). The macb cap constants and raspberrypi_rp1_config sit at different line numbers there than in linux 6.18.29, which is what kernel/Pkgfile actually builds. Same patch content, wrong anchors — hence Hunk #1 FAILED at 791 in macb.h and Hunk #3 FAILED at 5688 in macb_main.c.

Re-anchored against tag v6.18.29 (commit d31a849ff) and pushed two follow-up commits:

  • 84b1a08 fix: re-anchor macb v2 patches against linux 6.18.29 stable
  • 164fda0 docs: clarify macb patch 3 ratelimit form in README

No source-code changes versus the v2 series on lore — only context-line shifts and updated index hashes. Anchor diffs from 8296039:

File Old anchor New anchor
macb.h @@ -791 @@ -769
macb_main.c (macb_tx_restart) @@ -1922 @@ -1807
macb_main.c (macb_start_xmit) @@ -2560 @@ -2481
macb_main.c (raspberrypi_rp1_config) @@ -5674 @@ -5474

Patch 3 also carries the build fix sent to netdev as Message-ID <20260515095336.92237-1-lukasz@raczylo.com>: the stall-warn is if (printk_ratelimit()) netdev_warn(...) rather than the non-existent netdev_warn_ratelimited() macro the original v2 series referenced.

I haven't built the pkgs kernel image locally — please let CI re-run before merging; it should now apply cleanly against the 6.18.29 source tree.

Apologies for the noise — I should have verified the anchor base before pushing v2 instead of assuming the net-next tree my netdev patches were authored against matched the pkgs build tree.

Update the three net-macb silent-TX-stall patches from RFC v1 to
PATCH net-next v2.  The v2 series is on lore at:

  https://lore.kernel.org/netdev/20260514215459.36109-1-lukasz@raczylo.com/T/

v2 changes from v1 (already merged in siderolabs#1526):

  * 0001 (PCIe posted-write flush after TSTART doorbell) - now gated
    behind a new MACB_CAPS_PCIE_POSTED_WRITES capability, set only on
    raspberrypi_rp1_config.  v1 applied the readback to every macb
    variant; SoC-integrated parts (Atmel, Microchip, SiFive, Xilinx)
    have no fabric posted-write concern and were paying the
    non-posted-read latency for nothing.

  * 0002 (PCIe read barrier before TX completion descriptor check) -
    replaces the v1 form, which was a regression on read-clear ISR
    silicon.  v1 read ISR with a TCOMP mask in macb_tx_poll(); on
    raspberrypi_rp1_config (where MACB_CAPS_ISR_CLEAR_ON_WRITE is
    not set) that read consumes every bit set in ISR, but the
    use-site masks down to TCOMP and discards the rest -- any
    RCOMP / ROVR / TXUBR bit at that instant is silently consumed.
    v2 replaces with (void)queue_readl(queue, IMR), the read-only
    mask mirror -- non-destructive, same PCIe-barrier effect.

  * 0003 (TX stall watchdog) - tracks tail movement via a bool flag
    set by macb_tx_complete() instead of a tx_tail snapshot
    (form suggested by Phil Elwell on raspberrypi/linux#7340).
    Adds a netif_carrier_ok() gate.  Wraps netdev_warn in
    printk_ratelimit() so operators can count occurrences while
    bounding log noise.  (An earlier draft used the macro
    netdev_warn_ratelimited(), which does not exist in this
    kernel -- caught by John Laur's build test on the v2 patches.)

Production runtime so far: 24-node Pi 5 fleet on v2 patch-2
IMR-barrier form since 2026-05-14 14:00 UTC, ~190 cumulative
node-hours, zero mid-runtime TX stalls.  Pre-patch baseline
(~0.5 stall/node-hour) would have predicted ~95 stalls; observed 0.

Related:
  * netdev v1 RFC thread:    https://lore.kernel.org/netdev/cover.1777064117.git.lukasz@raczylo.com/T/
  * netdev v2 series:        https://lore.kernel.org/netdev/20260514215459.36109-1-lukasz@raczylo.com/T/
  * raspberrypi/linux merge: raspberrypi/linux#7340
  * raspberrypi/linux v2 PR: raspberrypi/linux#7369

Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
@smira smira force-pushed the macb-v2-followup branch from 76d0927 to 02bcfce Compare May 15, 2026 14:21
@smira
Copy link
Copy Markdown
Member

smira commented May 15, 2026

/m

@talos-bot talos-bot merged commit 02bcfce into siderolabs:main May 15, 2026
14 of 15 checks passed
@github-project-automation github-project-automation Bot moved this from Approved to Done in Planning May 15, 2026
@lukaszraczylo lukaszraczylo deleted the macb-v2-followup branch May 15, 2026 15:34
@shanduur shanduur moved this from Proposed to In Progress in Backports to v1.12 May 18, 2026
@shanduur shanduur moved this from In Progress to Backported in Backports to v1.12 May 19, 2026
@smira smira moved this from Proposed to Accepted in Backports to v1.13 May 21, 2026
@smira smira moved this from Accepted to In Progress in Backports to v1.13 May 22, 2026
@smira smira mentioned this pull request May 22, 2026
@smira smira moved this from In Progress to Backported in Backports to v1.13 May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Backported
Status: Backported
Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants