kernel: macb silent TX stall — v2 patch series#1546
Conversation
bf35444 to
ba505c5
Compare
|
Thank you, we will get them in |
ba505c5 to
87ff93a
Compare
|
Looks like the patch doesn't apply: |
CI on this PR (siderolabs#1546) reported all three patches fail to apply against the source tree pkgs actually builds. Hunk siderolabs#1 FAILED at 791 in macb.h and Hunk siderolabs#3 FAILED at 5688 in macb_main.c. Root cause: the v2 patches in 8296039 were produced from a tree based on net-next HEAD (6.19-rc1+), where the macb cap constants and raspberrypi_rp1_config sit at different line numbers than in linux 6.18.29. Pkgfile pins linux to 6.18.29. Patches re-generated against tag v6.18.29 (commit d31a849ff). All three apply cleanly with -p1; no source code changes versus the v2 series on lore -- only context-line shifts and updated index hashes. Anchor differences vs 8296039: macb.h: @@ -791 -> @@ -769 (no QUEUE_NEEDS_TSUCLK_FLUSH cap in 6.18.y) macb_main.c (macb_tx_restart): @@ -1922 -> @@ -1807 macb_main.c (macb_start_xmit): @@ -2560 -> @@ -2481 (still gated behind MACB_CAPS_PCIE_POSTED_WRITES) macb_main.c (raspberrypi_rp1_config): @@ -5674 -> @@ -5474 macb_main.c (macb_tx_poll IMR barrier): @@ context unchanged, offsets only macb_main.c (macb_tx_complete tail-moved bool + watchdog hooks): @@ context unchanged, offsets only Patch 3 (TX stall watchdog) keeps the v2 build fix sent to netdev as Message-ID <20260515095336.92237-1-lukasz@raczylo.com> -- the stall-warn is `if (printk_ratelimit()) netdev_warn(...)` rather than the non-existent netdev_warn_ratelimited() macro that the original v2 series referenced. Caught by John Laur on an independent Talos build (siderolabs/sbc-raspberrypi#91). Link: https://lore.kernel.org/netdev/20260514215459.36109-1-lukasz@raczylo.com/T/ Link: https://lore.kernel.org/netdev/20260515095336.92237-1-lukasz@raczylo.com/T/ Link: siderolabs/sbc-raspberrypi#91 Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
|
Thanks for catching this and for the apply output — that was the missing context I needed. Root cause: I produced the v2 patches in 8296039 from a tree based on Re-anchored against tag
No source-code changes versus the v2 series on lore — only context-line shifts and updated index hashes. Anchor diffs from 8296039:
Patch 3 also carries the build fix sent to netdev as Message-ID I haven't built the pkgs kernel image locally — please let CI re-run before merging; it should now apply cleanly against the 6.18.29 source tree. Apologies for the noise — I should have verified the anchor base before pushing v2 instead of assuming the net-next tree my netdev patches were authored against matched the pkgs build tree. |
Update the three net-macb silent-TX-stall patches from RFC v1 to PATCH net-next v2. The v2 series is on lore at: https://lore.kernel.org/netdev/20260514215459.36109-1-lukasz@raczylo.com/T/ v2 changes from v1 (already merged in siderolabs#1526): * 0001 (PCIe posted-write flush after TSTART doorbell) - now gated behind a new MACB_CAPS_PCIE_POSTED_WRITES capability, set only on raspberrypi_rp1_config. v1 applied the readback to every macb variant; SoC-integrated parts (Atmel, Microchip, SiFive, Xilinx) have no fabric posted-write concern and were paying the non-posted-read latency for nothing. * 0002 (PCIe read barrier before TX completion descriptor check) - replaces the v1 form, which was a regression on read-clear ISR silicon. v1 read ISR with a TCOMP mask in macb_tx_poll(); on raspberrypi_rp1_config (where MACB_CAPS_ISR_CLEAR_ON_WRITE is not set) that read consumes every bit set in ISR, but the use-site masks down to TCOMP and discards the rest -- any RCOMP / ROVR / TXUBR bit at that instant is silently consumed. v2 replaces with (void)queue_readl(queue, IMR), the read-only mask mirror -- non-destructive, same PCIe-barrier effect. * 0003 (TX stall watchdog) - tracks tail movement via a bool flag set by macb_tx_complete() instead of a tx_tail snapshot (form suggested by Phil Elwell on raspberrypi/linux#7340). Adds a netif_carrier_ok() gate. Wraps netdev_warn in printk_ratelimit() so operators can count occurrences while bounding log noise. (An earlier draft used the macro netdev_warn_ratelimited(), which does not exist in this kernel -- caught by John Laur's build test on the v2 patches.) Production runtime so far: 24-node Pi 5 fleet on v2 patch-2 IMR-barrier form since 2026-05-14 14:00 UTC, ~190 cumulative node-hours, zero mid-runtime TX stalls. Pre-patch baseline (~0.5 stall/node-hour) would have predicted ~95 stalls; observed 0. Related: * netdev v1 RFC thread: https://lore.kernel.org/netdev/cover.1777064117.git.lukasz@raczylo.com/T/ * netdev v2 series: https://lore.kernel.org/netdev/20260514215459.36109-1-lukasz@raczylo.com/T/ * raspberrypi/linux merge: raspberrypi/linux#7340 * raspberrypi/linux v2 PR: raspberrypi/linux#7369 Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
|
/m |
Update the three
net-macb-silent-tx-stallpatches from RFC v1 to PATCH net-next v2. Follow-up to #1526.Background
The v1 RFC drew constructive review on netdev (Andrea della Porta @ SUSE on 2026-05-05; Théo Lebrun @ Bootlin / new macb maintainer on 2026-05-14) and on the rpi vendor fork (Phil Elwell on raspberrypi/linux#7340). v2 incorporates that feedback plus a regression I caught in v1 patch 2 during self-audit.
v2 changes from v1
0001 (PCIe posted-write flush after TSTART doorbell)
v1 applied the NCR readback to every macb variant. SoC-integrated parts (Atmel, Microchip, SiFive, Xilinx) have no fabric posted-write concern and were paying the non-posted-read latency for nothing.
v2 gates the readback behind a new
MACB_CAPS_PCIE_POSTED_WRITEScapability, set only onraspberrypi_rp1_config.0002 (PCIe read barrier before TX completion descriptor check) ← the regression fix
v1's
if ((queue_readl(queue, ISR) & MACB_BIT(TCOMP)) || macb_tx_complete_pending(queue))is destructive on read-clear ISR silicon — which is whatraspberrypi_rp1_configdefaults to (MACB_CAPS_ISR_CLEAR_ON_WRITEnot set). The read consumes every bit set in the register, but the use-site masks down to TCOMP and discards the rest. Any RCOMP / ROVR / TXUBR set at that instant is silently consumed; on a level-triggered IRQ the consumed bit drops the line before GIC delivery, so the IRQ for that event vanishes.Race window is ~200-500 ns per
macb_tx_pollcompletion. At moderate-to-heavy RX load it works out to non-negligible RX-completion loss. Disclosed in the v2 cover letter and reply.v2 replaces the destructive read with
(void)queue_readl(queue, IMR). IMR is the read-only mask mirror; the read has no side effects on either read-clear or W1C ISR silicon and still serves as an architected PCIe read barrier for prior peripheral DMA writes.0003 (TX stall watchdog)
Three changes:
bool tx_stall_tail_movedset bymacb_tx_complete()rather than atx_tailsnapshot. Form requested by Phil Elwell on net: macb: candidate fixes for silent TX stall on BCM2712/RP1 raspberrypi/linux#7340 (merged 2026-05-08).netif_carrier_ok()gate at the top of the watchdog tick — eliminates a boot-time false positive that fires whentx_headadvances from kernel-queued packets before link autoneg completes (observed at ~25% rate during a 24-node fleet rolling reboot).netdev_warn_once→netdev_warn_ratelimitedso operators can count real events across the lifetime of the netdev (Andrea della Porta's ask).Production runtime so far
The v2 patch-2 IMR-barrier form has been running in production on a 24-node Raspberry Pi 5 fleet since 2026-05-14 14:00 UTC. ~190 cumulative node-hours so far:
tail=0 head=N— eliminated by v2 patch 3's carrier gate)Pre-patch baseline (~0.5 stall/node-hour at fleet level, observed in our 2026-04-24 reference window) would have predicted ~95 mid-runtime stalls in 190 node-hours; observed is 0.
Related
Files
0001-...flush-PCIe-posted-write-after-TSTART-doorbe.patch0001-...flush-PCIe-posted-write-after-TSTART-doorbe.patch(v2: cap-gated)0002-...re-check-ISR-after-IER-re-enable-in-macb_tx.patch0002-...insert-PCIe-read-barrier-before-TX-completi.patch(v2: IMR barrier instead of destructive ISR read)0003-...add-TX-stall-watchdog-as-defence-in-depth-s.patch0003-...add-TX-stall-watchdog-to-recover-from-lost-.patch(v2: bool flag + carrier_ok + ratelimited)README.mdupdated accordingly.Happy to split / squash / rebase on feedback.