Commit f6ac515
net: macb: add TX stall watchdog as defence-in-depth safety net
Patches 1/3 and 2/3 address two candidate races that could lead
to a TCOMP completion being missed on PCIe-attached macb
instances. This patch adds a defence-in-depth safety net, in
case a further race remains that we have not identified.
The watchdog is a per-queue delayed_work that runs once per
second. Movement is tracked via a tx_stall_tail_moved boolean:
macb_tx_complete() sets it under tx_ptr_lock whenever tx_tail
advances, and the watchdog clears it under the same lock at
each tick. If the ring is non-empty (tx_head != tx_tail) and
the boolean is still false at the next tick, the watchdog
calls macb_tx_restart().
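The tick logic described above can be modelled in user space roughly as follows. This is a sketch, not the patch's code: struct model_queue, model_tx_queue(), model_tx_complete(), model_watchdog_tick() and the restarts counter are hypothetical names, and the tx_ptr_lock critical sections are only marked by comments because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one TX queue; not the driver's struct. */
struct model_queue {
	unsigned int tx_head;	/* next descriptor the driver will fill */
	unsigned int tx_tail;	/* next descriptor awaiting completion */
	bool tail_moved;	/* stands in for tx_stall_tail_moved */
	unsigned int restarts;	/* counts modelled macb_tx_restart() calls */
};

/* Models the driver queuing n frames: only tx_head advances. */
static void model_tx_queue(struct model_queue *q, unsigned int n)
{
	q->tx_head += n;
}

/* Models macb_tx_complete(); in the driver this runs under tx_ptr_lock. */
static void model_tx_complete(struct model_queue *q, unsigned int n)
{
	/* Cannot complete more frames than are outstanding. */
	assert(q->tx_head - q->tx_tail >= n);
	if (n == 0)
		return;
	q->tx_tail += n;
	q->tail_moved = true;
}

/* Models one watchdog tick; returns true if a restart was requested. */
static bool model_watchdog_tick(struct model_queue *q)
{
	bool stalled;

	/* Driver: take tx_ptr_lock here. */
	stalled = (q->tx_head != q->tx_tail) && !q->tail_moved;
	q->tail_moved = false;	/* re-arm for the next interval */
	/* Driver: release tx_ptr_lock here. */

	if (stalled)
		q->restarts++;	/* the patch calls macb_tx_restart() here */
	return stalled;
}
```

In this model a tick reports a stall only when frames are outstanding and no completion arrived during the whole preceding interval; an empty ring never triggers a restart.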
A boolean is used in preference to snapshotting tx_tail and
comparing across ticks, because per-queue ring indices are
bounded and reused; under sustained load a snapshot comparison
can report a false stall when the index happens to land on the
same value at two consecutive ticks. Both writes share tx_ptr_lock with the
existing tx_head / tx_tail updates, so no atomic is required.
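The difference between the two strategies can be illustrated with a small user-space sketch. It assumes, as the paragraph above does, an index that is bounded and reused; MODEL_RING_SIZE and all helper names are hypothetical, and the ring is assumed to stay non-empty throughout so that only the movement check is modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_RING_SIZE 8u	/* hypothetical ring size */

static_assert((MODEL_RING_SIZE & (MODEL_RING_SIZE - 1)) == 0,
	      "ring size must be a power of two for the mask below");

struct model_ring {
	unsigned int tail;	/* bounded index in [0, MODEL_RING_SIZE) */
	bool tail_moved;	/* set on every completion */
};

/* Advance the bounded tail index by n completions and set the flag. */
static void model_complete(struct model_ring *r, unsigned int n)
{
	if (n == 0)
		return;
	r->tail = (r->tail + n) & (MODEL_RING_SIZE - 1);
	r->tail_moved = true;
}

/* Snapshot strategy: a stall is reported if the index equals the snapshot. */
static bool snapshot_says_stalled(const struct model_ring *r, unsigned int snap)
{
	return r->tail == snap;
}

/* Flag strategy: a stall is reported only if no completion set the flag. */
static bool flag_says_stalled(struct model_ring *r)
{
	bool stalled = !r->tail_moved;

	r->tail_moved = false;	/* re-arm for the next interval */
	return stalled;
}
```

If exactly MODEL_RING_SIZE completions happen between two ticks, the index returns to its earlier value: the snapshot comparison reports a stall although the queue made progress, while the flag does not.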
No new recovery logic is introduced. macb_tx_restart() already
exists in this file, is correctly locked (tx_ptr_lock, bp->lock),
and verifies that the hardware's TBQP is behind the driver's
head index before re-asserting TSTART. On a healthy ring it is
a no-op at the hardware level; the watchdog only supplies the
missing trigger.
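The guard that makes a spurious call harmless can be sketched as below. This is a user-space model under stated assumptions, not the driver's macb_tx_restart(): struct model_hw, model_tx_restart() and MODEL_TX_RING_SIZE are hypothetical, "TBQP is behind head" is modelled simply as the two ring indices being unequal, and the tx_ptr_lock and bp->lock locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_TX_RING_SIZE 8u	/* hypothetical; power of two */

/* Hypothetical stand-in for the hardware state the guard inspects. */
struct model_hw {
	unsigned int tbqp_idx;		/* ring index the hardware fetches next */
	unsigned int tstart_writes;	/* counts modelled TSTART writes */
};

/*
 * Models the guard: only poke the hardware if it has not yet reached the
 * driver's head index. Returns true if TSTART was (re-)asserted.
 */
static bool model_tx_restart(struct model_hw *hw, unsigned int tx_head)
{
	unsigned int head_idx = tx_head & (MODEL_TX_RING_SIZE - 1);

	assert(hw->tbqp_idx < MODEL_TX_RING_SIZE);

	if (hw->tbqp_idx == head_idx)
		return false;	/* hardware caught up: no register write */

	hw->tstart_writes++;	/* driver: set TSTART in NCR */
	return true;
}
```

Under this model a watchdog that fires on a queue the hardware has already drained costs one comparison and touches no register.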
On a healthy queue the per-tick cost is one spin_lock_irqsave()
/ spin_unlock_irqrestore(), one branch, and one byte store. The
delayed_work is only scheduled between macb_open() and
macb_close(), and is cancelled synchronously on close.
Context for submission: on our 24-node Raspberry Pi 5 fleet,
before this series, an out-of-band user-space watchdog
(monitoring tx_packets from /sys/class/net/.../statistics and
toggling the link down/up when it froze) was required to keep
nodes usable. We include this kernel-side watchdog as a cleaner
in-kernel equivalent for any residual stall that patches 1 and
2 do not cover. We are willing to drop this patch if the view
is that 1 and 2 should stand alone.
Link: cilium/cilium#43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
1 parent 6292b36 commit f6ac515
2 files changed
Lines changed: 76 additions & 0 deletions
Diff content not preserved; only the added-line ranges survive (file names not preserved):

| File | Added lines (new-file numbering) | Lines added |
|---|---|---|
| First file | 1292-1302 | 11 |
| Second file | 1467-1468 | 2 |
| Second file | 2003-2059 | 57 |
| Second file | 3318-3320 | 3 |
| Second file | 3371 | 1 |
| Second file | 4978-4979 | 2 |