Commit fc1be58
net: macb: add TX stall watchdog as defence-in-depth safety net

Patches 1/3 and 2/3 address two candidate races that could lead
to a TCOMP completion being missed on PCIe-attached macb
instances. This patch adds a defence-in-depth safety net in case
a further race remains that we have not identified.

The watchdog is a per-queue delayed_work that runs once per
second. Tail movement is tracked via a tx_stall_tail_moved
boolean: macb_tx_complete() sets it under tx_ptr_lock whenever
tx_tail advances, and the watchdog clears it under the same lock
at each tick. If, at a tick, the ring is non-empty
(tx_head != tx_tail) and the boolean is still false, meaning
tx_tail has not advanced since the previous tick cleared it, the
watchdog calls macb_tx_restart().
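
The tick logic described above can be modelled in user-space C
as follows. This is a hypothetical, single-threaded sketch, not
the patch's code: struct model_queue and the model_* functions
are illustrative names, and locking is omitted because the model
has no concurrency (in the driver, both the set and the clear of
the boolean happen under tx_ptr_lock).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one TX queue. In the driver the equivalent state
 * lives in the per-queue structure and is only touched under tx_ptr_lock;
 * this model is single-threaded, so no lock is taken. */
struct model_queue {
	unsigned int tx_head;     /* producer: descriptors queued so far */
	unsigned int tx_tail;     /* consumer: descriptors reaped so far */
	bool tx_stall_tail_moved; /* set on every tail advance, cleared per tick */
	unsigned int restarts;    /* number of times a restart would be issued */
};

/* Models the transmit path queueing n descriptors. */
static void model_xmit(struct model_queue *q, unsigned int n)
{
	q->tx_head += n;
}

/* Models macb_tx_complete() reaping n completed descriptors. */
static void model_tx_complete(struct model_queue *q, unsigned int n)
{
	assert(n <= q->tx_head - q->tx_tail); /* cannot reap more than queued */
	if (n == 0)
		return;
	q->tx_tail += n;
	q->tx_stall_tail_moved = true;
}

/* Models one watchdog tick; returns true if it would call macb_tx_restart(). */
static bool model_watchdog_tick(struct model_queue *q)
{
	bool stalled = q->tx_head != q->tx_tail && !q->tx_stall_tail_moved;

	q->tx_stall_tail_moved = false; /* open a fresh observation window */
	if (stalled)
		q->restarts++;
	return stalled;
}
```

A tick that sees progress only re-arms the window; a stall is
reported only after a full tick interval with work outstanding
and no tail movement, and an idle (empty) ring never triggers.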

A boolean is used in preference to snapshotting tx_tail and
comparing across ticks, because per-queue ring indices are
bounded and reused: under sustained load, a snapshot comparison
can report a false stall if the index wraps back to the same
value between two ticks, even though completions occurred. Both
writes to the boolean (the set in macb_tx_complete() and the
clear in the watchdog) share tx_ptr_lock with the existing
tx_head / tx_tail updates, so no atomic is required.
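
To illustrate the false-positive case, the sketch below runs
both detectors over a tail index stored already wrapped to a
hypothetical 512-entry ring. MODEL_RING_SIZE, struct
tail_tracker and the helper functions are assumptions made for
illustration, and the ring-non-empty check is left out so that
only the progress test is compared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring size. The model stores the tail already wrapped to
 * [0, MODEL_RING_SIZE), matching the bounded, reused indices described above. */
#define MODEL_RING_SIZE 512u

struct tail_tracker {
	unsigned int tail;     /* wrapped consumer index */
	unsigned int snapshot; /* tail value recorded at the previous tick */
	bool moved;            /* boolean approach: set on any tail advance */
};

/* Reap n descriptors: the wrapped index moves, and the flag records it. */
static void advance_tail(struct tail_tracker *t, unsigned int n)
{
	assert(t->tail < MODEL_RING_SIZE); /* index must already be wrapped */
	t->tail = (t->tail + n) % MODEL_RING_SIZE;
	if (n)
		t->moved = true;
}

/* Snapshot approach: "no progress" iff the index equals last tick's value. */
static bool snapshot_says_stalled(struct tail_tracker *t)
{
	bool stalled = t->tail == t->snapshot;

	t->snapshot = t->tail;
	return stalled;
}

/* Boolean approach: "no progress" iff nothing set the flag since last tick. */
static bool flag_says_stalled(struct tail_tracker *t)
{
	bool stalled = !t->moved;

	t->moved = false;
	return stalled;
}
```

A full lap of completions between two ticks leaves the wrapped
index unchanged, so the snapshot detector reports a stall while
the flag correctly records progress; on a genuine stall the two
agree.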

No new recovery logic is introduced. macb_tx_restart() already
exists in macb_main.c, is correctly locked (tx_ptr_lock,
bp->lock), and verifies that the hardware's TBQP is behind the
driver's head index before re-asserting TSTART. On a healthy
ring it is a no-op at the hardware level; the watchdog only
supplies the missing trigger.
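
A simplified model of the restart decision described above
might look like this. It is not macb_tx_restart() itself: the
helper names are hypothetical, register access and locking are
omitted, and deriving a ring slot from TBQP as
(TBQP - ring base) / descriptor size is an assumption of the
model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: convert a TBQP-style descriptor address into a ring slot.
 * Assumes the ring starts at ring_base and descriptors are desc_size bytes. */
static unsigned int tbqp_to_slot(uint32_t tbqp, uint32_t ring_base,
				 uint32_t desc_size)
{
	assert(desc_size != 0 && tbqp >= ring_base);
	return (tbqp - ring_base) / desc_size;
}

/* Returns true if the model would re-assert TSTART. All three arguments
 * are ring slots in [0, ring size). */
static bool model_should_restart(unsigned int head_slot,
				 unsigned int tail_slot,
				 unsigned int hw_slot)
{
	/* Empty ring: nothing is waiting to be transmitted. */
	if (head_slot == tail_slot)
		return false;
	/* Hardware has already reached the driver's head: nothing to kick. */
	if (hw_slot == head_slot)
		return false;
	/* Work is queued and the hardware pointer is short of head. */
	return true;
}
```

Because the check compares against head rather than kicking
unconditionally, calling it once per second on a queue that is
empty or already caught up changes nothing.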

On a healthy queue the per-tick cost is one spin_lock_irqsave()
/ spin_unlock_irqrestore() pair, one branch, and one byte store.
The delayed_work is only scheduled between macb_open() and
macb_close(), and is cancelled synchronously on close.

Context for submission: before this series, our 24-node
Raspberry Pi 5 fleet needed an out-of-band user-space watchdog
to keep nodes usable. It monitored tx_packets from
/sys/class/net/.../statistics and toggled the link down/up
whenever the counter stopped advancing. This patch provides a
cleaner in-kernel equivalent for any residual stall that
patches 1 and 2 do not cover. We are willing to drop it if the
view is that patches 1 and 2 should stand alone.

Link: cilium/cilium#43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>

Parent: 78fbe20
2 files changed
Lines changed: 76 additions & 0 deletions
[Diff body not preserved; only hunk line numbers survive.
File 1: +11 lines after original line 1299.
File 2: +2 after line 1493, +57 after line 2025, +3 after line 3254, +1 after line 3304, +2 after line 4916.]