Commit 8ea87c9
net: macb: add TX stall watchdog as defence-in-depth safety net

Patches 1/3 and 2/3 address two candidate races that could lead
to a TCOMP completion being missed on PCIe-attached macb
instances. This patch adds a defence-in-depth safety net, in
case a further race remains that we have not identified.

The watchdog is a per-queue delayed_work that runs once per
second. Progress is tracked with a tx_stall_tail_moved boolean:
macb_tx_complete() sets it under tx_ptr_lock whenever tx_tail
advances, and the watchdog reads and then clears it under the
same lock at each tick. If the ring is non-empty
(tx_head != tx_tail) and the boolean is still false at a tick,
meaning tx_tail has not advanced since the previous tick, the
watchdog calls macb_tx_restart().
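
The tick logic described above can be sketched as a small user-space model. This is not the patch's code: struct txq_model, model_tx_complete(), model_watchdog_tick() and the restarts counter are hypothetical names invented for illustration, locking is omitted (the driver would hold tx_ptr_lock around both functions), and a counter stands in for the call to macb_tx_restart().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-queue state. Field names follow the
 * commit message; the struct itself is illustrative only. */
struct txq_model {
	unsigned int tx_head;        /* next slot the driver will fill */
	unsigned int tx_tail;        /* next slot awaiting completion */
	bool tx_stall_tail_moved;    /* set on completion, cleared per tick */
	unsigned int restarts;       /* counts simulated macb_tx_restart() calls */
};

/* Models macb_tx_complete(): n descriptors complete, so tx_tail advances
 * and the movement is recorded. The driver would do this under tx_ptr_lock. */
static void model_tx_complete(struct txq_model *q, unsigned int n)
{
	/* Cannot complete more descriptors than are outstanding. */
	assert(n <= q->tx_head - q->tx_tail);
	if (n == 0)
		return;
	q->tx_tail += n;
	q->tx_stall_tail_moved = true;
}

/* Models one watchdog tick. Returns true if a restart was requested. */
static bool model_watchdog_tick(struct txq_model *q)
{
	/* Stalled: work is pending, yet nothing completed since the last tick. */
	bool stalled = (q->tx_head != q->tx_tail) && !q->tx_stall_tail_moved;

	q->tx_stall_tail_moved = false;  /* re-arm for the next interval */
	if (stalled)
		q->restarts++;           /* stand-in for macb_tx_restart() */
	return stalled;
}
```

In this model, a queue with pending descriptors that sees a completion before a tick is left alone; a second tick with no completion in between requests a restart; an empty ring never does.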

A boolean is used in preference to snapshotting tx_tail and
comparing across ticks, because per-queue ring indices are
bounded and reused; under sustained load a snapshot comparison
can report a false stall when tx_tail wraps around and lands on
the same value at two consecutive ticks. Both writes to the
boolean share tx_ptr_lock with the existing tx_head / tx_tail
updates, so no atomic is required.
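
To illustrate the wraparound argument, the sketch below compares the two detection schemes on a toy ring whose indices are stored wrapped, as the paragraph above assumes. RING_SIZE, struct ring_model and all three functions are hypothetical and exist only for this example; they are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 512u  /* hypothetical ring size */

struct ring_model {
	unsigned int tx_head;        /* wrapped index in [0, RING_SIZE) */
	unsigned int tx_tail;        /* wrapped index in [0, RING_SIZE) */
	bool tail_moved;             /* state for the boolean scheme */
	unsigned int tail_snapshot;  /* state for the snapshot scheme */
};

/* Queue and complete `count` descriptors between two ticks, so the number
 * of outstanding descriptors is unchanged but both indices advance. */
static void ring_traffic(struct ring_model *r, unsigned int count)
{
	r->tx_head = (r->tx_head + count) % RING_SIZE;
	r->tx_tail = (r->tx_tail + count) % RING_SIZE;
	if (count)
		r->tail_moved = true;
}

/* Snapshot scheme: stalled if tx_tail equals its value at the last tick. */
static bool snapshot_tick(struct ring_model *r)
{
	bool stalled = r->tx_head != r->tx_tail &&
		       r->tx_tail == r->tail_snapshot;

	r->tail_snapshot = r->tx_tail;
	return stalled;
}

/* Boolean scheme: stalled if no completion was recorded since the last tick. */
static bool boolean_tick(struct ring_model *r)
{
	bool stalled = r->tx_head != r->tx_tail && !r->tail_moved;

	r->tail_moved = false;
	return stalled;
}
```

If exactly RING_SIZE descriptors complete between two ticks, tx_tail returns to its previous value: snapshot_tick() reports a stall even though the queue made progress, while boolean_tick() does not.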

No new recovery logic is introduced. macb_tx_restart() already
exists in this file, is correctly locked (tx_ptr_lock, bp->lock),
and verifies that the hardware's TBQP is behind the driver's
head index before re-asserting TSTART. On a healthy ring it is
a no-op at the hardware level; the watchdog only supplies the
missing trigger.

On a healthy queue the per-tick cost is one spin_lock_irqsave()
/ spin_unlock_irqrestore(), one branch, and one byte store. The
delayed_work is only scheduled between macb_open() and
macb_close(), and is cancelled synchronously on close.

Context for submission: on our 24-node Raspberry Pi 5 fleet,
before this series, an out-of-band user-space watchdog
(monitoring tx_packets from /sys/class/net/.../statistics and
toggling the link down/up when it froze) was required to keep
nodes usable. This patch offers a cleaner in-kernel replacement
for that workaround, covering any residual stall that patches 1
and 2 do not. We are willing to drop this patch if the view is
that 1 and 2 should stand alone.

Link: cilium/cilium#43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>

1 parent 3ccf780 commit 8ea87c9
2 files changed
Lines changed: 76 additions & 0 deletions
File 1 of 2: 11 lines added (new lines 1297-1307).
File 2 of 2: 65 lines added (new lines 1508-1509, 2033-2089, 3356-3358, 3405 and 5004-5005).