ucentral-schema: bridge SSID tx_failed/tx_retries gap on mt76 and ath11k #1101
Open
firasshaari wants to merge 1 commit into
mt76 (mt7621/mt7915) and ath11k (QSDK) drivers on the wlan-ap kernel do
not propagate per-STA TX status to mac80211. As a result, the Kafka
state payload always reports
interfaces[].ssids[].counters.tx_failed = 0
interfaces[].ssids[].counters.tx_retries = 0
interfaces[].ssids[].delta_counters.tx_failed = 0
interfaces[].ssids[].delta_counters.tx_retries = 0
even under heavy traffic, leaving cloud consumers with no TX-failure
signal to work with.
Each driver does count semantically-comparable RF-only retry/fail data
in its own debugfs interface:
mt76 reads: /sys/kernel/debug/ieee80211/<phy>/mt76/tx_stats
    BA miss count (unicast block-ACK miss count)
ath11k reads: /sys/kernel/debug/ieee80211/<phy>/ath11k/htt_stats
    tx_xretry from HTT_TX_PDEV_STATS_CMN_TLV (type 1),
    triggered by writing 1 to .../ath11k/htt_stats_type
    (excess-retry count -- frames whose ACK never came back)
Both counters represent the same thing -- unicast frames that needed a
retry because the receiver never ACKed -- so the values are directly
comparable across driver families.
The phy-aggregate count is then projected onto each VAP on the radio,
weighted by each VAP's tx_packets share. A VAP with no traffic gets
nothing; a VAP carrying most of the load takes most of the failure
budget. The existing generate_deltas pipeline then populates
ssid.delta_counters from those values unchanged. No schema additions,
no new fields -- the fields that consumers already key off just stop
being zero.
Verified end-to-end on:
- yuncore_ax820 (ramips/mt7621, mt7915e) -> mt76 BA miss
- yuncore_fap655 (ipq50xx, ath11k QSDK) -> ath11k tx_xretry
Both APs show monotonically increasing nonzero tx_failed and tx_retries
on their SSID counters and delta_counters across consecutive samples in
the live Kafka state topic.
tx_retries is set equal to the per-SSID failure value (neither driver
exposes a separate per-frame retry counter at this level).
associations[].tx_failed/tx_retries remain 0 -- those need a driver-
level fix.
Signed-off-by: Firas Shaari <firas@80211networks.com>
@firasshaari patches need to land in ucentral-schema HEAD before we can backport them to the LTS branch
Problem
mt76 (mt7621/mt7915) and ath11k (QSDK) drivers on the wlan-ap kernel do not propagate per-STA TX status to mac80211. As a result, the Kafka `state` payload always reports … even under heavy traffic. Cloud consumers (Kafka subscribers, dashboards, alerting) get no TX-failure signal at all.
Approach
Each driver does count semantically comparable RF-only retry/fail data in its own debugfs interface; we just have to read from the right place:
- mt76: `/sys/kernel/debug/ieee80211/<phy>/mt76/tx_stats`, `BA miss count` (unicast block-ACK miss)
- ath11k: `/sys/kernel/debug/ieee80211/<phy>/ath11k/htt_stats` (write `1` to `htt_stats_type` first), `tx_xretry` from `HTT_TX_PDEV_STATS_CMN_TLV` (type 1)

Both counters represent the same thing: unicast frames whose ACK never came back and triggered a retry. Values are directly comparable across driver families.
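The two debugfs reads described above can be sketched roughly as follows. This is an illustration in Python, not the ucode in `state.uc`: the helper names are hypothetical, and the line formats matched by the regular expressions (`BA miss count: N` and `tx_xretry = N`) are assumptions about the debugfs text that may differ between driver versions. The write of `1` and the 100 ms wait follow what this PR describes for ath11k.

```python
import re
import time
from pathlib import Path

# Assumed line formats; the real debugfs output may differ by driver version.
MT76_BA_MISS = re.compile(r"BA miss count:\s*(\d+)")
ATH11K_XRETRY = re.compile(r"^\s*tx_xretry\s*=\s*(\d+)", re.MULTILINE)


def parse_mt76_ba_miss(text):
    """Return the BA miss count from mt76 tx_stats text, or None if absent."""
    match = MT76_BA_MISS.search(text)
    return int(match.group(1)) if match else None


def parse_ath11k_xretry(text):
    """Return tx_xretry from ath11k htt_stats text, or None if absent."""
    match = ATH11K_XRETRY.search(text)
    return int(match.group(1)) if match else None


def read_phy_tx_failures(phy_dir):
    """Hypothetical helper: read the phy-wide failure count for one phy.

    phy_dir is the phy's debugfs directory, e.g.
    /sys/kernel/debug/ieee80211/<phy>. Returns None if neither source exists.
    """
    phy = Path(phy_dir)
    mt76_stats = phy / "mt76" / "tx_stats"
    if mt76_stats.exists():
        return parse_mt76_ba_miss(mt76_stats.read_text())

    ath11k = phy / "ath11k"
    if (ath11k / "htt_stats_type").exists():
        # Request stats type 1, then give the firmware time to respond.
        (ath11k / "htt_stats_type").write_text("1")
        time.sleep(0.1)
        return parse_ath11k_xretry((ath11k / "htt_stats").read_text())

    return None
```

Keeping the parsers separate from the file access makes them easy to check against captured debugfs output before trusting them on a device.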
`state.uc` now reads whichever path matches the phy's driver, then projects the phy-aggregate failure count onto each VAP on the radio, weighted by `tx_packets` share so heavier-traffic VAPs absorb the larger fraction of failures. The existing `generate_deltas` pipeline picks up the new values into `ssid.delta_counters` automatically.
No schema additions, no new fields; the existing schema fields that consumers already key off just stop being zero.
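A minimal sketch of the weighted projection, written in Python for illustration rather than the ucode used in `state.uc`. The function name, the dictionary shapes, and the use of integer floor division (which drops any remainder) are assumptions; setting `tx_retries` equal to the projected failure value follows the behaviour this PR describes.

```python
def project_failures(phy_failures, vap_tx_packets):
    """Split a phy-wide failure count across VAPs by tx_packets share.

    phy_failures:   cumulative failure count for the whole radio.
    vap_tx_packets: mapping of VAP name -> cumulative tx_packets.
    Returns a mapping of VAP name -> counters dict.
    """
    total = sum(vap_tx_packets.values())
    result = {}
    for name, packets in vap_tx_packets.items():
        # A radio with no traffic at all gives every VAP zero.
        share = phy_failures * packets // total if total > 0 else 0
        # tx_retries mirrors tx_failed, as described in this PR.
        result[name] = {"tx_failed": share, "tx_retries": share}
    return result


counters = project_failures(100, {"wlan0": 900, "wlan1": 100, "wlan2": 0})
print(counters["wlan0"]["tx_failed"])  # 90
print(counters["wlan1"]["tx_failed"])  # 10
print(counters["wlan2"]["tx_failed"])  # 0
```

Because the inputs are cumulative counters, the projected values are cumulative too, which is what lets an unchanged delta stage work on them.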
What changed
One file added under `feeds/ucentral/ucentral-schema/patches/`.
Applied by OpenWrt's package build system to the wlan-ucentral-schema source (`system/state.uc`). Same mechanism as patches 001/002/003 already in that directory. Drivers, kernel, and the upstream schema repo are untouched.
Verification
Built and deployed on both representative APs against this branch:
- `yuncore_ax820` (ramips/mt7621): `tx_failed` / `tx_retries` per SSID
- `yuncore_fap655` (ipq50xx): `tx_failed` / `tx_retries` per SSID

In live samples from the `state` topic, with both APs simultaneously serving real STA traffic, both counters increment monotonically across consecutive 60s samples. Raw phy counters (mt76 `BA miss`, ath11k `tx_xretry`) were confirmed to be in the same magnitude / scale across drivers, which supports treating the values as the same quantity.
Caveats
- `tx_retries` is set equal to the per-SSID failure value because neither driver exposes a separate per-frame retry counter at this aggregation level. Better than zero; refine when the per-STA driver fix lands.
- `associations[].tx_failed` / `associations[].tx_retries` (per-STA inside the SSID block) remain zero; those need a driver-level fix in mt76 PPDU-TXS handling and the ath11k WBM→sta_info path. Out of scope here.
- The ath11k path triggers the `htt_stats` request by writing `1` to `htt_stats_type` on each state.uc tick, with a 100 ms sleep before reading the response. state.uc already runs at ≥ 60s intervals so the overhead is negligible.
- … `Telecominfraproject/wlan-ucentral-schema` upstream; this local patch can then be removed when the schema pin in wlan-ap is bumped.
Test plan
- `yuncore_ax820` builds cleanly with the patch
- `yuncore_fap655` builds cleanly with the patch
- `state` topic shows non-zero `tx_failed` / `tx_retries` on the SSID `counters` and `delta_counters` blocks for both AP families
- Raw phy counters (mt76 `BA miss`, ath11k `tx_xretry`) confirmed in the same magnitude across drivers
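The `state` topic check in the test plan could be automated along these lines. This is a hedged sketch, not part of the patch: the helper names are hypothetical, samples are assumed to be already decoded into Python dicts, and only the `interfaces[].ssids[].counters` path named in this PR is relied on. SSIDs are identified by their position in the payload to avoid assuming any other field names.

```python
def ssid_counters(state):
    """Map (interface index, ssid index) -> (tx_failed, tx_retries)."""
    out = {}
    for i, iface in enumerate(state.get("interfaces", [])):
        for j, ssid in enumerate(iface.get("ssids", [])):
            counters = ssid.get("counters", {})
            out[(i, j)] = (counters.get("tx_failed", 0),
                           counters.get("tx_retries", 0))
    return out


def check_samples(samples):
    """Return a list of problems found across consecutive state samples.

    Flags counters that decrease between samples and counters that are
    still zero in the last sample. An empty list means no problems.
    """
    problems = []
    per_sample = [ssid_counters(s) for s in samples]
    for prev, cur in zip(per_sample, per_sample[1:]):
        for key, (failed, retries) in cur.items():
            prev_failed, prev_retries = prev.get(key, (0, 0))
            if failed < prev_failed or retries < prev_retries:
                problems.append(f"{key}: counter decreased")
    if per_sample:
        for key, (failed, retries) in per_sample[-1].items():
            if failed == 0 or retries == 0:
                problems.append(f"{key}: counter still zero")
    return problems
```

A zero on an idle SSID is expected, since a VAP with no traffic receives no share of the failures, so a "still zero" result only indicates a problem for SSIDs that were carrying traffic during the capture.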