Compare commits

..

2 Commits

Author SHA1 Message Date
John Crispin
cb911ccd0f ath11k: fix num_peers counter corruption and add debug logging
The num_peers counter becomes corrupted during peer deletion due to race
conditions between ath11k_peer_delete() and ath11k_peer_unmap_event().
The firmware may or may not send unmap events, and the timing varies,
causing the counter to either leak (increment without decrement) or
underflow (double decrement).

Root causes:
1. ath11k_peer_delete() doesn't decrement num_peers, relying on
   ath11k_peer_unmap_event() to do it
2. Firmware sometimes doesn't send unmap events, leaving num_peers
   inflated
3. When unmap events do arrive, timing races with ath11k_peer_delete()
   can cause missed decrements
4. Cleanup paths may double-decrement if delete_in_progress not checked
5. num_peers modified outside proper locking in some paths

This fix:
- Moves num_peers decrement into ath11k_peer_delete() after successful
  peer deletion wait, ensuring exactly one decrement per deletion
- Handles both cases: peer removed by unmap event, or peer still in list
- Removes num_peers decrement from ath11k_peer_unmap_event() to prevent
  double-decrement when unmap event arrives
- Adds ath11k_dp_peer_cleanup() call before ath11k_peer_delete() in
  roaming path to ensure datapath structures properly cleaned up
- Adds delete_in_progress checks in cleanup paths to prevent
  double-delete
- Ensures all num_peers modifications happen under base_lock
- Adds comprehensive debug logging to track num_peers throughout peer
  lifecycle

Signed-off-by: Arif Alam <arif.alam@netexperience.com>
Signed-off-by: John Crispin <john@phrozen.org>
2025-10-02 11:48:16 +02:00
Venkat Chimata
ce89ff0ffe WIFI-14998: wifi: ap: mitigate peer-delete WMI timeout to reduce blind period & prevent peer leaks
1. When a connected client roams to another AP, the AP is trying to delete the peer
   but for some reason the WMI command times out and while driver is waiting for
   the response, we observed that the AP doesn't respond to any frames from STA
   (probe requests, authentication etc) and once the response times out (3seconds default)
   then AP starts responding to the older requets but client has already connected to
   another AP. As the root cause for the response timing out is in the FW, we added
   a WAR to reduce the timeout to minimize this blind period, with this AP responds
   after 100ms and client connects successfully. And 100ms timeout is also reasonable
   for this internal operation.
2. In case of peer deletion timeout, the driver peer database is not cleared, so,
   if this happens often (which it is) then eventually we hit the max peers in the
   driver and all subsequent operations fail, so, in case of timeout ignore the failure
   and proceed with driver peer database cleanup.

Signed-off-by: Venkat Chimata <venkat@nearhop.com>
2025-10-02 11:47:22 +02:00
9 changed files with 280 additions and 30 deletions

View File

@@ -0,0 +1,53 @@
From 375d0d25e6c02991392e44956c81cbac84909f49 Mon Sep 17 00:00:00 2001
From: Venkat Chimata <venkat@nearhop.com>
Date: Thu, 4 Sep 2025 00:09:17 +0530
Subject: [PATCH] wifi: ap: mitigate peer-delete WMI timeout to reduce blind
period & prevent peer leaks
1. When a connected client roams to another AP, the AP is trying to delete the peer
but for some reason the WMI command times out and while driver is waiting for
the response, we observed that the AP doesn't respond to any frames from STA
(probe requests, authentication etc) and once the response times out (3seconds default)
then AP starts responding to the older requets but client has already connected to
another AP. As the root cause for the response timing out is in the FW, we added
a WAR to reduce the timeout to minimize this blind period, with this AP responds
after 100ms and client connects successfully. And 100ms timeout is also reasonable
for this internal operation.
2. In case of peer deletion timeout, the driver peer database is not cleared, so,
if this happens often (which it is) then eventually we hit the max peers in the
driver and all subsequent operations fail, so, in case of timeout ignore the failure
and proceed with driver peer database cleanup.
Signed-off-by: Venkat Chimata <venkat@nearhop.com>
---
drivers/net/wireless/ath/ath11k/peer.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/peer.c b/drivers/net/wireless/ath/ath11k/peer.c
index 1907067..aefc6ba 100644
--- a/drivers/net/wireless/ath/ath11k/peer.c
+++ b/drivers/net/wireless/ath/ath11k/peer.c
@@ -771,7 +771,7 @@ int ath11k_wait_for_peer_delete_done(struct ath11k *ar, u32 vdev_id,
}
time_left = wait_for_completion_timeout(&ar->peer_delete_done,
- 3 * HZ);
+ 100 * HZ / 1000);
if (time_left == 0) {
ath11k_warn(ar->ab, "Timeout in receiving peer delete response\n");
return -ETIMEDOUT;
@@ -857,7 +857,10 @@ int ath11k_peer_delete(struct ath11k *ar, u32 vdev_id, u8 *addr)
}
ret = ath11k_wait_for_peer_delete_done(ar, vdev_id, addr);
- if (ret)
+ /* WAR: For the timeout case, proceed to delete the peer anyway, as FW is
+ * still functional, without this, driver ends up hitting max peers
+ */
+ if (ret && ret != -ETIMEDOUT)
return ret;
ATH11K_MEMORY_STATS_DEC(ar->ab, per_peer_object,
--
2.34.1

View File

@@ -0,0 +1,193 @@
From: John Crispin <john@phrozen.org>
Date: Thu, 2 Oct 2025 09:00:00 +0000
Subject: [PATCH] ath11k: fix num_peers counter corruption and add debug
logging
The num_peers counter becomes corrupted during peer deletion due to race
conditions between ath11k_peer_delete() and ath11k_peer_unmap_event().
The firmware may or may not send unmap events, and the timing varies,
causing the counter to either leak (increment without decrement) or
underflow (double decrement).
Root causes:
1. ath11k_peer_delete() doesn't decrement num_peers, relying on
ath11k_peer_unmap_event() to do it
2. Firmware sometimes doesn't send unmap events, leaving num_peers
inflated
3. When unmap events do arrive, timing races with ath11k_peer_delete()
can cause missed decrements
4. Cleanup paths may double-decrement if delete_in_progress not checked
5. num_peers modified outside proper locking in some paths
This fix:
- Moves num_peers decrement into ath11k_peer_delete() after successful
peer deletion wait, ensuring exactly one decrement per deletion
- Handles both cases: peer removed by unmap event, or peer still in list
- Removes num_peers decrement from ath11k_peer_unmap_event() to prevent
double-decrement when unmap event arrives
- Adds ath11k_dp_peer_cleanup() call before ath11k_peer_delete() in
roaming path to ensure datapath structures properly cleaned up
- Adds delete_in_progress checks in cleanup paths to prevent
double-delete
- Ensures all num_peers modifications happen under base_lock
- Adds comprehensive debug logging to track num_peers throughout peer
lifecycle
Signed-off-by: Arif Alam <arif.alam@netexperience.com>
Signed-off-by: John Crispin <john@phrozen.org>
---
--- a/drivers/net/wireless/ath/ath11k/mac.c
+++ b/drivers/net/wireless/ath/ath11k/mac.c
@@ -5742,14 +5742,22 @@ static int ath11k_mac_op_sta_state(struc
mutex_lock(&ar->ab->tbl_mtx_lock);
spin_lock_bh(&ar->ab->base_lock);
peer = ath11k_peer_find(ar->ab, arvif->vdev_id, sta->addr);
- if (peer && peer->sta == sta) {
+ /* Skip if peer deletion already in progress to prevent
+ * double-delete and num_peers underflow
+ */
+ if (peer && peer->sta == sta && !peer->delete_in_progress) {
ath11k_warn(ar->ab, "Found peer entry %pM n vdev %i after it was supposedly removed\n",
vif->addr, arvif->vdev_id);
ath11k_peer_rhash_delete(ar->ab, peer);
peer->sta = NULL;
+ /* num_peers decrement now happens under base_lock when
+ * peer is actually removed from list
+ */
list_del(&peer->list);
kfree(peer);
ar->num_peers--;
+ ath11k_dbg(ar->ab, ATH11K_DBG_PEER, "%s peer deleted %pM vdev_id: %d num_peers: %d\n",
+ __func__, sta->addr, arvif->vdev_id, ar->num_peers);
}
spin_unlock_bh(&ar->ab->base_lock);
mutex_unlock(&ar->ab->tbl_mtx_lock);
@@ -7847,6 +7855,8 @@ err_peer_del:
goto err_keyid;
ar->num_peers--;
+ ath11k_dbg(ar->ab, ATH11K_DBG_PEER, "%s vif peer deleted %pM vdev_id: %d num_peers: %d\n",
+ __func__, vif->addr, arvif->vdev_id, ar->num_peers);
}
err_vdev_del:
--- a/drivers/net/wireless/ath/ath11k/peer.c
+++ b/drivers/net/wireless/ath/ath11k/peer.c
@@ -461,6 +461,9 @@ void ath11k_peer_unmap_event(struct ath1
ath11k_dbg(ab, ATH11K_DBG_PEER, "peer unmap vdev %d peer %pM id %d\n",
peer->vdev_id, peer->addr, peer_id);
+ /* Don't decrement num_peers here - it's already decremented in
+ * ath11k_peer_delete() after successful wait. Just clean up the peer.
+ */
list_del(&peer->list);
kfree(peer);
wake_up(&ab->peer_mapping_wq);
@@ -726,6 +729,10 @@ void ath11k_peer_cleanup(struct ath11k *
if (peer->vdev_id != vdev_id)
continue;
+ /* Skip peers that are being deleted to prevent double-free */
+ if (peer->delete_in_progress)
+ continue;
+
ath11k_warn(ab, "removing stale peer %pM from vdev_id %d\n",
peer->addr, vdev_id);
@@ -743,7 +750,10 @@ void ath11k_peer_cleanup(struct ath11k *
ath11k_peer_rhash_delete(ab, peer);
list_del(&peer->list);
kfree(peer);
+ /* num_peers decrement happens here under base_lock */
ar->num_peers--;
+ ath11k_dbg(ar->ab, ATH11K_DBG_PEER, "%s peer cleanup %pM vdev_id: %d num_peers: %d\n",
+ __func__, peer->addr, vdev_id, ar->num_peers);
}
spin_unlock_bh(&ab->base_lock);
@@ -824,6 +834,12 @@ int ath11k_peer_delete(struct ath11k *ar
#ifdef CPTCFG_ATH11K_NSS_SUPPORT
peer->delete_in_progress = true;
+#else
+ if (peer)
+ peer->delete_in_progress = true;
+#endif
+
+#ifdef CPTCFG_ATH11K_NSS_SUPPORT
if (peer->self_ast_entry) {
ath11k_peer_del_ast(ar, peer->self_ast_entry);
peer->self_ast_entry = NULL;
@@ -863,10 +879,51 @@ int ath11k_peer_delete(struct ath11k *ar
if (ret && ret != -ETIMEDOUT)
return ret;
- ATH11K_MEMORY_STATS_DEC(ar->ab, per_peer_object,
- sizeof(struct ath11k_peer));
+ /* If timeout occurred, manually remove peer from list since firmware
+ * won't send unmap event. This prevents peer leaks and num_peers corruption.
+ */
+ if (ret == -ETIMEDOUT) {
+ ath11k_warn(ar->ab, "peer delete timeout %pM vdev %d, manually cleaning up\n",
+ addr, vdev_id);
- ar->num_peers--;
+ mutex_lock(&ar->ab->tbl_mtx_lock);
+ spin_lock_bh(&ar->ab->base_lock);
+ peer = ath11k_peer_find(ar->ab, vdev_id, addr);
+ if (peer) {
+ list_del(&peer->list);
+ kfree(peer);
+ ar->num_peers--;
+ ath11k_dbg(ar->ab, ATH11K_DBG_PEER,
+ "%s peer deleted (timeout) %pM vdev_id: %d num_peers: %d\n",
+ __func__, addr, vdev_id, ar->num_peers);
+ }
+ spin_unlock_bh(&ar->ab->base_lock);
+ mutex_unlock(&ar->ab->tbl_mtx_lock);
+ } else {
+ /* Normal path - but firmware may not send unmap event, so decrement here
+ * after successful peer deletion wait
+ */
+ mutex_lock(&ar->ab->tbl_mtx_lock);
+ spin_lock_bh(&ar->ab->base_lock);
+ peer = ath11k_peer_find(ar->ab, vdev_id, addr);
+ if (peer) {
+ /* Peer still in list - firmware didn't send unmap event yet */
+ list_del(&peer->list);
+ kfree(peer);
+ ar->num_peers--;
+ ath11k_dbg(ar->ab, ATH11K_DBG_PEER,
+ "%s peer deleted (no unmap event) %pM vdev_id: %d num_peers: %d\n",
+ __func__, addr, vdev_id, ar->num_peers);
+ } else {
+ /* Peer already removed by unmap event - still need to decrement */
+ ar->num_peers--;
+ ath11k_dbg(ar->ab, ATH11K_DBG_PEER,
+ "%s peer deleted (via unmap event) %pM vdev_id: %d num_peers: %d\n",
+ __func__, addr, vdev_id, ar->num_peers);
+ }
+ spin_unlock_bh(&ar->ab->base_lock);
+ mutex_unlock(&ar->ab->tbl_mtx_lock);
+ }
return 0;
}
@@ -905,6 +962,7 @@ int ath11k_peer_create(struct ath11k *ar
if (vdev_id == param->vdev_id)
return -EINVAL;
+ ath11k_dp_peer_cleanup(ar, vdev_id, param->peer_addr);
ath11k_peer_delete(ar, vdev_id, param->peer_addr);
}
@@ -970,7 +1028,8 @@ int ath11k_peer_create(struct ath11k *ar
ar->num_peers++;
if (ath11k_mac_sta_level_info(arvif, sta)) {
- ath11k_dbg(ar->ab, ATH11K_DBG_PEER, "peer created %pM\n", param->peer_addr);
+ ath11k_dbg(ar->ab, ATH11K_DBG_PEER, "peer created %pM vdev_id: %d num_peers: %d\n",
+ param->peer_addr, param->vdev_id, ar->num_peers);
peer->peer_logging_enabled = true;
}

View File

@@ -1180,7 +1180,7 @@ CONFIG_QTI_LICENSE_MANAGER=y
# CONFIG_SYMBOLIC_ERRNAME is not set
# CONFIG_LEDS_BRIGHTNESS_HW_CHANGED is not set
# CONFIG_LEDS_CLASS_MULTICOLOR is not set
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY is not set
# CONFIG_PINCTRL_SINGLE is not set
# CONFIG_RANDOM_TRUST_BOOTLOADER is not set

View File

@@ -1,5 +1,5 @@
{
"major": 4,
"minor": 1,
"patch": 1
"patch": 0
}

View File

@@ -275,27 +275,23 @@ handlers = {
},
vlan_add: function(notify) {
if (config.config.swconfig)
return handlers.vlan_add_swconfig(notify);
let vlan_id = `${notify.data.vlan_id}`;
vlan_refcount[vlan_id] = (vlan_refcount[vlan_id] || 0) + 1;
let parts = split(notify.data.ifname, '-v');
let is_wifi_iface = (length(parts) == 2 && wildcard(parts[0], 'wlan*'));
if (vlan_refcount[vlan_id] > 1 && !is_wifi_iface)
if (vlan_refcount[vlan_id] > 1) {
return;
}
if (vlan_refcount[vlan_id] == 1) {
for (let wan in wan_ports) {
let msg = {
name: wan,
vlan: [ `${notify.data.vlan_id}:t` ]
};
ubus.call('network.interface.up_none', 'add_device', msg);
ubus.call('udevstats', 'add_device', { device: wan, vlan: +notify.data.vlan_id });
}
if (config.config.swconfig)
return handlers.vlan_add_swconfig(notify);
for (let wan in wan_ports) {
let msg = {
name: wan,
vlan: [ `${notify.data.vlan_id}:t` ]
};
ubus.call('network.interface.up_none', 'add_device', msg);
ubus.call('udevstats', 'add_device', { device: wan, vlan: +notify.data.vlan_id });
}
let msg = {
@@ -308,16 +304,16 @@ handlers = {
},
vlan_remove: function(notify) {
if (config.config.swconfig)
return;
let vlan_id = `${notify.data.vlan_id}`;
vlan_refcount[vlan_id] = (vlan_refcount[vlan_id] || 1) - 1;
if (vlan_refcount[vlan_id] > 0)
if (vlan_refcount[vlan_id] > 0) {
return;
}
delete vlan_refcount[vlan_id];
if (config.config.swconfig)
return;
for (let wan in wan_ports) {
let msg = {
name: wan,

View File

@@ -4,10 +4,10 @@ PKG_NAME:=ucentral-schema
PKG_RELEASE:=1
PKG_SOURCE_URL=https://github.com/Telecominfraproject/wlan-ucentral-schema.git
PKG_MIRROR_HASH:=c0f43db0530a38eb424e81908ad47a14e1d4d8f8a86eb148e34f98187c79ba6b
PKG_MIRROR_HASH:=f72d2e5b01ecb7a488d50d860da63664d992e50c7e046fed866be6733bab3c1c
PKG_SOURCE_PROTO:=git
PKG_SOURCE_DATE:=2025-10-16
PKG_SOURCE_VERSION:=dc9cad95641266a08de73aab85d931d992090159
PKG_SOURCE_DATE:=2025-09-29
PKG_SOURCE_VERSION:=676e1550c53b7d48a54aa759f65d341168627c5e
PKG_MAINTAINER:=John Crispin <john@phrozen.org>
PKG_LICENSE:=BSD-3-Clause

View File

@@ -1,12 +1,17 @@
{
"uuid": 2,
"radios": [
{
"band": "6G",
"country": "CA",
"channel-mode": "HE",
"channel-width": 80
},
{
"band": "5G",
"country": "CA",
"channel-mode": "HE",
"channel-width": 80,
"channel": 36
"channel-width": 80
},
{
"band": "2G",
@@ -52,7 +57,10 @@
"key": "bbbbbbbb"
}
],
"roaming": true
"roaming": {
"message-exchange": "ds",
"generate-psk": true
}
}
]
},

View File

@@ -17,4 +17,4 @@ packages:
- sysstat
- kmod-cig-poe-judgment
diffconfig: |
CONFIG_KERNEL_IPQ_MEM_PROFILE=0
CONFIG_KERNEL_IPQ_MEM_PROFILE=0

View File

@@ -16,4 +16,4 @@ packages:
- sysstat
- kmod-cig-poe-judgment
diffconfig: |
CONFIG_KERNEL_IPQ_MEM_PROFILE=0
CONFIG_KERNEL_IPQ_MEM_PROFILE=0