Hi all,
I’m currently using Simu5G to research Deep Reinforcement Learning (DRL) in 5G networks, and I’ve run into what appears to be a race condition that eventually crashes the simulation.
Although I initially observed this issue while training a DRL agent that dynamically adjusts gNB transmit power, I was able to reproduce it in a standalone Simu5G setup (i.e., without DRL), which suggests the problem is intrinsic to the simulator under certain conditions.
### Description
There appears to be a race condition in the coordination between the Binder and the NRTxPdcpEntity during an active handover in a 5G NR Standalone (SA) scenario (with no dual connectivity enabled).
During UE handover (source gNB → target gNB), packets arriving at the target gNB’s PDCP layer are occasionally tagged with an incorrect destId. Specifically, the destination is set to the gNB’s own NodeId (e.g., 1) instead of the UE’s NodeId.
This results in a runtime error:
`cRuntimeError: NRTxPdcpEntity::deliverPdcpPdu - the destination is not a UE, but Dual Connectivity is not enabled.`
Once triggered, this error leads to cascading failures and ultimately crashes the simulation.
### Environment
- Simu5G: 1.3.0
- OMNeT++: 6.2.0
- INET: 4.5.4
Scenario details:
- Pedestrian-like mobility
- Mixed traffic:
  - Burst traffic
  - CBR traffic
- Dynamic hotspot behavior during simulation runtime
- Configuration: omnetpp.ini (attached as omnetpp.txt)

Regarding omnetpp.ini: the custom module (power_control) can be removed, since it is not used in this simulation case, and the custom channel model can be replaced with the original Simu5G one.
### Root Cause Analysis
The crash occurs at approximately t = 195.245s in my scenario.
Based on GDB traces and Binder logs:
- The UE (`NodeId = 2109`) successfully:
  - Unregisters from the source gNB (`NodeId = 3`)
  - Registers with the target gNB (`NodeId = 1`)
- Almost simultaneously, a packet (likely forwarded via X2, or a delayed GTP-U packet) arrives at the target gNB through `NRPdcpRrcEnb::fromDataPort`.
- The packet’s ControlInfo (`LteControlInfo`) contains an incorrect destination, `destId = 1` (the gNB ID, invalid), instead of `destId = 2109` (the UE ID, expected).
- Inside `NRTxPdcpEntity::deliverPdcpPdu`, the check `binder_->isUe(destId)` fails. Since `destId = 1` corresponds to a gNB, a `cRuntimeError` is thrown, aborting the simulation.
My working hypothesis, which still needs to be verified, is this: although the Binder is already aware of the handover, the `NRPdcpRrcEnb` module in gNB 1 looks the UE up in its own internal table of UEs active in the cell, and that table has not been updated yet. As a result, in `NRPdcpRrcEnb::fromDataPort(cPacket *pktAux)` the UE ID is not found and the packet is assigned the local node’s own ID.
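To make the hypothesis concrete, here is a minimal toy model of the suspected sequence. It is not Simu5G code: `ToyBinder`, `ToyGnbPdcp`, `resolveDest`, and the UE address `kUeAddr` are all hypothetical names and values invented for illustration, and the fallback to the gNB's own ID is exactly the behaviour that still has to be confirmed in the real `fromDataPort`.

```cpp
#include <cstdint>
#include <map>

using NodeId = std::uint16_t;

// Hypothetical UE address (10.0.0.2), chosen only for this sketch.
constexpr std::uint32_t kUeAddr = 0x0A000002u;

// Hypothetical stand-in for the global Binder: UE NodeId -> serving gNB NodeId.
struct ToyBinder {
    std::map<NodeId, NodeId> servingGnb;
    bool isUe(NodeId id) const { return servingGnb.count(id) != 0; }
};

// Hypothetical stand-in for the per-gNB PDCP view of attached UEs.
struct ToyGnbPdcp {
    NodeId ownId;
    std::map<std::uint32_t, NodeId> localUeByAddr;  // IP address -> UE NodeId

    // Assumed behaviour under test: an unknown address falls back to ownId.
    NodeId resolveDest(std::uint32_t dstAddr) const {
        auto it = localUeByAddr.find(dstAddr);
        return it != localUeByAddr.end() ? it->second : ownId;
    }
};

// Returns true when the stale local table yields a destId that is not a UE,
// i.e. the condition that would trip the check in deliverPdcpPdu.
bool staleLookupTripsCheck() {
    ToyBinder binder;
    binder.servingGnb[2109] = 1;          // Binder already knows: UE 2109 -> gNB 1

    ToyGnbPdcp gnb1{1, {}};               // gNB 1's local table is still empty
    NodeId destId = gnb1.resolveDest(kUeAddr);
    return !binder.isUe(destId);          // destId == 1, which is a gNB
}
```

If the real code path behaves like `resolveDest`, the window between the Binder update and the local table update would be enough to produce `destId = 1` for any packet that arrives in between.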
### GDB Backtrace (summary)
    #1 simu5g::NRTxPdcpEntity::deliverPdcpPdu
         destId = 1   <-- INVALID (should be UE: 2109)
    #2 simu5g::LteTxPdcpEntity::handlePacketFromUpperLayer
    #3 simu5g::NRPdcpRrcEnb::fromDataPort
         srcAddr = 167772161 (10.0.0.1)
### Relevant Binder Logs (before crash)
    [HO] unregisterNextHop: masterId = 3, slaveId = 2109 (UE leaves gNB 3)
    [HO] registerNextHop: masterId = 1, slaveId = 2109 (UE joins gNB 1)
    CRASH DETECTED shortly after at gNB 1
### Temporary Workaround
As a temporary workaround to continue my experiments, I modified `NRTxPdcpEntity::deliverPdcpPdu` to drop packets with an invalid `destId` instead of throwing a `cRuntimeError`:

    if (getNodeTypeById(destId) != UE) {
        EV_WARN << NOW << " CustomNRTxPdcpEntity::deliverPdcpPdu - destination is not UE while Dual Connectivity is disabled. Dropping packet." << std::endl;
        delete pkt;
        return;
    }

This prevents the simulation from crashing and allows long-running simulations to proceed.
However, I understand that this is not a correct fix, since:
- It silently drops packets that should have been correctly routed
- It does not address the underlying race condition between Binder updates and PDCP processing
- It may hide deeper synchronization issues during handover
A proper solution should ensure consistency between:
- Binder state (UE ↔ gNB mapping)
- PDCP packet processing during handover transitions
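One direction I have been considering, instead of dropping, is to park packets whose destination UE is not yet known locally and release them in order once the local state catches up. The sketch below only illustrates that idea in isolation: `DeferredDelivery`, `ToyPacket`, `submit`, and `onUeAttached` are hypothetical names, none of them exist in Simu5G, and where such a hook would actually belong in the handover procedure is an open question.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <utility>
#include <vector>

using NodeId = std::uint16_t;

// Hypothetical packet: only the fields this sketch needs.
struct ToyPacket {
    std::uint32_t dstAddr;  // destination IP address as an integer
    int seq;                // sequence number, used to check ordering
};

// Hypothetical helper: holds packets for UEs that are not attached yet.
class DeferredDelivery {
  public:
    explicit DeferredDelivery(std::size_t maxPending) : maxPending_(maxPending) {}

    // Returns true if the packet was delivered immediately.
    bool submit(const ToyPacket& pkt) {
        auto it = ueByAddr_.find(pkt.dstAddr);
        if (it != ueByAddr_.end()) {
            delivered_.emplace_back(it->second, pkt.seq);
            return true;
        }
        if (pending_.size() >= maxPending_) {
            ++dropped_;  // bounded buffer: count the loss instead of hiding it
            return false;
        }
        pending_.push_back(pkt);
        return false;
    }

    // Called when the local attachment state catches up with the handover.
    void onUeAttached(std::uint32_t addr, NodeId ueId) {
        ueByAddr_[addr] = ueId;
        std::deque<ToyPacket> stillPending;
        for (const ToyPacket& pkt : pending_) {
            if (pkt.dstAddr == addr)
                delivered_.emplace_back(ueId, pkt.seq);  // release in arrival order
            else
                stillPending.push_back(pkt);
        }
        pending_.swap(stillPending);
    }

    const std::vector<std::pair<NodeId, int>>& delivered() const { return delivered_; }
    std::size_t pendingCount() const { return pending_.size(); }
    std::size_t droppedCount() const { return dropped_; }

  private:
    std::size_t maxPending_;
    std::size_t dropped_ = 0;
    std::map<std::uint32_t, NodeId> ueByAddr_;
    std::deque<ToyPacket> pending_;
    std::vector<std::pair<NodeId, int>> delivered_;
};
```

The bounded queue and the drop counter are deliberate: any packet that is still lost becomes visible in the statistics rather than disappearing silently, which was my main concern with the workaround above.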
Any feedback or guidance on the root cause would be highly appreciated.