Tight retry loop when multicast Class B session start is exactly 135 s away #157

@aelbretonactility

Summary

When a multicast Class B session is scheduled to start exactly 135 seconds in the
future and the beacon is not yet locked, the Remote Multicast Setup package enters
a tight CPU loop: it repeatedly attempts to launch the session and prints an error
on every iteration until the GPS clock advances by one second. In the observed log
this produced about 13 consecutive error messages with no back-off.

Affected components

  • lbm_lib/smtc_modem_core/lorawan_packages/remote_multicast_setup/v2.0.0/lorawan_remote_multicast_setup_package_v2.0.0.c — line 619
  • lbm_lib/smtc_modem_core/lorawan_packages/remote_multicast_setup/v1.0.0/lorawan_remote_multicast_setup_package_v1.0.0.c — line 619

Observed console output

Modem event callback
INFO: Event received: TXDONE
INFO: Transmission done
 lorawan_remote_multicast_setup_package activate Class B in 0s
 lorawan_remote_multicast_setup_package start in 0s
ERROR:  lorawan_remote_multicast_setup_package launch LAUNCH_CLASS_B_TASK
 lorawan_remote_multicast_setup_package activate Class B in 0s
 lorawan_remote_multicast_setup_package start in 0s
ERROR:  lorawan_remote_multicast_setup_package launch LAUNCH_CLASS_B_TASK
... (repeated ~13 times) ...
 lorawan_remote_multicast_setup_package activate Class B in 0s
 lorawan_remote_multicast_setup_package start in 30s

Root cause

lorawan_remote_multicast_setup_package_service_on_update contains two related
code paths that together create the loop.

1. LAUNCH_CLASS_B_TASK handler (line 456)

if( lorawan_api_get_class_b_status( stack_id ) == true )
{
    lorawan_api_multicast_b_start_session( ... );
    ...
    ctx->launch_class_b[mc_grp_id] = false;   // only cleared on success
    ctx->stop_class_b[mc_grp_id]   = true;
}
else
{
    SMTC_MODEM_HAL_TRACE_ERROR( " lorawan_remote_multicast_setup_package launch LAUNCH_CLASS_B_TASK \n" );
    // launch_class_b[mc_grp_id] remains true — session will be retried
}

lorawan_api_get_class_b_status returns true only when the beacon is actually
locked. Until the first beacon is received, launch_class_b[mc_grp_id] is never
cleared, so the launch is retried on every pass through the handler.

2. Scheduling section (line 619)

After the task handler the same function call falls through to the re-scheduling
logic:

if( ( lorawan_api_get_class_b_status( stack_id ) == false ) && ( ( time_to_start_s - 135 ) < 0 ) )
{
    // If the class B is not ready and the session start is in past
    delay_s = 30;                   // ← back-off: poll every 30 s
}
else
{
    delay_s = ( ( time_to_start_s - 135 ) < 0 )
                  ? 0
                  : time_to_start_s - 135;
}
...
if( delay_s <= 0 )
{
    ctx->task_ctx_mask |= ( 1 << ( LAUNCH_CLASS_B_TASK_GROUP_ID_0 + find_mc_group_id ) );
}

The intent is clear: when the beacon is not locked and the session start window has
been reached (time_to_start_s < 135), wait 30 s before retrying. However, the
guard uses strict less-than, ( time_to_start_s - 135 ) < 0, so the boundary value
time_to_start_s == 135 is not covered:

| time_to_start_s | get_class_b_status | condition result | delay_s | effect |
|---|---|---|---|---|
| 136 | false | false | 1 | add_task(1 s): correct |
| 135 | false | false (135 − 135 = 0, not < 0) | 0 | LAUNCH bit set, add_task(0): LOOP |
| 134 | false | true | 30 | add_task(30 s): correct |
| 135 | true | false (1st operand) | 0 | LAUNCH immediately: correct |
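
The rows above can be reproduced with a small stand-alone model of the scheduling section. This is only a sketch: delay_s_original and sets_launch_bit are hypothetical helpers written for illustration, and the class_b_ready parameter stands in for the result of lorawan_api_get_class_b_status( stack_id ). Nothing here is taken from the package beyond the two conditions quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the scheduling section as quoted in this issue.
 * class_b_ready stands in for lorawan_api_get_class_b_status( stack_id ). */
static int32_t delay_s_original( bool class_b_ready, int32_t time_to_start_s )
{
    int32_t margin_s = time_to_start_s - 135;

    if( ( class_b_ready == false ) && ( margin_s < 0 ) )
    {
        return 30;  /* back-off: poll every 30 s */
    }
    /* Otherwise launch now (0) or wait until 135 s before the session start. */
    return ( margin_s < 0 ) ? 0 : margin_s;
}

/* True when the pass would set the LAUNCH bit, i.e. delay_s <= 0. */
static bool sets_launch_bit( int32_t delay_s )
{
    return delay_s <= 0;
}
```

Calling delay_s_original( false, 135 ) yields 0, so sets_launch_bit reports true even though the beacon is not locked, which is the problematic row of the table.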

When time_to_start_s == 135 and the beacon is not yet locked:

  1. delay_s = 0 — the LAUNCH bit is set immediately and add_task(0) fires the
    task again in the same scheduler cycle.
  2. LAUNCH handler: beacon still not locked → ERROR, launch_class_b stays true.
  3. Scheduling section: GPS time has not advanced (still the same integer second) →
    time_to_start_s is still 135 → delay_s = 0 again.
  4. Loop continues until the GPS clock advances by one second and time_to_start_s
    drops from 135 to 134, at which point delay_s = 30 finally breaks the loop.

The number of tight iterations (~13 in the observed log) is non-deterministic and
depends on MCU speed relative to the GPS 1-second tick.
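
To illustrate why the count tracks MCU speed, the loop can be modelled with a simulated clock. This is a rough sketch under stated assumptions, not the modem's scheduler: count_tight_passes is a hypothetical function, the beacon is assumed to stay unlocked for the whole run, and passes_per_second is an invented parameter standing in for how many times the service re-runs before the one-second GPS tick.

```c
#include <stdint.h>

/* Hypothetical model: counts back-to-back passes that run with delay_s == 0
 * while the beacon is not locked, starting from the given time_to_start_s.
 * passes_per_second is an assumed stand-in for MCU speed relative to the
 * 1 s GPS tick. */
static uint32_t count_tight_passes( int32_t time_to_start_s, uint32_t passes_per_second )
{
    uint32_t tight_passes          = 0;
    uint32_t passes_in_this_second = 0;

    for( ;; )
    {
        int32_t margin_s = time_to_start_s - 135;
        /* Beacon not locked: the original guard only backs off when margin_s < 0. */
        int32_t delay_s = ( margin_s < 0 ) ? 30 : margin_s;

        if( delay_s > 0 )
        {
            return tight_passes;  /* a real delay is scheduled, the loop ends */
        }
        tight_passes++;  /* LAUNCH attempted, beacon not locked, error traced */

        if( ++passes_in_this_second >= passes_per_second )
        {
            passes_in_this_second = 0;
            time_to_start_s--;  /* simulated GPS second ticks over */
        }
    }
}
```

With passes_per_second set to 13 (chosen only to mirror the observed log), a start at 135 produces 13 tight passes, while a start at 134 produces none because the 30 s back-off applies on the first pass.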

Additionally, lorawan_class_b_management_enable (line 615) is called on every
iteration. The function has an internal idempotency guard
(if( enabled != enable )) so the radio is not actually re-armed on each call,
but the repeated invocations are wasteful and the trace print fires every time.

Impact

  • Burst of error messages on the console making logs hard to read.
  • Unnecessary CPU activity for ~1 second at the exact 135 s boundary.
  • The lorawan_class_b_management_enable function is called repeatedly (harmless
    due to its guard but unexpected).
  • No functional impact once the GPS clock advances — the session proceeds normally
    with a 30 s retry interval as intended.

Fix

Change the strict less-than to less-than-or-equal at line 619 in both package
versions:

// Before
if( ( lorawan_api_get_class_b_status( stack_id ) == false ) && ( ( time_to_start_s - 135 ) < 0 ) )
{
    // If the class B is not ready and the session start is in past
    delay_s = 30;
}

// After
if( ( lorawan_api_get_class_b_status( stack_id ) == false ) && ( ( time_to_start_s - 135 ) <= 0 ) )
{
    // If the class B is not ready and the session start is within 135s or in the past
    delay_s = 30;
}

This ensures the boundary value time_to_start_s == 135 is handled the same way
as time_to_start_s == 134 when the beacon is not yet locked.

The case where beacon IS locked at time_to_start_s == 135 is unaffected: the
first operand of the condition (get_class_b_status == false) is false, so the
else branch is taken and delay_s = 0, correctly launching the session
immediately.
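
Both claims can be checked against a stand-alone model of the corrected guard. As before, this is a sketch: delay_s_fixed is a hypothetical helper, and class_b_ready stands in for the result of lorawan_api_get_class_b_status( stack_id ).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the scheduling section with the proposed <= 0 guard.
 * class_b_ready stands in for lorawan_api_get_class_b_status( stack_id ). */
static int32_t delay_s_fixed( bool class_b_ready, int32_t time_to_start_s )
{
    int32_t margin_s = time_to_start_s - 135;

    if( ( class_b_ready == false ) && ( margin_s <= 0 ) )
    {
        return 30;  /* beacon not locked at or inside the 135 s window: back off */
    }
    /* Beacon locked, or still more than 135 s away: unchanged behaviour. */
    return ( margin_s < 0 ) ? 0 : margin_s;
}
```

In this model delay_s_fixed( false, 135 ) and delay_s_fixed( false, 134 ) both return 30, while delay_s_fixed( true, 135 ) still returns 0.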

Files changed

| File | Line | Change |
|---|---|---|
| lorawan_remote_multicast_setup_package_v2.0.0.c | 619 | < 0 → <= 0 |
| lorawan_remote_multicast_setup_package_v1.0.0.c | 619 | < 0 → <= 0 |
