Z2M won't start: NCP RESET_ASSERT / ASH_NCP_FATAL_ERROR during init — identical on TWO different ember coordinators (over TCP) #32309

Description

@devincornell7

What happened?

Summary
After a coordinator migration, Zigbee2MQTT can no longer start. The ember NCP fatally asserts (RESET_ASSERT → ASH_NCP_FATAL_ERROR) partway through init, preceded by Frame(s) in progress cancelled and (in some runs) ROUTE_ERROR_ADDRESS_CONFLICT for "0", then followed by ASH_ERROR_TIMEOUTS. The exact same failure occurs on two completely different radios, both connected over TCP: the original SLZB‑06M (EFR32MG21) and a brand‑new SLZB‑06Mg24U (EFR32MG24) with freshly‑cleared NVM. The crash point varies between runs (GET_EUI64, SET_SOURCE_ROUTE_DISCOVERY_MODE, GET_NETWORK_PARAMETERS/ezspExportKey), which suggests an unstable/aborting link rather than one deterministic logic error.

Setup

  • Host: Home Assistant Green (HA OS), Zigbee2MQTT add‑on 2.12.0‑1, zigbee‑herdsman 10.4.0
  • Adapter driver: ember, tcp://<ip>:6638, baudrate: 115200, rtscts: false
  • Coordinator A (original): SMLight SLZB‑06M (EFR32MG21) @ 192.168.50.198 — reports EmberZNet 8.0.2 [GA] build 397, EZSP 14
  • Coordinator B (new): SMLight SLZB‑06Mg24U (EFR32MG24) @ 192.168.50.9 — SLZB‑OS Core v3.3.3.dev4, Zigbee FW rev 20250212 (SDK 8.0.2.0.397), EZSP 14
  • Network: channel 11, PAN 0x61ee, ext‑PAN 0xc906bc61e7a7f814, coordinator IEEE 0x9035eafffeb08f1a; ~170 devices / ~110 routers (router‑dense)

Representative logs
This morning (Coordinator B / MG24, over TCP):

warning: zh:ember:uart:ash: Frame(s) in progress cancelled in [011ac2020682af7e]
error: zh:ember:uart:ash: Received ERROR from adapter, with code=RESET_ASSERT.
error: zh:ember:uart:ash: ASH disconnected | Adapter status: ASH_NCP_FATAL_ERROR
error: zh:ember:ezsp: Fatal error, status=ASH_NCP_FATAL_ERROR. Last Frame: [FRAME: ID=40:"GET_NETWORK_PARAMETERS" Seq=28 Len=30]
error: zh:ember:ezsp: ERROR Transaction failure; status=ASH_ERROR_TIMEOUTS. Last Frame: [FRAME: ID=40:"GET_NETWORK_PARAMETERS" ...]
error: z2m: Error: ASH_ERROR_TIMEOUTS
at Ezsp.ezspExportKey (.../ember/ezsp/ezsp.ts:6244)
at EmberAdapter.initTrustCenter (.../ember/adapter/emberAdapter.ts:890)

Earlier (Coordinator B), version + a different crash point:

info: zh:ember: Adapter version info: {"ezsp":14,"revision":"8.0.2 [GA]","build":397,...}
info: zh:ember: [STACK STATUS] Network up.
info: zh:ember:ezsp: Received network/route error ROUTE_ERROR_ADDRESS_CONFLICT for "0".
info: zh:ember: [INIT TC] Adapter network matches config.
error: zh:ember:ezsp: Fatal error, status=ASH_NCP_FATAL_ERROR. Last Frame: [FRAME: ID=90:"SET_SOURCE_ROUTE_DISCOVERY_MODE" ...]

Coordinator A (original MG21, over TCP): same Frame(s) in progress cancelled [011ac2020682af7e] → RESET_ASSERT → ASH_NCP_FATAL_ERROR at NETWORK_INIT / GET_NETWORK_PARAMETERS.

Note the recurring cancelled‑frame buffer […1ac2020682af7e] across many runs and both radios.
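To back up the "crash point varies, cancelled buffer repeats" observation with counts, a small script can tally both across a saved log. This is only a sketch: the regular expressions are written against the log line shapes quoted in this report, and the function name `summarize` is made up for illustration, not part of Zigbee2MQTT or zigbee-herdsman.

```python
import re
from collections import Counter

# Patterns derived from the log excerpts in this report (an assumption:
# other herdsman versions may format these lines differently).
FATAL_RE = re.compile(
    r'Fatal error, status=(\w+)\. Last Frame: \[FRAME: ID=(\d+):"(\w+)"'
)
CANCELLED_RE = re.compile(r"Frame\(s\) in progress cancelled in \[([0-9a-fA-F]+)\]")


def summarize(log_text: str) -> tuple[Counter, Counter]:
    """Return (crash points keyed by last frame name, cancelled buffers)."""
    crash_points: Counter = Counter()
    cancelled: Counter = Counter()
    for line in log_text.splitlines():
        fatal = FATAL_RE.search(line)
        if fatal:
            # group(3) is the EZSP frame name, e.g. GET_NETWORK_PARAMETERS
            crash_points[fatal.group(3)] += 1
        cancel = CANCELLED_RE.search(line)
        if cancel:
            # Normalise hex case so identical buffers are counted together
            cancelled[cancel.group(1).lower()] += 1
    return crash_points, cancelled
```

Feeding it the text of several startup attempts would show whether one frame name dominates (pointing at a deterministic fault) or whether the failures are spread out while the same cancelled buffer keeps recurring.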

Timeline / what triggered it
The failure began while migrating from the MG21 to the MG24 via ember backup/restore. The MG24's native IEEE differs (0xe07291fffeb9d765), so herdsman's custom‑EUI64 write was needed to present the network's coordinator IEEE (0x9035…). That write asserted at GET_EUI64 on 8.0.2. We then set the IEEE directly in SLZB‑OS, which got us past GET_EUI64. In the process the MG24 booted ~8× as coordinator (addr 0) with its native IEEE before the IEEE was corrected, which appears to have left address‑conflict state in the mesh. Since then neither coordinator completes startup.

What we've tried (and results)

  • Set MG24 adapter IEEE → 0x9035eafffeb08f1a via SLZB‑OS → passed GET_EUI64, then asserts later.
  • Cleared NVM on the MG24 (firmware flash dialog) → still asserts; ADDRESS_CONFLICT persists (so it's not the radio's own NVM).
  • ~8 h "settle" with all coordinators off → no change (devices never relearn while coordinator is down).
  • Repeated restart cycles → no convergence.
  • USB connection (to isolate TCP): the MG24 in USB mode did not enumerate as a serial device on HA OS (no /dev/serial/by-id, no ttyUSB/ttyACM) — couldn't run the USB test.
  • Confirmed only one coordinator on air at a time (no dual‑coordinator clash).
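For the USB enumeration check above, the same lookup can be scripted so it is easy to repeat after each replug. A minimal sketch, assuming Python is available wherever the check runs (for example in a debug shell or on another Linux host); the function name `list_serial_candidates` and its `dev_root` parameter are hypothetical.

```python
import glob
import os


def list_serial_candidates(dev_root: str = "/dev") -> list:
    """List device nodes that usually indicate a USB serial adapter."""
    # The three locations checked manually in this report.
    patterns = ["serial/by-id/*", "ttyUSB*", "ttyACM*"]
    found = []
    for pattern in patterns:
        found.extend(glob.glob(os.path.join(dev_root, pattern)))
    return sorted(found)
```

An empty result matches what was seen here: the MG24 in USB mode exposes no serial node on the host, so the USB comparison cannot be run until that is resolved.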

What this rules out

  • Not coordinator hardware/NVM/IEEE — two different radios (one freshly wiped, correct IEEE) fail identically.
  • Common factors that remain: TCP transport (Frame(s) in progress cancelled + ASH_ERROR_TIMEOUTS, variable crash point), the mesh (persistent ADDRESS_CONFLICT for "0"), and EmberZNet 8.0.2 (only firmware available for these SLZB radios).

Questions

  1. Is the ROUTE_ERROR_ADDRESS_CONFLICT for "0" storm + NCP RESET_ASSERT on 8.0.2 a known firmware issue, and is there a firmware/version that tolerates it?
  2. How do we clear mesh‑side address‑conflict pollution when the coordinator asserts before devices can relearn? (Any way to start the stack "quietly" / without source‑route broadcasts?)
  3. Could the Frame(s) in progress cancelled / ASH_ERROR_TIMEOUTS indicate the TCP serial bridge is the real culprit (vs. the address conflict)? Recommended way to isolate transport if USB won't enumerate?
  4. Is there a clean recovery short of re‑forming + re‑pairing ~170 devices (e.g., via ember-zli)?
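On question 3, one rough way to look at the TCP path without USB is to probe the adapter's socket directly while Zigbee2MQTT is stopped (the bridge is assumed to serve a single client at a time). The sketch below only measures whether repeated connections succeed and how long they take; it does not speak ASH/EZSP, so a clean result would not prove the serial bridge is healthy under load. The function name `probe_tcp` is hypothetical.

```python
import socket
import time


def probe_tcp(host: str, port: int, attempts: int = 5, timeout: float = 3.0) -> list:
    """Open and close a TCP connection `attempts` times.

    Returns one entry per attempt: the connect latency in seconds (float),
    or the exception class name (str) if the attempt failed.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError as exc:
            # Covers refused connections, resets, and timeouts
            results.append(type(exc).__name__)
        time.sleep(0.1)  # brief pause so the adapter can release the socket
    return results
```

For example, `probe_tcp("192.168.50.9", 6638, attempts=20)` against Coordinator B would show whether connects intermittently fail or stall, which would point at the network or bridge rather than the mesh.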

Related issues seen
Koenkk/zigbee2mqtt #28842 and #23198 (ADDRESS_CONFLICT for "0" on SLZB ember); #25860 (Frame(s) in progress cancelled); hassio‑zigbee2mqtt #849 / Koenkk #25920 / #30455 (SLZB‑06M TCP drops / ECONNRESET / ASH timeouts).

What did you expect to happen?

No response

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

2.12.0

Adapter firmware version

20250220

Adapter

SMLight MG24U

Setup

Home Assistant Green.

Device database.db entry

No response

Debug log

No response

Notes

No response

Labels: problem (Something isn't working)
