Skip to content

DroneCAN: Fix H7 FDCAN and F7 bxCAN driver configuration#11607

Draft
daijoubu wants to merge 47 commits into
iNavFlight:maintenance-10.xfrom
daijoubu:fix/h7-dronecan-driver
Draft

DroneCAN: Fix H7 FDCAN and F7 bxCAN driver configuration#11607
daijoubu wants to merge 47 commits into
iNavFlight:maintenance-10.xfrom
daijoubu:fix/h7-dronecan-driver

Conversation

@daijoubu

@daijoubu daijoubu commented May 31, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes configuration errors and robustness issues in the H7 FDCAN and F7 bxCAN DroneCAN drivers. Migrates F7 TX from polling to ISR-driven, refilling the 3 hardware TX mailboxes from the libcanard queue in the TX-complete callback. Hardware verified on KAKUTEH7WING (H7) and MATEKF765SE (F7).

Supersedes #11560.


Changes

H7 FDCAN driver (canard_stm32h7xx_driver.c)

  • Fix FDCAN clock source: was using HAL_RCC_GetPCLK1Freq() (APB1); now correctly uses HAL_RCCEx_GetPeriphCLKFreq(RCC_PERIPHCLK_FDCAN)
  • Reduce SJW from 8 to 1 (HAL adds +1 offset internally; verified at 1 Mbps)
  • Disable AutoRetransmission to prevent TX FIFO stall on a degraded bus
  • Set TxBuffersNbr = 0 and RxBuffersNbr = 0 — driver uses FIFO queue only; dedicated buffers unused
  • Flush TX FIFO (TXBCR) before clearing CCCR.INIT on bus-off recovery
  • Fix FDCAN filter type: FDCAN_FILTER_DUAL only matched two exact IDs (0x0 and 0x1FFFFFFF); replaced with FDCAN_FILTER_MASK (pattern=0, mask=0) to accept all extended IDs
  • Fix global filter: NonMatchingStd was FDCAN_ACCEPT_IN_RX_FIFO0; now FDCAN_REJECT — standard frames are not used by DroneCAN
  • Fix RTR filter params: were FDCAN_FILTER_REMOTE; now FDCAN_REJECT_REMOTE
  • Add bounds guard on RxHeader.DataLength before memcpy; FDCAN_DLC_BYTES_0..8 == 0..8 holds only in FDCAN_FRAME_CLASSIC mode
  • Add canardSTM32GetAndClearRxDropCount() stub (FIFO0 has 30 hardware slots; no software ring buffer)
  • Fix all @retval doc comments

PLL2 clock configuration (system_stm32h7xx.c)

  • Extend PLL2 block to cover USE_DRONECAN (was USE_SDCARD_SDIO only)
  • Fix PLL2Q: was 3 (266 MHz, invalid for FDCAN ≤ 80 MHz); now 10 (80 MHz)
  • Add STATIC_ASSERT for HSE_VALUE divisibility by PLL2M
  • Check HAL_RCCEx_PeriphCLKConfig return value

F7 bxCAN driver (canard_stm32f7xx_driver.c)

  • Migrate TX from polling to ISR-driven using HAL_CAN_TxMailbox{0,1,2}CompleteCallback; ISR refills the 3 hardware TX mailboxes from the libcanard queue
  • Add 32-slot SPSC RX ring buffer with volatile indices, local snapshots, and __DMB() barriers for Cortex-M7 correctness
  • canardSTM32GetAndClearRxDropCount(): IRQ-masked atomic read-clear of rxDropCount; ISR increments on RX FIFO overflow instead of logging
  • Move NVIC_EnableIRQ calls after all ActivateNotification calls succeed
  • Fix max_quanta_per_bit: was 18, reduced to 17 to match CiA paper maximum for ≤ 500 kbps
  • Disable AutoRetransmission to prevent TX mailbox stall on a degraded bus
  • Fix doxygen header: "CAN1 (bxCAN) Initialization Function", remove wrong @param hfdcan
  • Fix stale hfdcan->ErrorCode reference → hcan1.ErrorCode
  • Add pre-shifted timing register comment

DroneCAN state machine (dronecan.c)

  • Make all file-local functions static
  • Add static volatile uint32_t txErrCount; ISR increments, 1 Hz loop reads+clears with NVIC mask held across snapshot
  • Check return values of canardBroadcast and canardRequestOrRespond; log on OOM
  • vendor_specific_status_code: (uint16_t)(armingFlags & 0xFFFF) with explicit truncation
  • handle_GNSSRCTMStream: replace dead decode with UNUSED(transfer) stub
  • Add default: case to dronecanInit bitrate switch with error log and 500 kbps fallback
  • Add STATE_DRONECAN_FAILED set on peripheral init failure
  • Add STATE_DRONECAN_COUNT sentinel with STATIC_ASSERT protecting CLI name array
  • Increase bus-off recovery delay 1 ms → 20 ms (worst-case 128×11 at 125 kbps ≈ 11.3 ms)
  • Rate-limit protocol status check to 1 Hz in NORMAL state
  • Conditional CAN status logging (only when BusOff or ErrorPassive non-zero)

GPS DroneCAN driver (gps_dronecan.c)

  • Fix GPS provider leakthrough: fix messages were written to gpsSolDRV regardless of configured provider
  • Guard GNSSAuxiliary HDOP against NaN (float16 optional fields encode NaN when unpopulated)
  • Clamp DOP values with gpsConstrainHDOP() to prevent uint16_t overflow
  • Remove lastVDOP dead store

SITL driver (canard_sitl_driver.c)

  • Add canardSTM32GetAndClearRxDropCount() stub
  • Map RTR flag in sitlCANFrameToLinux / sitlCANFrameFromLinux
  • Add ERR frame guard in canardSTM32Transmit
  • Cast pid_t to uint32_t before bit-shifting for unique ID
  • Add FIONREAD comment explaining 0-or-1 frame-per-tick behaviour

Driver header (canard_stm32_driver.h)

  • Add uint32_t canardSTM32GetAndClearRxDropCount(void) declaration

CLI (cli.c)

  • Use PRId32 format specifiers for fixed-width integers; add #include <inttypes.h>

KAKUTEH7WING target

  • Add USE_DRONECAN with CAN1 RX/TX pin definitions (PD0/PD1)

Testing

Target MCU Build Hardware
KAKUTEH7WING H7 ✅ frames TX/RX verified
MATEKF765SE F7 ✅ frames TX/RX verified
IFLIGHT_BLITZ_AT435 AT32
SITL
SPEEDYBEEF405WING F4

Pending: HITL validation, overnight stability run on H7 and F7 variants, real-airframe flight test.

@github-actions

github-actions Bot commented May 31, 2026

Copy link
Copy Markdown

Test firmware build ready — commit f462dc6

Download firmware for PR #11607

238 targets built. Find your board's .hex file by name on that page (e.g. MATEKF405SE.hex). Files are individually downloadable — no GitHub login required.

Development build for testing only. Use Full Chip Erase when flashing.

daijoubu added 29 commits June 11, 2026 20:19
SJW=8 was overly conservative (80% of bit time at 1Mbps with 10 quanta).
SJW=3 is the standard value also used by the F7 driver.

Tested with 6037 arm/disarm cycles at 500kbps: TEC=0, REC=0, zero errors.
…DCAN

PLL2Q was 3 (266 MHz, invalid for FDCAN ≤ 80 MHz). Fix to 10 (80 MHz).
Extend PLL2 guard from USE_SDCARD_SDIO to USE_SDCARD_SDIO || USE_DRONECAN
so H7 boards with CAN but no SD card get PLL2 configured. Adopt upstream
PLL2M/N formula (VCI=1.6 MHz, VCO=800 MHz) and error check on
HAL_RCCEx_PeriphCLKConfig.
…pport

Remove redundant PeriphClkInitStruct clock config from canardSTM32CAN1_Init.
system_stm32h7xx.c already configures FDCAN to use PLL2Q (80 MHz) when
USE_DRONECAN is defined; duplicating it in the driver overwrites with PLL1.

Also add CAN1 pin definitions and USE_DRONECAN to KAKUTEH7WING target
(PD0/PD1, CAN1_STANDBY PD3 disabled by default).
Use HAL_RCCEx_GetPeriphCLKFreq(RCC_PERIPHCLK_FDCAN) instead of
HAL_RCC_GetPCLK1Freq() for bit timing calculation. FDCAN is clocked
from PLL2Q (80 MHz) configured in system_stm32h7xx.c; using PCLK1
(100 MHz) produced a ~25% baud rate error causing immediate bus-off.

Restore SJW to 3 for better synchronisation tolerance.
Remove high-frequency LOG_DEBUG messages from GNSS Fix/Fix2/Auxiliary
handlers, onTransferReceived, dronecanInit, and gps_dronecan HDOP path
that fired at 25 Hz and flooded the log.

Fix PLL2 VCO input to target 1.6 MHz (PLL2M = HSE/1600000, PLL2N = 500)
rather than 2.0 MHz, keeping the operating point clearly within
VCIRANGE_0 (1-2 MHz) as the original SDCARD-only code did with PLL2M=5.
VCO output remains 800 MHz; FDCAN (80 MHz via PLL2Q=10) and SDMMC
(200 MHz via PLL2R=4) outputs are unchanged.
Drop high-frequency and verbose-but-low-value LOG_DEBUG(CAN messages:
- dronecan.c: Battery Info (x2), GetNodeInfo, NodeStatus, TX success,
  RX loop, commented-out debug blocks
- canard_stm32h7xx_driver.c: timing computation intermediates
  (Baudrate, Max Quanta, Prescaler BS, Prescaler, Timings summary)
- canard_stm32f7xx_driver.c: same timing intermediates, TX success,
  In CAN Init, commented-out clock and RX blocks

Retain error-path messages (decode failed, TX/RX error, init failures)
and the single-line Prescaler/SJW/BS summary logged at init.
… operation

Tested at 1 Mbps on KAKUTEH7WING hardware and confirmed bus operational.
…ivers

Remove CubeMX boilerplate markers, commented-out dead code, and
development-time question comments from both drivers.
Fix: DroneCAN GNSS messages were being applied to gpsSolDRV regardless
of the configured GPS provider. Guard added in gps_dronecan.c where
it belongs, keeping CAN transport layer unaware of GPS config.
…g difference

F7 bxCAN HAL writes SJW directly to BTR register where hardware adds 1,
so stored value 3 gives 4 tq. This wider SJW is needed for reliable bus
operation on F7 targets and is different from H7 where SJW=1 is actual tq.
Prevents state machine from continuing in INIT state when the CAN
peripheral fails to initialize.
Prevents out-of-bounds access when STATE_DRONECAN_FAILED is active.
Prevents stale pre-bus-off frames from storming the bus on recovery.
With AutoRetransmission=ENABLE, frames that fail on a degraded bus
occupy FIFO slots indefinitely. All 32 slots fill, HAL_FDCAN_AddMessage
returns HAL_ERROR, and all outgoing traffic stalls permanently with no
indication until full bus-off. DroneCAN reliability is handled at the
application layer via periodic republishing.
Matches the H7 driver pattern. Previously the return value was silently
discarded; if timing computation failed, uninitialized stack bytes were
passed to HAL_CAN_Init.
The H7 FDCAN 128x11 recessive-bit recovery sequence takes up to 11.264ms
at 125kbps. The 1ms delay was restarting the counter before it could
complete, preventing the node from ever exiting bus-off. 20ms gives safe
margin above worst-case and allows time to detect immediate re-entry.
Guard against non-DroneCAN GPS provider at the transport boundary
(handle_GNSS* functions) rather than in each leaf function in
gps_dronecan.c. Also adds the guard to handle_GNSSRCTMStream which
had none. Removes stale UNUSED(pgnssAux) and placeholder comment
from dronecanGPSReceiveGNSSAuxiliary.
canardSTM32GetProtocolStatus() was called on every dronecanUpdate()
invocation (~500Hz) to detect bus-off. Moved into the existing 1Hz
task block — bus-off detection latency of up to 1s is acceptable.
Adds LOG_DEBUG to report BusOff and ErrorPassive flags each second
for bench diagnostics.
AutoBusOff=ENABLE handles the 128x11 recovery sequence automatically,
but ESR.BOFF is a sticky read-only flag that is NOT cleared when hardware
recovery completes. GetProtocolStatus() reads this flag, so the state
machine was permanently stuck in STATE_DRONECAN_BUS_OFF after any
bus-off event on F7 targets. Stop/Start re-enters init mode which
clears ESR.BOFF, allowing recovery detection to work correctly.
Comment incorrectly stated '25MHz' as a supported HSE value — 25MHz
fails the assert. CMake always provides HSE_VALUE per-target via
-DHSE_VALUE=<n> so the stm32h7xx_hal_conf.h fallback of 25MHz is never
used. Current targets use 8MHz (default) or 16MHz (KAKUTEH7WING).
…adence

GetProtocolStatus() was called every dronecanUpdate() cycle (~500Hz) in
BUS_OFF state. Moved inside the 20ms recovery timer block so it runs at
the same cadence as RecoverFromBusOff() — still detects recovery within
20ms but reduces MMIO reads from ~500/sec to ~50/sec.
HAL_CAN_Stop/Start called from the scheduler context with CAN interrupts
active caused a full FC lockup. Reverted to empty stub pending investigation
of a safe mechanism to clear the sticky ESR.BOFF flag on F7.
Unconditional 1Hz LOG_DEBUG was flooding the bootlog with healthy status
messages. Now only logs when an error condition is actually present.
HAL_CAN_AddTxMessage returns non-OK when all mailboxes are busy — a
normal transient condition at startup. The log was noise. Matches the
H7 driver which already handles this path silently.
DroneCAN float16 optional fields encode NaN when unpopulated. Without a
guard, NaN * 100 converts to 0 on Cortex-M (ARM VCVT saturation),
permanently blocking the HDOP fallback path. Also passes values through
gpsConstrainHDOP() to prevent uint16_t overflow for extreme DOP values.
…ompatible

gpsSolDRV has hdop but no vdop field. VDOP and EPV are not interchangeable
(different units, conversion requires receiver UERE). lastVDOP was a dead
store with no valid consumer.
…ames size

Guards the CLI state name array against future enum additions — if a new
state is added without updating the array, the build fails immediately.
daijoubu added 9 commits June 11, 2026 20:34
Guards against out-of-bounds read if dronecanState is ever corrupted
to STATE_DRONECAN_COUNT or beyond.
ABOM (CAN_MCR bit 6) is enabled in canardSTM32CAN1_Init. Per RM0410
ss40.7.6, with ABOM=1 the hardware manages the full bus-off recovery
sequence automatically: after 128x11 recessive bits it cycles INRQ and
clears ESR.BOFF without any software intervention required.

Removes an incorrect TODO asserting ESR.BOFF is sticky — it is a
hardware-managed status bit that clears when bus-off state is left.
Cast both MIN() arguments to int to avoid -Werror=sign-compare between
dronecanState_e (unsigned enum) and STATE_DRONECAN_COUNT - 1 (int).
clang treats uint32_t as unsigned int, not unsigned long. Use %u to
match the actual type of BusOff and ErrorPassive in canardProtocolStatus_t.
…NODES

Previously last_seen_ms was the raw millis() timestamp when the node was
last heard from, which equals FC uptime for active nodes. Configurators
had no way to compute elapsed time without knowing current FC millis().

Now sends millis() - last_seen_ms so the field means "ms since this node
was last heard from". Unsigned subtraction handles millis() wraparound
correctly.
…NODE_INFO

Apply the same fix as MSP2_INAV_DRONECAN_NODES: send millis() - last_seen_ms
so the field means elapsed time since last heard from the node.
Switch FDCAN from FIFO to Queue mode so hardware transmits highest-priority
frame first, matching libcanard's scheduler. Reduce staging depth 32→3 so
libcanard's queue remains authoritative between poll cycles.

Add HAL_FDCAN_TxBufferCompleteCallback ISR pump to refill the hardware slot
immediately on TX complete. Wrap all canardBroadcast, canardRequestOrRespond,
and canardCleanupStaleTransfers call sites with NVIC_DisableIRQ/EnableIRQ on
FDCAN1_IT1 to prevent races between the ISR and main task on libcanard's queue.
Masking helpers are no-ops on non-H7 targets.
@daijoubu daijoubu force-pushed the fix/h7-dronecan-driver branch from d000077 to 1532585 Compare June 12, 2026 04:03
daijoubu added 9 commits June 11, 2026 21:38
…onecan

Phase 2 of the combined DroneCAN driver rework (folds in iNavFlight#11560 content):

F7 bxCAN ISR-driven TX refill:
- Add NVIC_PRIO_CAN=4 to nvic.h (shared by H7 and F7)
- Enable CAN_IT_TX_MAILBOX_EMPTY and CAN1_TX_IRQn at NVIC_PRIO_CAN
- Add HAL_CAN_TxMailbox{0,1,2}CompleteCallback ISR pumps in dronecan.c
- Wire dronecanMaskTxISR/UnmaskTxISR to NVIC_DisableIRQ/EnableIRQ(CAN1_TX_IRQn)
- Add #else branch for SITL/AT32 (was missing, would break SITL build)

TEC/REC/LEC error counters:
- Extend canardProtocolStatus_t with tec, rec, lec fields
- F7: populate from ESR register (bits 23:16, 31:24, 6:4)
- H7: populate from ECR register (bits 22:16, 14:8) + PSR.LastErrorCode

Typo fix: canardSTM32Recieve -> canardSTM32Receive across all drivers and call sites

Other:
- Add canardSTM32GetTxQueueFillLevel() to all three drivers (returns 0; no SW queue)
- Make canard and memory_pool static in dronecan.c
- Add cliDronecan CLI command showing bus health (BusOff, ErrorPassive, TEC, REC, LEC, fill levels)
Fix H7 ISR pump (was completely non-functional):
- FDCAN_IT_TX_COMPLETE routes to LINE0 by default (ILS=0); enable
  FDCAN1_IT0_IRQn at NVIC_PRIO_CAN and add FDCAN1_IT0_IRQHandler
- Change dronecanMaskTxISR from FDCAN1_IT1_IRQn to FDCAN1_IT0_IRQn

Fix H7 TEC extraction from ECR register:
- ECR[7:0]=TEC, ECR[14:8]=REC, ECR[23:16]=CEL (not TEC)
- Previous code read CEL and labelled it tec

Narrow ISR mask window in processCanardTxQueueSafe:
- Mask only covers canardPeekTxQueue/canardPopTxQueue (linked-list ops)
- HAL transmit call now runs with ISR unmasked
- Re-peek after transmit to detect if ISR already popped the frame

F7 RxBuffer ISR safety:
- writeIndex/readIndex now volatile to prevent compiler caching
- Remove unused file-scope rxMsg variable
- rxBufferPushFrame/PopFrame now int8_t and static
- Log warning on RX buffer full instead of silently dropping

Remove redundant lec & 0x7 mask in cliDronecan (already masked at population)
Move public canardSTM32* functions above static helpers in H7 and F7
drivers. Move dronecanInit/Update/Get* to the top of dronecan.c with
forward declarations for internal callbacks. Mark rxBufferNumMessages
static in F7 driver. Add ISR-context warning comment to
processCanardTxQueue.
- processCanardTxQueueSafe: simplify by holding NVIC mask across
  transmit call, eliminating the pointer-identity ABA assumption
- F7 rxBuffer: add __DMB() barriers between data write and index
  advance to satisfy C99 memory ordering between ISR and main loop
- F7 ComputeTimings: align max_quanta_per_bit with H7 (17, not 18)
  per Koppe reference recommendation
- H7 GetProtocolStatus: document why ECR is read directly vs HAL
- H7 RecoverFromBusOff: clarify CCCR.INIT clear is defensive no-op
- dronecanUpdate: replace dead STATE_DRONECAN_COUNT case with default
- cli: use PRId32 for int32_t format specifiers; add <inttypes.h>
- SITL: remove Doxygen blocks from private static helpers
- Make processCanardTxQueue, shouldAcceptTransfer, onTransferReceived static
- Replace ISR LOG calls with volatile counters (txErrCount, rxDropCount);
  log and clear at 1Hz from main loop via canardSTM32GetAndClearRxDropCount()
- Add canardSTM32GetAndClearRxDropCount() to driver interface; F7 implements
  ring-buffer drop counter, H7/SITL return 0 (no SW ring buffer)
- Check canardBroadcast/canardRequestOrRespond return values; log on OOM
- H7: set RxBuffersNbr=0 (dedicated buffer was unused, wasted message RAM)
- H7: reject unmatched standard-ID frames (FDCAN_REJECT for NonMatchingStd)
- F7: move NVIC_EnableIRQ calls after all ActivateNotification calls succeed
- Fix F7 bxCAN Doxygen: correct @brief, remove wrong @param, fix @RetVal
- Fix all @RetVal docs: ret==0 is OK (CANARD_OK), not ret==1
- Cast pid_t to uint32_t before bit-shifting in SITL unique ID generation
- Wrap processCanardTxQueue in #if STM32H7||STM32F7 — function is
  ISR-only and has no callers in SITL builds; static + no callers
  caused -Werror=unused-function on CI
- Atomic read-clear of txErrCount: hold dronecanMaskTxISR across
  snapshot+zero to prevent ISR increment being silently dropped
- Atomic read-clear of rxDropCount: disable CAN1_RX0_IRQn across
  snapshot+zero in canardSTM32GetAndClearRxDropCount (F7)
- Make all eight file-local handler functions static: send_NodeStatus,
  handle_NodeStatus, handle_GNSSAuxiliary, handle_GNSSFix,
  handle_GNSSFix2, handle_GNSSRCTMStream, handle_BatteryInfo,
  handle_GetNodeInfo
- H7 global filter: FDCAN_REJECT_REMOTE for both RTR frame params
  (was FDCAN_FILTER_REMOTE which passed RTR frames to normal filter)
- H7 Receive: add comment that DataLength==byte count holds only in
  FDCAN_FRAME_CLASSIC mode (FDCAN_DLC_BYTES_0..8 equal 0..8)
- F7 ring buffer: use local index snapshots in push/pop to avoid
  double volatile re-reads of writeIndex/readIndex
- H7 Transmit: add FDCAN_FRAME_CLASSIC comment on DataLength assignment
  matching the comment already on the receive path
- F7 init: add comment explaining pre-shifted timing register values
- SITL: map RTR flag in sitlCANFrameToLinux/FromLinux, consistent with
  hardware drivers
- SITL canardSTM32Transmit: add ERR frame guard matching hardware drivers
- handle_GNSSRCTMStream: remove dead decode, add comment that RTCM
  forwarding is not yet implemented
- dronecanInit: add default case to bitrate switch for EEPROM corruption
- vendor_specific_status_code: explicit (uint16_t) cast with comment
  acknowledging bits 16-30 of armingFlags are not transmitted
- fport.c: remove dead static volatile frameErrors counter (written but
  never read; triggered -Werror=unused-but-set-variable on GCC 16)
- H7 Receive: add bounds guard on DataLength before assignment to
  data_len (uint8_t); mirrors existing TX path guard; safe no-op in
  FDCAN_FRAME_CLASSIC mode but prevents silent truncation if DLC ever
  exceeds 8
- F7: fix stale comment hfdcan->ErrorCode -> hcan1.ErrorCode
- dronecanInit: fix mixed tab/space indentation in canardInit() call
- dronecanGetBitrateKbps: return 500 for DRONECAN_BITRATE_COUNT and
  default cases, matching what dronecanInit actually selects
- Fix typo: incremeneted -> incremented
- SITL GetRxFifoFillLevel: add comment explaining FIONREAD on SOCK_RAW
  returns next-datagram size only, so result is 0 or 1
FDCAN_FILTER_DUAL with FilterID1=0x0, FilterID2=0x1FFFFFFF only matched
those two exact IDs. Replace with FDCAN_FILTER_MASK (pattern=0, mask=0)
which accepts any extended ID. Also fix stale comments in H7 driver and
dronecanInit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant