ive_neo: Hi3516EV200/EV300 XNN/Conv HW appears non-functional at the silicon level #33

@widgetii

Summary

After extensive work making ive_neo behaviorally equivalent to the vendor's open_ive.ko on hi3516ev300_lite, empirical evidence strongly suggests that the XNN/Conv hardware block is non-functional on this SoC variant, regardless of driver. Both ive_neo and the vendor driver return success from their ioctls, but neither ever executes the submitted XNN tasks.

Evidence

  • HW task counter [0x11320018] stays at 0x00000000 throughout all XNN submissions — both with ive_neo and with vendor's /lib/modules/4.9.37/extra/open_ive.ko
  • Output buffers (dst_phys) stay all-zero after fwd_slice ioctls
  • tmp_phys + conv1_out_offset stays zero — Conv layer 1 never writes its activations
  • Only the preproc output (691200 bytes at tmp_phys+0) is visible — that's our CPU-side preproc, not HW work
  • Vendor test-fc-model prints forward ret=0x0 handle=0 and completes in 0.7 ms, far too fast for any real compute
  • Vendor test-ivp ivp_re_allday_f1y2f2m1_640x360_v1003.oms fails at SYS_Init: 0xa0028012 on this board
  • This matches the earlier finding that HI_MPI_IVE_CNN_LoadModel returns HI_ERR_IVE_NOT_SUPPORT (see git 112c3fb "test-cnn-model: CNN API returns NOT_SUPPORT on Hi3516EV200")
  • Task nodes are byte-for-byte identical to what vendor writes (verified via kprobe_ive.ko hooking drv_ive_write_regs)
  • All init registers match vendor ([0x34]=0x00313307, [0x54]=0x00003f07, [0x60]=0xffffffff, [0x84]=1, [0x88]=1, [0x8c]=0, [0x90]=0x01ab5159)
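The init-register comparison in the last bullet can be repeated on a live board with a small shell helper. This is only a sketch: it assumes the bracketed offsets are relative to the IVE block base 0x11320000 (the base implied by the counter address 0x11320018), and the names `check_ive_init_regs`, `DEVMEM`, `IVE_BASE` and `IVE_INIT_REGS` are hypothetical. It also assumes a shell whose arithmetic can hold 0xffffffff (64-bit arithmetic).

```bash
#!/bin/sh
# Sketch: compare IVE init registers against the values listed in the issue.
# DEVMEM is overridable so the logic can be exercised without hardware.
DEVMEM="${DEVMEM:-busybox devmem}"
IVE_BASE=0x11320000   # assumption: offsets below are relative to this base

# offset:expected pairs copied from the evidence list
IVE_INIT_REGS="0x34:0x00313307 0x54:0x00003f07 0x60:0xffffffff 0x84:0x1 0x88:0x1 0x8c:0x0 0x90:0x01ab5159"

check_ive_init_regs() {
    mismatches=0
    for pair in $IVE_INIT_REGS; do
        off=${pair%%:*}
        want=${pair##*:}
        addr=$(printf '0x%08x' $(( $IVE_BASE + $off )))
        got=$($DEVMEM "$addr")
        # Compare numerically so 0x1 and 0x00000001 count as equal
        if [ $(( $got )) -ne $(( $want )) ]; then
            echo "MISMATCH [$off] got=$got want=$want"
            mismatches=$((mismatches + 1))
        fi
    done
    echo "mismatches=$mismatches"
    [ "$mismatches" -eq 0 ]
}
```

Running `check_ive_init_regs` once after loading ive_neo and once after loading the vendor module should print `mismatches=0` in both cases if the claim above holds on the board under test.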

What ive_neo does achieve (current state)

  • Full libive.so compatibility — test-fc-model returns instant completion, test-ivp runs 50 frames at 50.9 fps (vendor hangs this test on the same board)
  • All kernel memory bugs fixed (tile-node buffer, preproc pp_size, fwd_slice Unpack loop cap)
  • Register init sequence matches vendor byte-for-byte (added drv_ive_set_mem_speed which was missing)
  • IRQ handler registered with completion primitive
  • Async submit semantics matching vendor's drv_ive_write_regs
  • libive.so init path implemented: svp_alg_proc_init (cmd 0x8010463b), ivp_proc_init (cmd 0x801046c8), mmap fops callback using osal_io_remap_pfn_range
  • Slot-based model ID allocation
  • Preproc writes raw Y/U/V pixel values (matches vendor VGS behavior)

Action points (for future work)

  1. Investigate CRG / AI subsystem enable bits. Vendor's ive_init path only touches registers in the IVE block itself (0x11320000..0x11330000). If there's a separate AI/NNIE power domain or clock gate outside this window, it may need to be toggled. Suggested: read a full CRG dump during vendor runtime on a board where XNN is known to work (if any exists), compare against this board.
  2. Verify on a different SoC variant. Hi3516CV300, Hi3516CV500, Hi3519V101, Hi3516DV300 reportedly have functional CNN/XNN hardware. ive_neo should port with minimal changes since the kernel ABI is the OSAL layer and the task-node format is the same across variants. This would also establish a known-good baseline.
  3. Confirm non-XNN IVE ops work on this board. Submit a simple DMA, Sobel, or ThreshU8 task (bytes [10]!=0x36) through ive_neo and check if HW actually runs it. If DMA works but XNN doesn't, the issue is XNN-specific (e.g., XNN subsystem disabled). If neither works, the whole IVE block is broken.
  4. Compare Hi3516EV200 vs EV300 silicon. The "EV" suffix suggests a variant without the IVP/CNN block. Check the official datasheet feature matrix — it's possible that CNN/XNN is only in the D/C/V variants and EV stands for "entry video" with XNN fused off.
  5. Check mpp_init flags. libive.so may require a specific MPP_SYS_CONFIG_S parameter to enable an XNN subsystem that's otherwise off. Try running the vendor SDK's sample_ive_xnn binary (if available) to see what it calls before submission.
  6. Fully reverse-engineer the vendor's ive_umap_module_init (ive.o.c:8185). We verified the documented register writes, but there may be undocumented side effects in CMPI_GetModuleFuncById(2) callbacks or in SysDrvIoCtrl calls with IDs other than 144/145.
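For action point 1, a register-range dump plus a diff of two dumps may be enough to spot a clock-gate or power-domain bit that differs between boards or between drivers. The sketch below is an illustration only: `dump_range` and `diff_reg_dumps` are hypothetical helper names, and the CRG address range is deliberately left as a parameter because it is not given in this issue and has to come from the SoC documentation.

```bash
#!/bin/sh
# Sketch: dump a register range and diff two dumps.
DEVMEM="${DEVMEM:-busybox devmem}"

# dump_range START END : print "ADDRESS VALUE" for each 32-bit word, END exclusive
dump_range() {
    addr=$(( $1 ))
    end=$(( $2 ))
    while [ "$addr" -lt "$end" ]; do
        a=$(printf '0x%08x' "$addr")
        echo "$a $($DEVMEM "$a")"
        addr=$((addr + 4))
    done
}

# diff_reg_dumps FILE_A FILE_B : print "ADDRESS VALUE_A VALUE_B" for differing words
diff_reg_dumps() {
    awk 'NR == FNR { a[$1] = tolower($2); next }
         ($1 in a) && a[$1] != tolower($2) { print $1, a[$1], tolower($2) }' "$1" "$2"
}
```

A possible workflow is to run `dump_range` over the same range with each driver loaded (or on each board), save the output to two files, and pass both to `diff_reg_dumps`. Reading from regions that are unclocked or unmapped can stall the bus on some SoCs, so it is safer to start with a narrow range.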

Test recipe to reproduce

```bash

# Build

cd ~/git/openhisilicon/kernel/ive
make -C /git/firmware/output/build/linux-custom M=$PWD ARCH=arm \
CROSS_COMPILE=/git/firmware/output/host/bin/arm-openipc-linux-musleabi- \
PREFIX=open_ \
EXTRA_CFLAGS="-Dhi3516ev200 \
-I/home/dima/git/openhisilicon/include \
-I/home/dima/git/openhisilicon/kernel/osal/include \
-I/home/dima/git/openhisilicon/kernel/osal \
-I/home/dima/git/openhisilicon/kernel/ext_inc" modules

# Test with ive_neo (runs at 50 fps but 0 detections)

ssh root@ 'rmmod open_ive; insmod /root/open_ive.ko save_power=0
cd /utils && ./test-ivp ivp_re_allday_f1y2f2m1_640x360_v1003.oms test-ivp-people.y4m'

# Test with vendor (reports success but [0x18] stays 0)

ssh root@ 'rmmod open_ive; insmod /lib/modules/4.9.37/extra/open_ive.ko save_power=0
cd /utils && ./test-fc-model test_fc_model.oms
busybox devmem 0x11320018' # → 0x00000000
```
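The counter check at the end of the recipe can be wrapped so that the before and after readings are taken around any test command. This is a minimal sketch to be run on the board itself; `counter_advanced`, `DEVMEM` and `IVE_TASK_COUNTER` are hypothetical names, and only the address 0x11320018 comes from the evidence above.

```bash
#!/bin/sh
# Sketch: report whether the HW task counter changes while a command runs.
DEVMEM="${DEVMEM:-busybox devmem}"
IVE_TASK_COUNTER=0x11320018   # HW task counter address from the evidence list

# counter_advanced CMD [ARGS...]
# Prints both readings; returns 0 only if the counter value changed.
counter_advanced() {
    before=$($DEVMEM "$IVE_TASK_COUNTER")
    "$@" >/dev/null 2>&1
    after=$($DEVMEM "$IVE_TASK_COUNTER")
    echo "before=$before after=$after"
    [ $(( $before )) -ne $(( $after )) ]
}
```

For example, `cd /utils && counter_advanced ./test-fc-model test_fc_model.oms` would be expected to print two identical readings on the affected board. The same wrapper could be reused for action point 3 by passing a command that submits a non-XNN task such as DMA or Sobel.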
