Summary
After extensive work making `ive_neo` behaviorally equivalent to the vendor `open_ive.ko` on `hi3516ev300_lite`, the evidence strongly suggests that the XNN/Conv hardware block is non-functional on this SoC variant, regardless of driver. Both our driver and the vendor driver report success through their ioctls, but neither ever executes the submitted XNN tasks.
Evidence
- HW task counter `[0x11320018]` stays at `0x00000000` throughout all XNN submissions, both with `ive_neo` and with vendor's `/lib/modules/4.9.37/extra/open_ive.ko`
- Output buffers (`dst_phys`) stay all-zero after `fwd_slice` ioctls
- `tmp_phys + conv1_out_offset` stays zero: Conv layer 1 never writes its activations
- Only the preproc output (691200 bytes at `tmp_phys+0`) is visible; that is our CPU-side preproc, not HW work
- Vendor `test-fc-model` returns `forward ret=0x0 handle=0 instant completion` in 0.7 ms, far too fast for any real compute
- Vendor `test-ivp ivp_re_allday_f1y2f2m1_640x360_v1003.oms` fails at `SYS_Init: 0xa0028012` on this board
- This matches the earlier finding that `HI_MPI_IVE_CNN_LoadModel` returns `HI_ERR_IVE_NOT_SUPPORT` (see git `112c3fb` "test-cnn-model: CNN API returns NOT_SUPPORT on Hi3516EV200")
- Task nodes are byte-for-byte identical to what vendor writes (verified via `kprobe_ive.ko` hooking `drv_ive_write_regs`)
- All init registers match vendor (`[0x34]=0x00313307`, `[0x54]=0x00003f07`, `[0x60]=0xffffffff`, `[0x84]=1`, `[0x88]=1`, `[0x8c]=0`, `[0x90]=0x01ab5159`)
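
The register comparison in the last item can be repeated from a shell on the board. The following is a minimal sketch, assuming the listed offsets are relative to the IVE base `0x11320000` (consistent with the counter at `0x11320018` being referred to as `[0x18]`) and that `busybox devmem` is available. `READ_REG` and `check_init_regs` are names invented for this sketch, not part of any vendor tooling.

```bash
# Assumption: register offsets are relative to the IVE block base.
IVE_BASE=0x11320000

# Command used to read one 32-bit register; override it to test off-target.
: "${READ_REG:=busybox devmem}"

# Compare the init registers against the values observed with the vendor
# driver. Prints every mismatch and returns the number of mismatches.
check_init_regs() {
    mismatches=0
    for pair in 0x34=0x00313307 0x54=0x00003f07 0x60=0xffffffff \
                0x84=0x1 0x88=0x1 0x8c=0x0 0x90=0x01ab5159; do
        off=${pair%%=*}
        want=${pair#*=}
        addr=$(printf '0x%08x' $(( $IVE_BASE + $off )))
        # READ_REG is left unquoted on purpose so "busybox devmem" splits.
        got=$($READ_REG "$addr") || {
            printf 'read failed at %s\n' "$addr" >&2
            return 255
        }
        # Compare numerically so hex case and zero padding do not matter.
        if [ $(( $got )) -ne $(( $want )) ]; then
            printf 'MISMATCH [%s] got %s want %s\n' "$off" "$got" "$want"
            mismatches=$(( mismatches + 1 ))
        fi
    done
    return "$mismatches"
}
```

Running `check_init_regs && echo "init registers match"` after `insmod` gives a quick regression check whenever the init sequence changes.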
What ive_neo does achieve (current state)
- Full libive.so compatibility: `test-fc-model` returns `instant completion`, `test-ivp` runs 50 frames at 50.9 fps (vendor hangs this test on the same board)
- All kernel memory bugs fixed (tile-node buffer, preproc pp_size, fwd_slice Unpack loop cap)
- Register init sequence matches vendor byte-for-byte (added `drv_ive_set_mem_speed`, which was missing)
- IRQ handler registered with completion primitive
- Async submit semantics matching vendor's `drv_ive_write_regs`
- libive.so init path implemented: `svp_alg_proc_init` (cmd `0x8010463b`), `ivp_proc_init` (cmd `0x801046c8`), mmap fops callback using `osal_io_remap_pfn_range`
- Slot-based model ID allocation
- Preproc writes raw Y/U/V pixel values (matches vendor VGS behavior)
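
The two command numbers in the init-path item can be unpacked with the generic Linux ioctl encoding (2 direction bits, 14 size bits, 8 type bits, 8 number bits), which is the layout ARM uses. A small sketch, assuming the vendor defines these commands with the standard `_IOR`/`_IOW` macros; `decode_ioctl` is a helper name made up here.

```bash
# Decode a Linux ioctl command number using the asm-generic layout:
#   bits 31..30 direction, 29..16 argument size, 15..8 type, 7..0 number.
decode_ioctl() {
    cmd=$(( $1 ))
    dir=$(( ($cmd >> 30) & 0x3 ))
    size=$(( ($cmd >> 16) & 0x3fff ))
    type=$(( ($cmd >> 8) & 0xff ))
    nr=$(( $cmd & 0xff ))
    case "$dir" in
        0) dname=none ;;
        1) dname=write ;;       # _IOC_WRITE: user space passes data in
        2) dname=read ;;        # _IOC_READ: kernel passes data back
        3) dname=read+write ;;
    esac
    printf 'dir=%s size=%d type=0x%02x nr=%d\n' "$dname" "$size" "$type" "$nr"
}

decode_ioctl 0x8010463b
decode_ioctl 0x801046c8
```

Under that assumption both commands decode to a read-direction ioctl of type `0x46` (ASCII `F`) with a 16-byte argument, numbered 59 and 200 respectively.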
Action points (for future work)
- Investigate CRG / AI subsystem enable bits. Vendor's `ive_init` path only touches registers in the IVE block itself (0x11320000..0x11330000). If there's a separate AI/NNIE power domain or clock gate outside this window, it may need to be toggled. Suggested: read a full CRG dump during vendor runtime on a board where XNN is known to work (if any exists), compare against this board.
- Verify on a different SoC variant. Hi3516CV300, Hi3516CV500, Hi3519V101, Hi3516DV300 reportedly have functional CNN/XNN hardware. ive_neo should port with minimal changes since the kernel ABI is the OSAL layer and the task-node format is the same across variants. This would also establish a known-good baseline.
- Confirm non-XNN IVE ops work on this board. Submit a simple DMA, Sobel, or ThreshU8 task (bytes `[10]!=0x36`) through ive_neo and check if HW actually runs it. If DMA works but XNN doesn't, the issue is XNN-specific (e.g., XNN subsystem disabled). If neither works, the whole IVE block is broken.
- Compare Hi3516EV200 vs EV300 silicon. The "EV" suffix may denote a variant without the IVP/CNN block. Check the official datasheet feature matrix: it's possible that CNN/XNN exists only in the D/C/V variants and that EV stands for "entry video" with XNN fused off.
- Check `mpp_init` flags. libive.so may require a specific `MPP_SYS_CONFIG_S` parameter to enable an XNN subsystem that's otherwise off. Try running the vendor SDK's `sample_ive_xnn` binary (if available) to see what it calls before submission.
- RE vendor's `ive_umap_module_init` (`ive.o.c:8185`) fully. We verified the documented register writes but there may be undocumented side-effects in `CMPI_GetModuleFuncById(2)` callbacks or `SysDrvIoCtrl` calls with IDs other than 144/145.
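
For the CRG comparison in the first action point, once dumps exist from both boards, the differing registers can be listed mechanically. A sketch, assuming each dump is a text file with one `address value` pair per line (for example produced by looping `busybox devmem` over a range). The file names and `diff_reg_dumps` are hypothetical.

```bash
# Print every register whose value differs between two dumps.
#   $1: dump from a board where XNN is known to work (reference)
#   $2: dump from the board under test
# Values are compared as strings, so both dumps should come from the same tool.
diff_reg_dumps() {
    awk '
        NR == FNR { ref[$1] = $2; next }      # first file: remember values
        ($1 in ref) && ref[$1] != $2 {        # second file: report changes
            printf "%s good=%s this=%s\n", $1, ref[$1], $2
        }
    ' "$1" "$2"
}
```

A call such as `diff_reg_dumps crg_good.txt crg_ev300.txt` then narrows the search to the handful of registers that actually differ, which is where a missing clock gate or reset bit would show up.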
Test recipe to reproduce
```bash
# Build
cd ~/git/openhisilicon/kernel/ive
make -C /git/firmware/output/build/linux-custom M=$PWD ARCH=arm \
    CROSS_COMPILE=/git/firmware/output/host/bin/arm-openipc-linux-musleabi- \
    PREFIX=open_ \
    EXTRA_CFLAGS="-Dhi3516ev200 \
        -I/home/dima/git/openhisilicon/include \
        -I/home/dima/git/openhisilicon/kernel/osal/include \
        -I/home/dima/git/openhisilicon/kernel/osal \
        -I/home/dima/git/openhisilicon/kernel/ext_inc" modules

# Test with ive_neo (runs at 50 fps but 0 detections)
ssh root@ 'rmmod open_ive; insmod /root/open_ive.ko save_power=0
    cd /utils && ./test-ivp ivp_re_allday_f1y2f2m1_640x360_v1003.oms test-ivp-people.y4m'

# Test with vendor (reports success but [0x18] stays 0)
ssh root@ 'rmmod open_ive; insmod /lib/modules/4.9.37/extra/open_ive.ko save_power=0
    cd /utils && ./test-fc-model test_fc_model.oms
    busybox devmem 0x11320018' # → 0x00000000
```
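
The manual `devmem` check at the end of the recipe can be wrapped so that every test run reports whether the task counter moved. A sketch meant to run on the board itself, assuming `0x11320018` is the HW task counter as described in the evidence list. `READ_REG` and `counter_advanced` are illustrative names.

```bash
# Address of the HW task counter, taken from the evidence list.
IVE_TASK_CNT=0x11320018

# Command used to read one 32-bit register; override it to test off-target.
: "${READ_REG:=busybox devmem}"

# Run the given command and report whether the task counter changed.
# Returns 0 if the counter moved, 1 if it did not, 2 if a read failed.
counter_advanced() {
    before=$($READ_REG "$IVE_TASK_CNT") || return 2
    # The command's own output and exit status are ignored on purpose:
    # the drivers report success even when the hardware does nothing.
    "$@" >/dev/null 2>&1 || true
    after=$($READ_REG "$IVE_TASK_CNT") || return 2
    printf 'before=%s after=%s\n' "$before" "$after"
    [ $(( $after )) -ne $(( $before )) ]
}
```

For example, `counter_advanced ./test-fc-model test_fc_model.oms || echo "HW did not run"` makes the "success but no work" case visible without a separate register read.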