cv500 has CNN inference on a separate NNIE block at 0x11100000 (IRQ SPI 45), distinct from the classic IVE block at 0x11230000 that ive_neo drives. Today our cv500 dispatcher returns -EOPNOTSUPP for all XNN/CNN ops (gated by chip.has_xnn = false, since the IVE block has no NEO unit). To run real CNN models on cv500 we need a new driver for the NNIE block. This was explicitly deferred from the original cv500 plan (see plan section "Different IP layout").
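To make the current behaviour concrete, here is a minimal userspace sketch of a capability gate of the kind described above. Only the has_xnn flag and the -EOPNOTSUPP return come from this issue; the struct name, the op enumeration and the function name are hypothetical and do not claim to match the real ive_neo dispatcher.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op classes; the real driver's op enumeration is not shown here. */
enum op_class { OP_CLASSIC_IVE, OP_XNN_CNN };

/* Hypothetical per-chip capability record. Only has_xnn is named in the issue. */
struct chip_caps {
    const char *name;
    bool has_xnn;
};

/* Return 0 if the op may be dispatched, a negative errno otherwise. */
static int dispatch_check(const struct chip_caps *chip, enum op_class op)
{
    if (chip == NULL)
        return -EINVAL;
    /* XNN/CNN ops are rejected up front when the chip lacks the unit. */
    if (op == OP_XNN_CNN && !chip->has_xnn)
        return -EOPNOTSUPP;
    return 0;
}
```

With a cv500 record whose has_xnn is false, every XNN/CNN op fails this check while classic IVE ops pass, which is why CNN support needs a separate path to the NNIE block rather than a change to this gate.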
Architectural choices
A. New kernel/nnie_neo/ module: a separate driver bound to compatible="hisilicon,hisi-nnie", with its own register window and its own IRQ. Cleanest separation, but duplicates a lot of the submit-chain infrastructure already in ive_neo.
B. Extend ive_neo with a second platform_driver entry for the NNIE compatible. Reuses ive_submit_chain and the chip-ops abstraction; cv500 grows a second register region.
(A) matches how the vendor ships it (hi3516cv500_nnie.ko is separate from hi3516cv500_ive.ko) and is probably the right call long-term.
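Either option ends up matching the NNIE compatible string to its own register window and IRQ. The following is a userspace model of that lookup, not kernel code: it only illustrates the shape of a match table with a second entry, as in option B. The NNIE values come from this issue; the IVE compatible string is a placeholder and its IRQ is marked unknown because the issue does not state them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace model of a driver match table entry (hypothetical type). */
struct block_desc {
    const char *compatible; /* device-tree compatible string */
    uint32_t    reg_base;   /* physical base of the register window */
    int         spi_irq;    /* SPI interrupt number, -1 if not stated */
};

static const struct block_desc blocks[] = {
    /* Placeholder compatible and unknown IRQ: neither is given in the issue. */
    { "example,ive-placeholder", 0x11230000u, -1 },
    /* Values taken from the issue text. */
    { "hisilicon,hisi-nnie",     0x11100000u, 45 },
};

/* Return the table entry for a compatible string, or NULL if none matches. */
static const struct block_desc *match_block(const char *compatible)
{
    for (size_t i = 0; i < sizeof(blocks) / sizeof(blocks[0]); i++) {
        if (strcmp(blocks[i].compatible, compatible) == 0)
            return &blocks[i];
    }
    return NULL;
}
```

Under option A the table would live in the new module and hold only the NNIE entry; under option B both entries share one driver and the chip-ops layer selects the register window.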
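What's involved
libive.so exposes HI_MPI_IVE_CNN_LoadModel / _Predict / _GetResult / _UnloadModel. Find ioctl numbers + arg buffer layout (similar effort to the PerspTrans/Hog RE in #107, "kernel/ive_neo + libraries/ive_neo: cv500 PerspTrans + Hog HW dispatch (default-on)").
HW task-node layout for NNIE: different from IVE's 208-byte node; needs an RE pass.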
Memory arbitration [0x90] — the magic-bit sequence we skipped for cv500 IVE is required for NNIE (it's the Conv-unit DRAM-priority knob). Sequence needs to be dumped from vendor hi3516cv500_nnie.ko blob.
NNIE has its own queue/scheduler ABI — vendor blob probably uses the same pattern as IVE (ive_create_task / ive_schedule_task / handle tracking).
Userland: libive_neo already has stub HI_MPI_IVE_CNN_* returning HI_ERR_IVE_NOT_SUPPORT. Need real marshalling + ioctl path.
Scope estimate
Multi-day RE per major op (CNN_LoadModel + CNN_Predict are the load-bearing two). Similar shape to the PerspTrans/Hog work but with twice the surface (model loader is large; HW dispatch has more state to manage).
Definition of done
New kernel module open_nnie_neo.ko bound to hisilicon,hisi-nnie on cv500.
CNN model loader + forward pass produce non-trivial output on a small test model.
Round-trip test in libraries/ive_neo/test/ that loads a tiny convolutional model and checks the output classification is plausible.
Memory arbitration [0x90] sequence captured from the cv500 vendor blob and applied.
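The "non-trivial output" and "plausible classification" criteria above could be checked along these lines. This is a sketch under stated assumptions: it treats the model output as a flat array of float class scores and defines "non-trivial" as "all values finite and not all equal". The function name and both definitions are hypothetical; the real output format of the NNIE forward pass is not known yet.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the index of the largest score, or -1 if the output looks
 * implausible: missing, fewer than two classes, any non-finite value,
 * or every score identical (for example an untouched zeroed buffer).
 */
static int plausible_top1(const float *scores, size_t n)
{
    if (scores == NULL || n < 2)
        return -1;

    size_t best = 0;
    bool varied = false;

    for (size_t i = 0; i < n; i++) {
        if (!isfinite(scores[i]))
            return -1;
        if (scores[i] != scores[0])
            varied = true;
        if (scores[i] > scores[best])
            best = i;
    }
    return varied ? (int)best : -1;
}
```

A round-trip test could then assert that plausible_top1() returns the expected class index for a known input, which also catches the case where the hardware never wrote the output buffer.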
Related
kaeru ive-neo-cv500-hw-init-reg-window-mismatch documents why [0x90] was skipped for IVE.
kaeru hiive-cnn-chip-support-matrix for which HiSilicon SoCs actually support CNN HW.