NNIE backend for cv500 (CNN inference)

cv500 has CNN inference on a **separate NNIE block** at `0x11100000` (IRQ SPI 45) — distinct from the classic IVE block at `0x11230000` that `ive_neo` drives. Today our cv500 dispatcher returns `-EOPNOTSUPP` for all XNN/CNN ops (gated by `chip.has_xnn = false` since the IVE block has no NEO unit). To run real CNN models on cv500 we need a new driver for the NNIE block.
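For readers unfamiliar with the current behaviour, the gate described above can be modelled in a few lines of userland C. This is only a sketch: `struct chip_caps`, `enum op_class`, and `dispatch_check` are hypothetical names invented for illustration, and only the `has_xnn` flag and the `-EOPNOTSUPP` return come from the text above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-chip capability record. Only the has_xnn flag is taken
 * from the issue text; the struct itself is illustrative. */
struct chip_caps {
    const char *name;
    bool has_xnn; /* false on cv500 today: its IVE block has no NEO unit */
};

/* Hypothetical op classes, for illustration only. */
enum op_class { OP_CLASSIC_IVE, OP_XNN_CNN };

/* Returns 0 if the op may proceed, or -EOPNOTSUPP when the chip has no
 * unit that can execute XNN/CNN ops. */
int dispatch_check(const struct chip_caps *chip, enum op_class op)
{
    if (op == OP_XNN_CNN && !chip->has_xnn)
        return -EOPNOTSUPP;
    return 0;
}
```

With an NNIE backend in place, the CNN path on cv500 would presumably be routed to the new driver rather than rejected at this point.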

This was explicitly deferred from the original cv500 plan ([see plan section "Different IP layout"](https://github.com/OpenIPC/openhisilicon/pull/105)).

## Architectural choices
A. **New `kernel/nnie_neo/` module** — separate driver bound to `compatible="hisilicon,hisi-nnie"`, independent register window, independent IRQ. Cleanest separation but duplicates a lot of submit-chain infrastructure already in `ive_neo`.
B. **Extend `ive_neo` with a second platform_driver entry** for the NNIE compatible. Reuses `ive_submit_chain` and the chip-ops abstraction; cv500 grows a second register region.

(A) matches how the vendor ships it (`hi3516cv500_nnie.ko` separate from `hi3516cv500_ive.ko`). Probably the right call long-term.

## What's involved
1. **Wire format**: cv500 `libive.so` exposes `HI_MPI_IVE_CNN_LoadModel` / `_Predict` / `_GetResult` / `_UnloadModel`. Find ioctl numbers + arg buffer layout (similar effort to PerspTrans/Hog RE in #107).
2. **HW task-node layout for NNIE** — different from IVE's 208-byte node; need RE pass.
3. **Memory arbitration `[0x90]`** — the magic-bit sequence we skipped for cv500 IVE is **required** for NNIE (it's the Conv-unit DRAM-priority knob). Sequence needs to be dumped from vendor `hi3516cv500_nnie.ko` blob.
4. **NNIE queue/scheduler ABI**: NNIE has its own queue and scheduler interface, but the vendor blob probably follows the same pattern as IVE (`ive_create_task` / `ive_schedule_task` / handle tracking). This needs confirming during RE.
5. **Userland**: `libive_neo` already has stub `HI_MPI_IVE_CNN_*` functions that return `HI_ERR_IVE_NOT_SUPPORT`. These need real marshalling and an ioctl path.
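Once the `[0x90]` sequence from item 3 has been dumped, one way to apply it is as a table of read-modify-write steps. The sketch below is a userland model under stated assumptions: `struct reg_step` and `apply_reg_steps` are hypothetical names, the register window is a plain array standing in for an `ioremap()`ed region, and no real bit values are implied because the actual sequence is still unknown.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Offset named in the issue; the meaning of its bits is not yet known. */
#define NNIE_ARB_REG 0x90u

/* One read-modify-write step: clear the bits in `mask`, then set the
 * masked bits of `value`. Hypothetical layout for illustration. */
struct reg_step {
    uint32_t offset;
    uint32_t mask;
    uint32_t value;
};

/* Apply `n` steps to a register window of `window_bytes` bytes.
 * `regs` stands in for a mapped MMIO window; a real kernel driver would
 * use readl()/writel() rather than plain loads and stores.
 * Returns 0 on success, -EINVAL on a misaligned or out-of-range offset. */
int apply_reg_steps(volatile uint32_t *regs, size_t window_bytes,
                    const struct reg_step *steps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t off = steps[i].offset;

        if (off % 4 != 0 || window_bytes < 4 || off > window_bytes - 4)
            return -EINVAL;

        uint32_t v = regs[off / 4];
        v = (v & ~steps[i].mask) | (steps[i].value & steps[i].mask);
        regs[off / 4] = v;
    }
    return 0;
}
```

Keeping the sequence as data rather than code should make it easy to paste in whatever the vendor blob turns out to do, and to diff it against other chips later.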

## Scope estimate
Multi-day RE per major op (`CNN_LoadModel` + `CNN_Predict` are the load-bearing two). Similar shape to the PerspTrans/Hog work but with twice the surface (model loader is large; HW dispatch has more state to manage).

## Definition of done
- [ ] New kernel module `open_nnie_neo.ko` bound to `hisilicon,hisi-nnie` on cv500.
- [ ] CNN model loader + forward pass produce non-trivial output on a small test model.
- [ ] Round-trip test in `libraries/ive_neo/test/` that loads a tiny convolutional model and checks the output classification is plausible.
- [ ] Memory arbitration `[0x90]` sequence captured from the cv500 vendor blob and applied.
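The "non-trivial output" and "plausible classification" criteria above are not defined precisely, so the round-trip test needs some concrete check. A minimal sketch, assuming "plausible" means every score is finite and the scores are not all (nearly) equal, could look like this. The function name `plausible_top1` and the `min_spread` threshold are hypothetical.

```c
#include <float.h>
#include <stddef.h>

/* Returns the index of the highest score if the output looks plausible,
 * or -1 if it does not. "Plausible" here is an assumed definition:
 *   - at least one score,
 *   - every score is finite (no NaN, no infinity),
 *   - max - min exceeds min_spread, so the output is not flat. */
int plausible_top1(const float *scores, size_t n, float min_spread)
{
    if (scores == NULL || n == 0)
        return -1;

    size_t best = 0;
    float lo = scores[0];
    float hi = scores[0];

    for (size_t i = 0; i < n; i++) {
        float s = scores[i];

        /* s != s is true only for NaN; the range test rejects infinities. */
        if (s != s || s > FLT_MAX || s < -FLT_MAX)
            return -1;
        if (s > hi) {
            hi = s;
            best = i;
        }
        if (s < lo)
            lo = s;
    }

    if (hi - lo <= min_spread)
        return -1;
    return (int)best;
}
```

A test in `libraries/ive_neo/test/` could then compare the returned index against the expected class for the tiny model, which also catches an all-zero output buffer from a forward pass that never ran.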

## Related
- kaeru `ive-neo-cv500-hw-init-reg-window-mismatch` documents why `[0x90]` was skipped for IVE.
- kaeru `hiive-cnn-chip-support-matrix` for which HiSilicon SoCs actually support CNN HW.
