|
1 | | -# Embedded Linux on i.MX 8M Plus |
| 1 | +# Edge AI on Embedded Linux — i.MX 8M Plus |
| 2 | + |
| 3 | +> **Real-time object detection with 11 ms NPU inference per frame** on a 2.3 TOPS NPU, with a live MIPI camera feed rendered to HDMI. |
| 4 | +> Full BSP bring-up, kernel drivers, WiFi/BT, and NPU-accelerated inference — from first boot to working demo. |
| 5 | +
|
| 6 | +--- |
| 7 | + |
| 8 | +## Live Demo — NPU-Accelerated Object Detection |
| 9 | + |
| 10 | +<p align="center"> |
| 11 | + <a href="https://github.com/Corning-AI/embedded-linux/releases/latest"> |
| 12 | + <img src="media/demo-hero.jpg" alt="Real-time edge AI: person 72% + cell phone 80% detected on i.MX8MP NPU at 11ms" width="480"> |
| 13 | + </a> |
| 14 | +</p> |
| 15 | +<p align="center"> |
| 16 | + <strong>Person (72%) + Cell Phone (80%)</strong> detected in real-time on the <strong>NPU</strong> — <a href="https://github.com/Corning-AI/embedded-linux/releases/latest">watch the full 68s demo video</a> |
| 17 | +</p> |
| 18 | + |
| 19 | +<table> |
| 20 | +<tr> |
| 21 | +<td width="33%" align="center"> |
| 22 | +<img src="media/demo-detection-3.jpg" width="240"><br> |
| 23 | +<sub><strong>3 objects</strong> — person + phone + bottle</sub> |
| 24 | +</td> |
| 25 | +<td width="33%" align="center"> |
| 26 | +<img src="media/demo-detection-2.jpg" width="240"><br> |
| 27 | +<sub><strong>NPU 11.3ms</strong> — consistent ultra-low latency</sub> |
| 28 | +</td> |
| 29 | +<td width="33%" align="center"> |
| 30 | +<img src="media/demo-person-84.jpg" width="240"><br> |
| 31 | +<sub><strong>person 84%</strong> — high-confidence detection</sub> |
| 32 | +</td> |
| 33 | +</tr> |
| 34 | +</table> |
| 35 | + |
| 36 | +### Key Results |
| 37 | + |
| 38 | +| Metric | Value | |
| 39 | +|--------|-------| |
| 40 | +| **NPU inference latency** | **11 ms** (MobileNet SSD v2, INT8 quantized) | |
| 41 | +| **End-to-end pipeline** | **9 FPS** (camera → NPU → overlay → HDMI) | |
| 42 | +| **Detectable classes** | **80** (COCO: person, phone, bottle, laptop, book…) | |
| 43 | +| **NPU vs CPU speedup** | **4× faster** (11 ms vs 45 ms per frame) | |
| 44 | +| **Pose estimation** | **13 ms** (MoveNet Lightning, 17 keypoints) | |
| 45 | + |
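
The figures in the table can be related with a little arithmetic. The sketch below uses only the numbers reported above (11 ms NPU, 45 ms CPU, 9 FPS end to end). The variable names are illustrative, and the calculation assumes the pipeline stages run one after another; how the remaining frame time splits across capture, overlay, and display is not measured here.

```python
# Numbers taken from the Key Results table; variable names are illustrative.
NPU_MS = 11.0        # NPU inference latency per frame
CPU_MS = 45.0        # CPU inference latency per frame
PIPELINE_FPS = 9.0   # end-to-end frame rate (camera -> NPU -> overlay -> HDMI)

speedup = CPU_MS / NPU_MS                    # how many times faster the NPU is
frame_budget_ms = 1000.0 / PIPELINE_FPS      # wall-clock time per displayed frame
non_inference_ms = frame_budget_ms - NPU_MS  # time spent outside inference (sequential assumption)
inference_share = NPU_MS / frame_budget_ms   # fraction of each frame spent on the NPU

print(f"NPU speedup over CPU: {speedup:.1f}x")
print(f"Frame budget at 9 FPS: {frame_budget_ms:.1f} ms")
print(f"Non-inference time:    {non_inference_ms:.1f} ms")
print(f"Inference share:       {inference_share:.1%}")
```

Under that assumption, inference accounts for roughly a tenth of each frame, which suggests the end-to-end rate is limited by the stages around the NPU rather than by the model itself.
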
| 46 | +### End-to-End Pipeline |
2 | 47 |
|
3 | | -Heterogeneous SoC platform project built on the NXP i.MX 8M Plus EVK. Covers BSP bring-up, kernel driver development, camera/NPU pipelines, and real-time co-processing with FreeRTOS on the Cortex-M7. |
| 48 | +``` |
| 49 | +OV5640 ──MIPI CSI-2──▶ ISI DMA ──▶ GStreamer appsink ──▶ TFLite + NPU ──▶ PIL overlay ──▶ HDMI |
| 50 | + Camera 2-lane /dev/video3 Python/numpy VX Delegate bounding waylandsink |
| 51 | + 640×480 ~11ms/frame boxes+labels |
| 52 | +``` |
| 53 | + |
| 54 | +--- |
| 55 | + |
| 56 | +## What This Project Covers |
| 57 | + |
| 58 | +This is a **full-stack embedded Linux + edge AI** project on the NXP i.MX 8M Plus EVK, covering every layer from silicon to application: |
| 59 | + |
| 60 | +| Layer | What's Done | Difficulty | |
| 61 | +|-------|-------------|------------| |
| 62 | +| **BSP** | Yocto Scarthgap custom image build (`imx-image-multimedia`) | Medium | |
| 63 | +| **Kernel drivers** | 3 out-of-tree modules (hello → chardev → I2C BME280) | Medium | |
| 64 | +| **Device tree** | Annotated camera pipeline overlay (MIPI CSI-2 → ISI) | Hard | |
| 65 | +| **V4L2 userspace** | C program with multi-planar mmap capture | Medium | |
| 66 | +| **WiFi/BT bring-up** | DTB binary patch + PCIe/UART driver loading | Hard | |
| 67 | +| **Camera pipeline** | GStreamer → appsink → numpy (zero-copy DMA) | Medium | |
| 68 | +| **NPU inference** | TFLite INT8 + VX Delegate on 2.3 TOPS NPU | Hard | |
| 69 | +| **Edge AI app** | Real-time detection + pose + OSD → HDMI display | Hard | |
| 70 | +| **Debug methodology** | 6 documented debug cases with root cause analysis | — | |
| 71 | + |
| 72 | +--- |
4 | 73 |
|
5 | 74 | ## Hardware |
6 | 75 |
|
7 | | -- **SoC:** NXP i.MX 8M Plus — quad Cortex-A53 (1.8 GHz) + Cortex-M7 + 2.3 TOPS NPU |
8 | | -- **Board:** 8MPLUSLPD4-EVK (6 GB LPDDR4, 32 GB eMMC) |
9 | | -- **Camera:** OV5640 MIPI CSI-2 via MINISASTOCSI adapter (J12) |
10 | | -- **Display:** HDMI 2.0a (J17), Weston/Wayland compositor |
11 | | -- **Kernel:** 6.6.52-lts (Yocto Scarthgap) |
| 76 | +| Component | Spec | |
| 77 | +|-----------|------| |
| 78 | +| **SoC** | NXP i.MX 8M Plus — quad Cortex-A53 (1.8 GHz) + Cortex-M7 + **2.3 TOPS NPU** | |
| 79 | +| **Board** | 8MPLUSLPD4-EVK (6 GB LPDDR4, 32 GB eMMC) | |
| 80 | +| **Camera** | OV5640 MIPI CSI-2 via MINISASTOCSI adapter (J12) | |
| 81 | +| **Display** | HDMI 2.0a (J17), Weston/Wayland compositor | |
| 82 | +| **WiFi/BT** | AzureWave AW-CM276NF (NXP 88W8997, PCIe + UART) | |
| 83 | +| **Kernel** | 6.6.52-lts (Yocto Scarthgap) | |
12 | 84 |
|
13 | 85 | ## Repository Structure |
14 | 86 |
|
15 | | -```text |
| 87 | +``` |
| 88 | +app/camera-detect/ Real-time edge AI app (detection + pose + demo modes) |
16 | 89 | kernel-modules/ |
17 | | -├── hello/ Minimal loadable kernel module |
18 | | -├── chardev/ Character device driver (/dev node, ioctl, mutex) |
19 | | -└── bme280/ I2C client driver with sysfs interface |
20 | | -
|
21 | | -drivers/ |
22 | | -└── v4l2-capture/ V4L2 mmap frame capture (C, multi-planar API) |
23 | | -
|
24 | | -dts/ |
25 | | -└── imx8mp-evk-ov5640.dts Annotated device tree overlay (camera pipeline) |
26 | | -
|
27 | | -debug/ |
28 | | -├── 01-video-device-numbering /dev/video0 is NOT the camera on i.MX8MP |
29 | | -├── 02-camera-red-tint ISI RGB/BGR format mismatch → color shift |
30 | | -├── 03-galcore-not-in-lsmod Built-in driver vs loadable module confusion |
31 | | -├── 04-wifi-dns-resolution WiFi connected but no DNS → resolv.conf |
32 | | -├── 05-media-controller-pipeline MC API graph setup for MIPI camera |
33 | | -└── 06-weston-service-name Service renamed between Yocto releases |
34 | | -
|
35 | | -scripts/ |
36 | | -├── build-multimedia.sh Yocto build helper for imx-image-multimedia |
37 | | -└── serial_transfer.py File transfer over debug UART (base64) |
38 | | -
|
39 | | -docs/ |
40 | | -├── 01-dev-environment.md Build host setup |
41 | | -├── 02-hardware-guide.md EVK wiring, boot config, peripheral map |
42 | | -├── 03-yocto-bsp.md Yocto BSP build (repo init → bitbake) |
43 | | -├── 07-camera-npu.md Camera + NPU object detection pipeline |
44 | | -└── 08-device-tree-explained.md Device tree walkthrough |
| 90 | +├── hello/ Minimal loadable kernel module |
| 91 | +├── chardev/ Character device driver (/dev node, ioctl, mutex) |
| 92 | +└── bme280/ I2C client driver with sysfs interface |
| 93 | +drivers/v4l2-capture/ V4L2 mmap frame capture (C, multi-planar API) |
| 94 | +dts/ Annotated device tree overlay (OV5640 → ISI pipeline) |
| 95 | +debug/ 6 documented hardware/driver debug cases |
| 96 | +scripts/ Yocto build helper, serial file transfer tool |
| 97 | +docs/ Step-by-step guides: BSP, hardware, WiFi/BT, camera, NPU |
45 | 98 | ``` |
46 | 99 |
|
47 | | -## What's Working |
48 | | - |
49 | | -| Subsystem | Status | Details | |
50 | | -| --------- | ------ | ------- | |
51 | | -| Yocto BSP | Boots from SD | `imx-image-multimedia`, Scarthgap branch | |
52 | | -| OV5640 camera | Live HDMI preview | MIPI CSI-2 → ISI → GStreamer → Weston | |
53 | | -| NPU (VIP8000) | Driver loaded | galcore 6.4.11, VX Delegate + TFLite 2.16.2 | |
54 | | -| GPU (GC7000UL) | Weston @ 60 FPS | `weston-simple-egl` verified | |
55 | | -| Debug UART | Working | J23, 3rd COM port = A53 console (115200 8N1) | |
56 | | - |
57 | 100 | ## Quick Start |
58 | 101 |
|
59 | 102 | ```bash |
60 | | -# 1. Build the Yocto image (Ubuntu 22.04 host, ~2 hours first build) |
| 103 | +# 1. Build the Yocto image (Ubuntu 22.04 host) |
61 | 104 | source scripts/build-multimedia.sh |
62 | 105 |
|
63 | | -# 2. Flash to SD card (use Rufus DD image mode on Windows) |
64 | | -# Set SW4: OFF OFF ON ON (SD card boot) |
65 | | - |
66 | | -# 3. Connect: USB-C to J5 (power), micro-USB to J23 (debug UART) |
67 | | -# Serial: 3rd COM port, 115200 8N1 |
68 | | - |
69 | | -# 4. Camera preview (on the EVK, over serial console) |
70 | | -export XDG_RUNTIME_DIR=/run/user/0 |
71 | | -gst-launch-1.0 v4l2src device=/dev/video3 ! \ |
72 | | - video/x-raw,width=640,height=480,framerate=30/1 ! \ |
73 | | - videoconvert ! autovideosink |
74 | | - |
75 | | -# 5. Build and load the hello module (cross-compile or on-device) |
76 | | -cd kernel-modules/hello |
77 | | -make ARCH=arm64 CROSS_COMPILE=aarch64-poky-linux- \ |
78 | | - KERNELDIR=/path/to/yocto/build/tmp/work/.../linux-imx/build |
79 | | -scp hello.ko root@<EVK_IP>:/tmp/ |
80 | | -# On EVK: |
81 | | -insmod /tmp/hello.ko && dmesg | tail -3 |
82 | | -``` |
| 106 | +# 2. Flash SD card (Rufus DD mode), set SW4: OFF OFF ON ON |
83 | 107 |
|
84 | | -## Kernel Modules |
85 | | - |
86 | | -Three out-of-tree modules demonstrating progressive driver complexity: |
87 | | - |
88 | | -**hello** — Module lifecycle (`init`/`exit`), `printk`, section markers (`__init`/`__exit`). |
89 | | - |
90 | | -**chardev** — Full character device with dynamic major allocation, `file_operations` (read/write/ioctl), `copy_to_user`/`copy_from_user`, mutex synchronization, and automatic `/dev` node creation via `class_create`/`device_create`. |
91 | | - |
92 | | -**bme280** — I2C client driver using the `probe`/`remove` lifecycle, device tree `compatible` matching, `i2c_smbus_*` register access, sysfs attributes, and `devm_` managed allocation. |
| 108 | +# 3. Connect: USB-C → J5 (power), micro-USB → J23 (debug UART, 3rd COM port, 115200) |
93 | 109 |
|
94 | | -## V4L2 Capture |
| 110 | +# 4. Run the edge AI demo on the EVK |
| 111 | +export XDG_RUNTIME_DIR=/run/user/0 WAYLAND_DISPLAY=wayland-1 |
| 112 | +python3 /opt/camera-detect/detect_camera.py --mode demo |
| 113 | +``` |
95 | 114 |
|
96 | | -Userspace C program demonstrating the V4L2 multi-planar API as used by i.MX8MP's ISI: |
| 115 | +## Edge AI Detection App |
97 | 116 |
|
98 | | -- `VIDIOC_QUERYCAP` → capability query |
99 | | -- `VIDIOC_S_FMT` → format negotiation (multi-planar) |
100 | | -- `VIDIOC_REQBUFS` + `mmap()` → zero-copy DMA buffer setup |
101 | | -- `VIDIOC_QBUF`/`VIDIOC_DQBUF` → streaming capture loop |
| 117 | +```bash |
| 118 | +python3 detect_camera.py # Object detection (MobileNet SSD v2) |
| 119 | +python3 detect_camera.py --mode pose # Pose estimation (MoveNet, 17 joints) |
| 120 | +python3 detect_camera.py --mode demo # Both models simultaneously |
| 121 | +python3 detect_camera.py --mode demo --compare # NPU vs CPU side-by-side benchmark |
| 122 | +python3 detect_camera.py --no-display # Headless mode (SSH/serial output only) |
| 123 | +``` |
102 | 124 |
|
103 | | -## Device Tree |
| 125 | +Features: real-time OSD (FPS, latency, object count), NMS post-processing, multi-model support, Wayland/HDMI output, headless mode. |
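
The NMS post-processing step mentioned above can be illustrated with a generic greedy non-maximum suppression routine. This is a minimal sketch in plain Python, not the code in `detect_camera.py`: the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box above the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-duplicate boxes and one separate box.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.80, 0.72, 0.84]
kept = nms(boxes, scores)
print(kept)  # [2, 0]: the lower-scoring duplicate (index 1) is suppressed
```

A real application would typically run this per class, so that overlapping boxes of different classes (for example a person holding a phone) do not suppress each other.
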
104 | 126 |
|
105 | | -The [`dts/imx8mp-evk-ov5640.dts`](dts/imx8mp-evk-ov5640.dts) overlay is annotated to explain how the camera pipeline is described in hardware: |
| 127 | +## Kernel Modules |
106 | 128 |
|
107 | | -```text |
108 | | -OV5640 (I2C, 0x3c) → MIPI CSI-2 RX → ISI ch0 → /dev/video3 |
109 | | -``` |
| 129 | +Three out-of-tree modules with progressive complexity: |
110 | 130 |
|
111 | | -Covers clock providers, regulator bindings, OF graph endpoint linking, and MIPI lane configuration. |
| 131 | +| Module | Concepts | |
| 132 | +|--------|----------| |
| 133 | +| **hello** | `module_init`/`module_exit`, `printk`, `__init`/`__exit` section markers | |
| 134 | +| **chardev** | `file_operations`, `copy_to_user`/`copy_from_user`, mutex, dynamic major, `class_create`/`device_create` | |
| 135 | +| **bme280** | I2C client driver, device tree `compatible` matching, `i2c_smbus_*`, sysfs attributes, `devm_` managed alloc | |
112 | 136 |
|
113 | | -## i.MX8MP Video Device Map |
| 137 | +## Debug Notes |
114 | 138 |
|
115 | | -On this SoC, `/dev/video0` is **not** the camera: |
| 139 | +Real debugging cases encountered during bring-up — each with root cause and fix: |
116 | 140 |
|
117 | | -| Device | Function | |
118 | | -| ------ | -------- | |
119 | | -| /dev/video0 | VPU H.264/HEVC encoder | |
120 | | -| /dev/video1 | VPU decoder | |
121 | | -| /dev/video2 | ISI memory-to-memory (CSC) | |
122 | | -| /dev/video3 | **ISI capture (camera)** | |
| 141 | +| # | Issue | Root Cause | |
| 142 | +|---|-------|-----------| |
| 143 | +| 01 | `/dev/video0` is not the camera | VPU encoder registered first; ISI capture = `/dev/video3` | |
| 144 | +| 02 | Camera feed has red tint | ISI outputs BGR, app assumed RGB | |
| 145 | +| 03 | `galcore` not in `lsmod` | Built-in kernel driver, not a loadable module | |
| 146 | +| 04 | WiFi connected but no DNS | `resolv.conf` not populated by DHCP client | |
| 147 | +| 05 | Camera pipeline won't start | Media controller link setup required before streaming | |
| 148 | +| 06 | `weston@root` service not found | Renamed to `weston.service` in newer Yocto | |
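
Case 02 comes down to channel order: if a frame arrives as BGR but is treated as RGB, red and blue are swapped. The sketch below shows one possible correction on a packed 24-bit buffer using only the standard library. The helper name `bgr_to_rgb` is hypothetical, and the fix actually used in this project may differ (for instance, negotiating a different pixel format upstream).

```python
def bgr_to_rgb(frame: bytes) -> bytes:
    """Swap the first and third byte of every packed 3-byte pixel."""
    if len(frame) % 3 != 0:
        raise ValueError("expected packed 24-bit pixels (3 bytes each)")
    out = bytearray(frame)
    out[0::3] = frame[2::3]  # new first channel  <- old third channel
    out[2::3] = frame[0::3]  # new third channel  <- old first channel
    return bytes(out)


# A pure-red pixel stored in BGR order is (B=0, G=0, R=255).
red_bgr = bytes([0, 0, 255])
red_rgb = bgr_to_rgb(red_bgr)
print(list(red_rgb))  # [255, 0, 0]
```

With a numpy array of shape `(height, width, 3)`, the same swap is the slice `frame[..., ::-1]`.
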
123 | 149 |
|
124 | 150 | ## Roadmap |
125 | 151 |
|
126 | | -- [x] Yocto BSP bring-up and first boot verification |
127 | | -- [x] Camera pipeline: OV5640 → ISI → GStreamer → HDMI |
128 | | -- [x] NPU stack verification (galcore, VX Delegate, TFLite) |
129 | | -- [x] Kernel module examples (hello, chardev, I2C driver) |
130 | | -- [x] V4L2 capture program and device tree overlay |
131 | | -- [ ] NPU benchmark: CPU vs NPU inference latency |
132 | | -- [ ] Real-time object detection (MobileNet SSD + camera + NPU) |
133 | | -- [ ] FreeRTOS on Cortex-M7 with RPMsg inter-core communication |
134 | | -- [ ] End-to-end demo: camera → NPU → overlay → HDMI |
| 152 | +- [x] Yocto BSP build and first boot |
| 153 | +- [x] Camera: OV5640 → MIPI CSI-2 → ISI → GStreamer → HDMI |
| 154 | +- [x] NPU stack: galcore + VX Delegate + TFLite INT8 |
| 155 | +- [x] Kernel modules: hello → chardev → I2C driver |
| 156 | +- [x] V4L2 capture (C) and device tree overlay |
| 157 | +- [x] WiFi (PCIe) + Bluetooth (UART) bring-up |
| 158 | +- [x] NPU benchmark: 11 ms NPU vs 45 ms CPU |
| 159 | +- [x] **Real-time edge AI: camera → NPU → overlay → HDMI** |
| 160 | +- [ ] FreeRTOS on Cortex-M7 + RPMsg inter-core communication |
135 | 161 |
|
136 | 162 | ## License |
137 | 163 |
|
|