[Results] TensorFlow Lite v2.17.0 inference engine on riscv64 — end-to-end CNN inference validated under qemu-riscv64, packaged as .deb

## Summary

TensorFlow Lite v2.17.0 was cross-compiled to riscv64 as a static library
(`libtensorflow-lite.a`, 21 MB, 243 verified object files), and
end-to-end inference of a real-world INT8-quantized CNN was validated under
`qemu-riscv64`. The library is packaged as a Debian package
(`libtensorflow-lite-dev_2.17.0-1_riscv64.deb`, 4,218,580 bytes compressed,
22 MB uncompressed) for installation on riscv64 systems.

The program description specifies *"Community AI / ML and HPC (double
precision) applications."* This issue addresses the AI / ML half of
that scope — to my knowledge the only ML inference deliverable in
the current applicant pool.

## Build

CMake cross-compile from x86_64 host to riscv64 target:

```
cmake ~/tensorflow-v2.17.0/tensorflow/lite \
  -DCMAKE_TOOLCHAIN_FILE=riscv64-toolchain.cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DTFLITE_ENABLE_XNNPACK=OFF \
  -DTFLITE_ENABLE_RUY=OFF \
  -DBUILD_SHARED_LIBS=OFF
```
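The toolchain file passed via `-DCMAKE_TOOLCHAIN_FILE` lives in the repository and is not reproduced in this issue. As a rough illustration of what such a file usually needs, the sketch below writes a minimal one. The output name `riscv64-toolchain.sketch.cmake`, the `TOOLCHAIN_OUT` override, and the `riscv64-linux-gnu-` compiler names (the usual Debian/Ubuntu cross packages) are assumptions; the actual file in the repository may differ.

```shell
#!/bin/sh
# Write a minimal CMake toolchain file for a riscv64 Linux target.
# Assumption: riscv64-linux-gnu-gcc and riscv64-linux-gnu-g++ are on PATH.
# The output name is hypothetical and chosen so it cannot overwrite the real file.
out=${TOOLCHAIN_OUT:-riscv64-toolchain.sketch.cmake}

cat > "$out" <<'EOF'
# Setting CMAKE_SYSTEM_NAME puts CMake into cross-compiling mode.
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR riscv64)

# Cross compilers (assumed Debian/Ubuntu names).
set(CMAKE_C_COMPILER riscv64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER riscv64-linux-gnu-g++)

# Find build-time programs on the host; restrict library and header
# lookups to any configured target root paths.
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
EOF

echo "wrote $out"
```

Pointing `-DCMAKE_TOOLCHAIN_FILE` at a file like this is what makes the configure step target riscv64 instead of the x86_64 host.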

Flag rationale:

| Flag                          | Reason                                                                |
| ----------------------------- | --------------------------------------------------------------------- |
| `-DTFLITE_ENABLE_XNNPACK=OFF` | XNNPACK uses NEON / SSE2 intrinsics; no fast path for base RV64GC.    |
| `-DTFLITE_ENABLE_RUY=OFF`     | Ruy has no riscv64 fast path; same rationale.                         |
| `-DBUILD_SHARED_LIBS=OFF`     | Static library — no dynamic linker path issues under qemu-riscv64.    |

With both XNNPACK and Ruy disabled, inference uses TFLite's
reference C++ kernels, which are correct but unoptimized. **Adding
RISC-V RVV fast paths to either backend is a real upstream gap** and a
viable target for follow-up mentorship work: the HAL primitives in #26
(`hal_matvec_row`, `hal_fmadd_f64x4`) are candidate building blocks
for accelerating TFLite's fully-connected and convolution kernels.

## Library verification

| Metric                  | Value                          |
| ----------------------- | ------------------------------ |
| Output                  | `libtensorflow-lite.a`         |
| Size                    | 21 MB (21,883,450 bytes)       |
| Object files in archive | 243                            |
| Binary format           | `elf64-littleriscv`            |
| Architecture            | `riscv:rv64`                   |

Verified with `riscv64-linux-gnu-objdump` and `riscv64-linux-gnu-nm` —
real inference engine symbols compiled for riscv64, not stubs.
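
The exact commands behind these checks are not listed in this issue. A minimal sketch of how they could be reproduced is shown below, assuming the GNU cross binutils are installed. The helper names `archive_member_count`, `archive_is_riscv64`, and `archive_has_symbol` are hypothetical, and the tool variables can be overridden if the cross tools are installed under other names.

```shell
#!/bin/sh
# Tool names default to the GNU cross binutils (assumed to be on PATH).
: "${AR:=riscv64-linux-gnu-ar}"
: "${OBJDUMP:=riscv64-linux-gnu-objdump}"
: "${NM:=riscv64-linux-gnu-nm}"

# Print the number of members (object files) in a static archive.
archive_member_count() {
    "$AR" t "$1" | wc -l | tr -d ' '
}

# Succeed only if objdump reports at least one member and every
# member's "file format" line says elf64-littleriscv.
archive_is_riscv64() {
    formats=$("$OBJDUMP" -f "$1" | grep 'file format') || return 1
    ! printf '%s\n' "$formats" | grep -qv 'elf64-littleriscv'
}

# Succeed if the archive defines a symbol whose demangled name
# contains the given text, e.g. 'tflite::Interpreter'.
archive_has_symbol() {
    "$NM" -C --defined-only "$1" 2>/dev/null | grep -q "$2"
}
```

Run against `tflite/results/libtensorflow-lite.a`, `archive_member_count` should print 243 if it reproduces the table above, and `archive_is_riscv64` should succeed.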

## End-to-end inference on riscv64

| Metric           | Value                                                |
| ---------------- | ---------------------------------------------------- |
| Model            | INT8-quantized CNN, 59.9 MB                          |
| Task             | Real-world agricultural image classification         |
| Init time        | 227 ms                                               |
| First inference  | ~798 s                                               |
| Average inference| ~781 s                                               |
| Memory footprint | 100.7 MB                                             |

**The ~800 s inference time reflects qemu-riscv64 user-mode emulation
overhead; it is not a RISC-V performance number.** Every riscv64 instruction
is translated by QEMU's TCG JIT and executed as x86 host code. On real
RV64GC silicon, even before adding RVV acceleration, this CNN should run
far faster, but no hardware measurement has been taken yet.

What this benchmark validates is **correctness**: the inference engine,
linked from all 243 compiled object files, produces correct classification
outputs under emulation, with deterministic per-frame results that match
an x86_64 baseline run of the same model. Performance validation is
hardware-bound future work (HiFive Unmatched, VisionFive 2, or similar).
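
This issue does not describe how the per-frame comparison against the x86_64 baseline was carried out. One simple way to express such a check is sketched below, assuming each run wrote one prediction per line, in the same frame order, to a plain text file without tab characters. The helper name `count_mismatches` and that file layout are assumptions, not the author's tooling.

```shell
#!/bin/sh
# Print how many frames differ between a baseline run and a candidate run.
# Assumed layout: one prediction per line, same frame order in both files.
count_mismatches() {
    baseline=$1
    candidate=$2

    # A different number of frames can never count as a match.
    if [ "$(wc -l < "$baseline")" -ne "$(wc -l < "$candidate")" ]; then
        echo "frame count differs" >&2
        return 2
    fi

    # paste joins line N of both files with a tab; awk counts the rows whose
    # two halves differ. Appending "" forces a string comparison.
    paste "$baseline" "$candidate" |
        awk -F '\t' '($1 "") != ($2 "") { n++ } END { print n + 0 }'
}
```

A result of `0` means the emulated riscv64 run agrees with the baseline on every frame; any other value gives the number of frames to inspect.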

## Debian package

Packaged for installation on riscv64 systems:

```
$ dpkg-deb --info dist/libtensorflow-lite-dev_2.17.0-1_riscv64.deb
 new Debian package, version 2.0.
 size 4218580 bytes: control archive=540 bytes.
 Package: libtensorflow-lite-dev
 Version: 2.17.0-1
 Architecture: riscv64
 ...
```

Contents: `libtensorflow-lite.a` (21 MB) under `/usr/lib/riscv64-linux-gnu/`,
plus 1161 `.h` files under `/usr/include/tensorflow/lite/` preserving
the upstream directory structure. Installable via `dpkg -i` on a
riscv64 system.
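
The packaging script itself is not shown in this issue. The sketch below illustrates one way a package with this layout could be staged and built with `dpkg-deb`. The helper names `stage_deb_tree` and `build_deb`, the `Section` and `Priority` values, and the `Maintainer` and `Description` placeholders are assumptions; only the `Package`, `Version`, and `Architecture` fields and the install paths come from the output and contents listed above.

```shell
#!/bin/sh
# Stage the package tree: static library, headers, and a control file.
stage_deb_tree() {
    stage=$1      # staging directory to create
    lib=$2        # path to libtensorflow-lite.a
    hdr_root=$3   # directory corresponding to tensorflow/lite in the source tree

    mkdir -p "$stage/DEBIAN" \
             "$stage/usr/lib/riscv64-linux-gnu" \
             "$stage/usr/include/tensorflow/lite" || return 1
    cp "$lib" "$stage/usr/lib/riscv64-linux-gnu/" || return 1

    # Copy every .h file, recreating its directory relative to hdr_root.
    dest=$(cd "$stage/usr/include/tensorflow/lite" && pwd) || return 1
    (
        cd "$hdr_root" || exit 1
        find . -name '*.h' | while IFS= read -r h; do
            mkdir -p "$dest/$(dirname "$h")" && cp "$h" "$dest/$h"
        done
    ) || return 1

    # Maintainer and Description are placeholders, not the real metadata.
    cat > "$stage/DEBIAN/control" <<'EOF'
Package: libtensorflow-lite-dev
Version: 2.17.0-1
Architecture: riscv64
Section: libdevel
Priority: optional
Maintainer: Placeholder Name <placeholder@example.invalid>
Description: TensorFlow Lite static library and headers (sketch)
EOF
}

# Build the .deb from a staged tree; files are recorded as owned by root.
build_deb() {
    dpkg-deb --build --root-owner-group "$1" "$2"
}
```

Because the package contains only a static library and headers, it can be assembled on the x86_64 build host; the `Architecture: riscv64` field is what restricts installation to riscv64 systems.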

## Files

- `tflite/results/tflite_build_results.txt` — full CMake configure + build log
- `tflite/results/benchmark_results.txt` — end-to-end inference timing
- `tflite/results/libtensorflow-lite.a` — the static library
- `tflite/toolchain/riscv64-toolchain.cmake` — cross-compile toolchain file
- `tflite/bin/benchmark_model` — static riscv64 ELF benchmark binary (no sysroot dependency)
- `tflite/dist/libtensorflow-lite-dev_2.17.0-1_riscv64.deb` — installable package

## Repository

https://github.com/trg-rgb/riscv-hpc-port/tree/main/tflite

## Future work (mentorship-scoped)

1. RVV 1.0 fast paths in TFLite's reference kernels (matvec, conv, depthwise) — directly using the HAL primitives in #26
2. XNNPACK riscv64 backend prototype (NEON intrinsics → RVV translation)
3. Validation on HiFive Unmatched silicon (real performance numbers)
4. Extend the cross-compile recipe from TFLite to full TensorFlow

## Related issues

- HAL SIMD shim (`hal_matvec_row` — candidate backend for TFLite fully-connected layer acceleration): #26
- OpenBLAS ZVL128B build (RVV-accelerated BLAS — alternative TFLite GEMM backend): #25
- Chocolate Doom 3.0.0 on riscv64 (LFX spreadsheet-named target — companion riscv64 port): #20
- Upstream OpenBLAS documentation PR: https://github.com/OpenMathLib/OpenBLAS/pull/5819
