
Commit f973765

Add docs on arduino code gen
Implement: #270 Related-To: #267
1 parent 82b1899 commit f973765

2 files changed: 89 additions & 1 deletion

README.md

Lines changed: 14 additions & 1 deletion
@@ -72,12 +72,25 @@ println(ds.describe())

### SKaiNET is Compiler

- MLIR/StableHLO based lowering (modules provided in `SKaiNET-compile-*`)

```kotlin
// Illustrative: export graph to JSON/StableHLO IR
val ir = Compile.toStableHlo(model)
println(ir.pretty())
```

- **Arduino C Code Generation**: Export models to standalone, optimized C99 code with static memory allocation.

```kotlin
// Export model to an Arduino library
val facade = CCodegenFacade()
facade.exportToArduinoLibrary(
    model = model,
    forwardPass = { ctx -> model.forward(input, ctx) },
    outputPath = "build/arduino",
    libraryName = "MyModel"
)
```

Read the [Deep Technical Explanation](docs/arduino-c-codegen.md) for more details.

### SKaiNET is for Developers

docs/arduino-c-codegen.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@

# Arduino C Code Generation

SKaiNET provides a specialized compiler backend for exporting trained neural networks to highly optimized, standalone C99 code suitable for microcontrollers like Arduino.

## Overview

The Arduino C code generation process transforms a high-level Kotlin model into a memory-efficient C implementation. It prioritizes static memory allocation, minimal overhead, and numerical consistency with the original model.

### Codegen Pipeline
```mermaid
graph TD
    A[Kotlin Model] --> B[Recording Pass]
    B --> C[Execution Tape]
    C --> D[Compute Graph]
    D --> E[Graph Validation]
    E --> F[Memory Layout Calculation]
    F --> G[C Code Emission]
    G --> H[Arduino Library Packaging]
    H --> I[Generated .h/.c files]
```
## Technical Deep Dive

### 1. Tape-based Tracing

Instead of statically analyzing the Kotlin code, SKaiNET uses a dynamic tracing mechanism. When you call `exportToArduinoLibrary`, the framework executes a single forward pass of your model using a specialized `RecordingContext`.

- Every operation (Dense, ReLU, etc.) is recorded onto an **Execution Tape**.
- This approach handles Kotlin's language features (loops, conditionals) naturally, as it only records the operations that were actually executed.
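Conceptually, the tape is just an ordered, branch-free list of op records. A minimal C sketch of what a traced two-layer MLP reduces to (the types and field names here are hypothetical illustrations, not SKaiNET's actual Kotlin classes):

```c
#include <stddef.h>

/* Hypothetical record types for illustration only. */
typedef enum { OP_DENSE, OP_RELU, OP_SIGMOID } OpKind;

typedef struct {
    OpKind kind;      /* which kernel to run */
    int    input_id;  /* tensor id read by this op */
    int    output_id; /* tensor id written by this op */
} TapeEntry;

/* A traced 2-layer MLP: every executed op, in execution order. */
static const TapeEntry tape[] = {
    { OP_DENSE,   0, 1 },
    { OP_RELU,    1, 2 },
    { OP_DENSE,   2, 3 },
    { OP_SIGMOID, 3, 4 },
};
static const size_t tape_len = sizeof tape / sizeof tape[0];
```

Any control flow in the original Kotlin model has already been resolved by the time the tape exists, which is what makes straight-line C emission possible.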
### 2. Compute Graph Construction

The execution tape is converted into a directed acyclic graph (DAG) called `ComputeGraph`.

- Nodes represent operations (Ops).
- Edges represent data flow (Tensors).
- During this phase, the compiler performs **Shape Inference** to ensure every tensor has a fixed, known size.
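As a sketch of what shape inference checks for a single Dense node, assuming 2-D tensors (the function and struct names below are hypothetical, not SKaiNET API):

```c
#include <stdbool.h>

/* Illustrative 2-D shape record. */
typedef struct { int rows, cols; } Shape;

/* Dense is a matmul: [batch, in] x [in, units] -> [batch, units].
 * Returns false when the inner dimensions disagree. */
static bool infer_dense_shape(Shape input, Shape weight, Shape *out) {
    if (input.cols != weight.rows) return false; /* incompatible shapes */
    out->rows = input.rows;   /* batch dimension passes through */
    out->cols = weight.cols;  /* units of the layer */
    return true;
}
```

Because every shape is resolved at export time, the generated C code never needs to carry shape metadata at runtime.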
### 3. Static Memory Management

Microcontrollers typically have very limited RAM and lack robust heap management. SKaiNET uses a **Ping-Pong Buffer Strategy** to eliminate dynamic memory allocation (`malloc`/`free`) during inference.

#### Ping-Pong Buffer Strategy

The compiler calculates the maximum size required for any intermediate tensor in the graph and allocates exactly two static buffers of that size.
```mermaid
sequenceDiagram
    participant I as Input
    participant B1 as Buffer A
    participant B2 as Buffer B
    participant O as Output

    I->>B1: Layer 1 (Input -> A)
    B1->>B2: Layer 2 (A -> B)
    B2->>B1: Layer 3 (B -> A)
    B1->>O: Layer 4 (A -> Output)
```
- **Buffer Reuse**: Instead of allocating space for every layer's output, the two buffers alternate as source and destination.
- **Direct Output Optimization**: The first layer reads from the input pointer, and the last layer writes directly to the output pointer, avoiding unnecessary copies.
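A minimal C sketch of how emitted code can realize this strategy (illustrative only: `layer` is a stand-in kernel and `MAX_TENSOR` an assumed size, not SKaiNET's exact output):

```c
#include <stddef.h>

/* Assumed maximum intermediate tensor size from memory layout calculation. */
#define MAX_TENSOR 16

/* Exactly two static scratch buffers, regardless of layer count. */
static float buf_a[MAX_TENSOR];
static float buf_b[MAX_TENSOR];

/* Hypothetical stand-in kernel: copies and scales n values. */
static void layer(const float *in, float *out, size_t n, float scale) {
    for (size_t i = 0; i < n; i++) out[i] = in[i] * scale;
}

int model_inference(const float *input, float *output) {
    layer(input, buf_a, 4, 2.0f);  /* layer 1: input -> A            */
    layer(buf_a, buf_b, 4, 2.0f);  /* layer 2: A -> B                */
    layer(buf_b, buf_a, 4, 2.0f);  /* layer 3: B -> A                */
    layer(buf_a, output, 4, 2.0f); /* layer 4: A -> output, no copy  */
    return 0;
}
```

Note that no allocation happens anywhere on this path; RAM usage is fixed at compile time.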
### 4. Code Generation (Emission)

The `CCodeGenerator` emits C99-compatible code using templates.

- **Weights & Biases**: Extracted from the trained Kotlin model and serialized as `static const float` arrays. This places them in Flash memory (PROGMEM) on many microcontrollers, saving precious RAM.
- **Kernel Implementation**: Operations like `Dense` (Linear) are implemented as optimized nested loops.
- **Header Generation**: Produces a clean API for the user:

```c
int model_inference(const float* input, float* output);
```
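As a sketch of what an emitted Dense kernel might look like, assuming row-major weights (the layer name, sizes, and values are hypothetical, not SKaiNET's exact output):

```c
/* Hypothetical emitted constants for one 3-in, 2-out Dense layer. */
#define L1_IN  3
#define L1_OUT 2

static const float l1_weights[L1_OUT][L1_IN] = {
    { 1.0f, 0.0f, 0.0f },
    { 0.0f, 1.0f, 1.0f },
};
static const float l1_bias[L1_OUT] = { 0.5f, -0.5f };

/* Dense as plain nested loops: one multiply-accumulate per weight. */
static void dense_l1(const float *in, float *out) {
    for (int o = 0; o < L1_OUT; o++) {
        float acc = l1_bias[o];
        for (int i = 0; i < L1_IN; i++)
            acc += l1_weights[o][i] * in[i];
        out[o] = acc;
    }
}
```

Because the weights are `static const`, the linker can keep them out of RAM on flash-addressable targets.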
### 5. Validation

The generator performs post-generation validation:

- **Static Allocation Check**: Ensures no dynamic allocation is present in the generated source.
- **Buffer Alternation Check**: Verifies that the ping-pong strategy is correctly implemented without data races or overwrites.
## Performance and Constraints

- **Floating Point**: Currently optimized for `FP32`.
- **Supported Ops**: `Dense`, `ReLU`, `Sigmoid`, `Tanh`, `Add`, `MatMul`.
- **Memory**: Total memory consumption is `TotalWeights + 2 * MaxIntermediateTensor`.
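As a worked example of the memory formula, assuming a hypothetical 4-8-2 FP32 MLP (the layer sizes are an assumption for illustration, not a real model):

```c
#include <stddef.h>

/* TotalWeights + 2 * MaxIntermediateTensor, counted in floats,
 * for an assumed 4-8-2 fully connected network. */
static size_t model_memory_floats(void) {
    const size_t weights = 4 * 8 + 8   /* layer 1: weights + biases */
                         + 8 * 2 + 2;  /* layer 2: weights + biases */
    const size_t max_intermediate = 8; /* largest hidden activation */
    return weights + 2 * max_intermediate;
}
```

For this model the total is 74 floats, i.e. 296 bytes in FP32, which is the kind of budget that fits comfortably on even small AVR-class boards.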
