@@ -5,11 +5,11 @@ Single WASM file containing all AI models and execution engines for Socket CLI.
 ## Architecture
 
 ```
-socket-ai.wasm (~145MB)
+socket-ai.wasm (~115MB)
 ├─ ONNX Runtime (~2-5MB) - ML execution engine
 ├─ MiniLM model (~17MB int8) - Semantic understanding
-├─ CodeT5 encoder (~60MB int8) - Code generation (encoder)
-├─ CodeT5 decoder (~60MB int8) - Code generation (decoder)
+├─ CodeT5 encoder (~30MB int4) - Code generation (encoder)
+├─ CodeT5 decoder (~60MB int4) - Code generation (decoder)
 ├─ Tokenizers (~1MB) - Text tokenization
 └─ Yoga Layout (~95KB) - Flexbox layout engine
 ```
@@ -41,8 +41,8 @@ node scripts/wasm/download-models.mjs
 - Yoga Layout WASM (from node_modules)
 
 **What needs conversion** (see next step):
-- CodeT5 encoder (PyTorch → ONNX int8)
-- CodeT5 decoder (PyTorch → ONNX int8)
+- CodeT5 encoder (PyTorch → ONNX int4)
+- CodeT5 decoder (PyTorch → ONNX int4)
 
 ### 3. Convert CodeT5 Models (One-Time)
 
@@ -57,7 +57,7 @@ node scripts/wasm/convert-codet5.mjs
 **What it does**:
 - Downloads `Salesforce/codet5-small` from HuggingFace
 - Exports PyTorch → ONNX format
-- Quantizes fp32 → int8
+- Quantizes fp32 → int4 (4-bit weights, 50% smaller than int8)
 - Saves to `.cache/models/`
 
 ### 4. Build Unified WASM
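For context on the quantization step above, here is a minimal sketch of symmetric 4-bit weight quantization in Node. This is an illustration of the general technique only: the actual conversion runs in Python via optimum, and `quantizeInt4` is a hypothetical helper, not part of the repo.

```javascript
// Sketch of symmetric int4 quantization: map the largest |weight|
// to the signed 4-bit range [-8, 7], then note that two 4-bit codes
// pack into one byte (half the storage of int8).
function quantizeInt4(weights) {
  const maxAbs = Math.max(...weights.map(Math.abs));
  const scale = maxAbs / 7;
  const q = weights.map((w) =>
    Math.max(-8, Math.min(7, Math.round(w / scale)))
  );
  const packedBytes = Math.ceil(q.length / 2);
  return { q, scale, packedBytes };
}

const { q, scale, packedBytes } = quantizeInt4([0.7, -0.35, 0.07, 0.0]);
console.log(q);           // quantized 4-bit codes
console.log(packedBytes); // 2 bytes for 4 weights; int8 would need 4
```

This halving of bytes-per-weight is where the encoder's 60MB → 30MB reduction comes from.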
@@ -74,7 +74,7 @@ node scripts/wasm/build-unified-wasm.mjs
 5. Generates `external/socket-ai-sync.mjs`
 
 **Output**:
-- `wasm-bundle/pkg/socket_ai_bg.wasm` (~145MB)
+- `wasm-bundle/pkg/socket_ai_bg.wasm` (~115MB)
 - `external/socket-ai-sync.mjs` (brotli+base64 embedded)
 
 ## Distribution Pipeline
@@ -169,8 +169,8 @@ external/
 .cache/models/                  # Downloaded models (gitignored)
 ├── minilm-int8.onnx
 ├── minilm-tokenizer.json
-├── codet5-encoder-int8.onnx
-├── codet5-decoder-int8.onnx
+├── codet5-encoder-int4.onnx
+├── codet5-decoder-int4.onnx
 ├── codet5-tokenizer.json
 ├── ort-wasm-simd-threaded.wasm
 └── yoga.wasm
@@ -181,8 +181,8 @@ external/
 ### 1. Create CodeT5 Conversion Script
 
 ```bash
-# TODO: Create scripts/wasm/convert-codet5.mjs
-# Uses Python + optimum-cli to convert PyTorch → ONNX int8
+# NOTE: Already created - see scripts/wasm/convert-codet5.mjs
+# Uses Python + optimum to convert PyTorch → ONNX int4
 ```
 
 ### 2. Update Native Stub for .bz Detection
@@ -261,7 +261,7 @@ node bin/bootstrap.js --version
 | **Initialization** | ~150-200ms | ~50-100ms |
 | **Memory layout** | Fragmented | Contiguous |
 | **Distribution** | Complex | Simple |
-| **Size (raw)** | ~140MB total | ~145MB (+5MB overhead) |
+| **Size (raw)** | ~140MB total | ~115MB (30MB savings with int4) |
 | **Size (brotli)** | N/A | ~50-70MB base64 |
 | **Size (final)** | ~10MB | ~20-30MB (estimated) |
 
@@ -299,10 +299,10 @@ node scripts/wasm/convert-codet5.mjs
 
 ### WASM Too Large
 
-The WASM file is ~145MB which is expected given:
-- CodeT5 encoder: 60MB
-- CodeT5 decoder: 60MB
-- MiniLM: 17MB
+The WASM file is ~115MB which is expected given:
+- CodeT5 encoder: 30MB (int4)
+- CodeT5 decoder: 60MB (int4)
+- MiniLM: 17MB (int8)
 - ONNX Runtime: 2-5MB
 - Yoga: <1MB
 
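As a quick arithmetic check on the breakdown above, the listed components sum to roughly the stated total (the gap is rounding plus the sub-megabyte tokenizers and Yoga):

```javascript
// Sum the component sizes from the troubleshooting list above (in MB).
const componentsMB = {
  codet5Encoder: 30, // int4
  codet5Decoder: 60, // int4
  minilm: 17,        // int8
  onnxRuntime: 5,    // upper bound of the 2-5MB range
  yoga: 1,           // <1MB, rounded up
};
const totalMB = Object.values(componentsMB).reduce((a, b) => a + b, 0);
console.log(totalMB); // 113, consistent with the ~115MB figure
```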