Commit ec5d37c

Fix INT4 quantization references in WASM build files
Updated all INT8 references to INT4 to match the actual quantization level used. The conversion scripts generate INT4 models and lib.rs expects INT4 files, but download-models.mjs and README.md still referenced INT8, causing build failures.
1 parent 1b589a1 commit ec5d37c
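The mismatch this commit fixes can be sketched as a rename rule (illustrative only, not code from the repo): CodeT5 artifact names still pointing at int8 files are mapped to the int4 names that the conversion scripts actually emit and `lib.rs` expects, while MiniLM's genuine int8 file is left untouched.

```javascript
// Sketch of the rename this commit applies (hypothetical helper, not repo code).
// Only CodeT5 encoder/decoder names are rewritten; MiniLM really is int8.
const toInt4 = (name) =>
  name.replace(/^(codet5-(?:encoder|decoder))-int8\.onnx$/, '$1-int4.onnx');

console.log(toInt4('codet5-encoder-int8.onnx')); // codet5-encoder-int4.onnx
console.log(toInt4('minilm-int8.onnx'));         // unchanged: MiniLM stays int8
```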

File tree

2 files changed (+22, -22 lines)


scripts/wasm/download-models.mjs

Lines changed: 6 additions & 6 deletions
```diff
@@ -4,8 +4,8 @@
  * WHAT THIS DOWNLOADS:
  * 1. MiniLM model (int8 quantized, ~17MB)
  * 2. MiniLM tokenizer (~500KB)
- * 3. CodeT5 encoder (int8 quantized, ~60MB)
- * 4. CodeT5 decoder (int8 quantized, ~60MB)
+ * 3. CodeT5 encoder (int4 quantized, ~30MB)
+ * 4. CodeT5 decoder (int4 quantized, ~60MB)
  * 5. CodeT5 tokenizer (~500KB)
  * 6. ONNX Runtime WASM (~2-5MB)
  * 7. Yoga Layout WASM (~95KB) - copied from node_modules
@@ -50,14 +50,14 @@ const FILES = [
   // CodeT5 (needs manual conversion first - see convert-codet5.mjs).
   {
     copyFrom: null, // Set after conversion
-    description: 'CodeT5 encoder (int8)',
-    name: 'codet5-encoder-int8.onnx',
+    description: 'CodeT5 encoder (int4)',
+    name: 'codet5-encoder-int4.onnx',
     url: null, // Needs conversion first
   },
   {
     copyFrom: null,
-    description: 'CodeT5 decoder (int8)',
-    name: 'codet5-decoder-int8.onnx',
+    description: 'CodeT5 decoder (int4)',
+    name: 'codet5-decoder-int4.onnx',
     url: null, // Needs conversion first
   },
   {
```
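Entries of this shape can be consumed as in the following minimal sketch (the inline `FILES` array and the placeholder MiniLM URL are illustrative, not the script's actual values): it lists which entries still need the one-time CodeT5 conversion, i.e. those with neither a download `url` nor a `copyFrom` source.

```javascript
// Sketch, assuming the entry shape shown in the diff above.
// URL is a placeholder; real values live in scripts/wasm/download-models.mjs.
const FILES = [
  { name: 'minilm-int8.onnx', description: 'MiniLM (int8)',
    url: 'https://example.invalid/minilm-int8.onnx', copyFrom: null },
  { name: 'codet5-encoder-int4.onnx', description: 'CodeT5 encoder (int4)',
    url: null, copyFrom: null },
  { name: 'codet5-decoder-int4.onnx', description: 'CodeT5 decoder (int4)',
    url: null, copyFrom: null },
];

// An entry with no url and no copyFrom must be produced by convert-codet5.mjs.
const needsConversion = FILES.filter((f) => f.url === null && f.copyFrom === null);
console.log(needsConversion.map((f) => f.name));
// → ['codet5-encoder-int4.onnx', 'codet5-decoder-int4.onnx']
```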

wasm-bundle/README.md

Lines changed: 16 additions & 16 deletions
````diff
@@ -5,11 +5,11 @@ Single WASM file containing all AI models and execution engines for Socket CLI.
 ## Architecture
 
 ```
-socket-ai.wasm (~145MB)
+socket-ai.wasm (~115MB)
 ├─ ONNX Runtime (~2-5MB) - ML execution engine
 ├─ MiniLM model (~17MB int8) - Semantic understanding
-├─ CodeT5 encoder (~60MB int8) - Code generation (encoder)
-├─ CodeT5 decoder (~60MB int8) - Code generation (decoder)
+├─ CodeT5 encoder (~30MB int4) - Code generation (encoder)
+├─ CodeT5 decoder (~60MB int4) - Code generation (decoder)
 ├─ Tokenizers (~1MB) - Text tokenization
 └─ Yoga Layout (~95KB) - Flexbox layout engine
 ```
@@ -41,8 +41,8 @@ node scripts/wasm/download-models.mjs
 - Yoga Layout WASM (from node_modules)
 
 **What needs conversion** (see next step):
-- CodeT5 encoder (PyTorch → ONNX int8)
-- CodeT5 decoder (PyTorch → ONNX int8)
+- CodeT5 encoder (PyTorch → ONNX int4)
+- CodeT5 decoder (PyTorch → ONNX int4)
 
 ### 3. Convert CodeT5 Models (One-Time)
 
@@ -57,7 +57,7 @@ node scripts/wasm/convert-codet5.mjs
 **What it does**:
 - Downloads `Salesforce/codet5-small` from HuggingFace
 - Exports PyTorch → ONNX format
-- Quantizes fp32 → int8
+- Quantizes fp32 → int4 (4-bit weights, 50% smaller than int8)
 - Saves to `.cache/models/`
 
 ### 4. Build Unified WASM
@@ -74,7 +74,7 @@ node scripts/wasm/build-unified-wasm.mjs
 5. Generates `external/socket-ai-sync.mjs`
 
 **Output**:
-- `wasm-bundle/pkg/socket_ai_bg.wasm` (~145MB)
+- `wasm-bundle/pkg/socket_ai_bg.wasm` (~115MB)
 - `external/socket-ai-sync.mjs` (brotli+base64 embedded)
 
 ## Distribution Pipeline
@@ -169,8 +169,8 @@ external/
 .cache/models/ # Downloaded models (gitignored)
 ├── minilm-int8.onnx
 ├── minilm-tokenizer.json
-├── codet5-encoder-int8.onnx
-├── codet5-decoder-int8.onnx
+├── codet5-encoder-int4.onnx
+├── codet5-decoder-int4.onnx
 ├── codet5-tokenizer.json
 ├── ort-wasm-simd-threaded.wasm
 └── yoga.wasm
@@ -181,8 +181,8 @@ external/
 ### 1. Create CodeT5 Conversion Script
 
 ```bash
-# TODO: Create scripts/wasm/convert-codet5.mjs
-# Uses Python + optimum-cli to convert PyTorch → ONNX int8
+# NOTE: Already created - see scripts/wasm/convert-codet5.mjs
+# Uses Python + optimum to convert PyTorch → ONNX int4
 ```
 
 ### 2. Update Native Stub for .bz Detection
@@ -261,7 +261,7 @@ node bin/bootstrap.js --version
 | **Initialization** | ~150-200ms | ~50-100ms |
 | **Memory layout** | Fragmented | Contiguous |
 | **Distribution** | Complex | Simple |
-| **Size (raw)** | ~140MB total | ~145MB (+5MB overhead) |
+| **Size (raw)** | ~140MB total | ~115MB (30MB savings with int4) |
 | **Size (brotli)** | N/A | ~50-70MB base64 |
 | **Size (final)** | ~10MB | ~20-30MB (estimated) |
 
@@ -299,10 +299,10 @@ node scripts/wasm/convert-codet5.mjs
 
 ### WASM Too Large
 
-The WASM file is ~145MB which is expected given:
-- CodeT5 encoder: 60MB
-- CodeT5 decoder: 60MB
-- MiniLM: 17MB
+The WASM file is ~115MB which is expected given:
+- CodeT5 encoder: 30MB (int4)
+- CodeT5 decoder: 60MB (int4)
+- MiniLM: 17MB (int8)
 - ONNX Runtime: 2-5MB
 - Yoga: <1MB
````
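The size figures quoted in the README check out with simple arithmetic, sketched here (component values are taken from the text; the ONNX Runtime and Yoga figures are rounded assumptions): int4 stores weights in half the bytes of int8, so the 60MB encoder drops to ~30MB, and the component sum lands near the ~115MB total.

```javascript
// Back-of-the-envelope check of the sizes in the README.
const int8EncoderMB = 60;
const int4EncoderMB = int8EncoderMB / 2; // 4-bit weights: half the bytes of int8

const componentsMB = {
  codet5Encoder: int4EncoderMB, // 30
  codet5Decoder: 60,
  minilm: 17,
  onnxRuntime: 5, // upper end of the quoted 2-5MB range (assumption)
  yoga: 1,        // "<1MB", rounded up (assumption)
};
const totalMB = Object.values(componentsMB).reduce((a, b) => a + b, 0);
console.log(totalMB); // 113 — close to the ~115MB figure in the README
```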
