Commit 7d218df

psiddh and claude committed
MathPal: 8th-grade math tutor app powered by ExecuTorch
On-device AI math tutor for Android with step-by-step problem solving.
Runs entirely offline using ExecuTorch + XNNPACK backend.

Features:
- Ask Anything input with voice-to-text support
- Practice mode with 40 grade-leveled problems (Grade 4-8)
- Step-by-step reasoning card with answer validation
- Supports both #### and \boxed{} answer formats
- Gamification: XP system, streaks, boss battles, 30 badges
- Lazy KV cache via DYNAMIC_UNBOUND for memory-efficient inference

Tested models:
- Qwen2.5-Math-1.5B-Instruct (8da4w): 83% GSM8K, 28 tok/s on S23
- Qwen3-0.6B fine-tuned on GSM8K: 47% accuracy, 12 tok/s on S23

Also includes macOS SwiftUI app scaffold (apple-mathpal/).

See llm/android/MathPal/README.md for export instructions and model options.

Co-authored-by: Claude <noreply@anthropic.com>
1 parent c947b03 commit 7d218df

53 files changed

Lines changed: 5946 additions & 0 deletions


llm/android/MathPal/README.md

Lines changed: 263 additions & 0 deletions
@@ -0,0 +1,263 @@
# MathPal: 8th Grade Math Tutor

An on-device AI math tutor for Android, powered by ExecuTorch. Runs entirely on the phone: no internet, no accounts, no data collection.

## Features

- **Ask Anything**: type or speak any math problem, get step-by-step solutions
- **Practice Mode**: 40 grade-leveled problems (Grade 4–8) across 10 categories
- **On-Device AI**: model runs locally via ExecuTorch + XNNPACK, works offline
- **Lazy KV Cache**: uses `DYNAMIC_UNBOUND` allocation to defer ~500 MB of KV cache memory until first inference ([PR #18350](https://github.com/pytorch/executorch/pull/18350))
- **Voice Input**: tap the mic icon to speak your math problem
- **Answer Validation**: parses `####` and `\boxed{}` answer formats automatically

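To make the two answer formats concrete, here is a minimal Python sketch of how a final answer could be pulled out of a completion. It is not the app's parser (that lives in `AnswerValidator.kt`); the function names, the preference for `\boxed{}` over `####`, and the comma stripping are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical helper names; not taken from the MathPal source.
_HASH_ANSWER = re.compile(r"####\s*([^\n]+)")
_BOXED_OPEN = "\\boxed{"


def _last_boxed(text: str) -> Optional[str]:
    # Return the contents of the last boxed expression, honoring nested braces.
    start = text.rfind(_BOXED_OPEN)
    if start == -1:
        return None
    begin = start + len(_BOXED_OPEN)
    depth = 1
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # unbalanced braces: treat as "no answer found"


def extract_final_answer(text: str) -> Optional[str]:
    # Assumption: prefer the boxed form, then fall back to the GSM8K marker.
    boxed = _last_boxed(text)
    if boxed is not None:
        return boxed.strip()
    matches = _HASH_ANSWER.findall(text)
    if matches:
        # Drop thousands separators so "1,234" compares equal to "1234".
        return matches[-1].strip().replace(",", "")
    return None
```

For example, `extract_final_answer("48 / 2 = 24\n#### 1,234")` returns `"1234"`, and a completion with neither marker returns `None`, which a caller can treat as "could not validate".
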
## Supported Models

| Model | Params | Quantization | .pte Size | GSM8K Accuracy | Speed (S23) | Notes |
|-------|--------|--------------|-----------|----------------|-------------|-------|
| Qwen2.5-Math-1.5B-Instruct | 1.5B | 8da4w | 1.6 GB | **83%** | 28–37 tok/s | Best accuracy |
| Qwen3-0.6B (GSM8K fine-tuned) | 0.6B | fp16 | 1.4 GB | 47% | 10–13 tok/s | Good for prototyping |
| Qwen3-1.7B (GSM8K fine-tuned) | 1.7B | fp16 | 3.8 GB | ~60% | 8–12 tok/s | Needs 12GB+ RAM phone |
| Any Qwen2.5/Qwen3 model | n/a | n/a | n/a | n/a | n/a | Bring your own |

### Accuracy by Category (Qwen2.5-Math-1.5B-Instruct)

| Category | Accuracy | Example Problem |
|----------|----------|-----------------|
| Arithmetic | ~95% | "What is 47 x 86?" |
| Fractions and Decimals | ~90% | "What is 2/3 + 3/4?" |
| Percentages | ~88% | "A $80 shirt is 25% off. Final price?" |
| Ratios and Proportions | ~85% | "Ratio 3:5, if 15 cats how many dogs?" |
| Linear Equations | ~82% | "Solve 3x - 7 = 14" |
| Geometry (area/volume) | ~80% | "Cylinder r=5, h=12, volume?" |
| Rate/Speed/Work | ~75% | "Train A at 60mph, Train B at 80mph..." |
| Probability | ~70% | "Draw 2 cards, P(both aces)?" |
| Multi-step Word Problems | ~78% | "Buy-2-get-1-free + 10% coupon..." |
| Combinatorics | ~55% | "Arrangements of MISSISSIPPI?" |

## Quick Start

### Option 1: Use Pre-trained Qwen2.5-Math (Recommended)

```bash
# 1. Download model from HuggingFace
pip install huggingface_hub
python -c "
from huggingface_hub import snapshot_download
snapshot_download('Qwen/Qwen2.5-Math-1.5B-Instruct', local_dir='./qwen2.5-math-1.5b-hf')
"

# 2. Convert weights to Meta/Llama format
python -m executorch.examples.models.qwen2_5.convert_weights \
    ./qwen2.5-math-1.5b-hf \
    ./qwen2.5-math-1.5b-meta.pth

# 3. Export to .pte with XNNPACK + 8da4w quantization + lazy KV cache
python -m executorch.examples.models.llama.export_llama \
    --model qwen2_5_1_5b \
    -c ./qwen2.5-math-1.5b-meta.pth \
    -p examples/models/qwen2_5/config/1_5b_config.json \
    --max_context_length 4096 \
    -kv --use_sdpa_with_kv_cache \
    -X --xnnpack-extended-ops \
    --pt2e_quantize xnnpack_dynamic_qc4 \
    --lazy_kv_cache \
    --metadata '{"get_bos_id":151643, "get_eos_ids":[151645,151643]}' \
    -o ./

# 4. Push to phone
adb shell mkdir -p /data/local/tmp/llama
adb push qwen2_5_1_5b_h.pte /data/local/tmp/llama/model.pte
adb push ./qwen2.5-math-1.5b-hf/tokenizer.json /data/local/tmp/llama/tokenizer.json
```

### Option 2: Use Qwen3-0.6B (Smaller, faster export)

```bash
# 1. Download
python -c "
from huggingface_hub import snapshot_download
snapshot_download('Qwen/Qwen3-0.6B', local_dir='./qwen3-0.6b-hf')
"

# 2. Convert
python -m executorch.examples.models.qwen3.convert_weights \
    ./qwen3-0.6b-hf \
    ./qwen3-0.6b-meta.bin

# 3. Export (fp16, no quantization)
# The BOS/EOS IDs below are the Qwen special-token IDs also used in Option 1;
# the previous values (199999, 200020) fall outside the Qwen vocabulary.
python -m executorch.examples.models.llama.export_llama \
    --model qwen3_0_6b \
    -c ./qwen3-0.6b-meta.bin \
    -p examples/models/qwen3/config/0_6b_config.json \
    --max_context_length 4096 \
    -kv --use_sdpa_with_kv_cache \
    -X --dtype fp16 \
    --lazy_kv_cache \
    --metadata '{"get_bos_id":151643, "get_eos_ids":[151645,151643]}' \
    -o ./

# 4. Push to phone
adb shell mkdir -p /data/local/tmp/llama
adb push qwen3_0_6b_h.pte /data/local/tmp/llama/model.pte
adb push ./qwen3-0.6b-hf/tokenizer.json /data/local/tmp/llama/tokenizer.json
```

### Option 3: Fine-tune Your Own Math Model

You can fine-tune any small model on math datasets for better accuracy:

```bash
# Fine-tune Qwen3-0.6B on GSM8K (requires GPU, ~10 min on A100)
pip install transformers datasets trl

python -c "
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, SFTConfig

model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3-0.6B', torch_dtype='bfloat16')
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-0.6B')
dataset = load_dataset('openai/gsm8k', 'main', split='train')

# Wrap each question/answer pair in the Qwen chat template.
def format_gsm8k(example):
    return {'text': '<|im_start|>user\n' + example['question'] + '<|im_end|>\n<|im_start|>assistant\n' + example['answer'] + '<|im_end|>'}

dataset = dataset.map(format_gsm8k)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    # dataset_text_field is an SFTConfig option in recent TRL releases.
    args=SFTConfig(output_dir='./gsm8k-finetuned', num_train_epochs=3, per_device_train_batch_size=8, learning_rate=2e-5, bf16=True, dataset_text_field='text'),
)
trainer.train()
model.save_pretrained('./gsm8k-finetuned')
tokenizer.save_pretrained('./gsm8k-finetuned')
"

# Then convert + export using the same steps as Option 2, but pass
# ./gsm8k-finetuned to convert_weights instead of ./qwen3-0.6b-hf
# and point -c at the converted checkpoint.
```

Other math datasets to consider: `hendrycks/competition_math` (MATH), `deepmind/math_dataset`, `microsoft/orca-math-word-problems`.

### Option 4: Bring Any Custom Model

Any Qwen2.5 or Qwen3 model works. The app expects:
- A `.pte` file at `/data/local/tmp/llama/model.pte`
- A `tokenizer.json` at `/data/local/tmp/llama/tokenizer.json`
- A model that responds to the Qwen chat template (`<|im_start|>user\n...<|im_end|>\n<|im_start|>assistant\n`)

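As an illustration of the chat template requirement above, the following Python sketch assembles a single-turn prompt in that format. The function name `build_qwen_prompt` and the optional system turn are assumptions for this example; the app's own formatting lives in `PromptFormatter.kt` and may differ.

```python
from typing import Optional

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_qwen_prompt(question: str, system: Optional[str] = None) -> str:
    # Hypothetical helper: builds one user turn and leaves the assistant
    # turn open so the model generates the answer.
    parts = []
    if system:
        # Optional system turn (an assumption; the README only shows user/assistant).
        parts.append(f"{IM_START}system\n{system}{IM_END}\n")
    parts.append(f"{IM_START}user\n{question}{IM_END}\n")
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_qwen_prompt("Solve 3x - 7 = 14"))
```

A custom model that was fine-tuned on text in this shape (as in Option 3) should then stop at `<|im_end|>`, which is why that token's ID is listed among the EOS IDs at export time.
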
## Build the Android App

### Prerequisites

- Android SDK (API 28+)
- Android NDK 29+
- ExecuTorch AAR (pre-built or build from source)

### Using Pre-built AAR

```bash
cd llm/android/MathPal
ANDROID_HOME=/path/to/android/sdk ./gradlew assembleDebug
adb install -r app/build/outputs/apk/debug/app-debug.apk
```

### Using Local AAR (with Lazy KV Cache)

To use `--lazy_kv_cache`, build a custom AAR with `EXECUTORCH_ENABLE_DYNAMIC_ALLOCATOR=ON`:

```bash
# 1. Build native libraries
cd /path/to/executorch
cmake . -DCMAKE_INSTALL_PREFIX=cmake-out-android-arm64-v8a \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    --preset android-arm64-v8a \
    -DANDROID_PLATFORM=android-26 \
    -DEXECUTORCH_BUILD_EXTENSION_LLM=ON \
    -DEXECUTORCH_BUILD_LLAMA_JNI=ON \
    -DEXECUTORCH_ENABLE_DYNAMIC_ALLOCATOR=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -Bcmake-out-android-arm64-v8a
cmake --build cmake-out-android-arm64-v8a -j$(nproc) --target install

# 2. Stage .so
mkdir -p cmake-out-android-so/arm64-v8a
cp cmake-out-android-arm64-v8a/extension/android/*.so cmake-out-android-so/arm64-v8a/libexecutorch.so
$ANDROID_NDK/toolchains/llvm/prebuilt/*/bin/llvm-strip cmake-out-android-so/arm64-v8a/libexecutorch.so

# 3. Build AAR
cd extension/android
ANDROID_HOME=/path/to/android/sdk ./gradlew :executorch_android:assembleDebug

# 4. Build MathPal with local AAR
mkdir -p /path/to/MathPal/app/libs
cp executorch_android/build/outputs/aar/executorch_android-debug.aar /path/to/MathPal/app/libs/executorch.aar
cd /path/to/MathPal
ANDROID_HOME=/path/to/android/sdk ./gradlew assembleDebug -PuseLocalAar=true
```

## Lazy KV Cache (DYNAMIC_UNBOUND)

MathPal is exported with `--lazy_kv_cache`, which marks the KV cache buffers as `DYNAMIC_UNBOUND`. The KV cache is then not allocated until the first inference call:

| Phase | Without lazy KV cache | With lazy KV cache |
|-------|----------------------|-------------------|
| Model load | ~2150 MiB | ~100 MiB |
| First inference | ~2150 MiB | ~1730 MiB |
| 10+ turns | ~2150 MiB | ~1730 MiB (stable) |

KV cache cost by context length (Qwen3-0.6B, fp16):

| max_context_length | KV Cache Size | At load without lazy KV cache | At load with lazy KV cache |
|---|---|---|---|
| 128 (default) | 14 MB | 14 MB pre-allocated | 0 MB |
| 1024 | 115 MB | 115 MB pre-allocated | 0 MB |
| 2048 (standard) | 229 MB | 229 MB pre-allocated | 0 MB |
| 4096 (our test) | 459 MB | 459 MB pre-allocated | 0 MB |
| 16384 | 1.8 GB | OOM at load | 0 MB |

Requires an AAR built with `-DEXECUTORCH_ENABLE_DYNAMIC_ALLOCATOR=ON`.

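The linear growth in the context-length table follows from the usual KV cache size estimate: two tensors (K and V) per layer, each holding `num_kv_heads * head_dim` values per token. The sketch below is a back-of-the-envelope check, not the exporter's accounting. The Qwen3-0.6B shape used here (28 layers, 8 KV heads, head dimension 128) is an assumption that does not appear in this README, and the function name is hypothetical.

```python
def kv_cache_bytes(context_len: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # Factor 2: one K tensor and one V tensor per layer.
    # bytes_per_elem=2 corresponds to fp16.
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem


# Assumed Qwen3-0.6B attention shape (not stated in this README).
QWEN3_0_6B = dict(num_layers=28, num_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for ctx in (128, 1024, 2048, 4096, 16384):
        mib = kv_cache_bytes(ctx, **QWEN3_0_6B) / 2**20
        print(f"{ctx:>6} tokens -> {mib:,.0f} MiB")
```

Under these assumptions the cache costs 112 KiB per token, so a 4096-token context comes to 448 MiB, within a few percent of the 459 MB listed above. The estimate covers only the cache buffers, so it should not be read as total process memory.
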
## App Architecture

```
com.mathpal.app/
├── MathPalActivity.kt          # Single activity, Compose navigation
├── MathPalApplication.kt       # Lifecycle, memory management
├── data/
│   ├── model/                  # Problem, Badge, BossBattle data classes
│   ├── db/                     # SQLite (XP, streaks, progress)
│   └── repository/             # MathRepository
├── inference/
│   ├── InferenceEngine.kt      # ExecuTorch LlmModule wrapper
│   ├── PromptFormatter.kt      # Qwen chat template
│   ├── AnswerValidator.kt      # #### and \boxed{} parsing
│   └── StepParser.kt           # Token stream to step cards
├── ui/
│   ├── home/                   # Ask Anything + daily challenge
│   ├── solve/                  # Reasoning card + answer (MathViewModel)
│   ├── practice/               # Grade 4-8 problem bank
│   ├── progress/               # Stats and topic mastery
│   ├── components/             # StepCard, StreakBadge
│   └── theme/                  # Material3 theme
└── gamification/
    ├── XPManager.kt            # 50 levels, 6 tiers
    ├── StreakManager.kt        # Daily streaks with grace periods
    ├── BossManager.kt          # 8 math boss battles
    ├── BadgeManager.kt         # 30 achievements
    └── ProblemBank.kt          # 40 problems (Grade 4-8)
```

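The "token stream to step cards" role listed for `StepParser.kt` can be pictured with a small language-agnostic sketch, written here in Python. It is purely illustrative: it assumes one step per non-empty line of model output, which may not match the app's actual rule, and `iter_steps` is a hypothetical name.

```python
from typing import Iterable, Iterator


def iter_steps(pieces: Iterable[str]) -> Iterator[str]:
    # Buffer streamed text pieces and yield a step whenever a full line
    # is available. Assumption: one step card per non-empty line.
    buffer = ""
    for piece in pieces:
        buffer += piece
        # A single piece may contain zero, one, or several newlines.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():
                yield line.strip()
    if buffer.strip():
        yield buffer.strip()  # flush the trailing partial line


if __name__ == "__main__":
    stream = ["3x - 7", " = 14\n3x =", " 21\n\nx = 7"]
    for number, step in enumerate(iter_steps(stream), start=1):
        print(number, step)
```

Yielding steps as soon as a line completes lets a UI append cards while generation is still running, instead of waiting for the full answer.
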
## Tested Hardware

| Device | RAM | Model | Result |
|--------|-----|-------|--------|
| Samsung Galaxy S23 | 8 GB | Qwen2.5-Math-1.5B (8da4w) | 10+ turns, 28 tok/s |
| Samsung Galaxy S23 | 8 GB | Qwen3-0.6B (fp16) | 10+ turns, 12 tok/s |
| Samsung Galaxy S23 | 8 GB | Qwen3-1.7B (fp16) + lazy KV | Works with lazy allocation |

## License

BSD-style license. See LICENSE file in the root directory.
