# MathPal: 8th Grade Math Tutor

An on-device AI math tutor for Android, powered by ExecuTorch. The model runs entirely on the phone: no internet connection, no accounts, and no data collection.

## Features

- **Ask Anything**: type or speak any math problem and get a step-by-step solution
- **Practice Mode**: 40 grade-leveled problems (Grades 4–8) across 10 categories
- **On-Device AI**: the model runs locally via ExecuTorch + XNNPACK and works offline
- **Lazy KV Cache**: uses `DYNAMIC_UNBOUND` allocation to defer ~500 MB of KV cache memory until the first inference ([PR #18350](https://github.com/pytorch/executorch/pull/18350))
- **Voice Input**: tap the mic icon to speak your math problem
- **Answer Validation**: automatically parses the `####` and `\boxed{}` answer formats

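The answer-validation step can be illustrated with a short Python sketch. This is not the app's `AnswerValidator.kt`; it is a minimal example that assumes the final answer is either the text after the last `####` marker (the GSM8K convention) or the contents of the last `\boxed{...}`, and it assumes `\boxed{}` takes priority when both appear. The helper names `extract_answer` and `_boxed_contents` are hypothetical.

```python
import re
from typing import List, Optional

# GSM8K-style final answers look like "#### 42" at the end of the solution.
_HASH_RE = re.compile(r"####\s*([^\n]+)")
_BOXED = "\\boxed{"


def _boxed_contents(text: str) -> List[str]:
    """Return the contents of every \\boxed{...}, matching nested braces."""
    results = []
    start = text.find(_BOXED)
    while start != -1:
        i = j = start + len(_BOXED)
        depth = 1
        while j < len(text) and depth > 0:
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
            j += 1
        if depth == 0:  # skip an unterminated \boxed{ (e.g. a stream cut off mid-answer)
            results.append(text[i : j - 1])
        start = text.find(_BOXED, j)
    return results


def extract_answer(text: str) -> Optional[str]:
    """Return the last \\boxed{} answer, else the last '####' answer, else None."""
    boxed = _boxed_contents(text)
    if boxed:
        return boxed[-1].strip()
    hashes = _HASH_RE.findall(text)
    if hashes:
        return hashes[-1].strip()
    return None
```

Matching braces by hand rather than with a single regex keeps nested LaTeX such as `\boxed{\frac{1}{2}}` intact.
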
## Supported Models

| Model | Params | Quantization | .pte Size | GSM8K Accuracy | Speed (Galaxy S23) | Notes |
|-------|--------|--------------|-----------|----------------|--------------------|-------|
| Qwen2.5-Math-1.5B-Instruct | 1.5B | 8da4w | 1.6 GB | **83%** | 28–37 tok/s | Recommended: best accuracy |
| Qwen3-0.6B (GSM8K fine-tuned) | 0.6B | fp16 | 1.4 GB | 47% | 10–13 tok/s | Good for prototyping |
| Qwen3-1.7B (GSM8K fine-tuned) | 1.7B | fp16 | 3.8 GB | ~60% | 8–12 tok/s | 12 GB+ RAM recommended; runs on an 8 GB S23 with lazy KV cache |
| Any Qwen2.5/Qwen3 model | n/a | n/a | n/a | n/a | n/a | Bring your own |

### Accuracy by Category (Qwen2.5-Math-1.5B-Instruct)

| Category | Accuracy | Example Problem |
|----------|----------|-----------------|
| Arithmetic | ~95% | "What is 47 × 86?" |
| Fractions and Decimals | ~90% | "What is 2/3 + 3/4?" |
| Percentages | ~88% | "A $80 shirt is 25% off. What is the final price?" |
| Ratios and Proportions | ~85% | "Cats to dogs is 3:5. If there are 15 cats, how many dogs?" |
| Linear Equations | ~82% | "Solve 3x - 7 = 14" |
| Geometry (area/volume) | ~80% | "Cylinder with r = 5, h = 12. Volume?" |
| Rate/Speed/Work | ~75% | "Train A at 60 mph, Train B at 80 mph..." |
| Probability | ~70% | "Draw 2 cards. P(both aces)?" |
| Multi-step Word Problems | ~78% | "Buy-2-get-1-free + 10% coupon..." |
| Combinatorics | ~55% | "Arrangements of MISSISSIPPI?" |

## Quick Start

Run the export commands from the root of an ExecuTorch checkout, since the `-p` config paths are relative to it.

### Option 1: Use Pre-trained Qwen2.5-Math (Recommended)

```bash
# 1. Download the model from Hugging Face
pip install huggingface_hub
python -c "
from huggingface_hub import snapshot_download
snapshot_download('Qwen/Qwen2.5-Math-1.5B-Instruct', local_dir='./qwen2.5-math-1.5b-hf')
"

# 2. Convert weights to Meta/Llama format
python -m executorch.examples.models.qwen2_5.convert_weights \
  ./qwen2.5-math-1.5b-hf \
  ./qwen2.5-math-1.5b-meta.pth

# 3. Export to .pte with XNNPACK + 8da4w quantization + lazy KV cache
python -m executorch.examples.models.llama.export_llama \
  --model qwen2_5_1_5b \
  -c ./qwen2.5-math-1.5b-meta.pth \
  -p examples/models/qwen2_5/config/1_5b_config.json \
  --max_context_length 4096 \
  -kv --use_sdpa_with_kv_cache \
  -X --xnnpack-extended-ops \
  --pt2e_quantize xnnpack_dynamic_qc4 \
  --lazy_kv_cache \
  --metadata '{"get_bos_id":151643, "get_eos_ids":[151645,151643]}' \
  -o ./

# 4. Push the model and tokenizer to the phone
adb shell mkdir -p /data/local/tmp/llama
adb push qwen2_5_1_5b_h.pte /data/local/tmp/llama/model.pte
adb push ./qwen2.5-math-1.5b-hf/tokenizer.json /data/local/tmp/llama/tokenizer.json
```

### Option 2: Use Qwen3-0.6B (Smaller, Faster Export)

```bash
# 1. Download
pip install huggingface_hub
python -c "
from huggingface_hub import snapshot_download
snapshot_download('Qwen/Qwen3-0.6B', local_dir='./qwen3-0.6b-hf')
"

# 2. Convert
python -m executorch.examples.models.qwen3.convert_weights \
  ./qwen3-0.6b-hf \
  ./qwen3-0.6b-meta.bin

# 3. Export (fp16, no quantization)
# Qwen3 shares Qwen2.5's special-token ids: <|endoftext|> = 151643, <|im_end|> = 151645
python -m executorch.examples.models.llama.export_llama \
  --model qwen3_0_6b \
  -c ./qwen3-0.6b-meta.bin \
  -p examples/models/qwen3/config/0_6b_config.json \
  --max_context_length 4096 \
  -kv --use_sdpa_with_kv_cache \
  -X --dtype fp16 \
  --lazy_kv_cache \
  --metadata '{"get_bos_id":151643, "get_eos_ids":[151645,151643]}' \
  -o ./

# 4. Push to phone
adb shell mkdir -p /data/local/tmp/llama
adb push qwen3_0_6b_h.pte /data/local/tmp/llama/model.pte
adb push ./qwen3-0.6b-hf/tokenizer.json /data/local/tmp/llama/tokenizer.json
```

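The `--metadata` BOS/EOS ids must be special tokens from the model's own tokenizer; if the EOS ids are wrong, generation will likely run past `<|im_end|>` instead of stopping. Below is a minimal Python sketch for checking the ids against the downloaded `tokenizer.json`. It assumes the Hugging Face fast-tokenizer layout, where special tokens are listed under the top-level `added_tokens` key. The function names are hypothetical and are not part of ExecuTorch or the app.

```python
import json
from pathlib import Path
from typing import Dict, List, Union

TokenizerSource = Union[str, Path, dict]


def special_token_ids(tokenizer_json: TokenizerSource) -> Dict[str, int]:
    """Map special-token strings to ids from tokenizer.json's "added_tokens" list."""
    if not isinstance(tokenizer_json, dict):
        tokenizer_json = json.loads(Path(tokenizer_json).read_text(encoding="utf-8"))
    return {tok["content"]: tok["id"] for tok in tokenizer_json.get("added_tokens", [])}


def check_metadata(tokenizer_json: TokenizerSource, bos_id: int,
                   eos_ids: List[int]) -> List[str]:
    """Return one message per id that is not a special token; empty means OK."""
    known = set(special_token_ids(tokenizer_json).values())
    return [f"id {tid} is not a special token in tokenizer.json"
            for tid in [bos_id, *eos_ids] if tid not in known]
```

For example, `check_metadata("./qwen3-0.6b-hf/tokenizer.json", 151643, [151645, 151643])` should return an empty list for a Qwen3 tokenizer.
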
### Option 3: Fine-tune Your Own Math Model

You can fine-tune a small Qwen model on a math dataset to improve accuracy:

```bash
# Fine-tune Qwen3-0.6B on GSM8K (requires a GPU; ~10 min on an A100)
pip install transformers datasets trl

python -c "
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, SFTConfig

model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3-0.6B', torch_dtype='bfloat16')
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-0.6B')
dataset = load_dataset('openai/gsm8k', 'main', split='train')

# Wrap each example in the Qwen chat template; GSM8K answers end with '#### <number>'
def format_gsm8k(example):
    return {'text': '<|im_start|>user\n' + example['question'] + '<|im_end|>\n<|im_start|>assistant\n' + example['answer'] + '<|im_end|>'}

dataset = dataset.map(format_gsm8k, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    processing_class=tokenizer,
    args=SFTConfig(
        output_dir='./gsm8k-finetuned',
        num_train_epochs=3,
        per_device_train_batch_size=8,
        learning_rate=2e-5,
        bf16=True,
        dataset_text_field='text',
    ),
)
trainer.train()
trainer.save_model('./gsm8k-finetuned')
tokenizer.save_pretrained('./gsm8k-finetuned')
"

# Then convert and export as in Option 2, but run convert_weights on
# ./gsm8k-finetuned (instead of ./qwen3-0.6b-hf), pass the converted
# checkpoint to -c, and push ./gsm8k-finetuned/tokenizer.json to the phone.
```

Other math datasets to consider: `hendrycks/competition_math` (MATH), `deepmind/math_dataset`, `microsoft/orca-math-word-problems-200k`.

### Option 4: Bring Any Custom Model

Any Qwen2.5 or Qwen3 model works. The app expects:
- a `.pte` file at `/data/local/tmp/llama/model.pte`
- `tokenizer.json` at `/data/local/tmp/llama/tokenizer.json`
- a model that responds to the Qwen chat template (`<|im_start|>user\n...<|im_end|>\n<|im_start|>assistant\n`)

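To make the template above concrete, here is a hypothetical Python helper that builds a single-turn prompt in that layout. It illustrates the format listed above and is not the app's `PromptFormatter.kt`; it also does not reproduce every option of the official Hugging Face chat templates. The optional system turn is an assumption based on the ChatML structure Qwen models use.

```python
from typing import Optional


def format_qwen_prompt(question: str, system: Optional[str] = None) -> str:
    """Build a single-turn prompt in the Qwen chat layout the app expects."""
    parts = []
    if system is not None:
        # Optional system turn (assumption: standard ChatML layout).
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{question}<|im_end|>\n")
    # Leave the assistant turn open so the model generates the answer next.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

Leaving the assistant turn open is what makes the model continue with a solution; it should then end its turn with `<|im_end|>`, which is why that token id belongs in the export's EOS list.
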
## Build the Android App

### Prerequisites

- Android SDK (API 28+)
- Android NDK 29+
- ExecuTorch AAR (pre-built, or built from source)

### Using the Pre-built AAR

```bash
cd llm/android/MathPal
ANDROID_HOME=/path/to/android/sdk ./gradlew assembleDebug
adb install -r app/build/outputs/apk/debug/app-debug.apk
```

### Using a Local AAR (with Lazy KV Cache)

To run models exported with `--lazy_kv_cache`, build a custom AAR with `EXECUTORCH_ENABLE_DYNAMIC_ALLOCATOR=ON`. The commands below assume `ANDROID_NDK` points to your NDK installation.

```bash
# 1. Build native libraries
cd /path/to/executorch
cmake . -DCMAKE_INSTALL_PREFIX=cmake-out-android-arm64-v8a \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
  --preset android-arm64-v8a \
  -DANDROID_PLATFORM=android-26 \
  -DEXECUTORCH_BUILD_EXTENSION_LLM=ON \
  -DEXECUTORCH_BUILD_LLAMA_JNI=ON \
  -DEXECUTORCH_ENABLE_DYNAMIC_ALLOCATOR=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -Bcmake-out-android-arm64-v8a
cmake --build cmake-out-android-arm64-v8a -j$(nproc) --target install

# 2. Stage the JNI library as libexecutorch.so and strip it
mkdir -p cmake-out-android-so/arm64-v8a
cp cmake-out-android-arm64-v8a/extension/android/*.so cmake-out-android-so/arm64-v8a/libexecutorch.so
$ANDROID_NDK/toolchains/llvm/prebuilt/*/bin/llvm-strip cmake-out-android-so/arm64-v8a/libexecutorch.so

# 3. Build the AAR
cd extension/android
ANDROID_HOME=/path/to/android/sdk ./gradlew :executorch_android:assembleDebug

# 4. Build MathPal against the local AAR
mkdir -p /path/to/MathPal/app/libs
cp executorch_android/build/outputs/aar/executorch_android-debug.aar /path/to/MathPal/app/libs/executorch.aar
cd /path/to/MathPal
ANDROID_HOME=/path/to/android/sdk ./gradlew assembleDebug -PuseLocalAar=true
```

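An AAR is a zip archive whose native libraries live under `jni/<abi>/`, so one quick way to confirm that step 3 packaged the staged library is to list those entries. This is a minimal sketch; `native_libs_in_aar` is a hypothetical helper, and a passing check only shows that a `.so` is present, not that it was built with `EXECUTORCH_ENABLE_DYNAMIC_ALLOCATOR=ON`.

```python
import zipfile
from pathlib import Path
from typing import BinaryIO, List, Union


def native_libs_in_aar(aar: Union[str, Path, BinaryIO], abi: str = "arm64-v8a") -> List[str]:
    """List the .so entries packaged for one ABI inside an AAR (a zip archive)."""
    prefix = f"jni/{abi}/"
    with zipfile.ZipFile(aar) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith(prefix) and name.endswith(".so")
        )
```

Run from `extension/android`, `native_libs_in_aar("executorch_android/build/outputs/aar/executorch_android-debug.aar")` should include `jni/arm64-v8a/libexecutorch.so` if the Gradle build picked up the staged library; an empty list suggests step 2 or 3 did not take effect.
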
## Lazy KV Cache (DYNAMIC_UNBOUND)

MathPal models are exported with `--lazy_kv_cache`, which marks the KV cache buffers as `DYNAMIC_UNBOUND`. The KV cache is then not allocated when the model loads, only at the first inference call:

| Phase | Memory without lazy KV cache | Memory with lazy KV cache |
|-------|------------------------------|---------------------------|
| Model load | ~2150 MiB | ~100 MiB |
| First inference | ~2150 MiB | ~1730 MiB |
| 10+ turns | ~2150 MiB | ~1730 MiB (stable) |

KV cache size by context length (Qwen3-0.6B, fp16):

| max_context_length | KV cache size | Allocated at load (without lazy KV cache) | Allocated at load (with lazy KV cache) |
|---|---|---|---|
| 128 (default) | 14 MB | 14 MB | 0 MB |
| 1024 | 115 MB | 115 MB | 0 MB |
| 2048 (standard) | 229 MB | 229 MB | 0 MB |
| 4096 (used in our tests) | 459 MB | 459 MB | 0 MB |
| 16384 | 1.8 GB | OOM at load | 0 MB |

Lazy KV cache requires an AAR built with `-DEXECUTORCH_ENABLE_DYNAMIC_ALLOCATOR=ON`.

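The KV cache sizes in the table follow from the model shape: for every token, each layer stores one key and one value vector per KV head. A small Python sketch of that arithmetic, assuming the Qwen3-0.6B shape from its Hugging Face `config.json` (28 layers, 8 KV heads, head dimension 128), fp16 (2-byte) elements, and batch size 1. The function name is hypothetical, and ExecuTorch's actual buffers may add padding or other overhead.

```python
# Assumed Qwen3-0.6B attention shape; check your own model's config.json.
QWEN3_0_6B = {"num_hidden_layers": 28, "num_key_value_heads": 8, "head_dim": 128}


def kv_cache_bytes(context_len: int, num_hidden_layers: int, num_key_value_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes for the K and V caches of every layer at a given context length."""
    # Factor 2 = one key tensor + one value tensor per layer.
    per_token = 2 * num_hidden_layers * num_key_value_heads * head_dim * bytes_per_elem
    return per_token * context_len


for ctx in (128, 1024, 2048, 4096, 16384):
    mib = kv_cache_bytes(ctx, **QWEN3_0_6B) / 2**20
    print(f"{ctx:>6} tokens: {mib:8.1f} MiB")
```

Under these assumptions each token costs 112 KiB, so 4096 tokens come to 448 MiB (about 470 MB), within a few percent of the table's figures.
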
## App Architecture

```
com.mathpal.app/
├── MathPalActivity.kt        # Single activity, Compose navigation
├── MathPalApplication.kt     # Lifecycle, memory management
├── data/
│   ├── model/                # Problem, Badge, BossBattle data classes
│   ├── db/                   # SQLite (XP, streaks, progress)
│   └── repository/           # MathRepository
├── inference/
│   ├── InferenceEngine.kt    # ExecuTorch LlmModule wrapper
│   ├── PromptFormatter.kt    # Qwen chat template
│   ├── AnswerValidator.kt    # #### and \boxed{} parsing
│   └── StepParser.kt         # Token stream to step cards
├── ui/
│   ├── home/                 # Ask Anything + daily challenge
│   ├── solve/                # Reasoning card + answer (MathViewModel)
│   ├── practice/             # Grade 4-8 problem bank
│   ├── progress/             # Stats and topic mastery
│   ├── components/           # StepCard, StreakBadge
│   └── theme/                # Material3 theme
└── gamification/
    ├── XPManager.kt          # 50 levels, 6 tiers
    ├── StreakManager.kt      # Daily streaks with grace periods
    ├── BossManager.kt        # 8 math boss battles
    ├── BadgeManager.kt       # 30 achievements
    └── ProblemBank.kt        # 40 problems (Grade 4-8)
```

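`StepParser.kt` turns the token stream into step cards, but its exact rules are not described here. The following Python sketch assumes one plausible rule: each newline-terminated line of model output is one step, and an unfinished line is held back until more tokens arrive or generation ends. `StepSplitter` and `split_stream` are hypothetical names, not the app's classes.

```python
from typing import Iterable, Iterator, List


class StepSplitter:
    """Turn streamed text chunks into complete steps (assumption: one step per line)."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> List[str]:
        """Add one token's text; return any non-empty steps completed by a newline."""
        self._buffer += chunk
        # The last piece has no trailing newline yet, so keep it buffered.
        *complete, self._buffer = self._buffer.split("\n")
        return [line.strip() for line in complete if line.strip()]

    def finish(self) -> List[str]:
        """Flush whatever is left when generation ends."""
        rest, self._buffer = self._buffer.strip(), ""
        return [rest] if rest else []


def split_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield steps from an iterable of streamed text chunks."""
    splitter = StepSplitter()
    for chunk in chunks:
        yield from splitter.feed(chunk)
    yield from splitter.finish()
```

Buffering partial lines means a card is rendered only once its text is complete, at the cost of showing each step slightly later than the raw token stream.
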
## Tested Hardware

| Device | RAM | Model | Result |
|--------|-----|-------|--------|
| Samsung Galaxy S23 | 8 GB | Qwen2.5-Math-1.5B (8da4w) | 10+ turns, 28 tok/s |
| Samsung Galaxy S23 | 8 GB | Qwen3-0.6B (fp16) | 10+ turns, 12 tok/s |
| Samsung Galaxy S23 | 8 GB | Qwen3-1.7B (fp16) + lazy KV cache | Runs with lazy KV cache allocation |

## License

BSD-style license. See the LICENSE file in the root directory.