Commit f019b92

Author: ssjia
Update on "[ET-VK] Add symint infrastructure to VulkanBackend and ComputeGraph"
Extend the Vulkan backend runtime infrastructure to better support symbolic integer (symint) arguments. This is a prerequisite for operators that need to handle dynamic shapes via symint values.

Changes:
- VulkanBackend.cpp: Compute output offset from end of args instead of assuming outputs follow inputs directly. Add scalar-to-tensor input handling so that Int/Bool EValues can populate tensor inputs. Support symint inputs provided as raw Int EValues (not just scalar tensors). Add symint output handling to write values back as tensor or Int EValue.
- ComputeGraph.h: Add SymInt case to extract_scalar<T>() so operators can transparently read symint values as scalars.
- ComputeGraph.cpp: Add Int fallback in read_symint() so values stored as plain Int (rather than SymInt objects) can be read uniformly.

Differential Revision: [D95970167](https://our.internmc.facebook.com/intern/diff/D95970167/)

cc manuelcandales digantdesai cbilgin

[ghstack-poisoned]
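Two of the runtime changes are easy to misread in prose: locating outputs by counting back from the end of `args`, and reading a symint that may be stored either as a SymInt object or as a plain Int. The Python below is a minimal sketch of both ideas under stated assumptions. The real changes are C++ in `VulkanBackend.cpp` and `ComputeGraph.cpp`; the `SymInt` dataclass, the `output_offset()` helper, and the list-based `args` are hypothetical stand-ins, not ExecuTorch APIs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SymInt:
    """Hypothetical stand-in for a SymInt object owned by the compute graph."""
    value: int


def read_symint(value: object) -> int:
    """Return a symint's value, accepting either a SymInt object or a plain int."""
    if isinstance(value, SymInt):
        return value.value
    # Fallback: a plain Int is read the same way. bool subclasses int in
    # Python, so reject it explicitly to keep Int and Bool distinct.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    raise TypeError(f"expected SymInt or int, got {type(value).__name__}")


def output_offset(args: Sequence[object], num_outputs: int) -> int:
    """Index of the first output, counted back from the end of args.

    Does not depend on how many entries precede the outputs, unlike
    an offset derived from the number of inputs.
    """
    if not 0 <= num_outputs <= len(args):
        raise ValueError("num_outputs must be between 0 and len(args)")
    return len(args) - num_outputs


# Usage: one tensor input, one symint stored as SymInt, one stored as a
# plain int, followed by a single output slot (all placeholder values).
args = ["input_tensor", SymInt(7), 3, "output_tensor"]
first_out = output_offset(args, num_outputs=1)
print(first_out, read_symint(args[1]), read_symint(args[2]))  # 3 7 3
```

Counting from the end keeps the output index correct regardless of how the leading entries are laid out, which appears to be the motivation for dropping the "outputs follow inputs directly" assumption.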
2 parents: 3070b7a + 2812ecc

200 files changed

Lines changed: 8780 additions & 3345 deletions


Some content is hidden: large commits have part of their diff hidden by default.

.ci/scripts/test_wheel_package_qnn.sh

Lines changed: 4 additions & 1 deletion
```diff
@@ -18,6 +18,9 @@ import argparse
 
 import torch
 from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer
+from executorch.backends.qualcomm.serialization.qc_schema import (
+    QnnExecuTorchBackendType,
+)
 from executorch.backends.qualcomm.utils.utils import (
     generate_htp_compiler_spec,
     generate_qnn_executorch_compiler_spec,
@@ -50,7 +53,7 @@ def main() -> None:
     example_inputs = model.get_example_inputs()
 
     if args.quantization:
-        quantizer = QnnQuantizer()
+        quantizer = QnnQuantizer(backend=QnnExecuTorchBackendType.kHtpBackend, soc_model=get_soc_to_chipset_map()[args.soc])
         m = torch.export.export(model.eval(), example_inputs, strict=True).module()
         if args.quantization == "qat":
             m = prepare_qat_pt2e(m, quantizer)
```

.claude/skills/building/SKILL.md

Lines changed: 211 additions & 11 deletions
````diff
@@ -1,23 +1,223 @@
 ---
 name: building
-description: Build ExecuTorch runners or C++ libraries. Use when compiling runners for Llama, Whisper, or other models, or building the C++ runtime.
+description: Build ExecuTorch from source — Python package, C++ runtime, runners, cross-compilation, and backend-specific builds. Use when compiling anything in the ExecuTorch repo, diagnosing build failures, or setting up platform-specific builds.
 ---
 
-# Building
+# Building ExecuTorch
 
-## Runners (Makefile)
+## Step 1: Ensure Python environment (detect and fix automatically)
+
+**Path A — conda (preferred):**
+```bash
+# Initialize conda for non-interactive shells (required in Claude Code / CI)
+eval "$(conda shell.bash hook 2>/dev/null)"
+
+# Check if executorch conda env exists; create if not
+conda env list 2>/dev/null | grep executorch || \
+ls "$(conda info --base 2>/dev/null)/envs/" 2>/dev/null | grep executorch || \
+conda create -yn executorch python=3.12
+
+# Activate
+conda activate executorch
+```
+
+**Path B — no conda (fall back to venv):**
+```bash
+# Find a compatible Python (3.10–3.13). On macOS with only Homebrew Python 3.14+,
+# install a compatible version first: brew install python@3.12
+python3.12 -m venv .executorch-venv # or python3.11, python3.10, python3.13
+source .executorch-venv/bin/activate
+pip install --upgrade pip
+```
+
+**Then verify (either path):**
+
+Run `python --version` and `cmake --version`. Fix automatically:
+- **Python not 3.10–3.13**: recreate the env with a correct Python version.
+- **cmake missing or < 3.24**: run `pip install 'cmake>=3.24'` inside the env.
+- **cmake >= 4.0**: works in practice, no action needed.
+
+Parallel jobs: `$(sysctl -n hw.ncpu)` on macOS, `$(nproc)` on Linux.
+
+## Step 2: Build
+
+Route based on what the user asks for:
+- User mentions **Android** → skip to [Cross-compilation: Android](#cross-compilation)
+- User mentions **iOS** or **frameworks** → skip to [Cross-compilation: iOS](#cross-compilation)
+- User mentions a **model name** (llama, whisper, etc.) → skip to [LLM / ASR model runner](#llm--asr-model-runner-simplest-path-for-running-models)
+- User mentions **C++ runtime** or **cmake** → skip to [C++ runtime](#c-runtime-standalone)
+- Otherwise → default to **Python package** below
+
+### Python package (default)
 ```bash
-make help # list all targets
-make llama-cpu # Llama
-make whisper-metal # Whisper on Metal
-make gemma3-cuda # Gemma3 on CUDA
+conda activate executorch
+./install_executorch.sh --editable # editable install from source
 ```
+This handles everything: submodules, deps, C++ build, Python install. Takes ~10 min on Apple Silicon.
+
+For subsequent rebuilds (deps already present): `pip install -e . --no-build-isolation`
+
+For minimal install (skip example deps): `./install_executorch.sh --minimal`
+
+Enable additional backends:
+```bash
+CMAKE_ARGS="-DEXECUTORCH_BUILD_COREML=ON -DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh --editable
+```
+
+Verify: `python -c "from executorch.exir import to_edge_transform_and_lower; print('OK')"`
+
+### LLM / ASR model runner (simplest path for running models)
+
+```bash
+conda activate executorch
+make <model>-<backend>
+```
+
+Available targets (run `make help` for full list):
+
+| Target | Backend | macOS | Linux |
+|--------|---------|-------|-------|
+| `llama-cpu` | CPU | yes | yes |
+| `llama-cuda` | CUDA || yes |
+| `llama-cuda-debug` | CUDA (debug) || yes |
+| `llava-cpu` | CPU | yes | yes |
+| `whisper-cpu` | CPU | yes | yes |
+| `whisper-metal` | Metal | yes ||
+| `whisper-cuda` | CUDA || yes |
+| `parakeet-cpu` | CPU | yes | yes |
+| `parakeet-metal` | Metal | yes ||
+| `parakeet-cuda` | CUDA || yes |
+| `voxtral-cpu` | CPU | yes | yes |
+| `voxtral-cuda` | CUDA || yes |
+| `voxtral-metal` | Metal | yes ||
+| `voxtral_realtime-cpu` | CPU | yes | yes |
+| `voxtral_realtime-cuda` | CUDA || yes |
+| `voxtral_realtime-metal` | Metal | yes ||
+| `gemma3-cpu` | CPU | yes | yes |
+| `gemma3-cuda` | CUDA || yes |
+| `sortformer-cpu` | CPU | yes | yes |
+| `sortformer-cuda` | CUDA || yes |
+| `silero-vad-cpu` | CPU | yes | yes |
+| `clean` || yes | yes |
 
 Output: `cmake-out/examples/models/<model>/<runner>`
 
-## C++ Libraries (CMake)
+### C++ runtime (standalone)
+
+**With presets (recommended):**
+
+| Platform | Command |
+|----------|---------|
+| macOS | `cmake -B cmake-out --preset macos` (uses Xcode generator — requires Xcode) |
+| Linux | `cmake -B cmake-out --preset linux -DCMAKE_BUILD_TYPE=Release` |
+| Windows | `cmake -B cmake-out --preset windows -T ClangCL` |
+
+Then: `cmake --build cmake-out --config Release -j$(sysctl -n hw.ncpu)` (macOS) or `cmake --build cmake-out -j$(nproc)` (Linux)
+
+**LLM libraries via workflow presets** (configure + build + install in one command):
+```bash
+cmake --workflow --preset llm-release # CPU
+cmake --workflow --preset llm-release-metal # Metal (macOS)
+cmake --workflow --preset llm-release-cuda # CUDA (Linux/Windows)
+```
+
+**Manual CMake (custom flags):**
+```bash
+cmake -B cmake-out \
+-DCMAKE_BUILD_TYPE=Release \
+-DEXECUTORCH_BUILD_XNNPACK=ON \
+-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
+-DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
+-DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \
+-DEXECUTORCH_BUILD_EXTENSION_NAMED_DATA_MAP=ON \
+-DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
+-DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON
+cmake --build cmake-out --parallel "$(nproc 2>/dev/null || sysctl -n hw.ncpu)"
+```
+
+Run `cmake --list-presets` to see all available presets.
+
+### Cross-compilation
+
+**iOS/macOS frameworks:**
+```bash
+./scripts/build_apple_frameworks.sh --coreml --mps --xnnpack
+```
+Link in Xcode with `-all_load` linker flag.
+
+**Android:**
+
+Requires `ANDROID_NDK` on PATH (typically set by Android Studio or standalone NDK install).
 ```bash
-cmake --list-presets # list presets
-cmake --workflow --preset llm-release # LLM CPU
-cmake --workflow --preset llm-release-metal # LLM Metal
+# Verify NDK is available
+echo $ANDROID_NDK # must point to NDK root, e.g. ~/Library/Android/sdk/ndk/<version>
+export ANDROID_ABIS=arm64-v8a BUILD_AAR_DIR=aar-out
+mkdir -p $BUILD_AAR_DIR && sh scripts/build_android_library.sh
 ```
+
+## Key build options
+
+Most commonly needed flags (full list: `CMakeLists.txt`):
+
+| Flag | What it enables |
+|------|-----------------|
+| `EXECUTORCH_BUILD_XNNPACK` | XNNPACK CPU backend |
+| `EXECUTORCH_BUILD_COREML` | Core ML (macOS/iOS) |
+| `EXECUTORCH_BUILD_MPS` | MPS GPU (macOS/iOS) |
+| `EXECUTORCH_BUILD_METAL` | Metal compute (macOS, requires EXTENSION_TENSOR) |
+| `EXECUTORCH_BUILD_CUDA` | CUDA GPU (Linux/Windows, requires EXTENSION_TENSOR) |
+| `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` | Optimized kernels |
+| `EXECUTORCH_BUILD_KERNELS_QUANTIZED` | Quantized kernels |
+| `EXECUTORCH_BUILD_EXTENSION_MODULE` | Module extension (requires DATA_LOADER + FLAT_TENSOR + NAMED_DATA_MAP) |
+| `EXECUTORCH_BUILD_EXTENSION_LLM` | LLM extension |
+| `EXECUTORCH_BUILD_TESTS` | Unit tests (`ctest --test-dir cmake-out --output-on-failure`) |
+| `EXECUTORCH_BUILD_DEVTOOLS` | DevTools (Inspector, ETDump) |
+| `EXECUTORCH_OPTIMIZE_SIZE` | Size-optimized build (`-Os`, no exceptions/RTTI) |
+| `CMAKE_BUILD_TYPE` | `Release` or `Debug` (5-10x slower). Some presets (e.g. `llm-release`) set this; others require it explicitly. |
+
+## Troubleshooting
+
+| Symptom | Fix |
+|---------|-----|
+| Missing headers / `CMakeLists.txt not found` in third-party | `git submodule sync --recursive && git submodule update --init --recursive` |
+| Mysterious failures after `git pull` or branch switch | `rm -rf cmake-out/ pip-out/ && git submodule sync && git submodule update --init --recursive` |
+| `conda env list` PermissionError | Use `CONDA_NO_PLUGINS=true conda env list` or check env dir directly |
+| CMake >= 4.0 | Works in practice despite `< 4.0` in docs; only fix if build actually fails |
+| `externally-managed-environment` / PEP 668 error | You're using system Python, not conda. Activate conda env first. |
+| pip conflicts with torch versions | Fresh conda env; or `./install_executorch.sh --use-pt-pinned-commit` |
+| Missing `Python.h` (Linux) | `sudo apt install python3.X-dev` |
+| Missing operator registrations at runtime | Link kernel libs with `-Wl,-force_load,<lib>` (macOS) or `-Wl,--whole-archive <lib> -Wl,--no-whole-archive` (Linux) |
+| `install_executorch.sh` fails on Intel Mac | No prebuilt PyTorch wheels; use `--use-pt-pinned-commit --minimal` |
+| XNNPACK build errors about cpuinfo/pthreadpool | Ensure `EXECUTORCH_BUILD_CPUINFO=ON` and `EXECUTORCH_BUILD_PTHREADPOOL=ON` (both ON by default) |
+| Duplicate kernel registration abort | Only link one `gen_operators_lib` per target |
+
+## Build output
+
+**From `./install_executorch.sh` (Python package):**
+
+| Artifact | Location |
+|----------|----------|
+| Python package | `site-packages/executorch` |
+
+**From CMake builds** (`cmake --install` with `CMAKE_INSTALL_PREFIX=cmake-out`):
+
+| Artifact | Location |
+|----------|----------|
+| Core runtime | `cmake-out/lib/libexecutorch.a` |
+| XNNPACK backend | `cmake-out/lib/libxnnpack_backend.a` |
+| executor_runner | `cmake-out/executor_runner` (Ninja/Make) or `cmake-out/Release/executor_runner` (Xcode) |
+| Model runners | `cmake-out/examples/models/<model>/<runner>` |
+
+**From cross-compilation:**
+
+| Artifact | Location |
+|----------|----------|
+| iOS frameworks | `cmake-out/*.xcframework` |
+| Android AAR | `aar-out/` |
+
+## Tips
+- Always use `Release` for benchmarking; `Debug` is 5–10x slower
+- `ccache` is auto-detected if installed (`brew install ccache`)
+- `Ninja` is faster than Make (`-G Ninja`) — but `--preset macos` uses Xcode generator
+- For LLM workflows, `make <model>-<backend>` is the simplest path
+- After `git pull`, clean and re-init submodules before rebuilding
````
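The skill's "Then verify" rules (Python 3.10 to 3.13, cmake at least 3.24, cmake 4.x tolerated) are mechanical enough to state as code. This is a minimal Python sketch, assuming the inputs are the raw outputs of `python --version` and `cmake --version`; `parse_version()` and `toolchain_fixes()` are hypothetical helpers, not part of ExecuTorch or of the skill itself.

```python
import re
from typing import List, Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Extract the first dotted version number, e.g. 'Python 3.12.4' -> (3, 12, 4)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def toolchain_fixes(python: Version, cmake: Optional[Version]) -> List[str]:
    """Apply the verify rules; cmake=None means cmake is not installed."""
    fixes = []
    if not ((3, 10) <= python[:2] <= (3, 13)):
        fixes.append("recreate the env with Python 3.10-3.13")
    if cmake is None or cmake < (3, 24):
        fixes.append("pip install 'cmake>=3.24'")
    # cmake >= 4.0 is described as working in practice, so it is not flagged.
    return fixes


print(toolchain_fixes(parse_version("Python 3.14.0"), parse_version("cmake version 4.0.3")))
# ['recreate the env with Python 3.10-3.13']
```

Using a regex instead of splitting on whitespace matters for `cmake --version`, whose output continues with a Kitware banner line after the version.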

.github/workflows/pull.yml

Lines changed: 4 additions & 2 deletions
```diff
@@ -1057,7 +1057,8 @@ jobs:
 
   test-samsung-quantmodels-linux:
     name: test-samsung-quantmodels-linux
-    # if: github.event.pull_request.head.repo.full_name == github.repository || github.event_name != 'pull_request'
+    # Skip this job if the pull request is from a fork (secrets are not available)
+    if: github.event.pull_request.head.repo.full_name == github.repository || github.event_name != 'pull_request'
     uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
     permissions:
       id-token: write
@@ -1094,7 +1095,8 @@ jobs:
 
   test-samsung-models-linux:
     name: test-samsung-models-linux
-    # if: github.event.pull_request.head.repo.full_name == github.repository || github.event_name != 'pull_request'
+    # Skip this job if the pull request is from a fork (secrets are not available)
+    if: github.event.pull_request.head.repo.full_name == github.repository || github.event_name != 'pull_request'
     uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
     permissions:
       id-token: write
```
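The re-enabled `if:` guard runs both Samsung jobs for non-PR events and for pull requests opened from the main repository, and skips pull requests from forks, where secrets are unavailable. A minimal Python sketch of the expression's truth table follows; the function name, the use of `None` for the absent `pull_request` payload on non-PR events, and the fork name `contributor/executorch` are illustrative assumptions.

```python
from typing import Optional


def samsung_job_runs(event_name: str, repository: str, head_repo: Optional[str]) -> bool:
    """Mirror the job's `if:` expression.

    head_repo is the PR head repository's full name; it is None for non-PR
    events, where github.event.pull_request is absent.
    """
    return head_repo == repository or event_name != "pull_request"


for case in [
    ("push", "pytorch/executorch", None),                              # runs
    ("pull_request", "pytorch/executorch", "pytorch/executorch"),      # runs
    ("pull_request", "pytorch/executorch", "contributor/executorch"),  # skipped (fork)
]:
    print(case[0], case[2], samsung_job_runs(*case))
```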

backends/arm/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -29,7 +29,7 @@ set(ETHOSU_LINUX_DRIVER_GIT_REPO
   CACHE STRING "Git repository that hosts the Ethos-U Linux driver stack"
 )
 set(ETHOSU_LINUX_DRIVER_GIT_TAG
-  "25.11"
+  "26.02"
   CACHE STRING
   "Git tag/branch/commit used to fetch the Ethos-U Linux driver stack"
 )
```

backends/arm/MODELS.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -1,4 +1,5 @@
-# The following file contains all models that have been confirmed to be functional and tested for the Arm backend :
+<!-- Copyright 2025-2026 Arm Limited and/or its affiliates. -->
+# The following file contains all models that have been confirmed to be functional and tested for the Arm backend:
 - Conformer
 - Deit Tiny
 - DeepLab v3 (DL3)
@@ -12,6 +13,7 @@
 - Some popular torch.nn.modules models (NN modules)
 - Some popular torch ops (Torch Functions)
 - Neural Super Sampler (NSS)
+- Phi-3
 - ResNet 18
 - Wav2Letter (W2L)
 - Stable Diffusion:
```

backends/arm/_passes/__init__.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -135,6 +135,8 @@
 from .rewrite_index_put_pass import RewriteIndexPutPass # noqa
 from .rewrite_le_lt_to_ge_gt_pass import RewriteLeLtToGeGtPass # noqa
 from .rewrite_matmul import RewriteMatmulPass # noqa
+from .rewrite_pad import RewritePadPass # noqa
+from .rewrite_slice import RewriteSlicePass # noqa
 from .rewrite_upsample import RewriteUpsamplePass # noqa
 from .scalars_to_attribute_pass import ScalarsToAttributePass # noqa
 from .size_adjust_input_pass import SizeAdjustInputPass # noqa
```

backends/arm/_passes/arm_pass.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -107,7 +107,7 @@ def call_submodule(
         return result
 
     def call_shape_operator(
-        self, op, args: tuple, kwargs: dict, meta: NodeMetadata, update: bool
+        self, op, args: tuple, kwargs: dict, meta: NodeMetadata, updated: bool = True
     ) -> ProxyValue:
         """Call operator for shape-producing operators.
 
@@ -123,4 +123,4 @@ def call_shape_operator(
         shape_meta.data = dict(meta.data)
         shape_meta.data[TosaSpecialDtype.meta_key()] = TosaSpecialDtype.SHAPE
         # Call the super (ArmPass) call operator with updated meta
-        return self.call_operator(op, args, kwargs, shape_meta, update)
+        return self.call_operator(op, args, kwargs, shape_meta, updated)
```
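After this change, callers of `call_shape_operator` may omit `updated`, which now defaults to `True` and is forwarded to `call_operator`. The hunk also copies `meta.data` before tagging it, so the caller's metadata dict is not mutated. The Python below is a sketch of that behavior only: `Meta`, `SketchPass`, `SHAPE_KEY`, and `SHAPE` are hypothetical stand-ins for `NodeMetadata`, `ArmPass`, `TosaSpecialDtype.meta_key()`, and `TosaSpecialDtype.SHAPE`, and `call_operator` here just echoes its arguments.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

SHAPE_KEY = "tosa_special_dtype"  # hypothetical stand-in for TosaSpecialDtype.meta_key()
SHAPE = "SHAPE"                   # hypothetical stand-in for TosaSpecialDtype.SHAPE


@dataclass
class Meta:
    """Hypothetical stand-in for NodeMetadata; only the .data dict matters here."""
    data: Dict[str, Any] = field(default_factory=dict)


class SketchPass:
    def call_operator(self, op, args, kwargs, meta, updated):
        # Echo what the base-class call would receive.
        return (op, args, kwargs, meta, updated)

    def call_shape_operator(self, op, args, kwargs, meta, updated=True):
        shape_meta = Meta()
        shape_meta.data = dict(meta.data)   # shallow copy: caller's dict is untouched
        shape_meta.data[SHAPE_KEY] = SHAPE  # tag the result as a shape value
        return self.call_operator(op, args, kwargs, shape_meta, updated)


original = Meta({"val": 1})
_, _, _, tagged, flag = SketchPass().call_shape_operator("op", (), {}, original)
print(tagged.data, original.data, flag)
# {'val': 1, 'tosa_special_dtype': 'SHAPE'} {'val': 1} True
```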

backends/arm/_passes/arm_pass_manager.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -120,6 +120,8 @@
     RewriteIndexPutPass,
     RewriteLeLtToGeGtPass,
     RewriteMatmulPass,
+    RewritePadPass,
+    RewriteSlicePass,
     RewriteUpsamplePass,
     ScalarsToAttributePass,
     SizeAdjustInputPass,
@@ -372,6 +374,8 @@ def _tosa_pipeline(
                 RewriteUpsamplePass(),
                 RewriteConvPass(exported_program),
                 RewriteMatmulPass(),
+                RewritePadPass(),
+                RewriteSlicePass(),
             ]
         )
 
```
