Commit b3c2933
Update base for Update on "[ET Device Support] Parse device info from serialized tensor in tensor_parser"

Parse device info (device_type, device_index) from the serialized ExtraTensorInfo in .pte files into TensorImpl at runtime. When a tensor's extra_tensor_info contains device annotations (e.g., CUDA), the tensor parser now reads them and propagates them to the TensorImpl constructor. Tensors without extra_tensor_info default to CPU/0 for backward compatibility with older PTE files.

Differential Revision: [D97199497](https://our.internmc.facebook.com/intern/diff/D97199497/)

[ghstack-poisoned]
2 parents fd3ae83 + 81bc830 commit b3c2933
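In outline, the parsing behavior the commit message describes is: read `(device_type, device_index)` when `extra_tensor_info` carries them, otherwise fall back to CPU/0. A minimal Python sketch of that logic (the names here are hypothetical stand-ins; the actual change lives in the C++ tensor_parser and the flatbuffer-generated ExtraTensorInfo type):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the serialized schema; the real type is
# generated from the .pte flatbuffer schema.
@dataclass
class ExtraTensorInfo:
    device_type: int   # e.g. 0 = CPU, 1 = CUDA
    device_index: int  # ordinal within the device type

def parse_device_info(extra: Optional[ExtraTensorInfo]) -> tuple[int, int]:
    """Propagate device annotations if present; default to CPU/0 otherwise."""
    if extra is not None:
        return (extra.device_type, extra.device_index)
    return (0, 0)  # older PTE files without extra_tensor_info: CPU, index 0

print(parse_device_info(None))                   # legacy PTE -> CPU default
print(parse_device_info(ExtraTensorInfo(1, 2)))  # annotated -> e.g. CUDA:2
```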

248 files changed

Lines changed: 16533 additions & 4026 deletions


.ci/scripts/test_qnn_static_llm.sh

Lines changed: 3 additions & 3 deletions
```diff
@@ -47,11 +47,11 @@ if [[ "${TASK_NAME}" == "stories_110m" ]]; then
   $PYTHON_EXECUTABLE -m pytorch_tokenizers.tools.llama2c.convert -t tokenizer.model -o tokenizer.bin

   # Compile only as weight sharing is not applicable on x86.
-  $PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_110m --model SM8650 --build_folder build-android/ --executorch_root . --artifact_dir ./stories_110m_pte_size --llama_artifacts . --compile_only
+  $PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_110m --soc_model SM8650 --build_folder build-android/ --executorch_root . --artifact_dir ./stories_110m_pte_size --llama_artifacts . --compile_only
   exit_code1=$?

   # Checks accuracy with weight sharing disabled since x86 does not support weight sharing.
-  $PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_110m --model SM8650 --build_folder build-x86/ --executorch_root . --artifact_dir ./stories_110m_accuracy --llama_artifacts . --enable_x86_64
+  $PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_110m --soc_model SM8650 --build_folder build-x86/ --executorch_root . --artifact_dir ./stories_110m_accuracy --llama_artifacts . --enable_x86_64
   exit_code2=$?

   # Check the exit codes and print messages
@@ -84,7 +84,7 @@ elif [[ "${TASK_NAME}" == "smollm2_135m" ]]; then
   if [ -n "$2" ]; then
     EXTRA_FLAGS="$EXTRA_FLAGS --static_llm_eval_method $2"
   fi
-  $PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_static_llm_model --model_name smollm2_135m --model SM8650 --build_folder build-x86/ --executorch_root . --artifact_dir ./static_smollm2 --enable_x86_64 $EXTRA_FLAGS
+  $PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_static_llm_model --model_name smollm2_135m --soc_model SM8650 --build_folder build-x86/ --executorch_root . --artifact_dir ./static_smollm2 --enable_x86_64 $EXTRA_FLAGS
   exit_code1=$?

   if [ $exit_code1 -ne 0 ]; then
     exit 1
```

.claude/skills/qualcomm/SKILL.md

Lines changed: 98 additions & 0 deletions
---
name: qualcomm
description: Build, test, or develop the QNN (Qualcomm AI Engine Direct) backend. Use when working on backends/qualcomm/, building QNN (use backends/qualcomm/scripts/build.sh), adding new ops or passes, running QNN delegate tests, or exporting models for Qualcomm HTP/GPU targets.
---

# QNN (Qualcomm AI Engine Direct) Backend

## Advanced Topics

When the user's request falls into one of these areas, read the corresponding file before proceeding:

| Topic | File | When to read |
|---|---|---|
| Export / lowering / quantization options / pass pipelines | `lowering_export.md` | User asks about exporting, lowering, quantization config, QuantDtype, QuantRecipe, pass pipelines |
| New op development | `new_op_development.md` | User asks to add/implement a new op or op builder |
| Model enablement | `model_enablement.md` | User asks to enable a new model end-to-end |
| Profiling & debugging | `profiling.md` | User asks about profiling, optrace, QHAS, QAIRT Visualizer *(file TBD)* |

## Building

Use `backends/qualcomm/scripts/build.sh`. Linux only (macOS not supported).

**Environment variables:**

- `QNN_SDK_ROOT` — path to QNN SDK (auto-downloaded if not set)
- `ANDROID_NDK_ROOT` — path to Android NDK (auto-downloaded if not set)

**Build targets:**

| Target | Default | Build dir |
|---|---|---|
| x86_64 (Python interface + host tools) | enabled | `build-x86/` |
| Android arm64-v8a (device runner) | enabled | `build-android/` |
| Hexagon DSP (direct mode) | disabled | `build-hexagon/` |
| OE Linux embedded | disabled | `build-oe-linux/` |

**Common build commands:**

```bash
# Full build (x86_64 + Android)
./backends/qualcomm/scripts/build.sh

# x86_64 only (faster, for Python interface development)
./backends/qualcomm/scripts/build.sh --skip_linux_android

# Android only (skip x86_64)
./backends/qualcomm/scripts/build.sh --skip_x86_64

# Incremental build (skip clean)
./backends/qualcomm/scripts/build.sh --no_clean

# Enable Hexagon DSP direct mode (requires HEXAGON_SDK_ROOT, HEXAGON_TOOLS_ROOT, DSP_VERSION)
./backends/qualcomm/scripts/build.sh --enable_hexagon

# OE Linux embedded target (requires TOOLCHAIN_ROOT_HOST, TOOLCHAIN_ROOT_TARGET)
./backends/qualcomm/scripts/build.sh --enable_linux_embedded

# Release build
./backends/qualcomm/scripts/build.sh --release

# Control parallelism
./backends/qualcomm/scripts/build.sh --job_number 8
```

**After x86_64 build**, the Python interface `.so` files are copied to `backends/qualcomm/python/` automatically.

## Testing

```bash
QNN_SDK_ROOT=/path/to/qnn_sdk \
ANDROID_NDK_ROOT=/path/to/android_ndk \
LD_LIBRARY_PATH=/path/to/executorch/build-x86/lib:/path/to/qnn_sdk/lib/x86_64-linux-clang \
PYTHONPATH=$(dirname $EXECUTORCH_ROOT) \
python backends/qualcomm/tests/test_qnn_delegate.py \
  TestQNNFloatingPointOperator.test_qnn_backend_abs \
  -H $HOST -s $DEVICE_SERIAL -m SM8850 -b build-android -a /path/to/artifacts
```

> **Note (build from source):** Set `PYTHONPATH` to the parent directory of the executorch repo root. Required because `executorch.examples.qualcomm` lives in the source tree and is not installed into site-packages.

Required flags: `-m` (SoC model), `-b` (Android build dir). Optional: `-s` (device serial), `-H` (host), `-a` (artifact dir), `-c` (compile only), `-x` (run on x86_64).

**Test classes:**

| Class | Description |
|---|---|
| `TestQNNFloatingPointOperator` | FP16 operator tests |
| `TestQNNQuantizedOperator` | Quantized operator tests |
| `TestQNNFloatingPointModel` | FP16 model-level tests |
| `TestQNNQuantizedModel` | Quantized model-level tests |
| `TestQNNFloatingPointUtils` | FP16 utility tests |
| `TestQNNQuantizedUtils` | Quantized utility tests |
| `TestExampleLLMScript` | LLM script tests |
| `TestExampleMultimodalityScript` | Multimodality script tests |
| `TestExampleOssScript` | OSS model script tests |
| `TestExampleQaihubScript` | QAI Hub script tests |
| `TestExampleScript` | General example script tests |
| `TestUtilsScript` | Utility script tests |
.claude/skills/qualcomm/lowering_export.md

Lines changed: 141 additions & 0 deletions

# QNN Lowering / Export

## Common Setup

```python
from executorch.backends.qualcomm.serialization.qc_schema import QnnExecuTorchBackendType
from executorch.backends.qualcomm.utils.utils import (
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
    get_soc_to_chipset_map,
    to_edge_transform_and_lower_to_qnn,
)

soc_model = get_soc_to_chipset_map()["SM8650"]  # adjust SoC as needed
```

---

## FP16 Export

```python
backend_options = generate_htp_compiler_spec(use_fp16=True)
compiler_specs = generate_qnn_executorch_compiler_spec(
    soc_model=soc_model,
    backend_options=backend_options,
)
edge_prog_mgr = to_edge_transform_and_lower_to_qnn(model, example_inputs, compiler_specs)
et_program = edge_prog_mgr.to_executorch()
```

---

## Quantized (PTQ) Export

```python
import torch
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e

from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer

# 1. Export to ATen IR
m = torch.export.export(model.eval(), example_inputs, strict=True).module()

# 2. Prepare for quantization
quantizer = QnnQuantizer(
    backend=QnnExecuTorchBackendType.kHtpBackend,
    soc_model=soc_model,
)
m = prepare_pt2e(m, quantizer)

# 3. Calibrate
m(*example_inputs)

# 4. Convert
m = convert_pt2e(m)

# 5. Lower to QNN
backend_options = generate_htp_compiler_spec(use_fp16=False)
compiler_specs = generate_qnn_executorch_compiler_spec(
    soc_model=soc_model,
    backend_options=backend_options,
)
edge_prog_mgr = to_edge_transform_and_lower_to_qnn(m, example_inputs, compiler_specs)
et_program = edge_prog_mgr.to_executorch()
```

---

## Quantized (QAT) Export

Same as PTQ, but use `prepare_qat_pt2e` and run a training loop instead of calibration:

```python
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_qat_pt2e

m = prepare_qat_pt2e(m, quantizer)
# training loop
m(*example_inputs)
m = convert_pt2e(m)
# ... same lowering steps as PTQ
```
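The `# training loop` placeholder above stands for an ordinary fine-tuning loop over the prepared module. A minimal sketch, using a plain `nn.Linear` as a stand-in for the module returned by `prepare_qat_pt2e` (in the real flow, fake-quant observers run inside the forward pass during these steps):

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the prepared module: it is called and optimized
# exactly like any other nn.Module.
m = nn.Linear(8, 1)
opt = torch.optim.SGD(m.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(32, 8)
y = x.sum(dim=1, keepdim=True)  # toy regression target

for _ in range(50):  # a handful of fine-tuning steps
    opt.zero_grad()
    loss = loss_fn(m(x), y)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")
```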
---

## Quantization Options

| QuantDtype | Activation | Weight |
|---|---|---|
| `use_16a16w` | uint16 | int16 |
| `use_16a8w` | uint16 | int8 |
| `use_16a4w` | uint16 | int4 |
| `use_16a4w_block` | uint16 | int4 (block-wise) |
| `use_8a8w` | uint8 | int8 |
| `use_8a4w` | uint8 | int4 |

**Fine-grained control with QuantRecipe:**

```python
import torch

from executorch.backends.qualcomm.quantizer.quant_recipe import QuantGranularity, QuantRecipe
from executorch.backends.qualcomm.quantizer.quantizer import QuantDtype

recipe = QuantRecipe(quant_dtype=QuantDtype.use_8a8w, is_qat=False)
recipe.add_node_target(targets={torch.ops.aten.linear.default}, quant_dtype=QuantDtype.use_16a8w)
recipe.add_regex(regex={"layers.[0-3].attention"}, quant_dtype=QuantDtype.use_16a4w)
```

---

## Pass Pipelines (QnnPassManager)

| Pipeline | When Called | Key Passes |
|---|---|---|
| `transform_for_annotation_pipeline` | Before `prepare_pt2e` (called internally by `QnnQuantizer`) | RemoveRedundancy, Decompose*, Recompose*, ReplaceInfValues |
| `transform_for_export_pipeline` | After `torch.export` | Decompose*, CanonicalizeConv, LiftConstantScalarOperands |
| `get_to_edge_transform_passes` | Before `to_edge` | AnnotateQuantAttrs, FoldQDQ, LayoutTransform, TagQuantIO, **ResolveDebugHandle (must be last)** |
| `transform_for_preprocess_pipeline` | Inside `QnnBackend.preprocess` | FoldQDQ(force_fold=True), InsertRequantize, InsertIOQDQ, LayoutTransform(insert_permute=True), FuseConsecutiveCast |

---

## Skipping Ops / Partial Delegation

```python
from executorch.backends.qualcomm.utils.utils import skip_annotation

# Skip specific node targets from being delegated
skip_annotation(model, skipped_ops={torch.ops.aten.add.Tensor})
```

---

## Dumping Context Binary

```python
from executorch.backends.qualcomm.utils.utils import dump_context_from_pte

dump_context_from_pte("model.pte", output_dir="./context_bins/")
```

---

## SoC Reference

See `_soc_info_table` in `backends/qualcomm/serialization/qc_schema.py`.
.claude/skills/qualcomm/model_enablement.md

Lines changed: 107 additions & 0 deletions

# Model Enablement

Checklist for enabling a new model end-to-end on the QNN backend.

---

## 1. Identify Unsupported Ops

Export the model and check which ops fall back to CPU:

```python
from executorch.backends.qualcomm.utils.utils import capture_program

prog = capture_program(model, example_inputs)
for node in prog.exported_program.graph.nodes:
    if node.op == "call_function":
        print(node.target.__name__)
```

Or run the full lowering and inspect the partition result — nodes outside the delegate are CPU fallbacks.

For each unsupported op, follow `new_op_development.md`.

---

## 2. Add Export Script

Place the script under `examples/qualcomm/scripts/<model_name>.py`. Use `build_executorch_binary` as the standard entry point:

```python
from executorch.examples.qualcomm.utils import build_executorch_binary

build_executorch_binary(
    model=model,
    inputs=example_inputs,
    soc_model=args.model,
    file_name=f"{args.artifact}/{pte_filename}",
    dataset=calibration_data,        # None for FP16
    quant_dtype=QuantDtype.use_8a8w, # omit for FP16
    online_prepare=args.online_prepare,
)
```

For models requiring custom runners, add under `examples/qualcomm/oss_scripts/`.

---

## 3. Verify Delegation

After lowering, confirm the graph is fully delegated:

```python
from executorch.backends.qualcomm.utils.utils import draw_graph

draw_graph("model_graph", prog.exported_program.graph)
```

Expected: all compute nodes inside a single `torch.ops.higher_order.executorch_call_delegate` node. Any remaining `call_function` nodes are CPU fallbacks — investigate and fix.
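The same check can be done programmatically. A small sketch with mock nodes (the helper and the mock objects are illustrative, not ExecuTorch API; on a real graph you would pass `prog.exported_program.graph.nodes`):

```python
from types import SimpleNamespace

DELEGATE_TARGET = "executorch_call_delegate"

def cpu_fallback_targets(nodes):
    """Return names of call_function nodes that are NOT the delegate call."""
    names = []
    for n in nodes:
        if n.op != "call_function":
            continue  # placeholders, outputs, etc. are not compute nodes
        name = getattr(n.target, "__name__", str(n.target))
        if name != DELEGATE_TARGET:
            names.append(name)
    return names

# Mock graph: one delegate call plus one op left on CPU.
nodes = [
    SimpleNamespace(op="placeholder", target="x"),
    SimpleNamespace(op="call_function", target=SimpleNamespace(__name__=DELEGATE_TARGET)),
    SimpleNamespace(op="call_function", target=SimpleNamespace(__name__="aten.add.Tensor")),
    SimpleNamespace(op="output", target="out"),
]

print(cpu_fallback_targets(nodes))  # ['aten.add.Tensor'] -> investigate this fallback
```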
---

## 4. Add Model-Level Tests

In `tests/test_qnn_delegate.py`, add to `TestQNNFloatingPointModel` and/or `TestQNNQuantizedModel`:

```python
def test_qnn_backend_my_model(self):
    # setup model and inputs
    module = MyModel()
    sample_input = (torch.randn(1, 3, 224, 224),)
    # lower and test
    self.lower_module_and_test_output(module, sample_input)
```

For script-based tests (with artifact dependencies), add to `TestExampleScript` or `TestExampleOssScript`.

---

## 5. Accuracy Validation

Run on device and compare outputs against CPU reference:

```python
import torch

cpu_output = model(*example_inputs)
qnn_output = ...  # load the output produced by device execution

torch.testing.assert_close(qnn_output, cpu_output, rtol=1e-2, atol=1e-2)
```

Typical tolerances:
- FP16: `rtol=1e-2, atol=1e-2`
- INT8 quantized: `rtol=1e-1, atol=1e-1` (accuracy depends on calibration quality)
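For reference, `assert_close` considers two elements equal when `|actual - expected| <= atol + rtol * |expected|`, so the tolerances above combine a relative band with a small absolute slack near zero. A pure-Python sketch of that criterion:

```python
def close(actual, expected, rtol, atol):
    # Element-wise criterion used by torch.testing.assert_close.
    return abs(actual - expected) <= atol + rtol * abs(expected)

# FP16-style tolerance: 1% relative plus 0.01 absolute slack.
print(close(1.005, 1.0, rtol=1e-2, atol=1e-2))  # True:  |diff| = 0.005 <= 0.02
print(close(1.5, 1.0, rtol=1e-2, atol=1e-2))    # False: |diff| = 0.5  >  0.02
```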
---

## 6. Common Issues

| Symptom | Likely Cause | Fix |
|---|---|---|
| Op falls back to CPU | Missing builder or annotation | Add op builder + quantizer annotation |
| Shape mismatch after layout transform | NHWC/NCHW confusion | Check `LayoutTransform` pass, verify `get_tensor` axis order |
| Quantization accuracy degraded | Poor calibration data | Use representative dataset; try per-channel quantization |
| `KeyError` in `node_visitors` | Builder not registered | Check `builders/__init__.py` import |
| Context binary compile failure | QNN op spec mismatch | Verify IO order and parameter names against `QnnOpDef.h` |
| `online_prepare` vs offline mismatch | Context binary format | Use `--online_prepare` for QAIRT Visualizer; offline for deployment |
