Commit b618756

Merge branch 'main' into remove_redundant_pass_insertions
2 parents 0fd46cc + b04cc65 commit b618756

235 files changed

Lines changed: 15863 additions & 2062 deletions

.github/scripts/docathon-label-sync.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ def main() -> None:
     token = os.environ.get("GITHUB_TOKEN")
 
     repo_owner = "pytorch"
-    repo_name = "pytorch"
+    repo_name = "executorch"
     pull_request_number = int(sys.argv[1])
 
     g = Github(token)

.github/workflows/cuda.yml

Lines changed: 4 additions & 0 deletions
@@ -148,6 +148,10 @@ jobs:
         # Run Qwen 3.5 MoE tests (quantize roundtrip + TurboQuant KV cache + sampler)
         python -m pytest examples/models/qwen3_5_moe/test_quantize_roundtrip.py examples/models/qwen3_5_moe/test_turboquant.py examples/models/qwen3_5_moe/test_sampler.py -v -o "addopts="
 
+        # Run Gemma 4 31B tests (quant unit tests + pipeline integration tests)
+        pip install gguf
+        python -m pytest examples/models/gemma4_31b/quant/tests/ examples/models/gemma4_31b/tests/ -v -o "addopts="
+
   export-model-cuda-artifact:
     name: export-model-cuda-artifact
     # Skip this job if the pull request is from a fork (HuggingFace secrets are not available)

CONTRIBUTING.md

Lines changed: 19 additions & 0 deletions
@@ -321,6 +321,25 @@ CI is run automatically on all pull requests. However, if you want to run tests
 - The `test/run_oss_cpp_tests.sh` script will build and run C++ tests locally
 - Running `pytest` from the root directory will run Python tests locally. Make sure to run this after finishing [Dev Install](#dev-install).
 
+To build C++ tests manually with CMake, run the following from the repository root:
+
+```bash
+cmake . -Bcmake-out -DCMAKE_INSTALL_PREFIX=cmake-out -DEXECUTORCH_BUILD_TESTS=ON
+cmake --build cmake-out -j9 --target install
+```
+
+You can then use `ctest` to list or run individual C++ tests directly:
+
+```bash
+ctest --test-dir cmake-out -N
+ctest --test-dir cmake-out -R <test_name_regex> --output-on-failure
+```
+
+This workflow is useful when you want to rerun one test, attach a debugger to a
+test binary under `cmake-out`, or keep a build directory around for quick rebuild
+cycles. Add the same `-DEXECUTORCH_BUILD_*` options used by
+`test/run_oss_cpp_tests.sh` when the test needs optional kernels or extensions.
+
 ### Writing Tests
 To help keep code quality high, ExecuTorch uses a combination of unit tests and
 end-to-end (e2e) tests. If you add a new feature or fix a bug, please add tests

Makefile

Lines changed: 11 additions & 1 deletion
@@ -91,7 +91,7 @@
 #
 # ==============================================================================
 
-.PHONY: voxtral-cuda voxtral-cpu voxtral-metal voxtral-mlx voxtral_realtime-cuda voxtral_realtime-cpu voxtral_realtime-metal voxtral_realtime-mlx voxtral_tts-cpu voxtral_tts-cuda whisper-cuda whisper-cuda-debug whisper-cpu whisper-metal parakeet-cuda parakeet-cuda-debug parakeet-cpu parakeet-metal parakeet-mlx parakeet-vulkan dinov2-cuda dinov2-cuda-debug sortformer-cuda sortformer-cpu silero-vad-cpu llama-cuda llama-cuda-debug llama-cpu llava-cpu gemma3-cuda gemma3-cpu qwen3_5_moe-cuda qwen3_5_moe-metal clean help
+.PHONY: voxtral-cuda voxtral-cpu voxtral-metal voxtral-mlx voxtral_realtime-cuda voxtral_realtime-cpu voxtral_realtime-metal voxtral_realtime-mlx voxtral_tts-cpu voxtral_tts-cuda whisper-cuda whisper-cuda-debug whisper-cpu whisper-metal parakeet-cuda parakeet-cuda-debug parakeet-cpu parakeet-metal parakeet-mlx parakeet-vulkan dinov2-cuda dinov2-cuda-debug sortformer-cuda sortformer-cpu silero-vad-cpu llama-cuda llama-cuda-debug llama-cpu llava-cpu gemma3-cuda gemma3-cpu gemma4_31b-cuda qwen3_5_moe-cuda qwen3_5_moe-metal clean help
 
 help:
 	@echo "This Makefile adds targets to build runners for various models on various backends. Run using \`make <target>\`. Available targets:"
@@ -126,6 +126,7 @@ help:
 	@echo " llava-cpu - Build Llava runner with CPU backend"
 	@echo " gemma3-cuda - Build Gemma3 runner with CUDA backend"
 	@echo " gemma3-cpu - Build Gemma3 runner with CPU backend"
+	@echo " gemma4_31b-cuda - Build Gemma 4 31B runner with CUDA backend"
 	@echo " qwen3_5_moe-cuda - Build Qwen3.5 MoE runner with CUDA backend"
 	@echo " qwen3_5_moe-metal - Build Qwen3.5 MoE runner with Metal backend"
 	@echo " clean - Clean build artifacts"
@@ -425,6 +426,15 @@ qwen3_5_moe-cuda:
 	@echo "✓ Build complete!"
 	@echo " Binary: cmake-out/examples/models/qwen3_5_moe/qwen3_5_moe_runner"
 
+gemma4_31b-cuda:
+	@echo "==> Building and installing ExecuTorch with CUDA..."
+	cmake --workflow --preset llm-release-cuda
+	@echo "==> Building Gemma 4 31B runner with CUDA..."
+	cd examples/models/gemma4_31b && cmake --workflow --preset gemma4-31b-cuda
+	@echo ""
+	@echo "✓ Build complete!"
+	@echo " Binary: cmake-out/examples/models/gemma4_31b/gemma4_31b_runner"
+
 qwen3_5_moe-metal:
 	@echo "==> Building and installing ExecuTorch with Metal..."
 	cmake --workflow --preset llm-release-metal

backends/arm/README.md

Lines changed: 8 additions & 3 deletions
@@ -249,10 +249,15 @@ Some tests, with `u55`, `u85` and `vgf` in the name require external dependencie
 ```
 
 In addition, some model tests in the Arm backend require third-party libraries or packages.
-To run these tests, you need to install the required dependencies by running the script `examples/arm/setup.sh` with the flag `--setup-test-dependency`.
+To run these tests, install the required dependencies directly:
 
-Please note that installing model test dependencies is a standalone process. When using the `--setup-test-dependency` flag,
-the script will install only the necessary dependencies for model tests, skipping all other setup procedures.
+```
+bash backends/arm/scripts/install_models_for_test.sh
+```
+
+Installing model test dependencies is a standalone process. The script installs
+only the dependencies needed for model tests, skipping all other setup
+procedures.
 
 ## Using git hooks
 

backends/arm/_passes/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -140,6 +140,9 @@
 from .remove_getitem_pass import RemoveGetItemPass # noqa
 from .remove_graph_asserts_pass import RemoveGraphAssertsPass # noqa
 from .remove_noop_pass import RemoveNoopPass # noqa
+from .remove_permutes_around_elementwise_tosa_ops import ( # noqa
+    RemovePermutesAroundElementwiseTosaOps,
+)
 from .replace_scalar_with_tensor_pass import ( # noqa
     ReplaceScalarWithTensorByProfilePass,
 )

backends/arm/_passes/arm_pass_manager.py

Lines changed: 3 additions & 5 deletions
@@ -125,6 +125,7 @@
     RemoveGetItemPass,
     RemoveGraphAssertsPass,
     RemoveNoopPass,
+    RemovePermutesAroundElementwiseTosaOps,
     ReplaceInfAndLimitValuesPass,
     ReplaceScalarWithTensorByProfilePass,
     RewriteAvgPool2dPass,
@@ -164,9 +165,6 @@
     PostponePermuteOpBelowSqueezeOrUnsqueezeLikeView,
 )
 
-from executorch.backends.transforms.remove_permutes_around_elementwise_ops import (
-    RemovePermutesAroundElementwiseOps,
-)
 from executorch.exir import ExportedProgram
 from executorch.exir.pass_base import ExportPass
 from executorch.exir.pass_manager import PassManager
@@ -523,6 +521,7 @@ def _tosa_pipeline(
                 DecomposeSumPass(),
                 InsertTableOpsPass(exported_program),
                 RemoveNoopPass(),
+                InsertDataLayoutCastsPass(),
             ]
         )
 
@@ -535,7 +534,7 @@ def _tosa_pipeline(
                 RewriteMatmulPass(),
                 RewritePadPass(),
                 FuseViewCopyTransformPass(),
-                RemovePermutesAroundElementwiseOps(),
+                RemovePermutesAroundElementwiseTosaOps(),
                 PostponePermuteOpBelowSqueezeOrUnsqueezeLikeView(),
                 FuseCascadedTransposeOrPermuteOps(),
                 ConvertPermuteSingletonToViewPass(),
@@ -555,7 +554,6 @@ def _tosa_pipeline(
                 EnsureUniqueOutputNodesPass(),
                 RemoveNoopPass(),
                 InsertRescalePass(),
-                InsertDataLayoutCastsPass(),
             ]
         )
 
561559

backends/arm/_passes/fuse_constant_ops_pass.py

Lines changed: 0 additions & 1 deletion
@@ -204,7 +204,6 @@ def call(self, graph_module):
 f"{[input_node.name for input_node in input_nodes]}"
 )
 modified |= did_fuse
-graph_module.recompile() # Recompile needed to catch chains of constant ops
 input_nodes_to_maybe_delete.update(input_nodes)
 except Exception as e:
 logger.warning(

backends/arm/_passes/insert_table_ops.py

Lines changed: 2 additions & 1 deletion
@@ -278,11 +278,12 @@ def call(self, graph_module: GraphModule) -> PassResult:
 out_quantargs=output_qparams[0],
 )
 # Register buffer in self.exported_program.state_dict
+# b_ prefix is important to be recognized as a constant in RemovePermutesAroundElementwiseOps
 const_table_node = create_constant_placeholder(
 exp_program=self.exported_program,
 graph=node.graph,
 kind=InputKind.BUFFER,
-name=node.name + "_table_constant",
+name="b_" + node.name + "_table_constant",
 data=buffer,
 persistent_buffer=True,
 )
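
The new comment in this hunk says the `b_` prefix is what lets `RemovePermutesAroundElementwiseOps` treat the table buffer as a constant. This diff does not show how that pass detects constants, so the sketch below only illustrates the naming change under the assumption that detection is a plain name-prefix check. `CONSTANT_PREFIXES`, `looks_like_constant`, `table_constant_name`, and the node name `aten_sigmoid_default` are hypothetical and not part of ExecuTorch.

```python
# Hypothetical prefix set: only "b_" is confirmed by the comment in the diff.
CONSTANT_PREFIXES = ("b_",)


def table_constant_name(node_name: str) -> str:
    # Mirrors the renamed placeholder from the hunk above.
    return "b_" + node_name + "_table_constant"


def looks_like_constant(placeholder_name: str) -> bool:
    # Assumed detection rule: a placeholder counts as a constant if its
    # name starts with one of the known prefixes.
    return placeholder_name.startswith(CONSTANT_PREFIXES)


old_name = "aten_sigmoid_default" + "_table_constant"  # naming before this commit
new_name = table_constant_name("aten_sigmoid_default")  # naming after this commit

print(looks_like_constant(old_name), looks_like_constant(new_name))
```

Under that assumption, only the renamed buffer passes the check, which would explain why the rename accompanies the switch to the TOSA-specific permute-removal pass.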
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# Copyright 2026 Arm Limited and/or its affiliates.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+from executorch.backends.arm._passes.insert_table_ops import TableOps
+from executorch.backends.transforms.remove_permutes_around_elementwise_ops import (
+    RemovePermutesAroundElementwiseOps,
+)
+from executorch.exir.dialects._ops import ops as exir_ops
+
+
+class RemovePermutesAroundElementwiseTosaOps(RemovePermutesAroundElementwiseOps):
+    permutable_ops = {
+        *RemovePermutesAroundElementwiseOps.permutable_ops,
+        *TableOps.unary_table_ops.keys(),
+        *TableOps.special_table_ops,
+        exir_ops.backend.tosa.RESCALE.default,
+        exir_ops.backend.tosa.TABLE.default,
+    }
+
+    def permute_subgraph(self, subgraph):
+        # Original function will always permute constant nodes which is wrong for table ops
+        # Remove constant tosa.TABLE edges before running full function
+        new_constant_edges_in = set()
+        for const_node, user_node in subgraph.constant_edges_in:
+            if user_node.target == exir_ops.backend.tosa.TABLE.default:
+                continue
+            else:
+                new_constant_edges_in.add((const_node, user_node))
+
+        subgraph.constant_edges_in = new_constant_edges_in
+        super().permute_subgraph(subgraph)
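
The override above follows a "filter, then delegate" pattern: it drops the constant edges that feed `tosa.TABLE` ops and then lets the parent class do the rest. The sketch below reproduces that pattern in isolation. `Node`, `Subgraph`, `BasePass`, `TosaPass`, and the string targets are hypothetical stand-ins, not the ExecuTorch types, and the stand-in parent returns the names it would permute purely so the effect is observable.

```python
from dataclasses import dataclass, field

# Hypothetical marker for the table op; the real code compares against
# exir_ops.backend.tosa.TABLE.default.
TABLE_OP = "tosa.TABLE"


@dataclass(frozen=True)
class Node:
    name: str
    target: str = ""


@dataclass
class Subgraph:
    # Set of (constant_node, user_node) pairs, as in the diff above.
    constant_edges_in: set = field(default_factory=set)


class BasePass:
    def permute_subgraph(self, subgraph):
        # Stand-in for the parent behaviour: report which constants it
        # would permute (the real method rewrites the graph instead).
        return sorted(const.name for const, _ in subgraph.constant_edges_in)


class TosaPass(BasePass):
    def permute_subgraph(self, subgraph):
        # Drop edges whose user is a TABLE op, then delegate to the parent.
        subgraph.constant_edges_in = {
            (const, user)
            for const, user in subgraph.constant_edges_in
            if user.target != TABLE_OP
        }
        return super().permute_subgraph(subgraph)


table = Node("table_op", TABLE_OP)
add = Node("add_op", "aten.add")
sg = Subgraph({(Node("b_lut"), table), (Node("b_bias"), add)})

permuted = TosaPass().permute_subgraph(sg)
print(permuted)
```

In this toy setup, the lookup-table constant is left alone while the constant feeding the elementwise add is still handed to the parent, which matches the intent stated in the comments of the new pass.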
