
Commit 431032b

Arm backend: Add quantizer tutorial (pytorch#18490)
Adds a new Jupyter notebook tutorial for the new composable quantizer in the Arm backend.

Signed-off-by: Adrian Lundell <adrian.lundell@arm.com>
1 parent ddd62c5 commit 431032b

File tree

1 file changed: +316 -0 lines changed

Lines changed: 316 additions & 0 deletions
@@ -0,0 +1,316 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright 2026 Arm Limited and/or its affiliates.\n",
    "#\n",
    "# This source code is licensed under the BSD-style license found in the\n",
    "# LICENSE file in the root directory of this source tree."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# WIP: TOSA/EthosU/VgfQuantizer composable quantizer tutorial\n",
    "\n",
    "This is an in-depth tutorial of the new `TOSA/EthosU/VgfQuantizer` API. While the `TOSAQuantizer` is used in the example, both the\n",
    "`EthosUQuantizer` and `VgfQuantizer` directly inherit from this base class.\n",
    "\n",
    "Note that the main API and functionality remain largely the same to allow for a drop-in replacement, but the underlying framework is different, as will be explained. **Both the quantizer and this tutorial are currently experimental and may change without prior notice.** Refer to https://github.com/pytorch/executorch/issues/17701 for questions and feedback.\n",
    "\n",
    "Before you begin:\n",
    "1. (In a clean virtual environment with a compatible Python version) Install executorch using `./install_executorch.sh`\n",
    "2. Install Arm TOSA dependencies using `examples/arm/setup.sh --disable-ethos-u-deps`\n",
    "\n",
    "Run all commands from the base `executorch` folder."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up model and logging"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "class ToyModel(torch.nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.conv1 = torch.nn.Conv2d(1, 1, 1)\n",
    "        self.conv2 = torch.nn.Conv2d(1, 1, 1)\n",
    "\n",
    "    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n",
    "        x = self.conv1(x)\n",
    "        x = torch.relu(x)\n",
    "        y = self.conv2(y)\n",
    "        z = x / y\n",
    "        return z.view((1,))\n",
    "\n",
    "example_inputs = (torch.ones(1, 1, 1, 1), torch.ones(1, 1, 1, 1))\n",
    "\n",
    "model = ToyModel()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set logger to DEBUG for a full quantization report\n",
    "import logging\n",
    "\n",
    "logging.basicConfig()\n",
    "logging.getLogger().setLevel(logging.DEBUG)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If you have model-explorer installed, you can visualize the exported program:\n",
    "from executorch.devtools.visualization import visualize\n",
    "\n",
    "exported_program = torch.export.export(model, example_inputs)\n",
    "visualize(exported_program)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Basic quantizer usage\n",
    "The experimental API is enabled by setting `use_composable_quantizer=True` when initializing\n",
    "the quantizer. The name `composable_quantizer` refers to the new implementation using multiple\n",
    "separate quantizers; the user configures quantization by specifying a sequence of quantizers,\n",
    "each annotating a selection of nodes with a particular quantization config.\n",
    "\n",
    "The user controls both the node selection and the quantization config. They can be set using basic API calls as demonstrated here,\n",
    "or completely customized as shown in the advanced section. However, the backend has limits on what is supported and may\n",
    "reject quantization of nodes with unsupported quantization configs. A few operators additionally require special quantization\n",
    "strategies for numerical correctness, which is encoded in the backend-specific `TOSAQuantizationSpec`. These special cases are\n",
    "reported in the quantization report.\n",
    "\n",
    "The quantizer additionally applies its own filtering of the selected nodes to only quantize what is known to be supported\n",
    "by the backend.\n",
    "\n",
    "Below, the model is quantized by three different quantizers:\n",
    "1. The nodes named 'conv2d' and 'relu' are quantized using the a8w8 config.\n",
    "2. The remaining conv is targeted with `None`, leaving it non-quantized.\n",
    "3. The remaining nodes are targeted by the global config, which is a16w8.\n",
    "\n",
    "Note that the order of configuration is important: later-specified quantizers take precedence (with the exception of global,\n",
    "which is always applied last). Switching 1 and 2 would leave both convolutions in floating point."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from executorch.backends.arm.tosa.compile_spec import TosaCompileSpec\n",
    "from executorch.backends.arm.quantizer import (\n",
    "    TOSAQuantizer,\n",
    "    get_symmetric_quantization_config,\n",
    "    get_symmetric_a16w8_quantization_config,\n",
    ")\n",
    "from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e\n",
    "\n",
    "# Export the model\n",
    "exported_program = torch.export.export(model, example_inputs)\n",
    "graph_module = exported_program.module(check_guards=False)\n",
    "\n",
    "# Create and configure the quantizer\n",
    "target = \"TOSA-1.0+INT\"\n",
    "compile_spec = TosaCompileSpec(target)\n",
    "quantizer = TOSAQuantizer(compile_spec, use_composable_quantizer=True)\n",
    "\n",
    "a16w8_config = get_symmetric_a16w8_quantization_config()\n",
    "fp_config = None\n",
    "a8w8_config = get_symmetric_quantization_config()\n",
    "\n",
    "quantizer.set_global(a16w8_config)  # Global config, applied last\n",
    "quantizer.set_module_type(torch.nn.Conv2d, fp_config)  # Applied second, remaining conv2d left in floating point\n",
    "quantizer.set_node_name([\"conv2d\", \"relu\"], a8w8_config)  # Applied first, conv+relu quantized using the a8w8 config\n",
    "\n",
    "# Post-training quantization\n",
    "quantized_graph_module = prepare_pt2e(graph_module, quantizer)\n",
    "quantized_graph_module(*example_inputs)  # Calibrate the graph module with the example input\n",
    "quantized_graph_module = convert_pt2e(quantized_graph_module)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "exported_program = torch.export.export(quantized_graph_module, example_inputs)\n",
    "visualize(exported_program)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The quantization report\n",
    "\n",
    "In the logged quantization report, each quantizer adds a header describing the targeted nodes, the quantization config used, and the supported operators / operator patterns.\n",
    "```\n",
    "PatternQuantizer using NodeNameNodeFinder targeting names: conv2d, relu\n",
    "Annotating with executorch.backends.arm.quantizer.arm_quantizer.get_symmetric_quantization_config(is_per_channel=True)\n",
    "Supported operators and patterns defined by executorch.backends.arm.quantizer.quantizer_support.TOSA_QUANTIZER_SUPPORT_DICT\n",
    "```\n",
    "\n",
    "It then gives a short overview of how many nodes it has targeted\n",
    "```\n",
    "    Accepted nodes: 2\n",
    "    Rejected due to previous annotation: 0\n",
    "    Rejected nodes: 0\n",
    "```\n",
    "\n",
    "and finally a node-by-node report\n",
    "```\n",
    "    NODE NAME    INPUT QSPEC MAP                             OUTPUT QSPEC MAP\n",
    "    --  -------  ------------------------------------------  ---------------------\n",
    "    ╒ conv2d     x: INT8_PER_TENSOR_QSPEC                    NO_QSPEC\n",
    "    |            _param_constant0: INT8_PER_CHANNEL_QSPEC\n",
    "    |            _param_constant1: DERIVED_QSPEC\n",
    "    ╘ relu\n",
    "```\n",
    "\n",
    "The bracket here indicates that conv2d and relu have been recognized as a single pattern\n",
    "to be quantized, allowing them to be fused later in the backend. One quantization config translates into\n",
    "many different quantization annotations for different types of tensors: per-tensor for\n",
    "activations, per-channel for weights, and a special quantization spec for the int32 bias.\n",
    "\n",
    "### Pre-transform for annotation vs. final quantization report\n",
    "One important detail is that two reports are printed, one named PRE-TRANSFORM_FOR_ANNOTATION QUANTIZATION REPORT\n",
    "and one named FINAL QUANTIZATION REPORT. This is because some operators have to be decomposed before quantization to ensure\n",
    "that all \"sub operators\" get quantized properly. As an example, the division operator in the first report\n",
    "has been decomposed into reciprocal and multiplication operators in the second. Had it not been marked for quantization\n",
    "in the first step, it would have remained a single division operator.\n",
    "\n",
    "**This is important to be aware of when doing mixed quantization, since it means that for an operator to be fully quantized,\n",
    "both the original operator and its decomposition need to be targeted.**\n",
    "\n",
    "### SharedQspecQuantizer\n",
    "Last in the report there is always an additional quantizer, the SharedQspecQuantizer, which is not specified by the user.\n",
    "It handles data-shuffling operators without numerical behaviour, such as copies and reshapes, ensuring that they are quantized with the same qspec as\n",
    "the surrounding nodes rather than counting on the user to configure them correctly. Since it is not configured by the user it shouldn't need much attention,\n",
    "but it is good to be aware of when analyzing quantization behaviour. The targeted operators are defined by `SHARED_QSPEC_OPS_DEFAULT`\n",
    "in the quantizer class."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Advanced quantizer usage\n",
    "\n",
    "The composability of the quantizer has an additional benefit for advanced users in that each component can easily be modified\n",
    "or swapped out completely in cases where special behaviour is needed. Let's see this in action by recreating what happens under\n",
    "the hood when the `set_node_name` API is used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import executorch.backends.cortex_m.quantizer.node_finders as node_finders\n",
    "from executorch.backends.arm.quantizer.quantization_config import TOSAQuantizationConfig\n",
    "from executorch.backends.arm.quantizer.arm_quantizer_utils import PatternQuantizer\n",
    "from executorch.backends.cortex_m.quantizer.pattern_matcher import PatternMatcher\n",
    "from torchao.quantization.pt2e.quantizer import QuantizationSpec\n",
    "from torchao.quantization.pt2e import MinMaxObserver\n",
    "\n",
    "# Export the model\n",
    "exported_program = torch.export.export(model, example_inputs)\n",
    "graph_module = exported_program.module(check_guards=False)\n",
    "\n",
    "# Create the quantizer\n",
    "target = \"TOSA-1.0+INT\"\n",
    "compile_spec = TosaCompileSpec(target)\n",
    "quantizer = TOSAQuantizer(compile_spec, use_composable_quantizer=True)\n",
    "\n",
    "# The first component is the selection of nodes, done through NodeFinders.\n",
    "# A node finder is a class implementing the NodeFinder interface.\n",
    "# This is instantiated inside the set_node_name function.\n",
    "node_finder = node_finders.NodeNameNodeFinder(\"conv2d\")\n",
    "\n",
    "# The second component is the quantization config, which may be custom.\n",
    "# This is what is returned by the get_symmetric_quantization_config function.\n",
    "qspec = QuantizationSpec(torch.int8, MinMaxObserver, quant_min=-128, quant_max=127, qscheme=torch.per_tensor_symmetric)\n",
    "quantization_config = TOSAQuantizationConfig(input_activation=qspec, output_activation=qspec, weight=None, bias=None)\n",
    "\n",
    "# The third component is the pattern matcher, which defines what the backend supports.\n",
    "# This would typically be the TOSA_QUANTIZER_SUPPORT_DICT, but here a minimal support dict is created for demonstration purposes.\n",
    "# A pattern checker is a class implementing the PatternChecker interface, or None if no extra checks are needed.\n",
    "# This is instantiated by the backend and used by all sub-quantizers.\n",
    "pattern_checker = None\n",
    "SUPPORT_DICT = {(torch.ops.aten.conv2d.default,): pattern_checker}\n",
    "pattern_matcher = PatternMatcher(SUPPORT_DICT, \"MY_SUPPORT_DICT\")\n",
    "\n",
    "# All components are brought together in the PatternQuantizer and added to the quantizer.\n",
    "# This is done last in the set_node_name function.\n",
    "pattern_quantizer = PatternQuantizer(quantization_config, node_finder, pattern_matcher)\n",
    "quantizer.add_quantizer(pattern_quantizer)\n",
    "\n",
    "quantized_graph_module = prepare_pt2e(graph_module, quantizer)\n",
    "quantized_graph_module(*example_inputs)\n",
    "quantized_graph_module = convert_pt2e(quantized_graph_module)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As confirmed by the report, the quantizer has now targeted only a single convolution with a custom quantization config, and this config was then propagated to the relu node by the SharedQspecQuantizer as expected. The view stays in float since the preceding division operator is in float.\n",
    "\n",
    "This usage of the quantizer comes with fewer guarantees of producing numerically correct or even functional graphs, but it can be a useful tool for debugging or when otherwise unsupported behaviour is required."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv (3.10.15)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
