
Commit 0bba44a

Author and committer: Github Executorch
[Cortex-M]: Add int8 I/O quantization to Cortex-M export path
Apply the QuantizeInputs and QuantizeOutputs passes in the Cortex-M compilation path to strip the float-in/float-out wrapper from quantized models. This produces a fully int8 model that accepts and returns int8 tensors directly. The passes run only when args.quantize is set. They are applied after to_edge_transform_and_lower but before CortexMPassManager, because CortexMPassManager renames quantized_decomposed ops to their cortex_m variants, which the I/O passes do not recognize.
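The ordering constraint above can be illustrated with a deliberately simplified model that does not use ExecuTorch at all. In the sketch below, a graph is just a list of op-name strings, and `strip_io_quant` and `rename_to_cortex_m` are hypothetical stand-ins for QuantizeInputs/QuantizeOutputs and for the renaming done by CortexMPassManager; `int8_compute` is a made-up placeholder for the model body. The point is only that a pass which matches on `quantized_decomposed` op names finds nothing to strip once those names have been rewritten.

```python
from typing import List

# Hypothetical toy "graph": an ordered list of op names. A quantized model
# wrapped for float I/O starts with a quantize op on the input and ends
# with a dequantize op on the output.
Graph = List[str]

Q = "quantized_decomposed.quantize_per_tensor"
DQ = "quantized_decomposed.dequantize_per_tensor"


def strip_io_quant(graph: Graph) -> Graph:
    """Toy stand-in for QuantizeInputs/QuantizeOutputs: drop the boundary
    quantize/dequantize ops, but only if they carry the exact
    quantized_decomposed names this pass knows how to match."""
    out = list(graph)
    if out and out[0] == Q:
        out = out[1:]
    if out and out[-1] == DQ:
        out = out[:-1]
    return out


def rename_to_cortex_m(graph: Graph) -> Graph:
    """Toy stand-in for the renaming step: quantized_decomposed.* ops
    become cortex_m.* ops; other ops are left unchanged."""
    return [op.replace("quantized_decomposed.", "cortex_m.") for op in graph]


float_wrapped = [Q, "int8_compute", DQ]

# Order used in the commit: strip the I/O wrapper first, then rename.
good = rename_to_cortex_m(strip_io_quant(float_wrapped))
# Reversed order: after renaming, the boundary ops no longer match.
bad = strip_io_quant(rename_to_cortex_m(float_wrapped))

print(good)  # ['int8_compute']
print(bad)   # ['cortex_m.quantize_per_tensor', 'int8_compute', 'cortex_m.dequantize_per_tensor']
```

Running the rename first leaves the float boundary ops in place, which is the failure mode the commit avoids by ordering the passes this way.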
1 parent 0907294 commit 0bba44a

1 file changed

Lines changed: 13 additions & 0 deletions


examples/arm/aot_arm_compiler.py

@@ -47,6 +47,8 @@
 from executorch.devtools.backend_debug import get_delegation_info
 from executorch.devtools.bundled_program.config import MethodTestCase, MethodTestSuite
 
+from executorch.exir.passes.quantize_io_pass import QuantizeInputs, QuantizeOutputs
+
 from executorch.exir import (
     EdgeCompileConfig,
     ExecutorchBackendConfig,
@@ -860,6 +862,17 @@ def _to_channels_last(x):
         ),
     )
 
+    # Strip the float I/O wrapper from the quantized model to produce
+    # fully int8 inputs and outputs. This must run before CortexMPassManager
+    # which renames quantized_decomposed ops to cortex_m variants.
+    if args.quantize:
+        print("Applying passes to create a fully int8 quantized model...")
+
+        edge = edge.transform([
+            QuantizeInputs(edge, [0]),
+            QuantizeOutputs(edge, [0]),
+        ])
+
     pass_manager = CortexMPassManager(edge.exported_program())
     edge._edge_programs["forward"] = pass_manager.transform()
 
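With the wrapper removed, the quantize and dequantize steps that used to run inside the model become the caller's job. The sketch below assumes per-tensor affine int8 quantization, the scheme `quantized_decomposed.quantize_per_tensor` implements; the scale and zero-point constants are invented for illustration and would in practice have to match the parameters the model was exported with. Judging by the `[0]` arguments in the diff, only the first input and first output are converted.

```python
from typing import List

# Hypothetical per-tensor quantization parameters (made up for this sketch);
# real values must match the model's original input/output quantization.
IN_SCALE, IN_ZP = 0.0625, -3
OUT_SCALE, OUT_ZP = 0.125, 5

QMIN, QMAX = -128, 127  # int8 range


def quantize_per_tensor(xs: List[float], scale: float, zp: int) -> List[int]:
    """Affine quantization: q = clamp(round(x / scale) + zp, QMIN, QMAX).
    Python's round() rounds half to even, like torch.round."""
    return [max(QMIN, min(QMAX, round(x / scale) + zp)) for x in xs]


def dequantize_per_tensor(qs: List[int], scale: float, zp: int) -> List[float]:
    """Inverse mapping back to float: x = (q - zp) * scale."""
    return [(q - zp) * scale for q in qs]


# Host side, before inference: turn float data into the int8 input.
q_in = quantize_per_tensor([0.0, 1.0, -1.0, 10.0], IN_SCALE, IN_ZP)
print(q_in)  # [-3, 13, -19, 127]

# Host side, after inference: turn the int8 output back into floats.
y = dequantize_per_tensor([5, 13, -128], OUT_SCALE, OUT_ZP)
print(y)  # [0.0, 1.0, -16.625]
```

Power-of-two scales are used here so the arithmetic is exact; 10.0 saturates to 127 because it falls outside the range representable with this scale and zero point.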