
Arm VGF Backend

The Arm® VGF backend is the ExecuTorch solution for lowering PyTorch models to VGF compatible hardware. It leverages the TOSA operator set and the ML SDK for Vulkan® to produce a .PTE file. The VGF backend also supports execution from a .PTE file and provides functionality to extract the corresponding VGF file for integration into various applications.

Features

  • Wide operator support for delegating large parts of models to the VGF target.
  • A quantizer that optimizes quantization for the VGF target.

Target Requirements

The target system must include ML SDK for Vulkan and a Vulkan driver with Vulkan API >= 1.3.

Development Requirements

All requirements can be downloaded using `examples/arm/setup.sh --enable-mlsdk-deps --disable-ethos-u-deps` and added to the path using
`source examples/arm/arm-scratch/setup_path.sh`

For the AOT flow (compiling a model to .pte format using the VGF backend), the requirements are:

And for building and running your application using the generic executor_runner:

Using the Arm VGF Backend

The VGF Minimal Example demonstrates how to lower a module using the VGF backend.

The main configuration point for the lowering is the VgfCompileSpec consumed by the partitioner and quantizer. To extract the VGF file for integration into applications without the ExecuTorch runtime, use VgfCompileSpec.dump_intermediate_artifacts_to().
The full user-facing API is documented below.
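Before the API reference, here is a minimal lowering sketch. The import path for VgfCompileSpec and VgfPartitioner is assumed to be `executorch.backends.arm.vgf` and may differ between ExecuTorch versions; the rest uses the standard `torch.export` and `to_edge_transform_and_lower` flow:

```python
import torch
from torch.export import export

from executorch.backends.arm.vgf import VgfCompileSpec, VgfPartitioner  # assumed import path
from executorch.exir import to_edge_transform_and_lower


class AddOne(torch.nn.Module):
    def forward(self, x):
        return x + 1.0


model = AddOne().eval()
example_inputs = (torch.randn(1, 4),)

# Configure the lowering and dump intermediate artifacts so the generated
# VGF file can be picked up for integration into other applications.
compile_spec = VgfCompileSpec("TOSA-1.0+FP")
compile_spec.dump_intermediate_artifacts_to("./vgf_artifacts")

# Export, partition for the VGF backend, and serialize to a .pte file.
exported = export(model, example_inputs)
edge = to_edge_transform_and_lower(exported, partitioner=[VgfPartitioner(compile_spec)])
executorch_program = edge.to_executorch()

with open("add_one_vgf.pte", "wb") as f:
    f.write(executorch_program.buffer)
```
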

class VgfCompileSpec(tosa_spec: executorch.backends.arm.tosa.specification.TosaSpecification | str | None = None, compiler_flags: list[str] | None = None)

Normalise inputs and populate the underlying Arm compile spec.

Args:

  • tosa_spec (TosaSpecification | str | None): TOSA specification to target. Strings are parsed via TosaSpecification.create_from_string. Defaults to "TOSA-1.0+FP+INT+int4+int16".
  • compiler_flags (list[str] | None): Optional converter-backend flags.
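For illustration, the constructor accepts either a specification string or a TosaSpecification instance (the VgfCompileSpec import path is assumed as in the sketch above):

```python
from executorch.backends.arm.tosa.specification import TosaSpecification
from executorch.backends.arm.vgf import VgfCompileSpec  # assumed import path

# Default target: "TOSA-1.0+FP+INT+int4+int16".
spec_default = VgfCompileSpec()

# String form, parsed via TosaSpecification.create_from_string.
spec_int = VgfCompileSpec("TOSA-1.0+INT")

# Equivalent form, passing a TosaSpecification instance directly.
spec_fp = VgfCompileSpec(TosaSpecification.create_from_string("TOSA-1.0+FP"))
```
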
def VgfCompileSpec.dump_debug_info(self, debug_mode: executorch.backends.arm.common.arm_compile_spec.ArmCompileSpec.DebugMode | None):

Dump debugging information into the intermediates path.

Args:

  • debug_mode: The debug mode to use for dumping debug information.
def VgfCompileSpec.dump_intermediate_artifacts_to(self, output_path: str | None):

Sets a path for dumping intermediate results produced during compilation, such as TOSA and PTE files.

Args:

  • output_path: Path to dump intermediate results to.
def VgfCompileSpec.get_output_order_workaround(self) -> bool:

Gets whether the output order workaround is being applied.

def VgfCompileSpec.set_output_order_workaround(self, output_order_workaround: bool):

Sets whether to apply the output order workaround.

Args:

  • output_order_workaround: Boolean indicating whether to apply the workaround.
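A short sketch of the two methods above, using an assumed import path for VgfCompileSpec:

```python
from executorch.backends.arm.vgf import VgfCompileSpec  # assumed import path

compile_spec = VgfCompileSpec()

# Enable the workaround and read the setting back.
compile_spec.set_output_order_workaround(True)
assert compile_spec.get_output_order_workaround() is True
```
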
def VgfCompileSpec.set_pass_pipeline_config(self, config: executorch.backends.arm.common.pipeline_config.ArmPassPipelineConfig) -> None:

Sets the configuration that controls how the Arm pass pipeline should behave. Subclasses may override to tweak defaults for specific targets.

Args:

  • config: The custom ArmPassPipelineConfig to set.

Partitioner API

See Partitioner API for more information.

Quantization

The VGF quantizer supports Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) via the PT2E flow.

Partial quantization is supported, allowing users to quantize only specific parts of the model while leaving others in floating-point.
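A PTQ sketch is given below. The import locations of VgfQuantizer and get_symmetric_quantization_config, the set_global helper, and the prepare/convert entry points are assumptions that may vary between ExecuTorch and PyTorch releases:

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.export import export

from executorch.backends.arm.quantizer import (  # assumed import path
    VgfQuantizer,
    get_symmetric_quantization_config,
)
from executorch.backends.arm.vgf import VgfCompileSpec  # assumed import path

model = torch.nn.Linear(4, 4).eval()
example_inputs = (torch.randn(1, 4),)

compile_spec = VgfCompileSpec("TOSA-1.0+INT")
quantizer = VgfQuantizer(compile_spec)
quantizer.set_global(get_symmetric_quantization_config())

# Annotate, calibrate with representative inputs, then convert.
graph_module = export(model, example_inputs).module()
prepared = prepare_pt2e(graph_module, quantizer)
prepared(*example_inputs)
quantized = convert_pt2e(prepared)

# The quantized module is then exported and lowered exactly as in the
# floating-point flow shown earlier.
```
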

For more information on quantization, see Quantization.

Runtime Integration

The VGF backend can use the default ExecuTorch runner. The steps required for building and running it are explained in the VGF Backend Tutorial. The example application is recommended for testing the basic functionality of your lowered models, as well as a starting point for developing runtime integrations for your own targets.

Example: Image classification flow

examples/arm/image_classification_example_vgf contains a ready-to-run DeiT image classification pipeline for VGF targets. The example README documents how to:

  • Export the quantized INT8 weights via model_export/export_deit.py.
  • Use the provided requirements file to install the ML SDK converter scripts and produce a .pte artifact.
  • Build and launch the Vulkan-based runtime under runtime/, which loads the .pte alongside the generated VGF blob.

Following this walkthrough exercises the same lowering and runtime flow described in the rest of this guide, with a concrete end-to-end sample.

Reference

  • {doc}`/backends/arm-vgf/arm-vgf-partitioner` — Partitioner options.
  • {doc}`/backends/arm-vgf/arm-vgf-quantization` — Supported quantization schemes.
  • {doc}`/backends/arm-vgf/arm-vgf-troubleshooting` — Debug common issues.
  • {doc}`/backends/arm-vgf/tutorials/arm-vgf-tutorials` — Tutorials.
  • {doc}`/backends/arm-vgf/VGF_op_support` — VGF supported operators.

```{toctree}
:maxdepth: 2
:hidden:
:caption: Arm VGF Backend

arm-vgf-partitioner
arm-vgf-quantization
arm-vgf-troubleshooting
tutorials/arm-vgf-tutorials
VGF_op_support
```