This tutorial demonstrates how to deploy Llama models on Raspberry Pi 4/5 devices using ExecuTorch:
- Prerequisites: Linux host machine, Python 3.10-3.14, conda environment, Raspberry Pi 4/5
- Setup: Automated cross-compilation using the `setup.sh` script for ARM toolchain installation
- Export: Convert Llama models to the optimized `.pte` format with quantization options
- Deploy: Transfer binaries to Raspberry Pi and configure runtime libraries
- Optimize: Build optimization and performance tuning techniques
- Result: Efficient on-device Llama inference
Operating System: Linux x86_64 (Ubuntu 20.04+ or CentOS Stream 9+)
Software Dependencies:
- Python 3.10-3.14 (ExecuTorch requirement)
- conda or venv for environment management
- CMake 3.29.6+
- Git for repository cloning
Supported Devices: Raspberry Pi 4 and Raspberry Pi 5 with 64-bit OS
Device Requirements:
- RAM and storage: varies by model size and optimization level
- 64-bit Raspberry Pi OS (Bullseye or newer)
Verify your host machine compatibility:

```bash
# Check OS and architecture
uname -s   # Should output: Linux
uname -m   # Should output: x86_64

# Check Python version
python3 --version   # Should be 3.10-3.14

# Check required tools
hash cmake git md5sum 2>/dev/null || echo "Missing required tools"
cmake --version   # Should be 3.29.6 or newer
```
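The prerequisites above note that RAM and storage needs vary by model size, so it can be worth glancing at host resources before building. A minimal sketch using standard Linux tools (thresholds depend on the model you plan to export):

```bash
# Rough resource check on the host
free -h | awk '/^Mem:/ {print "RAM available: " $7}'
df -h "$HOME" | awk 'NR==2 {print "Free disk in HOME: " $4}'
```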
## Development Environment Setup
### Clone ExecuTorch Repository
First, clone the ExecuTorch repository with Raspberry Pi support:
```bash
# Create project directory
mkdir ~/executorch-rpi && cd ~/executorch-rpi

# Clone ExecuTorch (release/1.0 branch)
git clone -b release/1.0 https://github.com/pytorch/executorch.git
cd executorch

# Create conda environment
conda create -yn executorch python=3.10.0
conda activate executorch

# Upgrade pip
pip install --upgrade pip
```

Alternative: Virtual Environment. If you prefer Python's built-in virtual environment:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
```

Refer to {doc}`getting-started` for more details.
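An optional sanity check that the freshly created environment is actually the one in use (expected outputs are illustrative):

```bash
# Confirm the interpreter comes from the conda env or .venv, not the system
which python3       # should resolve inside the environment, not /usr/bin
python3 --version   # should print Python 3.10.x
```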
Run the following script on your Linux host machine:

```bash
# Run the Raspberry Pi setup script for Pi 5
examples/raspberry_pi/setup.sh pi5
```

On successful completion, you should see the following output:
```
[100%] Linking CXX executable llama_main
[100%] Built target llama_main
[SUCCESS] LLaMA runner built successfully
==== Verifying Build Outputs ====
[SUCCESS] ✓ llama_main (6.1M)
[SUCCESS] ✓ libllama_runner.so (4.0M)
[SUCCESS] ✓ libextension_module.a (89K) - static library
✓ ExecuTorch cross-compilation setup completed successfully!
```
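Before copying anything to the device, it can be worth confirming that the binary really targets aarch64. A minimal check with the standard `file` utility (the path assumes the default `cmake-out` layout used later in this tutorial):

```bash
# Verify the target architecture of the cross-compiled binary
file cmake-out/examples/models/llama/llama_main
# Expect output mentioning: ELF 64-bit LSB executable, ARM aarch64
```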
Download the Llama model from Hugging Face or any other source, and make sure the following files exist:

- `consolidated.00.pth` (model weights)
- `params.json` (model config)
- `tokenizer.model` (tokenizer)
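As one illustrative way to fetch these files (the repo id and paths below are assumptions; adjust them to the model you actually use, and note that Meta's repos require accepting a license on Hugging Face first):

```bash
# Hypothetical example: download the original-format Llama 3.2 1B files
pip install -U "huggingface_hub[cli]"
huggingface-cli download meta-llama/Llama-3.2-1B \
    --include "original/*" \
    --local-dir ./llama3_2-1b
# consolidated.00.pth, params.json, and tokenizer.model land under ./llama3_2-1b/original/
```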
After downloading the Llama model, export it to ExecuTorch format using the provided script:
Set these paths to point to the downloaded files. The following is an example of exporting a Llama model:

```bash
LLAMA_QUANTIZED_CHECKPOINT=path/to/consolidated.00.pth
LLAMA_PARAMS=path/to/params.json

python -m extension.llm.export.export_llm \
    --config examples/models/llama/config/llama_xnnpack_spinquant.yaml \
    +base.model_class="llama3_2" \
    +base.checkpoint="${LLAMA_QUANTIZED_CHECKPOINT:?}" \
    +base.params="${LLAMA_PARAMS:?}"
```

The file `llama3_2.pte` will be generated in the directory where you run the command.
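Since `md5sum` is already among the required host tools, it can be handy to record a checksum of the exported model now, so the copy on the Pi can be verified after transfer:

```bash
# Record a checksum to compare against after copying to the Pi
md5sum llama3_2.pte
```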
- For more details, see Option A: Download and Export Llama3.2 1B/3B Model
- See also {doc}`llm/export-llm`.
After successful cross-compilation, transfer the required files:
```bash
# Set Raspberry Pi details
export RPI_UN="pi"                  # Your Raspberry Pi username
export RPI_IP="your-rpi-ip-address"

# Create deployment directory on Raspberry Pi
ssh $RPI_UN@$RPI_IP 'mkdir -p ~/executorch-deployment'

# Copy main executable
scp cmake-out/examples/models/llama/llama_main $RPI_UN@$RPI_IP:~/executorch-deployment/

# Copy runtime library
scp cmake-out/examples/models/llama/runner/libllama_runner.so $RPI_UN@$RPI_IP:~/executorch-deployment/

# Copy model file and tokenizer
scp llama3_2.pte $RPI_UN@$RPI_IP:~/executorch-deployment/
scp ./tokenizer.model $RPI_UN@$RPI_IP:~/executorch-deployment/
```
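If you recorded a checksum after export, a quick integrity check of the transferred model (run from the host, with the variables set above):

```bash
# Compare against the md5sum printed on the host after export
ssh $RPI_UN@$RPI_IP 'md5sum ~/executorch-deployment/llama3_2.pte'
```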
SSH into your Raspberry Pi and configure the runtime:

```bash
cd ~/executorch-deployment
echo 'export LD_LIBRARY_PATH=$(pwd):$LD_LIBRARY_PATH' > setup_env.sh
chmod +x setup_env.sh

# Make the binary executable
chmod +x llama_main

source setup_env.sh
./llama_main --help
```

Make sure the output does not contain any GLIBC or other library mismatch errors. If you see any, follow the troubleshooting steps below.
Problem: The binary was compiled with a newer GLIBC version (2.38) than what's available on your Raspberry Pi (2.36).
Error Symptoms:
```
./llama_main: /lib/aarch64-linux-gnu/libm.so.6: version `GLIBC_2.38' not found (required by ./llama_main)
./llama_main: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by ./llama_main)
./llama_main: /lib/aarch64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.15' not found (required by ./llama_main)
./llama_main: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by /lib/libllama_runner.so)
```

There are two potential solutions:
- Solution A: Modify the Pi to match the binary (run on Pi)
- Solution B: Modify the binary to match the Pi (run on host)
- Check your current GLIBC version:

```bash
ldd --version
# Output: ldd (Debian GLIBC 2.36-9+rpt2+deb12u12) 2.36
```

⚠️ Compatibility Warning and Safety Check:
```bash
# Just check and warn - don't do the upgrade yet
current_glibc=$(ldd --version | head -n1 | grep -o '[0-9]\+\.[0-9]\+')
required_glibc="2.38"
echo "Current GLIBC: $current_glibc"
echo "Required GLIBC: $required_glibc"
if [[ $(echo "$current_glibc < $required_glibc" | bc -l) -eq 1 ]]; then
    echo ""
    echo "⚠️  WARNING: Your GLIBC version is too old"
    echo "   You need to upgrade to continue with the next steps"
    echo "   Consider using Solution B (rebuild binary) for better safety"
    echo ""
else
    echo "✅ Your GLIBC version is already compatible"
fi
```

NOTE: If the output shows "✅ Your GLIBC version is already compatible", you can skip the upgrade and go straight to testing.
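The check above relies on `bc`, which is not always present on minimal images. As a sketch, the same comparison can be done with coreutils alone (`sort -V` orders version strings, smallest first):

```bash
# Version comparison without bc
current_glibc=$(ldd --version | head -n1 | grep -o '[0-9]\+\.[0-9]\+')
required_glibc="2.38"
if [ "$(printf '%s\n' "$required_glibc" "$current_glibc" | sort -V | head -n1)" != "$required_glibc" ]; then
    echo "⚠️  GLIBC $current_glibc is older than $required_glibc - upgrade needed"
else
    echo "✅ GLIBC $current_glibc is compatible"
fi
```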
- Upgrade to newer GLIBC:
```bash
# Add Debian unstable repository
echo "deb http://deb.debian.org/debian sid main contrib non-free" | sudo tee -a /etc/apt/sources.list

# Update package lists
sudo apt update

# Install newer GLIBC packages
sudo apt-get -t sid install libc6 libstdc++6

# Reboot system
sudo reboot
```

- Verify compatibility after reboot:
```bash
cd ~/executorch-deployment
source setup_env.sh

# Test that the binary works
if ./llama_main --help &>/dev/null; then
    echo "✅ GLIBC upgrade successful - binary is compatible"
else
    echo "❌ GLIBC upgrade failed - binary still incompatible"
    echo "Consider rolling back or refer to documentation for troubleshooting"
fi
```

- Test the fix:
```bash
cd ~/executorch-deployment
source setup_env.sh
./llama_main --model_path ./llama3_2.pte --tokenizer_path ./tokenizer.model --seq_len 128 --prompt "Hello"
```

Important Notes:
- Select "Yes" when prompted to restart services
- Press Enter to keep current version for configuration files
- Back up important data before upgrading
If you prefer not to upgrade your Raspberry Pi system:
- Copy Pi's filesystem to host machine:
```bash
# On Raspberry Pi - install rsync
ssh pi@<your-rpi-ip>
sudo apt update && sudo apt install rsync
exit

# On host machine - copy the Pi's filesystem
mkdir -p ~/rpi5-sysroot
rsync -aAXv --exclude={"/proc","/sys","/dev","/run","/tmp","/mnt","/media","/lost+found"} \
    pi@<your-rpi-ip>:/ ~/rpi5-sysroot
```
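A quick spot-check that the copied sysroot contains the device's C library and headers (paths follow the rsync destination above):

```bash
# The sysroot should mirror the Pi's library layout
ls ~/rpi5-sysroot/lib/aarch64-linux-gnu/libc.so.6
ls ~/rpi5-sysroot/usr/include | head
```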
- Update the CMake toolchain file:

```cmake
# Edit arm-toolchain-pi5.cmake
# Replace this line:
#   set(CMAKE_SYSROOT "${TOOLCHAIN_PATH}/aarch64-none-linux-gnu/libc")
# With this:
set(CMAKE_SYSROOT "/home/yourusername/rpi5-sysroot")
set(CMAKE_FIND_ROOT_PATH "${CMAKE_SYSROOT}")
```

- Rebuild binaries:
```bash
# Clean and rebuild
rm -rf cmake-out
./examples/raspberry_pi/rpi_setup.sh pi5 --force-rebuild

# Verify GLIBC version
strings ./cmake-out/examples/models/llama/llama_main | grep GLIBC_
# Should show max GLIBC_2.36 (matching your Pi)
```

Problem: Required libraries are not found at runtime.
Error Symptoms:
```
./llama_main: error while loading shared libraries: libllama_runner.so: cannot open shared object file
```

Solution:
```bash
# Ensure you're in the correct directory and the environment is set
cd ~/executorch-deployment
source setup_env.sh
./llama_main --help
```

Root Cause: Either LD_LIBRARY_PATH is not set or you're not in the deployment directory.
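If you'd rather not source `setup_env.sh` in every session, one option is to make the library path permanent (this assumes the default deployment directory used throughout this tutorial):

```bash
# Append the library path to the shell profile on the Pi
echo 'export LD_LIBRARY_PATH=$HOME/executorch-deployment:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
```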
Problem: Warning messages about JSON parsing errors appear after running the llama_main binary.
Error Symptoms:
```
E tokenizers:hf_tokenizer.cpp:60] Error parsing json file: [json.exception.parse_error.101]
```

Solution: These warnings can be safely ignored. They don't affect model inference.
After resolving issues, test with:
```bash
cd ~/executorch-deployment
source setup_env.sh
./llama_main --model_path ./llama3_2.pte --tokenizer_path ./tokenizer.model --seq_len 128 --prompt "What is the meaning of life?"
```

Enable ExecuTorch logging:
```bash
# Set log level for debugging
export ET_LOG_LEVEL=Info
./llama_main --model_path ./model.pte --verbose
```
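For a rough feel of on-device performance, you can time a short run with the shell's `time` keyword (flags reuse the examples above; numbers will vary by model and Pi revision):

```bash
cd ~/executorch-deployment
source setup_env.sh
time ./llama_main --model_path ./llama3_2.pte --tokenizer_path ./tokenizer.model --seq_len 64 --prompt "Hello"
```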
Happy Inferencing!