Qualcomm AI Engine Direct - [Multimodal] granite-speech-3.3-2b (#18740)
Summary:
- Support granite-speech-3.3-2b
- Extend Audio modality in QNNMultimodal AOT flow
- Extend Audio modality in QNNMultimodal runner
- Support encoder model sharding
Pull Request resolved: #18740
Test Plan:
#### CI
```bash
python -m backends.qualcomm.tests.test_qnn_delegate TestExampleMultimodalityScript.test_static_asr --model_name granite_speech_3_3-2b -b build-android --executorch_root . -a . -m SM8750 -s ${SERIAL_NUM}
```
#### Script
```bash
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s ${SERIAL_NUM} -m SM8750 --decoder_model granite_speech_3_3-2b --model_mode hybrid --prefill_ar_len 128 --max_seq_len 1024 --prompt "can you transcribe the speech into a written format?" --audio_path "https://huggingface.co/ibm-granite/granite-speech-3.3-2b/resolve/main/10226_10111_000000.wav?download=true"
```
Audio file: https://huggingface.co/ibm-granite/granite-speech-3.3-2b/resolve/main/10226_10111_000000.wav?download=true
Prompt: "can you transcribe the speech into a written format?"
#### Result
```
I 00:00:16.333997 executorch:multimodal_runner.cpp:542] RSS after finishing text generation: 614.941406 MiB (0 if unsupported)
I 00:00:16.334231 executorch:stats.h:161] Prompt Tokens: 212 Generated Tokens: 201
I 00:00:16.334356 executorch:stats.h:167] Model Load Time: 1.460000 (seconds)
I 00:00:16.334419 executorch:stats.h:177] Total inference time: 14.871000 (seconds) Rate: 13.516240 (tokens/second)
I 00:00:16.334480 executorch:stats.h:185] Prompt evaluation: 0.798000 (seconds) Rate: 265.664160 (tokens/second)
I 00:00:16.334541 executorch:stats.h:196] Generated 201 tokens: 14.073000 (seconds) Rate: 14.282669 (tokens/second)
I 00:00:16.334629 executorch:stats.h:204] Time to first generated token: 0.798000 (seconds)
I 00:00:16.334688 executorch:stats.h:211] Sampling time over 413 tokens: 0.479000 (seconds)
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
PyTorchObserver {"prefill_token_per_sec":265.664,"decode_token_per_sec":14.2827,"prompt_tokens":212,"generated_tokens":201,"model_load_start_ms":1744743525724,"model_load_end_ms":1744743527184,"inference_start_ms":1744743527186,"inference_end_ms":1744743542057,"prompt_eval_end_ms":1744743527984,"first_token_ms":1744743527984,"aggregate_sampling_time_ms":479,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
/data/local/tmp/yuyazhua/executorch/static_llm/outputs/outputs.txt: 1 file pulled. 0.9 MB/s (1170 bytes in 0.001s)
/data/local/tmp/yuyazhua/executorch/static_llm/outputs/inference_speed.txt: 1 file pulled. 0.0 MB/s (7 bytes in 0.002s)
[INFO 2026-04-08 00:22:11,849 llama.py:243] Device Inference Results[0]:
<|start_of_role|>system<|end_of_role|>You are Granite, developed by IBM. You are a helpful AI assistant.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>can you transcribe the speech into a written format?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>It appears you've provided a fragment of a sentence, possibly from a poem or text, and you're asking for a transcription or translation into written format. However, without the complete context or original text, it's challenging to accurately transcribe or translate it.
If we were to proceed with a hypothetical example, here's a possible continuation of the sentence in a written format:
"After his nap, Timothy leisurely stretched his foot, first one then the other, carefully selecting the choicest bits. Turning over the food, he methodically picked out the desired portions, meticulously choosing what was to be included in his meal."
This continuation assumes a narrative style, where Timothy is taking care of food preparation. The original sentence seems to be a playful or poetic exploration of a character's actions, possibly related to food preparation or a cooking process.<|end_of_text|>
```
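The throughput figures in the log can be cross-checked against the raw timestamps in the `PyTorchObserver` line. The following is a minimal sketch, not part of this PR: the helper names `parse_observer` and `derive_rates` are hypothetical, and it assumes the prefill rate is prompt tokens divided by the time from `inference_start_ms` to `prompt_eval_end_ms`, and the decode rate is generated tokens divided by the time from `prompt_eval_end_ms` to `inference_end_ms`. That reading is consistent with the numbers reported above.

```python
import json

# PyTorchObserver line copied verbatim from the device log above.
LOG_LINE = (
    'PyTorchObserver {"prefill_token_per_sec":265.664,"decode_token_per_sec":14.2827,'
    '"prompt_tokens":212,"generated_tokens":201,"model_load_start_ms":1744743525724,'
    '"model_load_end_ms":1744743527184,"inference_start_ms":1744743527186,'
    '"inference_end_ms":1744743542057,"prompt_eval_end_ms":1744743527984,'
    '"first_token_ms":1744743527984,"aggregate_sampling_time_ms":479,'
    '"SCALING_FACTOR_UNITS_PER_SECOND":1000}'
)


def parse_observer(line: str) -> dict:
    """Strip the 'PyTorchObserver ' prefix and decode the JSON payload."""
    prefix = "PyTorchObserver "
    if not line.startswith(prefix):
        raise ValueError("not a PyTorchObserver line")
    return json.loads(line[len(prefix):])


def derive_rates(stats: dict) -> dict:
    """Recompute durations and token rates from the raw timestamps.

    Assumption (not confirmed by the PR): prefill covers
    inference_start -> prompt_eval_end, decode covers
    prompt_eval_end -> inference_end.
    """
    scale = stats["SCALING_FACTOR_UNITS_PER_SECOND"]  # timestamp units per second
    load_s = (stats["model_load_end_ms"] - stats["model_load_start_ms"]) / scale
    prefill_s = (stats["prompt_eval_end_ms"] - stats["inference_start_ms"]) / scale
    decode_s = (stats["inference_end_ms"] - stats["prompt_eval_end_ms"]) / scale
    return {
        "model_load_s": load_s,
        "prefill_s": prefill_s,
        "decode_s": decode_s,
        "prefill_tok_per_s": stats["prompt_tokens"] / prefill_s,
        "decode_tok_per_s": stats["generated_tokens"] / decode_s,
    }


stats = parse_observer(LOG_LINE)
rates = derive_rates(stats)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the recomputed rates agree with the `prefill_token_per_sec` and `decode_token_per_sec` fields, and with the `stats.h` lines for model load time and prompt evaluation.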
cc: abhinaykukkadapu, cccclai, haowhsu-quic
Differential Revision: D101574849
Pulled By: abhinaykukkadapu
audio_path="https://huggingface.co/ibm-granite/granite-speech-3.3-2b/resolve/main/10226_10111_000000.wav?download=true", # Audio content: after his nap,...