Neural Speed supports the following models:

| Model Name | INT8 RTN | INT8 GPTQ | INT8 AWQ | INT8 AutoRound | INT4 RTN | INT4 GPTQ | INT4 AWQ | INT4 AutoRound | Transformer Version | Max tokens length |
|---|---|---|---|---|---|---|---|---|---|---|
| Meta-Llama-3-8B-Instruct | ✅ | ✅ | ✅ | ✅ | Latest | 8192 | ||||
| TinyLlama-1.1B, LLaMA2-7B, LLaMA2-13B, LLaMA2-70B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 4096 |
| LLaMA-7B, LLaMA-13B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 2048 |
| CodeLlama-7b | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 16384 |
| Solar-10.7B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 4096 |
| Neural-Chat-7B-v3-1, Neural-Chat-7B-v3-2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 32768 |
| Mistral-7B, Mistral-7B-Instruct-v0.2, Mixtral-8x7B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 4.36.0 or newer | 32768 |
| Qwen-7B, Qwen-14B, Qwen1.5-7B, Qwen1.5-0.5B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 8192 / 32768 |
| GPT-J-6B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest | 2048 |
| GPT-NeoX-20B | ✅ | | | | ✅ | | | | Latest | 2048 |
| Dolly-v2-3B | ✅ | | | | ✅ | | | | 4.28.1 or newer | 2048 |
| MPT-7B, MPT-30B | ✅ | | | | ✅ | | | | Latest | 2048 |
| Falcon-7B, Falcon-40B | ✅ | | | | ✅ | | | | Latest | 2048 |
| BLOOM-7B | ✅ | | | | ✅ | | | | Latest | 2048 |
| OPT-125m, OPT-1.3B, OPT-13B | ✅ | | | | ✅ | | | | Latest | 2048 |
| ChatGLM-6B, ChatGLM2-6B, ChatGLM3-6B, GLM-4-9B | ✅ | | | | ✅ | | | | 4.33.1 | 2048 / 32768 |
| Baichuan-13B-Chat, Baichuan2-13B-Chat, Baichuan2-7B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 4.33.1 | 4096 |
| phi-2, phi-1_5, phi-1 | ✅ | | | | ✅ | | | | Latest | 2048 |
| phi-3-128k, phi-3-48k | ✅ | | | | ✅ | | | | Latest | 128k |
| StableLM-2-1_6B, StableLM-3B, StableLM-2-12B | ✅ | | | | ✅ | | | | Latest | 4096 |
| gemma-2b-it, gemma-7b | ✅ | | | | ✅ | | | | Latest | 8192 |
| Whisper-tiny, Whisper-base, Whisper-small, Whisper-medium, Whisper-large | ✅ | | | | ✅ | | | | Latest | 448 |

| Model Name | INT8 RTN | INT8 GPTQ | INT8 AWQ | INT8 AutoRound | INT4 RTN | INT4 GPTQ | INT4 AWQ | INT4 AutoRound | Transformer Version |
|---|---|---|---|---|---|---|---|---|---|
| Code-LLaMA-7B, Code-LLaMA-13B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest |
| Magicoder-6.7B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest |
| StarCoder-1B, StarCoder-3B, StarCoder-15.5B | ✅ | | | | ✅ | | | | Latest |
| Stable-Code-3B | ✅ | | | | ✅ | | | | Latest |
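
The Transformer Version column in the tables above uses three kinds of entries: `Latest`, `X.Y.Z or newer`, and a bare `X.Y.Z`. A minimal sketch of checking an installed version against such an entry follows. The helper names `parse_version` and `meets_requirement` are hypothetical, and the reading of a bare `X.Y.Z` as an exact pin is an assumption that the table itself does not confirm. In practice the installed version would typically come from `transformers.__version__`.

```python
import re


def parse_version(text):
    """Extract the first X.Y.Z triple from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_requirement(installed, requirement):
    """Compare an installed version with a 'Transformer Version' entry.

    Assumed reading of the column (hypothetical, not stated by the table):
      'Latest'          -> cannot be verified offline, returns None
      'X.Y.Z or newer'  -> installed >= X.Y.Z
      'X.Y.Z'           -> installed == X.Y.Z (treated as an exact pin)
    """
    requirement = requirement.strip()
    if requirement.lower() == "latest":
        return None
    have = parse_version(installed)
    want = parse_version(requirement)
    if requirement.lower().endswith("or newer"):
        return have >= want
    return have == want


# Example: the Mistral/Mixtral row lists "4.36.0 or newer".
print(meets_requirement("4.38.2", "4.36.0 or newer"))  # True
```

Comparing integer tuples avoids the string-comparison pitfall where `"4.9.0"` would sort after `"4.36.0"`.
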

| Model Name | F32 | F16 | Q4_0 | Q8_0 | BTLA |
|---|---|---|---|---|---|
| TheBloke/Llama-2-7B-Chat-GGUF | ✅ | ✅ | ✅ | ✅ | |
| TheBloke/Mistral-7B-v0.1-GGUF, TheBloke/Mistral-7B-v0.2-GGUF | ✅ | ✅ | ✅ | ✅ | |
| TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF | ✅ | ✅ | ✅ | ✅ | |
| TheBloke/SOLAR-10.7B-Instruct-v1.0-GGUF | ✅ | ✅ | ✅ | ✅ | |
| TheBloke/CodeLlama-7B-GGUF, TheBloke/CodeLlama-13B-GGUF | ✅ | ✅ | ✅ | ✅ | |
| Qwen1.5-7B-Chat-GGUF | ✅ | ✅ | ✅ | ✅ | |
| Code-LLaMA-7B, Code-LLaMA-13B | ✅ | ✅ | ✅ | ✅ | ✅ |
| meta-llama/Llama-2-7b-chat-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
| upstage/SOLAR-10.7B-Instruct-v1.0 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Qwen-7B-Chat, Qwen1.5-7B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
| tiiuae/falcon-7b | ✅ | ✅ | ✅ | ✅ | ✅ |
| tiiuae/falcon-40b | ✅ | ✅ | ✅ | ✅ | ✅ |
| mpt-7b | ✅ | ✅ | ✅ | ✅ | ✅ |
| mpt-30b | ✅ | ✅ | ✅ | ✅ | ✅ |
| bloomz-7b1 | ✅ | ✅ | ✅ | ✅ | ✅ |