Commit 6dacfd1

sophies927 and vizhur authored
Add remaining local models to azureml registry (#4156)
* Create asset files for remaining local models
* Create asset files for missing NPU models
* Update phi-3 generic GPU models
* Update local metadata to include new directoryPath and promptTemplate tags (#4152)
* Add directoryPath and promptTemplate for models that have them
* Add remaining directoryPath tags
* Add missing Phi model promptTemplate tags
* Add missing Phi-4 and DeepSeek promptTemplate tags
* Remove Phi-4-mini prompt templates for now
* Fix promptTemplates for Phi-4 models
* Update Phi generic GPU model files
* Update model descriptions for Phi-4 and Qwen
* Update description.md and model.yaml files for Phi-4-reasoning models
* Update asset files for DeepSeek NPU and Phi-4-reasoning
* Update model.yaml for Qwen-coder models
* Update spec.yaml files for Qwen-coder models
* Update file and vram fields for some models
* Add file and vram for remaining Qwen-Coder models
* Update spec.yaml to fix assetId
* Update spec.yaml to fix licenseDescription
* Update spec.yaml to fix licenseDescription
* Update spec.yaml to fix assetId
* Update spec.yaml to fix alias

---------

Co-authored-by: vizhur <vizhur@microsoft.com>
1 parent 88d1a1e commit 6dacfd1

128 files changed

Lines changed: 1504 additions & 0 deletions

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+extra_config: model.yaml
+spec: spec.yaml
+type: model
+categories: ["Local"]
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.
+
+# Model Description
+- **Developed by:** Microsoft
+- **Model type:** ONNX
+- **License:** MIT
+- **Model Description:** This is a conversion of the DeepSeek-R1-Distill-Qwen-1.5B for local inference on QNN NPUs.
+- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.
+
+# Base Model Information
+See Hugging Face model [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) for details.
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+path:
+  container_name: models
+  container_path: deepseek/deepseek-r1-distill-qwen/deepseek-r1-distill-qwen-1.5b/onnx/npu/qnn-deepseek-r1-distill-qwen-1.5b
+  storage_name: automlcesdkdataresources
+  type: azureblob
+publish:
+  description: description.md
+  type: custom_model
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
+name: deepseek-r1-distill-qwen-1.5b-qnn-npu
+version: 1
+path: ./
+tags:
+  foundryLocal: ""
+  license: "MIT"
+  licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/LICENSE>."
+  author: Microsoft
+  inputModalities: "text"
+  outputModalities: "text"
+  task: chat-completion
+  maxOutputTokens: 2048
+  alias: deepseek-r1-1.5b
+  directoryPath: qnn-deepseek-r1-distill-qwen-1.5b
+  promptTemplate: "{\"assistant\": \"{Content}\", \"prompt\": \"\\\\u003C\\\\uFF5CUser\\\\uFF5C\\\\u003E{Content}\\\\u003C\\\\uFF5CAssistant\\\\uFF5C\\\\u003E\"}"
+type: custom_model
+variantInfo:
+  parents:
+  - assetId: azureml://registries/azureml/models/deepseek-r1-distill-qwen-1.5b/versions/1
+  variantMetadata:
+    modelType: 'ONNX'
+    quantization: ['QuaRot', 'GPTQ']
+    device: 'npu'
+    executionProvider: 'QNNExecutionProvider'
+    fileSizeBytes: 1632077086
+    vRamFootprintBytes: 1336043110
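
The `promptTemplate` tag above is escaped in layers: a YAML double-quoted string wraps a JSON object, whose `prompt` value in turn carries literal `\uXXXX` sequences. The following is a minimal sketch of how a consumer might unwrap it, assuming the value is exactly as shown in the hunk and that the consumer decodes the remaining `\uXXXX` sequences itself. The helper names `decode_unicode_escapes` and `render_prompt` are hypothetical and not part of any Azure ML or Foundry Local API.

```python
import json
import re

# The promptTemplate value as it looks AFTER YAML double-quoted unescaping:
# each \\\\ in the YAML source has become \\ and each \" has become ".
# (Assumption: the YAML in the hunk is parsed by a standard YAML loader.)
TEMPLATE_AFTER_YAML = (
    r'{"assistant": "{Content}", '
    r'"prompt": "\\u003C\\uFF5CUser\\uFF5C\\u003E{Content}'
    r'\\u003C\\uFF5CAssistant\\uFF5C\\u003E"}'
)


def decode_unicode_escapes(text):
    """Replace literal \\uXXXX sequences with the characters they name."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


def render_prompt(template_json, user_text):
    """Hypothetical helper: parse the JSON layer, decode escapes, fill {Content}."""
    template = json.loads(template_json)  # JSON turns \\ into a single backslash
    prompt = decode_unicode_escapes(template["prompt"])
    return prompt.replace("{Content}", user_text)


rendered = render_prompt(TEMPLATE_AFTER_YAML, "Hello")
print(rendered)
```

Under these assumptions the decoded markers are `<`, the fullwidth vertical bar U+FF5C, and `>`, which together form the User and Assistant delimiters around the message content.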
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
extra_config: model.yaml
2+
spec: spec.yaml
3+
type: model
4+
categories: ["Local"]
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.
+
+# Model Description
+- **Developed by:** Microsoft
+- **Model type:** ONNX
+- **License:** MIT
+- **Model Description:** This is a conversion of the DeepSeek-R1-Distill-Qwen-7B for local inference on QNN NPUs.
+- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.
+
+# Base Model Information
+See Hugging Face model [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for details.
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+path:
+  container_name: models
+  container_path: deepseek/deepseek-r1-distill-qwen/deepseek-r1-distill-qwen-7b/onnx/npu/qnn-deepseek-r1-distill-qwen-7b
+  storage_name: automlcesdkdataresources
+  type: azureblob
+publish:
+  description: description.md
+  type: custom_model
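
In the DeepSeek NPU entries in this commit, the last segment of `container_path` in model.yaml matches the `directoryPath` tag in the corresponding spec.yaml. Whether the registry requires this is not stated in the commit, so the sketch below is only an illustrative consistency check. The function name `directory_matches` is hypothetical, and the values are copied from the hunks for the 1.5B and 7B models.

```python
from pathlib import PurePosixPath

# (container_path from model.yaml, directoryPath tag from spec.yaml)
PAIRS = [
    (
        "deepseek/deepseek-r1-distill-qwen/deepseek-r1-distill-qwen-1.5b"
        "/onnx/npu/qnn-deepseek-r1-distill-qwen-1.5b",
        "qnn-deepseek-r1-distill-qwen-1.5b",
    ),
    (
        "deepseek/deepseek-r1-distill-qwen/deepseek-r1-distill-qwen-7b"
        "/onnx/npu/qnn-deepseek-r1-distill-qwen-7b",
        "qnn-deepseek-r1-distill-qwen-7b",
    ),
]


def directory_matches(container_path, directory_path):
    """Return True when the final path segment equals the directoryPath tag."""
    return PurePosixPath(container_path).name == directory_path


results = [directory_matches(container, directory) for container, directory in PAIRS]
print(results)
```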
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
+name: deepseek-r1-distill-qwen-7b-qnn-npu
+version: 1
+path: ./
+tags:
+  foundryLocal: ""
+  license: "MIT"
+  licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B/blob/main/LICENSE>."
+  author: Microsoft
+  inputModalities: "text"
+  outputModalities: "text"
+  task: chat-completion
+  maxOutputTokens: 2048
+  alias: deepseek-r1-7b
+  directoryPath: qnn-deepseek-r1-distill-qwen-7b
+  promptTemplate: "{\"assistant\": \"{Content}\", \"prompt\": \"\\\\u003C\\\\uFF5CUser\\\\uFF5C\\\\u003E{Content}\\\\u003C\\\\uFF5CAssistant\\\\uFF5C\\\\u003E\"}"
+type: custom_model
+variantInfo:
+  parents:
+  - assetId: azureml://registries/azureml/models/deepseek-r1-distill-qwen-7b/versions/1
+  variantMetadata:
+    modelType: 'ONNX'
+    quantization: ['QuaRot', 'GPTQ']
+    device: 'npu'
+    executionProvider: 'QNNExecutionProvider'
+    fileSizeBytes: 3987199754
+    vRamFootprintBytes: 3301640765
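
The `fileSizeBytes` and `vRamFootprintBytes` fields are raw byte counts, which are hard to compare at a glance. As a rough sketch, the values from the two DeepSeek NPU spec.yaml hunks can be converted to GiB as follows. The choice of binary units (2^30 bytes) and two-decimal rounding is an assumption made for readability, not something the commit specifies.

```python
GIB = 2 ** 30  # bytes per gibibyte (assumed unit for display)

# Values copied from the variantMetadata blocks of the two spec.yaml hunks.
VARIANTS = {
    "deepseek-r1-distill-qwen-1.5b-qnn-npu": {
        "fileSizeBytes": 1632077086,
        "vRamFootprintBytes": 1336043110,
    },
    "deepseek-r1-distill-qwen-7b-qnn-npu": {
        "fileSizeBytes": 3987199754,
        "vRamFootprintBytes": 3301640765,
    },
}


def to_gib(num_bytes):
    """Convert a byte count to GiB, rounded to two decimals."""
    return round(num_bytes / GIB, 2)


sizes_gib = {
    name: {field: to_gib(value) for field, value in fields.items()}
    for name, fields in VARIANTS.items()
}

for name, fields in sizes_gib.items():
    print(name, fields)
```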
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
extra_config: model.yaml
2+
spec: spec.yaml
3+
type: model
4+
categories: ["Local"]
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+This model is an optimized version of Phi-4-mini-reasoning to enable local inference on CUDA GPUs. This model uses RTN quantization.
+
+# Model Description
+- **Developed by:** Microsoft
+- **Model type:** ONNX
+- **License:** MIT
+- **Model Description:** This is a conversion of the Phi-4-mini-reasoning for local inference on CUDA GPUs.
+- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.
+
+# Base Model Information
+See Hugging Face model [Phi-4-mini-reasoning](https://huggingface.co/microsoft/Phi-4-mini-reasoning) for details.
