Add support for deepseek-r1-distill-qwen-1.5b and deepseek-r1-distill-qwen-7b local model variants (#4137)

sophies927 · web-flow · commit 811a82d5e950 · 2025-05-01T15:11:19.000-07:00
* Create asset.yaml for DeepSeek-R1-Distill-Qwen-1.5B

* Upload files for DeepSeek-R1-Distill-Qwen-1.5B

* Create asset.yaml for DeepSeek-R1-Distill-Qwen-1.5B-cuda-gpu

* Add files for DeepSeek-R1-Distill-Qwen-1.5B-cuda-gpu

* Create asset.yaml for DeepSeek-R1-Distill-Qwen-1.5B-generic-cpu

* Add files for DeepSeek-R1-Distill-Qwen-1.5B-generic-cpu

* Create asset.yaml for DeepSeek-R1-Distill-Qwen-1.5B-generic-gpu

* Add files for DeepSeek-R1-Distill-Qwen-1.5B-generic-gpu

* Create asset.yaml for DeepSeek-R1-Distill-Qwen-1.5B-qnn-npu

* Add files for DeepSeek-R1-Distill-Qwen-1.5B-qnn-npu

* Update and rename model.yml to model.yaml

* Update and rename spec.yml to spec.yaml

* Update description.md

* Update and rename model.yml to model.yaml

* Update and rename spec.yml to spec.yaml

* Update and rename description (1).md to description.md

* Update and rename model (1).yaml to model.yaml

* Update and rename spec (1).yaml to spec.yaml

* Update and rename description (2).md to description.md

* Update and rename model (2).yaml to model.yaml

* Update and rename spec (2).yaml to spec.yaml

* Update and rename description (3).md to description.md

* Update and rename model (3).yaml to model.yaml

* Update and rename spec (3).yaml to spec.yaml

* Create asset.yaml for DeepSeek-R1-Distill-Qwen-7B

* Add files for DeepSeek-R1-Distill-Qwen-7B

* Create asset.yaml for DeepSeek-R1-Distill-Qwen-7B-cuda-gpu

* Add files for DeepSeek-R1-Distill-Qwen-7B-cuda-gpu

* Create asset.yaml for DeepSeek-R1-Distill-Qwen-7B-generic-cpu

* Add files for DeepSeek-R1-Distill-Qwen-7B-generic-cpu

* Create asset.yaml for DeepSeek-R1-Distill-Qwen-7B-generic-gpu

* Add files for DeepSeek-R1-Distill-Qwen-7B-generic-gpu

* Create asset.yaml for DeepSeek-R1-Distill-Qwen-7B-qnn-npu

* Add files for DeepSeek-R1-Distill-Qwen-7B-qnn-npu

* Update and rename description (5).md to description.md

* Update and rename model (5).yaml to model.yaml

* Update and rename spec (5).yaml to spec.yaml

* Update and rename description (6).md to description.md

* Update and rename model (6).yaml to model.yaml

* Update and rename spec (6).yaml to spec.yaml

* Update and rename description (7).md to description.md

* Update and rename model (7).yaml to model.yaml

* Update and rename spec (7).yaml to spec.yaml

* Update and rename description (8).md to description.md

* Update and rename model (8).yaml to model.yaml

* Update and rename spec (8).yaml to spec.yaml

* Update spec.yaml

* Update and rename description (4).md to description.md

* Update and rename model (4).yaml to model.yaml

* Update and rename spec (4).yaml to spec.yaml

* Update model.yaml

* Update spec.yaml

* Update model.yaml

* Update spec.yaml

* Update model.yaml

* Update spec.yaml

* Update model.yaml

* Update spec.yaml

* Update model.yaml

* Update model.yaml

* Update spec.yaml

* Update model.yaml

* Update spec.yaml

* Update spec.yaml

* Update spec.yaml

* Update spec.yaml

* Update model.yaml

* Update spec.yaml

* Update model.yaml

* Update spec.yaml

* Update model.yaml

* Update spec.yaml

* Update model.yaml

* Update spec.yaml

* Update model.yaml for 1.5b cuda gpu

* Update model.yaml for 1.5b generic cpu

* Update model.yaml for 1.5b generic gpu

* Delete assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-qnn-npu directory

* Update model.yaml

* Update model.yaml for 7b cuda gpu

* Update model.yaml for 7b generic cpu

* Update model.yaml for 7b generic gpu

* Delete assets/models/system/DeepSeek-R1-Distill-Qwen-7B-qnn-npu directory

* Update model.yaml

* Update foundrylocal tag to foundryLocal
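
Each asset directory added by this commit carries the same four files (`asset.yaml`, `description.md`, `model.yaml`, `spec.yaml`). A minimal Python sketch for checking that layout locally might look like the following; the helper name `missing_asset_files` is hypothetical and is not part of this repository's tooling.

```python
from pathlib import Path

# The four files that every asset directory in this commit contains.
REQUIRED_FILES = ("asset.yaml", "description.md", "model.yaml", "spec.yaml")


def missing_asset_files(asset_dir):
    """Return the required file names absent from asset_dir, sorted.

    Hypothetical helper for illustration only.
    """
    root = Path(asset_dir)
    return sorted(name for name in REQUIRED_FILES if not (root / name).is_file())
```

Calling `missing_asset_files("assets/models/system/DeepSeek-R1-Distill-Qwen-7B-cuda-gpu")` from the repository root would return an empty list when all four files are present.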
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-cuda-gpu/asset.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-cuda-gpu/asset.yaml
@@ -0,0 +1,4 @@
+extra_config: model.yaml
+spec: spec.yaml
+type: model
+categories: ["Local"]
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-cuda-gpu/description.md b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-cuda-gpu/description.md
@@ -0,0 +1,11 @@
+This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CUDA GPUs. This model uses RTN quantization.
+
+# Model Description
+- **Developed by:** Microsoft
+- **Model type:** ONNX
+- **License:** MIT
+- **Model Description:** This is a conversion of DeepSeek-R1-Distill-Qwen-1.5B for local inference on CUDA GPUs.
+- **Disclaimer:** This model is only an optimization of the base model; any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that the optimizations applied are distinct from fine-tuning and thus do not alter the intended uses or capabilities of the model.
+
+# Base Model Information
+See Hugging Face model [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) for details.
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-cuda-gpu/model.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-cuda-gpu/model.yaml
@@ -0,0 +1,8 @@
+path:
+  container_name: models
+  container_path: foundrylocal/foundry-local/deepseek-r1-distill-qwen-1.5b/onnx/cuda/cuda-int4-rtn-block-32
+  storage_name: automlcesdkdataresources
+  type: azureblob
+publish:
+  description: description.md
+  type: custom_model
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-cuda-gpu/spec.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-cuda-gpu/spec.yaml
@@ -0,0 +1,24 @@
+$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
+name: deepseek-r1-distill-qwen-1.5b-cuda-gpu
+version: 1
+path: ./
+tags:
+  foundryLocal: ""
+  license: "MIT"
+  licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/LICENSE>."
+  author: Microsoft
+  inputModalities: "text"
+  outputModalities: "text"
+  task: chat-completion
+  maxOutputTokens: 2048
+type: custom_model
+variantInfo:
+  parents:
+  - assetId: azureml://registries/azureml/models/deepseek-r1-distill-qwen-1.5b/versions/1
+  variantMetadata:
+    modelType: 'ONNX'
+    quantization: ['RTN']
+    device: 'gpu'
+    executionProvider: 'CUDAExecutionProvider'
+    fileSizeBytes: 1073741824
+    vRamFootprintBytes: 1362861314
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-cpu/asset.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-cpu/asset.yaml
@@ -0,0 +1,4 @@
+extra_config: model.yaml
+spec: spec.yaml
+type: model
+categories: ["Local"]
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-cpu/description.md b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-cpu/description.md
@@ -0,0 +1,11 @@
+This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CPUs. This model uses RTN quantization.
+
+# Model Description
+- **Developed by:** Microsoft
+- **Model type:** ONNX
+- **License:** MIT
+- **Model Description:** This is a conversion of DeepSeek-R1-Distill-Qwen-1.5B for local inference on CPUs.
+- **Disclaimer:** This model is only an optimization of the base model; any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that the optimizations applied are distinct from fine-tuning and thus do not alter the intended uses or capabilities of the model.
+
+# Base Model Information
+See Hugging Face model [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) for details.
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-cpu/model.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-cpu/model.yaml
@@ -0,0 +1,8 @@
+path:
+  container_name: models
+  container_path: foundrylocal/foundry-local/deepseek-r1-distill-qwen-1.5b/onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4
+  storage_name: automlcesdkdataresources
+  type: azureblob
+publish:
+  description: description.md
+  type: custom_model
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-cpu/spec.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-cpu/spec.yaml
@@ -0,0 +1,24 @@
+$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
+name: deepseek-r1-distill-qwen-1.5b-generic-cpu
+version: 1
+path: ./
+tags:
+  foundryLocal: ""
+  license: "MIT"
+  licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/LICENSE>."
+  author: Microsoft
+  inputModalities: "text"
+  outputModalities: "text"
+  task: chat-completion
+  maxOutputTokens: 2048
+type: custom_model
+variantInfo:
+  parents:
+  - assetId: azureml://registries/azureml/models/deepseek-r1-distill-qwen-1.5b/versions/1
+  variantMetadata:
+    modelType: 'ONNX'
+    quantization: ['RTN']
+    device: 'cpu'
+    executionProvider: 'CPUExecutionProvider'
+    fileSizeBytes: 1964944541
+    vRamFootprintBytes: 1965162961
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-gpu/asset.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-gpu/asset.yaml
@@ -0,0 +1,4 @@
+extra_config: model.yaml
+spec: spec.yaml
+type: model
+categories: ["Local"]
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-gpu/description.md b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-gpu/description.md
@@ -0,0 +1,11 @@
+This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on GPUs. This model uses RTN quantization.
+
+# Model Description
+- **Developed by:** Microsoft
+- **Model type:** ONNX
+- **License:** MIT
+- **Model Description:** This is a conversion of DeepSeek-R1-Distill-Qwen-1.5B for local inference on GPUs.
+- **Disclaimer:** This model is only an optimization of the base model; any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that the optimizations applied are distinct from fine-tuning and thus do not alter the intended uses or capabilities of the model.
+
+# Base Model Information
+See Hugging Face model [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) for details.
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-gpu/model.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-gpu/model.yaml
@@ -0,0 +1,8 @@
+path:
+  container_name: models
+  container_path: foundrylocal/foundry-local/deepseek-r1-distill-qwen-1.5b/onnx/directml/directml-int4-rtn-block-32-acc-level-4
+  storage_name: automlcesdkdataresources
+  type: azureblob
+publish:
+  description: description.md
+  type: custom_model
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-gpu/spec.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-gpu/spec.yaml
@@ -0,0 +1,24 @@
+$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
+name: deepseek-r1-distill-qwen-1.5b-generic-gpu
+version: 1
+path: ./
+tags:
+  foundryLocal: ""
+  license: "MIT"
+  licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/LICENSE>."
+  author: Microsoft
+  inputModalities: "text"
+  outputModalities: "text"
+  task: chat-completion
+  maxOutputTokens: 2048
+type: custom_model
+variantInfo:
+  parents:
+  - assetId: azureml://registries/azureml/models/deepseek-r1-distill-qwen-1.5b/versions/1
+  variantMetadata:
+    modelType: 'ONNX'
+    quantization: ['RTN']
+    device: 'gpu'
+    executionProvider: 'WebGPUExecutionProvider'
+    fileSizeBytes: 1362648117
+    vRamFootprintBytes: 1362929128
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B/asset.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B/asset.yaml
@@ -0,0 +1,4 @@
+extra_config: model.yaml
+spec: spec.yaml
+type: model
+categories: ["Local"]
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B/description.md b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B/description.md
@@ -0,0 +1,17 @@
+This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B for local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.
+
+# ONNX Models
+Here are some of the optimized configurations we have added:
+1.	ONNX model for CPU and mobile using RTN quantization.
+2.	ONNX model for GPU using RTN quantization.
+3.	ONNX model for NPU using QuaRot and GPTQ quantization.
+
+# Model Description
+- **Developed by:** Microsoft
+- **Model type:** ONNX
+- **License:** MIT
+- **Model Description:** This is a conversion of DeepSeek-R1-Distill-Qwen-1.5B for local inference.
+- **Disclaimer:** This model is only an optimization of the base model; any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that the optimizations applied are distinct from fine-tuning and thus do not alter the intended uses or capabilities of the model.
+
+# Base Model Information
+See Hugging Face model [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) for details.
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B/model.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B/model.yaml
@@ -0,0 +1,8 @@
+path:
+  container_name: models
+  container_path: foundrylocal/foundry-local/deepseek-r1-distill-qwen-1.5b
+  storage_name: automlcesdkdataresources
+  type: azureblob
+publish:
+  description: description.md
+  type: custom_model
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B/spec.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B/spec.yaml
@@ -0,0 +1,7 @@
+$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
+name: deepseek-r1-distill-qwen-1.5b
+version: 1
+path: ./
+tags:
+  foundryLocal: ""
+type: custom_model
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-cuda-gpu/asset.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-cuda-gpu/asset.yaml
@@ -0,0 +1,4 @@
+extra_config: model.yaml
+spec: spec.yaml
+type: model
+categories: ["Local"]
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-cuda-gpu/description.md b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-cuda-gpu/description.md
@@ -0,0 +1,11 @@
+This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CUDA GPUs. This model uses RTN quantization.
+
+# Model Description
+- **Developed by:** Microsoft
+- **Model type:** ONNX
+- **License:** MIT
+- **Model Description:** This is a conversion of DeepSeek-R1-Distill-Qwen-7B for local inference on CUDA GPUs.
+- **Disclaimer:** This model is only an optimization of the base model; any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that the optimizations applied are distinct from fine-tuning and thus do not alter the intended uses or capabilities of the model.
+
+# Base Model Information
+See Hugging Face model [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for details.
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-cuda-gpu/model.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-cuda-gpu/model.yaml
@@ -0,0 +1,8 @@
+path:
+  container_name: models
+  container_path: foundrylocal/foundry-local/deepseek-r1-distill-qwen-7b/onnx/cuda/cuda-int4-rtn-block-32
+  storage_name: automlcesdkdataresources
+  type: azureblob
+publish:
+  description: description.md
+  type: custom_model
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-cuda-gpu/spec.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-cuda-gpu/spec.yaml
@@ -0,0 +1,24 @@
+$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
+name: deepseek-r1-distill-qwen-7b-cuda-gpu
+version: 1
+path: ./
+tags:
+  foundryLocal: ""
+  license: "MIT"
+  licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B/blob/main/LICENSE>."
+  author: Microsoft
+  inputModalities: "text"
+  outputModalities: "text"
+  task: chat-completion
+  maxOutputTokens: 2048
+type: custom_model
+variantInfo:
+  parents:
+  - assetId: azureml://registries/azureml/models/deepseek-r1-distill-qwen-7b/versions/1
+  variantMetadata:
+    modelType: 'ONNX'
+    quantization: ['RTN']
+    device: 'gpu'
+    executionProvider: 'CUDAExecutionProvider'
+    fileSizeBytes: 5096273664
+    vRamFootprintBytes: 5096487731
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-cpu/asset.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-cpu/asset.yaml
@@ -0,0 +1,4 @@
+extra_config: model.yaml
+spec: spec.yaml
+type: model
+categories: ["Local"]
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-cpu/description.md b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-cpu/description.md
@@ -0,0 +1,11 @@
+This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CPUs. This model uses RTN quantization.
+
+# Model Description
+- **Developed by:** Microsoft
+- **Model type:** ONNX
+- **License:** MIT
+- **Model Description:** This is a conversion of DeepSeek-R1-Distill-Qwen-7B for local inference on CPUs.
+- **Disclaimer:** This model is only an optimization of the base model; any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that the optimizations applied are distinct from fine-tuning and thus do not alter the intended uses or capabilities of the model.
+
+# Base Model Information
+See Hugging Face model [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for details.
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-cpu/model.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-cpu/model.yaml
@@ -0,0 +1,8 @@
+path:
+  container_name: models
+  container_path: foundrylocal/foundry-local/deepseek-r1-distill-qwen-7b/onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4
+  storage_name: automlcesdkdataresources
+  type: azureblob
+publish:
+  description: description.md
+  type: custom_model
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-cpu/spec.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-cpu/spec.yaml
@@ -0,0 +1,24 @@
+$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
+name: deepseek-r1-distill-qwen-7b-generic-cpu
+version: 1
+path: ./
+tags:
+  foundryLocal: ""
+  license: "MIT"
+  licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B/blob/main/LICENSE>."
+  author: Microsoft
+  inputModalities: "text"
+  outputModalities: "text"
+  task: chat-completion
+  maxOutputTokens: 2048
+type: custom_model
+variantInfo:
+  parents:
+  - assetId: azureml://registries/azureml/models/deepseek-r1-distill-qwen-7b/versions/1
+  variantMetadata:
+    modelType: 'ONNX'
+    quantization: ['RTN']
+    device: 'cpu'
+    executionProvider: 'CPUExecutionProvider'
+    fileSizeBytes: 6667944735
+    vRamFootprintBytes: 6668163013
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-gpu/asset.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-gpu/asset.yaml
@@ -0,0 +1,4 @@
+extra_config: model.yaml
+spec: spec.yaml
+type: model
+categories: ["Local"]
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-gpu/description.md b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-gpu/description.md
@@ -0,0 +1,11 @@
+This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on GPUs. This model uses RTN quantization.
+
+# Model Description
+- **Developed by:** Microsoft
+- **Model type:** ONNX
+- **License:** MIT
+- **Model Description:** This is a conversion of DeepSeek-R1-Distill-Qwen-7B for local inference on GPUs.
+- **Disclaimer:** This model is only an optimization of the base model; any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that the optimizations applied are distinct from fine-tuning and thus do not alter the intended uses or capabilities of the model.
+
+# Base Model Information
+See Hugging Face model [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for details.
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-gpu/model.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-gpu/model.yaml
@@ -0,0 +1,8 @@
+path:
+  container_name: models
+  container_path: foundrylocal/foundry-local/deepseek-r1-distill-qwen-7b/onnx/directml/directml-int4-rtn-block-32-acc-level-4
+  storage_name: automlcesdkdataresources
+  type: azureblob
+publish:
+  description: description.md
+  type: custom_model
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-gpu/spec.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-gpu/spec.yaml
@@ -0,0 +1,24 @@
+$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
+name: deepseek-r1-distill-qwen-7b-generic-gpu
+version: 1
+path: ./
+tags:
+  foundryLocal: ""
+  license: "MIT"
+  licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B/blob/main/LICENSE>."
+  author: Microsoft
+  inputModalities: "text"
+  outputModalities: "text"
+  task: chat-completion
+  maxOutputTokens: 2048
+type: custom_model
+variantInfo:
+  parents:
+  - assetId: azureml://registries/azureml/models/deepseek-r1-distill-qwen-7b/versions/1
+  variantMetadata:
+    modelType: 'ONNX'
+    quantization: ['RTN']
+    device: 'gpu'
+    executionProvider: 'WebGPUExecutionProvider'
+    fileSizeBytes: 5096273664
+    vRamFootprintBytes: 5096556544
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B/asset.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B/asset.yaml
@@ -0,0 +1,4 @@
+extra_config: model.yaml
+spec: spec.yaml
+type: model
+categories: ["Local"]
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B/description.md b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B/description.md
@@ -0,0 +1,17 @@
+This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.
+
+# ONNX Models
+Here are some of the optimized configurations we have added:
+1.	ONNX model for CPU and mobile using RTN quantization.
+2.	ONNX model for GPU using RTN quantization.
+3.	ONNX model for NPU using QuaRot and GPTQ quantization.
+
+# Model Description
+- **Developed by:** Microsoft
+- **Model type:** ONNX
+- **License:** MIT
+- **Model Description:** This is a conversion of DeepSeek-R1-Distill-Qwen-7B for local inference.
+- **Disclaimer:** This model is only an optimization of the base model; any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that the optimizations applied are distinct from fine-tuning and thus do not alter the intended uses or capabilities of the model.
+
+# Base Model Information
+See Hugging Face model [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for details.
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B/model.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B/model.yaml
@@ -0,0 +1,8 @@
+path:
+  container_name: models
+  container_path: foundrylocal/foundry-local/deepseek-r1-distill-qwen-7b
+  storage_name: automlcesdkdataresources
+  type: azureblob
+publish:
+  description: description.md
+  type: custom_model
diff --git a/assets/models/system/DeepSeek-R1-Distill-Qwen-7B/spec.yaml b/assets/models/system/DeepSeek-R1-Distill-Qwen-7B/spec.yaml
@@ -0,0 +1,7 @@
+$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
+name: deepseek-r1-distill-qwen-7b
+version: 1
+path: ./
+tags:
+  foundryLocal: ""
+type: custom_model
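
The `fileSizeBytes` and `vRamFootprintBytes` values in the variant `spec.yaml` files above are raw byte counts. As a rough reading aid, the sketch below converts the values copied from this diff to GiB; the `to_gib` helper and the `VARIANTS` table are illustrative assumptions and are not part of the repository.

```python
# (fileSizeBytes, vRamFootprintBytes) pairs copied from the spec.yaml files in this diff.
VARIANTS = {
    "deepseek-r1-distill-qwen-1.5b-cuda-gpu": (1073741824, 1362861314),
    "deepseek-r1-distill-qwen-1.5b-generic-cpu": (1964944541, 1965162961),
    "deepseek-r1-distill-qwen-1.5b-generic-gpu": (1362648117, 1362929128),
    "deepseek-r1-distill-qwen-7b-cuda-gpu": (5096273664, 5096487731),
    "deepseek-r1-distill-qwen-7b-generic-cpu": (6667944735, 6668163013),
    "deepseek-r1-distill-qwen-7b-generic-gpu": (5096273664, 5096556544),
}


def to_gib(num_bytes):
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


# Print one summary line per variant, rounded to two decimals.
for name, (file_size, footprint) in VARIANTS.items():
    print(f"{name}: file {to_gib(file_size):.2f} GiB, "
          f"memory footprint {to_gib(footprint):.2f} GiB")
```

Within each model size, the CPU variant is the largest on disk, and the two 7B GPU variants list the same `fileSizeBytes`.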