Commit 04c6f44

Add partial set of local models to azureml registry (#4142)
* Update files for initial set of parent models
* Add empty files for new parent models
* Populate model files for new local models
* Update all new folder names to lowercase titles
* Partially update variant model files
* Update Llama model files
* Add files for Phi models
* Update Qwen variant model files
* Add Mistral model variants and files
* Remove Qwen coder local model parents
* Update description.md for Mistral CUDA variant
* Update description.md for Mistral generic CPU variant
* Update description.md for Mistral generic GPU variant
* Update description.md for DeepSeek Llama generic GPU variant
* Update description.md for DeepSeek Qwen 14b CUDA variant
* Update description.md for DeepSeek Qwen 14b generic CPU variant
* Update description.md for DeepSeek Qwen 14b generic GPU variant
* Update description.md for Llama-3.1 CUDA variant
* Update description.md for Llama-3.1 generic CPU variant
* Update description.md for Llama-3.1 generic GPU variant
* Update description.md for Llama-3.1 parent model
* Update description.md for Llama-3.2 1b CUDA variant
* Update description.md for Llama-3.2 1b CUDA variant
* Update description.md for Llama-3.2 1b generic CPU variant
* Update description.md for Llama-3.2 1b generic GPU variant
* Update description.md for Llama-3.2 1b parent model
* Update description.md for Llama-3.2 3b CUDA variant
* Update description.md for Llama-3.2 3b generic CPU variant
* Update description.md for Llama-3.2 3b generic GPU variant
* Update description.md for Llama-3.2 3b parent model
* Update model.yaml for phi-3-mini-128k-instruct CUDA variant
* Update model.yaml for phi-3-mini-128k-instruct generic CPU variant
* Correct Qwen license info
* Update versions for variants w/ existing parent models
* Fix assetIds for existing parent models
* Remove Llama model files
* Remove duplicate Phi-4-mini-instruct parent model
* Add shortName tag to model variants
* Update tag name from shortName to baseName
* Update tag name from baseName to alias
1 parent c36c672 commit 04c6f44

178 files changed

Lines changed: 1979 additions & 0 deletions

assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-cuda-gpu/spec.yaml

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ tags:
   outputModalities: "text"
   task: chat-completion
   maxOutputTokens: 2048
+  alias: deepseek-r1-1.5b
 type: custom_model
 variantInfo:
   parents:

assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-cpu/spec.yaml

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ tags:
   outputModalities: "text"
   task: chat-completion
   maxOutputTokens: 2048
+  alias: deepseek-r1-1.5b
 type: custom_model
 variantInfo:
   parents:

assets/models/system/DeepSeek-R1-Distill-Qwen-1.5B-generic-gpu/spec.yaml

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ tags:
   outputModalities: "text"
   task: chat-completion
   maxOutputTokens: 2048
+  alias: deepseek-r1-1.5b
 type: custom_model
 variantInfo:
   parents:

assets/models/system/DeepSeek-R1-Distill-Qwen-7B-cuda-gpu/spec.yaml

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ tags:
   outputModalities: "text"
   task: chat-completion
   maxOutputTokens: 2048
+  alias: deepseek-r1-7b
 type: custom_model
 variantInfo:
   parents:

assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-cpu/spec.yaml

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ tags:
   outputModalities: "text"
   task: chat-completion
   maxOutputTokens: 2048
+  alias: deepseek-r1-7b
 type: custom_model
 variantInfo:
   parents:

assets/models/system/DeepSeek-R1-Distill-Qwen-7B-generic-gpu/spec.yaml

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ tags:
   outputModalities: "text"
   task: chat-completion
   maxOutputTokens: 2048
+  alias: deepseek-r1-7b
 type: custom_model
 variantInfo:
   parents:
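
The six hunks above add the same `alias` value to the CUDA, generic CPU, and generic GPU variants of each model. As a minimal sketch of what that shared tag makes possible, the Python below groups the variant directory names by alias. The helper name `group_by_alias` is hypothetical, and the commit does not show how any consumer of the registry actually uses the tag.

```python
from collections import defaultdict

# Variant directory names and alias values copied from the six spec.yaml hunks.
VARIANT_ALIASES = {
    "DeepSeek-R1-Distill-Qwen-1.5B-cuda-gpu": "deepseek-r1-1.5b",
    "DeepSeek-R1-Distill-Qwen-1.5B-generic-cpu": "deepseek-r1-1.5b",
    "DeepSeek-R1-Distill-Qwen-1.5B-generic-gpu": "deepseek-r1-1.5b",
    "DeepSeek-R1-Distill-Qwen-7B-cuda-gpu": "deepseek-r1-7b",
    "DeepSeek-R1-Distill-Qwen-7B-generic-cpu": "deepseek-r1-7b",
    "DeepSeek-R1-Distill-Qwen-7B-generic-gpu": "deepseek-r1-7b",
}


def group_by_alias(variant_aliases):
    """Hypothetical helper: map each alias to the sorted variants that carry it."""
    groups = defaultdict(list)
    for variant, alias in variant_aliases.items():
        groups[alias].append(variant)
    return {alias: sorted(names) for alias, names in groups.items()}


if __name__ == "__main__":
    for alias, variants in sorted(group_by_alias(VARIANT_ALIASES).items()):
        print(alias, "->", ", ".join(variants))
```

Each alias resolves to three hardware-specific variants, which is consistent with the tag's earlier names in the commit message (`shortName`, then `baseName`).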
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+extra_config: model.yaml
+spec: spec.yaml
+type: model
+categories: ["Local"]
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on CUDA GPUs. This model uses RTN quantization.
+
+# Model Description
+- **Developed by:** Microsoft
+- **Model type:** ONNX
+- **License:** apache-2.0
+- **Model Description:** This is a conversion of the Mistral-7B-Instruct-v0.2 for local inference on CUDA GPUs.
+- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.
+
+# Base Model Information
+See Hugging Face model [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) for details.
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+path:
+  container_name: models
+  container_path: foundrylocal/onnx-models/mistral-7b-instruct-v0.2/onnx/cuda/mistral-7b-instruct-v0.2-cuda-int4-rtn-block-32
+  storage_name: automlcesdkdataresources
+  type: azureblob
+publish:
+  description: description.md
+  type: custom_model
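
The `path` block above names a storage account, a container, and a path inside it. As a rough illustration of how those three fields fit together, the sketch below joins them into the conventional Azure Blob Storage HTTPS URL. The helper name `blob_prefix_url` and the public-cloud endpoint suffix `blob.core.windows.net` are assumptions; the commit does not show how the registry tooling resolves this path.

```python
# Values copied from the `path` block of the model.yaml added in this commit.
PATH_CFG = {
    "container_name": "models",
    "container_path": (
        "foundrylocal/onnx-models/mistral-7b-instruct-v0.2/onnx/cuda/"
        "mistral-7b-instruct-v0.2-cuda-int4-rtn-block-32"
    ),
    "storage_name": "automlcesdkdataresources",
    "type": "azureblob",
}


def blob_prefix_url(path_cfg):
    """Hypothetical helper: build the blob URL prefix for an `azureblob` path entry.

    Assumes the public Azure cloud endpoint; other clouds use a different suffix.
    """
    if path_cfg["type"] != "azureblob":
        raise ValueError("unsupported path type: " + repr(path_cfg["type"]))
    return "https://{storage_name}.blob.core.windows.net/{container_name}/{container_path}".format(
        **path_cfg
    )


if __name__ == "__main__":
    print(blob_prefix_url(PATH_CFG))
```

The result is only a prefix: the model files live under that path, and whether they are publicly readable is not stated in the commit.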
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
+name: mistralai-Mistral-7B-Instruct-v0-2-cuda-gpu
+version: 1
+path: ./
+tags:
+  foundryLocal: ""
+  license: "apache-2.0"
+  licenseDescription: "This model is provided under the License Terms available at <https://www.apache.org/licenses/LICENSE-2.0.html>."
+  author: Microsoft
+  inputModalities: "text"
+  outputModalities: "text"
+  task: chat-completion
+  maxOutputTokens: 2048
+  alias: mistral-7b-v0.2
+type: custom_model
+variantInfo:
+  parents:
+  - assetId: azureml://registries/azureml/models/mistralai-Mistral-7B-Instruct-v0-2/versions/6
+  variantMetadata:
+    modelType: 'ONNX'
+    quantization: ['RTN']
+    device: 'gpu'
+    executionProvider: 'CUDAExecutionProvider'
+    fileSizeBytes: 4273492459
+    vRamFootprintBytes: 4273717196
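
A variant spec like the one above can be sanity-checked before publishing. The sketch below is one possible check, not the registry's own validation: it mirrors a subset of the spec as a Python dict (to avoid a YAML dependency) and flags a missing `alias` tag, a missing parent `assetId`, or a `device` that disagrees with the name suffix. The function names, the choice of rules, and the nesting of `variantMetadata` under `variantInfo` are assumptions.

```python
# Subset of the spec.yaml above, mirrored as a dict; the nesting is assumed.
SPEC = {
    "name": "mistralai-Mistral-7B-Instruct-v0-2-cuda-gpu",
    "version": 1,
    "tags": {"task": "chat-completion", "maxOutputTokens": 2048, "alias": "mistral-7b-v0.2"},
    "type": "custom_model",
    "variantInfo": {
        "parents": [
            {"assetId": "azureml://registries/azureml/models/mistralai-Mistral-7B-Instruct-v0-2/versions/6"}
        ],
        "variantMetadata": {
            "modelType": "ONNX",
            "quantization": ["RTN"],
            "device": "gpu",
            "executionProvider": "CUDAExecutionProvider",
            "fileSizeBytes": 4273492459,
            "vRamFootprintBytes": 4273717196,
        },
    },
}


def check_variant_spec(spec):
    """Hypothetical checks; return a list of human-readable problems (empty if none)."""
    problems = []
    if not spec.get("tags", {}).get("alias"):
        problems.append("missing tags.alias")
    info = spec.get("variantInfo", {})
    if not any(p.get("assetId") for p in info.get("parents", [])):
        problems.append("missing variantInfo.parents[].assetId")
    device = info.get("variantMetadata", {}).get("device")
    # Assumed naming convention: variant names end in "-cpu" or "-gpu".
    if device and not spec.get("name", "").endswith("-" + device):
        problems.append("name suffix does not match device " + repr(device))
    return problems


def bytes_to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    meta = SPEC["variantInfo"]["variantMetadata"]
    print(check_variant_spec(SPEC))
    print(round(bytes_to_gib(meta["fileSizeBytes"]), 2), "GiB on disk")
```

For this variant the reported VRAM footprint exceeds the file size by only 224,737 bytes, so both figures are just under 4 GiB.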
