# playground.yaml
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-6--7b
spec:
  familyName: opt
  source:
    modelHub:
      modelID: facebook/opt-6.7b
  inferenceConfig:
    flavors:
      - name: a10 # GPU type
        limits:
          nvidia.com/gpu: 1
---
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-125m
spec:
  familyName: opt
  source:
    modelHub:
      modelID: facebook/opt-125m
  # The draft model's inference flavors do not affect speculative decoding;
  # only the target model's flavors are considered, so they are omitted here.
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: vllm-speculator
spec:
  replicas: 1
  modelClaims:
    models:
      - name: opt-6--7b # the target model
        role: main
      - name: opt-125m # the draft model
        role: draft
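
# Explanatory note (a hedged sketch, not part of the original manifest):
# in speculative decoding, the `draft` model (opt-125m) cheaply proposes a few
# tokens ahead and the `main` target model (opt-6--7b) verifies them, keeping
# only the proposed tokens it agrees with. Both models here come from the same
# OPT family; this is assumed to matter because a draft and its target
# generally need a compatible tokenizer and vocabulary.
#
# Assumed usage, with llmaz already installed and kubectl pointed at the cluster:
#   kubectl apply -f playground.yaml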