Dedicated endpoint specification.
| Field | Type | Required | Description |
|---|---|---|---|
| autoscaling_cooldown | int | ✔️ | The cooldown period in seconds between scaling operations. |
| autoscaling_max | int | ✔️ | The maximum number of replicas allowed. |
| autoscaling_min | int | ✔️ | The minimum number of replicas to maintain. |
| creator_id | str | ✔️ | The ID of the user who created the endpoint. |
| gpu_type | str | ✔️ | The type of GPU to use for the endpoint. |
| max_batch_size | int | ✔️ | The maximum batch size for inference requests. |
| name | str | ✔️ | The name of the endpoint. |
| num_gpu | int | ✔️ | The number of GPUs to use per replica. |
| project_id | str | ✔️ | The ID of the project that owns the endpoint. |
| team_id | str | ✔️ | The ID of the team that owns the endpoint. |
| tokenizer_add_special_tokens | bool | ✔️ | Whether to add special tokens in tokenizer input. |
| tokenizer_skip_special_tokens | bool | ✔️ | Whether to skip special tokens in tokenizer output. |
| curr_replica_cnt | OptionalNullable[int] | ➖ | The current number of replicas. |
| desired_replica_cnt | OptionalNullable[int] | ➖ | The desired number of replicas. |
| instance_id | OptionalNullable[str] | ➖ | The ID of the instance. |
| max_input_length | OptionalNullable[int] | ➖ | The maximum allowed input length. |
| updated_replica_cnt | OptionalNullable[int] | ➖ | The updated number of replicas. |