DedicatedEndpointSpec

Dedicated endpoint specification.

Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| autoscaling_cooldown | int | ✔️ | The cooldown period in seconds between scaling operations. |
| autoscaling_max | int | ✔️ | The maximum number of replicas allowed. |
| autoscaling_min | int | ✔️ | The minimum number of replicas to maintain. |
| creator_id | str | ✔️ | The ID of the user who created the endpoint. |
| gpu_type | str | ✔️ | The type of GPU to use for the endpoint. |
| max_batch_size | int | ✔️ | The maximum batch size for inference requests. |
| name | str | ✔️ | The name of the endpoint. |
| num_gpu | int | ✔️ | The number of GPUs to use per replica. |
| project_id | str | ✔️ | The ID of the project that owns the endpoint. |
| team_id | str | ✔️ | The ID of the team that owns the endpoint. |
| tokenizer_add_special_tokens | bool | ✔️ | Whether to add special tokens in tokenizer input. |
| tokenizer_skip_special_tokens | bool | ✔️ | Whether to skip special tokens in tokenizer output. |
| curr_replica_cnt | OptionalNullable[int] | | The current number of replicas. |
| desired_replica_cnt | OptionalNullable[int] | | The desired number of replicas. |
| instance_id | OptionalNullable[str] | | The ID of the instance. |
| max_input_length | OptionalNullable[int] | | The maximum allowed input length. |
| updated_replica_cnt | OptionalNullable[int] | | The updated number of replicas. |
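
To make the shape of the fields above concrete, here is a minimal sketch that mirrors them in a plain dataclass. It is not the SDK's own class: the name `EndpointSpecSketch`, every example value, and the `autoscaling_min <= autoscaling_max` check are assumptions made for illustration. The sketch also simplifies `OptionalNullable[...]` to `Optional[...]` with a default of `None`, so it does not distinguish an omitted field from an explicit null.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointSpecSketch:
    """Hypothetical stand-in mirroring the documented fields; not the SDK class."""

    # Required fields, in the order listed in the table.
    autoscaling_cooldown: int  # seconds between scaling operations
    autoscaling_max: int
    autoscaling_min: int
    creator_id: str
    gpu_type: str
    max_batch_size: int
    name: str
    num_gpu: int  # GPUs per replica
    project_id: str
    team_id: str
    tokenizer_add_special_tokens: bool
    tokenizer_skip_special_tokens: bool

    # Optional fields; None here stands in for both "unset" and "null".
    curr_replica_cnt: Optional[int] = None
    desired_replica_cnt: Optional[int] = None
    instance_id: Optional[str] = None
    max_input_length: Optional[int] = None
    updated_replica_cnt: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed sanity check; the reference does not state this constraint.
        if self.autoscaling_min > self.autoscaling_max:
            raise ValueError("autoscaling_min must not exceed autoscaling_max")


# All values below are placeholders, not real IDs or GPU names.
spec = EndpointSpecSketch(
    autoscaling_cooldown=300,
    autoscaling_max=4,
    autoscaling_min=1,
    creator_id="example-user-id",
    gpu_type="example-gpu",
    max_batch_size=8,
    name="example-endpoint",
    num_gpu=1,
    project_id="example-project-id",
    team_id="example-team-id",
    tokenizer_add_special_tokens=True,
    tokenizer_skip_special_tokens=True,
)
print(spec.name, spec.autoscaling_min, spec.autoscaling_max)
```

The replica-count fields are left at their defaults here on the assumption that they are reported by the service rather than supplied by the caller; the reference itself does not say which.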