Dedicated endpoint specification.
| Field | Type | Required | Description |
|---|---|---|---|
| autoscaling_cooldown | int | ✔️ | The cooldown period in seconds between scaling operations. |
| autoscaling_max | int | ✔️ | The maximum number of replicas allowed. |
| autoscaling_min | int | ✔️ | The minimum number of replicas to maintain. |
| creator_id | str | ✔️ | The ID of the user who created the endpoint. |
| gpu_type | str | ✔️ | The type of GPU to use for the endpoint. |
| max_batch_size | int | ✔️ | The maximum batch size for inference requests. |
| name | str | ✔️ | The name of the endpoint. |
| num_gpu | int | ✔️ | The number of GPUs to use per replica. |
| project_id | str | ✔️ | The ID of the project that owns the endpoint. |
| team_id | str | ✔️ | The ID of the team that owns the endpoint. |
| tokenizer_add_special_tokens | bool | ✔️ | Whether to add special tokens in tokenizer input. |
| tokenizer_skip_special_tokens | bool | ✔️ | Whether to skip special tokens in tokenizer output. |
| curr_replica_cnt | OptionalNullable[int] | ➖ | The current number of replicas. |
| desired_replica_cnt | OptionalNullable[int] | ➖ | The desired number of replicas. |
| instance_id | OptionalNullable[str] | ➖ | The ID of the instance. |
| max_input_length | OptionalNullable[int] | ➖ | The maximum allowed input length. |
| updated_replica_cnt | OptionalNullable[int] | ➖ | The updated number of replicas. |