Decreasing warmup for use on spot instances

### Feature request

I'd like to deploy TEI (CPU, v1.8, serving a quantized Qwen3 0.6B) on a spot instance. To maintain QoS there, the container has to start up within 14 seconds.
Poking around, I found `MAX_WARMUP_SEQUENCE_LENGTH`, but it appears to be used only for Intel HPU deployments. I have already set the max batch tokens to a modest value (`MAX_BATCH_TOKENS=1028`).
Is there anything else I can do to drive down warmup time, or anything I can cache to speed up subsequent startups?
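
For reference, here is a minimal sketch of how time-to-ready could be measured against the 14-second budget. It assumes the container's `/health` route only starts answering with HTTP 200 once warmup has finished, and that the container is published at `http://localhost:8080`; both are assumptions about my setup rather than documented guarantees. The names `http_probe` and `time_to_ready` are illustrative helpers, not part of TEI.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_probe(url: str, timeout_s: float = 1.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: not ready yet.
        return False


def time_to_ready(
    probe: Callable[[], bool],
    budget_s: float = 14.0,
    interval_s: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` until it succeeds.

    Returns the elapsed seconds on success, or None if `budget_s`
    has passed without a successful probe.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= budget_s:
            return None
        sleep(interval_s)


# Example call, started right after launching the container.
# The address below is an assumption; adjust host and port as needed.
# elapsed = time_to_ready(lambda: http_probe("http://localhost:8080/health"))
# print("over budget" if elapsed is None else f"ready after {elapsed:.1f}s")
```

The probe, clock, and sleep are injectable so the polling loop can be checked without a running container.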

### Motivation

Reduce costs by moving TEI replicas to spot instances.

### Your contribution

Happy to write a PR if there's a suitable path forward, or to update the docs if a strategy already exists.

Decreasing warmup for use on spot instances #819
