@@ -137,6 +137,40 @@ gpulab deploy \
 -- python train.py --epochs 100 --lr 0.001 --batch-size 32
 ```
 
+## Serverless GPUs
+
+Serverless endpoints authenticate with the same API key as containers.
+
+```bash
+# List available serverless templates, GPU types, regions, volumes, and policy templates
+gpulab serverless options
+
+# Create an endpoint
+gpulab serverless create \
+  --name llama-api \
+  --template pytorch \
+  --gpu-type "RTX 4090" \
+  --memory 32 \
+  --port 8000 \
+  --min-replicas 0 \
+  --max-replicas 2 \
+  --concurrency 1 \
+  -e HF_TOKEN \
+  --command "python app.py"
+
+# Inspect, invoke, and read logs/history
+gpulab serverless inspect llama-api
+gpulab serverless invoke llama-api /v1/chat/completions -d '{"prompt":"hello"}' --wait
+gpulab serverless requests llama-api
+gpulab serverless autoscaling-logs llama-api
+gpulab serverless logs llama-api --replica all
+gpulab serverless logs llama-api --deploy
+
+# Update or delete
+gpulab serverless update llama-api --max-replicas 4 --autoscaling-template pending_requests_linear
+gpulab serverless delete llama-api --force
+```
+
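The create command above passes `-e HF_TOKEN` without a value. Assuming this forwards the variable from the calling shell (as Docker's `-e NAME` does; the README does not confirm it), a small pre-flight check can catch a missing token before an endpoint is created. `require_env` is a hypothetical helper for illustration, not part of `gpulab`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of gpulab): return non-zero if the named
# variable is unset or empty in the current shell.
require_env() {
  local name="$1"
  # ${!name:-} is bash indirect expansion: the value of the variable whose
  # name is stored in $name, or an empty string when it is unset.
  if [ -z "${!name:-}" ]; then
    echo "error: environment variable $name is not set" >&2
    return 1
  fi
}

# Usage (commented out so the sketch has no side effects):
# require_env HF_TOKEN && gpulab serverless create --name llama-api ... -e HF_TOKEN
```

The check runs entirely in the local shell, so it behaves the same regardless of how the CLI resolves `-e`.
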
 ## Commands
 
 | Command | Description |
@@ -158,6 +192,11 @@ gpulab deploy \
 | `gpulab templates` | List templates |
 | `gpulab gpus types` | List GPU types |
 | `gpulab volumes` | List volumes |
+| `gpulab serverless` | Manage serverless GPU endpoints |
+| `gpulab serverless logs <endpoint>` | View serverless replica container logs |
+| `gpulab serverless requests <endpoint>` | View serverless request logs |
+| `gpulab serverless autoscaling-logs <endpoint>` | View autoscaling history |
+| `gpulab update` | Update the CLI from GitHub Releases |
 
 ## Global Flags
 