# Frontend
The Dynamo Frontend is the API gateway for serving LLM inference requests. It provides OpenAI-compatible HTTP endpoints and KServe gRPC endpoints, handling request preprocessing, routing, and response formatting.
| Feature | Status |
|---|---|
| OpenAI Chat Completions API | ✅ Supported |
| OpenAI Completions API | ✅ Supported |
| KServe gRPC v2 API | ✅ Supported |
| Streaming responses | ✅ Supported |
| Multi-model serving | ✅ Supported |
| Integrated routing | ✅ Supported |
| Tool calling | ✅ Supported |
- Dynamo platform installed
- `etcd` and `nats-server -js` running
- At least one backend worker registered
```bash
python -m dynamo.frontend --http-port 8000
```

This starts an OpenAI-compatible HTTP server with integrated pre/post-processing and routing. Backends are auto-discovered when they call `register_model`.
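Once a backend worker has registered a model, the frontend accepts standard OpenAI-style requests. A minimal sketch of building such a request body; the model name, port, and endpoint path in the comments are illustrative assumptions, so substitute whatever your backend actually registered:

```python
import json

def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-compatible chat completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

# "Qwen/Qwen3-0.6B" is a placeholder model name, not something Dynamo ships.
body = build_chat_request("Qwen/Qwen3-0.6B", "Hello!")

# POST this body to the frontend, e.g. http://localhost:8000/v1/chat/completions,
# with urllib.request, curl, or any OpenAI client pointed at the frontend.
print(json.dumps(body))
```

Because the endpoints are OpenAI-compatible, existing OpenAI client libraries can be pointed at the frontend by changing only the base URL.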
The frontend performs the pre- and post-processing. To do this it needs access to the model configuration files (`config.json`, `tokenizer.json`, `tokenizer_config.json`, etc.); it does not need the weights.
The frontend will download the files it needs from Hugging Face; no setup is required. However, we recommend setting up `modelexpress-server` and a shared folder such as a Kubernetes PVC, which ensures the model is only downloaded once across the whole cluster.
If the model is not available on Hugging Face (for example, a private or customized model), you will need to make the model files available locally at the same file path as on the backend. The path passed to the backend's `--model-path <here>` must also exist on the frontend and contain at least the configuration (JSON) files.
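As a sanity check before launch, you can verify that the backend's model path also resolves on the frontend host and contains the configuration files listed above. The helper below is a hypothetical sketch, not part of Dynamo:

```python
from pathlib import Path

# Configuration files the frontend reads; the weights are never required here.
REQUIRED = ("config.json", "tokenizer.json", "tokenizer_config.json")

def missing_config_files(model_path: str) -> list[str]:
    """Return the names of required configuration files absent from model_path."""
    root = Path(model_path)
    return [name for name in REQUIRED if not (root / name).is_file()]

# Example: run this on the frontend host with the same path the backend uses.
# missing = missing_config_files("/models/my-private-model")
# if missing:
#     raise SystemExit(f"frontend is missing model config files: {missing}")
```

Running this on the frontend host with the backend's `--model-path` value catches path mismatches before the frontend fails at request time.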
```bash
python -m dynamo.frontend --kserve-grpc-server
```

See the Frontend Guide for KServe-specific configuration and message formats.
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: frontend-example
spec:
  graphs:
    - name: frontend
      replicas: 1
  services:
    - name: Frontend
      image: nvcr.io/nvidia/dynamo/dynamo-vllm:latest
      command:
        - python
        - -m
        - dynamo.frontend
        - --http-port
        - "8000"
```

| Parameter | Default | Description |
|---|---|---|
| `--http-port` | 8000 | HTTP server port |
| `--kserve-grpc-server` | false | Enable KServe gRPC server |
| `--router-mode` | round_robin | Routing strategy: `round_robin`, `random`, `kv` |
See the Frontend Guide for full configuration options.
| Document | Description |
|---|---|
| Frontend Guide | KServe gRPC configuration and integration |
| Router Documentation | KV-aware routing configuration |