This is a simple example showing how you can quickly get started deploying Large Language Models with Dynamo.
Before running this example, ensure you have the following services running:
- etcd: A distributed key-value store used for service discovery and metadata storage
- NATS: A high-performance message broker for inter-component communication
You can start these services using Docker Compose:
docker compose -f deploy/docker-compose.yml up -d

The example uses two built-in components:
- Frontend - A built-in component that launches an OpenAI-compatible HTTP server, a pre-processor, and a router in a single process
- vLLM Backend - A built-in component that runs vLLM within the Dynamo runtime
---
title: Request Flow
---
flowchart TD
A["Users/Clients<br/>(HTTP)"] --> B["Frontend<br/>HTTP API endpoint<br/>(OpenAI Style)"]
B --> C["NATS Message Broker<br/>(Inter-component communication)"]
C --> D["vLLM Backend<br/>(NATS subscriber)"]
D --> C
C --> B
B --> A
There are three steps to deploy and use an LLM with Dynamo: start the vLLM Backend, start the Frontend, and send requests.
Open a new terminal and run:
python -m dynamo.vllm --model Qwen/Qwen3-0.6B

Leave this terminal running; it will show vLLM Backend logs.
Open another terminal and interact with the deployed engine using the built-in frontend component. You have two options:
- Interactive Command Line Interface
python -m dynamo.frontend --interactive

- HTTP Server

python -m dynamo.frontend --http-port 8000

Leave this terminal running as well; it will show Frontend logs.
If you launched the frontend in interactive mode, simply start typing and hit Enter to have an interactive chat with your LLM.
If you launched the frontend in HTTP mode, you can send requests with curl or with any OpenAI-compatible client program or library.
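For the client-library route, a minimal sketch using only Python's standard library might look like the following. It assumes the frontend is listening on `localhost:8000`, as in the HTTP Server command above, and that the response follows the usual OpenAI chat-completions shape. The helper names `build_chat_request` and `chat` are illustrative and not part of Dynamo. The curl command below sends the equivalent request.

```python
import json
import urllib.request

# Assumed to match the --http-port value passed to dynamo.frontend.
BASE_URL = "http://localhost:8000"


def build_chat_request(prompt, model="Qwen/Qwen3-0.6B", max_tokens=1028):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    # Assumes the standard OpenAI chat-completions response layout.
    return body["choices"][0]["message"]["content"]
```

With both components running, `print(chat("Tell me a story about a brave cat"))` would print the model's reply. Building the request separately from sending it keeps the payload easy to inspect without a live server.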
curl -X POST http://localhost:8000/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "Qwen/Qwen3-0.6B",
"messages": [
{ "role": "user", "content": "Tell me a story about a brave cat" }
],
"stream": false,
"max_tokens": 1028
}'

When you're done with the quickstart example, follow these steps to clean up:
In each terminal where you started Dynamo components, press Ctrl+C to stop them:
- Stop the vLLM Backend (terminal from step 1)
- Stop the Frontend (terminal from step 2)
If you don't plan to run any more examples, stop the etcd and NATS services that were started with Docker Compose:
docker compose -f deploy/docker-compose.yml down

This will stop and remove the containers for etcd and NATS.
When you run the two commands above, here's what Dynamo does to spin up the necessary processes and connect your HTTP requests to the vLLM Backend:
At startup, each Dynamo component (vLLM Backend, Frontend) connects to the DistributedRuntime, which involves creating connections to two critical infrastructure services:
- etcd: A distributed key-value store used for service discovery and metadata storage
- NATS: A high-performance message broker for inter-component communication
When the vLLM Backend starts up, it registers itself as a component in etcd with one or more endpoints.
This registration includes each endpoint's NATS subject for communication and is tied to a lease that automatically expires if the component goes offline.
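To make the lease behavior concrete, here is a toy in-memory model of a lease-backed registry. It is not etcd and not Dynamo's implementation: the class `LeaseRegistry`, its methods, and the key and subject strings used with it are all hypothetical. It only illustrates the idea that a registration disappears unless the component keeps renewing it.

```python
import time


class LeaseRegistry:
    """Toy in-memory stand-in for a lease-backed registry (not etcd itself)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expiry timestamp)

    def register(self, key, value, ttl):
        """Store value under key; it stays visible for ttl seconds."""
        self._entries[key] = (value, self._clock() + ttl)

    def keep_alive(self, key, ttl):
        """Renew the lease, as a healthy component would do periodically."""
        value, _ = self._entries[key]
        self._entries[key] = (value, self._clock() + ttl)

    def get_prefix(self, prefix):
        """Return the unexpired entries whose key starts with prefix."""
        now = self._clock()
        return {
            key: value
            for key, (value, expiry) in self._entries.items()
            if key.startswith(prefix) and expiry > now
        }
```

A component that stops calling `keep_alive` (for example, because its process exited) drops out of `get_prefix` results once its TTL has passed, with no explicit deregistration step.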
Inspecting the Component Registry
If you want to find out more about the internal organization of components in Dynamo, you can inspect the contents of etcd using the etcdctl command line tool. For this example, you can try running
etcdctl get "instances" --prefix

This shows each registered endpoint along with its associated NATS subject. Note that the specific etcd and NATS details are internal and subject to change; future examples will show how to use the DistributedRuntime itself to communicate across components.
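If you would rather process the registry contents programmatically, etcdctl can emit JSON when given `-w json`, in which keys and values are base64-encoded. The sketch below decodes that output into a plain dictionary. The function name is illustrative, and because the layout of the stored values is internal to Dynamo, the code treats each value as an opaque string.

```python
import base64
import json


def decode_etcdctl_json(output):
    """Decode the text printed by `etcdctl get <prefix> --prefix -w json`."""
    response = json.loads(output)
    decoded = {}
    # The "kvs" field may be missing when no keys match the prefix.
    for kv in response.get("kvs", []):
        key = base64.b64decode(kv["key"]).decode("utf-8")
        value = base64.b64decode(kv.get("value", "")).decode("utf-8")
        decoded[key] = value
    return decoded
```

One way to use it is to capture the command's standard output (for instance with `subprocess.run(..., capture_output=True, text=True)`) and pass the resulting string to `decode_etcdctl_json`.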
When the Frontend starts, it doesn't receive an explicit pointer to the vLLM Backend component. Instead, it continuously watches etcd for registered models and automatically discovers the vLLM Backend component and its endpoints as soon as they become available.
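The watch-based discovery described above can be sketched with a small in-process model. Everything here is hypothetical (`WatchableStore`, `ToyFrontend`, the `models/` key prefix, and the subject strings); it is meant only to show why start order does not matter when a component watches a prefix instead of being handed an address.

```python
class WatchableStore:
    """Toy key-value store that notifies watchers of changes under a prefix."""

    def __init__(self):
        self._data = {}
        self._watchers = []  # list of (prefix, callback)

    def watch_prefix(self, prefix, callback):
        """Replay matching existing keys, then report future changes."""
        self._watchers.append((prefix, callback))
        for key, value in list(self._data.items()):
            if key.startswith(prefix):
                callback("PUT", key, value)

    def put(self, key, value):
        self._data[key] = value
        self._notify("PUT", key, value)

    def delete(self, key):
        if key in self._data:
            value = self._data.pop(key)
            self._notify("DELETE", key, value)

    def _notify(self, event, key, value):
        for prefix, callback in self._watchers:
            if key.startswith(prefix):
                callback(event, key, value)


class ToyFrontend:
    """Keeps a live view of registered models by watching the store."""

    def __init__(self, store, prefix="models/"):
        self.models = {}
        store.watch_prefix(prefix, self._on_event)

    def _on_event(self, event, key, value):
        if event == "PUT":
            self.models[key] = value
        else:
            self.models.pop(key, None)
```

Whether the backend's `put` happens before or after `ToyFrontend` is constructed, the frontend ends up with the same view, and a later `delete` removes the entry again.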
When you send an HTTP request to the Frontend:
- Request Packaging: The Frontend wraps your HTTP request in a standardized internal format with routing metadata
- NATS Subject Resolution: Using the endpoints it discovered in etcd, the Frontend determines the appropriate NATS subject
- Message Dispatch: The request is published to that NATS subject, where the target vLLM Backend picks it up
- Response Streaming: The vLLM Backend executes the request and streams responses back through NATS, which the Frontend converts back into an HTTP response
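The four stages above can be traced in a toy, single-process simulation. None of these names come from Dynamo or NATS: `ToyBroker`, `ToyFrontend`, `toy_backend`, the message fields, and the subject string are invented for illustration, and the "engine" simply echoes the prompt word by word.

```python
import itertools


class ToyBroker:
    """In-process stand-in for a message broker: maps subjects to handlers."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, subject, handler):
        self._handlers[subject] = handler

    def request_stream(self, subject, message):
        """Deliver message to the subscriber and return its response stream."""
        return self._handlers[subject](message)


def toy_backend(message):
    """Pretend engine: streams the prompt back one word at a time."""
    for word in message["payload"]["prompt"].split():
        yield {"request_id": message["request_id"], "token": word}


class ToyFrontend:
    def __init__(self, broker, endpoints):
        self._broker = broker
        self._endpoints = endpoints  # model name -> subject, as discovered
        self._ids = itertools.count(1)

    def handle_http(self, body):
        # 1. Request packaging: wrap the body with routing metadata.
        message = {"request_id": next(self._ids), "payload": body}
        # 2. Subject resolution: look up where this model is served.
        subject = self._endpoints[body["model"]]
        # 3. Message dispatch: hand the message to the broker.
        chunks = self._broker.request_stream(subject, message)
        # 4. Response streaming: collect the chunks into one reply.
        return " ".join(chunk["token"] for chunk in chunks)
```

In this sketch the frontend never holds a reference to the backend function; it only knows a subject name, which is the property that lets the real components run in separate processes or on separate machines.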
One of Dynamo's key strengths is that this entire system works seamlessly whether components are:
- Running on the same machine (like in this quickstart)
- Distributed across multiple nodes in a cluster
- Deployed in different availability zones
The same two commands work in all of these scenarios, as long as every component can connect to the DistributedRuntime; Dynamo handles the networking between them automatically.