You've got Dream Server running. Now what? This guide shows how to connect your apps.
Dream Server exposes an OpenAI-compatible API. Just point your SDK at localhost.
Python:

```bash
pip install openai
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # Dream Server llama-server
    api_key="not-needed",                 # Local, no auth required
)

response = client.chat.completions.create(
    model="qwen2.5-32b-instruct",  # Your running model
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
print(response.choices[0].message.content)
```

Node.js:

```bash
npm install openai
```

```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'not-needed',
});

const response = await openai.chat.completions.create({
  model: 'qwen2.5-32b-instruct',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);
```

curl:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-32b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

LangChain:

```bash
pip install langchain langchain-openai
```

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",
    model="qwen2.5-32b-instruct",
    temperature=0.7,
)

response = llm.invoke("Explain quantum computing in one sentence.")
print(response.content)
```

LangChain with embeddings and Qdrant:

```python
# Requires: pip install langchain-qdrant
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import Qdrant

embeddings = OpenAIEmbeddings(
    base_url="http://localhost:8090/v1",  # Embeddings service
    api_key="not-needed",
)

# Connect to Dream Server's Qdrant
qdrant = Qdrant.from_existing_collection(
    embedding=embeddings,  # the keyword is `embedding`, singular
    collection_name="documents",
    url="http://localhost:6333",
)

# Query documents
results = qdrant.similarity_search("What is the policy on refunds?")
```

Continue is an open-source AI code assistant that works in VS Code.
- Install the Continue extension in VS Code
- Edit `~/.continue/config.json`:

```json
{
  "models": [
    {
      "title": "Dream Server",
      "provider": "openai",
      "model": "qwen2.5-32b-instruct",
      "apiBase": "http://localhost:8080/v1",
      "apiKey": "not-needed"
    }
  ]
}
```

- Restart VS Code, then select "Dream Server" in the Continue panel
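
If `config.json` already lists other models, the entry shown above can be merged in rather than pasted by hand. The following is a minimal sketch, not part of Continue or Dream Server: the helper names are hypothetical, and it assumes the file is plain JSON with a top-level `models` list.

```python
import json
from pathlib import Path

# The model entry from this guide.
DREAM_SERVER_MODEL = {
    "title": "Dream Server",
    "provider": "openai",
    "model": "qwen2.5-32b-instruct",
    "apiBase": "http://localhost:8080/v1",
    "apiKey": "not-needed",
}


def add_dream_server_model(config: dict) -> dict:
    """Return a copy of config with the Dream Server entry added or replaced."""
    title = DREAM_SERVER_MODEL["title"]
    # Drop any earlier entry with the same title so reruns do not duplicate it.
    models = [m for m in config.get("models", []) if m.get("title") != title]
    models.append(dict(DREAM_SERVER_MODEL))
    return {**config, "models": models}


def update_config_file(path: Path) -> None:
    """Read the JSON config (if present), merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_dream_server_model(config), indent=2) + "\n")


# Usage (writes to your home directory, so it is left commented out):
# update_config_file(Path.home() / ".continue" / "config.json")
```

Back up the existing file first; the sketch rewrites it and discards any comments or non-JSON content.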
Cursor supports custom API endpoints.
- Open Cursor Settings → Models
- Add a custom model:
  - API Base: `http://localhost:8080/v1`
  - API Key: `not-needed`
  - Model: `qwen2.5-32b-instruct`
Dream Server includes n8n for workflow automation. Access at http://localhost:5678.
- Open n8n at http://localhost:5678
- Log in with the credentials from your `.env` (`N8N_USER` / `N8N_PASS`)
- Create a new workflow or import one from the n8n template library
- Use the "HTTP Request" node pointed at `http://llama-server:8080/v1/chat/completions` (Docker-internal URL)
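
The HTTP Request node needs a POST with a JSON body in the same shape as the chat examples earlier in this guide. As a rough sketch, the settings can be generated in Python and pasted into the node. The helper name `build_chat_request` is hypothetical, and the model name is the example used throughout this guide; it should match whatever `/v1/models` reports on your install.

```python
import json

# Docker-internal URL from the step above; it resolves only inside the
# Docker network, not from the host machine.
N8N_LLM_URL = "http://llama-server:8080/v1/chat/completions"


def build_chat_request(prompt: str, model: str = "qwen2.5-32b-instruct") -> dict:
    """Describe the request an n8n HTTP Request node would send."""
    return {
        "method": "POST",
        "url": N8N_LLM_URL,
        "headers": {"Content-Type": "application/json"},
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }


# Print the JSON body so it can be copied into the node's body field.
print(json.dumps(build_chat_request("Hello!")["body"], indent=2))
```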
| Workflow | Description |
|---|---|
| Chat Endpoint | HTTP webhook → LLM → response |
| Document Q&A | File upload → embeddings → Qdrant → LLM |
| Voice Transcription | Audio → Whisper STT → text |
| TTS API | Text → Kokoro TTS → audio |
| Voice-to-Voice | STT → LLM → TTS pipeline |
| Endpoint | Description |
|---|---|
| `POST /v1/chat/completions` | Chat (OpenAI compatible) |
| `POST /v1/completions` | Text completion |
| `GET /v1/models` | List available models |
| `GET /health` | Health check |
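
The chat endpoint is covered by the SDK examples; the sketch below illustrates `GET /v1/models` and `POST /v1/completions` using only the Python standard library. It assumes the server returns the usual OpenAI-style shapes (`{"data": [{"id": ...}]}` for models, `choices[0].text` for completions). The function names are made up for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"


def extract_model_ids(payload: dict) -> list:
    """Pull model ids out of an OpenAI-style /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]


def completion_request(prompt: str, model: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/completions request."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_completion_text(payload: dict) -> str:
    """Return the generated text from an OpenAI-style completions response."""
    return payload["choices"][0]["text"]


def fetch_json(request_or_url):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request_or_url, timeout=60) as resp:
        return json.load(resp)


# Usage, with Dream Server running:
# models = extract_model_ids(fetch_json(f"{BASE_URL}/v1/models"))
# reply = fetch_json(completion_request("Once upon a time", models[0]))
# print(first_completion_text(reply))
```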
Streaming:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="x")

stream = client.chat.completions.create(
    model="qwen2.5-32b-instruct",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True,
)

# Print each token as it arrives
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Key variables in `.env` (see `.env.example` for the full list):
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_PORT` | 8080 | llama-server external port (maps to internal 8080) |
| `WEBUI_PORT` | 3000 | Open WebUI port |
| `N8N_PORT` | 5678 | n8n workflows port |
| `LLM_MODEL` | (tier-dependent) | Model name for OpenClaw/dashboard |
| `CTX_SIZE` | 16384 | Context window size (tokens) |
| `GGUF_FILE` | (tier-dependent) | GGUF model filename in `data/models/` |
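
Client code can derive the API URL from these settings instead of hard-coding the port. A minimal sketch, assuming `.env` uses plain `KEY=value` lines with `#` comments; the parser is deliberately simplistic and both function names are hypothetical.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def llm_base_url(env: dict, host: str = "localhost") -> str:
    """Build the OpenAI-compatible base URL from OLLAMA_PORT (default 8080)."""
    port = env.get("OLLAMA_PORT", "8080")
    return f"http://{host}:{port}/v1"


# Usage:
# with open(".env") as f:
#     print(llm_base_url(parse_env(f.read())))
```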
By default, Dream Server is local-only with no auth required. This is good for development.
To require an API key, set in `.env`:

```bash
LLM_API_KEY=your-secret-key
```

Then include it in requests:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-secret-key",
)
```

WebUI has built-in user management:
- First user becomes admin
- Configure auth in WebUI settings
- Set `WEBUI_AUTH=true` in `.env`
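
For the API-key setup described above, one option is to read the key from the environment rather than hard-coding it in source. This sketch assumes you export the same value as `LLM_API_KEY` in the shell that runs your client; that is a choice made for this example, not something Dream Server requires. OpenAI-compatible servers conventionally take the key as a Bearer token, which is what the OpenAI SDK sends, and the raw-HTTP helper mirrors that.

```python
import os


def resolve_api_key(environ=os.environ) -> str:
    """Use LLM_API_KEY if set, else the placeholder used for no-auth setups."""
    return environ.get("LLM_API_KEY") or "not-needed"


def auth_headers(api_key: str) -> dict:
    """Headers for raw HTTP calls to an OpenAI-compatible endpoint."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Usage with the SDK:
# client = OpenAI(base_url="http://localhost:8080/v1", api_key=resolve_api_key())
```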
Check running model name:

```bash
curl http://localhost:8080/v1/models
```

Use the exact model name in your requests.

Ensure services are running:

```bash
docker compose ps
```

Check llama-server is ready:

```bash
docker compose logs llama-server | tail -20
```

First request after start triggers model warm-up. Wait 30-60 seconds.
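
Rather than waiting a fixed interval, a script can poll until the server answers. The sketch below is one way to do that, assuming `GET /health` returns HTTP 200 once the model is loaded (an assumption; check how your llama-server build behaves). The `check`, `sleep`, and `clock` parameters are injectable only to make the loop easy to test.

```python
import time
import urllib.error
import urllib.request


def health_ok(url: str = "http://localhost:8080/health") -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


def wait_until_ready(check=health_ok, timeout=120.0, interval=2.0,
                     sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll check() until it succeeds or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage:
# if not wait_until_ready():
#     raise SystemExit("llama-server did not become ready in time")
```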
Built by The Collective