Commit bd38d1f (2 parents: 6c89019 + 957aa21)

Merge pull request #9 from martysai/ecstatic-mclaren

Add multi-model support for Qwen 2.5 and Qwen3 Coder

File tree: 7 files changed, +1023 −49 lines

.gitattributes (27 additions, 0 deletions)

```diff
@@ -0,0 +1,27 @@
+# Auto detect text files and perform LF normalization
+* text=auto
+
+# Python files
+*.py text eol=lf
+
+# Markdown files
+*.md text eol=lf
+
+# JSON files
+*.json text eol=lf
+
+# YAML files
+*.yml text eol=lf
+*.yaml text eol=lf
+
+# Shell scripts
+*.sh text eol=lf
+
+# Configuration files
+*.toml text eol=lf
+*.cfg text eol=lf
+*.ini text eol=lf
+
+# Keep Windows batch files with CRLF
+*.bat text eol=crlf
+*.cmd text eol=crlf
```
README.md (125 additions, 11 deletions)
````diff
@@ -54,8 +54,10 @@ Test Dataset + Model Predictions --> [benchmark.py] --> Metrics Report
   QLoRA (4-bit quantization) for training on 1-2 A100 GPUs.
 
 - **`serve.py`** - FastAPI inference server that uses ollama API to generate
-  docstrings. The server uses a hard-coded system prompt for NumPy-style docstring
-  generation.
+  docstrings. Supports multiple Qwen Coder models with model-specific configurations.
+
+- **`models.py`** - Model configuration registry with sampling parameters for
+  Qwen 2.5 Coder and Qwen3 Coder variants.
 
 ### Evaluation (`src/evaluation/`)
 
````
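The `models.py` registry introduced in this hunk could be sketched roughly as below. This is an illustrative reconstruction from the README's model table and sampling parameters, not the commit's actual code; the class name, field names, and registry shape are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Per-model metadata and sampling defaults for an Ollama-served model."""
    key: str             # registry key, e.g. "qwen2.5-coder-32b"
    ollama_model: str    # name passed through to Ollama
    temperature: float
    top_p: float
    top_k: int
    context_window: int


# Entries mirror the README's model table and sampling parameters.
MODELS = {
    "qwen2.5-coder-32b": ModelConfig("qwen2.5-coder-32b", "qwen2.5-coder:32b", 0.7, 0.9, 40, 32768),
    "qwen2.5-coder-14b": ModelConfig("qwen2.5-coder-14b", "qwen2.5-coder:14b", 0.7, 0.9, 40, 32768),
    "qwen2.5-coder-7b": ModelConfig("qwen2.5-coder-7b", "qwen2.5-coder:7b", 0.7, 0.9, 40, 32768),
    "qwen3-coder-30b": ModelConfig("qwen3-coder-30b", "qwen3-coder:30b-a3b", 1.0, 0.95, 40, 262144),
}
```

A frozen dataclass keeps each configuration immutable, so request handlers can share the registry without copying.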
````diff
@@ -97,15 +99,22 @@ ollama as the backend. The server uses a system prompt stored in
 ### Prerequisites
 
 1. **Install ollama**: Make sure [ollama](https://ollama.ai/) is installed and running locally
-2. **Pull a model**: Download a code model (e.g., `qwen2.5-coder:32b`):
+2. **Pull a model**: Download one of the supported code models:
 ```bash
-ollama pull qwen2.5-coder:32b
+# Qwen 2.5 Coder (dense models)
+ollama pull qwen2.5-coder:32b    # Default, ~18GB Q4
+ollama pull qwen2.5-coder:14b    # Mid-size, ~8GB Q4
+ollama pull qwen2.5-coder:7b     # Fast, ~4GB Q4
+
+# Qwen3 Coder (MoE model)
+ollama pull qwen3-coder:30b-a3b  # Best quality, ~18GB Q4, 256K context
 ```
 
 ### Starting the Server
 
 Start the FastAPI server using uvicorn:
 
+**Linux/macOS:**
 ```bash
 # Using uvicorn directly
 uvicorn src.training.serve:app --host 0.0.0.0 --port 8000
````
````diff
@@ -114,19 +123,75 @@ uvicorn src.training.serve:app --host 0.0.0.0 --port 8000
 python -m src.training.serve
 ```
 
+**Windows (PowerShell):**
+```powershell
+uvicorn src.training.serve:app --host 0.0.0.0 --port 8000
+```
+
 The server will start on `http://localhost:8000` by default.
 
 ### Configuration
 
 The server can be configured using environment variables:
 
 - `OLLAMA_URL` - Ollama API endpoint (default: `http://localhost:11434/api/chat`)
-- `OLLAMA_MODEL` - Model name to use (default: `qwen2.5-coder:32b`)
+- `OLLAMA_MODEL` - Model key or Ollama model name (default: `qwen2.5-coder-32b`)
 - `REQUEST_TIMEOUT` - Request timeout in seconds (default: `120.0`)
 
-Example:
+**Linux/macOS:**
+```bash
+OLLAMA_MODEL=qwen3-coder-30b uvicorn src.training.serve:app --port 8000
+```
+
+**Windows (PowerShell):**
+```powershell
+$env:OLLAMA_MODEL="qwen3-coder-30b"; uvicorn src.training.serve:app --port 8000
+```
+
+**Windows (CMD):**
+```cmd
+set OLLAMA_MODEL=qwen3-coder-30b && uvicorn src.training.serve:app --port 8000
+```
+
+### Available Models
+
+| Model Key | Ollama Model | Architecture | Memory (Q4) | Context | Description |
+|-----------|--------------|--------------|-------------|---------|-------------|
+| `qwen2.5-coder-32b` | `qwen2.5-coder:32b` | Dense | ~18GB | 32K | Default, balanced quality/speed |
+| `qwen2.5-coder-14b` | `qwen2.5-coder:14b` | Dense | ~8GB | 32K | Mid-size, good performance |
+| `qwen2.5-coder-7b` | `qwen2.5-coder:7b` | Dense | ~4GB | 32K | Fast inference |
+| `qwen3-coder-30b` | `qwen3-coder:30b-a3b` | MoE | ~18GB | 256K | Best quality, 3.3B active params |
+
+Each model has optimized sampling parameters:
+- **Qwen 2.5 Coder**: temperature=0.7, top_p=0.9, top_k=40
+- **Qwen3 Coder**: temperature=1.0, top_p=0.95, top_k=40 (per official recommendations)
+
+### Model Selection
+
+You can select a model in two ways:
+
+1. **Environment variable** (applies to all requests):
+   ```bash
+   OLLAMA_MODEL=qwen3-coder-30b uvicorn src.training.serve:app
+   ```
+
+2. **Per-request** (via API):
+   ```bash
+   curl -X POST http://localhost:8000/generate \
+     -H "Content-Type: application/json" \
+     -d '{"code": "def add(x, y): return x + y", "model": "qwen3-coder-30b"}'
+   ```
+
+### List Available Models
+
+**Via CLI:**
+```bash
+python scripts/run_ollama.py --list-models
+```
+
+**Via API:**
 ```bash
-OLLAMA_MODEL=qwen2.5-coder:7b uvicorn src.training.serve:app --port 8000
+curl http://localhost:8000/models
 ```
 
 ### API Endpoints
````
````diff
@@ -143,7 +208,9 @@ curl http://localhost:8000/health
 ```json
 {
   "status": "healthy",
-  "service": "ollama"
+  "service": "ollama",
+  "active_model": "Qwen 2.5 Coder 32B",
+  "ollama_model": "qwen2.5-coder:32b"
 }
 ```
 
````
````diff
@@ -169,12 +236,14 @@ curl -X POST http://localhost:8000/generate \
 
 **Request Body:**
 - `code` (required): Python function code as a string
-- `max_new_tokens` (optional): Maximum number of tokens to generate (default: 256)
+- `max_new_tokens` (optional): Maximum number of tokens to generate (uses model default if not specified)
+- `model` (optional): Model key or Ollama model name to use for this request
 
 **Response (200 OK):**
 ```json
 {
-  "docstring": "\"\"\"Compute the sum of two numbers.\n\nParameters\n----------\nx : int\n    First number.\ny : int\n    Second number.\n\nReturns\n-------\nint\n    Sum of x and y.\n\"\"\""
+  "docstring": "\"\"\"Compute the sum of two numbers.\n\nParameters\n----------\nx : int\n    First number.\ny : int\n    Second number.\n\nReturns\n-------\nint\n    Sum of x and y.\"\"\"",
+  "model": "qwen2.5-coder:32b"
 }
 ```
 
````
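Once `/generate` returns a docstring, a caller still has to splice it into the function source. A naive sketch of that step (`apply_docstring` is a hypothetical helper, not part of the commit; it assumes a one-line `def` signature, where a robust version would use the `ast` module):

```python
import textwrap


def apply_docstring(func_src: str, docstring: str) -> str:
    """Insert a generated docstring directly after the first `def ...:` line.

    Deliberately simple: splits off the header line and indents the
    docstring one level to match the function body.
    """
    header, _, body = func_src.partition("\n")
    return header + "\n" + textwrap.indent(docstring, "    ") + "\n" + body


src = "def add(x, y):\n    return x + y"
new_src = apply_docstring(src, '"""Compute the sum of two numbers."""')
```

The result is itself valid Python, so it can be round-tripped through `exec` or written back to the source file.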
````diff
@@ -185,12 +254,57 @@ curl -X POST http://localhost:8000/generate \
 }
 ```
 
+#### List Models
+
+Get available model configurations:
+
+```bash
+curl http://localhost:8000/models
+```
+
+**Response (200 OK):**
+```json
+{
+  "default": "qwen2.5-coder-32b",
+  "active": "qwen2.5-coder-32b",
+  "models": [
+    {
+      "key": "qwen2.5-coder-32b",
+      "name": "Qwen 2.5 Coder 32B",
+      "ollama_model": "qwen2.5-coder:32b",
+      "context_window": 32768,
+      "architecture": "dense",
+      "memory_q4": "~18GB",
+      "description": "Dense 32B model, good balance of quality and speed"
+    }
+  ]
+}
+```
+
+### CLI Tool
+
+The CLI tool allows testing docstring generation directly:
+
+```bash
+# Use default model
+python scripts/run_ollama.py --user "def add(x, y): return x + y"
+
+# Use specific model by key
+python scripts/run_ollama.py --model-key qwen3-coder-30b --user "def foo(): pass"
+
+# Use raw Ollama model name
+python scripts/run_ollama.py --model qwen2.5-coder:7b --user "def bar(): pass"
+
+# List available models
+python scripts/run_ollama.py --list-models
+```
+
 ### Testing
 
 Run the test suite to verify the API endpoints:
 
 ```bash
-pytest tests/test_serve.py -v
+pytest tests/test_serve.py tests/test_models.py -v
 ```
 
 ## Dataset
````
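The updated pytest command also exercises `tests/test_models.py`. A server-independent test of key-versus-raw-name resolution, one behavior the README documents, might look like the sketch below; the registry and resolver here are stand-ins, not the commit's actual test code.

```python
# Stand-in registry; the real tests would exercise the project's models module.
REGISTRY = {
    "qwen2.5-coder-32b": "qwen2.5-coder:32b",
    "qwen3-coder-30b": "qwen3-coder:30b-a3b",
}


def resolve_ollama_name(name: str) -> str:
    """Accept either a registry key or a raw Ollama model name."""
    if name in REGISTRY:
        return REGISTRY[name]
    if name in REGISTRY.values():
        return name
    raise KeyError(f"unknown model: {name}")


def test_key_resolves():
    assert resolve_ollama_name("qwen3-coder-30b") == "qwen3-coder:30b-a3b"


def test_raw_name_passes_through():
    assert resolve_ollama_name("qwen2.5-coder:32b") == "qwen2.5-coder:32b"
```

Keeping these tests free of a running Ollama instance lets them run in CI alongside the mocked server tests.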
