Commit e0f1828
[WIP] Qwen2.5 7B on-device code docstrings generation for macOS Silicon (#8)
1 parent 9302246 commit e0f1828

File tree

12 files changed: +1741 -20 lines changed

.gitattributes

Lines changed: 27 additions & 0 deletions
# Auto detect text files and perform LF normalization
* text=auto

# Python files
*.py text eol=lf

# Markdown files
*.md text eol=lf

# JSON files
*.json text eol=lf

# YAML files
*.yml text eol=lf
*.yaml text eol=lf

# Shell scripts
*.sh text eol=lf

# Configuration files
*.toml text eol=lf
*.cfg text eol=lf
*.ini text eol=lf

# Keep Windows batch files with CRLF
*.bat text eol=crlf
*.cmd text eol=crlf

.github/workflows/ci.yml

Lines changed: 43 additions & 0 deletions
name: CI

on:
  push:
    branches: [master, qwen2.5-coder]
  pull_request:
    branches: [master, qwen2.5-coder]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: 'pip'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Run linter
        run: python -m ruff check .
        continue-on-error: true

      - name: Run tests with coverage
        run: |
          python -m pytest tests/ -v --tb=short --cov=src --cov-report=term-missing --cov-report=xml

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          files: ./coverage.xml
          flags: unittests
          name: ci-coverage
          fail_ci_if_error: false
          verbose: true

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ build/
 
 # IDE
 .vscode/
+.pytest_cache/
 
 # Jupyter
 notebooks/.ipynb_checkpoints/

README.md

Lines changed: 222 additions & 2 deletions
@@ -53,8 +53,11 @@ Test Dataset + Model Predictions --> [benchmark.py] --> Metrics Report
 - **`train_lora.py`** - LoRA fine-tuning using HuggingFace Trainer + PEFT. Supports
   QLoRA (4-bit quantization) for training on 1-2 A100 GPUs.
 
-- **`serve.py`** - FastAPI inference server that loads the fine-tuned model and
-  serves docstring generation via HTTP.
+- **`serve.py`** - FastAPI inference server that uses the ollama API to generate
+  docstrings. Supports multiple Qwen Coder models with model-specific configurations.
+
+- **`models.py`** - Model configuration registry with sampling parameters for
+  Qwen 2.5 Coder and Qwen3 Coder variants.
 
 ### Evaluation (`src/evaluation/`)
 
@@ -87,6 +90,223 @@ python -m src.data.convert_seed \
  --output-dir data/processed/python-method
```

## Serving

The FastAPI inference server provides HTTP endpoints for docstring generation using
ollama as the backend. The server uses a system prompt stored in
`src/training/prompts/system_prompt.md` to generate NumPy-style docstrings.
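Conceptually, serving reduces to wrapping the system prompt and the user's code into a chat request for ollama. The sketch below is illustrative only (the `build_chat_payload` helper is an assumption, not the project's actual `serve.py` internals); it builds the non-streaming payload shape that ollama's `/api/chat` endpoint accepts:

```python
import json

def build_chat_payload(system_prompt: str, code: str, model: str,
                       temperature: float = 0.7, top_p: float = 0.9,
                       top_k: int = 40) -> dict:
    """Assemble a non-streaming ollama /api/chat request (illustrative sketch)."""
    return {
        "model": model,
        "messages": [
            # e.g. the contents of src/training/prompts/system_prompt.md
            {"role": "system", "content": system_prompt},
            # the function to document
            {"role": "user", "content": code},
        ],
        "stream": False,
        "options": {"temperature": temperature, "top_p": top_p, "top_k": top_k},
    }

payload = build_chat_payload("Write NumPy-style docstrings.",
                             "def add(x, y): return x + y",
                             "qwen2.5-coder:32b")
body = json.dumps(payload)  # POST this to the OLLAMA_URL endpoint
```

The docstring is then extracted from the `message.content` field of ollama's JSON reply.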
### Prerequisites

1. **Install ollama**: Make sure [ollama](https://ollama.ai/) is installed and running locally
2. **Pull a model**: Download one of the supported code models:

   ```bash
   # Qwen 2.5 Coder (dense models)
   ollama pull qwen2.5-coder:32b    # Default, ~18GB Q4
   ollama pull qwen2.5-coder:14b    # Mid-size, ~8GB Q4
   ollama pull qwen2.5-coder:7b     # Fast, ~4GB Q4

   # Qwen3 Coder (MoE model)
   ollama pull qwen3-coder:30b-a3b  # Best quality, ~18GB Q4, 256K context
   ```
### Starting the Server

Start the FastAPI server using uvicorn:

**Linux/macOS:**
```bash
# Using uvicorn directly
uvicorn src.training.serve:app --host 0.0.0.0 --port 8000

# Or run the module directly
python -m src.training.serve
```

**Windows (PowerShell):**
```powershell
uvicorn src.training.serve:app --host 0.0.0.0 --port 8000
```

The server will start on `http://localhost:8000` by default.
### Configuration

The server can be configured using environment variables:

- `OLLAMA_URL` - Ollama API endpoint (default: `http://localhost:11434/api/chat`)
- `OLLAMA_MODEL` - Model key or Ollama model name (default: `qwen2.5-coder-32b`)
- `REQUEST_TIMEOUT` - Request timeout in seconds (default: `120.0`)

**Linux/macOS:**
```bash
OLLAMA_MODEL=qwen3-coder-30b uvicorn src.training.serve:app --port 8000
```

**Windows (PowerShell):**
```powershell
$env:OLLAMA_MODEL="qwen3-coder-30b"; uvicorn src.training.serve:app --port 8000
```

**Windows (CMD):**
```cmd
set OLLAMA_MODEL=qwen3-coder-30b && uvicorn src.training.serve:app --port 8000
```
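Reading these variables is a one-liner per setting. A minimal sketch of how a server module might do it (the `load_config` helper is hypothetical; the variable names and defaults are the documented ones above):

```python
import os

def load_config(env=os.environ) -> dict:
    """Read server settings from an environment mapping, using documented defaults."""
    return {
        "OLLAMA_URL": env.get("OLLAMA_URL", "http://localhost:11434/api/chat"),
        "OLLAMA_MODEL": env.get("OLLAMA_MODEL", "qwen2.5-coder-32b"),
        # timeout is passed to the HTTP client as a float of seconds
        "REQUEST_TIMEOUT": float(env.get("REQUEST_TIMEOUT", "120.0")),
    }

config = load_config()  # falls back to the documented defaults when unset
```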
### Available Models

| Model Key | Ollama Model | Architecture | Memory (Q4) | Context | Description |
|-----------|--------------|--------------|-------------|---------|-------------|
| `qwen2.5-coder-32b` | `qwen2.5-coder:32b` | Dense | ~18GB | 32K | Default, balanced quality/speed |
| `qwen2.5-coder-14b` | `qwen2.5-coder:14b` | Dense | ~8GB | 32K | Mid-size, good performance |
| `qwen2.5-coder-7b` | `qwen2.5-coder:7b` | Dense | ~4GB | 32K | Fast inference |
| `qwen3-coder-30b` | `qwen3-coder:30b-a3b` | MoE | ~18GB | 256K | Best quality, 3.3B active params |

Each model has optimized sampling parameters:

- **Qwen 2.5 Coder**: temperature=0.7, top_p=0.9, top_k=40
- **Qwen3 Coder**: temperature=1.0, top_p=0.95, top_k=40 (per official recommendations)
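A registry like `models.py` can pair each model key with its Ollama name and sampling parameters. The sketch below is a hypothetical shape for such entries (class and field names are assumptions, not the project's actual API); the values mirror the table and the parameters listed above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical per-model configuration entry."""
    ollama_model: str     # name passed to ollama
    architecture: str     # "dense" or "moe"
    context_window: int   # tokens
    temperature: float
    top_p: float
    top_k: int

MODELS = {
    "qwen2.5-coder-32b": ModelConfig("qwen2.5-coder:32b", "dense", 32768, 0.7, 0.9, 40),
    "qwen2.5-coder-14b": ModelConfig("qwen2.5-coder:14b", "dense", 32768, 0.7, 0.9, 40),
    "qwen2.5-coder-7b":  ModelConfig("qwen2.5-coder:7b", "dense", 32768, 0.7, 0.9, 40),
    "qwen3-coder-30b":   ModelConfig("qwen3-coder:30b-a3b", "moe", 262144, 1.0, 0.95, 40),
}
```

Keeping the sampling parameters next to the model name lets the server apply the right `options` automatically whenever a request switches models.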
### Model Selection

You can select a model in two ways:

1. **Environment variable** (applies to all requests):

   ```bash
   OLLAMA_MODEL=qwen3-coder-30b uvicorn src.training.serve:app
   ```

2. **Per-request** (via API):

   ```bash
   curl -X POST http://localhost:8000/generate \
     -H "Content-Type: application/json" \
     -d '{"code": "def add(x, y): return x + y", "model": "qwen3-coder-30b"}'
   ```

### List Available Models

**Via CLI:**
```bash
python scripts/run_ollama.py --list-models
```

**Via API:**
```bash
curl http://localhost:8000/models
```
### API Endpoints

#### Health Check

Check if the service is healthy and ollama is accessible:

```bash
curl http://localhost:8000/health
```

**Response (200 OK):**
```json
{
  "status": "healthy",
  "service": "ollama",
  "active_model": "Qwen 2.5 Coder 32B",
  "ollama_model": "qwen2.5-coder:32b"
}
```

**Response (503 Service Unavailable):**
```json
{
  "detail": "Service unhealthy: ollama is not running or not accessible"
}
```
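Clients can gate work on this endpoint. The helpers below are an illustrative sketch (not part of the project); they interpret the documented response body using only the standard library:

```python
import json
import urllib.request

def is_healthy_response(body: bytes) -> bool:
    """Return True when a /health response body reports a healthy service."""
    try:
        return json.loads(body).get("status") == "healthy"
    except (ValueError, AttributeError):
        # not JSON, or not a JSON object
        return False

def check_server(base_url: str = "http://localhost:8000") -> bool:
    """Hit /health and interpret the result; False on any connection error."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return is_healthy_response(resp.read())
    except OSError:
        return False
```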
#### Generate Docstring

Generate a docstring for a Python function:

```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "code": "def add(x, y):\n    return x + y",
    "max_new_tokens": 256
  }'
```

**Request Body:**
- `code` (required): Python function code as a string
- `max_new_tokens` (optional): Maximum number of tokens to generate (uses model default if not specified)
- `model` (optional): Model key or Ollama model name to use for this request

**Response (200 OK):**
```json
{
  "docstring": "\"\"\"Compute the sum of two numbers.\n\nParameters\n----------\nx : int\n    First number.\ny : int\n    Second number.\n\nReturns\n-------\nint\n    Sum of x and y.\"\"\"",
  "model": "qwen2.5-coder:32b"
}
```

**Response (500 Internal Server Error):**
```json
{
  "detail": "Failed to generate docstring: <error message>"
}
```
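The same call can be made from Python with only the standard library. `generate_docstring` below is an illustrative client helper, not part of the project; the request and response fields are the documented ones above:

```python
import json
import urllib.request
from typing import Optional

def build_generate_payload(code: str, max_new_tokens: Optional[int] = None,
                           model: Optional[str] = None) -> dict:
    """Build the /generate request body, omitting unset optional fields."""
    payload = {"code": code}
    if max_new_tokens is not None:
        payload["max_new_tokens"] = max_new_tokens
    if model is not None:
        payload["model"] = model  # per-request model override
    return payload

def generate_docstring(code: str, base_url: str = "http://localhost:8000",
                       **opts) -> dict:
    """POST to /generate and return the parsed JSON (docstring and model fields)."""
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(build_generate_payload(code, **opts)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())
```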
#### List Models

Get available model configurations:

```bash
curl http://localhost:8000/models
```

**Response (200 OK):**
```json
{
  "default": "qwen2.5-coder-32b",
  "active": "qwen2.5-coder-32b",
  "models": [
    {
      "key": "qwen2.5-coder-32b",
      "name": "Qwen 2.5 Coder 32B",
      "ollama_model": "qwen2.5-coder:32b",
      "context_window": 32768,
      "architecture": "dense",
      "memory_q4": "~18GB",
      "description": "Dense 32B model, good balance of quality and speed"
    }
  ]
}
```
### CLI Tool

The CLI tool allows testing docstring generation directly:

```bash
# Use default model
python scripts/run_ollama.py --user "def add(x, y): return x + y"

# Use specific model by key
python scripts/run_ollama.py --model-key qwen3-coder-30b --user "def foo(): pass"

# Use raw Ollama model name
python scripts/run_ollama.py --model qwen2.5-coder:7b --user "def bar(): pass"

# List available models
python scripts/run_ollama.py --list-models
```
### Testing

Run the test suite to verify the API endpoints:

```bash
pytest tests/test_serve.py tests/test_models.py -v
```

## Dataset

The seed dataset comes from the [NeuralCodeSum](https://github.com/wasiahmad/NeuralCodeSum)

codecov.yml

Lines changed: 11 additions & 0 deletions
comment:
  layout: "reach,diff,flags,tree"
  behavior: default
  require_changes: false
  require_base: no
  require_head: yes

# Optional: configure thresholds or ignore patterns below
# coverage:
#   precision: 2
#   round: down

pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -24,12 +24,15 @@ dependencies = [
     "safetensors",
     "fastapi>=0.104.0",
     "uvicorn>=0.24.0",
+    "requests>=2.31.0",
 ]
 
 [project.optional-dependencies]
 dev = [
     "pytest>=7.0",
+    "pytest-cov>=4.0",
     "ruff>=0.1.0",
+    "httpx>=0.24.0",
 ]
 
 [tool.hatch.build.targets.wheel]
