Commit d291e3f

feat: add compare_with_vllm.py example-03 (#38)
* feat: add compare_with_vllm.py
1 parent c82b166

8 files changed: 1,055 additions & 6 deletions

.gitignore

Lines changed: 3 additions & 0 deletions

```diff
@@ -186,5 +186,8 @@ data/
 results/
 outputs/

+# Example vLLM virtualenv
+examples/03_BenchmarkComparison/vllm_venv/
+
 # Cursor artifacts (local development only)
 .cursor_artifacts/
```

README.md

Lines changed: 17 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -62,7 +62,7 @@ inference-endpoint benchmark offline \
6262
--num-samples 5000
6363
```
6464

65-
### Local Testing
65+
### Running Locally
6666

6767
```bash
6868
# Start local echo server
@@ -80,6 +80,18 @@ pkill -f echo_server
8080

8181
See [Local Testing Guide](docs/LOCAL_TESTING.md) for detailed instructions.
8282

83+
### Running Tests and Examples
84+
85+
```bash
86+
# Install tests/ and examples/ dependencies
87+
pip install -r requirements/test.txt
88+
89+
# Run tests (excluding performance and explicit-run tests)
90+
pytest -m "not performance and not run_explicitly"
91+
92+
# Run examples: follow instructions in examples/*/README.md
93+
```
94+
8395
## 📚 Documentation
8496

8597
- [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) - Command-line interface guide
@@ -93,14 +105,14 @@ The system follows a modular, event-driven architecture:
93105

94106
```
95107
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
96-
│ Dataset │ │ Load │ │ Endpoint │
97-
│ Manager │───▶│ Generator │───▶│ Client │
108+
│ Dataset │ │ Load │ │ Endpoint │
109+
│ Manager │───▶│ Generator │───▶│ Client │
98110
└─────────────────┘ └─────────────────┘ └─────────────────┘
99111
│ │ │
100112
▼ ▼ ▼
101113
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
102-
│ Metrics │ │ Configuration │ │ Endpoint │
103-
│ Collector │◄───│ Manager │ │ (External) │
114+
│ Metrics │ │ Configuration │ │ Endpoint │
115+
│ Collector │◄───│ Manager │ │ (External) │
104116
└─────────────────┘ └─────────────────┘ └─────────────────┘
105117
```
106118

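The last hunk only re-centers the diagram's labels, but the flow it depicts (prompts from the Dataset Manager driven through the Load Generator to the Endpoint Client, with timings funneled to the Metrics Collector) can be sketched in a few lines. The classes and methods below are illustrative stand-ins, not the project's actual API:

```python
# Illustrative wiring of the diagrammed components; names mirror the boxes
# above but the interfaces are hypothetical, NOT the project's real API.
import asyncio
from dataclasses import dataclass, field


@dataclass
class MetricsCollector:
    records: list = field(default_factory=list)

    def record(self, **event) -> None:
        self.records.append(event)


class EndpointClient:
    """Sends one request to the external endpoint and records timings."""

    async def send(self, prompt: str, metrics: MetricsCollector) -> None:
        # A real client would stream from the OpenAI-compatible endpoint
        # and record TTFT/ITL per token; this stub only records the event.
        await asyncio.sleep(0)
        metrics.record(prompt=prompt)


class LoadGenerator:
    """Pulls prompts from the dataset and fans requests out to workers."""

    def __init__(self, client: EndpointClient, workers: int = 1) -> None:
        self.client = client
        self.workers = asyncio.Semaphore(workers)

    async def run(self, prompts, metrics: MetricsCollector) -> None:
        async def one(prompt: str) -> None:
            async with self.workers:
                await self.client.send(prompt, metrics)

        await asyncio.gather(*(one(p) for p in prompts))


# The Dataset Manager is reduced to a plain list for this sketch.
metrics = MetricsCollector()
asyncio.run(LoadGenerator(EndpointClient(), workers=4).run(["hi"] * 8, metrics))
print(len(metrics.records), "requests recorded")
```
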
examples/03_BenchmarkComparison/README.md

Lines changed: 92 additions & 0 deletions

# Benchmark Comparison Example

Compare `inference-endpoint` with vLLM's benchmarking tool using identical prompts.

## Prerequisites

**Set up the vLLM virtualenv** (this isolates vLLM's dependencies from inference-endpoint):

```bash
cd examples/03_BenchmarkComparison
./setup_vllm_venv.sh
```

This creates a `vllm_venv` directory with vLLM installed. You can specify a custom location:

```bash
./setup_vllm_venv.sh /path/to/custom/venv
```

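The `setup_vllm_venv.sh` script itself isn't shown in this commit excerpt. As a rough Python equivalent of what such a script typically does (create the venv, install vLLM into it), with no claim to match the real script:

```python
# Hypothetical stand-in for setup_vllm_venv.sh; the real script may pin
# versions or perform extra checks. Assumes a POSIX venv layout (bin/pip).
import subprocess
import sys
import venv
from pathlib import Path


def setup_vllm_venv(target: Path = Path("vllm_venv")) -> None:
    venv.EnvBuilder(with_pip=True).create(target)  # like `python -m venv`
    pip = target / "bin" / "pip"
    subprocess.run([str(pip), "install", "vllm"], check=True)


if __name__ == "__main__":
    setup_vllm_venv(Path(sys.argv[1]) if len(sys.argv) > 1 else Path("vllm_venv"))
```
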
**Run an inference server** (OpenAI-compatible):

```bash
# vLLM comes from the venv created above
# (default location: examples/03_BenchmarkComparison/vllm_venv)
/path/to/custom/venv/bin/vllm serve Qwen/Qwen2.5-0.5B-Instruct --port 8000
```

## Usage

```bash
cd examples/03_BenchmarkComparison
python compare_with_vllm.py --model "Qwen/Qwen2.5-0.5B-Instruct" --endpoint http://localhost:8000
```

### Options

| Option                | Description                      | Default                 |
| --------------------- | -------------------------------- | ----------------------- |
| `--model`, `-m`       | Model name (required)            | -                       |
| `--num-prompts`, `-n` | Number of prompts                | 100                     |
| `--endpoint`          | Server URL                       | `http://localhost:8000` |
| `--max-output-tokens` | Max output tokens                | 2000                    |
| `--timeout`           | Timeout in seconds               | 900                     |
| `--workers`           | Number of workers                | 1                       |
| `--verbose`, `-v`     | Show full output from each run   | -                       |
| `--dry`               | Print commands without executing | -                       |
| `--vllm-venv-dir`     | Path to vLLM virtualenv          | `./vllm_venv`           |

### Example

```bash
python compare_with_vllm.py \
  --model "Qwen/Qwen2.5-0.5B-Instruct" \
  --num-prompts 200 \
  --max-output-tokens 1000
```

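Given `--dry` (print commands without executing) and `--vllm-venv-dir`, the script presumably assembles one command line per tool and shells out to each. A sketch of that pattern follows; the call shapes are invented for illustration, not taken from compare_with_vllm.py:

```python
# Sketch of the orchestration implied by --dry and --vllm-venv-dir; the
# actual argument lists in compare_with_vllm.py are not shown here.
import shlex
import subprocess


def run(cmd: list[str], dry: bool = False) -> str | None:
    """Print the command; execute it and capture stdout unless dry is set."""
    print("$", " ".join(shlex.quote(c) for c in cmd))
    if dry:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Placeholder call shapes, one per benchmark tool:
run(["echo", "inference-endpoint benchmark command here"], dry=True)
run(["echo", "vllm benchmark command here"], dry=True)
```
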
## Output

The script runs both benchmarks and displays a comparison table:

```
$ python examples/03_BenchmarkComparison/compare_with_vllm.py --model Qwen/Qwen2.5-0.5B-Instruct --num-prompts 10000

====================================================================================================
Metric                          | Inference Endpoint | vLLM Benchmark
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
Test Duration (s)               | 284.28             | 300.69
----------------------------------------------------------------------------------------------------
Throughput (req/s)              | 35.18              | 33.26
Total Generated Tokens          | 4446263            | 4626060
Output Token Throughput (tok/s) | 15640.65           | 15384.64
----------------------------------------------------------------------------------------------------
Mean TTFT (ms)                  | 137112.88          | 146093.86
Median TTFT (ms)                | 137092.46          | 145656.40
P99 TTFT (ms)                   | 270902.92          | 281810.49
----------------------------------------------------------------------------------------------------
Mean TPOT (ms)                  | 15.85              | 15.60
Median TPOT (ms)                | 15.56              | 15.61
P99 TPOT (ms)                   | 36.47              | 23.49
----------------------------------------------------------------------------------------------------
Mean ITL (ms)                   | 15.85              | 15.42
Median ITL (ms)                 | 15.56              | 12.17
P99 ITL (ms)                    | 36.47              | 35.96
----------------------------------------------------------------------------------------------------
Mean Output Length (tokens)     | 444                | 462
Median Output Length (tokens)   | 401                | 406
P99 Output Length (tokens)      | 2000               | 2000
====================================================================================================
```
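
For readers new to the acronyms: TTFT is time to first token, TPOT is time per output token, and ITL is inter-token latency. Mean/median/P99 rows like those above can be derived from per-request samples as sketched below; the sample values and record layout are hypothetical, not taken from compare_with_vllm.py:

```python
# Deriving mean/median/P99 rows from per-request latency samples.
import statistics


def summarize(samples_ms: list[float]) -> tuple[float, float, float]:
    """Return (mean, median, p99) for one metric, in the input's units."""
    ordered = sorted(samples_ms)
    p99_index = min(len(ordered) - 1, round(0.99 * (len(ordered) - 1)))
    return statistics.mean(ordered), statistics.median(ordered), ordered[p99_index]


ttft_ms = [120.0, 135.5, 150.2, 310.9]  # hypothetical per-request TTFTs
print("Mean/Median/P99 TTFT (ms): %.2f / %.2f / %.2f" % summarize(ttft_ms))
```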
