# Benchmark Comparison Example

Compare `inference-endpoint` with vLLM's benchmarking tool using identical prompts.

## Prerequisites

**Set up the vLLM virtualenv** (isolates vLLM dependencies from inference-endpoint):

```bash
cd examples/03_BenchmarkComparison
./setup_vllm_venv.sh
```
| 13 | + |
| 14 | +This creates a `vllm_venv` directory with vLLM installed. You can specify a custom location: |
| 15 | + |
| 16 | +```bash |
| 17 | +./setup_vllm_venv.sh /path/to/custom/venv |
| 18 | +``` |

**Run an inference server** (OpenAI-compatible):

```bash
/path/to/custom/venv/bin/vllm serve Qwen/Qwen2.5-0.5B-Instruct --port 8000
# default venv location: examples/03_BenchmarkComparison/vllm_venv
```
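
To make the TTFT numbers in the comparison concrete, here is a minimal, hypothetical sketch of how time-to-first-token can be extracted from an OpenAI-compatible streaming response. It assumes `data: {...}` SSE chunk lines in the `/v1/completions` streaming format; `first_token_latency_ms` is an illustrative helper, not part of the example scripts:

```python
import json


def first_token_latency_ms(sse_lines, arrival_times, request_start):
    """Return TTFT in milliseconds, or None if no token arrived.

    sse_lines: raw SSE strings, e.g. 'data: {"choices": [{"text": "Hi"}]}'
    arrival_times: monotonic timestamp (seconds) at which each line arrived
    request_start: monotonic timestamp (seconds) when the request was sent
    """
    for line, arrived in zip(sse_lines, arrival_times):
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue  # skip keep-alives, comments, and the end marker
        chunk = json.loads(line[len("data: "):])
        # The first chunk that actually carries generated text marks TTFT.
        if chunk["choices"][0].get("text"):
            return (arrived - request_start) * 1000.0
    return None
```

In a real client the arrival timestamps would come from `time.monotonic()` calls while iterating the streamed response.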

## Usage

```bash
cd examples/03_BenchmarkComparison
python compare_with_vllm.py --model "Qwen/Qwen2.5-0.5B-Instruct" --endpoint http://localhost:8000
```

### Options

| Option                | Description                      | Default                 |
| --------------------- | -------------------------------- | ----------------------- |
| `--model`, `-m`       | Model name (required)            | -                       |
| `--num-prompts`, `-n` | Number of prompts                | 100                     |
| `--endpoint`          | Server URL                       | `http://localhost:8000` |
| `--max-output-tokens` | Max output tokens                | 2000                    |
| `--timeout`           | Timeout in seconds               | 900                     |
| `--workers`           | Number of workers                | 1                       |
| `--verbose`, `-v`     | Show full output from each run   | -                       |
| `--dry`               | Print commands without executing | -                       |
| `--vllm-venv-dir`     | Path to vLLM virtualenv          | `./vllm_venv`           |

### Example

```bash
python compare_with_vllm.py \
  --model "Qwen/Qwen2.5-0.5B-Instruct" \
  --num-prompts 200 \
  --max-output-tokens 1000
```

## Output

The script runs both benchmarks and displays a comparison table:

```
$ python examples/03_BenchmarkComparison/compare_with_vllm.py --model Qwen/Qwen2.5-0.5B-Instruct --num-prompts 10000

====================================================================================================
Metric                          | Inference Endpoint | vLLM Benchmark
----------------------------------------------------------------------------------------------------
Test Duration (s)               | 284.28             | 300.69
----------------------------------------------------------------------------------------------------
Throughput (req/s)              | 35.18              | 33.26
Total Generated Tokens          | 4446263            | 4626060
Output Token Throughput (tok/s) | 15640.65           | 15384.64
----------------------------------------------------------------------------------------------------
Mean TTFT (ms)                  | 137112.88          | 146093.86
Median TTFT (ms)                | 137092.46          | 145656.40
P99 TTFT (ms)                   | 270902.92          | 281810.49
----------------------------------------------------------------------------------------------------
Mean TPOT (ms)                  | 15.85              | 15.60
Median TPOT (ms)                | 15.56              | 15.61
P99 TPOT (ms)                   | 36.47              | 23.49
----------------------------------------------------------------------------------------------------
Mean ITL (ms)                   | 15.85              | 15.42
Median ITL (ms)                 | 15.56              | 12.17
P99 ITL (ms)                    | 36.47              | 35.96
----------------------------------------------------------------------------------------------------
Mean Output Length (tokens)     | 444                | 462
Median Output Length (tokens)   | 401                | 406
P99 Output Length (tokens)      | 2000               | 2000
====================================================================================================
```
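
The latency rows follow the usual definitions: TTFT is the delay until the first token, TPOT is the average per-token decode time after the first token, and ITL is the gap between consecutive tokens. As a rough illustration (assumed logic, not the script's actual implementation), these statistics can be computed from per-request token timestamps like this:

```python
import statistics


def latency_stats(token_times_per_request, request_starts):
    """Compute mean/median/p99 TTFT, TPOT, and ITL in milliseconds.

    token_times_per_request: per-request token arrival times (seconds)
    request_starts: request send time (seconds) for each request
    """
    ttfts, tpots, itls = [], [], []
    for times, start in zip(token_times_per_request, request_starts):
        ttfts.append((times[0] - start) * 1000.0)  # time to first token
        if len(times) > 1:
            # TPOT: decode time averaged over tokens after the first
            tpots.append((times[-1] - times[0]) / (len(times) - 1) * 1000.0)
            # ITL: every individual gap between consecutive tokens
            itls.extend((b - a) * 1000.0 for a, b in zip(times, times[1:]))

    def p99(xs):
        return statistics.quantiles(xs, n=100)[98] if len(xs) > 1 else xs[0]

    def summarize(xs):
        return {"mean": statistics.mean(xs),
                "median": statistics.median(xs),
                "p99": p99(xs)}

    return {"ttft": summarize(ttfts),
            "tpot": summarize(tpots),
            "itl": summarize(itls)}
```

Note that under heavy load (10000 queued prompts here), mean TTFT is dominated by queueing delay, which is why it reaches minutes while TPOT stays in the tens of milliseconds.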