Compare performance results between two Locust runs and show the changes relative to a base run. Works with both Locust CSV `report.csv` outputs and the per-feature HTML reports generated by the Locust web UI.

- Compare any two runs (base vs. current).
- Parses the CSV `report.csv` for aggregated and per-endpoint metrics.
- Parses per-feature `.html` pages and compares the latest history sample.
- Outputs human-readable tables, markdown with emoji indicators, or machine-friendly JSON.

Requires Python 3.8+ (no third-party dependencies).

Run directly from GitHub without cloning:

```bash
uvx --from 'git+https://github.com/dev-ankit/python-tools.git#subdirectory=tools/locust-compare' locust-compare <base_dir> <current_dir>
```

Or run from a local clone (from the `tools/locust-compare/` directory):

```bash
git clone https://github.com/dev-ankit/python-tools.git
cd python-tools/tools/locust-compare
uvx --from . locust-compare test_runs/HTML-Report-292 test_runs/HTML-Report-294
```

Once the package is published to PyPI, you can run it without the `--from` option:
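
```bash
uvx locust-compare <base_dir> <current_dir>
```

Alternatively, install it with pip:

```bash
cd tools/locust-compare
pip install .
locust-compare <base_dir> <current_dir>
```

Or run the script directly without installing:

```bash
python3 compare_runs.py <base_dir> <current_dir>
```

Examples:

- Compare two run directories (each containing a `report.csv` and HTML files):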

  ```bash
  locust-compare test_runs/HTML-Report-292 test_runs/HTML-Report-294
  ```

- Compare two specific CSV files:

  ```bash
  locust-compare test_runs/HTML-Report-292/report.csv test_runs/HTML-Report-294/report.csv
  ```

- JSON output for scripting:

  ```bash
  locust-compare test_runs/HTML-Report-292 test_runs/HTML-Report-294 -o json
  ```

- Markdown output with emoji indicators (✅ better, ❌ worse, ➖ same):

  ```bash
  python3 compare_runs.py test_runs/HTML-Report-292 test_runs/HTML-Report-294 -o markdown
  ```

- Colorize text output (green = better, red = worse):

  ```bash
  locust-compare test_runs/HTML-Report-292 test_runs/HTML-Report-294 --color
  ```

The exit code is 0 on success and 1 on error.

Metrics compared from the CSV `report.csv` (the Aggregated row and each request row):
- Requests/s, Request Count, Failure Count
- Average, Median, Min, Max Response Time
- Percentiles: 50%, 66%, 75%, 80%, 90%, 95%, 98%, 99%, 99.9%, 99.99%, 100% (if present)
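
As a rough illustration of reading these columns, the sketch below loads a `report.csv`-style file with the standard `csv` module. It is not the tool's code: the column headers are assumed to match Locust's CSV stats export, the helpers `parse_number` and `load_report` are hypothetical, and the sample rows are made up.

```python
import csv
import io
from typing import Dict, Optional

# Assumed to match Locust's CSV stats headers; columns missing from a file are skipped.
METRIC_COLUMNS = [
    "Requests/s", "Request Count", "Failure Count",
    "Average Response Time", "Median Response Time",
    "Min Response Time", "Max Response Time",
    "50%", "66%", "75%", "80%", "90%", "95%", "98%", "99%",
    "99.9%", "99.99%", "100%",
]


def parse_number(raw: Optional[str]) -> Optional[float]:
    """Convert a CSV cell to float; empty, 'N/A', or malformed cells become None."""
    if raw is None:
        return None
    raw = raw.strip()
    if not raw or raw == "N/A":
        return None
    try:
        return float(raw)
    except ValueError:
        return None


def load_report(text: str) -> Dict[str, Dict[str, Optional[float]]]:
    """Map each row's Name (e.g. 'Aggregated') to the metric columns it has."""
    rows: Dict[str, Dict[str, Optional[float]]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        name = row.get("Name") or ""
        rows[name] = {col: parse_number(row.get(col)) for col in METRIC_COLUMNS if col in row}
    return rows


# Made-up sample with only a few of the columns present.
sample = """Type,Name,Request Count,Failure Count,Average Response Time,Requests/s,95%
GET,/api/items,1000,2,80.5,190.0,140
,Aggregated,1500,7,85.2,286.2,150
"""
report = load_report(sample)
print(report["Aggregated"]["Requests/s"])  # 286.2
```

Rows are keyed by the `Name` column only, so a GET and a POST to the same path would collide in this sketch; keying on `Type` and `Name` together would avoid that.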

From HTML feature pages (the last entry in `window.templateArgs.history`):
- Requests/s (`current_rps`)
- Average Response Time (`total_avg_response_time`)
- 50% (`response_time_percentile_0.5`)
- 95% (`response_time_percentile_0.95`)

If a metric is not available for an item, it is shown as `-`.
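
The HTML path can be sketched in a similar way. This version assumes the page assigns a JSON object literal to `window.templateArgs`, and it uses the history keys listed above; `extract_template_args` and `latest_sample_metrics` are hypothetical names, and treating a page without that assignment as a non-feature page is a choice made for this sketch only.

```python
import json
from typing import Any, Dict, Optional

# Display name -> key inside each history entry (key names as listed above).
HTML_METRICS = {
    "Requests/s": "current_rps",
    "Average Response Time": "total_avg_response_time",
    "50%": "response_time_percentile_0.5",
    "95%": "response_time_percentile_0.95",
}


def extract_template_args(html: str) -> Optional[Dict[str, Any]]:
    """Decode the object assigned to window.templateArgs, or return None."""
    marker = html.find("window.templateArgs")
    if marker == -1:
        return None  # no templateArgs: treat as a non-feature page
    start = html.find("{", marker)
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores the text after it.
        obj, _ = json.JSONDecoder().raw_decode(html[start:])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def latest_sample_metrics(html: str) -> Dict[str, Optional[float]]:
    """Read the metrics from the last entry of templateArgs['history']."""
    args = extract_template_args(html) or {}
    history = args.get("history") or []
    if not history:
        return {}
    last = history[-1]  # only the final sample is compared
    result: Dict[str, Optional[float]] = {}
    for name, key in HTML_METRICS.items():
        value = last.get(key)
        result[name] = float(value) if isinstance(value, (int, float)) else None
    return result


# Made-up page fragment with two history samples.
page = """<script>
window.templateArgs = {"history": [
  {"current_rps": 250.0, "total_avg_response_time": 90.1},
  {"current_rps": 271.5, "total_avg_response_time": 80.0, "response_time_percentile_0.95": 160}
]}
</script>"""
print(latest_sample_metrics(page))
```

Using `raw_decode` instead of a regular expression lets nested braces inside the object parse correctly, since the decoder stops exactly at the end of the first JSON value.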

The `-o markdown` flag produces markdown tables with emoji indicators for verdicts:

```markdown
## Aggregated
| Metric | Base | Current | Diff | % Change | Verdict |
| --- | --- | --- | --- | --- | --- |
| Requests/s | 286.200 | 300 | +13.800 | +4.8% | ✅ |
| Request Count | 1500 | 1800 | +300 | +20.0% | ✅ |
| Failure Count | 7 | 4 | -3 | -42.9% | ✅ |
| Average Response Time | 85.200 | 78.500 | -6.700 | -7.9% | ✅ |
| 95% | 150 | 140 | -10 | -6.7% | ✅ |
```

Verdict emojis:
- ✅ Better performance
- ❌ Worse performance
- ➖ No change
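
One way to render rows like the ones in the sample table is sketched below. The helpers `fmt` and `markdown_row` are hypothetical, the number formatting (whole numbers without decimals, other values with three decimals, percentages with one) is inferred from the sample table rather than taken from the tool, and the verdict string is assumed to be computed elsewhere.

```python
from typing import Optional

VERDICT_EMOJI = {"better": "✅", "worse": "❌", "same": "➖"}


def fmt(value: Optional[float], signed: bool = False) -> str:
    """Whole numbers without decimals, others with three; None becomes '-'."""
    if value is None:
        return "-"
    if float(value).is_integer():
        return f"{int(value):+d}" if signed else str(int(value))
    return f"{value:+.3f}" if signed else f"{value:.3f}"


def markdown_row(metric: str, base: Optional[float], current: Optional[float],
                 verdict: Optional[str]) -> str:
    """Render one '| Metric | Base | Current | Diff | % Change | Verdict |' row."""
    diff = None if base is None or current is None else current - base
    # Percent change is undefined for a missing or zero base.
    pct = None if diff is None or not base else diff / base * 100
    pct_text = "-" if pct is None else f"{pct:+.1f}%"
    emoji = VERDICT_EMOJI.get(verdict, "") if verdict else ""
    cells = [metric, fmt(base), fmt(current), fmt(diff, signed=True), pct_text, emoji]
    return "| " + " | ".join(cells) + " |"


print(markdown_row("Failure Count", 7, 4, "better"))
```

With these rules the helper reproduces each row of the sample table, for example `| Failure Count | 7 | 4 | -3 | -42.9% | ✅ |`.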

The `-o json` output is a single JSON object with one key per compared item:

- CSV items use their request name; the aggregated row is keyed as `Aggregated`.
- HTML feature pages are keyed as `HTML:<feature_file_stem>`.

Each item maps metric names to an object with:

```
{
  "base": number | null,
  "current": number | null,
  "diff": number | null,
  "pct_change": number | null
}
```
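
A minimal sketch of building that per-metric object, following the rule that percent change is undefined when the base is `0` or missing. `compare_metric` is a hypothetical helper, and rounding `pct_change` to two decimals is an assumption based on the truncated example that follows.

```python
import json
from typing import Dict, Optional

Number = Optional[float]


def compare_metric(base: Number, current: Number) -> Dict[str, Number]:
    """Build the {base, current, diff, pct_change} object for one metric."""
    diff = None if base is None or current is None else current - base
    # No percent change when the base is missing or zero.
    pct_change = None if diff is None or not base else round(diff / base * 100, 2)
    return {"base": base, "current": current, "diff": diff, "pct_change": pct_change}


report = {
    "Aggregated": {
        "Failure Count": compare_metric(7, 4),
        "Requests/s": compare_metric(0, 5),  # zero base, so pct_change becomes null
    }
}
print(json.dumps(report, indent=2))
```

Python's `None` serializes as JSON `null`, which is how missing values appear in the output object.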

Example (truncated):

```
{
  "Aggregated": {
    "Requests/s": {"base": 268.623, "current": 196.786, "diff": -71.836, "pct_change": -26.72},
    "Average Response Time": {"base": 71.801, "current": 98.069, ...}
  },
  "HTML:conferences_widget_all_lists": {
    "Requests/s": {"base": 271.5, "current": 189.8, ...},
    "95%": {"base": 160, "current": 190, ...}
  }
}
```
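
Because the JSON output is plain data, it is easy to post-process, for example to fail a CI job on large regressions. The sketch below is hypothetical: `find_regressions`, the 10% default threshold, and the metric-direction sets (which mirror the default verdict rules described in this README, treating any other metric as lower-is-better) are illustrative choices, and the input is made-up data in the documented shape.

```python
from typing import Any, Dict, List

# Metric directions mirroring the default verdict rules in this README.
HIGHER_IS_BETTER = {"Requests/s", "Request Count"}
NEUTRAL = {"Average Content Size"}


def find_regressions(report: Dict[str, Dict[str, Dict[str, Any]]],
                     threshold_pct: float = 10.0) -> List[str]:
    """Return 'item / metric: pct' entries that got worse by more than threshold_pct."""
    problems: List[str] = []
    for item, metrics in report.items():
        for metric, values in metrics.items():
            pct = values.get("pct_change")
            if pct is None or metric in NEUTRAL:
                continue  # no usable percent change, or no verdict for this metric
            # Positive "worsening" means the metric moved in the bad direction.
            worsening = -pct if metric in HIGHER_IS_BETTER else pct
            if worsening > threshold_pct:
                problems.append(f"{item} / {metric}: {pct:+.2f}%")
    return problems


# Made-up data in the documented -o json shape.
example = {
    "Aggregated": {
        "Requests/s": {"base": 300.0, "current": 240.0, "diff": -60.0, "pct_change": -20.0},
        "95%": {"base": 150, "current": 140, "diff": -10, "pct_change": -6.67},
        "Average Content Size": {"base": 100, "current": 200, "diff": 100, "pct_change": 100.0},
    }
}
for line in find_regressions(example):
    print(line)
```

To use it on real output, you could save the report with `locust-compare <base_dir> <current_dir> -o json > diff.json`, load it with `json.load`, and exit non-zero when the returned list is not empty.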

- For HTML pages, only the last sample in `window.templateArgs.history` is compared, which typically represents the end state of the run. If you prefer a different aggregation (mean/max), open an issue or adjust the code where noted.
- Request Count and Failure Count are not available from HTML pages and are displayed as `-`.
- If the base value is `0` or missing, the percent change is shown as `-`.
- The tool skips non-feature HTML pages such as `htmlpublisher-wrapper.html`.
- The tool prints a Verdict column. By default, it evaluates improvements as follows:
  - Higher is better: `Requests/s`, `Request Count`.
  - Lower is better: all response-time metrics and percentiles, `Failure Count`, `Failures/s`.
  - Neutral (no verdict): other metrics (e.g., `Average Content Size`).
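
The default verdict rules above can be expressed as a small classifier. This is a sketch, not the tool's implementation: `verdict` and `is_response_time_metric` are hypothetical names, percentile columns are recognized by their trailing `%`, and "no change" means exact equality here, since the tool's own criterion for "no change" is not documented in this README.

```python
from typing import Optional

HIGHER_IS_BETTER = {"Requests/s", "Request Count"}
LOWER_IS_BETTER = {"Failure Count", "Failures/s"}


def is_response_time_metric(metric: str) -> bool:
    """Response-time columns and percentile columns such as '95%' or '99.9%'."""
    return "Response Time" in metric or metric.endswith("%")


def verdict(metric: str, base: Optional[float], current: Optional[float]) -> Optional[str]:
    """Return 'better', 'worse', 'same', or None when no verdict applies."""
    if base is None or current is None:
        return None
    if metric in HIGHER_IS_BETTER:
        direction = 1
    elif metric in LOWER_IS_BETTER or is_response_time_metric(metric):
        direction = -1
    else:
        return None  # neutral, e.g. Average Content Size
    change = (current - base) * direction
    if change > 0:
        return "better"
    if change < 0:
        return "worse"
    return "same"  # exact equality only in this sketch


print(verdict("95%", 150, 140))  # better
```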

Project layout:

```
tools/locust-compare/
├── compare_runs.py   # CLI tool
├── pyproject.toml    # Package configuration
├── tests/            # Test suite
└── test_runs/        # Sample Locust outputs for trying the tool
    ├── HTML-Report-292/
    └── HTML-Report-294/
```

Small and simple by design. If you need additional metrics, output formats, or aggregation modes, feel free to extend `compare_runs.py` or open a PR.