Problem
Currently, running Phase 1 (Baseline) tests for the concurrent load test suite requires manually executing individual commands for each model × workload combination:
# Llama-3.2-1B - All workloads
for workload in chat rag code; do
ansible-playbook -i inventory/hosts.yml \
llm-benchmark-concurrent-load.yml \
-e "test_model=meta-llama/Llama-3.2-1B-Instruct" \
-e "base_workload=$workload" \
-e "core_sweep_counts=[16,32,64]" \
-e "skip_phase_2=true" \
-e "skip_phase_3=true"
done
# Qwen3-0.6B - Chat and Code
for workload in chat code; do
ansible-playbook -i inventory/hosts.yml \
llm-benchmark-concurrent-load.yml \
-e "test_model=Qwen/Qwen3-0.6B" \
-e "base_workload=$workload" \
-e "core_sweep_counts=[16,32,64]" \
-e "skip_phase_2=true" \
-e "skip_phase_3=true"
done
# ... (repeat for granite, gpt-oss-20b, TinyLlama, opt-125m)
This is tedious, error-prone, and makes it difficult to run the complete test suite consistently.
Proposed Solution
Create a dedicated playbook or script that automates running all Phase 1 tests:
Option A: New Playbook - llm-benchmark-concurrent-load-phase1.yml
ansible-playbook -i inventory/hosts.yml \
llm-benchmark-concurrent-load-phase1.yml \
-e "core_sweep_counts=[16,32,64]"
Option B: Enhanced Existing Playbook - Add suite mode to llm-benchmark-concurrent-load.yml
ansible-playbook -i inventory/hosts.yml \
llm-benchmark-concurrent-load.yml \
-e "run_test_suite=true" \
-e "test_phase=1" \
-e "core_sweep_counts=[16,32,64]"
Option C: Shell Script - run-phase1-suite.sh
./automation/test-execution/scripts/run-phase1-suite.sh \
--cores "16,32,64" \
--models "all" # or specific subset
Requirements
The automation should:
- ✅ Read model × workload combinations from
models/llm-models/model-matrix.yaml
- ✅ Run Phase 1 (baseline, no caching, fixed tokens) for all combinations
- ✅ Support core sweep configuration
- ✅ Allow filtering by model subset (e.g., only small models, only specific models)
- ✅ Provide progress tracking and error reporting
- ✅ Optionally support running Phase 2 and Phase 3 test suites similarly
Benefits
- Consistency: Ensures all models tested with same configuration
- Efficiency: Single command to run entire test suite
- Reproducibility: Easy to reproduce test runs
- CI/CD Ready: Simplifies automation in CI pipelines
Related
- See
tests/concurrent-load/concurrent-load.md for test suite definition
- Current workaround requires ~6 manual command blocks for all models
Problem
Currently, running Phase 1 (Baseline) tests for the concurrent load test suite requires manually executing individual commands for each model × workload combination:
This is tedious, error-prone, and makes it difficult to run the complete test suite consistently.
Proposed Solution
Create a dedicated playbook or script that automates running all Phase 1 tests:
Option A: New Playbook -
llm-benchmark-concurrent-load-phase1.ymlansible-playbook -i inventory/hosts.yml \ llm-benchmark-concurrent-load-phase1.yml \ -e "core_sweep_counts=[16,32,64]"Option B: Enhanced Existing Playbook - Add suite mode to
llm-benchmark-concurrent-load.ymlOption C: Shell Script -
run-phase1-suite.shRequirements
The automation should:
models/llm-models/model-matrix.yamlBenefits
Related
tests/concurrent-load/concurrent-load.mdfor test suite definition