Add automated Phase 1 test suite runner for concurrent load tests #62

@maryamtahhan

Problem

Currently, running Phase 1 (Baseline) tests for the concurrent load test suite requires manually executing individual commands for each model × workload combination:

# Llama-3.2-1B - All workloads
for workload in chat rag code; do
  ansible-playbook -i inventory/hosts.yml \
    llm-benchmark-concurrent-load.yml \
    -e "test_model=meta-llama/Llama-3.2-1B-Instruct" \
    -e "base_workload=$workload" \
    -e "core_sweep_counts=[16,32,64]" \
    -e "skip_phase_2=true" \
    -e "skip_phase_3=true"
done

# Qwen3-0.6B - Chat and Code
for workload in chat code; do
  ansible-playbook -i inventory/hosts.yml \
    llm-benchmark-concurrent-load.yml \
    -e "test_model=Qwen/Qwen3-0.6B" \
    -e "base_workload=$workload" \
    -e "core_sweep_counts=[16,32,64]" \
    -e "skip_phase_2=true" \
    -e "skip_phase_3=true"
done

# ... (repeat for granite, gpt-oss-20b, TinyLlama, opt-125m)

This is tedious and error-prone, and it makes it difficult to run the complete test suite consistently.

Proposed Solution

Automate running all Phase 1 tests with a dedicated playbook, a suite mode in the existing playbook, or a script:

Option A: New Playbook - llm-benchmark-concurrent-load-phase1.yml

ansible-playbook -i inventory/hosts.yml \
  llm-benchmark-concurrent-load-phase1.yml \
  -e "core_sweep_counts=[16,32,64]"

Option B: Enhanced Existing Playbook - Add suite mode to llm-benchmark-concurrent-load.yml

ansible-playbook -i inventory/hosts.yml \
  llm-benchmark-concurrent-load.yml \
  -e "run_test_suite=true" \
  -e "test_phase=1" \
  -e "core_sweep_counts=[16,32,64]"

Option C: Shell Script - run-phase1-suite.sh

./automation/test-execution/scripts/run-phase1-suite.sh \
  --cores "16,32,64" \
  --models "all"  # or specific subset
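
To make Option C more concrete, here is a minimal dry-run sketch of what run-phase1-suite.sh might look like. It is an illustration under assumptions, not an existing script: the MATRIX array and the phase1_commands function are hypothetical names, the matrix holds only the two model × workload sets spelled out above (the real list would come from the model matrix file), and the script prints the ansible-playbook commands rather than executing them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of run-phase1-suite.sh (Option C); dry run only.
set -euo pipefail

# "model:workloads" entries. Only the two combinations written out in this
# issue are listed; the remaining models are intentionally left out.
MATRIX=(
  "meta-llama/Llama-3.2-1B-Instruct:chat rag code"
  "Qwen/Qwen3-0.6B:chat code"
)

# Print one ansible-playbook command per model x workload pair.
# $1 = comma-separated core counts, $2 = "all" or comma-separated model names.
phase1_commands() {
  local cores="$1" models="$2" entry model workloads workload
  for entry in "${MATRIX[@]}"; do
    model="${entry%%:*}"
    workloads="${entry#*:}"
    # Skip models that are not in the requested subset.
    if [[ "$models" != "all" && ",$models," != *",$model,"* ]]; then
      continue
    fi
    for workload in $workloads; do
      printf '%s -e "test_model=%s" -e "base_workload=%s" -e "core_sweep_counts=[%s]" %s\n' \
        "ansible-playbook -i inventory/hosts.yml llm-benchmark-concurrent-load.yml" \
        "$model" "$workload" "$cores" \
        '-e "skip_phase_2=true" -e "skip_phase_3=true"'
    done
  done
}

# Parse the flags proposed for Option C; the defaults are assumptions.
cores="16,32,64"
models="all"
while [[ $# -gt 0 ]]; do
  case "$1" in
    --cores)  cores="$2";  shift 2 ;;
    --models) models="$2"; shift 2 ;;
    *) echo "unknown option: $1" >&2; exit 2 ;;
  esac
done

phase1_commands "$cores" "$models"
```

Printing the commands first keeps the sketch safe to run anywhere; a real script would execute each command instead of printing it.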

Requirements

The automation should:

  • ✅ Read model × workload combinations from models/llm-models/model-matrix.yaml
  • ✅ Run Phase 1 (baseline, no caching, fixed tokens) for all combinations
  • ✅ Support core sweep configuration
  • ✅ Allow filtering by model subset (e.g., only small models, only specific models)
  • ✅ Provide progress tracking and error reporting
  • ✅ Optionally support running Phase 2 and Phase 3 test suites similarly
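
The progress-tracking and error-reporting requirement could be met along these lines. This is a sketch under assumptions: run_phase1_suite, RUNNER, and FAILED are hypothetical names, and because the schema of model-matrix.yaml is not shown in this issue, the function takes already-flattened "model workload" pairs on stdin instead of parsing the YAML itself.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run every model x workload pair, continue after a
# failure, and print a summary at the end.

# RUNNER is an assumption; overriding it (e.g. RUNNER=echo) gives a dry run.
RUNNER="${RUNNER:-ansible-playbook}"
FAILED=()

# $1 = comma-separated core counts; stdin = "model workload" pairs, one per line.
# Returns non-zero if any combination failed.
run_phase1_suite() {
  local cores="$1" line model workload i=0 total
  local -a combos=()
  FAILED=()
  # Read the whole list first so the runner cannot consume it from stdin.
  while IFS= read -r line; do
    [[ -n "$line" ]] && combos+=("$line")
  done
  total="${#combos[@]}"
  for line in "${combos[@]}"; do
    i=$((i + 1))
    read -r model workload <<<"$line"
    echo "[$i/$total] $model / $workload"
    if ! "$RUNNER" -i inventory/hosts.yml llm-benchmark-concurrent-load.yml \
        -e "test_model=$model" -e "base_workload=$workload" \
        -e "core_sweep_counts=[$cores]" \
        -e "skip_phase_2=true" -e "skip_phase_3=true" </dev/null; then
      FAILED+=("$model/$workload")
      echo "  FAILED: $model / $workload" >&2
    fi
  done
  echo "Done: $((total - ${#FAILED[@]}))/$total passed"
  [[ ${#FAILED[@]} -eq 0 ]]
}
```

Feeding the function from a here-document or a file redirect (rather than a pipe) keeps it in the current shell, so the FAILED array is still available afterwards for a final report or a retry pass.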

Benefits

  • Consistency: Ensures all models are tested with the same configuration
  • Efficiency: Single command to run entire test suite
  • Reproducibility: Easy to reproduce test runs
  • CI/CD Ready: Simplifies automation in CI pipelines

Related

  • See tests/concurrent-load/concurrent-load.md for test suite definition
  • The current workaround requires ~6 manual command blocks to cover all models
