Fix/benchmarking #1459

Open
nayana-kumari wants to merge 2 commits into llm-d:main from nayana-kumari:fix/benchmarking
@nayana-kumari

  • Enables full end-to-end benchmarking on s390x
  • Fixes PyArrow runtime compatibility issues
  • Improves stability of guidellm harness execution
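
The log below includes a VERIFY ENV step that prints the resolved paths of `yq` and `bc` before the harness starts. A minimal sketch of that kind of pre-flight check, assuming those two tools are the only ones required (the tool list and the helper names `find_tools` and `missing_tools` are hypothetical and not taken from this PR):

```python
import shutil

# Hypothetical tool list: `yq` and `bc` are the two paths the log prints
# under "VERIFY ENV". The real harness may check for more or fewer tools.
REQUIRED_TOOLS = ("yq", "bc")


def find_tools(names):
    """Map each tool name to its resolved path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(resolved):
    """Return the names that could not be resolved, in sorted order."""
    return sorted(name for name, path in resolved.items() if path is None)


resolved = find_tools(REQUIRED_TOOLS)
for name, path in resolved.items():
    print(f"{name}: {path or 'NOT FOUND'}")

missing = missing_tools(resolved)
if missing:
    print(f"missing required tools: {', '.join(missing)}")
```

Running a check like this before the harness makes a missing binary visible up front instead of partway through a benchmark run.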

(.venv) [root@ocpai5 llm-d-benchmark]# oc logs -f guidellm-7h42882i -n llmdbench
===== FIX START =====
===== FIX YQ =====
===== VERIFY ENV =====
/tmp/fixbin/yq
/usr/bin/bc
===== RUN HARNESS =====

+ export LLMDBENCH_RUN_EXPERIMENT_HARNESS=guidellm-llm-d-benchmark.sh
+ LLMDBENCH_RUN_EXPERIMENT_HARNESS=guidellm-llm-d-benchmark.sh
+ export LLMDBENCH_RUN_EXPERIMENT_ANALYZER=guidellm-analyze_results.sh
+ LLMDBENCH_RUN_EXPERIMENT_ANALYZER=guidellm-analyze_results.sh
+ export LLMDBENCH_RUN_EXPERIMENT_RESULTS_DIR=/requests/guidellm-1780565700-yz3g6c_1
+ LLMDBENCH_RUN_EXPERIMENT_RESULTS_DIR=/requests/guidellm-1780565700-yz3g6c_1
+ export LLMDBENCH_CONTROL_WORK_DIR=/requests/guidellm-1780565700-yz3g6c_1
+ LLMDBENCH_CONTROL_WORK_DIR=/requests/guidellm-1780565700-yz3g6c_1
+ export LLMDBENCH_RUN_EXPERIMENT_HARNESS_WORKLOAD_NAME=sanity_random.yaml
+ LLMDBENCH_RUN_EXPERIMENT_HARNESS_WORKLOAD_NAME=sanity_random.yaml
++ date -u +%Y-%m-%dT%H:%M:%SZ
+ export LLMDBENCH_HARNESS_START=2026-06-04T09:35:04Z
+ LLMDBENCH_HARNESS_START=2026-06-04T09:35:04Z
+ export 'LLMDBENCH_HARNESS_ARGS=--workload sanity_random.yaml'
+ LLMDBENCH_HARNESS_ARGS='--workload sanity_random.yaml'
++ grep '^guidellm:' /workspace/repos.txt
++ cut '-d ' -f3
+ export LLMDBENCH_HARNESS_VERSION=v0.5.3
+ LLMDBENCH_HARNESS_VERSION=v0.5.3
+ llm-d-benchmark.sh
    LLMDBENCH_CONTROL_WORK_DIR=/requests/guidellm-1780565700-yz3g6c_1
    LLMDBENCH_DEPLOY_CURRENT_MODEL=ibm-granite/granite-3.3-8b-instruct
    LLMDBENCH_DEPLOY_CURRENT_TOKENIZER=ibm-granite/granite-3.3-8b-instruct
    LLMDBENCH_DEPLOY_METHODS=modelservice
    LLMDBENCH_HARNESS_ARGS=--workload sanity_random.yaml
    LLMDBENCH_HARNESS_GIT_BRANCH=v0.5.3
    LLMDBENCH_HARNESS_GIT_REPO=https://github.com/vllm-project/guidellm.git
    LLMDBENCH_HARNESS_NAME=guidellm
    LLMDBENCH_HARNESS_STACK_ENDPOINT_URL=http://infra-llmdbench-inference-gateway-istio.llmdbench.svc.cluster.local:80
    LLMDBENCH_HARNESS_STACK_NAME=ibm-gran-f422a27c-instruct
    LLMDBENCH_HARNESS_STACK_TYPE=llm-d
    LLMDBENCH_HARNESS_START=2026-06-04T09:35:04Z
    LLMDBENCH_HARNESS_VERSION=v0.5.3
    LLMDBENCH_MAGIC_ENVAR=harness_pod
    LLMDBENCH_RUN_EXPERIMENT_ANALYZER=guidellm-analyze_results.sh
    LLMDBENCH_RUN_EXPERIMENT_HARNESS=guidellm-llm-d-benchmark.sh
    LLMDBENCH_RUN_EXPERIMENT_HARNESS_DIR=guidellm
    LLMDBENCH_RUN_EXPERIMENT_HARNESS_LOADGEN_EC=1
    LLMDBENCH_RUN_EXPERIMENT_HARNESS_MAX_TRIES=3
    LLMDBENCH_RUN_EXPERIMENT_HARNESS_NAME_AUTO=1
    LLMDBENCH_RUN_EXPERIMENT_HARNESS_REPORT_EC=1
    LLMDBENCH_RUN_EXPERIMENT_HARNESS_WORKLOAD_AUTO=1
    LLMDBENCH_RUN_EXPERIMENT_HARNESS_WORKLOAD_NAME=sanity_random.yaml
    LLMDBENCH_RUN_EXPERIMENT_ID=guidellm-1780565700-yz3g6c
    LLMDBENCH_RUN_EXPERIMENT_LAUNCHER=1
    LLMDBENCH_RUN_EXPERIMENT_RESULTS_DIR=/requests/guidellm-1780565700-yz3g6c_1
    LLMDBENCH_RUN_EXPERIMENT_RESULTS_DIR_PREFIX=/requests
    LLMDBENCH_RUN_WORKSPACE_DIR=/workspace
    LLMDBENCH_VLLM_COMMON_INFERENCE_PORT=8000
    LLMDBENCH_VLLM_COMMON_METRICS_PORT=8200
    LLMDBENCH_VLLM_COMMON_METRICS_SCRAPE_ENABLED=true
    LLMDBENCH_VLLM_COMMON_NAMESPACE=llmdbench
    LLMDBENCH_VLLM_MONITORING_METRICS_PATH=/metrics
    Running harness: /usr/local/bin/guidellm-llm-d-benchmark.sh
    Using experiment result dir: /requests/guidellm-1780565700-yz3g6c_1
    Starting metrics collection...
    Metrics collector started with PID: 93
    Metrics collection logs: /requests/guidellm-1780565700-yz3g6c_1/metrics_collection.log
    ✔ OpenAIHTTPBackend backend validated with model
    ibm-granite/granite-3.3-8b-instruct
    {'target':
    'http://infra-llmdbench-inference-gateway-istio.llmdbench.svc.cluster.local:80
    ', 'model': 'ibm-granite/granite-3.3-8b-instruct', 'timeout': 60.0, 'http2':
    True, 'follow_redirects': True, 'verify': False, 'openai_paths': {'health':
    'health', 'models': 'v1/models', 'text_completions': 'v1/completions',
    'chat_completions': 'v1/chat/completions', 'audio_transcriptions':
    'v1/audio/transcriptions', 'audio_translations': 'v1/audio/translations'},
    'validate_backend': {'method': 'GET', 'url':
    'http://infra-llmdbench-inference-gateway-istio.llmdbench.svc.cluster.local:80
    /health'}}
    ✔ Processor resolved
    Using model 'ibm-granite/granite-3.3-8b-instruct' as processor
    ✔ Request loader initialized with inf unique requests
    {'data': "[{'prompt_tokens': 50, 'prompt_tokens_stdev': 10,
    'prompt_tokens_min': 10, 'prompt_tokens_max': 100, 'output_tokens': 50,
    'output_tokens_stdev': 10, 'output_tokens_min': 10, 'output_tokens_max':
    100}]", 'data_args': '[]', 'data_samples': -1, 'preprocessors':
    ['GenerativeColumnMapper', 'GenerativeTextCompletionsRequestFormatter'],
    'collator': 'GenerativeRequestCollator', 'sampler': 'None', 'num_workers': 1,
    'random_seed': 42}
    ✔ Resolved transient phase configurations
    Warmup: percent=None value=None mode='prefer_duration'
    Cooldown: percent=None value=None mode='prefer_duration'
    Rampup (Throughput/Concurrent): 0.0
    ✔ AsyncProfile profile resolved
    {'str': "type_='constant' completed_strategies=[] constraints={'max_seconds':
    30} rampup_duration=0.0 strategy_type='constant' rate=[1.0]
    max_concurrency=None random_seed=42 strategy_types=['constant']", 'type':
    'AsyncProfile', 'class': 'AsyncProfile', 'module':
    'guidellm.benchmark.profiles', 'attributes': {'type_': 'constant',
    'completed_strategies': [], 'constraints': {'max_seconds': 30},
    'rampup_duration': 0.0, 'strategy_type': 'constant', 'rate': [1.0],
    'max_concurrency': 'None', 'random_seed': 42}}
    ✔ Output formats resolved
    {'json':
    "output_path=PosixPath('/requests/guidellm-1780565700-yz3g6c_1/results.json')"
    }
    ✔ Setup complete, starting benchmarks...

ℹ Run Summary Info
=========== ======== ======== ===== ===== ===== ======= ======= ===== ======= ===== =====
Benchmark   Timings                             Input Tokens          Output Tokens
Strategy    Start    End      Dur   Warm  Cool  Comp    Inc     Err   Comp    Inc   Err
                              Sec   Sec   Sec   Tot     Tot     Tot   Tot     Tot   Tot
----------- -------- -------- ----- ----- ----- ------- ------- ----- ------- ----- -----
constant    09:35:12 09:35:42 30.0  0.0   0.0   956.0   607.0   0.0   934.0   60.0  0.0
=========== ======== ======== ===== ===== ===== ======= ======= ===== ======= ===== =====

ℹ Text Metrics Statistics (Completed Requests)
=========== ======= ====== ====== ====== ======= ====== ====== ====== ======= ======= ======= =======
Benchmark   Input Tokens                 Input Words                  Input Characters
Strategy    Per Request    Per Second    Per Request    Per Second    Per Request     Per Second
            Mdn     p95    Mdn    Mean   Mdn     p95    Mdn    Mean   Mdn     p95     Mdn     Mean
----------- ------- ------ ------ ------ ------- ------ ------ ------ ------- ------- ------- -------
constant    50.0    63.0   34.0   38.5   36.0    48.0   24.7   27.7   224.0   316.0   156.9   177.6
=========== ======= ====== ====== ====== ======= ====== ====== ====== ======= ======= ======= =======
Benchmark   Output Tokens                Output Words                 Output Characters
Strategy    Per Request    Per Second    Per Request    Per Second    Per Request     Per Second
            Mdn     p95    Mdn    Mean   Mdn     p95    Mdn    Mean   Mdn     p95     Mdn     Mean
----------- ------- ------ ------ ------ ------- ------ ------ ------ ------- ------- ------- -------
constant    51.0    65.0   34.3   37.6   34.0    52.0   25.8   27.2   229.0   327.0   164.9   174.1
=========== ======= ====== ====== ====== ======= ====== ====== ====== ======= ======= ======= =======

ℹ Request Token Statistics (Completed Requests)
=========== ====== ====== ====== ====== ====== ======= ======= ======= ========= ========
Benchmark   Input Tok     Output Tok    Total Tok      Stream Iter     Output Tok
Strategy    Per Req       Per Req       Per Req        Per Req         Per Stream Iter
            Mdn    p95    Mdn    p95    Mdn    p95     Mdn     p95     Mdn       p95
----------- ------ ------ ------ ------ ------ ------- ------- ------- --------- --------
constant    50.0   63.0   51.0   65.0   98.0   126.0   106.0   134.0   1.0       1.0
=========== ====== ====== ====== ====== ====== ======= ======= ======= ========= ========

ℹ Request Latency Statistics (Completed Requests)
=========== ======== ========= ======== ======== ======= ======= ======= =======
Benchmark   Request Latency    TTFT              ITL             TPOT
Strategy    Sec                ms                ms              ms
            Mdn      p95       Mdn      p95      Mdn     p95     Mdn     p95
----------- -------- --------- -------- -------- ------- ------- ------- -------
constant    8.4      13.4      3017.4   8213.4   113.3   118.2   161.3   320.9
=========== ======== ========= ======== ======== ======= ======= ======= =======

ℹ Server Throughput Statistics (All Requests)
=========== ===== ====== ======= ====== ======= ======= ======== ======= ======= =======
Benchmark   Requests                    Input Tokens    Output Tokens    Total Tokens
Strategy    Per Sec      Concurrency    Per Sec         Per Sec          Per Sec
            Mdn   Mean   Mdn     Mean   Mdn     Mean    Mdn      Mean    Mdn     Mean
----------- ----- ------ ------- ------ ------- ------- -------- ------- ------- -------
constant    0.6   0.6    8.0     7.9    35.7    52.1    26.5     33.1    26.8    85.2
=========== ===== ====== ======= ====== ======= ======= ======== ======= ======= =======

✔ Benchmarking complete, generated 1 benchmark(s)
… json : /requests/guidellm-1780565700-yz3g6c_1/results.json
Stopping metrics collection...
Processing collected metrics...
Metrics collection complete. Check metrics_collection.log for details.
Run metadata written to /requests/guidellm-1780565700-yz3g6c_1/run_metadata.yaml
Harness completed successfully.
Harness completed: /usr/local/bin/guidellm-llm-d-benchmark.sh
Running analysis: /usr/local/bin/guidellm-analyze_results.sh
Converting results.json to Benchmark Report v0.1
Warning: LLMDBENCH_VLLM_MODELSERVICE_GAIE_PRESETS_CONFIG empty.
Converting results.json to Benchmark Report v0.2
Environment variable empty: LLMDBENCH_VLLM_MODELSERVICE_GAIE_PRESETS_CONFIG
Results data conversion completed successfully.
Integrating metrics summary into benchmark report(s) v0.2...
Metrics integrated into: /requests/guidellm-1780565700-yz3g6c_1/benchmark_report_v0.2,_results.json_0.yaml
Generating metric plots...
Collecting time series data...
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/vllm_prefix_cache_hit_rate.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/vllm_external_prefix_cache_hit_rate.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/vllm_kv_cache_usage_perc.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/vllm_num_requests_running.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/vllm_num_requests_waiting.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/vllm_prefix_cache_hits_total.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/vllm_prefix_cache_queries_total.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/vllm_external_prefix_cache_hits_total.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/vllm_external_prefix_cache_queries_total.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/vllm_num_preemptions_total.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/inference_pool_average_kv_cache_utilization.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/inference_pool_average_queue_size.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/inference_pool_ready_pods.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/pod_startup_times.png
Saved plot: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs/replica_status.png

All visualizations saved to: /requests/guidellm-1780565700-yz3g6c_1/metrics/graphs
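
As a rough sanity check on the latency figures reported in the run above, the medians can be cross-checked against each other. The sketch below assumes the common definitions: request latency is roughly TTFT plus one inter-token gap (ITL) per token after the first, and TPOT is total request latency divided by the number of output tokens. Those definitions are an assumption about how guidellm computes these columns, and medians of different distributions do not compose exactly, so only approximate agreement is expected.

```python
# Median values copied from the "Request Latency Statistics" and
# "Request Token Statistics" tables in the log above.
ttft_ms = 3017.4      # time to first token
itl_ms = 113.3        # inter-token latency
tpot_ms = 161.3       # time per output token
latency_s = 8.4       # end-to-end request latency
output_tokens = 51.0  # output tokens per request


def estimate_latency_s(ttft_ms: float, itl_ms: float, output_tokens: float) -> float:
    """Approximate latency as TTFT plus one ITL gap per token after the first."""
    return (ttft_ms + itl_ms * (output_tokens - 1)) / 1000.0


def estimate_tpot_ms(latency_s: float, output_tokens: float) -> float:
    """Approximate TPOT as total latency spread evenly over all output tokens."""
    return latency_s * 1000.0 / output_tokens


est_latency = estimate_latency_s(ttft_ms, itl_ms, output_tokens)
est_tpot = estimate_tpot_ms(latency_s, output_tokens)
print(f"estimated latency: {est_latency:.2f} s (reported median {latency_s} s)")
print(f"estimated TPOT: {est_tpot:.1f} ms (reported median {tpot_ms} ms)")
```

With the values above, the estimates come out at about 8.68 s and 164.7 ms, both within a few percent of the reported medians (8.4 s and 161.3 ms), which suggests the reported columns are internally consistent for this run.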

@github-actions

github-actions Bot commented Jun 4, 2026

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

Signed-off-by: modassar rana <modassar.rana@ibm.com>
Signed-off-by: modassar rana <modassar.rana@ibm.com>
Comment thread config/scenarios/examples/spyre-s390x.yaml