Added llama3.1-70b Benchmarking recipe on A3-Mega nodes by krishnakanthankam-qt · Pull Request #246 · AI-Hypercomputer/gpu-recipes

krishnakanthankam-qt · 2026-06-05T11:08:17Z

Description

Title

Add Llama 3.1 70B Recipe and Optimized Sequential Benchmarking

Summary

Introduces a high-performance recipe for serving and benchmarking Llama 3.1 70B on A3-Mega GKE node pools.

google-cla · 2026-06-05T11:08:27Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

depksingh · 2026-06-12T05:36:19Z

+
+This recipe supports the following models. Running TRTLLM inference benchmarking on these models are only tested and validated on A3-Mega GKE nodes with certain combination of TP, PP, EP, number of GPU chips, input & output sequence length, precision, etc.
+
+Example model configuration YAML files included in this repo only show a certain combination of parallelism hyperparameters and configs for benchmarking purposes. Input and output length in `/home/akrishnakanth/gpu-recipes/inference/a3mega/llama3.1-70b/trtllm-gke/values.yaml` need to be adjusted according to the model and its configs.


We can remove this.

depksingh · 2026-06-12T05:46:20Z

-    rm -rf $engine_dir
-    rm -f $dataset_file
+    rm -rf $engine_dir || true
+    rm -f $dataset_file || true


Please remove
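
A possible rationale for this request, which the reviewer does not state: `rm -f` and `rm -rf` already exit with status 0 when the target does not exist, so `|| true` would only hide other failures such as permission errors. A minimal sketch, using hypothetical temporary paths rather than the recipe's real `$engine_dir` and `$dataset_file`:

```shell
#!/usr/bin/env bash
# Hypothetical paths for illustration only; neither one is ever created.
workdir="$(mktemp -d)"
missing_file="$workdir/does-not-exist.json"
missing_dir="$workdir/does-not-exist-engine"

# -f makes rm treat a nonexistent operand as success.
rm -f "$missing_file"
rm_file_status=$?

rm -rf "$missing_dir"
rm_dir_status=$?

echo "rm -f status: $rm_file_status, rm -rf status: $rm_dir_status"

# Clean up the temporary directory created above.
rmdir "$workdir"
```

Both commands report success without `|| true`, so the suffix is only relevant for errors that `-f` does not suppress.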

Priya-Quad · 2026-06-16T19:41:12Z

        --backend "pytorch" \
        --kv_cache_free_gpu_mem_fraction $kv_cache_free_gpu_mem_fraction \
-        $extra_args $vl_args > $output_file
+        $extra_args $vl_args | tee "$output_file"


`| tee`: this change can be reverted to the original.
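
For context on the two forms in this hunk, here is a minimal sketch that is not taken from the PR; `false` stands in for a failing benchmark command. With `> file` the shell reports the command's own exit status, while with `| tee file` it reports the status of `tee` unless `pipefail` is set. Whether this behavior motivated the review request is an assumption.

```shell
#!/usr/bin/env bash
# `false` is a stand-in for a benchmark command that fails.
set +o pipefail   # start from bash's default pipeline behavior
out="$(mktemp)"

# Redirect: $? is the exit status of the command itself.
false > "$out"
redirect_status=$?

# Pipe to tee: $? is the exit status of tee, the last command in the pipeline.
false | tee "$out" > /dev/null
tee_status=$?

# With pipefail (set only inside this subshell), a failure anywhere
# in the pipeline becomes the pipeline's exit status.
( set -o pipefail; false | tee "$out" > /dev/null )
pipefail_status=$?

echo "redirect=$redirect_status tee=$tee_status pipefail=$pipefail_status"
rm -f "$out"
```

If live output is wanted as well as a log file, `tee` combined with `set -o pipefail` keeps the failure visible to the caller.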

Priya-Quad · 2026-06-16T19:42:04Z

            --dataset $dataset_file \
            --engine_dir $engine_dir \
-            --kv_cache_free_gpu_mem_fraction $kv_cache_free_gpu_mem_fraction $extra_args >$output_file
+            --kv_cache_free_gpu_mem_fraction $kv_cache_free_gpu_mem_fraction $extra_args | tee $output_file


`| tee`: this can also be reverted to the original.

Priya-Quad · 2026-06-16T19:42:27Z

+            --kv_cache_free_gpu_mem_fraction $kv_cache_free_gpu_mem_fraction $extra_args | tee $output_file
    fi

-    cat $output_file


Add this back to the file.

Priya-Quad · 2026-06-16T19:48:12Z

+  serverArgs:
+    max-model-len: 32768
+    max-num-seqs: 128
+    gpu-memory-utilization: 0.90


Please remove `gpu-memory-utilization: 0.90` from here; this value is already passed from trtllm-configs.

Priya-Quad · 2026-06-16T19:49:41Z

+    helm install -f values.yaml \
+    --set workload.benchmarks.experiments[0].isl=128 \
+    --set workload.benchmarks.experiments[0].osl=128 \
+    --set workload.benchmarks.experiments[0].num_requests=1000 \


Lines 235 to 237 can be removed; these values are already passed from values.yaml. We don't usually hardcode values in the README.

Priya-Quad · 2026-06-16T19:50:58Z

+    $REPO_ROOT/src/helm-charts/a3mega/trtllm-inference/single-node
+    ```
+  > [!NOTE]
+  > You can modify the benchmark configuration at runtime by changing the values for `isl`, `osl`, and `num_requests` (number of prompts) in the Helm command to test different scenarios.


Please check the other recipes to update this line.

Priya-Quad · 2026-06-16T19:51:49Z

+===========================================================
+DATASET DETAILS
+===========================================================
+Dataset Path:         /ssd/token-norm-dist_llama3.1-70b_128_128_tp4.json


Change `tp4` to `tp8`.

Priya-Quad · 2026-06-16T19:52:51Z

+PYTORCH BACKEND
+===========================================================
+Model:                          nvidia/Llama3.1-70b
+Model Path:                     /ssd/nvidia/Llama3.1-70b


Correct the model name.

krishnakanthankam-qt added 2 commits June 5, 2026 16:07

Added new recipe for llama3.1-70b on A3-mega nodes

21ba9f6

modified trtllm-launcher.sh for backward compatibility

d4c2a9c

depksingh marked this pull request as draft June 5, 2026 11:14

krishnakanthankam-qt added 2 commits June 11, 2026 14:29

streamlined launcher script and modified helm deployment

5feb2d3

added ld_prelaod path as env to the container

ebc6650

depksingh reviewed Jun 12, 2026


krishnakanthankam-qt added 4 commits June 12, 2026 12:47

standardize readme

8bc5552

modified readme

8cdcfe1

updated readme

e1c28af

fixed readme.md

caba832

Priya-Quad reviewed Jun 16, 2026

krishnakanthankam-qt wants to merge 8 commits into AI-Hypercomputer:main from krishnakanthankam-qt:main

3 participants

