| 1 | +# HBM Microbenchmarks on tpu7x-2x2x1 |
| 2 | + |
| 3 | +This guide provides instructions for running High Bandwidth Memory (HBM) microbenchmarks on tpu7x-2x2x1 Google Kubernetes Engine (GKE) clusters. It covers creating a node pool, running the benchmarks, and viewing the expected output. |
| 4 | + |
| 5 | +## Create Node Pools |
| 6 | + |
| 7 | +Follow the [Setup section](../../Ironwood_Microbenchmarks_readme.md#setup) to create a GKE cluster with one 2x2x1 node pool. |
| 8 | + |
| 9 | +## Run HBM Microbenchmarks |
| 10 | + |
| 11 | +To run the HBM microbenchmarks, apply the following Kubernetes configuration: |
| 12 | +```bash |
| 13 | +kubectl apply -f tpu7x-2x2x1-hbm-microbenchmark.yaml |
| 14 | +``` |
| 15 | + |
| 16 | +To view the HBM microbenchmark logs, use `kubectl logs`: |
| 17 | +```bash |
| 18 | +kubectl logs tpu7x-2x2x1-hbm-microbenchmark |
| 19 | +``` |
| 20 | + |
| 21 | +Once the benchmark completes, you should see logs similar to the example below: |
| 22 | + |
| 23 | +```text |
| 24 | +Tensor size: 8192.0 MB, time taken (median): 5.3523 ms, bandwidth (median): 3209.812 GB/s |
| 25 | + |
| 26 | +Writing metrics to JSONL file: ../microbenchmarks/hbm/metrics_report.jsonl |
| 27 | +Metrics written to CSV at ../microbenchmarks/hbm/t_single_device_hbm_copy_[A-Z0-9]+.tsv. |
| 28 | +``` |
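The bandwidth in the sample line appears consistent with counting both the read and the write of the copied tensor, that is, 2 × size / time, with 1 MB taken as 2^20 bytes and 1 GB as 10^9 bytes. This is inferred from the sample numbers, not from the benchmark source. The following is a minimal Python sketch that parses such a line and recomputes the figure; the regular expression and the helper names `parse_hbm_line` and `copy_bandwidth_gbps` are illustrative assumptions, not part of the benchmark.

```python
import re

# Pattern for the sample log line shown above; the exact wording of real
# logs may differ, so treat this regex as an assumption.
LINE_RE = re.compile(
    r"Tensor size: (?P<size_mb>[\d.]+) MB, "
    r"time taken \(median\): (?P<time_ms>[\d.]+) ms, "
    r"bandwidth \(median\): (?P<bw_gbps>[\d.]+) GB/s"
)


def parse_hbm_line(line):
    """Return (size_mb, time_ms, bandwidth_gbps), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return float(m["size_mb"]), float(m["time_ms"]), float(m["bw_gbps"])


def copy_bandwidth_gbps(size_mb, time_ms):
    """Assumed formula: bytes read plus bytes written, divided by time."""
    size_bytes = size_mb * 2**20
    return 2 * size_bytes / (time_ms * 1e-3) / 1e9


sample = ("Tensor size: 8192.0 MB, time taken (median): 5.3523 ms, "
          "bandwidth (median): 3209.812 GB/s")
size_mb, time_ms, reported = parse_hbm_line(sample)
recomputed = copy_bandwidth_gbps(size_mb, time_ms)
print(f"reported={reported:.1f} GB/s, recomputed={recomputed:.1f} GB/s")
```

If the recomputed value drifts far from the reported one on your own logs, the assumed formula or unit convention probably does not match what the benchmark uses.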
| 29 | + |
| 30 | +To retrieve the complete results, including the trace and TSV output files, you must keep the pod running after the benchmark completes. To do this, add a `sleep` command to the `tpu7x-2x2x1-hbm-microbenchmark.yaml` file. You can then use `kubectl cp` to copy the output from the pod. |
| 31 | + |
| 32 | +```bash |
| 33 | +kubectl cp tpu7x-2x2x1-hbm-microbenchmark:/microbenchmarks/hbm hbm |
| 34 | +``` |
| 35 | + |
| 36 | +## Expected Bandwidth for Different Matrix Sizes |
| 37 | + |
| 38 | + |
| 39 | +| Matrix Size (Bytes) | Bandwidth (GB/s/core) | Bandwidth (GB/s/chip) | |
| 40 | +|---------------------|-----------------------|-----------------------| |
| 41 | +| 2097152 | 1379.335021 | 2758.670041 | |
| 42 | +| 4194304 | 2249.746091 | 4499.492181 | |
| 43 | +| 8388608 | 2246.129937 | 4492.259875 | |
| 44 | +| 16777216 | 2757.308985 | 5514.61797 | |
| 45 | +| 33554432 | 3009.83593 | 6019.67186 | |
| 46 | +| 67108864 | 3097.217778 | 6194.435556 | |
| 47 | +| 134217728 | 3176.50274 | 6353.005481 | |
| 48 | +| 268435456 | 3167.144485 | 6334.288969 | |
| 49 | +| 536870912 | 3199.020504 | 6398.041009 | |
| 50 | +| 1073741824 | 3198.414211 | 6396.828421 | |
| 51 | +| 2147483648 | 3203.486119 | 6406.972238 | |
| 52 | +| 4294967296 | 3197.879607 | 6395.759214 | |
| 53 | +| 8589934592 | 3210.480912 | 6420.961823 | |
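
In every row of the table, the per-chip figure is twice the per-core figure, which is consistent with two cores per chip. The short Python sketch below checks that relationship on a few rows copied from the table and converts the sizes to MiB. The variable names are illustrative.

```python
# (size_bytes, GB/s per core, GB/s per chip), copied from three table rows.
ROWS = [
    (2097152, 1379.335021, 2758.670041),
    (134217728, 3176.50274, 6353.005481),
    (8589934592, 3210.480912, 6420.961823),
]

for size_bytes, per_core, per_chip in ROWS:
    size_mib = size_bytes / 2**20       # bytes -> MiB
    ratio = per_chip / per_core         # expected to be close to 2
    print(f"{size_mib:8.0f} MiB  core={per_core:9.3f}  "
          f"chip={per_chip:9.3f}  ratio={ratio:.3f}")
```

The largest size, 8589934592 bytes, is 8192 MiB. This matches the tensor size in the sample log line, and its per-core bandwidth is close to the value reported there.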