Scale-up bandwidth is always recorded as unidirectional (not bidirectional). Vendor spec sheets often quote the bidirectional figure, so we halve it for consistency: unidirectional bandwidth is what matters for actual data transfer between a GPU pair.
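A minimal sketch of the halving rule, in case it helps when ingesting a new spec sheet. The function name, the `quoted_as` parameter, and the 900 GB/s input are illustrative assumptions, not part of the existing codebase.

```python
def to_unidirectional_gb_per_s(value_gb_per_s: float, quoted_as: str) -> float:
    """Normalize a scale-up bandwidth figure to unidirectional GB/s.

    `quoted_as` is a hypothetical flag recording how the vendor quoted the number.
    """
    if quoted_as == "bidirectional":
        # Vendor sheets usually quote both directions summed, so halve it.
        return value_gb_per_s / 2
    if quoted_as == "unidirectional":
        return value_gb_per_s
    raise ValueError(f"unknown quoting convention: {quoted_as!r}")


# Illustrative input: a 900 GB/s bidirectional figure is stored as 450 GB/s.
stored = to_unidirectional_gb_per_s(900.0, "bidirectional")
```

Recording how each source figure was quoted makes it harder to halve a value twice.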
TFLOPS are dense tensor core peak without sparsity. Sparsity doubles theoretical FLOPS but real workloads rarely achieve structured sparsity. Dense values reflect realistic performance.
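The same normalization idea applies to TFLOPS. The sketch below assumes a hypothetical `with_sparsity` flag on the source figure and uses a placeholder input value rather than a real spec.

```python
SPARSITY_FACTOR = 2.0  # structured sparsity doubles the theoretical peak


def to_dense_tflops(value_tflops: float, with_sparsity: bool) -> float:
    """Return the dense tensor core peak for a quoted TFLOPS figure."""
    if with_sparsity:
        # Undo the 2x sparsity multiplier to recover the dense peak.
        return value_tflops / SPARSITY_FACTOR
    return value_tflops


# Placeholder input: a 2000 TFLOPS "with sparsity" figure is stored as 1000.
dense = to_dense_tflops(2000.0, with_sparsity=True)
```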
NIC naming is {Model} {PortSpec}, not {PortSpec} {Model}, because the model name (ConnectX-7, Pollara) identifies the generation, which is more important for comparison. A GbE suffix means Ethernet; a plain G means InfiniBand (industry convention).
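The naming convention could be captured in a small formatter along these lines. This is a sketch: the function name, the `link` values, and the `2x` prefix for multi-port cards are assumptions about how a port spec might be written, not the project's actual format.

```python
def format_nic_name(model: str, speed_gbit: int, link: str, ports: int = 1) -> str:
    """Build a '{Model} {PortSpec}' NIC label."""
    if link == "ethernet":
        suffix = "GbE"  # GbE suffix marks Ethernet
    elif link == "infiniband":
        suffix = "G"    # plain G marks InfiniBand
    else:
        raise ValueError(f"unknown link type: {link!r}")
    port_spec = f"{speed_gbit}{suffix}"
    if ports > 1:
        # Assumed multi-port notation, e.g. "2x200G".
        port_spec = f"{ports}x{port_spec}"
    # Model comes first because it identifies the generation.
    return f"{model} {port_spec}"


ib_label = format_nic_name("ConnectX-7", 400, "infiniband")
eth_label = format_nic_name("Pollara", 400, "ethernet")
```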
Why NVL72 Systems Have No Scale-Out Topology
GB200 and GB300 NVL72 are rack-scale systems where all 72 GPUs are connected via a single NVLink domain. There is no "scale out": the entire rack IS the compute unit. Scale-out would mean connecting multiple NVL72 racks, which isn't a standardized topology yet. The scale-out topology fields are therefore null and rendered as N/A¹ with a footnote.
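The null handling for these fields might look like the following sketch. The function name is hypothetical, and the actual rendering layer may differ.

```python
from typing import Optional

FOOTNOTE_MARK = "\u00b9"  # superscript one, pointing at the NVL72 footnote


def render_scale_out_field(value: Optional[str]) -> str:
    """Render a scale-out topology field, marking null values as N/A with a footnote."""
    if value is None:
        return f"N/A{FOOTNOTE_MARK}"
    return value


cell = render_scale_out_field(None)
```

Keeping the stored value as null, and only substituting the N/A label at render time, leaves the data distinguishable from a literal "N/A" string.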
Scale-Out Topology Invariants
Leaf switches live inside the rail pod, not at spine level. Drawing them at spine level is a common mistake in network diagrams. The path is servers → leaf (intra-pod) → spine (inter-pod).
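Spine count = half of leaf count when using the same switch model. Each leaf splits its ports evenly between server-facing downlinks and spine-facing uplinks, while every spine port faces a leaf, so half as many spines carry all the leaf uplinks. The 2:1 figure is the leaf-to-spine switch ratio, not an oversubscription ratio; the resulting fabric is the standard non-blocking (1:1) design.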
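The spine-count rule can be derived from port counts. The sketch below assumes a two-tier fabric in which every switch has the same radix and each leaf dedicates half its ports to uplinks; the function name and the example radix are illustrative.

```python
def spine_count(leaf_count: int, radix: int) -> int:
    """Spines needed when leaves and spines share one switch model.

    Assumes each leaf uses half its ports as uplinks and each spine
    uses all of its ports to terminate leaf uplinks.
    """
    if leaf_count <= 0 or radix <= 0 or radix % 2:
        raise ValueError("need a positive leaf count and a positive even radix")
    uplinks_per_leaf = radix // 2
    total_uplinks = leaf_count * uplinks_per_leaf
    if total_uplinks % radix:
        raise ValueError("leaf uplinks do not fill a whole number of spines")
    # leaf_count * (radix / 2) / radix simplifies to leaf_count / 2.
    return total_uplinks // radix


spines = spine_count(leaf_count=8, radix=64)
```

The radix cancels out of the result, which is why the rule only holds when leaf and spine use the same switch model.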
B200 is special: it uses Google's gIB SKU, with separate leaf (TH3) and spine (TH4) switch models and 4 pods of 4 servers instead of the usual 1-pod layout.
Scale Up Switch: Hopper = 7.2Tbit/s Gen 3.0, Blackwell = 28.8Tbit/s Gen 4.0, AMD = null (display "—")
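One way to encode the scale-up switch values is a lookup keyed by GPU family, as sketched here. The dictionary layout and function name are assumptions; only the values come from the line above.

```python
from typing import Optional

# Values as listed above; None means no scale-up switch entry.
SCALE_UP_SWITCH: dict[str, Optional[dict]] = {
    "Hopper": {"tbit_per_s": 7.2, "gen": "3.0"},
    "Blackwell": {"tbit_per_s": 28.8, "gen": "4.0"},
    "AMD": None,
}

NULL_DISPLAY = "\u2014"  # the dash shown for null entries


def render_scale_up_switch(family: str) -> str:
    """Format the scale-up switch cell for a GPU family."""
    spec = SCALE_UP_SWITCH[family]
    if spec is None:
        return NULL_DISPLAY
    return f"{spec['tbit_per_s']}Tbit/s Gen {spec['gen']}"


hopper_cell = render_scale_up_switch("Hopper")
```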
Hardware Data Gotchas
SXM vs NVL72 variants of the same chip have different specs. B200 SXM = 180 GB memory, GB200 NVL72 = 192 GB. NVL72 runs at higher TDP, so TFLOPS differ too.
Blackwell Ultra (B300/GB300) only improves FP4 (1.5x over B200/GB200). FP8 and BF16 are unchanged for SXM. Don't assume all precisions scale equally.
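A per-precision scaling table makes the "only FP4 improves" rule explicit for the SXM case. This is a sketch: the names are hypothetical and the base values are placeholders, not real TFLOPS figures.

```python
# Per-precision multipliers for Blackwell Ultra SXM relative to Blackwell SXM.
ULTRA_SCALING = {"fp4": 1.5, "fp8": 1.0, "bf16": 1.0}


def blackwell_ultra_tflops(base_tflops: dict[str, float]) -> dict[str, float]:
    """Scale each precision independently instead of applying one global factor."""
    return {
        precision: value * ULTRA_SCALING[precision]
        for precision, value in base_tflops.items()
    }


# Placeholder base values, chosen only to show which entries change.
ultra = blackwell_ultra_tflops({"fp4": 8.0, "fp8": 4.0, "bf16": 2.0})
```

Looking the multiplier up per precision raises a `KeyError` for any precision without an explicit entry, which is safer than silently scaling it.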
MI355X is the first AMD GPU with FP4 support. MI300X and MI325X don't have it.
Memory type (HBM3/HBM3e) is stored but intentionally not displayed. It's not a meaningful inference performance differentiator — bandwidth matters more, and that's shown separately.