InfiniMetrics Dashboard provides a unified interface to visualize benchmark and evaluation results of AI accelerators across the following scenarios:
- Communication (NCCL / Collective Communication)
- Training (Training / Distributed Training)
- Inference (Direct / Service Inference)
- Operator (Core Operator Performance)
The benchmark framework produces two types of outputs:
- JSON -> configuration / environment / scalar metrics
- CSV -> curves / time-series data
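As a minimal sketch of consuming the two output types (the file contents, keys, and column names below are illustrative, not the framework's actual schema):

```python
import io
import json

import pandas as pd

# Hypothetical run outputs: a JSON blob with scalar metrics and a CSV with a curve.
run_json = '{"framework": "pytorch", "device_count": 8, "peak_bandwidth_gbps": 187.5}'
run_csv = "step,latency_ms\n1,12.0\n2,11.4\n3,11.1\n"

config = json.loads(run_json)              # configuration / environment / scalar metrics
curve = pd.read_csv(io.StringIO(run_csv))  # curves / time-series data

print(config["peak_bandwidth_gbps"])   # scalar metric read from the JSON
print(curve["latency_ms"].mean())      # aggregate computed over the CSV time series
```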
The Dashboard automatically loads test results and provides unified analysis capabilities, including:
- Run ID fuzzy search: locate specific test runs using partial Run IDs
- General filters: filter results by framework, model, device count, etc.
- Multi-run comparison: select multiple runs to compare performance
- Performance visualization: display curves such as latency / throughput / loss
- Statistics and configuration view: inspect throughput statistics, runtime configuration, and environment details
For example, entering "allreduce" or "service" performs fuzzy matching on Run IDs.
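The fuzzy match described above amounts to case-insensitive substring filtering; a minimal sketch (the Run IDs below are hypothetical):

```python
# Hypothetical Run IDs as they might appear in a results directory.
run_ids = [
    "allreduce_nccl_8gpu_20240101",
    "allgather_nccl_8gpu_20240101",
    "llama2_service_inference_20240102",
    "llama2_direct_inference_20240102",
]

def fuzzy_match(query: str, ids: list) -> list:
    """Return every Run ID containing the query as a case-insensitive substring."""
    q = query.lower()
    return [rid for rid in ids if q in rid.lower()]

print(fuzzy_match("allreduce", run_ids))  # matches the one allreduce run
print(fuzzy_match("service", run_ids))    # matches the one service-inference run
```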
Before using the Dashboard, install the following dependencies (e.g. pip install streamlit plotly pandas):
- streamlit
- plotly
- pandas
Run the following command in the project root directory:
python -m streamlit run dashboard/app.py
Access URL after startup:
Local URL: http://localhost:8501
Network URL: http://<server-ip>:8501
Explanation:
- Local URL: accessible only on the local machine
- Network URL: accessible from other machines on the same network
Path:
Dashboard → Communication Performance Test
Supported features:
- Bandwidth analysis curve (peak bandwidth)
- Latency analysis curve (average latency)
- Test duration
- GPU memory usage
- Communication configuration analysis
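As an illustration of how the two headline statistics relate to a communication curve (the field names and values below are hypothetical, not the framework's actual CSV schema):

```python
# Hypothetical per-message-size samples from one communication run.
samples = [
    {"message_size_mb": 1,   "bandwidth_gbps": 42.0,  "latency_us": 95.0},
    {"message_size_mb": 64,  "bandwidth_gbps": 155.0, "latency_us": 410.0},
    {"message_size_mb": 256, "bandwidth_gbps": 171.0, "latency_us": 1490.0},
]

# Peak bandwidth: the maximum over the bandwidth curve.
peak_bandwidth = max(s["bandwidth_gbps"] for s in samples)
# Average latency: the mean over the latency curve.
avg_latency = sum(s["latency_us"] for s in samples) / len(samples)

print(peak_bandwidth)  # 171.0
print(avg_latency)     # 665.0
```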
Path:
Dashboard → Inference Performance Test
Modes:
- Direct Inference
- Service Inference
Displayed metrics:
- TTFT (Time To First Token)
- Latency
- Throughput
- GPU memory usage
- Inference configuration analysis
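As a sketch of how these scalar metrics can be derived from per-request timings (all field names and values below are hypothetical):

```python
# Hypothetical per-request measurements from one inference run.
# TTFT is the delay before the first output token arrives.
requests = [
    {"ttft_s": 0.20, "total_s": 2.20, "output_tokens": 100},
    {"ttft_s": 0.30, "total_s": 4.30, "output_tokens": 200},
]

avg_ttft = sum(r["ttft_s"] for r in requests) / len(requests)
# Throughput here: output tokens produced per second of end-to-end latency.
throughput = sum(r["output_tokens"] for r in requests) / sum(r["total_s"] for r in requests)

print(round(avg_ttft, 2))    # 0.25
print(round(throughput, 2))
```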
Path:
Dashboard → Training Performance Test
Supported features:
- Loss curve
- Perplexity curve
- Throughput curve
- GPU memory usage
- Training configuration analysis
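Perplexity is typically derived from the cross-entropy loss as ppl = exp(loss); a minimal sketch with illustrative loss values:

```python
import math

# Illustrative training loss curve; perplexity is the exponential of the loss.
loss_curve = [2.30, 1.90, 1.60]
perplexity_curve = [math.exp(loss) for loss in loss_curve]

for loss, ppl in zip(loss_curve, perplexity_curve):
    print(f"loss={loss:.2f}  perplexity={ppl:.2f}")
```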
Path:
Dashboard → Operator Performance Test
Supported metrics:
- latency
- flops
- bandwidth
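As a sketch of how the three metrics relate for a single operator, using a GEMM as the example (sizes, precision, and latency below are illustrative):

```python
# Hypothetical fp16 GEMM: C[M,N] = A[M,K] @ B[K,N], with a measured latency.
M = N = K = 4096
latency_s = 0.004        # measured kernel latency (illustrative)
bytes_per_elem = 2       # fp16

flop_count = 2 * M * N * K                               # one multiply + one add per inner step
bytes_moved = bytes_per_elem * (M * K + K * N + M * N)   # read A, read B, write C

tflops = flop_count / latency_s / 1e12           # achieved compute rate
bandwidth_gbps = bytes_moved / latency_s / 1e9   # achieved memory bandwidth

print(round(tflops, 2))          # 34.36
print(round(bandwidth_gbps, 2))  # 25.17
```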