
Commit 68bc0bf

Merge branch 'rachel-stage' into stage
2 parents 8460c32 + 45895bb commit 68bc0bf

25 files changed

Lines changed: 245 additions & 6311 deletions


content/cumulus-netq-51/Installation-Management/Backup-and-Restore-NetQ.md

Lines changed: 5 additions & 1 deletion
@@ -160,4 +160,8 @@ nvidia@<hostname>:~$ netq nvl cluster restore /tmp/data-infra/nvlink_cluster_bac
 If this step fails for any reason, run `netq nvl bootstrap reset` and then try again.
 
 {{</tab >}}
-{{</tabs>}}
+{{</tabs>}}
+
+## Related Information
+
+- {{<link title="Troubleshoot NetQ/#troubleshoot-netq-installation-and-upgrade-issues" text="Troubleshoot NetQ Installation and Upgrade">}}
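The context line in the restore hunk advises running `netq nvl bootstrap reset` and trying again if the restore step fails. A minimal POSIX shell sketch of that retry-once pattern follows, assuming the restore command signals failure with a non-zero exit status. `run_with_reset` is a hypothetical helper, not a NetQ command, and because the restore path in the hunk header is truncated, the usage line below uses a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: run a step; if it fails, run a reset command and retry
# the step exactly once. Each command is passed as a string to `sh -c`.
# Returns 0 on success, the reset's exit status if the reset itself fails,
# or the exit status of the second attempt otherwise.
run_with_reset() {
  step_cmd=$1
  reset_cmd=$2

  # First attempt.
  if sh -c "$step_cmd"; then
    return 0
  fi

  echo "Step failed; running reset: $reset_cmd" >&2
  sh -c "$reset_cmd" || return $?

  # Second and final attempt; its exit status becomes ours.
  sh -c "$step_cmd"
}
```

For example: `run_with_reset 'netq nvl cluster restore <backup-file>' 'netq nvl bootstrap reset'`, with `<backup-file>` replaced by the path of your backup file. Retrying only once keeps a persistent failure visible instead of looping.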

content/cumulus-netq-51/Installation-Management/Configure-Integrations/Integrate-with-Grafana.md

Lines changed: 1 addition & 1 deletion
@@ -240,7 +240,7 @@ nvidia@netq-server:~$ netq add otlp endpoint tsdb-name <text-tsdb-endpoint> tsdb
 
 NetQ restricts the metrics accepted into the local TSDB by default. To view the default whitelist of permitted metrics, run the `netq show otlp whitelist default` command:
 
-{{<expand "Default OTLP Whitelist">}}
+{{<expand "Default OTLP whitelist">}}
 ```
 nvidia@netq-server:~$ netq show otlp whitelist default
 - nvswitch_interface_tc_tx_octet
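The whitelist hunk shows `netq show otlp whitelist default` listing permitted metrics as `- <metric>` lines, though only the first entry is visible here. Assuming that format holds for the full output, the POSIX shell sketch below checks whether a metric is in the default whitelist. `metric_in_default_whitelist` is a hypothetical helper name, not part of the NetQ CLI.

```shell
#!/bin/sh
# Hypothetical helper: read `netq show otlp whitelist default` output on stdin
# and exit 0 if the named metric appears as a "- <metric>" entry.
# Assumes one entry per line, as in the excerpt above; all other lines,
# such as the echoed command prompt, are ignored.
metric_in_default_whitelist() {
  awk -v m="$1" '
    # Exact match on both fields, so a prefix of a metric name never matches.
    $1 == "-" && $2 == m && NF == 2 { found = 1 }
    END { exit !found }'
}
```

For example: `netq show otlp whitelist default | metric_in_default_whitelist nvswitch_interface_tc_tx_octet && echo permitted`. Comparing whole fields with awk, rather than a substring `grep`, avoids false positives for metrics whose names share a prefix.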

content/cumulus-netq-51/Installation-Management/Install-NetQ/Before-You-Install.md

Lines changed: 30 additions & 12 deletions
@@ -5,7 +5,7 @@ weight: 190
 toc: 3
 ---
 
-This overview is designed to help you understand the various NetQ deployment and installation options.
+This overview is designed to help you understand the various NetQ deployment and installation options.
 
 ## Installation Overview
 
@@ -14,37 +14,55 @@ Consider the following deployment options and requirements before you install th
 | Single Server | Cluster| Scale Cluster |
 | --- | --- | --- |
 | On-premises only | On-premises only | On-premises only |
-| Network size: small<ul></ul><ul><li>1-node: Supports up to 40 switches* </li></ul>| Network size: medium<ul><li>3-node: Supports up to 100 switches*</li></ul>| Network size: large<ul><li>3-node: Supports up to 1,000 switches and 125,000 interfaces* </li><li>5-node: Supports up to 2,000 switches and 250,000 interfaces* </li></ul>|
+| Network size: small<ul></ul><ul><li>1-node: Supports up to 40 switches* </li></ul>| Network size: medium<ul><li>3-node: Supports up to 100 switches*</li></ul>| Network size: large<ul></ul><ul><li>Support varies based on number of nodes. See {{<link title="Before You Install/#server-arrangement" text="Server Arrangement">}}.</li></ul> |
 | KVM or VMware hypervisor | KVM or VMware hypervisor | KVM or VMware hypervisor |
-| No high-availability option | High availability | High availability |
+| No high-availability option | High-availability | High-availability |
 | System requirements:<ul></ul><ul><li>16 virtual CPUs</li><li>64GB RAM</li><li>500GB SSD disk</li></ul>| System requirements (per node):<ul></ul><ul><li>16 virtual CPUs</li><li>64GB RAM</li><li>500GB SSD disk</li></ul>| System requirements (per node): <ul></ul><ul><li>48 virtual CPUs</li><li>512GB RAM </li><li>3.2TB SSD disk</li></ul>|
 | Not supported:<ul><li>NVLink monitoring</li></ul> | Not supported:<ul><li>NVLink monitoring</li>| Not supported:<ul><li>Network snapshots</li><li>Trace requests</li><li>Flow analysis</li><li>MAC commentary</li><li>Duplicate IP address validations</li></ul> Limited support: <ul><li>Link health view (beta)</li></ul>|
 
-*When switches are {{<link title="Integrate NetQ with Grafana/#requirements-and-support" text="configured with both OpenTelemetry (OTLP)">}} and the NetQ agent, switch support per deployment model is reduced by half.
+*When switches are {{<link title="Integrate NetQ with Grafana/#requirements-and-support" text="configured with both OpenTelemetry (OTLP)">}} and the NetQ agent, switch support per deployment model is reduced by half.<br>
 
 
-## Server Arrangement: Single or Cluster
+## Server Arrangement
 
-In all deployment models, NetQ agents reside on the switches and hosts they monitor in your network.
+**Single server**: A standalone server is easier to set up, configure, and manage, but limits your ability to scale your network monitoring and provides no redundancy in case of a hardware failure.
 
-### Single Server
+**Cluster**: The cluster deployment comprises three servers: one master and two worker nodes. NetQ supports high availability using a virtual IP address. Even if the master node fails, NetQ services remain operational.
 
-A standalone server is easier to set up, configure, and manage, but limits your ability to scale your network monitoring. Deploying multiple servers allows you to limit potential downtime and increase availability by having more than one server that can run the software and store the data. Select the standalone, single-server arrangement for smaller, simpler deployments.
+**Scale cluster**: The scale cluster deployment is intended for large network environments and allows you to expand NetQ monitoring capacity by adding nodes as your network grows. NVIDIA typically recommends this deployment for environments with 100 or more switches. It is the only deployment model that supports monitoring for {{<exlink url="https://www.nvidia.com/en-us/data-center/nvlink/" text="NVIDIA NVLink">}}, {{<exlink url="https://www.nvidia.com/en-us/networking/spectrumx/" text="NVIDIA Spectrum-X Ethernet">}}, and mixed Ethernet and NVLink networks.
 
-### Cluster of Servers
+The following table shows high-level device support by node count for Ethernet-only, NVLink-only, and combined deployments. This deployment model is currently in beta for clusters larger than five nodes. See {{<link title="Before You Install/#verified-limits" text="Verified Limits">}} for detailed testing information.
 
-NVIDIA offers two types of cluster deployments: cluster and scale cluster. Both deployments are available on-premises and offer high-availability to provide redundancy in case of node failure.
+| Deployment | 3 Nodes | 4 Nodes | 5 Nodes | 6 Nodes | 7 Nodes | 8 Nodes | 9 Nodes |
+|------------------------------|---------------------------------|----------------------------------|----------------------------------|----------------------------------|----------------------------------|----------------------------------|----------------------------------|
+| Exclusively Ethernet | 500 switches, 2K hosts | 750 switches, 3K hosts | 1000 switches, 4K hosts | 1250 switches, 5K hosts | 1500 switches, 6K hosts | 1750 switches, 7K hosts | 2000 switches, 8K hosts |
+| Exclusively NVLink | 128 NVL | 160 NVL | 192 NVL | 224 NVL | 256 NVL | 288 NVL | 320 NVL |
+| Ethernet and NVLink combined | 250 switches, 1K hosts, 64 NVL | 375 switches, 1.5K hosts, 96 NVL | 500 switches, 2K hosts, 128 NVL | 625 switches, 2.5K hosts, 160 NVL| 750 switches, 3K hosts, 192 NVL | 875 switches, 3.5K hosts, 224 NVL| 1K switches, 4K hosts, 256 NVL |
 
-The **cluster** implementation comprises three servers: one master and two workers nodes. NetQ supports high availability using a virtual IP address. Even if the master node fails, NetQ services remain operational. This deployment supports networks with up to 100 switches.
 
-The **scale cluster** deployment supports large networks and allows you to adjust NetQ's network monitoring capacity by adding additional nodes to your cluster as your network expands. For example, you can deploy a three-node scale cluster that accommodates up to 1,000 switches. When you add switches to your network, the extensible framework allows you to add additional nodes to support a greater number of switches. NVIDIA recommends this option for networks comprising 100 or more switches with 100 or more interfaces per switch.
+{{%notice note%}}
 
 In both cluster deployments, the majority of nodes must be operational for NetQ to function. For example, a three-node cluster can tolerate a one-node failure, but not a two-node failure. Similarly, a five-node cluster can tolerate a two-node failure, but not a three-node failure. If the majority of failed nodes are Kubernetes control plane nodes, NetQ will no longer function. For more information, refer to the {{<exlink url="https://etcd.io/docs/v3.3/faq/" text="etcd documentation">}}.
+{{%/notice%}}
+## Verified Limits
 
+The following values have been explicitly tested and validated, but they might not reflect the maximum theoretical system limits for NetQ.
+
+| Deployment Type | Verified Features | Verified Scale Limit | Data Rate | Hardware Requirements |
+|-----------------|-------------------|-------|-----------|-----------------------|
+| 6-node scale cluster: Ethernet + NVLink | - Ethernet agent features: WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations<br>- Switch OTLP data collection<br>- DPU OTLP data collection<br>- NVLink data collection: topology, partitions, metrics | - Ethernet switches: 675 (GPUs: 32K)<br>- DPUs: 8K (OTLP data)<br>- NVLink: 450 GB with 72x1 configuration | - NetQ Agent: ~7 Mbps<br>- OTLP switch: 445 MB/s (3.56 Gbps)<br>- OTLP host: 1,000,000 samples/s at 10-second interval<br>- NVLink: ~32,000 messages/s (2,628 ports)<br>- Counters: 112 per GB/s | 6 nodes, each with:<br> - 48 vCPUs<br> - 512 GB RAM<br> - 3 TB SSD/NVMe |
+| 6-node scale cluster: Ethernet + NVLink | - Ethernet agent features: WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations<br>- Switch OTLP data collection<br>- DPU OTLP data collection | - Ethernet switches: 1,300 (GPUs: 55K)<br>- DPUs: 14K (OTLP data) | - NetQ Agent: ~7 Mbps<br>- OTLP switch: 445 MB/s (3.56 Gbps)<br>- OTLP host: 1,718,750 samples/s at 10-second interval | 6 nodes, each with:<br> - 48 vCPUs<br> - 512 GB RAM<br> - 3 TB SSD/NVMe |
+| 5-node scale cluster: Ethernet + NVLink (Ethernet agent only) | - Ethernet agent features: WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations | - Ethernet switches: 1,300 (GPUs: 55K) | - NetQ Agent: ~14 Mbps | 5 nodes, each with:<br> - 48 vCPUs<br> - 512 GB RAM<br> - 3 TB SSD/NVMe |
+| 3-node scale cluster: Ethernet + NVLink | - Ethernet agent features: WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations<br>- Switch OTLP data collection<br>- DPU OTLP data collection<br>- NVLink data collection: topology, partitions, metrics | - Ethernet switches: 250 (GPUs: 8K)<br>- DPUs: 1K (OTLP data)<br>- NVLink: 100 GB with 72x1 configuration | - NetQ Agent: 2.5 Mbps<br>- OTLP switch: 165 MB/s (1.32 Gbps)<br>- OTLP host: 250,000 samples/s at 10-second interval<br>- NVLink: ~9,200 messages/s (2,628 ports)<br>- Counters: 112 per GB/s | 3 nodes, each with:<br> - 48 vCPUs<br> - 512 GB RAM<br> - 3 TB SSD/NVMe |
+| 3-node scale cluster: Ethernet-only | - Ethernet agent features: WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations<br>- Ethernet OTLP data collection | - Ethernet switches: 500 (GPUs: 16K)<br>- DPUs: 2K (OTLP data) | - NetQ Agent: 5 Mbps<br>- OTLP switch: 330 MB/s (2.64 Gbps)<br>- OTLP host: 500,000 samples/s at 10-second interval | 3 nodes, each with:<br> - 48 vCPUs<br> - 512 GB RAM<br> - 3 TB SSD/NVMe |
+| 3-node scale cluster: NVLink-only | - NVLink data collection: topology, partitions, metrics | - NVLink: 110 GB with 72x1 configuration<br>- Partitions: 1,600 | - NVLink: ~10,000 messages/s (2,628 ports)<br>- Counters: 112 per GB/s | 3 nodes, each with:<br> - 48 vCPUs<br> - 512 GB RAM<br> - 3 TB SSD/NVMe |
+| 5-node scale cluster: Ethernet-only | - Ethernet agent features: WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations<br>- Ethernet OTLP data collection | - Ethernet switches: 1,000 (GPUs: 32K)<br>- DPUs: 4K (OTLP data) | - NetQ Agent: 10 Mbps<br>- OTLP switch: 660 MB/s (5.28 Gbps)<br>- OTLP host: 1,000,000 samples/s at 10-second interval | 5 nodes, each with:<br> - 48 vCPUs<br> - 512 GB RAM<br> - 3 TB SSD/NVMe |
+| 3-node cluster (non-scale): Ethernet-only | - Ethernet agent features: WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations<br>- Ethernet OTLP data collection | - Ethernet switches: 50 (GPUs: 1.6K) | - NetQ Agent: 500 Kbps<br>- OTLP switch: 33 MB/s (264 Mbps)<br>- OTLP host: 50,000 samples/s at 10-second interval | 3 nodes, each with:<br> - 16 vCPUs<br> - 64 GB RAM<br> - 500 GB SSD/NVMe |
 
 {{%notice note%}}
 Large networks have the potential to generate a large amount of data. For large networks, NVIDIA does not recommend using the NetQ CLI; additionally, {{<link title="Access Data with Cards/#table-settings" text="tabular data in the UI">}} is limited to 10,000 rows. If you need to review a large amount of data, NVIDIA recommends downloading and exporting the tabular data as a CSV or JSON file and analyzing it in a spreadsheet program.
 {{%/notice%}}
+
 ## Base Command Manager
 
 NetQ is also available through NVIDIA's cluster management software, Base Command Manager. Refer to the {{<exlink url="https://docs.nvidia.com/base-command-manager/#product-manuals" text="Base Command Manager administrator and containerization manuals">}} for instructions on how to launch and configure NetQ using Base Command Manager.
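Three rules in the new Before You Install text reduce to simple arithmetic: the majority-quorum note (a three-node cluster tolerates one failed node, a five-node cluster two), the per-node-count scale-cluster table, whose values grow linearly from three to nine nodes, and the footnote that halves switch support when switches run both OTLP and the NetQ agent. The POSIX shell sketch below encodes all three. The linear formulas are inferred from the table values rather than stated by NVIDIA and are only checked against the published three-to-nine-node range; the quorum helper ignores the separate caveat about Kubernetes control plane nodes. All function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers derived from the Before You Install tables.

# Majority quorum: with n nodes, NetQ keeps working while a majority is
# operational, so at most floor((n - 1) / 2) nodes may fail.
max_node_failures() {
  echo $(( ($1 - 1) / 2 ))
}

# Scale-cluster capacity for 3-9 nodes, fitted to the node-support table:
#   ethernet: 250*(n-1) switches, 1000*(n-1) hosts
#   nvlink:   32*(n+1) NVL
#   combined: 125*(n-1) switches, 500*(n-1) hosts, 32*(n-1) NVL
scale_cluster_capacity() {
  mode=$1
  n=$2
  if [ "$n" -lt 3 ] || [ "$n" -gt 9 ]; then
    echo "node count $n is outside the published 3-9 range" >&2
    return 1
  fi
  case $mode in
    ethernet) echo "$(( 250 * (n - 1) )) switches, $(( 1000 * (n - 1) )) hosts" ;;
    nvlink)   echo "$(( 32 * (n + 1) )) NVL" ;;
    combined) echo "$(( 125 * (n - 1) )) switches, $(( 500 * (n - 1) )) hosts, $(( 32 * (n - 1) )) NVL" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# Footnote rule: with both OTLP and the NetQ agent on the switches,
# the supported switch count is halved (for example, 40 -> 20).
halve_for_otlp() {
  echo $(( $1 / 2 ))
}
```

For example, `scale_cluster_capacity combined 6` prints `625 switches, 2500 hosts, 160 NVL`, matching the six-node column, and `halve_for_otlp 100` prints `50`, which agrees with the 50-switch verified limit for the non-scale three-node cluster with OTLP enabled.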

content/cumulus-netq-51/Installation-Management/Install-NetQ/Install-NetQ-CLI.md

Lines changed: 2 additions & 2 deletions
@@ -82,7 +82,7 @@ If you are running NTP in your out-of-band management network with VRF, specify
 
 ### Get the NetQ CLI Software Package for Ubuntu
 
-To install the NetQ CLI on an Ubuntu server, you need to install `netq-apps` on each Ubuntu server. This is available from the {{<exlink url="https://download.nvidia.com/cumulus/apps3.cumulusnetworks.com/repos/deb/pool/netq-latest/" text="NetQ repository">}}.
+To install the NetQ CLI on an Ubuntu server, you need to install `netq-apps` on each Ubuntu server. This is available from the {{<exlink url="https://edge.urm.nvidia.com/artifactory/sw-nbu-netq-debian-local/pool/netq-latest/" text="NetQ repository">}}.
 
 {{<tabs "Get NetQ CLI Ubuntu">}}
 
@@ -174,7 +174,7 @@ You can specify a NetQ CLI version in the repository configuration. The followin
 ```
 nvidia@switch:~$ dpkg-query -W -f '${Package}\t${Version}\n' netq-apps
 ```
-You should see version 5.1.0 in the results: netq-apps_<strong>5.1.0</strong>-cld12u5_amd64.deb
+You should see version 5.1.0 in the results: netq-apps_<strong>5.1.0</strong>-cld12u7_amd64.deb
 
 4. Continue with NetQ CLI configuration in the next section.
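The verification step in the CLI hunk asks you to confirm version 5.1.0 by eye. A POSIX shell sketch can automate the check, assuming the `dpkg-query -W -f '${Package}\t${Version}\n' netq-apps` command prints the package name, a tab, and a Debian version such as `5.1.0-cld12u7` (inferred from the package file name shown). The helper strips an optional Debian epoch and the revision after the last hyphen before comparing. `netq_apps_is` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read "<package>\t<version>" lines (the dpkg-query
# format shown above) on stdin and exit 0 if netq-apps has the expected
# upstream version, for example 5.1.0 for a package version of 5.1.0-cld12u7.
netq_apps_is() {
  awk -v want="$1" '
    BEGIN { FS = "\t" }
    $1 == "netq-apps" {
      v = $2
      sub(/^[0-9]+:/, "", v)   # drop a Debian epoch such as "1:", if present
      sub(/-[^-]*$/, "", v)    # drop the Debian revision after the last "-"
      if (v == want) found = 1
    }
    END { exit !found }'
}
```

For example: `dpkg-query -W -f '${Package}\t${Version}\n' netq-apps | netq_apps_is 5.1.0 && echo "netq-apps 5.1.0 installed"`. Comparing only the upstream part means a rebuilt package (such as the `cld12u5` to `cld12u7` change in this commit) still passes the check.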

content/cumulus-netq-51/Installation-Management/Install-NetQ/Install-NetQ-System.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ The following deployment models use NetQ to monitor networks that use both Ether
 | Server Arrangement | Hypervisor | Requirements & Installation |
 | :--- | --- | :---: |
 | High-availability scale cluster: three nodes | KVM or VMware | {{<link title="Install NetQ for Ethernet and NVLink" text="Start install">}} |
-| High-availability scale cluster: up to six nodes* | KVM or VMware | {{<link title="Install NetQ for Ethernet and NVLink (Beta)" text="Start install">}} |
+| High-availability scale cluster: user-defined nodes* | KVM or VMware | {{<link title="Install NetQ for Ethernet and NVLink (Beta)" text="Start install">}} |
 {{</tab>}}
 
 {{</tabs>}}

content/cumulus-netq-51/Installation-Management/Install-NetQ/In‌stall-NetQ-Agents.md

Lines changed: 2 additions & 2 deletions
@@ -144,7 +144,7 @@ If you are running NTP in your out-of-band management network with VRF, specify
 
 ### Obtain NetQ Agent Software Package
 
-To install the NetQ Agent you need to install `netq-agent` on each server. This is available from the {{<exlink url="https://download.nvidia.com/cumulus/apps3.cumulusnetworks.com/repos/deb/pool/netq-latest/" text="NetQ repository">}}.
+To install the NetQ Agent you need to install `netq-agent` on each server. This is available from the {{<exlink url="https://edge.urm.nvidia.com/artifactory/sw-nbu-netq-debian-local/pool/netq-latest/" text="NetQ repository">}}.
 
 To obtain the NetQ Agent package:
 
@@ -232,7 +232,7 @@ If you are running NTP in your out-of-band management network with VRF, specify
 
 ### Obtain NetQ Agent Software Package
 
-To install the NetQ Agent you need to install `netq-agent` on each server. This is available from the {{<exlink url="https://download.nvidia.com/cumulus/apps3.cumulusnetworks.com/repos/deb/pool/netq-latest/" text="NetQ repository">}}.
+To install the NetQ Agent you need to install `netq-agent` on each server. This is available from the {{<exlink url="https://edge.urm.nvidia.com/artifactory/sw-nbu-netq-debian-local/pool/netq-latest/" text="NetQ repository">}}.
 
 To obtain the NetQ Agent package:
 

content/cumulus-netq-51/Installation-Management/Install-NetQ/Setup-NVLink-Ethernet-Combined-Cluster-Beta.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ bookhidden: true
 ---
 Follow these steps to set up and configure your VMs in a cluster of servers. First configure the VM on the master node, and then configure the VM on each additional node. NVIDIA recommends installing the virtual machines on different servers to increase redundancy in the event of a hardware failure.
 {{<notice info>}}
-This deployment option is in beta. Deployments with more than three nodes will require a fresh installation upon subsequent NetQ releases.
+This deployment option is in beta. Deployments with more than three nodes will require a fresh installation upon subsequent NetQ releases. {{<link title="Before You Install/#server-arrangement" text="View node support information">}}.
 {{</notice>}}
 ## System Requirements
 
