title: Task 03: Network & RDMA Validation
sidebar_label: Task 03: Network & RDMA Validation
sidebar_position: 3
description: Validate network connectivity, RDMA configuration, and DCB settings for Azure Local cluster

Task 03: Network & RDMA Validation

Runbook Azure

DOCUMENT CATEGORY: Runbook
SCOPE: Network and RDMA validation
PURPOSE: Validate network stack, RDMA, and DCB configuration
MASTER REFERENCE: Microsoft Learn - Validate-DCB

Status: Active


Overview

This task validates the complete network stack: RDMA configuration, DCB (Data Center Bridging) settings, VLAN connectivity, and core network services (DNS, NTP, and Azure endpoint connectivity). Storage performance and cluster communication both depend on this configuration, so treat any failure here as blocking.

Prerequisites

  • [ ] Infrastructure health validation completed (Step 1)
  • [ ] Administrative access to all cluster nodes
  • [ ] Physical network switches configured for RDMA
  • [ ] VLAN IDs documented and configured

Report Output

All validation results are saved to:

\\<ClusterName>\ClusterStorage$\Collect\validation-reports\03-network-rdma-validation-YYYYMMDD.txt

Part 1: Initialize Validation Environment

1.1 Create Report Directory

# Run from any cluster node
$ClusterName = (Get-Cluster).Name
$DateStamp = Get-Date -Format "yyyyMMdd"
$ReportPath = "C:\ClusterStorage\Collect\validation-reports"
$ReportFile = "$ReportPath\03-network-rdma-validation-$DateStamp.txt"

# Create directory if not exists
if (-not (Test-Path $ReportPath)) {
 New-Item -Path $ReportPath -ItemType Directory -Force
}

# Initialize report
$ReportHeader = @"
================================================================================
NETWORK & RDMA VALIDATION REPORT
================================================================================
Cluster: $ClusterName
Date: $(Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Generated By: $(whoami)
================================================================================

"@
$ReportHeader | Out-File -FilePath $ReportFile -Encoding UTF8

1.2 Install Required Modules

# Install Validate-DCB module if not present
if (-not (Get-Module -ListAvailable -Name Validate-DCB)) {
 Install-Module -Name Validate-DCB -Force -Scope AllUsers
}
Import-Module Validate-DCB

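# Test-NetStack is distributed through the PowerShell Gallery; install it if not present
if (-not (Get-Module -ListAvailable -Name Test-NetStack)) {
 Install-Module -Name Test-NetStack -Force -Scope AllUsers
}
Get-Command Test-NetStack -ErrorAction SilentlyContinue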

Part 2: RDMA Adapter Validation

2.1 Verify RDMA Adapters

# Check RDMA adapter status on all nodes
$Nodes = (Get-ClusterNode).Name

$RdmaResults = foreach ($Node in $Nodes) {
 Invoke-Command -ComputerName $Node -ScriptBlock {
 Get-NetAdapterRdma | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
 Name, InterfaceDescription, Enabled, 
 @{N='OperationalState';E={if($_.Enabled){"Operational"}else{"Disabled"}}}
 }
}

# Display and log results
$RdmaResults | Format-Table -AutoSize
$RdmaResults | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

# Verify all RDMA adapters are enabled
$DisabledRdma = $RdmaResults | Where-Object { -not $_.Enabled }
if ($DisabledRdma) {
 "WARNING: RDMA disabled on adapters:" | Add-Content $ReportFile
 $DisabledRdma | Format-Table | Out-String | Add-Content $ReportFile
}

2.2 Verify RDMA Mode (RoCE v2 vs iWARP)

# Check RDMA protocol type
$RdmaProtocol = foreach ($Node in $Nodes) {
 Invoke-Command -ComputerName $Node -ScriptBlock {
 Get-NetAdapterAdvancedProperty -Name "Storage*" -RegistryKeyword "*NetworkDirect*" -ErrorAction SilentlyContinue |
 Select-Object @{N='Node';E={$env:COMPUTERNAME}}, Name, RegistryKeyword, RegistryValue
 }
}

"RDMA Protocol Configuration:" | Add-Content $ReportFile
$RdmaProtocol | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

2.3 SMB Direct Status

# Verify SMB Direct is enabled
$SmbDirect = foreach ($Node in $Nodes) {
 Invoke-Command -ComputerName $Node -ScriptBlock {
 Get-SmbClientNetworkInterface | Where-Object RdmaCapable -eq $true |
 Select-Object @{N='Node';E={$env:COMPUTERNAME}}, InterfaceIndex, 
 FriendlyName, RdmaCapable
 }
}

"`nSMB Direct (RDMA) Capable Interfaces:" | Add-Content $ReportFile
$SmbDirect | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
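
The query above only lists interfaces that are RDMA-capable, so a node with none simply drops out of the output. The sketch below makes that gap explicit. `Get-NodeWithoutRdmaInterface` is a hypothetical helper, not part of the runbook, and it assumes the `Node` values (from `$env:COMPUTERNAME`) match the cluster node names.

```powershell
# Hypothetical helper: list nodes that reported no RDMA-capable SMB client interface.
function Get-NodeWithoutRdmaInterface {
    param(
        [Parameter(Mandatory)] [string[]] $Nodes,   # expected node names
        [object[]] $Interfaces = @()                # objects shaped like $SmbDirect
    )
    # Node names that appear at least once in the interface list
    $covered = @($Interfaces | ForEach-Object { $_.Node } | Sort-Object -Unique)
    # -notcontains compares case-insensitively
    $Nodes | Where-Object { $covered -notcontains $_ }
}

# Sample data standing in for $SmbDirect; NODE02 reports nothing
$sample = @(
    [PSCustomObject]@{ Node = 'NODE01'; InterfaceIndex = 12; FriendlyName = 'Storage1'; RdmaCapable = $true }
)
$missing = @(Get-NodeWithoutRdmaInterface -Nodes 'NODE01', 'NODE02' -Interfaces $sample)
"Nodes without RDMA-capable interfaces: $($missing -join ', ')"
```

In the runbook context the call would be `Get-NodeWithoutRdmaInterface -Nodes $Nodes -Interfaces $SmbDirect`.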

Part 3: DCB Validation

3.1 Run Validate-DCB

# Run comprehensive DCB validation
# This validates Priority Flow Control (PFC) and Enhanced Transmission Selection (ETS)

$DcbResults = Validate-DCB -Verbose

# Log results
"`n" + "="*80 | Add-Content $ReportFile
"DCB VALIDATION RESULTS" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$DcbResults | Out-String | Add-Content $ReportFile

3.2 Verify PFC Configuration

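# Check DCBX willing state and Priority Flow Control settings.
# The two result types are collected separately because Format-Table
# takes its columns from the first object and would hide the PFC fields.
$DcbxSettings = foreach ($Node in $Nodes) {
 Invoke-Command -ComputerName $Node -ScriptBlock {
 Get-NetQosDcbxSetting | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
 InterfaceAlias, Willing
 }
}
$PfcSettings = foreach ($Node in $Nodes) {
 Invoke-Command -ComputerName $Node -ScriptBlock {
 Get-NetQosFlowControl | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
 Priority, Enabled
 }
}

"`nDCBX Settings:" | Add-Content $ReportFile
$DcbxSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
"`nPriority Flow Control Settings:" | Add-Content $ReportFile
$PfcSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile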
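
The checklist later in this runbook expects PFC to be enabled on priority 3 and nothing else. A minimal sketch of that check, per node, is below. `Test-PfcPriority` is a hypothetical helper, and the default priority of 3 is an assumption taken from the checklist. It ignores any input objects that have no `Priority` property, so DCBX rows mixed into the same collection do not affect the result.

```powershell
# Hypothetical helper: per node, pass only when PFC is enabled on the expected priority alone.
function Test-PfcPriority {
    param(
        [Parameter(Mandatory)] [object[]] $Settings,  # objects with Node, Priority, Enabled
        [int] $ExpectedPriority = 3                   # assumption: storage priority from the checklist
    )
    # Keep only flow-control rows (objects that actually carry a Priority property)
    $rows = $Settings | Where-Object { $_.PSObject.Properties['Priority'] }
    foreach ($group in ($rows | Group-Object Node)) {
        $enabled = @($group.Group | Where-Object { $_.Enabled } | ForEach-Object { [int]$_.Priority })
        [PSCustomObject]@{
            Node              = $group.Name
            EnabledPriorities = $enabled -join ','
            Pass              = ($enabled.Count -eq 1 -and $enabled[0] -eq $ExpectedPriority)
        }
    }
}

# Sample data: NODE01 is correct, NODE02 also has priority 4 enabled
$sample = foreach ($p in 0..7) {
    [PSCustomObject]@{ Node = 'NODE01'; Priority = $p; Enabled = ($p -eq 3) }
    [PSCustomObject]@{ Node = 'NODE02'; Priority = $p; Enabled = ($p -eq 3 -or $p -eq 4) }
}
$pfcCheck = @(Test-PfcPriority -Settings $sample)
$pfcCheck | Format-Table -AutoSize
```

In the runbook context the call would be `Test-PfcPriority -Settings $PfcSettings`.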

3.3 Verify ETS Configuration

# Check Enhanced Transmission Selection
$EtsSettings = foreach ($Node in $Nodes) {
 Invoke-Command -ComputerName $Node -ScriptBlock {
 Get-NetQosTrafficClass | Select-Object @{N='Node';E={$env:COMPUTERNAME}},
 Name, Priority, BandwidthPercentage, Algorithm
 }
}

"`nETS Traffic Class Settings:" | Add-Content $ReportFile
$EtsSettings | Format-Table -AutoSize | Out-String | Add-Content $ReportFile
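
The checklist later in this runbook expects the SMB Direct traffic class to receive at least 50% of bandwidth. The sketch below evaluates that per node. `Test-EtsAllocation` is a hypothetical helper. The storage priority of 3 and the 50% threshold are assumptions taken from the checklist, and the class names in the sample data are illustrative only.

```powershell
# Hypothetical helper: per node, check the bandwidth share of the class carrying the storage priority.
function Test-EtsAllocation {
    param(
        [Parameter(Mandatory)] [object[]] $TrafficClasses,  # objects shaped like $EtsSettings
        [int] $StoragePriority = 3,                         # assumption: from the checklist
        [int] $MinimumPercent = 50                          # assumption: from the checklist
    )
    foreach ($group in ($TrafficClasses | Group-Object Node)) {
        $total = ($group.Group | Measure-Object -Property BandwidthPercentage -Sum).Sum
        # The traffic class whose priority list includes the storage priority
        $storage = $group.Group |
            Where-Object { $_.Priority -contains $StoragePriority } |
            Select-Object -First 1
        [PSCustomObject]@{
            Node           = $group.Name
            StorageClass   = $storage.Name
            StoragePercent = $storage.BandwidthPercentage
            TotalPercent   = $total
            Pass           = ($null -ne $storage -and
                              $storage.BandwidthPercentage -ge $MinimumPercent -and
                              $total -le 100)
        }
    }
}

# Sample data: NODE01 meets the threshold, NODE02 does not
$sample = @(
    [PSCustomObject]@{ Node = 'NODE01'; Name = '[Default]';  Priority = @(0,1,2,4,5,6); BandwidthPercentage = 49; Algorithm = 'ETS' }
    [PSCustomObject]@{ Node = 'NODE01'; Name = 'SMB_Direct'; Priority = @(3);           BandwidthPercentage = 50; Algorithm = 'ETS' }
    [PSCustomObject]@{ Node = 'NODE01'; Name = 'Cluster';    Priority = @(7);           BandwidthPercentage = 1;  Algorithm = 'ETS' }
    [PSCustomObject]@{ Node = 'NODE02'; Name = '[Default]';  Priority = @(0,1,2,4,5,6,7); BandwidthPercentage = 60; Algorithm = 'ETS' }
    [PSCustomObject]@{ Node = 'NODE02'; Name = 'SMB_Direct'; Priority = @(3);           BandwidthPercentage = 40; Algorithm = 'ETS' }
)
$etsCheck = @(Test-EtsAllocation -TrafficClasses $sample)
$etsCheck | Format-Table -AutoSize
```

In the runbook context the call would be `Test-EtsAllocation -TrafficClasses $EtsSettings`.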

Part 4: Test-NetStack Validation

4.1 Run Network Stack Tests

# Test-NetStack validates the entire network stack
# Run between storage network adapters

# Get storage adapter IPs
$StorageAdapters = Get-NetAdapter -Name "Storage*" | Get-NetIPAddress -AddressFamily IPv4

"`n" + "="*80 | Add-Content $ReportFile
"NETWORK STACK TEST RESULTS (Test-NetStack)" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

# Run Test-NetStack between first two nodes
$Node1 = $Nodes[0]
$Node2 = $Nodes[1]

# Get storage IPs for each node
$Node1StorageIP = (Invoke-Command -ComputerName $Node1 -ScriptBlock {
 (Get-NetIPAddress -InterfaceAlias "Storage*" -AddressFamily IPv4)[0].IPAddress
})
$Node2StorageIP = (Invoke-Command -ComputerName $Node2 -ScriptBlock {
 (Get-NetIPAddress -InterfaceAlias "Storage*" -AddressFamily IPv4)[0].IPAddress
})

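# Run Test-NetStack stages 1-2 (ICMP connectivity and TCP throughput) between the storage IPs.
# Test-NetStack connects to the targets itself, so it is not wrapped in Invoke-Command.
# Parameter names follow the PowerShell Gallery module; confirm with Get-Help Test-NetStack.
Test-NetStack -IPTarget $Node1StorageIP, $Node2StorageIP -Stage 1, 2 -EnableFirewallRules |
 Out-String | Add-Content $ReportFile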

4.2 RDMA Traffic Test

# Test RDMA connectivity between nodes
"`nRDMA Traffic Test:" | Add-Content $ReportFile

# Stages 3-4 of Test-NetStack exercise RDMA connectivity and 1:1 RDMA throughput.
# Stage numbers follow the PowerShell Gallery module; confirm with Get-Help Test-NetStack.
Test-NetStack -IPTarget $Node1StorageIP, $Node2StorageIP -Stage 3, 4 |
 Out-String | Add-Content $ReportFile

Part 5: VLAN Connectivity Validation

5.1 Verify VLAN Configuration

# Check VLAN assignments on virtual adapters
$VlanConfig = foreach ($Node in $Nodes) {
 Invoke-Command -ComputerName $Node -ScriptBlock {
 Get-VMNetworkAdapterVlan -ManagementOS | 
 Select-Object @{N='Node';E={$env:COMPUTERNAME}},
 ParentAdapter, AccessVlanId, NativeVlanId, OperationMode
 }
}

"`n" + "="*80 | Add-Content $ReportFile
"VLAN CONFIGURATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile
$VlanConfig | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

5.2 Test VLAN Connectivity

# Ping test across VLANs
$VlanTests = @(
 @{Name="Management"; VLAN=711; TestIP="10.X.X.1"},
 @{Name="Storage1"; VLAN=712; TestIP="10.X.X.1"},
 @{Name="Storage2"; VLAN=713; TestIP="10.X.X.1"},
 @{Name="VM Traffic"; VLAN=714; TestIP="10.X.X.1"}
)

"`nVLAN Connectivity Tests:" | Add-Content $ReportFile
foreach ($Vlan in $VlanTests) {
 # Replace TestIP with actual gateway/target for each VLAN
 $Result = Test-Connection -ComputerName $Vlan.TestIP -Count 2 -Quiet -ErrorAction SilentlyContinue
 $Status = if ($Result) { "PASS" } else { "FAIL" }
 "VLAN $($Vlan.VLAN) ($($Vlan.Name)): $Status" | Add-Content $ReportFile
}

Part 6: Core Network Services Validation

6.1 DNS Resolution

"`n" + "="*80 | Add-Content $ReportFile
"CORE NETWORK SERVICES" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

# Test DNS resolution from the node running this script
$DnsTests = @(
 "management.azure.com",
 "login.microsoftonline.com",
 "$ClusterName",
 "dc01.domain.local" # Replace with actual DC
)

"`nDNS Resolution Tests:" | Add-Content $ReportFile
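foreach ($Target in $DnsTests) {
 $Result = Resolve-DnsName -Name $Target -ErrorAction SilentlyContinue
 # The first record is often a CNAME with no IPAddress, so pick the first address record
 $Address = ($Result | Where-Object IPAddress | Select-Object -First 1).IPAddress
 $Status = if ($Address) { "PASS - $Address" } else { "FAIL" }
 "$Target : $Status" | Add-Content $ReportFile
}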

6.2 NTP Synchronization

# Verify time sync across all nodes
$TimeSync = foreach ($Node in $Nodes) {
 Invoke-Command -ComputerName $Node -ScriptBlock {
 $w32tm = w32tm /query /status 2>&1
 # Return the text after the first colon on the matching line ($null if the line is missing)
 $GetField = {
 param($Label)
 $Line = $w32tm | Select-String $Label | Select-Object -First 1
 if ($Line) { ($Line.Line -replace '^[^:]*:\s*', '').Trim() }
 }
 [PSCustomObject]@{
 Node = $env:COMPUTERNAME
 Source = & $GetField "Source:"
 Stratum = & $GetField "Stratum:"
 LastSync = & $GetField "Last Successful"
 }
 }
}

"`nNTP Time Synchronization:" | Add-Content $ReportFile
$TimeSync | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

# Check for time skew between nodes
$TimeDiff = foreach ($Node in $Nodes) {
 Invoke-Command -ComputerName $Node -ScriptBlock {
 [PSCustomObject]@{
 Node = $env:COMPUTERNAME
 CurrentTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss.fff"
 }
 }
}
"`nNode Time Comparison:" | Add-Content $ReportFile
$TimeDiff | Format-Table | Out-String | Add-Content $ReportFile
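
The node time comparison above prints timestamps but leaves the skew calculation to the reader. The sketch below computes the spread between the earliest and latest timestamp. `Get-MaxTimeSkew` is a hypothetical helper. It assumes the `CurrentTime` strings use the `yyyy-MM-dd HH:mm:ss.fff` format with `:` as the time separator. Because the nodes are queried one after another, the result includes remoting latency and should be read as an upper bound on the real skew.

```powershell
# Hypothetical helper: spread in seconds between the earliest and latest node timestamp.
function Get-MaxTimeSkew {
    param(
        [Parameter(Mandatory)] [object[]] $NodeTimes  # objects with Node and CurrentTime (string)
    )
    $format = 'yyyy-MM-dd HH:mm:ss.fff'
    $parsed = foreach ($entry in $NodeTimes) {
        [datetime]::ParseExact($entry.CurrentTime, $format, [cultureinfo]::InvariantCulture)
    }
    # Wrap in @() so a single-node cluster still yields an array
    $sorted = @($parsed | Sort-Object)
    ($sorted[-1] - $sorted[0]).TotalSeconds
}

# Sample data standing in for $TimeDiff
$sample = @(
    [PSCustomObject]@{ Node = 'NODE01'; CurrentTime = '2024-05-01 10:00:00.100' }
    [PSCustomObject]@{ Node = 'NODE02'; CurrentTime = '2024-05-01 10:00:00.850' }
)
$skewSeconds = Get-MaxTimeSkew -NodeTimes $sample
"Maximum observed skew: $skewSeconds seconds"
```

In the runbook context the call would be `Get-MaxTimeSkew -NodeTimes $TimeDiff`.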

6.3 Azure Connectivity

# Test connectivity to Azure endpoints
$AzureEndpoints = @(
 "management.azure.com",
 "login.microsoftonline.com",
 "graph.microsoft.com",
 "azurestackr01.azurestack.hci.microsoft.com"
)

"`nAzure Endpoint Connectivity:" | Add-Content $ReportFile
foreach ($Endpoint in $AzureEndpoints) {
 $Result = Test-NetConnection -ComputerName $Endpoint -Port 443 -WarningAction SilentlyContinue
 $Status = if ($Result.TcpTestSucceeded) { "PASS" } else { "FAIL" }
 "$Endpoint : $Status (Latency: $($Result.PingReplyDetails.RoundtripTime)ms)" | Add-Content $ReportFile
}

Part 7: Generate Summary

7.1 Create Validation Summary

# Summary section
"`n" + "="*80 | Add-Content $ReportFile
"NETWORK VALIDATION SUMMARY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

$Summary = @"

Validation Category             Status
------------------------------- --------
RDMA Adapters Enabled           $(if($DisabledRdma){"FAIL"}else{"PASS"})
DCB Configuration               $(if($DcbResults -match "FAIL"){"FAIL"}else{"PASS"})
SMB Direct Operational          $(if($SmbDirect){"PASS"}else{"FAIL"})
VLAN Connectivity               REVIEW ABOVE
DNS Resolution                  REVIEW ABOVE
NTP Synchronization             $(if($TimeSync.Count -eq $Nodes.Count){"PASS"}else{"FAIL"})
Azure Connectivity              REVIEW ABOVE

"@

$Summary | Add-Content $ReportFile

# Report location
"`nReport saved to: $ReportFile" | Add-Content $ReportFile
Write-Host "`nNetwork validation complete. Report: $ReportFile" -ForegroundColor Green
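
The summary marks VLAN, DNS, and Azure connectivity as "REVIEW ABOVE", which means scanning the report by hand. A minimal sketch for pulling out the failing lines is below. `Get-ReportFailure` is a hypothetical helper, and it assumes failing checks are written with the uppercase word `FAIL`, as the tests in this runbook do.

```powershell
# Hypothetical helper: return report lines that contain the uppercase status word FAIL.
function Get-ReportFailure {
    param(
        # AllowEmptyString lets blank report lines through parameter binding
        [Parameter(Mandatory)] [AllowEmptyString()] [string[]] $Lines
    )
    # -cmatch is case-sensitive; \b keeps words such as "Failover" from matching
    $Lines | Where-Object { $_ -cmatch '\bFAIL\b' }
}

# Sample lines shaped like the report output (addresses are documentation examples)
$sampleLines = @(
    'VLAN 711 (Management): PASS'
    'VLAN 712 (Storage1): FAIL'
    ''
    'management.azure.com : PASS - 203.0.113.10'
    'Failover Clustering network: PASS'
)
$failures = @(Get-ReportFailure -Lines $sampleLines)
"Failed checks: $($failures.Count)"
$failures
```

In the runbook context the call would be `Get-ReportFailure -Lines (Get-Content $ReportFile)`.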

Validation Checklist

| Category | Test | Expected Result | Status |
|----------|------|-----------------|--------|
| RDMA | All adapters enabled | Enabled = True | |
| RDMA | SMB Direct operational | RDMA-capable interfaces listed | |
| DCB | Validate-DCB passes | No FAIL results | |
| DCB | PFC enabled on correct priority | Priority 3 enabled | |
| DCB | ETS bandwidth allocation | SMB Direct ≥ 50% | |
| VLAN | All VLANs accessible | Ping succeeds | |
| DNS | Name resolution works | All targets resolve | |
| NTP | Time synchronized | Stratum ≤ 4, all nodes sync | |
| Azure | Endpoint connectivity | All endpoints reachable on 443 | |

Common Issues

RDMA Not Operational

# Re-enable RDMA on adapter
Enable-NetAdapterRdma -Name "Storage1"

# Verify RDMA is operational
Get-NetAdapterRdma -Name "Storage1"

DCB Misconfiguration

# Reset DCB to defaults and reconfigure
# WARNING: May disrupt storage traffic

# Remove existing policies
Get-NetQosPolicy | Remove-NetQosPolicy -Confirm:$false

# Recreate SMB Direct policy
New-NetQosPolicy -Name "SMB Direct" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

Time Skew Between Nodes

# Force time sync
w32tm /resync /force

# Verify sync source
w32tm /query /source

Next Step

Proceed to Task 04: High Availability Testing once network validation is complete.