Commit 5e3b3e7

Author: Kristopher Turner
docs: add Variables tables and Troubleshooting sections to Part 3 On-Prem Readiness
- Phase 01 (AD): Added Variables from variables.yml tables to all 5 tasks
- Phase 01 task-01: Added Troubleshooting section (4 common issues)
- Phase 02: Added Variables tables to tasks 01-03
- Phase 02 task-01: Added Troubleshooting section (4 hardware issues)
- Phase 03: Added Variables tables to all 4 tasks
- Phase 03 task-02: Added Troubleshooting section (5 switch config issues)
- Phase 03 task-03: Added Troubleshooting section (4 firewall issues)

Part of implementation standardization - aligning all task pages to canonical format.
1 parent b8cf597 commit 5e3b3e7

12 files changed

Lines changed: 186 additions & 0 deletions

docs/implementation/03-onprem-readiness/phase-01-active-directory/task-01-ou-creation-pre-creation-artifacts.mdx

Lines changed: 26 additions & 0 deletions
@@ -54,6 +54,21 @@ AD structure should be defined during the [planning phase](../../../planning/).

 ---

+## Variables from variables.yml
+
+| Variable Path | Type | Description |
+|--------------|------|-------------|
+| `identity.active_directory.ad_clusters_ou_path` | string | OU path for Azure Local cluster computer objects |
+| `identity.active_directory.ad_domain_fqdn` | string | Active Directory domain FQDN |
+| `identity.accounts.account_lcm_username` | string | Lifecycle Manager deployment account username |
+| `identity.accounts.account_lcm_password` | string | LCM account password (keyvault:// URI) |
+| `platform.kv_platform_name` | string | Key Vault name for secret retrieval |
+| `azure_vms.dc01.resource_group` | string | Resource group of the domain controller VM |
+| `azure_vms.dc01.name` | string | Domain controller VM name (for AzVM execution) |
+| `azure_vms.dc01.hostname` | string | Domain controller hostname (for Arc execution) |
+
+---
+
 ## Execution

 <Tabs groupId="deployment-method">
@@ -649,6 +664,17 @@ if (-not $SkipCleanup) {

 ---

+## Troubleshooting
+
+| Issue | Cause | Resolution |
+|-------|-------|------------|
+| `New-ADOrganizationalUnit` fails with access denied | Insufficient permissions to create OUs | Run as Domain Admin or delegate OU creation rights on the parent container |
+| KDS Root Key not usable immediately | A new key cannot be used for 10 hours, which allows replication to all domain controllers; `-EffectiveImmediately` does not bypass this wait | In a lab, backdate the key with `Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))`; in production, wait 10 hours |
+| AsHciADArtifactsPreCreationTool module not found | Module not installed or wrong PS version | Install via `Install-Module AsHciADArtifactsPreCreationTool` on PowerShell 5.1+ |
+| LCM user creation fails with duplicate | Account already exists from prior attempt | Verify existing account properties match requirements or remove and recreate |
+
+---
+
 ## Next Steps

 Proceed to [Task 2 - Security Groups](./task-02-security-groups.mdx) to create optional security groups.
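
The Variables table added to this task lists dot-separated paths into `variables.yml`. As a rough illustration of how such paths could be checked before running a task, the sketch below walks a parsed configuration and reports which required paths are absent. The `VARIABLES` dict, its values, and the helper names `resolve` and `missing_paths` are hypothetical and not part of the commit; a real run would load the file with a YAML parser first.

```python
from typing import Any

# Hypothetical excerpt of variables.yml, already parsed into a dict.
# All values are placeholders for illustration only.
VARIABLES = {
    "identity": {
        "active_directory": {
            "ad_domain_fqdn": "corp.example.com",
            "ad_clusters_ou_path": "OU=Clusters,DC=corp,DC=example,DC=com",
        },
        "accounts": {"account_lcm_username": "lcm-clus01"},
    },
    "platform": {"kv_platform_name": "kv-platform-example"},
}


def resolve(variables: dict, path: str) -> Any:
    """Walk a dot-separated path and return the value it points to."""
    node: Any = variables
    for segment in path.split("."):
        if not isinstance(node, dict) or segment not in node:
            raise KeyError(f"{path}: missing segment '{segment}'")
        node = node[segment]
    return node


def missing_paths(variables: dict, paths: list[str]) -> list[str]:
    """Return the subset of paths that cannot be resolved, in input order."""
    missing = []
    for path in paths:
        try:
            resolve(variables, path)
        except KeyError:
            missing.append(path)
    return missing


required = [
    "identity.active_directory.ad_domain_fqdn",
    "platform.kv_platform_name",
    "azure_vms.dc01.name",  # absent from the placeholder dict above
]
print(missing_paths(VARIABLES, required))
```

Reporting every missing path at once, rather than stopping at the first one, lets an operator fix the configuration in a single pass.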

docs/implementation/03-onprem-readiness/phase-01-active-directory/task-02-security-groups.mdx

Lines changed: 12 additions & 0 deletions
@@ -54,6 +54,18 @@ The active directory information in these documents should have been decided in
 - Domain admin access for group creation
 - Understanding of group scope and security requirements

+## Variables from variables.yml
+
+| Variable Path | Type | Description |
+|--------------|------|-------------|
+| `identity.active_directory.ad_security_groups_ou_path` | string | OU path for security group creation |
+| `identity.active_directory.security_groups.org_prefix` | string | Organization prefix for group naming |
+| `identity.active_directory.security_groups.cluster_id` | string | Cluster identifier for group naming |
+| `identity.active_directory.security_groups.<key>.name` | string | Full security group name per role |
+| `identity.active_directory.security_groups.<key>.description` | string | Security group description |
+
+---
+
 ## Security Group Model

 Group names follow the convention `SG-{org_prefix}-{cluster_id}-AZL-{role}`, built dynamically from two fields in `active_directory.security_groups` in `variables.yml`. The `cluster_id` suffix makes groups unique per cluster in the same domain.
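
The naming convention above can be sketched in a few lines. This is only an illustration of the `SG-{org_prefix}-{cluster_id}-AZL-{role}` pattern; the function name `build_group_name` and the sample prefix, cluster IDs, and role names are hypothetical and do not come from the commit.

```python
def build_group_name(org_prefix: str, cluster_id: str, role: str) -> str:
    """Compose a group name as SG-{org_prefix}-{cluster_id}-AZL-{role}."""
    return f"SG-{org_prefix}-{cluster_id}-AZL-{role}"


# Placeholder inputs; real values would come from
# active_directory.security_groups in variables.yml.
roles = ["Admins", "Operators"]
for cluster_id in ["CLUS01", "CLUS02"]:
    for role in roles:
        print(build_group_name("CONTOSO", cluster_id, role))
```

Because `cluster_id` is part of every name, two clusters in the same domain that share a role still produce distinct groups.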

docs/implementation/03-onprem-readiness/phase-01-active-directory/task-03-dns-node-a-records.mdx

Lines changed: 10 additions & 0 deletions
@@ -54,6 +54,16 @@ The active directory information in these documents should have been decided in
 - Node hostnames and management IP addresses
 - Authoritative DNS server permissions

+## Variables from variables.yml
+
+| Variable Path | Type | Description |
+|--------------|------|-------------|
+| `identity.active_directory.domain.fqdn` | string | DNS zone name (AD domain FQDN) |
+| `cluster_arm_deployment.arc_node_resource_ids` | list | Node resource IDs (hostnames extracted) |
+| `cluster_arm_deployment.starting_ip` | string | Starting management IP for node A records |
+
+---
+
 ## DNS Record Creation

 Pre-create forward lookup A records for each Azure Local node hostname. Do NOT create records for the cluster name object (CNO) or virtual computer objects (VCOs) now; those are generated later by the failover cluster process.
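
As a planning aid, the node A records can be derived from the two `cluster_arm_deployment` variables in the table. The sketch below makes two assumptions that the commit does not confirm: that each node hostname is the last segment of its Arc resource ID, and that nodes receive consecutive addresses counting up from `starting_ip`. The function names, resource IDs, domain, and addresses are placeholders.

```python
import ipaddress


def hostname_from_resource_id(resource_id: str) -> str:
    """Assumption: the node hostname is the final path segment of the ID."""
    return resource_id.rstrip("/").rsplit("/", 1)[-1]


def plan_a_records(resource_ids, starting_ip, domain_fqdn):
    """Pair each node FQDN with an IP, assuming sequential allocation."""
    first = ipaddress.ip_address(starting_ip)
    return [
        (f"{hostname_from_resource_id(rid)}.{domain_fqdn}", str(first + offset))
        for offset, rid in enumerate(resource_ids)
    ]


# Placeholder resource IDs with the subscription and group elided to dummies.
ids = [
    "/subscriptions/0000/resourceGroups/rg-example/providers/Microsoft.HybridCompute/machines/node01",
    "/subscriptions/0000/resourceGroups/rg-example/providers/Microsoft.HybridCompute/machines/node02",
]
for fqdn, ip in plan_a_records(ids, "10.0.0.11", "corp.example.com"):
    print(fqdn, ip)
```

The output is only a list of name and address pairs to review; creating the records on the authoritative DNS server remains a separate step.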

docs/implementation/03-onprem-readiness/phase-01-active-directory/task-04-service-admin-accounts.mdx

Lines changed: 11 additions & 0 deletions
@@ -56,6 +56,17 @@ The active directory information in these documents should have been decided in
 - Domain admin access for account creation
 - Secure password storage solution (vault)

+## Variables from variables.yml
+
+| Variable Path | Type | Description |
+|--------------|------|-------------|
+| `identity.active_directory.ad_service_accounts_ou_path` | string | OU path for service account creation |
+| `identity.active_directory.ad_security_groups_ou_path` | string | OU path for gMSA readers group |
+| `identity.accounts.account_lcm_username` | string | LCM account name (cluster ID extracted) |
+| `platform.kv_platform_name` | string | Key Vault name for break-glass password storage |
+
+---
+
 ## Account Creation

 Lifecycle Manager (LCM) user created in Step 1. Add break‑glass admin and optional gMSA scaffold.

docs/implementation/03-onprem-readiness/phase-01-active-directory/task-05-group-assignments.mdx

Lines changed: 11 additions & 0 deletions
@@ -52,6 +52,17 @@ The active directory information in these documents should have been decided in
 - Service and admin accounts created (Step 4)
 - Domain admin access for group management

+## Variables from variables.yml
+
+| Variable Path | Type | Description |
+|--------------|------|-------------|
+| `identity.accounts.account_lcm_username` | string | LCM account name (cluster ID extracted for group lookup) |
+| `identity.active_directory.security_groups` | object | Full security groups configuration object |
+| `identity.active_directory.security_groups.<key>.name` | string | Security group name per role |
+| `identity.active_directory.security_groups.<key>.members` | string | Members to assign to each group |
+
+---
+
 ## Procedures

 <Tabs groupId="deployment-method">

docs/implementation/03-onprem-readiness/phase-02-enterprise-readiness/task-01-hardware-inspection.mdx

Lines changed: 24 additions & 0 deletions
@@ -40,6 +40,19 @@ Perform visual inspection of all physical infrastructure to verify installation

 ---

+## Variables from variables.yml
+
+| Variable Path | Type | Description |
+|---|---|---|
+| `compute.nodes[]` | Array | Node hostnames, management IPs, iDRAC IPs, MAC addresses |
+| `compute.nodes[].hostname` | String | Server hostname for label verification |
+| `compute.nodes[].idrac_ip` | String | iDRAC IP address for remote management |
+| `networking.network_devices.opengear` | Object | OpenGear console server hostname, IP, model, port mappings |
+| `networking.network_devices.switch_primary` | Object | Primary ToR switch hostname, IP, model |
+| `networking.network_devices.switch_secondary` | Object | Secondary ToR switch hostname, IP, model |
+
+---
+
 ## Execution Options


@@ -250,6 +263,17 @@ Document any discrepancies found during inspection:

 ---

+## Troubleshooting
+
+| Issue | Possible Cause | Resolution |
+|---|---|---|
+| Server not powering on | PDU circuit breaker tripped or power cable loose | Verify PDU status lights, reseat power cables, check circuit allocation |
+| iDRAC not accessible | Incorrect IP assignment or VLAN tagging | Verify iDRAC IP matches LLD, confirm OOB VLAN is trunked to management switch |
+| Missing serial port labels | Cable labels not applied during rack installation | Cross-reference OpenGear port mapping document and re-label |
+| Network link LED not lit | SFP+ module not seated or wrong cable type | Reseat SFP+, verify DAC cable type matches switch port requirements |
+
+---
+
 ## Validation

 **All checks passed?**

docs/implementation/03-onprem-readiness/phase-02-enterprise-readiness/task-02-network-service-verification.mdx

Lines changed: 12 additions & 0 deletions
@@ -56,6 +56,18 @@ Azure and Dell endpoint connectivity testing occurs in **Phase 03 Step 4** after

 ---

+## Variables from variables.yml
+
+| Variable Path | Type | Description |
+|---|---|---|
+| `cluster_arm_deployment.dns_servers` | Array | DNS server IP addresses for resolution tests |
+| `cluster_arm_deployment.domain_fqdn` | String | Active Directory domain FQDN |
+| `compute.nodes[]` | Array | Node details for connectivity testing |
+| `networking.network_devices.opengear` | Object | OpenGear console server IP for OOB verification |
+| `networking.onprem.vlans.management.gateway` | String | Management VLAN gateway IP for routing tests |
+
+---
+
 ## DNS Resolution Tests

 <Tabs groupId="deployment-method">

docs/implementation/03-onprem-readiness/phase-02-enterprise-readiness/task-03-opengear-verification.mdx

Lines changed: 12 additions & 0 deletions
@@ -52,6 +52,18 @@ This verification requires access to the Lighthouse portal. Ensure you have the

 ---

+## Variables from variables.yml
+
+| Variable Path | Type | Description |
+|---|---|---|
+| `networking.network_devices.opengear` | Object | OpenGear hostname, IP, model, Lighthouse enrollment token |
+| `networking.network_devices.opengear.ports[]` | Array | Serial port-to-node mappings (port number, connected device, baud rate) |
+| `compute.nodes[]` | Array | Node hostnames for console port label verification |
+| `networking.onprem.vlans.oob` | Object | OOB VLAN ID, CIDR, gateway for network connectivity tests |
+| `virtual_machines.lighthouse` | Object | Lighthouse portal VM details |
+
+---
+
 ## Lighthouse Portal Verification

 <Tabs groupId="deployment-method">

docs/implementation/03-onprem-readiness/phase-03-network-infrastructure/task-01-opengear-console-server.mdx

Lines changed: 11 additions & 0 deletions
@@ -61,6 +61,17 @@ When iDRAC interfaces are connected to the out-of-band management network, they

 ---

+## Variables from variables.yml
+
+| Variable Path | Type | Description |
+|---|---|---|
+| `networking.network_devices.opengear` | Object | OpenGear hostname, IP, model, Lighthouse enrollment token |
+| `networking.network_devices.opengear.ports[]` | Array | Serial port-to-node mappings (port number, connected device) |
+| `networking.onprem.vlans.oob` | Object | OOB VLAN ID, CIDR, gateway for management network |
+| `compute.nodes[].hostname` | String | Node hostnames for serial port labelling |
+
+---
+
 ## Required Firewall Ports for Lighthouse Connectivity

 **CRITICAL**: These ports must be allowed outbound from OpenGear NET1 interface to the Internet:

docs/implementation/03-onprem-readiness/phase-03-network-infrastructure/task-02-dell-powerswitch-configuration.mdx

Lines changed: 24 additions & 0 deletions
@@ -49,6 +49,18 @@ Configure Dell PowerSwitch TOR switches with Data Center Bridging (DCB) for RDMA

 ---

+## Variables from variables.yml
+
+| Variable Path | Type | Description |
+|---|---|---|
+| `networking.network_devices.switch_primary` | Object | Primary ToR switch hostname, management IP, model |
+| `networking.network_devices.switch_secondary` | Object | Secondary ToR switch hostname, management IP, model |
+| `networking.onprem.vlans.management` | Object | Management VLAN ID, CIDR, gateway |
+| `networking.onprem.storage.vlans` | Object | Storage VLAN IDs (711-714) for trunk port configuration |
+| `compute.nodes[].hostname` | String | Node hostnames for port description labels |
+
+---
+
 ## Configuration Areas Overview

 | Configuration Area | Purpose | Key Settings |
@@ -451,6 +463,18 @@ show interface status

 ---

+## Troubleshooting
+
+| Issue | Possible Cause | Resolution |
+|---|---|---|
+| VLT peer link down | Incorrect port-channel member assignment or cable fault | Verify VLT configuration, check cables between switches, confirm port-channel membership |
+| PFC not negotiating | DCBX mode mismatch between switch and NIC | Set DCBX to IEEE on both ends, verify NIC firmware supports RoCEv2 |
+| Storage VLAN unreachable | VLAN not added to trunk port or VLAN not created | Verify VLAN exists (`show vlan`), confirm trunk port has VLAN tagged |
+| Jumbo frames not working | MTU mismatch along path | Confirm MTU 9216 on all switch ports, VLANs, and host NICs in the storage path |
+| SSH access denied | SSH service not enabled or management ACL blocking | Run `show ip ssh`, verify SSH enabled, check management ACL entries |
+
+---
+
 ## Next Steps

 Proceed to [Task 3 - Verify Firewall Endpoints](./task-03-firewall-endpoint-verification.mdx) to verify firewall rules for required Azure and Dell endpoints.
