title: Task 06: Backup & DR Validation
sidebar_label: Task 06: Backup & DR Validation
sidebar_position: 6
description: Validate backup operations, restore procedures, and disaster recovery capabilities

Task 06: Backup & DR Validation

Runbook Azure

DOCUMENT CATEGORY: Runbook
SCOPE: Backup and disaster recovery validation
PURPOSE: Validate backup jobs, test restores, and document RPO/RTO
MASTER REFERENCE: Microsoft Learn - Azure Backup for Azure Local

Status: Active


Overview

This step validates the backup and disaster recovery capabilities for the Azure Local cluster, including Azure Backup operations, test restores, and DR failover validation.

Prerequisites

  • [ ] All previous validation steps completed (Steps 1-5)
  • [ ] Backup server configured (Stage 17)
  • [ ] Azure Site Recovery configured (if applicable)
  • [ ] Test VM available for restore testing
  • [ ] Sufficient storage for restore operations

Report Output

All validation results are saved to:

\\<ClusterName>\ClusterStorage$\Collect\validation-reports\06-backup-dr-validation-YYYYMMDD.txt

Part 1: Initialize Validation

1.1 Setup Environment

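# Initialize variables
$ClusterName = (Get-Cluster).Name
$DateStamp = Get-Date -Format "yyyyMMdd"
$ReportPath = "C:\ClusterStorage\Collect\validation-reports"
$ReportFile = "$ReportPath\06-backup-dr-validation-$DateStamp.txt"

$BackupServer = "<Azure Backup-Server-Name>" # Replace with actual backup server
$ResourceGroup = "<Resource-Group-Name>" # Replace with the resource group queried in Part 7
$PerformASRTest = $false # Set to $true only after a test failover is approved (Part 7.2)

# Create the report folder if it does not exist yet, so Out-File below cannot fail on a missing path
New-Item -ItemType Directory -Path $ReportPath -Force | Out-Null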

# Initialize report
$ReportHeader = @"
================================================================================
BACKUP & DISASTER RECOVERY VALIDATION REPORT
================================================================================
Cluster: $ClusterName
Backup Server: $BackupServer
Date: $(Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Generated By: $(whoami)
================================================================================

"@
$ReportHeader | Out-File -FilePath $ReportFile -Encoding UTF8

Part 2: Azure Backup Configuration Validation

2.1 Verify Backup Agent Status

"`n" + "="*80 | Add-Content $ReportFile
"BACKUP AGENT STATUS" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

$Nodes = (Get-ClusterNode).Name

foreach ($Node in $Nodes) {
    $AgentStatus = Invoke-Command -ComputerName $Node -ScriptBlock {
        $Service = Get-Service -Name "DPMRA" -ErrorAction SilentlyContinue
        if ($Service) {
            [PSCustomObject]@{
                Node = $env:COMPUTERNAME
                ServiceStatus = $Service.Status
                StartType = $Service.StartType
            }
        } else {
            [PSCustomObject]@{
                Node = $env:COMPUTERNAME
                ServiceStatus = "Not Installed"
                StartType = "N/A"
            }
        }
    }

    "$($AgentStatus.Node): backup agent = $($AgentStatus.ServiceStatus) ($($AgentStatus.StartType))" | Add-Content $ReportFile
}

2.2 Verify Protection Groups

# Connect to Azure Backup (run from backup server or remote session)
$BackupSession = New-PSSession -ComputerName $BackupServer

$ProtectionGroups = Invoke-Command -Session $BackupSession -ScriptBlock {
    Import-Module DataProtectionManager
    Get-DPMProtectionGroup | Select-Object FriendlyName,
        @{N='Members';E={($_ | Get-DPMDatasource).Name -join ", "}},
        @{N='Status';E={$_.ProtectionStatus}}
}

"`nProtection Groups:" | Add-Content $ReportFile
$ProtectionGroups | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

Remove-PSSession $BackupSession

2.3 Check VSS Writers

"`nVSS Writers Status on Cluster Nodes:" | Add-Content $ReportFile

foreach ($Node in $Nodes) {
    $VSSWriters = Invoke-Command -ComputerName $Node -ScriptBlock {
        $vss = vssadmin list writers 2>&1
        # Parse for failed writers (any state code other than [1] Stable)
        $Failed = $vss | Select-String "State: \[(\d+)\]" | Where-Object { $_.Matches.Groups[1].Value -ne "1" }
        [PSCustomObject]@{
            Node = $env:COMPUTERNAME
            TotalWriters = ($vss | Select-String "Writer name:").Count
            FailedWriters = $Failed.Count
        }
    }

    "$($VSSWriters.Node): Total=$($VSSWriters.TotalWriters), Failed=$($VSSWriters.FailedWriters)" | Add-Content $ReportFile

    if ($VSSWriters.FailedWriters -gt 0) {
        " WARNING: $($VSSWriters.FailedWriters) VSS writers in failed state" | Add-Content $ReportFile
    }
}
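
The check above counts writers whose state code is not 1, but it does not record which writer failed or what its last error was. A minimal sketch of a fuller parser follows. `ConvertFrom-VssWriterText` is a hypothetical helper name (not a built-in cmdlet), and the sample lines are illustrative text shaped like `vssadmin list writers` output, not captured results.

```powershell
# Sketch: turn 'vssadmin list writers' text into one object per writer.
# ConvertFrom-VssWriterText is a hypothetical helper, not a built-in cmdlet.
function ConvertFrom-VssWriterText {
    param([string[]]$Lines)

    $writers = @()
    $current = $null
    foreach ($line in $Lines) {
        if ($line -match "Writer name:\s*'(.+)'") {
            # A new writer section starts: flush the previous one
            if ($current) { $writers += [PSCustomObject]$current }
            $current = [ordered]@{ Name = $Matches[1]; StateCode = $null; LastError = $null }
        } elseif ($current -and $line -match 'State:\s*\[(\d+)\]') {
            $current.StateCode = [int]$Matches[1]
        } elseif ($current -and $line -match 'Last error:\s*(.+)$') {
            $current.LastError = $Matches[1].Trim()
        }
    }
    if ($current) { $writers += [PSCustomObject]$current }
    $writers
}

# Illustrative input (assumed shape of the tool's output)
$SampleVss = @(
    "Writer name: 'System Writer'",
    "   State: [1] Stable",
    "   Last error: No error",
    "Writer name: 'Microsoft Hyper-V VSS Writer'",
    "   State: [8] Failed",
    "   Last error: Timed out"
)

$ParsedWriters = @(ConvertFrom-VssWriterText -Lines $SampleVss)
$UnhealthyWriters = @($ParsedWriters | Where-Object { $_.StateCode -ne 1 -or $_.LastError -ne 'No error' })
$UnhealthyWriters | ForEach-Object { "$($_.Name): state $($_.StateCode), last error '$($_.LastError)'" }
```

Inside the per-node script block, the same function could be fed the `$vss` lines so that the report names the failing writers instead of only counting them.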

Part 3: Backup Job Validation

3.1 Review Recent Backup Jobs

"`n" + "="*80 | Add-Content $ReportFile
"BACKUP JOB HISTORY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

$BackupSession = New-PSSession -ComputerName $BackupServer

$RecentJobs = Invoke-Command -Session $BackupSession -ScriptBlock {
    Import-Module DataProtectionManager

    # Get jobs from the last 7 days, newest first, and keep the 20 most recent
    $StartDate = (Get-Date).AddDays(-7)
    Get-DPMJob -From $StartDate |
        Sort-Object StartTime -Descending |
        Select-Object -First 20 @{N='DataSource';E={$_.Datasource.Name}},
            @{N='Type';E={$_.Type}},
            # Cast to string so the status compares reliably after remoting deserialization
            @{N='Status';E={[string]$_.Status}},
            @{N='StartTime';E={$_.StartTime}},
            @{N='EndTime';E={$_.EndTime}},
            @{N='Duration';E={if($_.EndTime){($_.EndTime - $_.StartTime).ToString("hh\:mm\:ss")}else{"Running"}}}
}

"`nLast 20 Backup Jobs:" | Add-Content $ReportFile
$RecentJobs | Format-Table -AutoSize | Out-String | Add-Content $ReportFile

# Summary statistics
$SuccessCount = ($RecentJobs | Where-Object { $_.Status -eq "Succeeded" }).Count
$FailedCount = ($RecentJobs | Where-Object { $_.Status -eq "Failed" }).Count
$TotalJobs = $RecentJobs.Count

"`nJob Summary (Jobs Listed Above, Last 7 Days):" | Add-Content $ReportFile
" Total Jobs: $TotalJobs" | Add-Content $ReportFile
" Successful: $SuccessCount" | Add-Content $ReportFile
" Failed: $FailedCount" | Add-Content $ReportFile
# Guard against division by zero when no jobs were returned
" Success Rate: $([math]::Round(($SuccessCount / [math]::Max($TotalJobs, 1)) * 100, 1))%" | Add-Content $ReportFile

Remove-PSSession $BackupSession
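
The Validation Checklist asks for a job success rate of at least 95%, while the summary above only prints the percentage. A minimal sketch of turning the job list into a pass/fail result is shown below. `Get-JobSuccessSummary` is a hypothetical helper, the sample jobs are made-up objects with the same `Status` property as `$RecentJobs`, and the 95% default is taken from the checklist.

```powershell
# Sketch: summarize job outcomes and compare the success rate with a threshold.
# Get-JobSuccessSummary is a hypothetical helper, not part of any module.
function Get-JobSuccessSummary {
    param(
        [object[]]$Jobs = @(),
        [double]$ThresholdPercent = 95   # threshold from the Validation Checklist
    )

    # Drop nulls so an empty or missing job list counts as zero jobs
    $list      = @($Jobs | Where-Object { $null -ne $_ })
    $succeeded = @($list | Where-Object { [string]$_.Status -eq 'Succeeded' }).Count
    $failed    = @($list | Where-Object { [string]$_.Status -eq 'Failed' }).Count
    $rate      = if ($list.Count -gt 0) { [math]::Round(($succeeded / $list.Count) * 100, 1) } else { $null }

    [PSCustomObject]@{
        Total       = $list.Count
        Succeeded   = $succeeded
        Failed      = $failed
        SuccessRate = $rate
        Result      = if ($null -eq $rate) { 'NO DATA' } elseif ($rate -ge $ThresholdPercent) { 'PASS' } else { 'FAIL' }
    }
}

# Illustrative data only
$SampleJobs = @(
    [PSCustomObject]@{ DataSource = 'VM01'; Status = 'Succeeded' },
    [PSCustomObject]@{ DataSource = 'VM02'; Status = 'Succeeded' },
    [PSCustomObject]@{ DataSource = 'FS01'; Status = 'Failed' }
)

$JobSummary   = Get-JobSuccessSummary -Jobs $SampleJobs
$EmptySummary = Get-JobSuccessSummary -Jobs @()
$JobSummary | Format-List
```

Reporting `NO DATA` separately from `FAIL` keeps an empty job history from being mistaken for a 0% success rate.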

3.2 Run On-Demand Backup

"`nOn-Demand Backup Test:" | Add-Content $ReportFile

$BackupSession = New-PSSession -ComputerName $BackupServer

$BackupResult = Invoke-Command -Session $BackupSession -ScriptBlock {
    Import-Module DataProtectionManager

    # Get the first datasource of the first protection group for the test backup
    $PG = Get-DPMProtectionGroup | Select-Object -First 1
    $DS = $PG | Get-DPMDatasource | Select-Object -First 1

    if ($DS) {
        # Create recovery point (express full backup to disk)
        $BackupStart = Get-Date
        $Job = New-DPMRecoveryPoint -Datasource $DS -Disk -BackupType ExpressFull

        # Wait for job completion (max 30 minutes). The returned job object
        # reports its own progress through HasCompleted.
        $Timeout = (Get-Date).AddMinutes(30)
        while (-not $Job.HasCompleted -and (Get-Date) -lt $Timeout) {
            Start-Sleep -Seconds 30
        }

        $BackupEnd = Get-Date

        [PSCustomObject]@{
            DataSource = $DS.Name
            Status = [string]$Job.Status
            Duration = ($BackupEnd - $BackupStart).ToString("mm\:ss")
            Size = $Job.TotalBytes
        }
    } else {
        [PSCustomObject]@{
            DataSource = "None"
            Status = "No datasources configured"
            Duration = "N/A"
            Size = 0
        }
    }
}

" Data Source: $($BackupResult.DataSource)" | Add-Content $ReportFile
" Status: $($BackupResult.Status)" | Add-Content $ReportFile
" Duration: $($BackupResult.Duration)" | Add-Content $ReportFile
" Size: $([math]::Round($BackupResult.Size / 1GB, 2)) GB" | Add-Content $ReportFile

Remove-PSSession $BackupSession
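
The wait loop in this step ends either when the job finishes or when the 30-minute limit passes, and the report does not say which of the two happened. A minimal sketch of a reusable polling helper that makes the timeout explicit is shown below. `Wait-ForCondition` is a hypothetical name, and the counter-based condition only stands in for a real job status check.

```powershell
# Sketch: poll a condition until it is true or a deadline passes.
# Returns $true when the condition was met, $false on timeout.
# Wait-ForCondition is a hypothetical helper, not a built-in cmdlet.
function Wait-ForCondition {
    param(
        [Parameter(Mandatory)][scriptblock]$Condition,
        [int]$TimeoutSeconds = 1800,   # 30 minutes, as in the loop above
        [int]$PollSeconds = 30
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        if (& $Condition) { return $true }
        Start-Sleep -Seconds $PollSeconds
    }
    return $false
}

# Stand-in condition: becomes true on the third poll
$script:Polls = 0
$Completed = Wait-ForCondition -TimeoutSeconds 10 -PollSeconds 1 -Condition {
    $script:Polls++
    $script:Polls -ge 3
}

# A condition that never becomes true runs into the timeout
$TimedOut = -not (Wait-ForCondition -TimeoutSeconds 1 -PollSeconds 1 -Condition { $false })

"Completed: $Completed after $($script:Polls) polls; timeout case detected: $TimedOut"
```

With a helper like this, a `$false` return value could be written to the report as `TimedOut` rather than leaving the last observed job status in place.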

Part 4: Restore Validation

4.1 Test VM Restore

"`n" + "="*80 | Add-Content $ReportFile
"RESTORE VALIDATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

$BackupSession = New-PSSession -ComputerName $BackupServer

$RestoreResult = Invoke-Command -Session $BackupSession -ScriptBlock {
    param($ClusterName)
    Import-Module DataProtectionManager

    $TargetPath = "C:\ClusterStorage\UserStorage_1\RestoreTest"

    # Get a VM datasource with recovery points
    $PG = Get-DPMProtectionGroup | Select-Object -First 1
    $DS = $PG | Get-DPMDatasource | Where-Object { $_.Type -match "Hyper-V" } | Select-Object -First 1

    if ($DS) {
        # Get latest recovery point
        $RecoveryPoints = Get-DPMRecoveryPoint -Datasource $DS
        $LatestRP = $RecoveryPoints | Sort-Object BackupTime -Descending | Select-Object -First 1

        if ($LatestRP) {
            # Perform restore to alternate location
            $RestoreStart = Get-Date

            # Recovery option for a restore to an alternate Hyper-V location.
            # -HyperVDatasource is a switch; the target folder goes in -TargetLocation.
            $ROpt = New-DPMRecoveryOption -HyperVDatasource `
                -TargetServer $ClusterName `
                -RecoveryLocation AlternateHyperVServer `
                -RecoveryType Recover `
                -TargetLocation $TargetPath

            $Job = Restore-DPMRecoverableItem -RecoverableItem $LatestRP -RecoveryOption $ROpt

            # Wait for completion (max 60 minutes)
            $Timeout = (Get-Date).AddMinutes(60)
            while (-not $Job.HasCompleted -and (Get-Date) -lt $Timeout) {
                Start-Sleep -Seconds 30
            }

            $RestoreEnd = Get-Date

            [PSCustomObject]@{
                VMName = $DS.Name
                RecoveryPoint = $LatestRP.BackupTime
                Status = [string]$Job.Status
                Duration = ($RestoreEnd - $RestoreStart).ToString("hh\:mm\:ss")
                TargetPath = $TargetPath
            }
        } else {
            [PSCustomObject]@{
                VMName = $DS.Name
                RecoveryPoint = "None available"
                Status = "NoRecoveryPoints"
                Duration = "N/A"
                TargetPath = "N/A"
            }
        }
    } else {
        [PSCustomObject]@{
            VMName = "None"
            RecoveryPoint = "N/A"
            Status = "NoHyperVDatasources"
            Duration = "N/A"
            TargetPath = "N/A"
        }
    }
} -ArgumentList $ClusterName

"`nTest Restore Results:" | Add-Content $ReportFile
" VM Name: $($RestoreResult.VMName)" | Add-Content $ReportFile
" Recovery Point: $($RestoreResult.RecoveryPoint)" | Add-Content $ReportFile
" Restore Status: $($RestoreResult.Status)" | Add-Content $ReportFile
" Duration: $($RestoreResult.Duration)" | Add-Content $ReportFile
" Target Path: $($RestoreResult.TargetPath)" | Add-Content $ReportFile

Remove-PSSession $BackupSession

4.2 Verify Restored VM

# If restore succeeded, verify the VM
if ($RestoreResult.Status -eq "Succeeded") {
    $RestoredVMPath = $RestoreResult.TargetPath

    # Check if a VM config exists (use the first one if several are found)
    $VMConfig = Get-ChildItem -Path $RestoredVMPath -Filter "*.vmcx" -Recurse -ErrorAction SilentlyContinue |
        Select-Object -First 1

    if ($VMConfig) {
        "`nRestored VM Verification:" | Add-Content $ReportFile
        " VM Config Found: $($VMConfig.FullName)" | Add-Content $ReportFile

        # Import and verify VM (don't start)
        try {
            $ImportedVM = Import-VM -Path $VMConfig.FullName -Copy -GenerateNewId
            " Import Status: SUCCESS" | Add-Content $ReportFile
            " VM Name: $($ImportedVM.Name)" | Add-Content $ReportFile
            " VM State: $($ImportedVM.State)" | Add-Content $ReportFile

            # Cleanup: Remove test VM
            Remove-VM -VM $ImportedVM -Force
            Remove-Item -Path $RestoredVMPath -Recurse -Force
            " Cleanup: Restored VM removed" | Add-Content $ReportFile
        } catch {
            " Import Status: FAILED - $($_.Exception.Message)" | Add-Content $ReportFile
        }
    } else {
        " WARNING: VM config not found in restore location" | Add-Content $ReportFile
    }
} else {
    " Skipping VM verification (restore did not succeed)" | Add-Content $ReportFile
}

4.3 File-Level Restore Test

"`nFile-Level Restore Test:" | Add-Content $ReportFile

$BackupSession = New-PSSession -ComputerName $BackupServer

$FileRestoreResult = Invoke-Command -Session $BackupSession -ScriptBlock {
    Import-Module DataProtectionManager

    # Get a file system datasource
    $DS = Get-DPMDatasource | Where-Object { $_.Type -eq "FileSystem" } | Select-Object -First 1

    if ($DS) {
        # Sort explicitly so the newest recovery point is selected
        $RP = Get-DPMRecoveryPoint -Datasource $DS | Sort-Object BackupTime | Select-Object -Last 1

        if ($RP) {
            # Browse the top level of the recovery point (this checks that it
            # can be read; it does not restore any files)
            $Items = Get-DPMRecoverableItem -RecoverableItem $RP -BrowseType Child

            [PSCustomObject]@{
                DataSource = $DS.Name
                RecoveryPoint = $RP.BackupTime
                ItemCount = $Items.Count
                Status = "Browsable"
            }
        } else {
            [PSCustomObject]@{
                DataSource = $DS.Name
                RecoveryPoint = "None"
                ItemCount = 0
                Status = "NoRecoveryPoints"
            }
        }
    } else {
        [PSCustomObject]@{
            DataSource = "None"
            RecoveryPoint = "N/A"
            ItemCount = 0
            Status = "NoFileSystemDatasources"
        }
    }
}

" Data Source: $($FileRestoreResult.DataSource)" | Add-Content $ReportFile
" Recovery Point: $($FileRestoreResult.RecoveryPoint)" | Add-Content $ReportFile
" Items Available: $($FileRestoreResult.ItemCount)" | Add-Content $ReportFile
" Status: $($FileRestoreResult.Status)" | Add-Content $ReportFile

Remove-PSSession $BackupSession

Part 5: Recovery Point Verification

5.1 Check Recovery Point Inventory

"`n" + "="*80 | Add-Content $ReportFile
"RECOVERY POINT INVENTORY" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

$BackupSession = New-PSSession -ComputerName $BackupServer

$RPInventory = Invoke-Command -Session $BackupSession -ScriptBlock {
    Import-Module DataProtectionManager

    $AllDataSources = Get-DPMDatasource

    $AllDataSources | ForEach-Object {
        $DS = $_
        $RPs = Get-DPMRecoveryPoint -Datasource $DS

        [PSCustomObject]@{
            DataSource = $DS.Name
            Type = $DS.Type
            TotalRecoveryPoints = $RPs.Count
            OldestRP = if ($RPs) { ($RPs | Sort-Object BackupTime | Select-Object -First 1).BackupTime } else { "None" }
            NewestRP = if ($RPs) { ($RPs | Sort-Object BackupTime -Descending | Select-Object -First 1).BackupTime } else { "None" }
            RetentionDays = if ($RPs.Count -gt 1) {
                (($RPs | Sort-Object BackupTime -Descending | Select-Object -First 1).BackupTime -
                 ($RPs | Sort-Object BackupTime | Select-Object -First 1).BackupTime).Days
            } else { 0 }
        }
    }
}

"`nRecovery Point Summary:" | Add-Content $ReportFile
$RPInventory | Format-Table DataSource, Type, TotalRecoveryPoints, OldestRP, NewestRP, RetentionDays -AutoSize | Out-String | Add-Content $ReportFile

Remove-PSSession $BackupSession

Part 6: RPO/RTO Documentation

6.1 Calculate Actual RPO

"`n" + "="*80 | Add-Content $ReportFile
"RPO/RTO DOCUMENTATION" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

# RPO template: the Actual RPO column is filled in manually from the
# Recovery Point Inventory above (this block does not calculate it)
$ActualRPO = @"

RECOVERY POINT OBJECTIVE (RPO):

| Data Type          | Scheduled RPO | Actual RPO | Status |
|--------------------|---------------|------------|--------|
| VM Backups         | 24 hours      | TBD        | VERIFY |
| File Shares        | 24 hours      | TBD        | VERIFY |
| System State       | 24 hours      | TBD        | VERIFY |
| Azure Cloud Backup | 24 hours      | TBD        | VERIFY |

Note: Actual RPO is the time since the last successful backup.
 Review the Recovery Point Inventory above for actual values.

"@
$ActualRPO | Add-Content $ReportFile
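
As the note says, actual RPO is the time since the last successful backup, so it can be derived from the `NewestRP` values collected in Part 5 instead of being left as TBD. A minimal sketch is shown below. `Get-ActualRpo` is a hypothetical helper, the sample inventory is made-up data with the same `DataSource` and `NewestRP` properties as `$RPInventory`, and the 24-hour default mirrors the scheduled RPO in the table.

```powershell
# Sketch: derive actual RPO (hours since newest recovery point) per datasource.
# Get-ActualRpo is a hypothetical helper, not part of any module.
function Get-ActualRpo {
    param(
        [object[]]$Inventory,
        [double]$TargetHours = 24,        # scheduled RPO from the table above
        [datetime]$Now = (Get-Date)
    )

    foreach ($item in $Inventory) {
        if ($item.NewestRP -is [datetime]) {
            $age    = [math]::Round(($Now - $item.NewestRP).TotalHours, 1)
            $status = if ($age -le $TargetHours) { 'PASS' } else { 'FAIL' }
        } else {
            # The inventory stores the string "None" when no recovery point exists
            $age    = $null
            $status = 'NO RECOVERY POINT'
        }

        [PSCustomObject]@{
            DataSource        = $item.DataSource
            ScheduledRpoHours = $TargetHours
            ActualRpoHours    = $age
            Status            = $status
        }
    }
}

# Illustrative data only, evaluated against a fixed reference time
$ReferenceTime = [datetime]'2024-01-02T12:00:00'
$SampleInventory = @(
    [PSCustomObject]@{ DataSource = 'VM01'; NewestRP = [datetime]'2024-01-02T02:00:00' },  # 10 hours old
    [PSCustomObject]@{ DataSource = 'FS01'; NewestRP = [datetime]'2023-12-31T12:00:00' },  # 48 hours old
    [PSCustomObject]@{ DataSource = 'VM02'; NewestRP = 'None' }
)

$RpoResults = @(Get-ActualRpo -Inventory $SampleInventory -Now $ReferenceTime)
$RpoResults | Format-Table -AutoSize
```

In the runbook itself, `Get-ActualRpo -Inventory $RPInventory | Format-Table -AutoSize | Out-String | Add-Content $ReportFile` would append measured values under the template table.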

6.2 Document RTO

$RTODoc = @"

RECOVERY TIME OBJECTIVE (RTO):

| Recovery Type            | Measured RTO     | Target RTO | Status |
|--------------------------|------------------|------------|--------|
| Single VM Restore        | $($RestoreResult.Duration) | < 2 hours  | $(if($RestoreResult.Status -eq "Succeeded"){"PASS"}else{"VERIFY"}) |
| File/Folder Restore      | < 15 minutes     | < 30 min   | VERIFY |
| Full Cluster Recovery    | 4-8 hours        | < 8 hours  | N/A    |
| Azure Site Recovery (DR) | < 2 hours        | < 4 hours  | N/A    |

Note: Only the Single VM Restore duration is measured by this script.
 The other values are estimates; Part 4.3 browses a recovery point but does not time a file restore.

Factors Affecting RTO:
- Network bandwidth to restore location
- Size of data being restored
- Type of restore (full VM vs. file-level)
- Storage performance at target

"@
$RTODoc | Add-Content $ReportFile
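
The Status column for Single VM Restore reports PASS whenever the restore job succeeded, regardless of how long it took. A minimal sketch of comparing the measured duration with the target is shown below. `Test-RtoTarget` is a hypothetical helper, and the duration strings are illustrative values in the same `hh:mm:ss` format as `$RestoreResult.Duration`.

```powershell
# Sketch: compare a measured "hh:mm:ss" duration with a target RTO.
# Test-RtoTarget is a hypothetical helper, not part of any module.
function Test-RtoTarget {
    param(
        [string]$Duration,      # e.g. $RestoreResult.Duration
        [timespan]$Target
    )

    $measured = [timespan]::Zero
    # Values such as "N/A" cannot be parsed and need manual verification
    if (-not [timespan]::TryParse($Duration, [ref]$measured)) { return 'VERIFY' }

    if ($measured -le $Target) { 'PASS' } else { 'FAIL' }
}

$TwoHours = [timespan]::FromHours(2)   # target RTO for a single VM restore

$WithinTarget = Test-RtoTarget -Duration '00:42:10' -Target $TwoHours   # PASS
$OverTarget   = Test-RtoTarget -Duration '02:30:00' -Target $TwoHours   # FAIL
$Unmeasured   = Test-RtoTarget -Duration 'N/A'      -Target $TwoHours   # VERIFY

"Within target: $WithinTarget; over target: $OverTarget; unmeasured: $Unmeasured"
```

A restore that succeeds but exceeds the two-hour target would then show FAIL instead of PASS in the RTO table.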

Part 7: Azure Site Recovery Validation (If Configured)

7.1 Check ASR Replication Status

"`n" + "="*80 | Add-Content $ReportFile
"AZURE SITE RECOVERY (IF CONFIGURED)" | Add-Content $ReportFile
"="*80 | Add-Content $ReportFile

# Check for a Recovery Services vault ($ResourceGroup must be set beforehand).
# Note: the az backup commands below report Azure Backup state. A vault is
# required for ASR, but its presence alone does not confirm that replication
# is configured; verify replicated items in the Azure Portal.
$RecoveryVault = az backup vault list --resource-group $ResourceGroup --query "[?properties.provisioningState=='Succeeded']" -o json 2>$null | ConvertFrom-Json

if ($RecoveryVault) {
    $VaultName = $RecoveryVault[0].name

    "`nRecovery Services Vault: $VaultName" | Add-Content $ReportFile

    # List protected backup items in the vault
    $ReplicationItems = az backup item list --vault-name $VaultName --resource-group $ResourceGroup -o json | ConvertFrom-Json

    "`nProtected Items:" | Add-Content $ReportFile
    $ReplicationItems | ForEach-Object {
        " - $($_.properties.friendlyName): $($_.properties.protectionState)" | Add-Content $ReportFile
    }
} else {
    "Azure Site Recovery: Not Configured" | Add-Content $ReportFile
    "Note: ASR provides disaster recovery to Azure for critical VMs" | Add-Content $ReportFile
}
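
The loop above lists every item with its protection state, which makes unprotected items easy to miss in a long report. A minimal sketch of flagging them is shown below. The JSON is illustrative sample data that only mirrors the two properties read above (`properties.friendlyName` and `properties.protectionState`); the item names are made up, and real CLI output contains many more fields.

```powershell
# Sketch: flag items whose protection state is not "Protected".
# The JSON below is illustrative sample data, not real CLI output.
$SampleItemsJson = @'
[
  { "properties": { "friendlyName": "vm-app-01", "protectionState": "Protected" } },
  { "properties": { "friendlyName": "vm-db-01",  "protectionState": "ProtectionStopped" } }
]
'@

$SampleItems = $SampleItemsJson | ConvertFrom-Json

# Piping the variable enumerates the items one by one
$NotProtected = @($SampleItems | Where-Object { $_.properties.protectionState -ne 'Protected' })

if ($NotProtected.Count -gt 0) {
    $NotProtected | ForEach-Object {
        " WARNING: $($_.properties.friendlyName) is in state $($_.properties.protectionState)"
    }
} else {
    " All listed items are in state Protected"
}
```

In the runbook, the same filter could be applied to `$ReplicationItems` and the warning lines piped to `Add-Content $ReportFile`.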

7.2 Test Failover (If ASR Configured)

# Only run if ASR is configured and test failover is approved
if ($RecoveryVault -and $PerformASRTest) {
    "`nASR Test Failover:" | Add-Content $ReportFile

    # This would trigger a test failover to Azure
    # WARNING: This creates resources in Azure and incurs costs

    " Status: Skipped (requires manual approval)" | Add-Content $ReportFile
    " To perform test failover:" | Add-Content $ReportFile
    " 1. Navigate to Recovery Services Vault in Azure Portal" | Add-Content $ReportFile
    " 2. Select Replicated Items" | Add-Content $ReportFile
    " 3. Click Test Failover" | Add-Content $ReportFile
    " 4. Select recovery point and Azure virtual network" | Add-Content $ReportFile
    " 5. Verify VM in Azure, then Cleanup Test Failover" | Add-Content $ReportFile
}

Part 8: Generate Summary

$Summary = @"

================================================================================
BACKUP & DR VALIDATION SUMMARY
================================================================================

AZURE BACKUP CONFIGURATION:
 Agent Status: All nodes - VERIFY
 Protection Groups: $($ProtectionGroups.Count) configured
 VSS Writers: Check report for failures

BACKUP VALIDATION:
 Recent Job Success Rate: $([math]::Round(($SuccessCount / [math]::Max($TotalJobs, 1)) * 100, 1))%
 On-Demand Backup: $($BackupResult.Status)
 Backup Duration: $($BackupResult.Duration)

RESTORE VALIDATION:
 VM Restore Test: $($RestoreResult.Status)
 Restore Duration (RTO): $($RestoreResult.Duration)
 File-Level Restore: $($FileRestoreResult.Status)

RECOVERY POINTS:
 Total Data Sources: $($RPInventory.Count)
 Sources with RPs: $(($RPInventory | Where-Object { $_.TotalRecoveryPoints -gt 0 }).Count)

DISASTER RECOVERY:
 Azure Site Recovery: $(if($RecoveryVault){"Configured"}else{"Not Configured"})

RECOMMENDATIONS:
 1. Verify backup job schedule meets RPO requirements
 2. Document restore procedures in operations runbook
 3. Schedule quarterly restore tests
 4. Consider ASR for critical workloads

================================================================================
Report saved to: $ReportFile
================================================================================

"@

$Summary | Add-Content $ReportFile
Write-Host $Summary

Validation Checklist

| Category     | Requirement                 | Status |
|--------------|-----------------------------|--------|
| Azure Backup | Agent running on all nodes  |        |
| Azure Backup | Protection groups configured |        |
| Azure Backup | VSS writers healthy         |        |
| Backup       | Job success rate ≥ 95%      |        |
| Backup       | On-demand backup succeeds   |        |
| Restore      | VM restore test passes      |        |
| Restore      | File-level restore works    |        |
| RPO          | Recovery points within RPO  |        |
| RTO          | Restore time within RTO     |        |
| ASR          | Configured (if required)    |        |

Next Steps

After backup/DR validation is complete:

  1. Generate consolidated validation report (all steps)
  2. Archive reports to customer handover package
  3. Proceed to Part 8: Validation & Handover