This repository was archived by the owner on Nov 14, 2025. It is now read-only.

Commit 3462f1b

Merge pull request #4 from sturrent/develop
Release v1.2.0 - Permission Handling Feature
2 parents 5185a1f + 297768b commit 3462f1b

22 files changed

Lines changed: 955 additions & 230 deletions

.gitignore

Lines changed: 3 additions & 0 deletions

```diff
@@ -62,6 +62,9 @@ venv.bak/
 *~
 .DS_Store
 
+# GitHub Copilot instructions (internal development docs)
+.github/instructions/
+
 # AKS Diagnostics specific
 aks-net-diagnostics_*.json
 .aks_cache/
```

CHANGELOG.md

Lines changed: 47 additions & 0 deletions

```diff
@@ -5,6 +5,53 @@ All notable changes to the AKS Network Diagnostics tool will be documented in th
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.2.0] - 2025-10-28
+
+### Added
+- **Permission Handling Feature**: Graceful handling of insufficient Azure RBAC permissions
+  - New finding codes: `PERMISSION_INSUFFICIENT_VNET`, `PERMISSION_INSUFFICIENT_VMSS`, `PERMISSION_INSUFFICIENT_LB`, `PERMISSION_INSUFFICIENT_PROBE_TEST`
+  - Permission error detection in all Azure CLI operations
+  - Specific permission recommendations with example `az role assignment` commands
+  - Tool continues analysis with partial data when permissions are insufficient
+  - Permission findings displayed in separate section of report
+- **Connectivity Test Summary**: Added connectivity test results to report summary
+  - Shows total tests, passed, failed, and could not execute counts
+  - Consistent formatting between summary and detailed reports
+  - Clear distinction between test failures (ran but failed) and permission errors (could not execute)
+- **Enhanced Error Detection**: Improved connectivity test error analysis
+  - Detects `AuthorizationFailed` patterns in test failures
+  - Identifies missing `runCommand/action` permission
+  - Recommends `Virtual Machine Contributor` role on MC resource group
+
+### Changed
+- **Connectivity Tester**: Updated VMSS operations to use permission-aware execution
+  - VMSS list operations now use `execute_with_permission_check()`
+  - Handles permission errors gracefully without crashing
+  - Creates permission findings when VMSS operations fail
+- **Report Generator**: Enhanced connectivity test reporting
+  - Added connectivity tests section to both summary and detailed reports
+  - Improved labels: "Tests Failed" vs "Could Not Execute"
+  - Shows execution status for skipped tests (permission_denied, cluster stopped)
+
+### Fixed
+- **Probe Test Crash**: Fixed crash when using `--probe-test` flag with limited permissions
+  - Tool now handles permission errors gracefully
+  - Creates appropriate permission findings instead of crashing
+  - Provides actionable guidance on required permissions
+
+### Documentation
+- **Permission Handling Plan**: Added comprehensive plan for permission handling implementation
+  - Documented 15 commits for permission handling feature
+  - Added test scenarios and validation checklist
+  - Documented required permissions for all operations
+
+### Known Issues
+- **Azure CLI Bug**: Azure CLI v2.78.0 has a bug where `vmss run-command invoke` returns "This is a sample script" instead of actual command output
+  - Tracked in Azure CLI issue [#32286](https://github.com/Azure/azure-cli/issues/32286)
+  - Fix merged in PR [#32280](https://github.com/Azure/azure-cli/pull/32280)
+  - Workaround: Use Azure CLI v2.77 or wait for v2.78.1/v2.79.0
+  - Our tool handles this gracefully by reporting execution errors
+
 ## [1.1.2] - 2025-10-17
 
 ### Added
```
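
The connectivity test summary described under 1.2.0 (total tests, passed, failed, and could not execute) amounts to a tally over per-test results. This is a minimal sketch, not the tool's report generator: the `ConnectivityResult` record and the status strings `"passed"`, `"failed"`, `"permission_denied"` and `"cluster_stopped"` are hypothetical stand-ins for whatever the connectivity tester actually records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ConnectivityResult:
    """Hypothetical record for one connectivity test (not the tool's real type)."""
    name: str
    status: str  # assumed values: "passed", "failed", "permission_denied", "cluster_stopped"


# Statuses meaning the test never ran, as opposed to running and failing (assumed names)
NOT_EXECUTED = {"permission_denied", "cluster_stopped"}


def summarize(results: Iterable[ConnectivityResult]) -> Dict[str, int]:
    """Tally results into the four counts shown in the report summary."""
    counts = Counter(r.status for r in results)
    return {
        "total": sum(counts.values()),
        "passed": counts["passed"],
        "failed": counts["failed"],  # the test ran but the check did not pass
        "could_not_execute": sum(counts[s] for s in NOT_EXECUTED),
    }


if __name__ == "__main__":
    sample = [
        ConnectivityResult("dns-resolution", "passed"),
        ConnectivityResult("api-server-https", "failed"),
        ConnectivityResult("egress-check", "permission_denied"),
    ]
    print(summarize(sample))
```

Keeping "failed" and "could not execute" as separate buckets is what lets a report distinguish a broken network path from a missing RBAC role.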

README.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -357,7 +357,7 @@ Generated with `--json-report`, contains:
     "resource_group": "my-rg",
     "subscription": "xxx",
     "generated": "2025-10-03T14:30:00Z",
-    "script_version": "1.1.2"
+    "script_version": "1.2.0"
   },
   "cluster_info": { "..." },
   "findings": [
@@ -499,6 +499,6 @@ Built for Azure Kubernetes Service troubleshooting by the Azure community.
 
 ---
 
-**Version**: 1.1.2
+**Version**: 1.2.0
 **Last Updated**: October 2025
 **Maintained by**: [@sturrent](https://github.com/sturrent)
```

aks-net-diagnostics.py

Lines changed: 39 additions & 18 deletions

```diff
@@ -3,7 +3,7 @@
 AKS Network Diagnostics Script
 Comprehensive read-only analysis of AKS cluster network configuration
 Author: Azure Networking Diagnostics Generator
-Version: 1.1.2
+Version: 1.2.0
 """
 
 import argparse
```

```diff
@@ -190,16 +190,16 @@ def check_prerequisites(self):
             subprocess.run(
                 ["az", "--version"], capture_output=True, check=True, timeout=AZURE_CLI_TIMEOUT, shell=IS_WINDOWS
             )
-        except (subprocess.CalledProcessError, FileNotFoundError):
-            raise FileNotFoundError("Azure CLI is not installed or not in PATH")
+        except (subprocess.CalledProcessError, FileNotFoundError) as exc:
+            raise FileNotFoundError("Azure CLI is not installed or not in PATH") from exc
 
         # Check if logged in
         try:
             subprocess.run(
                 ["az", "account", "show"], capture_output=True, check=True, timeout=AZURE_CLI_TIMEOUT, shell=IS_WINDOWS
             )
-        except subprocess.CalledProcessError:
-            raise PermissionError("Not logged in to Azure. Run 'az login' first.")
+        except subprocess.CalledProcessError as exc:
+            raise PermissionError("Not logged in to Azure. Run 'az login' first.") from exc
 
         # Set subscription if provided
         if self.subscription:
```

```diff
@@ -212,8 +212,8 @@ def check_prerequisites(self):
                     shell=IS_WINDOWS,
                 )
                 self.logger.info(f"Using Azure subscription: {self.subscription}")
-            except subprocess.CalledProcessError:
-                raise ValueError(f"Failed to set subscription: {self.subscription}")
+            except subprocess.CalledProcessError as exc:
+                raise ValueError(f"Failed to set subscription: {self.subscription}") from exc
         else:
             # Get current subscription
             current_sub = self.azure_cli_executor.execute(
```
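
The `except ... as exc` / `raise ... from exc` changes in `check_prerequisites()` keep the original `subprocess` failure attached to the new exception as its `__cause__`, so a traceback shows both the friendly message and the underlying error. A minimal, self-contained illustration of that chaining; the `require_tool` helper is hypothetical and does not invoke the Azure CLI.

```python
import subprocess
import sys
from typing import List


def require_tool(cmd: List[str]) -> None:
    """Raise a friendly error if `cmd` cannot run, chaining the original failure."""
    try:
        subprocess.run(cmd, capture_output=True, check=True, timeout=10)
    except (subprocess.CalledProcessError, FileNotFoundError) as exc:
        # "from exc" stores the low-level error on __cause__ instead of discarding it
        raise FileNotFoundError(f"Could not run {cmd[0]!r} (missing from PATH or exited with an error)") from exc


if __name__ == "__main__":
    # Case 1: binary missing -> cause is FileNotFoundError
    # Case 2: binary runs but exits non-zero -> cause is CalledProcessError
    for cmd in (["definitely-not-a-real-binary-xyz"], [sys.executable, "-c", "raise SystemExit(1)"]):
        try:
            require_tool(cmd)
        except FileNotFoundError as err:
            print(f"{err} (cause: {type(err.__cause__).__name__})")
```

Without `from exc`, Python still records the original error on `__context__`, but the explicit form marks it as the deliberate cause and also satisfies linters that flag bare re-raises inside `except`.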

```diff
@@ -233,20 +233,20 @@ def fetch_cluster_information(self):
 
     def analyze_vnet_configuration(self):
         """Analyze VNet configuration using ClusterDataCollector"""
-        collector = ClusterDataCollector(self.azure_cli_executor, self.logger)
-        self.vnets_analysis = collector.collect_vnet_info(self.agent_pools)
+        self.cluster_data_collector = ClusterDataCollector(self.azure_cli_executor, self.logger)
+        self.vnets_analysis = self.cluster_data_collector.collect_vnet_info(self.agent_pools)
 
     def analyze_outbound_connectivity(self):
         """Analyze outbound connectivity configuration using OutboundConnectivityAnalyzer"""
-        analyzer = OutboundConnectivityAnalyzer(
+        self.outbound_analyzer = OutboundConnectivityAnalyzer(
             cluster_info=self.cluster_info,
             agent_pools=self.agent_pools,
             azure_cli=self.azure_cli_executor,
             logger=self.logger,
         )
 
-        self.outbound_analysis = analyzer.analyze(show_details=self.show_details)
-        self.outbound_ips = analyzer.get_outbound_ips()
+        self.outbound_analysis = self.outbound_analyzer.analyze(show_details=self.show_details)
+        self.outbound_ips = self.outbound_analyzer.get_outbound_ips()
 
     def _analyze_node_subnet_udrs(self):
         """Analyze User Defined Routes on node subnets using RouteTableAnalyzer"""
```

```diff
@@ -260,8 +260,6 @@ def analyze_vmss_configuration(self):
 
     def analyze_nsg_configuration(self):
         """Analyze Network Security Group configuration for AKS nodes using modular NSGAnalyzer"""
-        self.logger.info("Analyzing NSG configuration...")
-
         try:
             # Create NSG analyzer instance with the new modular component
             nsg_analyzer = NSGAnalyzer(
```

```diff
@@ -346,8 +344,8 @@ def _get_current_client_ip(self):
             import urllib.error
             import urllib.request
 
-            response = urllib.request.urlopen("https://api.ipify.org", timeout=5)
-            return response.read().decode("utf-8").strip()
+            with urllib.request.urlopen("https://api.ipify.org", timeout=5) as response:
+                return response.read().decode("utf-8").strip()
         except Exception:
             return None
 
```
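
Wrapping `urllib.request.urlopen()` in a `with` block, as `_get_current_client_ip()` now does, closes the response even when reading or decoding raises. A minimal sketch of the same best-effort lookup as a standalone function; `fetch_public_ip` is a hypothetical name, and the demo passes a `data:` URL (handled by the standard library's built-in `data:` support) so it runs without network access.

```python
import urllib.request
from typing import Optional


def fetch_public_ip(url: str = "https://api.ipify.org", timeout: float = 5) -> Optional[str]:
    """Return the response body as stripped text, or None on any failure."""
    try:
        # The context manager closes the response even if read() or decode() raises
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except Exception:
        # Best-effort lookup: any URL, network, HTTP, or decoding error yields None
        return None


if __name__ == "__main__":
    # Offline demo: a data: URL stands in for the real IP-echo service
    print(fetch_public_ip("data:text/plain,%20203.0.113.7%0A"))
    print(fetch_public_ip("not-a-valid-url"))
```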

```diff
@@ -360,10 +358,10 @@ def check_api_connectivity(self):
 
     def analyze_misconfigurations(self):
         """Analyze potential misconfigurations and failures using MisconfigurationAnalyzer"""
-        analyzer = MisconfigurationAnalyzer(self.azure_cli_executor, self.logger)
+        self.misconfiguration_analyzer = MisconfigurationAnalyzer(self.azure_cli_executor, self.logger)
 
         # Run analysis and get findings
-        findings, cluster_stopped = analyzer.analyze(
+        findings, cluster_stopped = self.misconfiguration_analyzer.analyze(
             cluster_info=self.cluster_info,
             outbound_analysis=self.outbound_analysis,
             outbound_ips=self.outbound_ips,
```

```diff
@@ -372,6 +370,7 @@ def analyze_misconfigurations(self):
             nsg_analysis=self.nsg_analysis,
             api_probe_results=self.api_probe_results,
             vmss_analysis=self.vmss_analysis,
+            outbound_analyzer=self.outbound_analyzer,
         )
 
         # Store results
```

```diff
@@ -409,6 +408,27 @@ def generate_report(self):
         if self.json_report:
             report_gen.save_json_report(self.json_report, file_permissions=DEFAULT_FILE_PERMISSIONS)
 
+    def collect_permission_findings(self):
+        """Collect permission-related findings from all analyzers"""
+        # Collect from cluster data collector
+        if hasattr(self, "cluster_data_collector") and hasattr(self.cluster_data_collector, "findings"):
+            for finding in self.cluster_data_collector.findings:
+                self.findings.append(finding.to_dict() if hasattr(finding, "to_dict") else finding)
+
+        # Collect from outbound analyzer
+        if hasattr(self, "outbound_analyzer") and hasattr(self.outbound_analyzer, "findings"):
+            for finding in self.outbound_analyzer.findings:
+                self.findings.append(finding.to_dict() if hasattr(finding, "to_dict") else finding)
+
+        # Collect from misconfiguration analyzer
+        if hasattr(self, "misconfiguration_analyzer") and hasattr(self.misconfiguration_analyzer, "findings"):
+            for finding in self.misconfiguration_analyzer.findings:
+                self.findings.append(finding.to_dict() if hasattr(finding, "to_dict") else finding)
+
+        # NSG and DNS analyzer findings are already collected in their respective methods
+        # Note: Permission findings are created by analyzers with specific context,
+        # so we don't need to duplicate them from azure_cli.permission_errors
+
     def run(self):
         """Main execution method"""
         self.parse_arguments()
```
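
`collect_permission_findings()` repeats one duck-typed step per analyzer: skip the analyzer if it was never created or has no `findings` attribute, and serialize each finding with `to_dict()` when it offers one. A minimal sketch of that pattern folded into a single loop; the `Finding` dataclass, the `SimpleNamespace` analyzer stand-ins, and `collect_findings` are hypothetical, not the tool's actual classes.

```python
from dataclasses import asdict, dataclass
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


@dataclass
class Finding:
    """Hypothetical finding type exposing to_dict(), as the diff expects."""
    code: str
    message: str

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


def collect_findings(analyzers: List[Optional[object]]) -> List[Any]:
    """Gather findings from analyzers that exist and expose a `findings` attribute."""
    collected: List[Any] = []
    for analyzer in analyzers:
        # getattr with a default covers both "analyzer is None" and "no findings attribute"
        for finding in getattr(analyzer, "findings", None) or []:
            # Serialize objects that know how; pass plain dicts through unchanged
            collected.append(finding.to_dict() if hasattr(finding, "to_dict") else finding)
    return collected


if __name__ == "__main__":
    vnet_collector = SimpleNamespace(findings=[Finding("PERMISSION_INSUFFICIENT_VNET", "Cannot read VNet")])
    outbound = SimpleNamespace(findings=[{"code": "PERMISSION_INSUFFICIENT_LB", "message": "Cannot read LB"}])
    never_ran = None  # e.g. an analyzer that was never instantiated
    print(collect_findings([vnet_collector, outbound, never_ran]))
```

Passing `None` here plays the role of the diff's `hasattr(self, "...")` guard for analyzers whose `analyze_*` step never ran.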

```diff
@@ -426,6 +446,7 @@ def run(self):
         self.analyze_api_server_access()
         self.check_api_connectivity()
         self.analyze_misconfigurations()
+        self.collect_permission_findings()  # Collect all permission findings before reporting
         self.generate_report()
 
 
```

aks_diagnostics/__init__.py

Lines changed: 2 additions & 5 deletions

```diff
@@ -1,9 +1,6 @@
-"""
-AKS Network Diagnostics Package
-Modular package for analyzing AKS cluster network configurations
-"""
+"""AKS Network Diagnostics - Comprehensive AKS network configuration analysis tool"""
 
-__version__ = "1.1.2"
+__version__ = "1.2.0"
 __author__ = "Azure Networking Diagnostics Generator"
 
 # Import only the modules that exist
```

aks_diagnostics/__version__.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -10,6 +10,6 @@
 - Python code uses semantic version without prefix (1.0.0, 2.1.0)
 """
 
-__version__ = "1.1.2"
+__version__ = "1.2.0"
 __author__ = "Azure Networking Diagnostics Generator"
 __description__ = "Comprehensive read-only analysis of AKS cluster network configuration"
```
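
The `__version__.py` docstring states that Python code uses the semantic version without a prefix (for example `1.2.0`, not `v1.2.0`). A check for that convention can be sketched with a regular expression; `is_plain_semver` is a hypothetical helper, and the pattern accepts only the `MAJOR.MINOR.PATCH` core, deliberately rejecting SemVer pre-release and build suffixes.

```python
import re

# MAJOR.MINOR.PATCH, no "v" prefix, no leading zeros (SemVer 2.0.0 core rules)
_CORE = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def is_plain_semver(version: str) -> bool:
    """True if `version` is a bare MAJOR.MINOR.PATCH string such as "1.2.0"."""
    return _CORE.fullmatch(version) is not None


if __name__ == "__main__":
    for candidate in ("1.2.0", "v1.2.0", "1.2", "01.2.0"):
        print(candidate, is_plain_semver(candidate))
```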

aks_diagnostics/azure_cli.py

Lines changed: 122 additions & 5 deletions

```diff
@@ -24,6 +24,7 @@ class AzureCLIExecutor:
     def __init__(self):
         """Initialize Azure CLI executor"""
         self.logger = logging.getLogger("aks_net_diagnostics.azure_cli")
+        self.permission_errors = []  # Track permission issues encountered
 
     def execute(self, cmd: List[str], expect_json: bool = True, timeout: Optional[int] = None) -> Any:
         """
```

```diff
@@ -80,11 +81,16 @@ def execute(self, cmd: List[str], expect_json: bool = True, timeout: Optional[in
             stderr_output = e.stderr.strip() if e.stderr else ""
             stdout_output = e.stdout.strip() if e.stdout else ""
 
-            self.logger.error(f"Azure CLI command failed: {cmd_str}")
-            if stderr_output:
-                self.logger.error(f"Error: {stderr_output}")
-            elif stdout_output:
-                self.logger.error(f"Output: {stdout_output}")
+            # Check if this is a permission error - don't log it as ERROR since it will be handled gracefully
+            is_permission_error = self._is_authorization_error(stderr_output)
+
+            if not is_permission_error:
+                # Only log as ERROR if it's not a permission issue
+                self.logger.error(f"Azure CLI command failed: {cmd_str}")
+                if stderr_output:
+                    self.logger.error(f"Error: {stderr_output}")
+                elif stdout_output:
+                    self.logger.error(f"Output: {stdout_output}")
 
             # Check for authentication errors
             if "az login" in stderr_output.lower() or "authentication" in stderr_output.lower():
```
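
The change to `execute()` stops logging permission failures at ERROR, because the caller is expected to handle them and report them as findings. A minimal sketch of that classify-then-log decision using only the standard `logging` module; `log_cli_failure` is hypothetical, `looks_like_permission_error` is a simplified stand-in for `_is_authorization_error()` that checks only three of its patterns, and the DEBUG trace is an addition the diff itself does not make.

```python
import logging
from typing import List

logger = logging.getLogger("example.azure_cli")


def looks_like_permission_error(stderr: str) -> bool:
    """Simplified stand-in for _is_authorization_error (subset of its patterns)."""
    text = stderr.lower()
    return any(p in text for p in ("authorizationfailed", "does not have authorization", "forbidden"))


def log_cli_failure(cmd: List[str], stderr: str, stdout: str = "") -> bool:
    """Log a failed command; return True if it was classified as a permission error."""
    if looks_like_permission_error(stderr):
        # Leave the user-facing message to the caller; keep only a DEBUG trace
        logger.debug("Permission error from %s", " ".join(cmd))
        return True
    logger.error("Azure CLI command failed: %s", " ".join(cmd))
    if stderr:
        logger.error("Error: %s", stderr)
    elif stdout:
        logger.error("Output: %s", stdout)
    return False
```

Using `%s` arguments instead of f-strings defers string formatting until a handler actually emits the record.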

```diff
@@ -170,3 +176,114 @@ def get_current_subscription(self) -> str:
         if isinstance(result, str) and result.strip():
             return result.strip()
         return ""
+
+    def _is_authorization_error(self, stderr: str) -> bool:
+        """
+        Check if error is due to insufficient permissions
+
+        Args:
+            stderr: Standard error output from Azure CLI
+
+        Returns:
+            True if the error indicates authorization/permission failure
+        """
+        error_lower = stderr.lower()
+
+        auth_patterns = [
+            "authorizationfailed",
+            "does not have authorization",
+            "insufficient privileges",
+            "forbidden",
+            "the client",  # Common in "The client 'user@example.com' does not have authorization..."
+            "(401)",  # HTTP 401 Unauthorized
+            "permission",  # Generic permission errors
+        ]
+
+        return any(pattern in error_lower for pattern in auth_patterns)
+
+    def _extract_permission_action(self, stderr: str, command: List[str]) -> str:
+        """
+        Extract or infer the missing permission from error or command
+
+        Args:
+            stderr: Standard error output from Azure CLI
+            command: The Azure CLI command that failed (without 'az' prefix)
+
+        Returns:
+            Permission action string (e.g., "Microsoft.Network/virtualNetworks/read")
+        """
+        # Try to extract from error message first
+        error_lower = stderr.lower()
+
+        # Common permission patterns in Azure CLI errors
+        if "microsoft.network/virtualnetworks/read" in error_lower:
+            return "Microsoft.Network/virtualNetworks/read"
+        elif "microsoft.compute/virtualmachinescalesets" in error_lower:
+            return "Microsoft.Compute/virtualMachineScaleSets/read"
+        elif "microsoft.network/networksecuritygroups" in error_lower:
+            return "Microsoft.Network/networkSecurityGroups/read"
+        elif "microsoft.network/loadbalancers" in error_lower:
+            return "Microsoft.Network/loadBalancers/read"
+        elif "microsoft.network/privatednszones" in error_lower:
+            return "Microsoft.Network/privateDnsZones/read"
+
+        # Infer from command if not in error message
+        cmd_str = " ".join(command).lower()
+
+        if "vnet" in cmd_str and ("show" in cmd_str or "list" in cmd_str):
+            return "Microsoft.Network/virtualNetworks/read"
+        elif "vmss" in cmd_str and ("show" in cmd_str or "list" in cmd_str):
+            return "Microsoft.Compute/virtualMachineScaleSets/read"
+        elif "nsg" in cmd_str and "show" in cmd_str:
+            return "Microsoft.Network/networkSecurityGroups/read"
+        elif ("lb" in cmd_str or "load-balancer" in cmd_str) and "show" in cmd_str:
+            return "Microsoft.Network/loadBalancers/read"
+        elif "private-dns" in cmd_str:
+            return "Microsoft.Network/privateDnsZones/read"
+        elif "aks" in cmd_str and ("show" in cmd_str or "list" in cmd_str):
+            return "Microsoft.ContainerService/managedClusters/read"
+
+        return "Unknown permission (check Azure Activity Log for details)"
+
+    def execute_with_permission_check(
+        self, cmd: List[str], context: str, expect_json: bool = True, timeout: Optional[int] = None
+    ) -> Optional[Any]:
+        """
+        Execute Azure CLI command with permission error handling.
+
+        Args:
+            cmd: Azure CLI command parts (without 'az' prefix)
+            context: Human-readable context for error messages (e.g., "retrieve VNet details")
+            expect_json: Whether to parse output as JSON
+            timeout: Optional custom timeout in seconds
+
+        Returns:
+            Command output or None if permission error occurred
+
+        Raises:
+            Other exceptions if not a permission error
+        """
+        try:
+            return self.execute(cmd, expect_json=expect_json, timeout=timeout)
+        except AzureCLIError as e:
+            # Check if it's a permission error
+            stderr_output = getattr(e, "stderr", str(e))
+
+            if self._is_authorization_error(stderr_output):
+                # Permission error - log and track
+                action = self._extract_permission_action(stderr_output, cmd)
+
+                permission_error = {
+                    "context": context,
+                    "command": " ".join(["az"] + cmd),
+                    "permission": action,
+                    "error": stderr_output,
+                }
+                self.permission_errors.append(permission_error)
+
+                self.logger.warning(f"Insufficient permissions to {context}. Required: {action}")
+
+                return None  # Graceful degradation
+
+            # Not a permission error, re-raise
+            raise
```