This repository was archived by the owner on Nov 14, 2025. It is now read-only.
Conversation
- Added .github/instructions/ to .gitignore - These are internal development documents used by Copilot - Should not be tracked in version control
- Bug #5: Fix VNet detection to search only peered VNets with most specific match * Modified _find_dns_server_host_vnet() to accept cluster_info parameter * Now searches only VNets peered with cluster VNet (not all in subscription) * Uses highest prefix length for most specific network match * Fixes incorrect VNet identification when large address spaces overlap - Bug #6: Change DNS misconfiguration severity from CRITICAL to WARNING * Updated message: 'cannot resolve' → 'may not resolve' * Custom DNS can work if properly configured * Enhanced recommendation with three specific configuration options - Bug #3: Fix severity display labels in report output * Show actual severity levels: [CRITICAL], [HIGH], [WARNING], [INFO] * Removed non-existent 'error' severity from all mappings * Fixes confusing [ERROR] labels that didn't match actual severity Validated against aks-api-connection test cluster. Ref: .github/instructions/backport-features.instructions.md
…inks The method was returning an empty list, preventing detection of missing private DNS zone VNet links. Now properly: - Extracts VNet IDs from agent pool subnet IDs - Retrieves VNet details including custom DNS servers - Returns list of VNets with DNS configuration This enables the PDNS_DNS_HOST_VNET_LINK_MISSING finding to be generated when DNS server VNets are not linked to private DNS zones. Fixes Bug #4 from backport plan.
Added 5 new finding codes for permission-related issues: - PERMISSION_INSUFFICIENT_VNET - PERMISSION_INSUFFICIENT_VMSS - PERMISSION_INSUFFICIENT_LB - PERMISSION_INSUFFICIENT_NSG - PERMISSION_INSUFFICIENT_DNS These codes will be used when implementing graceful degradation for insufficient RBAC permissions. Ref: .github/instructions/backport-features.instructions.md Enhancement #1
- Fix W0612: Remove unused variable 'i' in dns_analyzer.py - Fix W1514: Add encoding='utf-8' to open() in report_generator.py - Fix W0707: Add 'from exc' to exception re-raising (validators.py, aks-net-diagnostics.py) - Fix W0613: Prefix unused argument with underscore in route_table_analyzer.py - Fix R1732: Use 'with' statement for urllib.request.urlopen() - Apply Black formatting to models.py and misconfiguration_analyzer.py All quality checks now pass: - Black: PASSED - isort: PASSED - Flake8: PASSED (0 violations) - Pylint: 9.99/10 (up from 9.96/10) - Tests: 139/139 PASSED
Implements graceful degradation when user has insufficient RBAC permissions: Azure CLI Executor (azure_cli.py): - Add permission_errors tracking list - Add _is_authorization_error() to detect auth failures from stderr - Add _extract_permission_action() to identify missing permissions - Add execute_with_permission_check() for graceful degradation - Returns None instead of crashing on permission errors Cluster Data Collector (cluster_data_collector.py): - Add findings list to track permission issues - Use execute_with_permission_check() for VNet retrieval - Use execute_with_permission_check() for VNet peerings - Use execute_with_permission_check() for VMSS operations - Create WARNING findings with actionable remediation steps Benefits: - Tool doesn't crash with limited permissions - Users get partial diagnostics results - Clear guidance on which permissions are needed - Enterprise-friendly for strict RBAC environments Remaining work (future commits): - Update remaining analyzers (misconfiguration, outbound, NSG, DNS) - Add 'Analysis Incomplete' indicator in report_generator - Collect permission findings in main script - Testing with different RBAC roles Related: Permission handling plan in .github/instructions/permission-handling-plan.md
Implemented execute_with_permission_check() throughout all analyzer modules: misconfiguration_analyzer.py: - Private DNS zone listing - VNet link listing (2 places) - VNet retrieval for DNS checks (2 places) - Remote VNet retrieval in peerings - Private DNS zone lookup - Added findings list for permission issues outbound_analyzer.py: - Load Balancer listing - Frontend IP configuration retrieval - Public IP details retrieval (2 places) - NAT Gateway listing - Public IP prefix details - Added findings list with detailed remediation nsg_analyzer.py: - Subnet retrieval - NSG details for subnet NSGs - NSG details for NIC NSGs - Added findings list - Creates FindingCode.PERMISSION_INSUFFICIENT_NSG with scope info dns_analyzer.py: - VNet DNS configuration retrieval - Shows warning when permissions insufficient - Already had findings list All analyzers now: ✅ Use execute_with_permission_check() for Azure operations ✅ Track permission errors in findings list ✅ Continue analysis with partial data ✅ Log debug messages for skipped operations ✅ Provide actionable remediation in findings Next steps: - Update report_generator.py with 'Analysis Incomplete' indicator - Collect all permission findings in main script - Test with different RBAC permission levels
Quality fixes: - Removed trailing whitespace in azure_cli.py - Fixed f-string without interpolation (4 instances) - Applied Black formatting to outbound_analyzer.py Test fixes: - Updated test mocks to use execute_with_permission_check() - Fixed test_cluster_data_collector.py (3 tests) - Fixed test_nsg_analyzer.py (2 tests) Results: ✅ All 139/139 tests passing ✅ Pylint 9.98/10 (above 9.5 target) ✅ Flake8 clean ✅ Black formatting clean ✅ Ready for integration testing
Issue: - NSG findings were being created twice: 1. By nsg_analyzer.py (authoritative source) 2. By misconfiguration_analyzer.py (duplicate) - This resulted in duplicate warnings with slightly different wording Examples of duplicates: - 'has rules that may block inter-node communication' (nsg_analyzer) vs 'on subnet has X rule(s) that may block...' (misconfiguration) - 'may block AKS traffic but is overridden' (nsg_analyzer) vs 'could block AKS traffic, but higher-priority...' (misconfiguration) Fix: - Removed _analyze_nsg_issues() method from misconfiguration_analyzer.py - NSG analyzer is now the single source of truth for NSG findings - Added comment explaining why NSG analysis is skipped in misconfiguration Impact: - Each finding now appears exactly once in output - Cleaner, more professional diagnostic reports - No functional changes to NSG analysis logic Tested with aks-overlay cluster - confirmed duplicates eliminated. All 139 tests passing, Pylint 9.98/10.
Issue: - 'Analyzing NSG configuration...' logged twice during execution: 1. In main script's analyze_nsg_configuration() method 2. In NSG analyzer's analyze() method Fix: - Removed log message from main script - NSG analyzer logs the message (single source) - Main script just orchestrates without duplicate logging Tested with aks-overlay - confirmed message appears only once.
Critical bug fix for permission handling: Issue: - execute_with_permission_check() only caught subprocess.CalledProcessError - execute() method converts subprocess errors to AzureCLIError - Result: Permission errors were NOT handled gracefully, tool crashed Fix: - Changed execute_with_permission_check() to catch AzureCLIError - Extract stderr from exception attributes - Properly detect authorization errors and return None - Track permission errors for reporting Impact: - Tool now gracefully degrades with limited permissions - Logs warnings for missing permissions - Continues analysis with partial data - No more crashes on AuthorizationFailed errors Tested with: - Service principal with limited permissions (Contributor on cluster only) - Successfully handled 5+ permission errors gracefully - Completed analysis with partial diagnostics This was a CRITICAL bug that prevented the entire permission handling infrastructure from working correctly.
Enhancement to permission handling: Before: - Permission warnings only logged to console - Not shown in findings summary or detailed report - Users had to scan logs to find permission issues After: - Permission findings collected from all analyzers - Displayed in findings summary - Detailed view shows: * Specific permission required (e.g., Microsoft.Network/virtualNetworks/read) * Example Azure CLI command to grant permission * Clear finding codes (PERMISSION_INSUFFICIENT_VNET, etc.) Implementation: - Added collect_permission_findings() method to main script - Stores analyzer instances to access their findings - Collects findings from: * cluster_data_collector * outbound_analyzer * misconfiguration_analyzer * NSG and DNS analyzers (already collected) - Calls before generate_report() to ensure all findings included Tested with limited permissions (aks-dns-ex1): - Shows 3 permission findings in summary - Detailed view provides clear remediation steps - No duplicate findings This completes the permission handling user experience.
- Modified azure_cli.py execute() to detect permission errors and suppress ERROR logging (only show WARNING from execute_with_permission_check) - Updated route_table_analyzer.py _get_subnet_details() to use execute_with_permission_check() for cleaner permission handling - Changed exception logging from INFO to DEBUG level - Updated test mocks to use execute_with_permission_check() This reduces log noise from 3+ lines per permission error down to 1 clean WARNING line with required permission info. Before: 2025-10-27 14:58:16 - INFO - Error getting subnet details... ERROR: (AuthorizationFailed)... Code: AuthorizationFailed... 2025-10-27 14:58:16 - ERROR - Load Balancer outbound type... After: 2025-10-27 15:03:39 - WARNING - Insufficient permissions to retrieve subnet 'aks-subnet' details. Required: Microsoft.Network/virtualNetworks/read
- Modified misconfiguration_analyzer to accept outbound_analyzer parameter - Check for PERMISSION_INSUFFICIENT_LB finding before creating NO_OUTBOUND_IPS - Updated report_generator to display helpful message when outbound config cannot be retrieved due to insufficient permissions - Improved has_outbound_data check to avoid false positives with empty dicts This prevents the tool from incorrectly reporting 'No outbound IP addresses detected' when the real issue is insufficient permissions to retrieve the load balancer configuration. Before (with limited permissions): **Outbound Configuration:** **Findings Summary:** - [WARNING] No outbound IP addresses detected for loadBalancer outbound type After (with limited permissions): **Outbound Configuration:** - Unable to retrieve loadBalancer configuration due to insufficient permissions - See permission findings below for details **Findings Summary:** - [WARNING] Insufficient permissions to retrieve Load Balancer configuration...
…essage - Added permission check to _print_outbound_connectivity() method - Now displays outbound section in --details mode even when permissions lacking - Shows clear message: 'Unable to retrieve loadBalancer configuration...' - Consistent behavior between summary and detailed reports This ensures the detailed report (--details flag) also shows the outbound configuration section with appropriate messaging when permissions are insufficient, matching the behavior of the summary report.
- Auto-formatted with Black to maintain code style consistency - No functional changes - Pylint score: 9.98/10 - All 139 tests passing
- Updated cluster_data_collector.py to use execute_with_permission_check() for VMSS details - Updated connectivity_tester.py to use execute_with_permission_check() for VMSS list and instances - Added warning message when VMSS instances unavailable for connectivity testing - Tool now skips probe tests gracefully instead of crashing with permission errors - Updated test mocks to match new execute_with_permission_check() calls - All 139 tests passing
- Add PERMISSION_INSUFFICIENT_PROBE_TEST finding code - Detect when probe tests are skipped due to lack of permissions - Display permission finding in both summary and detailed reports - Distinguish between empty VMSS list (no instances) vs None (permission error) - Update tests to handle new permission detection logic
…ndation - Detect AuthorizationFailed errors in connectivity test failures - Provide specific permission recommendation: Microsoft.Compute/virtualMachineScaleSets/virtualMachines/runCommand/action - Include az role assignment command example with Virtual Machine Contributor role - Update finding message to clearly state 'due to insufficient permissions' - Distinguish between permission errors and other test failures
- Add Connectivity Tests section to summary report showing total/passed/failed/errors - Add Connectivity Tests section to detailed report with consistent formatting - Improve status labels: 'Tests Failed' vs 'Could Not Execute' for clarity - Show 'Skipped due to insufficient permissions' when applicable - Consistent formatting between summary and detailed reports
- Add permission handling feature with graceful degradation - Add connectivity test summary to reports - Update documentation with permission requirements - Add data collection flow diagram showing permission handling - Document Azure CLI v2.78.0 bug and workaround BREAKING: None FEATURES: - Permission-aware execution with 4 new finding codes - Connectivity test results in report summary - Graceful continuation with partial data when permissions insufficient FIXES: - Probe test crash with limited permissions DOCUMENTATION: - Enhanced ARCHITECTURE.md with permission handling section - Added permission requirements to security considerations - Visual data collection flow diagram with error paths - Updated CHANGELOG.md with v1.2.0 release notes
Owner
Author
✅ CI Fix AppliedFixed Flake8 violation (F541: f-string without placeholders) in Issue: Line 152 had an f-string without any placeholders Status:
CI should now pass. The workflow will re-run automatically with the new commit. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Release v1.2.0 - Permission Handling Feature
This PR merges the permission handling feature from
developtomainfor release v1.2.0.🎯 Summary
Major feature release adding comprehensive permission handling with graceful degradation when Azure RBAC permissions are insufficient.
✨ New Features
Permission Handling:
PERMISSION_INSUFFICIENT_VNET,PERMISSION_INSUFFICIENT_VMSS,PERMISSION_INSUFFICIENT_LB,PERMISSION_INSUFFICIENT_PROBE_TESTaz role assignmentcommandsConnectivity Test Summary:
🐛 Bug Fixes
--probe-testflag with limited permissions📚 Documentation
ARCHITECTURE.mdwith permission handling sectionCHANGELOG.mdwith v1.2.0 release notesvmss run-command invokereturns "This is a sample script" instead of actual command output📊 Commits Included
Total: 20 commits
🧪 Testing
📦 Release Checklist
Merge Instructions:
Once approved and merged, tag will be created to trigger GitHub Actions for release automation.