azure-diagnostics: Add Inspektor Gadget Reference #1961
mqasimsarfraz wants to merge 1 commit into microsoft:main
Pull request overview
This PR adds Inspektor Gadget guidance to the azure-diagnostics AKS troubleshooting skill to enable deeper node/pod-level diagnostics when standard Azure/Kubernetes evidence is inconclusive.
Changes:
- Added a new Inspektor Gadget reference document with command patterns, filters, and a symptom-to-gadget map.
- Introduced a “Deep Diagnostics Flow” reference and added contextual IG command snippets to AKS networking, node-issues, and pod-failures playbooks.
- Updated the main AKS troubleshooting guide to include Inspektor Gadget as the third step in the evidence order.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| plugin/skills/azure-diagnostics/aks-troubleshooting/references/inspektor-gadget.md | New IG reference: base command pattern, gadget catalog, and symptom mapping. |
| plugin/skills/azure-diagnostics/aks-troubleshooting/references/command-flows.md | Adds an IG “Deep Diagnostics Flow” and link to the IG reference. |
| plugin/skills/azure-diagnostics/aks-troubleshooting/pod-failures.md | Adds IG deep-diagnostics commands for CrashLoopBackOff/OOM-style investigations. |
| plugin/skills/azure-diagnostics/aks-troubleshooting/node-issues.md | Adds IG guidance for PID pressure / unknown process load. |
| plugin/skills/azure-diagnostics/aks-troubleshooting/networking.md | Adds IG networking/DNS tracing commands for deeper connectivity troubleshooting. |
| plugin/skills/azure-diagnostics/aks-troubleshooting/aks-troubleshooting.md | Updates evidence order and guidance to incorporate IG as a third-step diagnostic tool. |
jongio left a comment
A few observations on top of the existing bot review.
Doc-only PR adds Inspektor Gadget commands across four AKS troubleshooting pages plus a new reference. Structure and placement match the existing skill conventions. Three things worth a look below; nothing blocking.
jongio left a comment
One small consistency nit on the new networking deep-diagnostics block. Otherwise the latest pass addresses everything I flagged earlier.
mqasimsarfraz left a comment
Thanks @jongio for quick reviews! :)
I have addressed the outstanding comment; please feel free to let me know if you have any other thoughts!
jongio left a comment
Confirmed - the trace_tcpdrop reference is out of networking.md and the catalog stays the single source of truth. Nothing else from my side. Thanks for the quick turnarounds.
jongio left a comment
Walked through all six files end-to-end. The reference is well-organized - single version pin, clear base command pattern, and the symptom-to-gadget map makes the right gadget easy to find. Contextual IG sections in networking, node-issues, and pod-failures point back to the reference without duplicating it.
No new issues since my last pass. Ship it.
Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
@jongio @tmeschter there were conflicts again :(, I pushed the latest changes! The PR is ready to go from my side, so please feel free to have a look (hopefully for the last time)!
jongio left a comment
Re-reviewed after rebase. All feedback from earlier rounds addressed - version pin centralized, trace_tcpdrop removed from inline references, evidence order corrected. Structure is clean: single reference file for the catalog, contextual sections in networking/node-issues/pod-failures don't duplicate it. CI green.
This PR introduces Inspektor Gadget to allow deeper troubleshooting of AKS clusters. All commands use kubectl debug, so no additional installation is required.
Also, gadget commands are added contextually in networking, node-issues, and pod-failures, with the full reference in references/inspektor-gadget.md.
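For readers who have not opened the reference file, here is a rough sketch of what a kubectl debug based gadget invocation could look like. It is not copied from the PR: the helper name ig_debug_cmd, the node name, and the default version tag are hypothetical, and the ghcr.io/inspektor-gadget/ig image and the sysadmin debug profile are assumed from upstream Inspektor Gadget usage. The helper only prints the command, so nothing runs against a cluster.

```shell
#!/bin/sh
# Sketch only: prints (does not run) a kubectl debug command that would start
# an Inspektor Gadget gadget on a node. All arguments are placeholders.
ig_debug_cmd() {
  node="$1"              # target node name (hypothetical example below)
  gadget="$2"            # gadget name, e.g. trace_dns
  version="${3:-latest}" # assumed tag; the PR pins its own version in the reference
  printf 'kubectl debug --profile=sysadmin node/%s -it --image=ghcr.io/inspektor-gadget/ig:%s -- ig run %s:%s\n' \
    "$node" "$version" "$gadget" "$version"
}

# Example: print the command for tracing DNS on one node.
ig_debug_cmd aks-nodepool1-00000000-vmss000000 trace_dns
```

Review the printed command before running it; the exact flags, filters, and pinned version used by the skill live in references/inspektor-gadget.md.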
cc: @julia-yin