Commit 0df87d4

fix: resolve Vale spelling errors in k8s_infra_monitor README

Wrap Kubernetes-specific terms (OOMKilled, PIDPressure, CrashLoopBackOff, MemoryPressure, DiskPressure, kubeconfig) in backticks so Vale treats them as code. Replace "Triaging" with "Investigating" and "siloed" with "isolated" to use standard dictionary words.

Signed-off-by: futhgar <jmaldonado.rosa@gmail.com>
1 parent 7d61085 commit 0df87d4
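
The backtick wrapping described in the commit message could be scripted rather than done by hand. The following is a minimal sketch, not part of this commit: `wrap_terms` and `K8S_TERMS` are hypothetical names, and the helper only handles inline code spans on a single line (it does not skip fenced code blocks).

```python
import re

# Terms listed in the commit message; the helper itself is illustrative only.
K8S_TERMS = [
    "OOMKilled", "PIDPressure", "CrashLoopBackOff",
    "MemoryPressure", "DiskPressure", "kubeconfig",
]


def wrap_terms(line: str, terms=K8S_TERMS) -> str:
    """Wrap bare occurrences of each term in backticks.

    Text already inside an inline code span is left untouched, so the
    function is safe to run more than once on the same line.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    # Splitting on a capturing group keeps the code spans in the result:
    # even indexes are plain prose, odd indexes are existing code spans.
    parts = re.split(r"(`[^`]*`)", line)
    for i in range(0, len(parts), 2):
        parts[i] = pattern.sub(r"`\1`", parts[i])
    return "".join(parts)
```

Because `\b` treats the underscore as a word character, identifiers such as `kubeconfig_path` are not split even when they appear outside a code span.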

1 file changed

Lines changed: 7 additions & 7 deletions

examples/k8s_infra_monitor/README.md

@@ -63,12 +63,12 @@ export NVIDIA_API_KEY=<YOUR_API_KEY>
 
 ## Use Case Description
 
-Kubernetes clusters generate a constant stream of operational signals — node conditions, pod status changes, events, and resource metrics. Triaging these signals manually is time-consuming, especially in clusters running dozens of workloads across multiple namespaces.
+Kubernetes clusters generate a constant stream of operational signals — node conditions, pod status changes, events, and resource metrics. Investigating these signals manually is time-consuming, especially in clusters running dozens of workloads across multiple namespaces.
 
 This example provides an agentic system that:
 
-1. **Gathers node diagnostics**: Checks node readiness, conditions (MemoryPressure, DiskPressure, PIDPressure), and resource utilization via `kubectl top`.
-2. **Scans pod health**: Identifies unhealthy pods (CrashLoopBackOff, OOMKilled, Pending, Evicted) and flags containers with high restart counts.
+1. **Gathers node diagnostics**: Checks node readiness, conditions (`MemoryPressure`, `DiskPressure`, `PIDPressure`), and resource utilization via `kubectl top`.
+2. **Scans pod health**: Identifies unhealthy pods (`CrashLoopBackOff`, `OOMKilled`, `Pending`, `Evicted`) and flags containers with high restart counts.
 3. **Collects cluster events**: Retrieves recent Warning events and correlates them with affected resources.
 4. **Analyzes resource pressure**: Detects nodes approaching CPU or memory thresholds and flags active pressure conditions.
 5. **Classifies severity**: Uses an LLM to classify the overall incident severity based on collected evidence.
@@ -79,7 +79,7 @@ This example provides an agentic system that:
 An agentic approach provides significant advantages over static dashboards or rule-based alerting:
 
 - **Contextual investigation**: The agent decides which tools to call based on the query, rather than running every check every time.
-- **Cross-signal correlation**: Unlike siloed monitoring tools, the agent correlates data from nodes, pods, events, and resources to identify root causes (e.g., OOMKilled pods + MemoryPressure condition = memory exhaustion on a specific node).
+- **Cross-signal correlation**: Unlike isolated monitoring tools, the agent correlates data from nodes, pods, events, and resources to identify root causes (e.g., `OOMKilled` pods + `MemoryPressure` condition = memory exhaustion on a specific node).
 - **Natural language reports**: Produces human-readable incident summaries that can be directly shared with team members or fed into ticketing systems.
 
 ## How It Works
@@ -123,7 +123,7 @@ functions:
 
 - `offline_mode`: When `true`, tools return pre-defined responses from the offline scenario dataset.
 - `cpu_threshold_percent` / `memory_threshold_percent`: Configurable thresholds for resource pressure alerts.
-- `kubeconfig_path`: Optional path to a kubeconfig file for live mode. Defaults to the standard `kubectl` config.
+- `kubeconfig_path`: Optional path to a `kubeconfig` file for live mode. Defaults to the standard `kubectl` config.
 
 #### Workflow
 
@@ -152,7 +152,7 @@ Offline mode uses predefined scenarios to simulate cluster issues without requir
 
 Three scenarios are included:
 - **`node-not-ready`**: A worker node becomes unreachable, causing pod evictions.
-- **`memory-pressure`**: Multiple pods are OOMKilled due to memory exhaustion on a worker node.
+- **`memory-pressure`**: Multiple pods are `OOMKilled` due to memory exhaustion on a worker node.
 - **`healthy-cluster`**: Normal cluster operations with no issues.
 
 ```bash
@@ -196,4 +196,4 @@ nat run \
 You can customize the live mode configuration to:
 - Target specific namespaces with the `namespaces` list in `pod_health_check`.
 - Adjust resource thresholds with `cpu_threshold_percent` and `memory_threshold_percent`.
-- Point to a specific kubeconfig file with `kubeconfig_path`.
+- Point to a specific `kubeconfig` file with `kubeconfig_path`.
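
The cross-signal correlation mentioned in the README text above (`OOMKilled` pods plus a `MemoryPressure` condition pointing to memory exhaustion on a specific node) can be illustrated with a small deterministic sketch. This is not the example's implementation, which delegates correlation to an LLM agent; `PodStatus`, `memory_exhausted_nodes`, and the input shapes are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PodStatus:
    """Hypothetical, simplified view of a pod for this sketch."""
    name: str
    node: str
    reason: str  # e.g. "OOMKilled", "CrashLoopBackOff"


def memory_exhausted_nodes(pods, node_conditions):
    """Return nodes that show both signals at once.

    pods: iterable of PodStatus
    node_conditions: dict mapping node name -> set of active condition names
    """
    # Signal 1: nodes hosting at least one OOMKilled pod.
    oom_nodes = {p.node for p in pods if p.reason == "OOMKilled"}
    # Signal 2: nodes currently reporting MemoryPressure.
    pressured = {
        node for node, conds in node_conditions.items()
        if "MemoryPressure" in conds
    }
    # A node is flagged only when both signals agree.
    return sorted(oom_nodes & pressured)
```

Requiring both signals is the point of the correlation: an `OOMKilled` pod alone may only indicate a container limit set too low, while node-level `MemoryPressure` alongside it suggests the node itself is running out of memory.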
