Commit 3667903

articles
1 parent b43ae9d commit 3667903

1 file changed

Lines changed: 17 additions & 20 deletions

docs/home-lab/articles/ai-log-summary/ai-log-sre.md

@@ -3,15 +3,15 @@
**Project Status:** ✅ Operational
**Components:** Grafana Loki, Google Gemini 2.0 Flash, Home Assistant, Unraid, Python

-## 1. The Problem: Log Fatigue
+### 1. The Problem: Log Fatigue
In a distributed homelab (Unraid, Proxmox VE, Edge Servers, DNS (Adguard + Unbound), Traefik, Unifi Network, Tailscale ...), logs are scattered everywhere.
* **Volume:** My servers generate ~1GB of text logs daily.
* **Visibility:** I only looked at logs *after* I noticed something was broken.
* **Noise:** 99% of logs are "Info", masking the 1% "Critical" errors.

I needed a system that wouldn't just *store* logs, but actively *analyze* them and tap me on the shoulder only when it found something I actually needed to see.

-## 2. The Solution
+### 2. The Solution
I built a centralized logging pipeline using **Grafana** and **Loki** (for storage) and a custom **Python + Gemini** script (for analysis).

Instead of feeding raw logs to an LLM (which is slow and expensive), I implemented a **"Pre-processing Engine"** that:
@@ -22,13 +22,13 @@ Instead of feeding raw logs to an LLM (which is slow and expensive), I implement



-<video width="600px" autoplay loop muted playsinline>
+<video width="100%" autoplay loop muted playsinline>
<source src="../ai-home-assistant-dashboard.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>


-## 3. Architecture Diagram
+### 3. Architecture Diagram

```mermaid
graph TD
@@ -69,7 +69,7 @@ graph TD
style Grafana fill:#ff9900,stroke:#333,stroke-width:2px,color:white
```

-## 4. Key Features
+### 4. Key Features
* **Cost Efficient:** Uses client-side deduplication to reduce token usage by ~95%.
* **Massive Context:** Can analyze up to 50,000 log lines per run.
* **Self-Healing:** If the report fails, Home Assistant retains the last known state.
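
The client-side deduplication mentioned in the feature list can be sketched roughly as follows. This is a minimal illustration, not the article's actual pre-processing engine: the normalisation patterns and the names `normalise`, `deduplicate`, and `render` are all hypothetical, and the real token savings depend entirely on how repetitive the logs are.

```python
import re
from collections import Counter

# Hypothetical normalisation rules (assumptions, not taken from the article).
# Variable fragments are replaced with placeholders so that lines differing
# only in timestamps, addresses, or counters collapse into one template.
_PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?"), "<ts>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
]


def normalise(line: str) -> str:
    """Replace variable fragments of a log line with fixed placeholders."""
    for pattern, token in _PATTERNS:
        line = pattern.sub(token, line)
    return line.strip()


def deduplicate(lines):
    """Return (template, count) pairs, most frequent template first."""
    counts = Counter(normalise(line) for line in lines if line.strip())
    return counts.most_common()


def render(deduped):
    """Format the templates compactly for inclusion in an LLM prompt."""
    return "\n".join(f"[x{count}] {template}" for template, count in deduped)


if __name__ == "__main__":
    sample = [
        "2024-05-01T07:00:01Z sshd[812]: Failed password for root from 192.168.1.10 port 50122",
        "2024-05-01T07:00:05Z sshd[813]: Failed password for root from 192.168.1.11 port 50200",
        "2024-05-01T07:01:00Z kernel: disk sda SMART failure",
    ]
    print(render(deduplicate(sample)))
```

Sending one template plus a count, instead of every repeated line, is what keeps the prompt small; the placeholders trade away exact values, so the raw lines should stay queryable in Loki for follow-up.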
@@ -87,21 +87,21 @@ graph TD
<br>
<br>

-# 🛠️ Implementation Guide
+## 🛠️ Implementation Guide

This guide details how to reproduce the "AI Log SRE" stack.

-## Prerequisites
+### Prerequisites
* **Unraid Server** (or any Docker host).
* **Google Gemini API Key** (Free tier is sufficient, Paid recommended for high limits).
* **Home Assistant** (for notifications).

---

-## Step 1: Central Server (Unraid)
+### Step 1: Central Server (Unraid)
We run the `loki` database and the `ai-reporter` script in a single stack.

-### Docker Compose
+#### Docker Compose
```yaml
services:
loki:
@@ -190,7 +190,7 @@ limits_config:
max_entries_limit_per_query: 50000 # <--- CRITICAL FOR AI
```
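
To illustrate why that limit matters, here is a hedged sketch of how a client might request up to 50,000 lines from Loki's `query_range` HTTP endpoint. The base URL `http://loki:3100`, the LogQL selector, and the helper names `build_query_url` and `fetch_lines` are assumptions for illustration, not the article's script. To my understanding, Loki rejects a request whose `limit` exceeds `max_entries_limit_per_query`, so the two values need to agree.

```python
import json
import time
import urllib.parse
import urllib.request

# Should not exceed max_entries_limit_per_query in the Loki config above.
MAX_LINES = 50_000


def build_query_url(base_url, logql, hours=24, limit=MAX_LINES, now_ns=None):
    """Build a query_range URL covering the last `hours` hours.

    Loki accepts start/end as Unix timestamps in nanoseconds.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    start_ns = now_ns - hours * 3600 * 1_000_000_000
    params = urllib.parse.urlencode({
        "query": logql,
        "limit": limit,
        "start": start_ns,
        "end": now_ns,
        "direction": "backward",  # newest entries first
    })
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


def fetch_lines(url, timeout=30):
    """Fetch the query result and flatten all streams into a list of log lines."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return [
        line
        for stream in payload["data"]["result"]
        for _timestamp, line in stream["values"]
    ]


if __name__ == "__main__":
    # Hypothetical host and selector; adjust to your own labels.
    print(build_query_url("http://loki:3100", '{job=~".+"}'))
```

Only `build_query_url` is pure; `fetch_lines` needs a reachable Loki instance and assumes a log (stream) query rather than a metric query.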

-### Step 2: Log Collection (Edge Nodes)
+#### Step 2: Log Collection (Edge Nodes)
On every other server (Proxmox, Pi, Edge), we run **Promtail** to ship logs to Unraid.

**Promtail Config** (config.yml)
@@ -269,7 +269,7 @@ scrape_configs:

```

-### Step 3: The Intelligence (Python Script)
+#### Step 3: The Intelligence (Python Script)
This script runs inside the `ai-log-reporter` container.

**Key Logic:**
@@ -311,17 +311,14 @@ automation:

<br>
<br>
-<br>
-<br>
-<br>
-<br>

-# 📖 User Manual & Operations

-## How to Read the Daily Report
+## 📖 User Manual & Operations
+
+### How to Read the Daily Report
The AI Summary appears in Home Assistant every morning at 07:00.

-### The Iconography
+#### The Iconography
* 🔴 **CRITICAL:** Immediate action required.
* *Examples:* Database corruption, Disk failure (SMART), Service boot loops.
* *Action:* Check Grafana immediately.
@@ -332,7 +329,7 @@ The AI Summary appears in Home Assistant every morning at 07:00.
* *Examples:* Timeouts, configuration deprecation warnings.
* *Action:* Add to "Technical Debt" to-do list.

-## Troubleshooting
+### Troubleshooting
**"Report says: No critical errors found."**
* **Good News:** Your system is healthy!
* **Verification:** Check the `ai-log-reporter` container logs to ensure it actually ran and didn't just fail to fetch data.
@@ -341,7 +338,7 @@ The AI Summary appears in Home Assistant every morning at 07:00.
* Check Home Assistant logs for `Shell Command` errors.
* Ensure the SSH key in Home Assistant allows connection to Unraid without a password.

-## Grafana Deep Dive
+### Grafana Deep Dive
When the AI reports a "Critical" error, use Grafana to investigate.

**Recommended LogQL Query:**
