Commit 4add01d (parent c18ef79), "AGENTS.md": 1 file changed, 73 additions, 0 deletions.

# Agent Documentation (AGENTS.md)

This document provides a technical overview of the **MyGPU** repository to help LLM agents and developers understand the codebase structure, data flow, and implementation details.

---
## 🏗 Repository Structure

### Core Package: `monitor/`

The heart of the application, organized by responsibility:

- **`monitor/api/`**: The FastAPI-based web server.
  - `server.py`: Main API definition, WebSocket handling, and routing.
  - `templates/` & `static/`: Frontend assets (Vanilla JS, CSS, HTML).
- **`monitor/collectors/`**: Data acquisition layer.
  - `gpu.py`: NVIDIA/Apple Silicon GPU metric collection.
  - `system.py`: CPU, RAM, disk, and hostname info (via `psutil`).
  - `network.py`: Network interface statistics.
- **`monitor/benchmark/`**: Stress-testing and physics workloads.
  - `runner.py`: Orchestrates benchmark execution.
  - `physics_torch.py` / `gpu_setup.py`: PyTorch-based particle physics engine.
  - `workloads.py`: GEMM and other computational stress tests.
- **`monitor/storage/`**: Persistence layer.
  - `sqlite.py`: Manages the `metrics.db` SQLite database via `SQLiteManager`, a unified connector that provides thread-safe access.
- **`monitor/alerting/`**: Alert engine and notifications.
  - `rules.py`: Threshold evaluation logic.
  - `toaster.py`: Cross-platform system notifications (Windows, Linux, macOS).
- **`monitor/utils/`**: Helper utilities.
  - `features.py`: Capability detection (CUDA, CuPy, PyTorch, platform).
### External Entry Points

- **`health_monitor.py`**: The primary CLI entry point. Uses `click` for its commands (`web`, `cli`, `benchmark`, `refresh`).
- **`setup.ps1` / `setup.sh`**: Cross-platform environment installers (both use `uv`).

---
## 🔄 Data Flow & Connectivity

1. **Collection**: `health_monitor.py` starts a background thread or process that periodically invokes the collectors in `monitor/collectors/`.
2. **Storage**: Collected metrics are passed to `monitor.storage.sqlite` and appended to the `metrics.db` file.
3. **API Service**: `monitor.api.server` serves live data from memory (cached state) and historical data from the SQLite database.
4. **Frontend**: The web dashboard polls the `/api/status` endpoint for live updates and uses a WebSocket (`/ws/simulation`) for real-time benchmark visualization.
5. **Alerting**: The `AlertEngine` evaluates every new metric sample against the rules defined in `config.yaml`. When a threshold is crossed, it triggers a notification through `toaster.py`.
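The five steps above can be condensed into a single-process sketch using only the standard library. Everything here is illustrative: the `Sampler` class, the flat `{metric: value}` sample format, the `metrics(ts, name, value)` table, and the simple `metric > threshold` rule shape are assumptions and do not reflect the real `AlertEngine`, the `config.yaml` schema, or the actual database layout. The notifier is a plain callback standing in for `toaster.py`, and the sketch uses a thread (one of the two options named in step 1).

```python
"""Hypothetical collect -> store -> cache -> alert loop (illustrative only)."""
import sqlite3
import threading
import time
from typing import Callable, Dict, List

Sample = Dict[str, float]


class Sampler:
    def __init__(self, collectors: List[Callable[[], Sample]], db_path: str,
                 rules: Dict[str, float], notify: Callable[[str], None],
                 interval: float = 1.0) -> None:
        self.collectors = collectors
        self.rules = rules              # assumed shape: {metric: upper threshold}
        self.notify = notify            # stand-in for toaster.py
        self.interval = interval
        self.latest: Sample = {}        # cached state served to the live API
        self._lock = threading.Lock()   # guards `latest` and the shared connection
        self._stop = threading.Event()
        # check_same_thread=False lets the worker thread use this connection;
        # the lock serializes every access to it.
        self._db = sqlite3.connect(db_path, check_same_thread=False)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS metrics (ts REAL, name TEXT, value REAL)")
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _tick(self) -> None:
        sample: Sample = {}
        for collect in self.collectors:          # 1. collection
            sample.update(collect())
        now = time.time()
        with self._lock:
            self._db.executemany(                # 2. storage (append-only)
                "INSERT INTO metrics VALUES (?, ?, ?)",
                [(now, k, v) for k, v in sample.items()])
            self._db.commit()
            self.latest = sample                 # 3. live cache for the API
        for name, limit in self.rules.items():   # 5. alerting
            if sample.get(name, float("-inf")) > limit:
                self.notify(f"{name}={sample[name]:.1f} exceeds {limit}")

    def _run(self) -> None:
        while not self._stop.is_set():
            self._tick()
            self._stop.wait(self.interval)       # wakes early when stop() is called

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def history(self, name: str) -> List[float]:
        # Historical read path (step 3): oldest first.
        with self._lock:
            rows = self._db.execute(
                "SELECT value FROM metrics WHERE name = ? ORDER BY ts, rowid",
                (name,))
            return [row[0] for row in rows]
```

Typical use is `start()` once at startup and `stop()` at shutdown. `Event.wait` replaces `time.sleep` so shutdown does not have to wait out a full sampling interval, and notifications are sent outside the lock so a slow notifier cannot stall storage.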
---

## 🛠 Technology Stack

- **Backend**: Python 3.10+, FastAPI (web server), Click (CLI).
- **Frontend**: Vanilla JS (dynamic UI), Chart.js (history graphs).
- **GPU Computing**:
  - NVIDIA: `nvidia-ml-py` (NVML) for metrics; `CuPy` or `PyTorch` for benchmarks.
  - Apple Silicon: `psutil` and native commands for basic metrics.
- **Environment**: `uv` is the preferred package manager for virtual environments.

---
## 🤖 LLM Implementation Principles

When modifying this repository, adhere to these guidelines:

1. **Cross-Platform First**: Always consider Windows, Linux, and macOS. Branch on `platform.system()` and provide fallbacks.
2. **Modular Collectors**: When adding a new metric, create a new file in `monitor/collectors/` and register it in the main loop in `health_monitor.py`.
3. **Non-Blocking API**: API endpoints and WebSockets must remain non-blocking. Use `asyncio` for I/O and `threading` for compute-heavy benchmarks.
4. **Graceful Degradation**: Ensure the dashboard works even if no GPU is detected (fall back to CPU metrics).
5. **Database Integrity**: Use the existing `SQLiteManager` in `monitor/storage/sqlite.py` to ensure thread-safe database access.
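Principle 3 can be illustrated with the standard library alone. In this sketch, `run_benchmark` and `heartbeat` are hypothetical stand-ins for a benchmark workload and for live API/WebSocket traffic; `asyncio.to_thread` (Python 3.9+) hands the blocking call to a worker thread so the event loop keeps scheduling other tasks while it runs.

```python
"""Hypothetical sketch: offload blocking work so the event loop stays responsive."""
import asyncio
import time


def run_benchmark(n: int) -> int:
    # Deliberately blocking CPU work standing in for a GEMM or physics step.
    total = 0
    for i in range(n):
        total += i * i
    return total


async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    # Stands in for requests the server must keep answering meanwhile.
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def main(n: int = 3_000_000) -> tuple:
    ticks: list = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    # Runs run_benchmark in the default thread pool; awaiting it yields
    # control back to the loop, so `heartbeat` keeps running.
    result = await asyncio.to_thread(run_benchmark, n)
    stop.set()
    await beat
    return result, len(ticks)


if __name__ == "__main__":
    result, ticks = asyncio.run(main())
    print(f"result={result}, heartbeats during benchmark={ticks}")
```

A pure-Python loop like `run_benchmark` still holds the GIL for most of its run, so offloading it keeps the server responsive rather than making the work faster. The gain is larger for native extensions that release the GIL during heavy operations, as many numerical libraries do.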
---

## ⚠️ Known "Old" or Volatile Files

- **`old/`**: Contains legacy translation and utility scripts. These are preserved for reference but are not part of the runtime.
- **`metrics.db`**: Automatically generated. Can be safely deleted to reset history.
- **`.features_cache`**: Caches hardware detection results. Run `python health_monitor.py refresh` to clear it.
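A hypothetical sketch of the caching behaviour behind `.features_cache`, assuming a JSON file that is reused when present and simply deleted by `refresh` so the next run re-detects. The real file format and the internals of the `refresh` command are not documented here; `detect()` is a stand-in for the actual hardware probing.

```python
"""Hypothetical detection cache in the spirit of .features_cache (format assumed)."""
import json
import platform
from pathlib import Path
from typing import Callable, Dict

CACHE_PATH = Path(".features_cache")


def detect() -> Dict[str, str]:
    # Stand-in for the expensive probing done by monitor/utils/features.py.
    return {"system": platform.system(), "machine": platform.machine()}


def load_features(path: Path = CACHE_PATH,
                  probe: Callable[[], Dict[str, str]] = detect) -> Dict[str, str]:
    # Reuse the cached result when it exists and parses cleanly.
    if path.exists():
        try:
            return json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            pass  # corrupt cache: fall through and re-detect
    features = probe()
    path.write_text(json.dumps(features), encoding="utf-8")
    return features


def refresh(path: Path = CACHE_PATH) -> None:
    # What `refresh` amounts to in this sketch: drop the cache file.
    path.unlink(missing_ok=True)
```

Deleting the file (rather than rewriting it) keeps `refresh` safe to run even when detection itself would fail, for example after a driver change.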
