Commit e65abb5

Merge pull request #2 from DataBoySu/first-version
First version
2 parents c721caf + ae64b84 commit e65abb5

19 files changed

Lines changed: 2059 additions & 1777 deletions

.gitignore

Lines changed: 7 additions & 1 deletion
@@ -18,13 +18,19 @@ env/
 .vscode/
 *.swp
 
-# Data
+# Data and cache
 *.db
 metrics.db
+.features_cache
 
 # Logs
 *.log
 
+# Old/backup files
+*.old
+*.bak
+*.tmp
+
 # OS
 .DS_Store
 Thumbs.db

DISTRIBUTION.md

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
# Distribution Setup Complete

## Summary
Cluster Health Monitor v1.0.0 is now ready for portable ZIP distribution.

## What Was Implemented

### 1. Code Cleanup
- Removed debug print statements from workloads.py
- No emojis or verbose logging in code
- Clean, concise comments throughout

### 2. Feature Detection & Caching
- `monitor/utils/features.py`: Runtime feature detection
- Detects: nvidia-smi, cupy, torch, gpu_benchmark availability
- Results cached in `.features_cache` JSON file
- Fast subsequent loads (no repeated checks)
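
The detect-then-cache flow described above could look roughly like the sketch below. It is not the contents of `monitor/utils/features.py`: the function names, the flag keys, and the rule that `gpu_benchmark` is available when either CuPy or PyTorch is installed are all assumptions made for illustration.

```python
import importlib.util
import json
import shutil
from pathlib import Path

CACHE_FILE = Path(".features_cache")  # cache file name taken from the text above


def detect_features() -> dict:
    """Probe for optional GPU tooling without importing the heavy libraries."""
    flags = {
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
        "cupy": importlib.util.find_spec("cupy") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
    }
    # Assumption: benchmarking needs at least one GPU array library.
    flags["gpu_benchmark"] = flags["cupy"] or flags["torch"]
    return flags


def load_features(cache_file: Path = CACHE_FILE, refresh: bool = False) -> dict:
    """Return cached flags when present; otherwise detect and write the cache."""
    if cache_file.exists() and not refresh:
        try:
            return json.loads(cache_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            pass  # unreadable or corrupt cache: fall through and re-detect
    flags = detect_features()
    cache_file.write_text(json.dumps(flags, indent=2), encoding="utf-8")
    return flags
```

Using `find_spec` rather than `import` keeps the first run cheap, since locating a package does not execute it. Passing `refresh=True` forces a re-check after a GPU library is installed or removed.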

### 3. Requirements Simplified
- Single `requirements.txt` file
- Core dependencies required
- GPU libraries (cupy/torch) commented as optional
- Setup script prompts for GPU library installation

### 4. PowerShell Setup Script
- `setup.ps1`: Automated Windows setup wizard
- Checks Python 3.8+
- Detects NVIDIA drivers and CUDA version
- Creates virtual environment
- Installs dependencies
- Prompts for CuPy or PyTorch based on CUDA version
- Runs feature detection and caching
- Verifies installation
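
The version checks in the wizard can be illustrated in Python, although `setup.ps1` itself is PowerShell and is not reproduced here. The sketch assumes the CUDA version is read from the `CUDA Version: X.Y` field in the `nvidia-smi` banner. The helper names and the mapping from CUDA major version to the `cupy-cuda11x` / `cupy-cuda12x` wheels are illustrative, not taken from the script.

```python
import re
import sys
from typing import Optional, Tuple


def python_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """True when the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def parse_cuda_version(nvidia_smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' banner field."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def suggest_cupy_package(cuda: Optional[Tuple[int, int]]) -> Optional[str]:
    """Map a CUDA major version to a CuPy wheel name (illustrative mapping)."""
    if cuda is None:
        return None
    return {11: "cupy-cuda11x", 12: "cupy-cuda12x"}.get(cuda[0])
```

A `None` result from `suggest_cupy_package` means either no driver was found or the CUDA major version is outside the mapping, in which case a wizard would presumably skip the GPU prompt.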

### 5. Update Mechanism
- CLI: `python health_monitor.py --update`
- Web: "Check for Updates" button in header
- Checks GitHub releases API
- Downloads and applies updates automatically
- Preserves venv, config, and data
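
A release check against the GitHub REST API might be sketched as follows. This is not the code in `monitor/utils/update.py`. The `OWNER/REPO` URL is a placeholder, the helper names are hypothetical, and tags are assumed to be plain dotted versions such as `v1.0.0`. The fields `tag_name`, `assets`, `browser_download_url`, and `zipball_url` are part of the release object the API returns.

```python
import json
import urllib.request
from typing import Optional, Tuple

# Hypothetical placeholder: substitute the project's real owner/repo.
RELEASES_URL = "https://api.github.com/repos/OWNER/REPO/releases/latest"


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn 'v1.2.3' or '1.2.3' into (1, 2, 3) for numeric comparison."""
    return tuple(int(part) for part in tag.lstrip("vV").split("."))


def find_update(release: dict, current: str) -> Optional[str]:
    """Return a ZIP download URL if the release is newer than `current`."""
    if parse_version(release["tag_name"]) <= parse_version(current):
        return None
    for asset in release.get("assets", []):
        if asset["name"].endswith(".zip"):
            return asset["browser_download_url"]
    return release.get("zipball_url")  # fall back to the source archive


def fetch_latest_release(url: str = RELEASES_URL) -> dict:
    """Fetch the latest release's metadata from the GitHub REST API."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Comparing integer tuples avoids the string-ordering trap where `"1.10.0"` sorts before `"1.9.3"`. Pre-release tags such as `v1.1.0-rc1` would need extra handling that this sketch leaves out.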

### 6. Feature Graying in UI
- `/api/features` endpoint returns cached feature flags
- JavaScript checks features on page load
- Disables benchmark controls if GPU libraries not available
- Visual feedback: opacity 0.5, cursor not-allowed
- Alert message explains missing libraries

### 7. Multi-GPU Support
- Already implemented in gpu.py collector
- Loops through all NVIDIA GPUs via NVML
- Web UI displays all GPUs in grid
- Benchmark supports any GPU (defaults to GPU 0)
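
For reference, a loop over all GPUs through NVML generally has the shape below. It is a sketch, not the `gpu.py` collector: the function name and the returned keys are assumptions. The NVML module is passed in as a parameter so the loop can be exercised without a GPU. In real use it would be the `pynvml` module.

```python
from typing import Any, Dict, List


def collect_gpus(nvml: Any) -> List[Dict[str, Any]]:
    """Query every GPU NVML reports; `nvml` is the pynvml module or a stand-in."""
    nvml.nvmlInit()
    try:
        gpus = []
        for index in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(index)
            name = nvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older pynvml releases return bytes
                name = name.decode()
            util = nvml.nvmlDeviceGetUtilizationRates(handle)
            mem = nvml.nvmlDeviceGetMemoryInfo(handle)
            gpus.append({
                "index": index,
                "name": name,
                "gpu_util_pct": util.gpu,
                "mem_used_mib": mem.used // (1024 * 1024),
                "mem_total_mib": mem.total // (1024 * 1024),
            })
        return gpus
    finally:
        nvml.nvmlShutdown()  # always release NVML, even if a query fails
```

With the bindings installed, the call would be `import pynvml; collect_gpus(pynvml)`. The returned list has one entry per device, which matches the one-card-per-GPU grid described above.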

### 8. Portable ZIP Distribution
- `package.ps1`: Creates distribution ZIP
- Includes: monitor/, health_monitor.py, config.yaml, requirements.txt, setup.ps1, README.md, LICENSE
- Excludes: venv, __pycache__, .features_cache, *.db
- ~50KB compressed size
- Ready for GitHub releases
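
The include/exclude rules above can be expressed as a short Python sketch. `package.ps1` is a PowerShell script and is not shown here, so this is an equivalent illustration rather than its actual logic. The function names are hypothetical, and the include and exclude lists are copied from the bullets.

```python
import fnmatch
import zipfile
from pathlib import Path

# Include and exclude lists taken from the bullets above.
INCLUDE = ["monitor", "health_monitor.py", "config.yaml", "requirements.txt",
           "setup.ps1", "README.md", "LICENSE"]
EXCLUDE_DIRS = {"venv", "__pycache__"}
EXCLUDE_FILES = [".features_cache", "*.db"]


def is_excluded(path: Path) -> bool:
    """True if a path component is an excluded directory or the name matches."""
    if EXCLUDE_DIRS.intersection(path.parts):
        return True
    return any(fnmatch.fnmatch(path.name, pattern) for pattern in EXCLUDE_FILES)


def build_zip(root: Path, out_file: Path) -> list:
    """Write the included files under `root` to `out_file`; return archived names."""
    names = []
    with zipfile.ZipFile(out_file, "w", zipfile.ZIP_DEFLATED) as archive:
        for entry in INCLUDE:
            source = root / entry
            files = sorted(source.rglob("*")) if source.is_dir() else [source]
            for file in files:
                rel = file.relative_to(root)
                if file.is_file() and not is_excluded(rel):
                    archive.write(file, rel.as_posix())
                    names.append(rel.as_posix())
    return names
```

Entries that do not exist are skipped silently. A real packaging script might prefer to fail when a required file such as `LICENSE` is missing.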

### 9. Updated Documentation
- README.md rewritten for ZIP distribution
- Installation: Download → Extract → Run setup.ps1
- Troubleshooting section updated
- Simplified project structure
- Removed development-focused content

## Files Created/Modified

### New Files
- `monitor/utils/features.py` - Feature detection
- `monitor/utils/update.py` - Update mechanism
- `monitor/utils/__init__.py` - Utils module exports
- `setup.ps1` - Windows setup wizard
- `package.ps1` - Distribution packaging script

### Modified Files
- `health_monitor.py` - Added --update flag
- `monitor/api/server.py` - Added /api/features, /api/update/* endpoints
- `monitor/api/templates/index.html` - Update button, feature graying
- `monitor/benchmark/workloads.py` - Removed debug prints
- `requirements.txt` - Simplified to single file
- `README.md` - Complete rewrite for ZIP distribution

### Removed Files
- `requirements-base.txt` - Merged into requirements.txt
- `requirements-gpu.txt` - Merged into requirements.txt
- `setup.py` - No longer using pip package
- `MANIFEST.in` - No longer needed
- `BUILD.md` - Removed
- `CHECKLIST.md` - Removed
- `RELEASE_NOTES.md` - Removed

## Usage

### For End Users
1. Download `cluster-health-monitor-v1.0.0.zip` from releases
2. Extract to desired location
3. Run `setup.ps1` in PowerShell
4. Activate venv and run: `python health_monitor.py monitor --web`
5. Access dashboard at http://localhost:8090

### For Distribution
1. Run `.\package.ps1` to create ZIP
2. Upload `cluster-health-monitor-v1.0.0.zip` to GitHub releases
3. Users download and follow above steps

### For Updates
Users can update via:
- CLI: `python health_monitor.py --update`
- Web: Click "Check for Updates" button

## Next Steps (Future)
- Create GitHub Actions workflow for automated releases
- Add version check on startup (optional notification)
- Multi-platform support (Linux setup.sh)
- Configuration wizard in web UI
- Export/import settings
