Commit 1b3571b

Add Local LLM setup instructions and exclude model files from Git
- Add comprehensive download instructions in models/README.md
- Exclude LLM model files from Git (.gitignore)
- Update main README with model download commands
- Add .gitkeep to preserve models/ directory structure
- Remove large model file from Git tracking
1 parent e4b3b93 commit 1b3571b

4 files changed, 156 additions & 0 deletions

.gitignore

Lines changed: 8 additions & 0 deletions
```diff
@@ -174,6 +174,14 @@ db.json
 *.png
 /plans/
 
+# Local LLM models (too large for Git)
+models/*.gguf
+models/*.bin
+models/*.safetensors
+models/*.pth
+models/*.pt
+models/*.ckpt
+
 # Test artifacts
 *.tmp
 *.log
```
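As a quick sanity check of these patterns, `git check-ignore` reports whether a path would be ignored. A minimal sketch in a throwaway repository (the temp-directory setup is illustrative, not part of this commit):

```shell
# Build a throwaway repo mirroring the new ignore rules, then ask Git
# whether a freshly downloaded model file would be tracked.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
printf 'models/*.gguf\nmodels/*.bin\nmodels/*.safetensors\n' > .gitignore
mkdir models
touch models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
# check-ignore prints the path (and exits 0) when the file is ignored
ignored=$(git check-ignore models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf)
echo "ignored: $ignored"
```

Run from the real repository root, `git check-ignore -v models/<file>` additionally shows which `.gitignore` line matched.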

README.md

Lines changed: 15 additions & 0 deletions
````diff
@@ -114,6 +114,21 @@ The bot includes a local TinyLlama 1.1B model for offline operation:
 - **Size**: ~1.1GB
 - **Requirements**: ~2GB RAM for optimal performance
 
+**Important**: The model file is not included in the repository due to size limitations. You must download it separately:
+
+```bash
+# Download the model (choose one method)
+# Option 1: Using wget
+wget -O models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
+  "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
+
+# Option 2: Using curl
+curl -L -o models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
+  "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
+```
+
+See [models/README.md](models/README.md) for detailed download instructions and troubleshooting.
+
 The model is automatically loaded at startup and provides offline fallback capability.
 
 ### 4. Create .env file
````
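Because a truncated download is the most likely failure mode for a ~1.1GB file, the bot could verify the file before loading it. A minimal sketch (the helper name and size threshold are illustrative, not from the codebase); it relies on the fact that GGUF files begin with the 4-byte magic `GGUF`:

```python
from pathlib import Path

def model_looks_valid(path: Path, min_bytes: int = 500_000_000) -> bool:
    """Cheap pre-flight check: the file exists, is roughly full-size,
    and starts with the GGUF magic bytes."""
    if not path.is_file() or path.stat().st_size < min_bytes:
        return False
    with path.open("rb") as f:
        return f.read(4) == b"GGUF"

if __name__ == "__main__":
    p = Path("models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")
    print("model ok" if model_looks_valid(p) else "model missing or incomplete")
```

A check like this could run at startup, before handing the path to the loader, so a half-downloaded file fails fast with a clear message instead of a crash mid-load.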

models/.gitkeep

Lines changed: 3 additions & 0 deletions
```diff
@@ -0,0 +1,3 @@
+# This file ensures the models/ directory is tracked by Git
+# The actual LLM model files are ignored due to size limitations
+# See README.md for download instructions
```

models/README.md

Lines changed: 130 additions & 0 deletions
# Local LLM Models

This directory contains the local language model used by EduPlannerBotAI for offline operation.

## Required Model

- **Model**: TinyLlama 1.1B Chat v1.0
- **Format**: GGUF (quantized)
- **Size**: ~1.1GB
- **Quantization**: Q4_K_M (4-bit, optimized for memory and speed)

## Download Instructions

### Option 1: Direct Download from Hugging Face

1. Visit the model page: [TinyLlama-1.1B-Chat-v1.0-GGUF](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF)
2. Download the file: `tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf`
3. Place it in this `models/` directory
4. Ensure the filename matches exactly: `tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf`

### Option 2: Using Hugging Face CLI

```bash
# Install huggingface-hub if not already installed
pip install huggingface-hub

# Download the model
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
  tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  --local-dir models/
```
### Option 3: Using wget/curl

```bash
# Using wget
wget -O models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

# Using curl
curl -L -o models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
```

## File Structure

After downloading, your directory should look like this:

```
models/
├── README.md
└── tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf   # ~1.1GB
```

## Verification

Verify the model downloaded correctly:

```bash
# Check that the file exists and has the expected size (~1.1G)
ls -lh models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

# Expected output:
# -rw-r--r-- 1 user user 1.1G Jan  1 12:00 tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

# Check the file type (optional)
file models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
```
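`ls -lh` only catches gross truncation; a content hash is a stronger check. Hugging Face lists a per-file SHA-256 on each model's "Files" tab (the exact digest is not reproduced here), so a small script can compare against it. A sketch; the chunked read keeps memory flat even for a 1.1GB file:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare the printed digest with the SHA256 shown on the Hugging Face
# "Files" tab for the .gguf file:
# print(sha256_of(Path("models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")))
```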
## Model Specifications

- **Architecture**: TinyLlama 1.1B (Llama architecture)
- **Training**: Chat/instruction fine-tuned
- **Context Length**: 2048 tokens
- **Quantization**: Q4_K_M (4-bit, optimized)
- **Memory Usage**: ~2GB RAM during inference
- **Performance**: Good quality for study plan generation
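The actual loading code lives in `services/local_llm.py` (not shown in this commit). A hedged sketch of what loading a GGUF model with these specs could look like, assuming the `llama-cpp-python` package; the function name and thread count are illustrative:

```python
from pathlib import Path

MODEL_PATH = Path("models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")

def load_model(path: Path = MODEL_PATH):
    """Return a Llama instance, or None when the weights are missing
    (the caller can then fall back to a remote LLM)."""
    if not path.exists():
        return None
    from llama_cpp import Llama  # imported lazily: heavy native library
    # n_ctx matches the model's 2048-token context window
    return Llama(model_path=str(path), n_ctx=2048, n_threads=4)
```

Returning `None` instead of raising keeps startup resilient: the bot can log a warning and continue without offline support until the file is downloaded.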
## Troubleshooting

### Model Not Found Error

```
[Local LLM error: Model not loaded]
```

**Solution**: Ensure the model file is in the correct location with the exact filename.

### Memory Issues

```
[Local LLM error: Out of memory]
```

**Solutions**:
- Ensure at least 2GB of RAM is available
- Close other memory-intensive applications
- Consider using a smaller model variant

### Slow Performance

**Solutions**:
- Use a multi-core CPU if possible
- Close unnecessary background processes
- Expect the first request to be slower due to model loading
## Alternative Models

If you prefer a different model, you can use any GGUF-format model:

1. **Llama 2 7B**: Better quality, larger size (~4GB)
2. **Mistral 7B**: Excellent performance, medium size (~4GB)
3. **Phi-2**: Good quality, smaller size (~1.4GB)

**Note**: Update the model path in `services/local_llm.py` if using a different model.
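One way to make that swap painless is to resolve the path from an environment variable with the TinyLlama file as the default. A sketch only: the `LOCAL_LLM_MODEL_PATH` variable name is invented here and is not read by the current code.

```python
import os
from pathlib import Path

DEFAULT_MODEL = "models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

def resolve_model_path() -> Path:
    """Prefer an override from the environment, else the bundled default."""
    return Path(os.environ.get("LOCAL_LLM_MODEL_PATH", DEFAULT_MODEL))
```

With this in place, trying Mistral 7B is a one-line change to the `.env` file rather than an edit to `services/local_llm.py`.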
## Performance Tips

- **First Run**: The first request will be slower as the model loads into memory
- **Subsequent Requests**: Much faster after the initial load
- **Memory**: Keep at least 2GB of RAM free for optimal performance
- **CPU**: Multi-core processors improve inference speed

## Support

If you encounter issues with the local LLM:

1. Check the bot logs for detailed error messages
2. Verify the model file is correctly placed
3. Ensure sufficient system resources
4. Open an issue on GitHub with the error details

## License

The TinyLlama model is licensed under Apache 2.0. See the [Hugging Face page](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF) for full license details.
