
Commit b07e614

Harden local LLM config and runtime validation for production
1 parent ac488cc commit b07e614

6 files changed

Lines changed: 170 additions & 133 deletions

README.md

Lines changed: 21 additions & 17 deletions
@@ -4,10 +4,10 @@
 
 > **Note:** All code comments and docstrings are in English for international collaboration and code clarity. All user-facing messages and buttons are automatically translated to the user's selected language.
 
-## 🚀 What's New in v4.0.0
+## 🚀 What's New in v4.1.0
 
 - **🆕 Multi-Level LLM Architecture**: OpenAI → Groq → Local LLM → Fallback Plan
-- **🆕 Local LLM Integration**: TinyLlama 1.1B model for offline operation
+- **🆕 Local LLM Integration**: Google Gemma 4 model for offline operation
 - **🆕 Guaranteed Availability**: Bot works even without internet connection
 - **🆕 Enhanced Fallback System**: Robust error handling and service switching
 - **🆕 Improved Plan Quality**: Professional-grade study plan templates
@@ -37,7 +37,7 @@ The bot features a sophisticated 4-tier fallback system that ensures reliable se
 |----------|---------|-------------|----------|
 | **1** | **OpenAI GPT** | Primary model for high-quality plans | Best quality, when available |
 | **2** | **Groq** | Secondary model, OpenAI alternative | Fast fallback, reliable service |
-| **3** | **Local LLM** | TinyLlama 1.1B local model | Offline operation, privacy |
+| **3** | **Local LLM** | Google Gemma 4 local model | Offline operation, privacy |
 | **4** | **Fallback Plan** | Predefined professional template | Guaranteed availability |
 
 ### ⚡ How It Works
@@ -46,7 +46,7 @@ The bot automatically attempts to generate study plans using available services
 
 1. **Primary**: OpenAI API (if `OPENAI_API_KEY` is set and quota available)
 2. **Fallback 1**: [Groq](https://groq.com/) (if `GROQ_API_KEY` is set)
-3. **Fallback 2**: Local LLM (TinyLlama 1.1B model)
+3. **Fallback 2**: Local LLM (Google Gemma 4 model)
 4. **Last Resort**: Local plan generator (comprehensive template)
 
 ### 🔄 Translation Fallback
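
The four-tier order above maps naturally onto a try-each-tier loop. The sketch below is illustrative only: `generate_openai`, `generate_groq`, `generate_local`, and `fallback_plan` are hypothetical stand-ins, not the actual API of `services/llm.py`.

```python
import asyncio

# Illustrative stubs: each tier returns a plan string or raises to
# simulate an outage, standing in for the real providers in services/llm.py.
async def generate_openai(prompt: str) -> str:
    raise RuntimeError("simulated: OpenAI quota exhausted")

async def generate_groq(prompt: str) -> str:
    raise RuntimeError("simulated: GROQ_API_KEY not set")

async def generate_local(prompt: str) -> str:
    return f"[local-LLM plan for: {prompt}]"

def fallback_plan(prompt: str) -> str:
    return f"[template plan for: {prompt}]"  # tier 4: always succeeds

async def generate_plan(prompt: str) -> str:
    for tier in (generate_openai, generate_groq, generate_local):
        try:
            result = await tier(prompt)
            if result:
                return result
        except Exception:
            continue  # any failure falls through to the next tier
    return fallback_plan(prompt)

print(asyncio.run(generate_plan("Python basics, 2 weeks")))
# -> [local-LLM plan for: Python basics, 2 weeks]
```

The real `services/llm.py` presumably distinguishes error types before falling through; a bare `except Exception` is just the simplest way to guarantee the chain always reaches tier 4.
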
@@ -131,24 +131,24 @@ pip install -r requirements.txt
 ```
 
 ### 3. Set up Local LLM (Recommended)
-The bot includes a local TinyLlama 1.1B model for offline operation:
+The bot includes a local Google Gemma 4 model for offline operation:
 
-- **Model**: TinyLlama 1.1B Chat v1.0 (Q4_K_M quantized)
+- **Model**: Google Gemma 4 Instruct (GGUF, quantized)
 - **Format**: GGUF format
-- **Size**: ~1.1GB
-- **Requirements**: ~2GB RAM for optimal performance
+- **Size**: depends on variant/quantization (typically several GB)
+- **Requirements**: depends on variant (recommended 8GB+ RAM for 4B class models)
 
 **Important**: The model file is not included in the repository due to size limitations. You must download it separately:
 
 ```bash
 # Download the model (choose one method)
 # Option 1: Using wget
-wget -O models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
-  "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
+wget -O models/google-gemma-4b-it-Q4_K_M.gguf \
+  "<YOUR_GEMMA4_GGUF_DOWNLOAD_URL>"
 
 # Option 2: Using curl
-curl -L -o models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
-  "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
+curl -L -o models/google-gemma-4b-it-Q4_K_M.gguf \
+  "<YOUR_GEMMA4_GGUF_DOWNLOAD_URL>"
 ```
 
 See [models/README.md](models/README.md) for detailed download instructions and troubleshooting.
@@ -161,6 +161,10 @@ Create a `.env` file in the root directory or rename `.env.example` to `.env` an
 BOT_TOKEN=your_telegram_bot_token
 OPENAI_API_KEY=your_openai_api_key
 GROQ_API_KEY=your_groq_api_key
+LOCAL_LLM_MODEL_PATH=models/google-gemma-4b-it-Q4_K_M.gguf
+LOCAL_LLM_CONTEXT=4096
+LOCAL_LLM_THREADS=4
+LOCAL_LLM_MAX_TOKENS=512
 ```
 All environment variables are loaded from `.env` automatically.
 
@@ -200,7 +204,7 @@ EduPlannerBotAI/
 │   └── language.py    # Language selection and filter
 ├── services/          # Core logic and helper functions
 │   ├── llm.py         # Multi-level LLM integration (OpenAI → Groq → Local LLM → Fallback)
-│   ├── local_llm.py   # Local TinyLlama model integration
+│   ├── local_llm.py   # Local Google Gemma 4 model integration
 │   ├── pdf.py         # PDF export
 │   ├── txt.py         # TXT export
 │   ├── reminders.py   # Reminder simulation
@@ -221,7 +225,7 @@ EduPlannerBotAI/
 | **aiogram** | Telegram Bot Framework | 3.x |
 | **OpenAI API** | Primary LLM provider | Latest |
 | **Groq API** | Secondary LLM provider | Latest |
-| **Local LLM** | TinyLlama 1.1B offline | GGUF |
+| **Local LLM** | Google Gemma 4 offline | GGUF |
 | **llama-cpp-python** | Local LLM inference | Latest |
 | **fpdf** | PDF file generation | Latest |
 | **TinyDB** | Lightweight NoSQL database | Latest |
@@ -236,11 +240,11 @@ EduPlannerBotAI/
 - **Testing**: pytest with 100% coverage
 - **Style**: PEP8 compliant
 
-## 📝 Release 4.0.0 Highlights
+## 📝 Release 4.1.0 Highlights
 
 ### 🆕 Major Features
 - **Multi-Level LLM Architecture**: OpenAI → Groq → Local LLM → Fallback Plan
-- **Local LLM Integration**: TinyLlama 1.1B model for offline operation
+- **Local LLM Integration**: Google Gemma 4 model for offline operation
 - **Guaranteed Availability**: Bot works even without internet connection
 - **Enhanced Fallback System**: Robust error handling and service switching
 
@@ -309,4 +313,4 @@ MIT License - see [LICENSE](LICENSE) file for details.
 
 ---
 
-**EduPlannerBotAI v4.0.0** represents a significant milestone, transforming the bot from a simple OpenAI-dependent service into a robust, enterprise-grade system with guaranteed availability and offline operation capabilities. This release sets the foundation for future enhancements while maintaining backward compatibility and improving overall user experience.
+**EduPlannerBotAI v4.1.0** represents a significant milestone, transforming the bot from a simple OpenAI-dependent service into a robust, enterprise-grade system with guaranteed availability and offline operation capabilities. This release sets the foundation for future enhancements while maintaining backward compatibility and improving overall user experience.
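
For the tree entry `services/local_llm.py`, here is a minimal sketch of how the new `LOCAL_LLM_*` settings could be wired into llama-cpp-python. This is an assumption about the wiring, not the file's actual contents.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

from config import (
    LOCAL_LLM_CONTEXT,
    LOCAL_LLM_MAX_TOKENS,
    LOCAL_LLM_MODEL_PATH,
    LOCAL_LLM_THREADS,
)

# Load the GGUF file once at startup and reuse it for every request;
# reloading per request would dominate response time.
_llm = Llama(
    model_path=LOCAL_LLM_MODEL_PATH,
    n_ctx=LOCAL_LLM_CONTEXT,
    n_threads=LOCAL_LLM_THREADS,
    verbose=False,
)

def generate_plan(prompt: str) -> str:
    # create_chat_completion uses the chat template shipped in the GGUF
    # metadata, so the same call works across model families.
    out = _llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=LOCAL_LLM_MAX_TOKENS,
    )
    return out["choices"][0]["message"]["content"]
```

Keeping a single module-level instance means the first call pays the load cost and later calls reuse the weights.
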

config.py

Lines changed: 23 additions & 0 deletions
@@ -3,8 +3,31 @@
 
 load_dotenv()
 
+
+def _get_int_env(var_name: str, default: int, min_value: int = 1) -> int:
+    """Parse integer env var safely with fallback to default.
+
+    Invalid or out-of-range values are ignored to keep startup stable.
+    """
+    raw_value = os.getenv(var_name)
+    if raw_value is None:
+        return default
+    try:
+        parsed_value = int(raw_value)
+        if parsed_value < min_value:
+            return default
+        return parsed_value
+    except (TypeError, ValueError):
+        return default
+
+
 TOKEN = os.getenv("BOT_TOKEN")
 OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
 GROQ_API_KEY = os.getenv("GROQ_API_KEY")
 OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4o-mini")
 GROQ_MODEL = os.getenv("GROQ_MODEL", "llama-3.1-8b-instant")
+
+LOCAL_LLM_MODEL_PATH = os.getenv("LOCAL_LLM_MODEL_PATH", "models/google-gemma-4b-it-Q4_K_M.gguf")
+LOCAL_LLM_CONTEXT = _get_int_env("LOCAL_LLM_CONTEXT", default=4096, min_value=512)
+LOCAL_LLM_THREADS = _get_int_env("LOCAL_LLM_THREADS", default=4, min_value=1)
+LOCAL_LLM_MAX_TOKENS = _get_int_env("LOCAL_LLM_MAX_TOKENS", default=512, min_value=32)
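
A quick way to sanity-check the new guard from the repo root (values are hypothetical and chosen to hit each branch; `load_dotenv()` does not override variables that are already set):

```python
import os

# Seed the environment before importing config; these values are
# illustrative and exercise the valid, non-integer, and out-of-range paths.
os.environ["LOCAL_LLM_CONTEXT"] = "8192"   # valid -> used as-is
os.environ["LOCAL_LLM_THREADS"] = "many"   # not an int -> default 4
os.environ["LOCAL_LLM_MAX_TOKENS"] = "-1"  # below min_value -> default 512

import config

print(config.LOCAL_LLM_CONTEXT)     # 8192
print(config.LOCAL_LLM_THREADS)     # 4
print(config.LOCAL_LLM_MAX_TOKENS)  # 512
```
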

models/README.md

Lines changed: 32 additions & 102 deletions
@@ -1,130 +1,60 @@
 # Local LLM Models
 
-This directory contains the local language model used by EduPlannerBotAI for offline operation.
+This directory stores the local language model used by EduPlannerBotAI for offline mode.
 
-## Required Model
+## Default model (updated)
 
-**Model**: TinyLlama 1.1B Chat v1.0
-**Format**: GGUF (quantized)
-**Size**: ~1.1GB
-**Quantization**: Q4_K_M (4-bit, optimized for memory and speed)
+**Model family**: Google Gemma 4 (instruction-tuned, GGUF)
+**Recommended file name**: `google-gemma-4b-it-Q4_K_M.gguf`
+**Expected path**: `models/google-gemma-4b-it-Q4_K_M.gguf`
 
-## Download Instructions
+> If your GGUF file has a different name, set `LOCAL_LLM_MODEL_PATH` in `.env`.
 
-### Option 1: Direct Download from Hugging Face
+## Quick setup
 
-1. Visit the model page: [TinyLlama-1.1B-Chat-v1.0-GGUF](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF)
-2. Download the file: `tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf`
-3. Place it in this `models/` directory
-4. Ensure the filename matches exactly: `tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf`
+1. Download a Gemma 4 GGUF file from your preferred source.
+2. Put it into the `models/` folder.
+3. Set `.env`:
 
-### Option 2: Using Hugging Face CLI
-
-```bash
-# Install huggingface-hub if not already installed
-pip install huggingface-hub
-
-# Download the model
-huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
-  tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
-  --local-dir models/
-```
-
-### Option 3: Using wget/curl
-
-```bash
-# Using wget
-wget -O models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
-  "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
-
-# Using curl
-curl -L -o models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
-  "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
+```env
+LOCAL_LLM_MODEL_PATH=models/google-gemma-4b-it-Q4_K_M.gguf
+LOCAL_LLM_CONTEXT=4096
+LOCAL_LLM_THREADS=4
+LOCAL_LLM_MAX_TOKENS=512
 ```
 
-## File Structure
-
-After downloading, your directory should look like this:
+## File structure
 
-```
+```text
 models/
 ├── README.md
-└── tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf  # ~1.1GB
+└── google-gemma-4b-it-Q4_K_M.gguf
 ```
 
 ## Verification
 
-Verify the model is correctly downloaded:
-
 ```bash
-# Check file exists and size
-ls -lh models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
-
-# Expected output:
-# -rw-r--r-- 1 user user 1.1G Jan 1 12:00 tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
-
-# Check file integrity (optional)
-file models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
+ls -lh models/google-gemma-4b-it-Q4_K_M.gguf
+file models/google-gemma-4b-it-Q4_K_M.gguf
 ```
 
-## Model Specifications
+## Troubleshooting
 
-- **Architecture**: TinyLlama 1.1B (Llama architecture)
-- **Training Data**: Chat/instruction fine-tuned
-- **Context Length**: 2048 tokens
-- **Quantization**: Q4_K_M (4-bit, optimized)
-- **Memory Usage**: ~2GB RAM during inference
-- **Performance**: Good quality for study plan generation
+### Model not loaded
 
-## Troubleshooting
+If you see:
 
-### Model Not Found Error
-```
+```text
 [Local LLM error: Model not loaded]
 ```
-**Solution**: Ensure the model file is in the correct location with the exact filename.
-
-### Memory Issues
-```
-[Local LLM error: Out of memory]
-```
-**Solution**:
-- Ensure you have at least 2GB RAM available
-- Close other memory-intensive applications
-- Consider using a smaller model variant
-
-### Slow Performance
-**Solutions**:
-- Ensure you have a multi-core CPU
-- Close unnecessary background processes
-- The first request may be slower due to model loading
-
-## Alternative Models
-
-If you prefer a different model, you can use any GGUF format model:
-
-1. **Llama 2 7B**: Better quality, larger size (~4GB)
-2. **Mistral 7B**: Excellent performance, medium size (~4GB)
-3. **Phi-2**: Good quality, smaller size (~1.4GB)
-
-**Note**: Update the model path in `services/local_llm.py` if using a different model.
-
-## Performance Tips
-
-- **First Run**: The first request will be slower as the model loads into memory
-- **Subsequent Requests**: Much faster after initial loading
-- **Memory**: Keep at least 2GB RAM free for optimal performance
-- **CPU**: Multi-core processors will improve inference speed
-
-## Support
-
-If you encounter issues with the local LLM:
 
-1. Check the bot logs for detailed error messages
-2. Verify the model file is correctly placed
-3. Ensure sufficient system resources
-4. Open an issue on GitHub with error details
+Check:
+- file path in `LOCAL_LLM_MODEL_PATH`
+- read permissions for the model file
+- available RAM/CPU resources
 
-## License
+### Out-of-memory or slow responses
 
-The TinyLlama model is licensed under Apache 2.0. See the [Hugging Face page](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF) for full license details.
+- Reduce context: `LOCAL_LLM_CONTEXT=2048`
+- Use lower-bit quantization if available
+- Close other heavy processes
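
If the checklist above does not isolate the problem, a minimal load-and-generate smoke test can separate model-file issues from bot issues. It assumes llama-cpp-python is installed and the path matches your `.env`:

```python
from llama_cpp import Llama

MODEL_PATH = "models/google-gemma-4b-it-Q4_K_M.gguf"  # adjust to your file

try:
    # A failure here usually means the file is missing, truncated,
    # or in a format this llama.cpp build does not support.
    llm = Llama(model_path=MODEL_PATH, n_ctx=2048, verbose=False)
    out = llm("Hello", max_tokens=8)  # one tiny completion as a probe
    print("Model OK:", out["choices"][0]["text"].strip())
except Exception as exc:
    print("Model check failed:", exc)
```
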
