# Local LLM Models

This directory contains the local language model used by EduPlannerBotAI for offline operation.

## Required Model

- **Model**: TinyLlama 1.1B Chat v1.0
- **Format**: GGUF (quantized)
- **Size**: ~1.1GB
- **Quantization**: Q4_K_M (4-bit, optimized for memory and speed)

## Download Instructions

### Option 1: Direct Download from Hugging Face

1. Visit the model page: [TinyLlama-1.1B-Chat-v1.0-GGUF](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF)
2. Download the file: `tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf`
3. Place it in this `models/` directory
4. Ensure the filename matches exactly: `tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf`

### Option 2: Using Hugging Face CLI

```bash
# Install huggingface-hub if not already installed
pip install huggingface-hub

# Download the model
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
  tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  --local-dir models/
```
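
The same download can also be scripted from Python with the `huggingface_hub` library installed above. A minimal sketch, assuming it is run from the repository root:

```python
from huggingface_hub import hf_hub_download

# Fetch the quantized model straight into models/ instead of the HF cache.
path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    local_dir="models",
)
print(f"Model saved to {path}")
```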

### Option 3: Using wget/curl

```bash
# Using wget
wget -O models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

# Using curl
curl -L -o models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
```

## File Structure

After downloading, your directory should look like this:

```
models/
├── README.md
└── tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf   # ~1.1GB
```

## Verification

Verify that the model downloaded correctly:

```bash
# Check that the file exists and has the expected size
ls -lh models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

# Expected output:
# -rw-r--r-- 1 user user 1.1G Jan 1 12:00 tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

# Optional: check the file type. A failed download often saves an HTML
# error page instead of the model, which this will reveal.
file models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
```
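
The most reliable check is to load the file. A minimal smoke test, assuming the bot uses `llama-cpp-python` as its GGUF backend (`pip install llama-cpp-python`):

```python
from llama_cpp import Llama

# Loading parses the GGUF header and maps the weights; it raises an
# error if the file is truncated or corrupted.
llm = Llama(
    model_path="models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    n_ctx=2048,  # the model's full context length
    verbose=False,
)

out = llm("Q: What is 2 + 2?\nA:", max_tokens=8)
print(out["choices"][0]["text"])
```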

## Model Specifications

- **Architecture**: TinyLlama 1.1B (Llama architecture)
- **Training**: Chat/instruction fine-tuned
- **Context Length**: 2048 tokens
- **Quantization**: Q4_K_M (4-bit, optimized)
- **Memory Usage**: ~2GB RAM during inference
- **Performance**: Good quality for study plan generation
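
Because the model is chat fine-tuned, it responds best to prompts formatted with its chat template rather than raw text. With `llama-cpp-python` (an assumption; substitute your backend's equivalent), `create_chat_completion` applies the template stored in the GGUF metadata:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    n_ctx=2048,
    verbose=False,
)

# create_chat_completion wraps the messages in the chat template embedded
# in the GGUF file, so the model sees the role markers it was tuned on.
resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a study-planning assistant."},
        {"role": "user", "content": "Draft a one-week plan for learning Python basics."},
    ],
    max_tokens=256,
)
print(resp["choices"][0]["message"]["content"])
```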

## Troubleshooting

### Model Not Found Error
```
[Local LLM error: Model not loaded]
```
**Solution**: Ensure the model file is in the correct location with the exact filename.

### Memory Issues
```
[Local LLM error: Out of memory]
```
**Solution**:
- Ensure you have at least 2GB RAM available (see the check below)
- Close other memory-intensive applications
- Consider using a smaller model variant
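
To fail fast with a clear message instead of hitting the OOM error above, you can check free memory before loading the model. A sketch assuming the `psutil` package is available:

```python
import psutil

REQUIRED_BYTES = 2 * 1024**3  # ~2GB headroom for Q4_K_M inference

available = psutil.virtual_memory().available
if available < REQUIRED_BYTES:
    raise MemoryError(
        f"Only {available / 1024**3:.1f}GB RAM free; close other "
        "applications or switch to a smaller model."
    )
```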

### Slow Performance
**Solutions**:
- Ensure you have a multi-core CPU and let the backend use all of it (see the sketch below)
- Close unnecessary background processes
- Expect the first request to be slower while the model loads into memory
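
Inference here is CPU-bound, so the thread count matters. In `llama-cpp-python` (again an assumption about the backend), the `n_threads` parameter controls it:

```python
import os

from llama_cpp import Llama

# Spread inference across all available cores (os.cpu_count() reports
# logical cores; the library default can be conservative on some systems).
llm = Llama(
    model_path="models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=os.cpu_count(),
)
```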

## Alternative Models

If you prefer a different model, you can use any GGUF-format model:

1. **Llama 2 7B**: Better quality, larger size (~4GB)
2. **Mistral 7B**: Excellent performance, medium size (~4GB)
3. **Phi-2**: Good quality, smaller size (~1.4GB)

**Note**: Update the model path in `services/local_llm.py` if using a different model (one way to keep this configurable is sketched below).
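
One way to avoid editing the code for every swap is an environment-variable override. The `MODEL_PATH` variable and default below are hypothetical illustrations, not the actual names used in `services/local_llm.py`:

```python
import os
from pathlib import Path

# Hypothetical override: point MODEL_PATH at any GGUF file on disk.
DEFAULT_MODEL = Path("models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")
MODEL_PATH = Path(os.environ.get("MODEL_PATH", str(DEFAULT_MODEL)))

if not MODEL_PATH.exists():
    raise FileNotFoundError(f"GGUF model not found at {MODEL_PATH}")
```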

## Performance Tips

- **First Run**: The first request will be slower as the model loads into memory
- **Subsequent Requests**: Much faster after initial loading, provided the loaded model is reused (see the sketch below)
- **Memory**: Keep at least 2GB RAM free for optimal performance
- **CPU**: Multi-core processors will improve inference speed
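
The load-once-then-reuse pattern behind the first two tips can be as simple as a cached loader. A minimal sketch (not the actual code in `services/local_llm.py`):

```python
from functools import lru_cache

from llama_cpp import Llama

@lru_cache(maxsize=1)
def get_llm() -> Llama:
    """Load the model on the first call; later calls reuse the instance."""
    return Llama(
        model_path="models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
        n_ctx=2048,
        verbose=False,
    )
```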

## Support

If you encounter issues with the local LLM:

1. Check the bot logs for detailed error messages
2. Verify the model file is correctly placed
3. Ensure sufficient system resources are available
4. Open an issue on GitHub with the error details

## License

The TinyLlama model is licensed under Apache 2.0. See the [Hugging Face page](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF) for full license details.