The Deployment Toolkit is a Large Language Model (LLM) application designed to run on Intel CPUs and GPUs. It features Retrieval Augmented Generation (RAG), which improves the accuracy and contextual relevance of the model's responses by retrieving external information and supplying it to the model.
- CPU: 13th generation Intel Core processors or newer
- GPU: Intel® Arc™ graphics
- RAM: 32GB
- DISK: 128GB
- OpenVINO: 2024.6.0
- NodeJS: v22.13.0 LTS
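
The requirements above can be partly checked before installing. The following is a minimal sketch, not part of the toolkit: the helper names are hypothetical, it assumes `node` is on the `PATH`, and it treats the 128GB figure as free disk space, which the list does not state.

```python
import shutil
import subprocess

MIN_NODE = (22, 13, 0)   # from the requirements list
MIN_DISK_GB = 128        # from the requirements list (assumed to mean free space)


def parse_version(text):
    """Turn a string such as 'v22.13.0' into a tuple like (22, 13, 0)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def node_version():
    """Return the installed Node.js version as a tuple, or None if unavailable."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True)
    try:
        return parse_version(result.stdout)
    except ValueError:
        # Unexpected output, for example a pre-release version string.
        return None


def free_disk_gb(path="."):
    """Free space, in GB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    found = node_version()
    node_ok = found is not None and found >= MIN_NODE
    print("Node.js:", found, "OK" if node_ok else "missing or older than required")
    disk = free_disk_gb()
    print(f"Free disk: {disk:.0f} GB", "OK" if disk >= MIN_DISK_GB else "below requirement")
```

The script does not check the CPU generation, GPU model, or OpenVINO version; those still need to be verified manually.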
Please ensure that you have these ports available before running the applications.
| Apps | Port |
|---|---|
| UI | 8010 |
| Backend | 8011 |
| LLM Service | 8012 |
| Text to Speech Service | 8013 |
| Speech to Text Service | 8014 |
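
One way to confirm that the ports in the table are available is to try binding to each of them. This is a minimal sketch under the assumption that the services listen on `127.0.0.1`; the function name `port_is_free` is hypothetical and not part of the toolkit.

```python
import socket

# Ports taken from the table above.
PORTS = {
    "UI": 8010,
    "Backend": 8011,
    "LLM Service": 8012,
    "Text to Speech Service": 8013,
    "Speech to Text Service": 8014,
}


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding failed, most likely because another process holds the port.
            return False
    return True


if __name__ == "__main__":
    for name, port in PORTS.items():
        status = "free" if port_is_free(port) else "IN USE"
        print(f"{name:<24} {port}  {status}")
```

Any port reported as in use should be freed before starting the applications.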
Windows 11
Double-click install-backend.bat
Double-click install-ui.bat
5.1 Start Ollama by following the doc
5.2 Start Text to speech by following the doc
5.3 Start Speech to text by following the doc
Double-click run.bat
- Changing the inference device for the embedding model. Supported devices: ["CPU", "GPU"]
    # Example: Loading the embedding model on the GPU device
    export EMBEDDING_DEVICE=GPU
- Changing the inference device for the reranker model. Supported devices: ["CPU", "GPU"]
    # Example: Loading the reranker model on the GPU device
    export RERANKER_DEVICE=GPU
- The current speech-to-text feature only works with localhost.
- RAG retrieval draws on all uploaded documents.
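
To illustrate how the `EMBEDDING_DEVICE` and `RERANKER_DEVICE` settings described above might be consumed, here is a minimal sketch. It is not the toolkit's actual code: the `read_device` helper, the `CPU` default, and the case-insensitive matching are all assumptions made for illustration.

```python
import os

SUPPORTED_DEVICES = ("CPU", "GPU")  # the supported devices listed above


def read_device(var_name, default="CPU", env=None):
    """Read an inference device from the environment and validate it.

    The default of "CPU" is an assumption, not documented toolkit behavior.
    """
    env = os.environ if env is None else env
    value = env.get(var_name, default).strip().upper()
    if value not in SUPPORTED_DEVICES:
        raise ValueError(f"{var_name}={value!r} is not one of {SUPPORTED_DEVICES}")
    return value


if __name__ == "__main__":
    print("embedding device:", read_device("EMBEDDING_DEVICE"))
    print("reranker device:", read_device("RERANKER_DEVICE"))
```

Note that `export` is shell syntax for Bash-like environments. In a Windows Command Prompt, the equivalent of `export EMBEDDING_DEVICE=GPU` is `set EMBEDDING_DEVICE=GPU`.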
