
LLM Deployment Toolkit

The LLM Deployment Toolkit is a Large Language Model (LLM) application designed to leverage the power of Intel CPUs and GPUs. It features Retrieval Augmented Generation (RAG), which improves the accuracy and contextual relevance of generated responses by retrieving supporting information from external documents.

Requirements

Validated hardware

  • CPU: 13th generation Intel® Core™ processors and above
  • GPU: Intel® Arc™ graphics
  • RAM: 32GB
  • DISK: 128GB

Validated software version

  • OpenVINO: 2024.6.0
  • NodeJS: v22.13.0 LTS
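To confirm your environment matches the validated versions, a quick check along these lines can help (a sketch, assuming NodeJS and the OpenVINO Python package are installed and on the PATH):

```shell
# Print installed versions for comparison with the validated ones above.
node --version 2>/dev/null || echo "NodeJS not found"             # expect v22.13.0
python3 -c "import openvino; print(openvino.__version__)" 2>/dev/null \
  || echo "OpenVINO Python package not found"                     # expect 2024.6.0
```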

Application ports

Please ensure that you have these ports available before running the applications.

Apps                     Port
UI                       8010
Backend                  8011
LLM Service              8012
Text to Speech Service   8013
Speech to Text Service   8014
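The ports above can be probed for existing listeners before starting, for example (a sketch, assuming a Linux host with `ss` available):

```shell
# Warn about any toolkit port that already has a listener on this host.
for port in 8010 8011 8012 8013 8014; do
  if ss -tln 2>/dev/null | grep -qE ":${port}\b"; then
    echo "Warning: port ${port} is already in use"
  fi
done
```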

Quick Start

Ubuntu 24.04 LTS

1. Install prerequisites

2. Install the GPU driver

3. Run the script to set up and install Docker

./setup.sh

4. Access the App

Navigate to http://localhost:8010
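If the page does not load, a command-line reachability check can narrow things down (a sketch; the UI port comes from the table above):

```shell
# -sf: silent, and treat HTTP error responses as failures.
if curl -sf -o /dev/null http://localhost:8010; then
  echo "UI is reachable"
else
  echo "UI is not reachable yet"
fi
```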

Windows 11

1. Install prerequisites

2. Install the GPU driver

3. Follow the documentation to install the following services from the microservices folder:

  • Ollama: doc
  • Text to speech: doc
  • Speech to text: doc

4. Install RAG Toolkit

4.1 Install backend

Double-click install-backend.bat

4.2 Install UI

Double-click install-ui.bat

5. Run application

5.1 Start Ollama by following the doc

5.2 Start Text to speech by following the doc

5.3 Start Speech to text by following the doc

5.4 Start RAG Toolkit

Double-click run.bat

FAQ

  1. Changing the inference device for the embedding model. Supported devices: ["CPU", "GPU"]
# Example: Loading the embedding model on the GPU device
export EMBEDDING_DEVICE=GPU
  2. Changing the inference device for the reranker model. Supported devices: ["CPU", "GPU"]
# Example: Loading the reranker model on the GPU device
export RERANKER_DEVICE=GPU
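Both variables can be combined in one session. This is a sketch under the assumption that the services read the variables only at startup, so they need to be restarted afterwards:

```shell
# Run both RAG helper models on the GPU.
export EMBEDDING_DEVICE=GPU
export RERANKER_DEVICE=GPU
# Restart the services afterwards so the new settings take effect.
```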

Limitations

  1. The speech-to-text feature currently works only on localhost.
  2. RAG retrieval uses all uploaded documents.