
LLM Deployment Toolkit

The LLM Deployment Toolkit is a Large Language Model (LLM) application designed to leverage the power of Intel CPUs and GPUs. It features Retrieval Augmented Generation (RAG), which improves the accuracy and contextual relevance of generated responses by retrieving supporting information from external documents.

Requirements

Validated hardware

  • CPU: 13th generation Intel® Core™ processors and above
  • GPU: Intel® Arc™ graphics
  • RAM: 32GB
  • DISK: 128GB

Validated software version

  • OpenVINO: 2024.6.0
  • NodeJS: v22.13.0 LTS
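To confirm your environment matches the validated versions, a quick check along these lines can help (a sketch, assuming NodeJS and the OpenVINO Python package are installed and on the PATH):

```shell
# Print installed versions for comparison with the validated ones above.
node --version 2>/dev/null || echo "NodeJS not found"             # expect v22.13.0
python3 -c "import openvino; print(openvino.__version__)" 2>/dev/null \
  || echo "OpenVINO Python package not found"                     # expect 2024.6.0
```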

Application ports

Please ensure that you have these ports available before running the applications.

Apps                     Port
UI                       8010
Backend                  8011
LLM Service              8012
Text to Speech Service   8013
Speech to Text Service   8014
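The ports above can be probed for existing listeners before starting, for example (a sketch, assuming a Linux host with `ss` available):

```shell
# Warn about any toolkit port that already has a listener on this host.
for port in 8010 8011 8012 8013 8014; do
  if ss -tln 2>/dev/null | grep -qE ":${port}\b"; then
    echo "Warning: port ${port} is already in use"
  fi
done
```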

Quick Start

Ubuntu 24.04 LTS

1. Install prerequisites

2. Install the GPU driver

3. Run the script to set up and install Docker

./setup.sh

4. Access the App

Navigate to http://localhost:8010
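If the page does not load, a command-line reachability check can narrow things down (a sketch; the UI port comes from the table above):

```shell
# -sf: silent, and treat HTTP error responses as failures.
if curl -sf -o /dev/null http://localhost:8010; then
  echo "UI is reachable"
else
  echo "UI is not reachable yet"
fi
```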

Windows 11

1. Install prerequisites

2. Install the GPU driver

3. Follow the documentation to install the following services from the microservices folder:

  • Ollama: doc
  • Text to speech: doc
  • Speech to text: doc

4. Install RAG Toolkit

4.1 Install backend

Double-click install-backend.bat

4.2 Install UI

Double-click install-ui.bat

5. Run application

5.1 Start Ollama by following the doc

5.2 Start Text to speech by following the doc

5.3 Start Speech to text by following the doc

5.4 Start RAG Toolkit

Double-click run.bat

FAQ

  1. Changing the inference device for the embedding model. Supported devices: ["CPU", "GPU"]
# Example: Loading the embedding model on the GPU device
export EMBEDDING_DEVICE=GPU
  2. Changing the inference device for the reranker model. Supported devices: ["CPU", "GPU"]
# Example: Loading the reranker model on the GPU device
export RERANKER_DEVICE=GPU
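Both variables can be combined in one session. This is a sketch under the assumption that the services read the variables only at startup, so they need to be restarted afterwards:

```shell
# Run both RAG helper models on the GPU.
export EMBEDDING_DEVICE=GPU
export RERANKER_DEVICE=GPU
# Restart the services afterwards so the new settings take effect.
```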

Limitations

  1. The speech-to-text feature currently works only on localhost.
  2. RAG retrieval uses all uploaded documents.