This directory contains the deployment, startup, and image build scripts for EdgeCraftRAG.
The main scripts in this directory are:
- `quick_start.sh`: recommended one-click deployment script for new users, with automatic setup and interactive guidance
- `bootstrap.sh`: non-interactive deployment orchestrator that can be used directly or invoked by `quick_start.sh`
- `model_download.sh`: model preparation helper (supports `vllm`/`ov` modes, optional `model_id` and `model_path` arguments)
- `run_ov_baremetal.sh`: OpenVINO bare-metal startup script
- `run_ov_container.sh`: OpenVINO container startup script
- `run_vllm_baremetal.sh`: vLLM bare-metal startup script
- `run_vllm_container.sh`: vLLM container startup script
- `run_ovms_baremetal.sh`: OVMS bare-metal startup script
- `run_ovms_container.sh`: OVMS container startup script
- `build_images.sh`: container image build script
Deployment methods:
| Method | Description | Requirements | Milvus Support |
|---|---|---|---|
| baremetal | Start services as Python processes | Python 3.10+ | No (in-memory only) |
| container | Start services in Docker containers | Docker / Docker Compose | Yes (enabled by default) |
Note: If you need Milvus, use the container deployment method.
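The Milvus constraint in the table amounts to a simple decision, sketched below. The `needs_milvus` flag is illustrative only and is not a variable the real scripts read:

```shell
# Illustrative decision only: Milvus is supported only by the container flow,
# so needing Milvus forces DEPLOYMENT_METHOD=container.
needs_milvus=1                                   # hypothetical flag
if [ "$needs_milvus" -eq 1 ]; then
  DEPLOYMENT_METHOD=container
else
  DEPLOYMENT_METHOD=${DEPLOYMENT_METHOD:-baremetal}
fi
echo "$DEPLOYMENT_METHOD"
```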
Run this from the EdgeCraftRAG root directory:
```bash
./tools/quick_start.sh
```

The script behaves as follows by default:
- runs in non-interactive mode
- uses OpenVINO as the default inference backend
- if `INFERENCE_BACKEND` is not set, the script resolves it to `openvino`
- uses `baremetal` as the default deployment method when `DEPLOYMENT_METHOD` is not set
In the default bare-metal flow, the script automatically:
- creates and activates `EdgeCraftRAG/ecrag_venv` if it does not exist
- validates the Python version (3.10+ required, 3.10/3.11 recommended)
- checks and installs required Python packages
- checks and installs `npm` for baremetal UI startup when needed
- validates the Intel GPU driver/runtime and auto-installs missing packages on apt-based Linux
- checks and auto-downloads missing models (embedding, reranker, OpenVINO LLM)
- writes a deployment environment snapshot to `workspace/bootstrap.env` before invoking `bootstrap.sh`
- calls `bootstrap.sh` to start services
For vLLM deployments and the container deployment method, the script also validates Docker and Docker Compose before deployment. On Ubuntu 24.04, if Docker or Docker Compose is missing, the script attempts automatic installation and starts and enables the Docker service.
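The Docker pre-flight check can be approximated with standard commands. This is a detection-only sketch; the script's actual logic and the auto-install step may differ:

```shell
# Rough equivalent of a Docker/Compose availability check (detection only;
# quick_start.sh's real validation and auto-install behavior may differ).
docker_ready() {
  command -v docker >/dev/null 2>&1 || return 1
  docker compose version >/dev/null 2>&1 || return 1
}
if docker_ready; then
  echo "docker and compose available"
else
  echo "docker or compose missing"
fi
```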
To skip model verification/download when models are already prepared locally:
```bash
./tools/quick_start.sh --skip-model-check
```

Equivalent environment variable:

```bash
export SKIP_MODEL_CHECK=1
./tools/quick_start.sh
```

Intel GPU driver/runtime validation can be skipped when needed:

```bash
./tools/quick_start.sh --skip-gpu-driver-check
```

Equivalent environment variables:

```bash
export SKIP_INTEL_GPU_DRIVER_CHECK=1
# Or keep validation but disable auto-install:
export AUTO_INSTALL_INTEL_GPU_DRIVER=0
./tools/quick_start.sh
```

To disable automatic npm installation during baremetal preparation:

```bash
export AUTO_INSTALL_NPM=0
./tools/quick_start.sh
```

After startup succeeds, the terminal prints a UI access URL such as:
```
UI access URL: http://${HOST_IP}:8082
```
Note: If you set `DEPLOYMENT_METHOD=container` in advance, the script skips the venv and pip checks and continues with container deployment.
You can override defaults with environment variables:
```bash
export INFERENCE_BACKEND=openvino
export MODEL_PATH="${PWD}/workspace/models"
export DOC_PATH="${PWD}/workspace"
export TMPFILE_PATH="${PWD}/workspace"
export LLM_MODEL="Qwen/Qwen3-8B"
export HOST_IP="$(hostname -I | awk '{print $1}')"
./tools/quick_start.sh
```

Select the backend with `INFERENCE_BACKEND`:
```bash
# OpenVINO (default)
./tools/quick_start.sh

# vLLM_A770
export INFERENCE_BACKEND=vllm_a770
./tools/quick_start.sh

# vLLM_B60
export INFERENCE_BACKEND=vllm_b60
./tools/quick_start.sh

# OVMS
export INFERENCE_BACKEND=ovms
export OVMS_SOURCE_MODEL=OpenVINO/Qwen3-8B-int4-ov
export OVMS_MODEL_NAME=OpenVINO/Qwen3-8B-int4-ov
export OVMS_TARGET_DEVICE=GPU.0
./tools/quick_start.sh
```

For OVMS deployments, the tooling now exports the compose-facing variables directly. The most commonly overridden ones are `OVMS_SOURCE_MODEL`, `OVMS_MODEL_NAME`, `OVMS_TARGET_DEVICE`, `OVMS_TOOL_PARSER`, and `OVMS_MAX_NUM_BATCHED_TOKENS`.
Important OVMS behavior:
- `OVMS_SOURCE_MODEL` keeps your original model ID as-is (for example `Qwen/Qwen3-8B`).
- `quick_start.sh` and `bootstrap.sh` both persist OVMS variables into `workspace/bootstrap.env` for reuse.
- You can replay the exact OVMS configuration with `source workspace/bootstrap.env && ./tools/bootstrap.sh`.
Compatibility note: the legacy environment variable `COMPOSE_PROFILES` is still accepted, but new configurations should use `INFERENCE_BACKEND`.
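One plausible way the legacy variable could be mapped onto the new one is shown below. The accepted value set is taken from the supported-backend list; the exact fallback behavior is an assumption, not the script's confirmed logic:

```shell
# Assumed mapping from legacy COMPOSE_PROFILES to INFERENCE_BACKEND.
unset INFERENCE_BACKEND
COMPOSE_PROFILES=vllm_a770                       # example legacy setting
if [ -z "${INFERENCE_BACKEND:-}" ]; then
  case "$COMPOSE_PROFILES" in
    openvino|vllm_a770|vllm_b60|ovms) INFERENCE_BACKEND=$COMPOSE_PROFILES ;;
    *) INFERENCE_BACKEND=openvino ;;             # assumed default fallback
  esac
fi
echo "$INFERENCE_BACKEND"
```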
Supported INFERENCE_BACKEND values:
- `openvino`
- `vllm_a770`
- `vllm_b60`
- `ovms`
```bash
./tools/quick_start.sh -i
```

Interactive mode is suitable for first-time deployment or when you are not sure about the parameters. After you run `./tools/quick_start.sh -i`, the script prompts step by step and generates the deployment configuration for the current run.
The interactive flow typically includes:
- choosing the inference backend: OpenVINO / vLLM_A770 / vLLM_B60 / OVMS
- choosing the deployment method: baremetal / container
- configuring key parameters: `HOST_IP`, `MODEL_PATH`, `DOC_PATH`, `TMPFILE_PATH`, `LLM_MODEL`
- confirming the configuration and starting deployment, then printing the access URL at the end
Interactive mode is recommended when:
- this is your first installation and you are not familiar with the environment variables or defaults
- you need to switch quickly between different hardware targets or inference backends
- you want to review parameters before deployment to reduce configuration mistakes
Example:
```bash
cd EdgeCraftRAG
./tools/quick_start.sh -i
```

The following examples show common inputs during the interactive flow. Actual prompt text may vary slightly based on the script.
```
Inference backend: OpenVINO
Deployment method: baremetal
HOST_IP: 192.168.1.20
MODEL_PATH: /home/scale/edgeai/applications.edge.ai.rag/EdgeCraftRAG/workspace/models
DOC_PATH: /home/scale/edgeai/applications.edge.ai.rag/EdgeCraftRAG/workspace
TMPFILE_PATH: /home/scale/edgeai/applications.edge.ai.rag/EdgeCraftRAG/workspace
LLM_MODEL: Qwen/Qwen3-8B
Confirm deployment: y
```

```
Inference backend: vLLM_B60
Deployment method: container
HOST_IP: 192.168.1.20
MODEL_PATH: /home/scale/edgeai/applications.edge.ai.rag/EdgeCraftRAG/workspace/models
DOC_PATH: /home/scale/edgeai/applications.edge.ai.rag/EdgeCraftRAG/workspace
TMPFILE_PATH: /home/scale/edgeai/applications.edge.ai.rag/EdgeCraftRAG/workspace
LLM_MODEL: Qwen/Qwen3-8B
Confirm deployment: y
```

```
Inference backend: vLLM_A770
Deployment method: container
HOST_IP: 192.168.1.20
MODEL_PATH: /home/scale/edgeai/applications.edge.ai.rag/EdgeCraftRAG/workspace/models
DOC_PATH: /home/scale/edgeai/applications.edge.ai.rag/EdgeCraftRAG/workspace
TMPFILE_PATH: /home/scale/edgeai/applications.edge.ai.rag/EdgeCraftRAG/workspace
LLM_MODEL: Qwen/Qwen3-8B
Confirm deployment: y
```

```
Inference backend: OVMS
Deployment method: container
HOST_IP: 192.168.1.20
MODEL_PATH: /home/scale/edgeai/applications.edge.ai.rag/EdgeCraftRAG/workspace/models
DOC_PATH: /home/scale/edgeai/applications.edge.ai.rag/EdgeCraftRAG/workspace
TMPFILE_PATH: /home/scale/edgeai/applications.edge.ai.rag/EdgeCraftRAG/workspace
LLM_MODEL: Qwen/Qwen3-8B
Confirm deployment: y
```
Notes:
- for a remote server, set `HOST_IP` to an address reachable by the client machine
- if you need persistent vector retrieval data, use the container deployment method
- if the device is an Intel Arc A770, prefer the `vllm_a770` configuration
Cleanup:
```bash
./tools/quick_start.sh cleanup
```

Run with environment variables defined in advance:
```bash
export INFERENCE_BACKEND=openvino
export DEPLOYMENT_METHOD=baremetal
./tools/bootstrap.sh
```

Use defaults (openvino + baremetal):

```bash
./tools/bootstrap.sh
```

Configuration reuse:
- `quick_start.sh` writes `workspace/bootstrap.env` before real deployment starts.
- `bootstrap.sh` also persists configuration for reuse.
- For OVMS, this includes `OVMS_SOURCE_MODEL`, `OVMS_MODEL_NAME`, `OVMS_TARGET_DEVICE`, `OVMS_TOOL_PARSER`, and related `OVMS_*` runtime variables.
```bash
source workspace/bootstrap.env
./tools/bootstrap.sh
```

Basic usage:
```bash
./tools/model_download.sh <mode> [model_id] [model_path]
```

Modes:
- `vllm`: prepare embedding/reranker OpenVINO models + the vLLM LLM model
- `ov`: prepare embedding/reranker OpenVINO models + an OpenVINO INT4 LLM model
Optional arguments:
- `model_id`: overrides `LLM_MODEL` for the current run
- `model_path`: overrides `MODEL_PATH` for the current run
Examples:
```bash
./tools/model_download.sh vllm
./tools/model_download.sh ov Qwen/Qwen3-8B /data/models
```

Environment behavior:
- if a virtual environment is already active, it is reused
- otherwise, the script creates/activates `ecrag_venv` automatically (same style as `quick_start.sh`)
- missing `python3-venv`/`pip` prerequisites are installed automatically when supported by the system package manager
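The venv decision amounts to the following check. This sketch shows only the decision; the actual creation and activation steps (`python3 -m venv`, sourcing the activate script) are elided:

```shell
# Decision only: reuse an active virtual environment, otherwise plan to
# create/activate ecrag_venv (the real script performs those steps too).
if [ -n "${VIRTUAL_ENV:-}" ]; then
  venv_action="reuse active venv: $VIRTUAL_ENV"
else
  venv_action="create and activate ecrag_venv"
fi
echo "$venv_action"
```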
You can also call the following scripts directly based on inference backend and deployment method:
- OpenVINO baremetal: `./tools/run_ov_baremetal.sh`
- OpenVINO container: `./tools/run_ov_container.sh`
- vLLM baremetal: `./tools/run_vllm_baremetal.sh`
- vLLM container: `./tools/run_vllm_container.sh`
- OVMS baremetal: `./tools/run_ovms_baremetal.sh`
- OVMS container: `./tools/run_ovms_container.sh`
This is useful when you already know your parameters and want to skip the one-click onboarding flow.
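Selecting among these scripts can be sketched as a small dispatcher. The name mapping follows the file list above; the dispatcher itself is hypothetical and not part of the tooling:

```shell
# Hypothetical dispatcher: map backend/method to the matching startup script.
backend=vllm_b60        # one of: openvino, vllm_a770, vllm_b60, ovms
method=container        # one of: baremetal, container
case "$backend" in
  openvino)           prefix=ov ;;
  vllm_a770|vllm_b60) prefix=vllm ;;
  ovms)               prefix=ovms ;;
  *) echo "unknown backend: $backend" >&2; exit 1 ;;
esac
script="./tools/run_${prefix}_${method}.sh"
echo "$script"
```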
Build all images:
```bash
./tools/build_images.sh
```

Build by component:
```bash
./tools/build_images.sh mega
./tools/build_images.sh server
./tools/build_images.sh ui
./tools/build_images.sh all
```

For complete deployment guidance, see `../docs/Advanced_Setup.md`.