Real inference requires a CUDA runtime and GPU dependencies. Verify:
python -c "import torch; print(torch.cuda.is_available())"
If this prints False, run only the mocked tests or move to a CUDA environment.
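The same check can be scripted, for example so a test runner can decide whether to skip GPU tests. This is a minimal sketch that assumes only that PyTorch may or may not be installed; the function name `cuda_status` is hypothetical and not part of visorag.

```python
def cuda_status() -> str:
    """Classify the environment as 'no-torch', 'cpu-only', or 'cuda'."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "no-torch"
    return "cuda" if torch.cuda.is_available() else "cpu-only"


if __name__ == "__main__":
    # Anything other than "cuda" means real inference cannot run here.
    print(cuda_status())
```

Only the `cuda` result indicates an environment where real inference can run.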
Reduce the maximum input pixels:
export VISORAG_QWEN_MAX_PIXELS=501760
Use fewer retrieved pages with --top-k, or test on shorter documents.
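When picking a different value, it may help to note that 501760 equals 640 × 28 × 28. The sketch below assumes the variable follows the Qwen2-VL convention of expressing pixel limits as multiples of a 28 × 28 pixel block; that is an assumption, not something this document confirms. The helper name `pixel_budget` is hypothetical.

```python
BLOCK_PIXELS = 28 * 28  # assumed block size (28 x 28 pixels), not confirmed by visorag


def pixel_budget(blocks: int) -> int:
    """Return a pixel limit equal to `blocks` blocks of 28 x 28 pixels."""
    if blocks < 1:
        raise ValueError("blocks must be at least 1")
    return blocks * BLOCK_PIXELS


if __name__ == "__main__":
    print(pixel_budget(640))  # 501760, the value used above
    print(pixel_budget(320))  # 250880, half of that budget
```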
DOCX conversion requires LibreOffice:
soffice --version
Install LibreOffice and ensure soffice is on PATH.
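A PATH lookup can also be done from Python before attempting a conversion. This is a minimal sketch using the standard library only; the function name `find_soffice` is hypothetical and is not how visorag itself locates LibreOffice.

```python
import shutil
from typing import Optional


def find_soffice() -> Optional[str]:
    """Return the full path to the soffice executable, or None if it is not on PATH."""
    return shutil.which("soffice")


if __name__ == "__main__":
    path = find_soffice()
    if path is None:
        print("soffice not found on PATH; DOCX conversion will not work")
    else:
        print(f"soffice found at {path}")
```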
Set and pass the same bearer token:
export VISORAG_API_TOKEN="change-me"
python -m visorag query --token "$VISORAG_API_TOKEN" ...
top_k must be an integer from 1 to 20.
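The top_k range rule can be checked on the client side before sending a request. This is a minimal sketch of such a check, not visorag's own validation code; the function name `validate_top_k` and the constants are hypothetical.

```python
MIN_TOP_K = 1
MAX_TOP_K = 20


def validate_top_k(value) -> int:
    """Return value as an int if it is an integer from 1 to 20, else raise ValueError."""
    message = f"top_k must be an integer from {MIN_TOP_K} to {MAX_TOP_K}"
    # Reject bools (a subclass of int) and non-integer types such as float.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise ValueError(message)
    try:
        top_k = int(value)  # accepts ints and integer strings such as "5"
    except ValueError:
        raise ValueError(message) from None
    if not MIN_TOP_K <= top_k <= MAX_TOP_K:
        raise ValueError(message)
    return top_k
```

Strings are accepted because command-line arguments arrive as text; floats such as 2.5 are rejected rather than truncated.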
The default upload limit is 25 MB. Change VISORAG_MAX_UPLOAD_BYTES only if the runtime has enough CPU, disk, and GPU memory for the larger request.
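A file can be checked against the limit before upload. The sketch below assumes that "25 MB" means 25 × 1024 × 1024 bytes and that an unset or empty variable falls back to the default; both are assumptions, and the function names are hypothetical rather than part of visorag.

```python
import os

# Assumption: "25 MB" is interpreted as 25 MiB (26,214,400 bytes).
DEFAULT_MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def max_upload_bytes(env=None) -> int:
    """Read VISORAG_MAX_UPLOAD_BYTES from env, falling back to the assumed default."""
    env = os.environ if env is None else env
    raw = env.get("VISORAG_MAX_UPLOAD_BYTES")
    if raw is None or raw.strip() == "":
        return DEFAULT_MAX_UPLOAD_BYTES
    limit = int(raw)  # raises ValueError if the value is not an integer
    if limit <= 0:
        raise ValueError("VISORAG_MAX_UPLOAD_BYTES must be positive")
    return limit


def is_within_limit(size_bytes: int, env=None) -> bool:
    """Return True if a payload of size_bytes fits under the configured limit."""
    return size_bytes <= max_upload_bytes(env)
```

For a local file, `is_within_limit(os.path.getsize(path))` gives the answer without reading the file.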
The package is intended to import without loading models. If import allocates CUDA memory or downloads models, inspect recent changes to src/visorag/features/visual_retrieval.py and src/visorag/features/answer_generation.py for eager model loading.
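One way to detect eager CUDA allocation is to compare allocated memory before and after the import. This is a minimal sketch, assuming PyTorch is the framework in use; the function name `cuda_bytes_allocated_by_import` is hypothetical. It does not detect model downloads, and it should run in a fresh interpreter, because a module that is already imported will not be imported again.

```python
import importlib


def cuda_bytes_allocated_by_import(module_name: str):
    """Return CUDA bytes allocated while importing module_name.

    Returns None when PyTorch is not installed or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    before = torch.cuda.memory_allocated()
    importlib.import_module(module_name)
    after = torch.cuda.memory_allocated()
    return after - before


if __name__ == "__main__":
    # "visorag" is the package name used in the commands above.
    print(cuda_bytes_allocated_by_import("visorag"))
```

A result above zero suggests that something in the import path is creating tensors or loading a model on the GPU.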