The FastAPI app is created by `visorag.api.app:create_app`.
`POST /query` requires a bearer token matching `VISORAG_API_TOKEN`.
Authorization: Bearer change-me

Returns service metadata.
Returns:
{
  "status": "ok",
  "retrieval_model_loaded": false,
  "vl_model_loaded": false,
  "device": "cpu",
  "vl_model": "Qwen/Qwen2.5-VL-3B-Instruct"
}

Model flags are false until the lazy-loading model paths have been used.
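The way a lazily loaded model can drive a health flag may be sketched as follows. This is a minimal illustration, not the project's code: `LazyModel` and its loader callable are hypothetical names, and the stand-in loader simply returns a placeholder object.

```python
from typing import Any, Callable, Optional


class LazyModel:
    """Defer an expensive model load until the model is first requested."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader  # hypothetical factory that builds the model
        self._model: Optional[Any] = None

    @property
    def loaded(self) -> bool:
        # Mirrors a health flag such as "vl_model_loaded".
        return self._model is not None

    def get(self) -> Any:
        # Load on first access, then reuse the cached instance.
        if self._model is None:
            self._model = self._loader()
        return self._model


vl_model = LazyModel(lambda: object())  # stand-in for a real model load
health_before = {"vl_model_loaded": vl_model.loaded}
vl_model.get()  # first use triggers the load
health_after = {"vl_model_loaded": vl_model.loaded}
```

Under this pattern a freshly started service reports `false` for both flags, and each flag flips to `true` only after a request has exercised the corresponding model.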
Multipart form fields:
| Field | Required | Notes |
|---|---|---|
| `file` | Yes | PDF, DOCX, PNG, JPG, or JPEG |
| `query` | Yes | User question or extraction instruction |
| `query_type` | No | `factual`, `summary`, or `extraction`; defaults to `factual` |
| `document_type` | No | Defaults to the uploaded file extension in the API |
| `top_k` | No | Integer from 1 to 20; defaults to 5 |
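The defaults and constraints in the table can be expressed as a small validation helper. The sketch below is illustrative only: `normalize_fields` is a hypothetical function, and every error message other than the `top_k` one is an assumption rather than the service's actual wording.

```python
from pathlib import Path
from typing import Optional

ALLOWED_EXTENSIONS = {"pdf", "docx", "png", "jpg", "jpeg"}
ALLOWED_QUERY_TYPES = {"factual", "summary", "extraction"}


def normalize_fields(
    filename: str,
    query: str,
    query_type: str = "factual",
    document_type: Optional[str] = None,
    top_k: int = 5,
) -> dict:
    """Validate form fields and fill in the defaults listed in the table."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError("unsupported file type")  # assumed message
    if not query.strip():
        raise ValueError("query is required")  # assumed message
    if query_type not in ALLOWED_QUERY_TYPES:
        raise ValueError("invalid query_type")  # assumed message
    if not 1 <= top_k <= 20:
        raise ValueError("top_k must be between 1 and 20.")
    return {
        "query": query,
        "query_type": query_type,
        # Fall back to the file extension when no document_type is given.
        "document_type": document_type or extension,
        "top_k": top_k,
    }
```

For example, `normalize_fields("invoice.PDF", "What is the invoice total?")` yields `query_type="factual"`, `document_type="pdf"`, and `top_k=5`.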
Runtime limits:
- Uploads default to 25 MB.
- Documents default to 200 rendered pages maximum.
- Queries are serialized by a process-level lock.
- Qdrant state is in memory and scoped to the request.
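A rough sketch of how the size limits and the process-level lock could fit together is shown below. All names here (`MAX_UPLOAD_BYTES`, `MAX_PAGES`, `check_limits`, `run_serialized`) are hypothetical, the 25 MB default is assumed to mean 25 × 1024 × 1024 bytes, and the exceptions are placeholders for whatever error responses the service actually produces.

```python
import threading

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumed interpretation of "25 MB"
MAX_PAGES = 200

_query_lock = threading.Lock()  # one lock shared by the whole process


def check_limits(upload_size: int, page_count: int) -> None:
    """Reject inputs that exceed the configured limits."""
    if upload_size > MAX_UPLOAD_BYTES:
        raise ValueError("upload too large")  # placeholder error
    if page_count > MAX_PAGES:
        raise ValueError("too many pages")  # placeholder error


def run_serialized(pipeline, *args):
    """Run one query at a time; concurrent callers wait for the lock."""
    with _query_lock:
        return pipeline(*args)
```

Serializing queries this way trades throughput for predictable memory use, which is a common choice when a single local model instance handles every request.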
Example:
curl -X POST "http://127.0.0.1:8000/query" \
  -H "Authorization: Bearer $VISORAG_API_TOKEN" \
  -F "file=@/path/to/document.pdf" \
  -F "query=What is the invoice total?" \
  -F "query_type=factual" \
  -F "top_k=5"

Factual and summary responses:

{"answer": "The total due is 6610.95."}

Extraction responses are flat JSON:
{
  "invoice_number": "BPXIN-00550",
  "total_due": 6610.95
}

Errors:
API validation and authentication errors may contain only `error` and `detail`, because they are raised before the pipeline starts:

{"error": "unauthorized", "detail": "Missing Bearer token."}

Pipeline errors include a `request_id`:
{
  "request_id": "uuid",
  "error": "invalid_top_k",
  "detail": "top_k must be between 1 and 20."
}

| Code | Meaning |
|---|---|
| 200 | Successful health check, service metadata, factual/summary answer, or extraction JSON. |
| 400 | Missing query/file, unsupported file type, invalid top_k, invalid document/query type, or document type mismatch. |
| 401 | Missing or invalid bearer token. |
| 413 | Upload exceeds VISORAG_MAX_UPLOAD_BYTES. |
| 422 | Document rendering produced no pages or retrieval returned no pages. |
| 500 | Embedding, Qdrant, or unexpected pipeline failure. |
| 503 | Local Qwen2.5-VL inference failed or is unavailable. |
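On the client side, the response shapes and status codes above could be handled along these lines. This is a sketch under assumptions: `interpret_response` is a hypothetical helper, the single-key check for `answer` is a heuristic, and treating 500 and 503 as retryable is a client-side choice that the service does not prescribe.

```python
from typing import Any, Dict


def interpret_response(status_code: int, body: Dict[str, Any]) -> str:
    """Summarize a /query response using the documented shapes and codes."""
    if status_code == 200:
        # Factual and summary responses carry a single "answer" key;
        # extraction responses are flat JSON with document-specific keys.
        if set(body) == {"answer"}:
            return f"answer: {body['answer']}"
        return f"extraction with fields: {sorted(body)}"
    # "request_id" is present only when the pipeline had already started.
    request_id = body.get("request_id", "n/a")
    # Assumption: server-side failures may be worth retrying.
    retryable = status_code in (500, 503)
    return (
        f"{status_code} {body.get('error')}: {body.get('detail')} "
        f"(request_id={request_id}, retryable={retryable})"
    )
```

Logging the `request_id` when it is present makes it easier to correlate a client-side failure with the corresponding pipeline run.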