This repository uses Azure in three roles:
- Azure Document Intelligence for OCR, structure extraction, and layout Markdown
- Azure OpenAI / Foundry deployments for profile-specific field normalization
- Azure OpenAI embeddings for retrieval vectors
Minimum useful set:
- gpt-5-mini
- text-embedding-3-small
Recommended set for this repository:
- gpt-5-mini
- gpt-5.4
- text-embedding-3-small
Optional additions:
- mistral-document-ai-2512: useful if you want to compare Document Intelligence against a second document-specialized parser
- Cohere-rerank-v4.0-fast: useful when the project grows into retrieval/reranking benchmarks
Example embedding deployment:
az cognitiveservices account deployment create \
--resource-group "<resource-group>" \
--name "<foundry-resource-name>" \
--deployment-name "text-embedding-3-small" \
--model-name "text-embedding-3-small" \
--model-version "1" \
--model-format "OpenAI" \
--sku-name "GlobalStandard" \
--sku-capacity 10

Check current deployments:
az cognitiveservices account deployment list \
--resource-group "<resource-group>" \
--name "<foundry-resource-name>" \
-o table

Check available models on the resource:
az cognitiveservices account list-models \
--resource-group "<resource-group>" \
--name "<foundry-resource-name>" \
-o table

Environment variables for the Azure provider:
export DOC_INTAKE_PROVIDER=azure
export AZURE_DOCINTELLIGENCE_ENDPOINT="https://<resource>.cognitiveservices.azure.com/"
export AZURE_DOCINTELLIGENCE_API_KEY="<key>"
export AZURE_OPENAI_ENDPOINT="https://<resource>.openai.azure.com/"
export AZURE_OPENAI_API_KEY="<key>"
export AZURE_OPENAI_API_VERSION="2025-04-01-preview"
export AZURE_OPENAI_CHAT_DEPLOYMENT="gpt-5-mini"
export AZURE_OPENAI_REASONING_DEPLOYMENT="gpt-5.4"
export AZURE_OPENAI_EMBEDDING_DEPLOYMENT="text-embedding-3-small"

The live path in src/multimodal_doc_intake_kit/azure_backend.py does the following:
- submits the source document to prebuilt-layout
- requests Markdown output
- sends the layout output plus profile instructions to the chat deployment
- validates the returned fields into the local Pydantic contracts
- chunks the document
- creates embeddings for each chunk
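
The chunking step above can be illustrated with a minimal sketch. The actual strategy used in azure_backend.py is not described here, so the function name `chunk_markdown`, the `max_chars` budget, and the paragraph-packing approach are all assumptions made for illustration only:

```python
def chunk_markdown(markdown: str, max_chars: int = 1200) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    Hypothetical helper: the repository's real chunker may split differently
    (for example by heading or by token count).
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: list[str] = []
    current = ""
    for paragraph in (p.strip() for p in markdown.split("\n\n")):
        if not paragraph:
            continue
        # Hard-split any single paragraph that exceeds the budget on its own.
        pieces = [
            paragraph[i:i + max_chars]
            for i in range(0, len(paragraph), max_chars)
        ]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                # Still within budget: keep growing the current chunk.
                current = candidate
            else:
                # Budget exceeded: close the current chunk and start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be sent to the embedding deployment. Packing whole paragraphs keeps layout units such as table rows or list items together where the budget allows.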
The implementation is intentionally conservative:
- it keeps OCR/layout separate from schema normalization
- it keeps embeddings separate from extraction
- it preserves a deterministic mode for tests and contract stability
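
One way a deterministic mode can stand in for the embedding call during tests is a hash-based pseudo-embedding. This is a sketch under stated assumptions: `deterministic_embedding` and its `dims` parameter are hypothetical names, and the repository's real deterministic mode may work differently.

```python
import hashlib
import math


def deterministic_embedding(text: str, dims: int = 8) -> list[float]:
    """Return a stable unit-length vector derived only from the input text.

    Hypothetical test double: the values carry no semantic meaning. They only
    guarantee that the same text always yields the same vector.
    """
    if dims <= 0:
        raise ValueError("dims must be positive")

    values: list[float] = []
    counter = 0
    while len(values) < dims:
        # Re-hash with a counter so any number of dimensions can be produced.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        for i in range(0, len(digest), 4):
            raw = int.from_bytes(digest[i:i + 4], "big")
            # Map the 32-bit integer into the range [-1.0, 1.0].
            values.append(raw / 0xFFFFFFFF * 2.0 - 1.0)
        counter += 1

    values = values[:dims]
    # Normalize to unit length; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]
```

Because the output depends only on the text, contract tests can assert on vector shape and stability without network access or API keys.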
Use gpt-5.4 instead of gpt-5-mini when you need:
- cross-page adjudication across large packets
- explicit review note generation
- richer resolution of ambiguous handwritten or low-quality scans
- harder post-processing logic over extracted evidence
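
The criteria above could be encoded as a small routing helper. This is only a sketch: `TaskTraits` and `pick_chat_deployment` are hypothetical names, and the only things taken from this document are the two environment variable names and the four criteria.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TaskTraits:
    """Hypothetical flags mirroring the four criteria listed above."""
    cross_page_adjudication: bool = False
    needs_review_notes: bool = False
    low_quality_scan: bool = False
    complex_post_processing: bool = False


def pick_chat_deployment(
    traits: TaskTraits, env: Optional[Mapping[str, str]] = None
) -> str:
    """Return the deployment name to use for one normalization call."""
    env = os.environ if env is None else env
    chat = env.get("AZURE_OPENAI_CHAT_DEPLOYMENT", "gpt-5-mini")
    # Fall back to the chat deployment when no reasoning deployment is set.
    reasoning = env.get("AZURE_OPENAI_REASONING_DEPLOYMENT", chat)
    needs_reasoning = any(
        (
            traits.cross_page_adjudication,
            traits.needs_review_notes,
            traits.low_quality_scan,
            traits.complex_post_processing,
        )
    )
    return reasoning if needs_reasoning else chat
```

Passing `env` explicitly keeps the helper easy to test; in normal use it reads from the process environment.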
Use text-embedding-3-large instead of text-embedding-3-small when:
- retrieval quality matters more than cost/latency
- chunk count is small
- you are benchmarking downstream search quality
Keep gpt-4o out of new setups here. It still works for many workloads, but it is not the model this repo should lead with.
Closest equivalent pipeline on Google Cloud:
- OCR/layout: Google Document AI
- normalization: gemini-2.5-pro
- embeddings: gemini-embedding-001
Suggested mapping:
Azure Document Intelligence -> Google Document AI
gpt-5-mini / gpt-5.4 -> Gemini 2.5 Pro
text-embedding-3-small -> gemini-embedding-001
If you are not on Azure or Google Cloud, keep the architecture split:
- provider A: OCR/layout
- provider B: schema normalization with structured outputs or tool calling
- provider C: embeddings
That separation is more robust than asking one multimodal model to do all three jobs in one shot.
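
A minimal sketch of that three-way split, using structural interfaces so each provider can be swapped independently. Every name below (`LayoutProvider`, `Normalizer`, `Embedder`, `run_intake`, and the stub classes) is hypothetical and not taken from this repository:

```python
from typing import Callable, Protocol, Sequence


class LayoutProvider(Protocol):
    def extract_markdown(self, document: bytes) -> str: ...


class Normalizer(Protocol):
    def normalize(self, markdown: str, instructions: str) -> dict: ...


class Embedder(Protocol):
    def embed(self, chunks: Sequence[str]) -> list[list[float]]: ...


def run_intake(
    document: bytes,
    instructions: str,
    layout: LayoutProvider,
    normalizer: Normalizer,
    embedder: Embedder,
    chunker: Callable[[str], list[str]],
) -> dict:
    """Run the three roles in sequence; no provider knows about the others."""
    markdown = layout.extract_markdown(document)
    fields = normalizer.normalize(markdown, instructions)
    chunks = chunker(markdown)
    vectors = embedder.embed(chunks)
    return {"fields": fields, "chunks": chunks, "vectors": vectors}


# Stub providers for illustration only; real ones would call a vendor API.
class StubLayout:
    def extract_markdown(self, document: bytes) -> str:
        return document.decode("utf-8")


class StubNormalizer:
    def normalize(self, markdown: str, instructions: str) -> dict:
        return {"instructions": instructions, "length": len(markdown)}


class StubEmbedder:
    def embed(self, chunks: Sequence[str]) -> list[list[float]]:
        return [[float(len(chunk))] for chunk in chunks]
```

With this shape, replacing the OCR vendor means writing one new class that satisfies `LayoutProvider`; the normalization and embedding code stays untouched.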