Schema-first document intake service for turning PDF, DOCX, and image inputs into:
- validated typed records
- retrieval-ready chunks with provenance
- operator review tasks for ambiguous fields
- layout Markdown and embedding artifacts for downstream RAG pipelines
The repository ships with two execution modes:
- deterministic: in-memory, test-friendly, no network calls
- azure: live OCR/layout + LLM normalization + chunk embeddings
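Mode selection is presumably driven by the DOC_INTAKE_PROVIDER environment variable used in the setup steps below; a minimal sketch of how a service might pick a backend (the function and constant names here are illustrative, not the actual config.py API):

```python
import os
from typing import Mapping, Optional

# Illustrative; the two mode names come from this README, the rest is a sketch.
KNOWN_PROVIDERS = {"deterministic", "azure"}

def resolve_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured provider, defaulting to the offline mode."""
    env = os.environ if env is None else env
    provider = env.get("DOC_INTAKE_PROVIDER", "deterministic").lower()
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    return provider
```

Defaulting to the deterministic mode keeps tests and local development network-free unless Azure is explicitly opted into.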
[local file | URL]
|
v
[Azure Document Intelligence: prebuilt-layout]
|
v
[gpt-5-mini normalization by profile]
|
+--> [review queue for mid-confidence fields]
|
v
[chunk builder]
|
v
[text-embedding-3-small]
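The stages above can be sketched as a simple composition; the function body below is a stand-in (the real pipeline lives in src/multimodal_doc_intake_kit/pipeline.py), but the recorded trace mirrors the captured demo output:

```python
from typing import List, Tuple

# Illustrative stand-ins for the real pipeline stages; each stage appends the
# model/tool it used so the final trace matches the demo summary in this README.
def run_pipeline(source: str) -> Tuple[dict, List[str]]:
    trace: List[str] = []

    layout = {"markdown": f"# parsed {source}", "tables": []}
    trace.append("prebuilt-layout")           # OCR / layout extraction

    record = {"fields": {}, "layout": layout}
    trace.append("gpt-5-mini")                # schema normalization by profile

    chunks = [layout["markdown"]]             # chunk builder (trivial here)
    vectors = [[0.0] * 1536 for _ in chunks]  # one embedding per chunk
    trace.append("text-embedding-3-small")

    return {"record": record, "chunks": chunks, "vectors": vectors}, trace
```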
The current live configuration uses:
- prebuilt-layout for OCR, table extraction, and layout Markdown
- gpt-5-mini for field normalization
- gpt-5.4 as the intended adjudication tier for harder follow-up tasks
- text-embedding-3-small for retrieval vectors
| Profile | Key fields |
| --- | --- |
| contract_v1 | counterparty_name, effective_date, termination_notice_days |
| invoice_v1 | invoice_number, currency, total_due_cents, vendor_name |
| manual_v1 | document_title, revision, requires_review_cycle |
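The invoice_v1 fields, for instance, map naturally onto a typed record like the one below (a dataclass sketch; the repository's actual models live in src/multimodal_doc_intake_kit/models.py and may differ):

```python
from dataclasses import dataclass

@dataclass
class InvoiceV1:
    """Typed record for the invoice_v1 profile (field names from the table above)."""
    invoice_number: str
    currency: str
    total_due_cents: int
    vendor_name: str

    def validate(self) -> None:
        # Amounts are stored as integer cents, so negative totals indicate a
        # bad extraction that should be routed to review rather than exported.
        if self.total_due_cents < 0:
            raise ValueError("total_due_cents must be non-negative")
        if len(self.currency) != 3:
            raise ValueError("currency must be an ISO 4217 code like 'USD'")
```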
Generated demo documents live under demo/input:
The image input is intentionally noisy enough to trigger a realistic review path:
The repository includes captured outputs from a live Azure-backed run in demo/output.
Summary:
[
{
"document_id": "contract-live-2026-04-15",
"profile": "contract_v1",
"status": "exported",
"trace": ["prebuilt-layout", "gpt-5-mini", "text-embedding-3-small"]
},
{
"document_id": "invoice-live-2026-04-15",
"profile": "invoice_v1",
"status": "exported",
"trace": ["prebuilt-layout", "gpt-5-mini", "text-embedding-3-small"]
},
{
"document_id": "manual-live-2026-04-15",
"profile": "manual_v1",
"status": "exported",
"trace": ["prebuilt-layout", "gpt-5-mini", "text-embedding-3-small"]
}
]

Invoice extraction excerpt:
{
"invoice_number": "INV-2048-APR",
"currency": "USD",
"total_due_cents": 1284450,
"vendor_name": "Cascade Field Services"
}

Manual scan review path:
{
"revision": {
"initial_value": "17",
"review_action": "correct",
"corrected_value": "r7"
}
}

Contract chunk embedding artifact excerpt:
{
"chunk_id": "de55ec321afe71e63cc5e9a5a85ccef3e159b23c10293e1e999b39c1e8a252d6",
"dimensions": 1536,
"vector_preview": [-0.032795, 0.068723, 0.097257, 0.019036]
}

Endpoints:

- POST /api/ingest
- GET /api/documents/{document_id}
- GET /api/artifacts/{document_id}
- POST /api/review/{document_id}
- GET /api/exports/{document_id}
- GET /healthz
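The payload for POST /api/ingest can also be assembled programmatically; a sketch (paths and IDs are illustrative, and checksum_sha256 is just the SHA-256 hex digest of the file bytes):

```python
import hashlib
import json

def build_ingest_payload(document_id: str, path: str, data: bytes,
                         profile: str, mime_type: str) -> str:
    """Build the JSON body for POST /api/ingest (shape taken from this README)."""
    payload = {
        "document_id": document_id,
        "source": {
            "uri": path,
            "mime_type": mime_type,
            "checksum_sha256": hashlib.sha256(data).hexdigest(),
        },
        "profile": profile,
        "locale": "en-US",
        "metadata": {"tenant": "lab-demo"},
    }
    return json.dumps(payload, indent=2)
```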
Example ingest request:
curl -X POST http://127.0.0.1:8000/api/ingest \
-H "Content-Type: application/json" \
-d '{
"document_id": "invoice-live-2026-04-15",
"source": {
"uri": "/absolute/path/to/field-operations-invoice.pdf",
"mime_type": "application/pdf",
"checksum_sha256": "f349c0377448806316fed1124258ef77126a4f30506fa2f972fa1ba0644ee71e"
},
"profile": "invoice_v1",
"locale": "en-US",
"submitted_at": "2026-04-15T14:20:00Z",
"metadata": {
"tenant": "lab-demo"
}
}'

Prerequisites:

- Python 3.9+
- uv
Install:
uv sync --extra dev --extra demo

Deterministic mode:
uv run uvicorn multimodal_doc_intake_kit.main:app --app-dir src --reload

Live Azure mode:
export DOC_INTAKE_PROVIDER=azure
export AZURE_DOCINTELLIGENCE_ENDPOINT="https://<resource>.cognitiveservices.azure.com/"
export AZURE_DOCINTELLIGENCE_API_KEY="<key>"
export AZURE_OPENAI_ENDPOINT="https://<resource>.openai.azure.com/"
export AZURE_OPENAI_API_KEY="<key>"
export AZURE_OPENAI_API_VERSION="2025-04-01-preview"
export AZURE_OPENAI_CHAT_DEPLOYMENT="gpt-5-mini"
export AZURE_OPENAI_REASONING_DEPLOYMENT="gpt-5.4"
export AZURE_OPENAI_EMBEDDING_DEPLOYMENT="text-embedding-3-small"
uv run uvicorn multimodal_doc_intake_kit.main:app --app-dir src --reload

Generate the sample files:
uv run python scripts/generate_demo_inputs.py

Run the live end-to-end pipeline and write captured artifacts:
uv run python scripts/run_live_demo.py

Artifacts written:
- demo/output/demo-summary.json
- demo/output/contract-live-2026-04-15.export.json
- demo/output/invoice-live-2026-04-15.export.json
- demo/output/manual-live-2026-04-15.review.json
- demo/output/manual-live-2026-04-15.export.json
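A quick sanity check over demo-summary.json (a sketch; assumes the captured structure shown earlier in this README):

```python
import json

def count_exported(summary_json: str) -> int:
    """Count documents that reached the exported status in a demo summary."""
    entries = json.loads(summary_json)
    return sum(1 for entry in entries if entry.get("status") == "exported")
```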
Deployment and provider instructions live in docs/azure-foundry.md.
Short version:
- keep OCR/layout on Azure Document Intelligence
- keep schema normalization on a current chat/reasoning deployment (gpt-5-mini here)
- keep embeddings on text-embedding-3-small
- add a second reasoning tier only if you need explicit adjudication or post-review synthesis
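Retrieval over the exported chunk vectors reduces to nearest-neighbor search; a minimal cosine-similarity sketch over embedding vectors (no vector store assumed, chunk IDs are illustrative):

```python
import math
from typing import List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: List[float], chunks: List[Tuple[str, List[float]]],
          k: int = 3) -> List[str]:
    """Return the chunk_ids of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query, c[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]
```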
The Azure implementation is the only provider wired in code today. Equivalent stacks for the same design:
- Google Cloud
- OCR/layout: Document AI layout parser or Enterprise Document OCR
- normalization: gemini-2.5-pro
- embeddings: gemini-embedding-001
- OpenAI-compatible stack
- OCR/layout: external OCR service of choice
- normalization: current structured-output capable model
- embeddings: a small retrieval embedding model
.
├── demo/
│ ├── input/
│ └── output/
├── docs/
│ ├── architecture.md
│ ├── azure-foundry.md
│ ├── implementation-plan.md
│ └── pipeline-notes.md
├── schemas/
├── scripts/
│ ├── generate_demo_inputs.py
│ └── run_live_demo.py
├── src/multimodal_doc_intake_kit/
│ ├── azure_backend.py
│ ├── config.py
│ ├── main.py
│ ├── models.py
│ ├── pipeline.py
│ ├── service.py
│ └── store.py
└── tests/
Run the tests:

uv run pytest -q