📄 doc-sdk is a developer-first, ultra-lightweight TypeScript SDK that standardizes integrating visual language models (VLMs) across supported providers.
🪶 Lightweight: Tiny core, tree-shakeable, zero runtime config for the happy path.
🔌 Provider-Agnostic: Swap between ocrbase, mistral, llamaparse, and azure via a single import.
🧬 Type-Safe Extraction: First-class Zod schemas — get validated, typed JSON out of any document.
parse()— turn a document into textextract()— extract structured JSON from a document, with Zod-typed outputbatchParse()— batch parsebatchExtract()— batch extract
| Provider | Price ($ / 1k pages) |
|---|---|
| ocrbase | 1$ |
| Mistral OCR | 2$ |
| LlamaParse | 3.75$ |
| Azure Doc Intelligence | 10–30$ |
| AWS Textract | 15–50$ |
| Extend AI* | 10–15$ |
| Reducto* | 5–10$ |
| Unstructured* | 30$ |
*Sales call required. Only ocrbase is available today — other providers are coming soon.
bun i document-sdkParse a document:
import { parse } from "document-sdk";
const { text } = await parse({
file: "invoice.pdf",
});Extract structured data:
import { extract, Output } from "document-sdk";
import { z } from "zod";
const { output } = await extract({
file: "invoice.pdf",
output: Output.object({
schema: z.object({
total: z.number(),
vendor: z.string(),
}),
}),
});Set your provider credentials in .env eg. using:
OCRBASE_API_KEY=By default, doc-sdk auto-resolves @doc-sdk/ocrbase if installed. Override via pass a model per call:
import { parse } from "document-sdk";
import { ocrbase } from "@doc-sdk/ocrbase";
await parse({
model: ocrbase("paddleocr"),
file: "invoice.pdf",
});import { parse } from "document-sdk";
import { mistral } from "@doc-sdk/mistral";
await parse({
model: mistral("mistral-ocr-latest"),
file: "invoice.pdf",
});