The EdgeParse WASM SDK brings the full Rust-native PDF extraction engine directly into the browser. No server round-trips, no file uploads to third-party services, no backend infrastructure required.
Primary goals:
- Client-side PDF parsing — extract text, tables, headings, and structure from PDFs entirely in the browser
- No network latency — parsing runs locally in the user's browser tab, with no network calls
- Privacy by design — PDF data never leaves the user's device
- Universal deployment — works in any modern browser (Chrome, Firefox, Safari, Edge) via standard WebAssembly
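Because the SDK relies on standard WebAssembly, an application may want to confirm that the runtime supports it before loading the binary. The following is a minimal sketch, not part of the EdgeParse API; the helper name `supportsWebAssembly` is hypothetical, and it uses only the standard `WebAssembly.validate` call.

```typescript
// Smallest valid WebAssembly module: the "\0asm" magic number followed by version 1.
const EMPTY_WASM_MODULE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
]);

// Hypothetical helper: returns true when the runtime exposes a working
// WebAssembly implementation, false otherwise.
function supportsWebAssembly(): boolean {
  try {
    return (
      typeof WebAssembly === "object" &&
      typeof WebAssembly.validate === "function" &&
      WebAssembly.validate(EMPTY_WASM_MODULE)
    );
  } catch {
    // Some locked-down environments throw instead of returning false.
    return false;
  }
}
```

An application could call this once at startup and fall back to a server-side path or an explanatory message when it returns false.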
EdgeParse WASM is published to multiple registries and CDNs on every tagged release.
The canonical package is edgeparse-wasm on the public npm registry.
npm install edgeparse-wasm
# or
pnpm add edgeparse-wasm
# or
yarn add edgeparse-wasm

Package page: https://www.npmjs.com/package/edgeparse-wasm
jsDelivr serves the package automatically from npm. No installation is required, which is useful for prototyping, sandboxes, and static sites.
<!-- Latest release -->
<script type="module">
import init, { convert_to_string } from 'https://cdn.jsdelivr.net/npm/edgeparse-wasm/edgeparse_wasm.js';
await init('https://cdn.jsdelivr.net/npm/edgeparse-wasm/edgeparse_wasm_bg.wasm');
</script>
<!-- Pin to a specific version -->
<script type="module">
import init, { convert_to_string } from 'https://cdn.jsdelivr.net/npm/edgeparse-wasm@0.2.4/edgeparse_wasm.js';
await init('https://cdn.jsdelivr.net/npm/edgeparse-wasm@0.2.4/edgeparse_wasm_bg.wasm');
</script>

unpkg is an alternative CDN that also serves the package directly from npm.
<script type="module">
import init, { convert_to_string } from 'https://unpkg.com/edgeparse-wasm@0.2.4/edgeparse_wasm.js';
await init('https://unpkg.com/edgeparse-wasm@0.2.4/edgeparse_wasm_bg.wasm');
</script>

For enterprise or GitHub-native workflows, the package is also published to GitHub Packages under the scoped name @raphaelmansuy/edgeparse-wasm.
Authenticate first (read access requires a GitHub token even for public packages):
# 1. Create a Personal Access Token with read:packages scope
# https://github.com/settings/tokens
# 2. Add the scoped registry to .npmrc
echo "@raphaelmansuy:registry=https://npm.pkg.github.com" >> .npmrc
echo "//npm.pkg.github.com/:_authToken=YOUR_TOKEN" >> .npmrc
# 3. Install
npm install @raphaelmansuy/edgeparse-wasm

Or set the token via an environment variable in CI:
echo "@raphaelmansuy:registry=https://npm.pkg.github.com" >> .npmrc
echo "//npm.pkg.github.com/:_authToken=${GITHUB_TOKEN}" >> .npmrc
npm install @raphaelmansuy/edgeparse-wasm

Package page: https://github.com/raphaelmansuy/edgeparse/pkgs/npm/edgeparse-wasm
| Registry | Package name | URL |
|---|---|---|
| npm | edgeparse-wasm | https://www.npmjs.com/package/edgeparse-wasm |
| jsDelivr CDN | (mirrors npm) | https://cdn.jsdelivr.net/npm/edgeparse-wasm |
| unpkg CDN | (mirrors npm) | https://unpkg.com/edgeparse-wasm |
| GitHub Packages | @raphaelmansuy/edgeparse-wasm | https://github.com/raphaelmansuy/edgeparse/pkgs/npm/edgeparse-wasm |
| GitHub Releases | .tgz tarball | https://github.com/raphaelmansuy/edgeparse/releases |
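The two CDN entries follow the same URL pattern used in the script-tag examples: a base URL, an optional `@version` suffix, and the two file names `edgeparse_wasm.js` and `edgeparse_wasm_bg.wasm`. As a rough sketch, a small helper can build both URLs from one version string so the module and the binary never drift apart. The names `cdnUrls`, `Cdn`, and `CdnUrls` are hypothetical, and the strict `x.y.z` version check is an assumption made for this example.

```typescript
type Cdn = "jsdelivr" | "unpkg";

interface CdnUrls {
  module: string; // URL of the JavaScript glue module
  wasm: string;   // URL of the .wasm binary passed to init()
}

// Base URLs taken from the registry table.
const CDN_BASE: Record<Cdn, string> = {
  jsdelivr: "https://cdn.jsdelivr.net/npm/edgeparse-wasm",
  unpkg: "https://unpkg.com/edgeparse-wasm",
};

// Hypothetical helper: builds matching module and binary URLs.
// Omitting `version` yields the unpinned (latest) URLs.
function cdnUrls(cdn: Cdn, version?: string): CdnUrls {
  // Assumption: only plain x.y.z versions are accepted in this sketch.
  if (version !== undefined && !/^\d+\.\d+\.\d+$/.test(version)) {
    throw new Error(`Unexpected version string: ${version}`);
  }
  const base = version !== undefined ? `${CDN_BASE[cdn]}@${version}` : CDN_BASE[cdn];
  return {
    module: `${base}/edgeparse_wasm.js`,
    wasm: `${base}/edgeparse_wasm_bg.wasm`,
  };
}
```

Pinning a version this way is generally preferable in production, since an unpinned URL can change behavior on the next release.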
| Factor | Server-side | WASM (client-side) |
|---|---|---|
| Latency | Network round-trip + queue + processing | Local processing only (no network round-trip) |
| Privacy | PDF uploaded to server | PDF stays on device |
| Infrastructure | Requires backend, scaling, monitoring | Zero infrastructure |
| Cost | Compute + bandwidth per request | Free (runs on user hardware) |
| Offline | Requires internet | Works offline after initial load |
| Factor | JS libraries (pdf.js, etc.) | EdgeParse WASM |
|---|---|---|
| Table extraction | None or basic | Ruling-line + cluster method |
| Heading detection | None | Numbered + unnumbered hierarchy |
| Reading order | Stream order only | XY-Cut++ algorithm |
| Structured output | Raw text | JSON, Markdown, HTML, plain text |
| AI safety filters | None | Hidden text, off-page, tiny-text, OCG |
- Same engine — identical Rust code runs in WASM and native; same accuracy, same output
- ~4 MB — compressed WASM binary, loaded once and cached by the browser
- No dependencies — no Java, no Python, no ML models, no GPU
- TypeScript types — full .d.ts definitions for IDE autocomplete
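Since the binary only needs to be loaded once, it can help to guard initialization so that concurrent callers share a single load. The sketch below is one way to do that under stated assumptions: `makeInitOnce` is a hypothetical wrapper, not an EdgeParse export, and it accepts any promise-returning initializer (such as the package's default `init` export) as a parameter.

```typescript
type InitFn = () => Promise<unknown>;

// Hypothetical wrapper: returns a function that runs `init` at most once
// per successful load and hands every caller the same promise.
function makeInitOnce(init: InitFn): () => Promise<void> {
  let pending: Promise<void> | null = null;
  return () => {
    if (pending !== null) return pending;
    const attempt: Promise<void> = init().then(() => {});
    // If loading fails, forget the attempt so a later call can retry.
    attempt.catch(() => {
      pending = null;
    });
    pending = attempt;
    return attempt;
  };
}

// Usage (assumes the edgeparse-wasm package is installed):
//   import init from 'edgeparse-wasm';
//   const ensureReady = makeInitOnce(() => init());
//   await ensureReady(); // safe to call from several handlers
```

The caller that triggers a failed load still receives the rejection, so errors are not silently swallowed.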
The WASM package exports three functions:
The convert function parses PDF bytes and returns a structured JavaScript object (the full PdfDocument model with pages, elements, and bounding boxes).
import init, { convert } from 'edgeparse-wasm';
await init(); // Load WASM binary (once)
const response = await fetch('/my-report.pdf');
const bytes = new Uint8Array(await response.arrayBuffer());
const doc = convert(bytes, 'json');
// doc.pages[0].elements → [{type: "heading", text: "...", bbox: {...}}, ...]

The convert_to_string function parses PDF bytes and returns the output as a formatted string.
import init, { convert_to_string } from 'edgeparse-wasm';
await init();
const bytes = new Uint8Array(await fetch('/report.pdf').then(r => r.arrayBuffer()));
// Get Markdown
const markdown = convert_to_string(bytes, 'markdown');
// Get HTML
const html = convert_to_string(bytes, 'html');
// Get plain text
const text = convert_to_string(bytes, 'text');
// Get JSON string
const json = convert_to_string(bytes, 'json');

The version function returns the EdgeParse version string.
import { version } from 'edgeparse-wasm';
console.log(version()); // "0.2.4"

| Parameter | Type | Default | Description |
|---|---|---|---|
| pdfBytes | Uint8Array | (required) | Raw PDF file bytes |
| format | string \| null | "json" | "json", "markdown", "html", "text" |
| pages | string \| null | "all" | Page range: "all", "1-5", "1,3,7" |
| readingOrder | string \| null | "auto" | "auto" (XY-Cut++) or "off" |
| tableMethod | string \| null | "default" | "default" (ruling lines) or "cluster" (borderless) |
// src/App.tsx
import { useRef, useState } from 'react';
// Lazy-import so Vite does not pre-bundle the WASM binary.
async function loadEdgeParse() {
const { default: init, convert_to_string } = await import('edgeparse-wasm');
await init();
return { convert_to_string };
}
export default function App() {
const [output, setOutput] = useState('');
const ep = useRef<Awaited<ReturnType<typeof loadEdgeParse>> | null>(null);
const handleFile = async (e: React.ChangeEvent<HTMLInputElement>) => {
const file = e.target.files?.[0];
if (!file) return;
ep.current ??= await loadEdgeParse();
const bytes = new Uint8Array(await file.arrayBuffer());
setOutput(ep.current.convert_to_string(bytes, 'markdown') ?? '');
};
return (
<>
<input type="file" accept=".pdf" onChange={handleFile} />
<pre>{output}</pre>
</>
);
}

// vite.config.ts
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
export default defineConfig({
plugins: [react()],
optimizeDeps: { exclude: ['edgeparse-wasm'] },
build: { target: 'esnext' },
});

// app/pdf-extract/page.tsx (client component)
'use client';
import { useRef, useState } from 'react';
export default function PdfExtract() {
const [md, setMd] = useState('');
const ready = useRef(false);
const handleFile = async (e: React.ChangeEvent<HTMLInputElement>) => {
const file = e.target.files?.[0];
if (!file) return;
if (!ready.current) {
const { default: init } = await import('edgeparse-wasm');
await init();
ready.current = true;
}
const { convert_to_string } = await import('edgeparse-wasm');
const bytes = new Uint8Array(await file.arrayBuffer());
setMd(convert_to_string(bytes, 'markdown') ?? '');
};
return (
<>
<input type="file" accept=".pdf" onChange={handleFile} />
<pre style={{ whiteSpace: 'pre-wrap' }}>{md}</pre>
</>
);
}

// next.config.js
/** @type {import('next').NextConfig} */
module.exports = {
webpack(config) {
config.experiments = { ...config.experiments, asyncWebAssembly: true };
return config;
},
};

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>EdgeParse WASM demo</title>
</head>
<body>
<input id="pick" type="file" accept=".pdf" />
<pre id="out"></pre>
<script type="module">
import init, { convert_to_string, version }
from 'https://cdn.jsdelivr.net/npm/edgeparse-wasm@0.2.4/edgeparse_wasm.js';
// Pass the .wasm binary URL explicitly when loading from a CDN.
await init('https://cdn.jsdelivr.net/npm/edgeparse-wasm@0.2.4/edgeparse_wasm_bg.wasm');
console.log('EdgeParse', version());
document.getElementById('pick').addEventListener('change', async (e) => {
const file = e.target.files?.[0];
if (!file) return;
const bytes = new Uint8Array(await file.arrayBuffer());
document.getElementById('out').textContent =
convert_to_string(bytes, 'markdown');
});
</script>
</body>
</html>

// webpack.config.js
module.exports = {
experiments: { asyncWebAssembly: true },
};

// sw.js
const CACHE = 'edgeparse-v1';
self.addEventListener('install', event => {
event.waitUntil(
caches.open(CACHE).then(cache =>
cache.addAll([
'/edgeparse_wasm.js',
'/edgeparse_wasm_bg.wasm',
])
)
);
});
self.addEventListener('fetch', event => {
event.respondWith(
caches.match(event.request).then(r => r ?? fetch(event.request))
);
});

Build a web app where users drag-and-drop PDFs and instantly see extracted Markdown, JSON, or HTML, without any server. Ideal for document review tools, note-taking apps, and research assistants.
// In your file upload handler
fileInput.addEventListener('change', async (e) => {
const file = (e.target as HTMLInputElement).files?.[0];
if (!file) return;
const bytes = new Uint8Array(await file.arrayBuffer());
const markdown = convert_to_string(bytes, 'markdown');
document.getElementById('output')!.textContent = markdown;
});

Prepare PDF content for retrieval-augmented generation (RAG) pipelines directly in the browser. Extract structured chunks before sending them to an embedding API, so that only the text leaves the device, never the full PDF.
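One possible refinement for RAG chunking is to keep each paragraph together with its nearest preceding heading, which gives the embedding some section context. The sketch below assumes a document shape inferred from the snippets in this guide (`pages`, `page_number`, `elements`, `type`, `text`); the interfaces `DocElement`, `DocPage`, and `Chunk` and the function `chunkByHeading` are hypothetical and may not match the real PdfDocument typings.

```typescript
// Assumed minimal shapes; the real PdfDocument model has more fields.
interface DocElement {
  type: string;
  text: string;
}

interface DocPage {
  page_number: number;
  elements: DocElement[];
}

interface Chunk {
  heading: string | null; // nearest preceding heading, if any
  text: string;
  page: number;
}

// Hypothetical helper: walks the pages in order and tags each paragraph
// with the most recent heading seen so far (headings carry across pages).
function chunkByHeading(pages: DocPage[]): Chunk[] {
  const chunks: Chunk[] = [];
  let currentHeading: string | null = null;
  for (const page of pages) {
    for (const el of page.elements) {
      if (el.type === "heading") {
        currentHeading = el.text;
      } else if (el.type === "paragraph") {
        chunks.push({ heading: currentHeading, text: el.text, page: page.page_number });
      }
    }
  }
  return chunks;
}
```

With a parsed document, `chunkByHeading(doc.pages)` would produce one chunk per paragraph; the heading can then be prepended to the text before embedding if that suits the retrieval setup.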
const doc = convert(bytes, 'json');
// Extract chunks for embedding
const chunks = doc.pages.flatMap(page =>
page.elements
.filter(el => el.type === 'paragraph' || el.type === 'heading')
.map(el => ({
text: el.text,
page: page.page_number,
bbox: el.bbox,
}))
);
// Send only the text chunks to your embedding API
const response = await fetch('/api/embed', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ chunks: chunks.map(c => c.text) }),
});
// Assumes your endpoint replies with JSON
const embeddings = await response.json();

Build Progressive Web Apps (PWAs) that work without internet. Once the WASM binary is cached by the service worker, PDF extraction works entirely offline.
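When a new release of the binary ships, an offline-capable app usually bumps its cache name and removes older caches so users do not keep a stale copy. As a hedged sketch, the helper below picks which cache names to delete; `staleCacheNames` and the `edgeparse-` prefix convention are assumptions for illustration, modeled on the `edgeparse-v1` name used in the service worker example.

```typescript
// Assumed naming convention: every cache owned by this app starts with this prefix.
const CACHE_PREFIX = "edgeparse-";

// Hypothetical helper: returns the app-owned cache names that are not the
// current one, leaving caches that belong to other code untouched.
function staleCacheNames(allNames: string[], currentName: string): string[] {
  return allNames.filter(
    (name) => name.startsWith(CACHE_PREFIX) && name !== currentName
  );
}

// Possible use inside a service worker's activate handler:
//   self.addEventListener('activate', (event) => {
//     event.waitUntil(
//       caches.keys().then((names) =>
//         Promise.all(staleCacheNames(names, 'edgeparse-v2').map((n) => caches.delete(n)))
//       )
//     );
//   });
```

Keeping the selection logic in a pure function makes it easy to unit test outside the service worker context.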
Process confidential documents (medical records, legal contracts, financial statements) without sending data to any server. The PDF never leaves the browser tab.
Deploy PDF conversion tools on static hosting (GitHub Pages, Netlify, Vercel) with zero backend costs. The entire application is client-side JavaScript + WASM.
Build a Chrome/Firefox extension that extracts structured content from any PDF the user opens, adding copy-as-Markdown or export-to-JSON functionality.
Add PDF extraction as a feature in your web application without provisioning additional backend compute. Each user's browser handles its own PDF processing.
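In all of these use cases the bytes come from files chosen by the user, so a cheap sanity check before parsing can produce a friendlier error than a failed conversion. The sketch below checks for the `%PDF-` signature at the start of the file. `looksLikePdf` is a hypothetical helper, not an EdgeParse export, and it is deliberately strict: files with leading bytes before the header, which some readers tolerate, would be rejected here.

```typescript
// ASCII bytes for "%PDF-", the signature at the start of a PDF file.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Hypothetical helper: true when the buffer starts with the PDF signature.
function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((expected, i) => bytes[i] === expected);
}

// Possible use in an upload handler:
//   const bytes = new Uint8Array(await file.arrayBuffer());
//   if (!looksLikePdf(bytes)) { /* show "not a PDF" message */ }
```

This complements the `accept=".pdf"` attribute on the file input, which filters by extension only and does not inspect file contents.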
# Install wasm-pack
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh
# Build the WASM package (output goes to crates/edgeparse-wasm/pkg/)
cd crates/edgeparse-wasm
wasm-pack build --target web --release

# Option 1: Install from local path
npm install ./crates/edgeparse-wasm/pkg
# Option 2: Copy the pkg/ contents into your project
cp -r crates/edgeparse-wasm/pkg/ my-app/src/edgeparse-wasm/

Try EdgeParse WASM in your browser: edgeparse.com/demo/
The demo lets you:
- Upload or drag-and-drop any PDF
- View extracted content in Markdown, HTML, JSON, or plain text
- Preview rendered Markdown output
- See per-page PDF rendering alongside extracted content

All processing happens locally, with no server and no uploads.