title Quick Start: WebAssembly
description Use EdgeParse directly in the browser via WebAssembly — no server required.

Overview

EdgeParse compiles to WebAssembly, enabling client-side PDF extraction in any modern browser. The same Rust engine that powers the CLI, Python, and Node.js SDKs runs locally in the user's browser tab.

Key properties:

  • PDF data never leaves the user's device
  • Works offline after initial WASM load (~4 MB)
  • Same accuracy as the native CLI
  • Zero backend infrastructure

Build the WASM package

# Install wasm-pack (one-time)
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh

# Build for browser use
cd crates/edgeparse-wasm
wasm-pack build --target web --release

Output goes to crates/edgeparse-wasm/pkg/.

Install in your project

# Option 1: Link locally
npm install ./path-to/crates/edgeparse-wasm/pkg

# Option 2: Copy pkg/ into your project and import it by relative path instead of the package name
cp -r crates/edgeparse-wasm/pkg/ src/edgeparse-wasm/

Basic usage

import init, { convert_to_string } from '@edgeparse/edgeparse-wasm';

// Load WASM binary (call once at startup)
await init();

// Read a PDF file (from user upload, fetch, etc.)
const response = await fetch('/my-report.pdf');
if (!response.ok) throw new Error(`Failed to fetch PDF: ${response.status}`);
const bytes = new Uint8Array(await response.arrayBuffer());

// Extract Markdown
const markdown = convert_to_string(bytes, 'markdown');
console.log(markdown);

// Extract structured JSON
const json = convert_to_string(bytes, 'json');

// Extract HTML
const html = convert_to_string(bytes, 'html');

// Extract plain text
const text = convert_to_string(bytes, 'text');
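
Because init() should run only once, an app with several entry points (an upload handler, a drag-and-drop zone, a fetch path) may want a shared guard. The following is a minimal sketch, not part of EdgeParse: once is a hypothetical helper that memoizes the promise returned by any async initializer, so concurrent callers share one load.

```typescript
// Hypothetical helper (not an EdgeParse API): run an async initializer at most once.
function once<T>(initFn: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      // Cache the in-flight promise so concurrent callers share a single load.
      pending = initFn().catch((err) => {
        pending = undefined; // a failed load may be retried on the next call
        throw err;
      });
    }
    return pending;
  };
}

// Assumed usage with the import shown above:
//   const ready = once(() => init());
//   await ready();                         // first caller triggers the WASM load
//   await ready();                         // later callers reuse the same promise
//   const md = convert_to_string(bytes, 'markdown');
```

Caching the promise rather than a boolean flag means a second caller that arrives while the binary is still downloading waits for the same load instead of starting another.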

Handle user file uploads

import init, { convert_to_string } from '@edgeparse/edgeparse-wasm';

await init();

const fileInput = document.getElementById('pdf-input') as HTMLInputElement;

fileInput.addEventListener('change', async () => {
  const file = fileInput.files?.[0];
  if (!file) return;

  const bytes = new Uint8Array(await file.arrayBuffer());
  const markdown = convert_to_string(bytes, 'markdown');

  document.getElementById('output')!.textContent = markdown;
});
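
Users can pick any file, so it may be worth rejecting obvious non-PDFs before handing the bytes to the converter. The sketch below is an assumption about how an app might do this and is not an EdgeParse feature: looksLikePdf is a hypothetical helper that checks for the `%PDF-` signature at the start of the file.

```typescript
// ASCII bytes for "%PDF-", the signature at the start of a PDF file.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Hypothetical helper: true when the buffer starts with the PDF signature.
function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((b, i) => bytes[i] === b);
}

// Assumed usage inside the change handler above:
//   const bytes = new Uint8Array(await file.arrayBuffer());
//   if (!looksLikePdf(bytes)) {
//     document.getElementById('output')!.textContent = 'Please choose a PDF file.';
//     return;
//   }
```

This check is strict: a file with stray bytes before the header would be rejected even though some lenient readers accept it. Treat it as a quick filter, not as validation.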

Get structured document data

Call convert_to_string() with the 'json' format and pass the result to JSON.parse(). The parsed object follows the same schema as the Python and Node.js SDKs: a flat kids array of element objects.

import init, { convert_to_string } from '@edgeparse/edgeparse-wasm';

await init();

// `file` is a File object, e.g. fileInput.files[0] from the upload example above
const bytes = new Uint8Array(await file.arrayBuffer());

// Parse JSON string — same schema as Python/Node.js SDK
const doc = JSON.parse(convert_to_string(bytes, 'json'));

// Access structured data
for (const el of doc.kids) {
  console.log(`[${el.type}] page ${el['page number']}: ${el.content ?? ''}`);
}
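
Because kids is flat, any per-page view has to be built by the caller. The sketch below groups elements by page. The DocElement interface is an assumption that lists only the three fields used in the loop above (type, 'page number', content) and treats 'page number' as a number. The real schema may carry more fields, and groupByPage is a hypothetical helper.

```typescript
// Assumed shape: only the fields referenced in the loop above.
interface DocElement {
  type: string;
  "page number": number;
  content?: string;
}

// Hypothetical helper: bucket the flat element list by page number.
function groupByPage(kids: DocElement[]): Map<number, DocElement[]> {
  const pages = new Map<number, DocElement[]>();
  for (const el of kids) {
    const page = el["page number"];
    const bucket = pages.get(page);
    if (bucket) {
      bucket.push(el); // page already seen: append in document order
    } else {
      pages.set(page, [el]); // first element on this page
    }
  }
  return pages;
}

// Assumed usage:
//   const doc = JSON.parse(convert_to_string(bytes, 'json'));
//   const byPage = groupByPage(doc.kids);
//   console.log(byPage.get(1)?.length ?? 0, 'elements on page 1');
```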

Page range selection

// Parse only pages 1-5
const markdown = convert_to_string(bytes, 'markdown', '1-5');

// Parse specific pages
const json = convert_to_string(bytes, 'json', '1,3,7');
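
When the page selection comes from a UI (checkboxes, a thumbnail picker), it usually arrives as an array of numbers, not a string. The sketch below converts such an array into the range syntax shown above. toPageRange is a hypothetical helper. Mixing ranges and single pages in one string (for example '1-3,5') is an assumption, since the examples above only show each form separately.

```typescript
// Hypothetical helper: collapse page numbers into a compact range string.
function toPageRange(pages: number[]): string {
  // De-duplicate and sort ascending so consecutive pages are adjacent.
  const sorted = Array.from(new Set(pages)).sort((a, b) => a - b);
  const runs: Array<[number, number]> = [];
  for (const page of sorted) {
    const last = runs[runs.length - 1];
    if (last !== undefined && page === last[1] + 1) {
      last[1] = page; // extend the current consecutive run
    } else {
      runs.push([page, page]); // start a new run
    }
  }
  return runs.map(([a, b]) => (a === b ? `${a}` : `${a}-${b}`)).join(",");
}

// Assumed usage:
//   const markdown = convert_to_string(bytes, 'markdown', toPageRange([1, 2, 3, 4, 5]));
```

An empty array yields an empty string. Whether the converter accepts that is not stated here, so callers may prefer to skip the page argument in that case.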

Table extraction methods

// Default: ruling-line detection (best for tables with borders)
const md1 = convert_to_string(bytes, 'markdown', 'all', 'auto', 'default');

// Cluster method: for borderless tables
const md2 = convert_to_string(bytes, 'markdown', 'all', 'auto', 'cluster');

Vite configuration

// vite.config.ts
import { defineConfig } from 'vite';

export default defineConfig({
  optimizeDeps: {
    exclude: ['@edgeparse/edgeparse-wasm'],
  },
  build: {
    target: 'esnext',
  },
});

Webpack configuration

// webpack.config.js
module.exports = {
  experiments: {
    asyncWebAssembly: true,
  },
};

Live demo

Try EdgeParse WASM in your browser: edgeparse.com/demo/

Upload any PDF and see extracted Markdown, JSON, HTML, or text — all processing runs locally.

Next steps