Commit dc81e2d

feat: add WASM SDK docs, demo integration, and site updates
- Add docs/09-wasm-sdk.md with objectives, advantages, API ref, and use cases
- Update README.md with WebAssembly SDK section, features, and project layout
- Add site pages: quick-start-wasm, api/wasm, guides/wasm-use-cases
- Add WebAssembly tab to site landing page QuickStart
- Add Live Demo link to site hero and footer
- Integrate demo/ build into site/public/demo/ for GitHub Pages
- Update deploy-site workflow to build demo before site
- Add DEMO_BASE_PATH env var to demo vite config for /demo/ base path
1 parent c7c9d92 commit dc81e2d

16 files changed

Lines changed: 1115 additions & 51 deletions

.github/workflows/deploy-site.yml

Lines changed: 14 additions & 0 deletions
@@ -6,6 +6,8 @@ on:
       - main
     paths:
       - 'site/**'
+      - 'demo/**'
+      - 'crates/edgeparse-wasm/pkg/**'
       - '.github/workflows/deploy-site.yml'
   workflow_dispatch:

@@ -43,6 +45,18 @@ jobs:
         working-directory: site
         run: pnpm install --frozen-lockfile

+      - name: Build demo app
+        working-directory: demo
+        run: |
+          npm ci
+          DEMO_BASE_PATH=/demo/ npx vite build
+
+      - name: Copy demo into site
+        run: |
+          rm -rf site/public/demo
+          mkdir -p site/public/demo
+          cp -r demo/dist/* site/public/demo/
+
       - name: Build site
         working-directory: site
         run: pnpm run build
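
The workflow sets `DEMO_BASE_PATH=/demo/` for the Vite build, and the commit message says the demo's Vite config reads that variable. The config file itself is not part of the diff shown here, so the following is only a minimal sketch of how such a config might derive its `base` value. The helper name `resolveBase` and the fallback of `/` are assumptions, not code from the commit.

```typescript
// Hypothetical helper, not taken from the commit.
// Turns an optional DEMO_BASE_PATH value into a base path that
// starts and ends with a slash, falling back to '/' when unset.
function resolveBase(env: Record<string, string | undefined>): string {
  const raw = env.DEMO_BASE_PATH?.trim();
  if (!raw) return '/'; // assumed default for local development
  const withLeading = raw.startsWith('/') ? raw : `/${raw}`;
  return withLeading.endsWith('/') ? withLeading : `${withLeading}/`;
}

// In a vite.config.ts this could be wired up roughly as:
//   export default defineConfig({ base: resolveBase(process.env) });
```

Normalizing the slashes keeps asset URLs correct when the built files are copied under `site/public/demo/`, even if the variable is set without a trailing slash.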

README.md

Lines changed: 70 additions & 2 deletions
@@ -10,7 +10,7 @@

 EdgeParse converts any digital PDF into Markdown, JSON (with bounding boxes), HTML, or plain text — deterministically, without a JVM, without a GPU, without OCR models, and with **best-in-class accuracy** among non-OCR tools on the 200-document benchmark suite included in this repository.

-Available as a **Rust library**, **CLI binary**, **Python package** (`edgeparse`), and **Node.js package** (`edgeparse`).
+Available as a **Rust library**, **CLI binary**, **Python package** (`edgeparse`), **Node.js package** (`edgeparse`), and **WebAssembly module** for in-browser PDF parsing.

 ---

@@ -23,6 +23,7 @@ Available as a **Rust library**, **CLI binary**, **Python package** (`edgeparse`
 - [CLI Reference](#cli-reference)
 - [Python SDK](#python-sdk)
 - [Node.js SDK](#nodejs-sdk)
+- [WebAssembly SDK](#webassembly-sdk)
 - [Architecture](#architecture)
 - [Benchmark](#benchmark)
 - [Why it matters](#why-it-matters)
@@ -54,6 +55,7 @@ Available as a **Rust library**, **CLI binary**, **Python package** (`edgeparse`
 | Markdown, JSON, HTML, plain-text output ||
 | Python SDK (PyO3 native extension) ||
 | Node.js SDK (NAPI-RS native addon) ||
+| WebAssembly SDK (in-browser PDF parsing) ||
 | Batch processing API ||
 | Hybrid backend support (Docling-Fast) ||
 | Zero JVM dependency ||
@@ -388,6 +390,68 @@ npx edgeparse report.pdf --format json --pages "1-5"

 ---

+## WebAssembly SDK
+
+EdgeParse compiles to WebAssembly, enabling **client-side PDF extraction in any modern browser**: no server, no uploads, no backend infrastructure.
+
+**Key advantages:**
+- Same Rust engine, same accuracy: identical output to CLI/Python/Node
+- PDF data never leaves the user's device (privacy by design)
+- Works offline after initial WASM load (~4 MB cached)
+- Zero infrastructure cost: deploy on static hosting
+
+### Quick start
+
+```typescript
+import init, { convert_to_string } from '@edgeparse/edgeparse-wasm';
+
+// Load WASM binary (once)
+await init();
+
+// Read PDF file from user upload or fetch
+const bytes = new Uint8Array(await file.arrayBuffer());
+
+// Extract Markdown
+const markdown = convert_to_string(bytes, 'markdown');
+
+// Extract structured JSON
+const json = convert_to_string(bytes, 'json');
+
+// Extract HTML
+const html = convert_to_string(bytes, 'html');
+```
+
+### API
+
+| Function | Returns | Description |
+|----------|---------|-------------|
+| `convert(bytes, format?, pages?, readingOrder?, tableMethod?)` | JS object | Structured `PdfDocument` with pages, elements, bounding boxes |
+| `convert_to_string(bytes, format?, pages?, readingOrder?, tableMethod?)` | `string` | Formatted output (Markdown, JSON, HTML, or text) |
+| `version()` | `string` | EdgeParse version |
+
+### Build from source
+
+```bash
+# Install wasm-pack
+curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh
+
+# Build WASM package
+cd crates/edgeparse-wasm
+wasm-pack build --target web --release
+```
+
+Output goes to `crates/edgeparse-wasm/pkg/`. Use it locally or publish to npm.
+
+### Live demo
+
+Try EdgeParse WASM in your browser: **[edgeparse.com/demo/](https://edgeparse.com/demo/)**
+
+Drag-and-drop any PDF and see extracted Markdown, JSON, HTML, or plain text; all processing runs locally in your browser.
+
+Full documentation: [docs/09-wasm-sdk.md](docs/09-wasm-sdk.md)
+
+---
+
 ## Architecture

 ### Crate structure
@@ -571,6 +635,7 @@ Technical documentation lives in [`docs/`](docs/):
 | [docs/06-sdk-integration.md](docs/06-sdk-integration.md) | CLI flag reference, Python SDK API, Node.js SDK API, Batch API |
 | [docs/07-cicd-publishing.md](docs/07-cicd-publishing.md) | CI/CD publishing pipeline: how it works and how to configure it |
 | [docs/08-agent-skill.md](docs/08-agent-skill.md) | EdgeParse agent skill: `npx skills add`, SKILL.md structure, SDK patterns |
+| [docs/09-wasm-sdk.md](docs/09-wasm-sdk.md) | WebAssembly SDK: objectives, API, use cases, build instructions |

 ---

@@ -587,7 +652,7 @@ edgeparse/
 ├── crates/
 │   ├── pdf-cos/ # lopdf 0.39 fork — low-level PDF object model
 │   ├── edgeparse-core/ # Core extraction engine (~90 source files)
-│   ├── edgeparse-cli/ # CLI binary (clap, 25+ flags)
+│   ├── edgeparse-cli/ # CLI binary (clap, 25+ flags)│ ├── edgeparse-wasm/ # WebAssembly build for browsers│ ├── edgeparse-wasm/ # WebAssembly build for browsers (wasm-bindgen)
 │   ├── edgeparse-python/ # PyO3 native Python extension
 │   └── edgeparse-node/ # NAPI-RS native Node.js addon

@@ -610,6 +675,9 @@ edgeparse/

 ├── docs/ # Technical documentation (Markdown)

+├── demo/ # Interactive WASM demo (Vite + TypeScript)
+│   └── src/ # Demo application source
+
 ├── examples/
 │   └── pdf/ # Sample PDFs for quick testing
 │       ├── lorem.pdf
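
The README's quick start calls `init()` once and then `convert_to_string` per format. A small wrapper can enforce that "once" in application code; the sketch below shows one way to do it. The names `createExtractor`, `Init`, and `ConvertToString` are hypothetical, and the parameter types are assumptions based only on the API table in the README (just `bytes` and `format` are used). The real `init` and `convert_to_string` would be passed in from `@edgeparse/edgeparse-wasm`.

```typescript
// Assumed shapes, inferred from the README's API table; not the package's published typings.
type Init = () => Promise<unknown>;
type ConvertToString = (bytes: Uint8Array, format?: string) => string;

// Hypothetical wrapper: initializes the WASM module on first use only,
// then runs the converter once per requested format.
function createExtractor(init: Init, convertToString: ConvertToString) {
  let ready: Promise<unknown> | undefined;

  return async function extract(
    bytes: Uint8Array,
    formats: string[],
  ): Promise<Record<string, string>> {
    if (!ready) ready = init(); // later calls reuse the same promise
    await ready;

    const out: Record<string, string> = {};
    for (const format of formats) {
      out[format] = convertToString(bytes, format);
    }
    return out;
  };
}

// Possible usage in the browser (names from the README quick start):
//   const extract = createExtractor(init, convert_to_string);
//   const { markdown, json } = await extract(bytes, ['markdown', 'json']);
```

Passing the two functions in as arguments keeps the wrapper independent of how the package is loaded and makes it easy to exercise with stubs.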

demo/src/components/format-tabs.ts

Lines changed: 19 additions & 1 deletion
@@ -1,4 +1,10 @@
-/** FormatTabs — JSON | Markdown | HTML | Text tab switcher. */
+/** FormatTabs — JSON | Markdown | HTML | Text tab switcher.
+ *
+ * When a format is being rendered (async Markdown/HTML conversion), the active
+ * tab shows an animated pulse dot via the `--loading` modifier class. This
+ * gives the user immediate feedback that the tab click was registered and work
+ * is happening — without blanking the content pane.
+ */

 import { el } from '../utils/dom';
 import { store } from '../state';
@@ -28,6 +34,7 @@ export function createFormatTabs(): HTMLElement {
     return btn;
   });

+  // Sync active-tab styling whenever outputFormat changes.
   store.subscribe('outputFormat', (value) => {
     const current = value as OutputFormat;
     for (const btn of buttons) {
@@ -37,6 +44,17 @@ export function createFormatTabs(): HTMLElement {
     }
   });

+  // Show a pulse dot on the active tab while an async render is in progress.
+  // This fires when the OutputViewer sets store.renderStatus = 'rendering'.
+  store.subscribe('renderStatus', (value) => {
+    const isRendering = value === 'rendering';
+    const currentFormat = store.get('outputFormat');
+    for (const btn of buttons) {
+      const isActive = btn.dataset.format === currentFormat;
+      btn.classList.toggle('format-tabs__tab--loading', isActive && isRendering);
+    }
+  });
+
   // Set initial state
   const initial = store.get('outputFormat');
   for (const btn of buttons) {
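
The new `renderStatus` subscription applies one rule: a tab gets the `format-tabs__tab--loading` class only when it is the active tab and a render is in progress. That rule can be pulled out as a pure function for unit testing without a DOM. The sketch below is illustrative only: `loadingTabs` is a hypothetical name, the lowercase format strings are assumed values for `data-format`, and `'idle'` is an assumed non-rendering status (only `'rendering'` appears in the diff).

```typescript
// Hypothetical pure version of the toggle rule in the renderStatus subscriber.
// Returns, for each tab format, whether it should carry the loading modifier.
function loadingTabs(
  formats: string[],
  activeFormat: string,
  renderStatus: string,
): Record<string, boolean> {
  const isRendering = renderStatus === 'rendering';
  const result: Record<string, boolean> = {};
  for (const format of formats) {
    // Same condition as in the diff: active tab AND a render in progress.
    result[format] = format === activeFormat && isRendering;
  }
  return result;
}

// Assumed format identifiers; the real values live in the demo's OutputFormat type.
const tabs = ['json', 'markdown', 'html', 'text'];
const whileRendering = loadingTabs(tabs, 'markdown', 'rendering');
const whenIdle = loadingTabs(tabs, 'markdown', 'idle');
```

As far as this hunk shows, the class is recomputed only when `renderStatus` changes, so a format switch during a render would move the indicator only on the next status update.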
