|
| 1 | +# edgeparse |
| 2 | + |
| 3 | +> High-performance PDF extraction for Node.js — Rust engine, JavaScript/TypeScript interface. |
| 4 | +
|
| 5 | +[](https://www.npmjs.com/package/edgeparse) |
| 6 | +[](https://github.com/raphaelmansuy/edgeparse/blob/main/LICENSE) |
| 7 | +[](https://github.com/raphaelmansuy/edgeparse) |
| 8 | + |
| 9 | +EdgeParse converts PDF documents to Markdown, JSON, HTML, or plain text. It is powered by a native Rust engine (via N-API) with pre-built binaries — no compilation required. |
| 10 | + |
| 11 | +## Install |
| 12 | + |
| 13 | +```bash |
| 14 | +npm install edgeparse |
| 15 | +# or |
| 16 | +pnpm add edgeparse |
| 17 | +# or |
| 18 | +yarn add edgeparse |
| 19 | +``` |
| 20 | + |
| 21 | +Pre-built binaries are available for: |
| 22 | + |
| 23 | +| Platform | Architecture | |
| 24 | +|---|---| |
| 25 | +| macOS | x64, arm64 (Apple Silicon) | |
| 26 | +| Linux | x64-gnu, arm64-gnu | |
| 27 | +| Windows | x64-msvc | |
| 28 | + |
| 29 | +## Quick Start |
| 30 | + |
| 31 | +```typescript |
| 32 | +import { convert } from 'edgeparse'; |
| 33 | + |
| 34 | +// Convert a PDF to Markdown |
| 35 | +const markdown = convert('report.pdf'); |
| 36 | +console.log(markdown); |
| 37 | + |
| 38 | +// Convert to JSON |
| 39 | +const json = convert('report.pdf', { format: 'json' }); |
| 40 | + |
| 41 | +// Convert specific pages to HTML |
| 42 | +const html = convert('report.pdf', { |
| 43 | + format: 'html', |
| 44 | + pages: [0, 1, 2], // pages 1–3 (0-indexed) |
| 45 | +}); |
| 46 | + |
| 47 | +// Password-protected PDF |
| 48 | +const text = convert('secure.pdf', { |
| 49 | + format: 'markdown', |
| 50 | + password: 'secret', |
| 51 | +}); |
| 52 | +``` |
| 53 | + |
| 54 | +## API |
| 55 | + |
| 56 | +### `convert(inputPath, options?): string` |
| 57 | + |
| 58 | +Converts a PDF file and returns the content as a string. |
| 59 | + |
| 60 | +| Parameter | Type | Description | |
| 61 | +|---|---|---| |
| 62 | +| `inputPath` | `string` | Absolute or relative path to the PDF file | |
| 63 | +| `options.format` | `'markdown' \| 'json' \| 'html' \| 'text'` | Output format (default: `'markdown'`) | |
| 64 | +| `options.pages` | `number[]` | Zero-indexed page numbers to extract (default: all) | |
| 65 | +| `options.password` | `string` | Password for encrypted PDFs | |
| 66 | +| `options.readingOrder` | `'xycut' \| 'default'` | Reading order algorithm (default: `'xycut'`) | |
| 67 | +| `options.tableMethod` | `'border' \| 'cluster'` | Table detection method (default: `'border'`) | |
| 68 | +| `options.imageOutput` | `'embedded' \| 'external' \| 'none'` | Image handling (default: `'none'`) | |
| 69 | + |
| 70 | +### `version(): string` |
| 71 | + |
| 72 | +Returns the edgeparse engine version string. |
| 73 | + |
| 74 | +```typescript |
| 75 | +import { version } from 'edgeparse'; |
| 76 | +console.log(version()); // e.g. "0.1.1" |
| 77 | +``` |
| 78 | + |
| 79 | +## CLI |
| 80 | + |
| 81 | +The package also ships an `edgeparse` CLI binary: |
| 82 | + |
| 83 | +```bash |
| 84 | +npx edgeparse document.pdf |
| 85 | +npx edgeparse document.pdf --format json |
| 86 | +npx edgeparse document.pdf --format html --output output/ |
| 87 | +``` |
| 88 | + |
| 89 | +## TypeScript |
| 90 | + |
| 91 | +Full TypeScript support is included — no `@types` package needed. |
| 92 | + |
| 93 | +```typescript |
| 94 | +import { convert, version } from 'edgeparse'; |
| 95 | +import type { ConvertOptions } from 'edgeparse'; |
| 96 | +``` |
| 97 | + |
| 98 | +## Performance |
| 99 | + |
| 100 | +EdgeParse consistently processes **40+ pages/second** on a modern machine and achieves **88%+ extraction accuracy** on diverse real-world PDFs — dramatically faster than Python-based alternatives. |
| 101 | + |
| 102 | +## Links |
| 103 | + |
| 104 | +- [GitHub](https://github.com/raphaelmansuy/edgeparse) |
| 105 | +- [Documentation](https://edgeparse.com) |
| 106 | +- [PyPI (Python)](https://pypi.org/project/edgeparse/) |
| 107 | +- [crates.io (Rust CLI)](https://crates.io/crates/edgeparse-cli) |
| 108 | +- [crates.io (Rust Core)](https://crates.io/crates/edgeparse-core) |
| 109 | + |
| 110 | +## License |
| 111 | + |
| 112 | +Apache-2.0 — see [LICENSE](https://github.com/raphaelmansuy/edgeparse/blob/main/LICENSE). |
0 commit comments