edgeparse

High-performance PDF extraction for Node.js — Rust engine, JavaScript/TypeScript interface.

EdgeParse converts PDF documents to Markdown, JSON, HTML, or plain text. It is powered by a native Rust engine (via N-API) with pre-built binaries — no compilation required.

Install

npm install edgeparse
# or
pnpm add edgeparse
# or
yarn add edgeparse

Pre-built binaries are available for:

| Platform | Architecture |
| --- | --- |
| macOS | x64, arm64 (Apple Silicon) |
| Linux | x64-gnu, arm64-gnu |
| Windows | x64-msvc |

Quick Start

import { convert } from 'edgeparse';

// Convert a PDF to Markdown
const markdown = convert('report.pdf');
console.log(markdown);

// Convert to JSON
const json = convert('report.pdf', { format: 'json' });

// Convert specific pages to HTML
const html = convert('report.pdf', {
  format: 'html',
  pages: [0, 1, 2],   // pages 1–3 (0-indexed)
});

// Password-protected PDF
const text = convert('secure.pdf', {
  format: 'markdown',
  password: 'secret',
});
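
Because `pages` is zero-indexed, it is easy to be off by one when starting from the page numbers a reader sees. The sketch below shows one way to convert a 1-based inclusive range into the array that `pages` expects. `pageRange` is a hypothetical helper written for this example, not part of edgeparse.

```typescript
// Hypothetical helper, not part of edgeparse: turns a 1-based inclusive page
// range (as shown in a PDF viewer) into the zero-indexed array that the
// `pages` option expects.
function pageRange(first: number, last: number): number[] {
  if (!Number.isInteger(first) || !Number.isInteger(last) || first < 1 || last < first) {
    throw new RangeError(`invalid page range: ${first}-${last}`);
  }
  const pages: number[] = [];
  for (let page = first; page <= last; page++) {
    pages.push(page - 1); // shift from 1-based to zero-indexed
  }
  return pages;
}

// Pages 1 to 3 of the document map to indices 0, 1, 2.
console.log(pageRange(1, 3));
```

The result can then be passed as `pages`, for example `convert('report.pdf', { pages: pageRange(1, 3) })`.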

API

convert(inputPath, options?): string

Converts a PDF file and returns the content as a string.

| Parameter | Type | Description |
| --- | --- | --- |
| inputPath | string | Absolute or relative path to the PDF file |
| options.format | 'markdown' \| 'json' \| 'html' \| 'text' | Output format (default: 'markdown') |
| options.pages | number[] | Zero-indexed page numbers to extract (default: all) |
| options.password | string | Password for encrypted PDFs |
| options.readingOrder | 'xycut' \| 'default' | Reading order algorithm (default: 'xycut') |
| options.tableMethod | 'border' \| 'cluster' | Table detection method (default: 'border') |
| options.imageOutput | 'embedded' \| 'external' \| 'none' | Image handling (default: 'none') |
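
To make the documented defaults explicit, here is a minimal sketch that mirrors the options listed above and fills in any value the caller leaves out. `ConvertOptionsSketch` and `withDocumentedDefaults` are hypothetical names for this example. The package exports its own `ConvertOptions` type, which may differ from this local copy.

```typescript
// Local mirror of the documented options. This is an assumption made so the
// sketch is self-contained; the real type is exported by the package.
interface ConvertOptionsSketch {
  format?: 'markdown' | 'json' | 'html' | 'text';
  pages?: number[];
  password?: string;
  readingOrder?: 'xycut' | 'default';
  tableMethod?: 'border' | 'cluster';
  imageOutput?: 'embedded' | 'external' | 'none';
}

// Options with the four defaulted fields guaranteed to be present.
type ResolvedOptions = ConvertOptionsSketch &
  Required<Pick<ConvertOptionsSketch, 'format' | 'readingOrder' | 'tableMethod' | 'imageOutput'>>;

// Hypothetical helper: applies the defaults stated in the documentation.
// `pages` and `password` have no default value, so they pass through as given.
function withDocumentedDefaults(options: ConvertOptionsSketch = {}): ResolvedOptions {
  return {
    ...options,
    format: options.format ?? 'markdown',
    readingOrder: options.readingOrder ?? 'xycut',
    tableMethod: options.tableMethod ?? 'border',
    imageOutput: options.imageOutput ?? 'none',
  };
}

console.log(withDocumentedDefaults({ format: 'html', tableMethod: 'cluster' }));
```

Resolving defaults in one place like this is mainly useful for logging or caching, where the effective settings need to be visible rather than implied.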

version(): string

Returns the edgeparse engine version string.

import { version } from 'edgeparse';
console.log(version()); // e.g. "0.2.2"

CLI

The package also ships an edgeparse CLI binary:

npx edgeparse document.pdf
npx edgeparse document.pdf --format json
npx edgeparse document.pdf --format html --output output/
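
When the CLI is driven from a script, it helps to build the argument list programmatically instead of concatenating strings. The sketch below uses only the two flags shown above (`--format` and `--output`). `buildCliArgs` and `CliOptions` are hypothetical names, and the accepted `--format` values are assumed to match the API's `format` option.

```typescript
// Hypothetical option bag covering only the flags shown in the examples.
interface CliOptions {
  format?: string; // assumed to accept the same values as the API's `format`
  output?: string; // output location, as in `--output output/`
}

// Hypothetical helper: assembles the arguments that follow `npx`.
function buildCliArgs(input: string, options: CliOptions = {}): string[] {
  const args = ['edgeparse', input];
  if (options.format !== undefined) args.push('--format', options.format);
  if (options.output !== undefined) args.push('--output', options.output);
  return args;
}

const cliArgs = buildCliArgs('document.pdf', { format: 'html', output: 'output/' });
console.log(['npx', ...cliArgs].join(' '));
```

Keeping the arguments as an array means they can be handed to a process launcher such as Node's `child_process.spawn('npx', cliArgs)` without any shell quoting.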

TypeScript

Full TypeScript support is included — no @types package needed.

import { convert, version } from 'edgeparse';
import type { ConvertOptions } from 'edgeparse';
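
As one illustration of typed usage, the following sketch wraps a `convert`-shaped function so that JSON output is parsed before it is returned. It assumes the `'json'` format yields a string that `JSON.parse` accepts. The structure of that JSON is not described here, so the result is typed as `unknown`. `ConvertFn`, `convertToJson`, and `fakeConvert` are hypothetical names, and `fakeConvert` is only a stand-in so the example runs without the native binary.

```typescript
// Shape of convert() as documented in the API section (reduced to `format`).
type ConvertFn = (
  inputPath: string,
  options?: { format?: 'markdown' | 'json' | 'html' | 'text' },
) => string;

// Hypothetical wrapper: requests JSON output and parses it.
// The result is `unknown` because the JSON structure is not documented here.
function convertToJson(convertFn: ConvertFn, inputPath: string): unknown {
  const raw = convertFn(inputPath, { format: 'json' });
  return JSON.parse(raw);
}

// Stand-in for the real convert(); its payload is made up for this example.
const fakeConvert: ConvertFn = (_inputPath, options) =>
  options?.format === 'json' ? '{"ok":true}' : '';

console.log(convertToJson(fakeConvert, 'report.pdf'));
```

In real code the first argument would be the `convert` function imported from `edgeparse`. Passing it in as a parameter also makes the wrapper easy to unit test.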

Performance

EdgeParse processes 40+ pages/second on a modern machine, dramatically faster than Python-based alternatives, and achieves 88%+ extraction accuracy on diverse real-world PDFs.
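
Throughput depends on hardware and document content, so it is worth measuring on your own files. The sketch below shows the arithmetic behind a pages/second figure. `pagesPerSecond` and `timeMs` are hypothetical helpers, and the page count is assumed to be known by the caller, since no page-count API is described here.

```typescript
// Hypothetical helper: pages processed per second for one timed run.
function pagesPerSecond(pageCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) throw new RangeError('elapsedMs must be positive');
  return pageCount / (elapsedMs / 1000);
}

// Hypothetical helper: wall-clock duration of a synchronous task, in ms.
function timeMs(task: () => void): number {
  const start = Date.now();
  task();
  return Date.now() - start;
}

// Example arithmetic: an 80-page document converted in 2000 ms.
console.log(pagesPerSecond(80, 2000));
```

With the package installed, a run could be timed as `timeMs(() => convert('report.pdf'))` and the result passed to `pagesPerSecond` together with the document's page count.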

License

Apache-2.0 — see LICENSE.