Welcome! This wiki holds the implementation detail we'd rather keep out of the front-page README so it stays scannable. The code-heavy stuff lives here.
- Quick Start — 5 end-to-end snippets that cover ~90% of real-world usage: parse, iterate chunks, walk the dependency graph, serialise for a DB/vector store, parse from bytes.
- API Reference — full signatures and examples for `parse_workbook`, `compare_workbooks`, `export_importer`, `StageVerifier`.
- Web API — running the bundled FastAPI app and calling `POST /parse` from `curl` / Python / TypeScript.
- Data Models — the Pydantic DTOs you'll be reading in JSON output, field by field.
- Architecture — 8-stage pipeline diagram + module map (parsers → models → formula → analysis → charts → annotation → chunking → rendering → storage → verification → comparison → export).
- Pipeline Internals — how the 8 stages fit together, and where to hook in if you want to extend the parser.
- Benchmark vs `hucre` — unbiased head-to-head against the hucre TypeScript engine on the SpreadsheetBench corpus: perf, extraction-count parity, and where each tool wins.
- `README.md` — hero page, architecture diagram, comparison table, community links.
- `docs/WORKBOOK_GRAPH_SPEC.md` — the canonical specification for the extraction output.
- `docs/PARSER_KNOWN_ISSUES.md` — known edge cases and how we handle them.
- `docs/corpora.md` — public benchmark corpora (SpreadsheetBench, EUSES, Enron).
- `CONTRIBUTING.md` — dev loop, PR checklist, community channels.
- `CHANGELOG.md` — release history.
- 💬 Discord — fastest way to get a real answer from a human.
- 🗣 GitHub Discussions — async Q&A and RFCs.
- 🐞 Issues — bugs, feature requests, parser edge cases.
Something in the wiki out of date or confusing? Open a PR against `docs/wiki/` — the wiki is rebuilt from that directory on every release.