ks-xlsx-parser Wiki

Welcome! This wiki holds the implementation detail we keep out of the front-page README so that the README stays scannable. The code-heavy material lives here.

Start here

  • Quick Start — 5 end-to-end snippets that cover ~90 % of real-world usage: parse, iterate chunks, walk the dependency graph, serialise for a DB/vector store, parse from bytes.
  • API Reference — full signatures and examples for parse_workbook, compare_workbooks, export_importer, StageVerifier.
  • Web API — running the bundled FastAPI app and calling POST /parse from curl / Python / TypeScript.
  • Data Models — the Pydantic DTOs you'll be reading in JSON output, field by field.
  • Architecture — 8-stage pipeline diagram + module map (parsers → models → formula → analysis → charts → annotation → chunking → rendering → storage → verification → comparison → export).
  • Pipeline Internals — how the 8 stages fit together, and where to hook in if you want to extend the parser.
  • Benchmark vs hucre — unbiased head-to-head against the hucre TypeScript engine on the SpreadsheetBench corpus: perf, extraction-count parity, and where each tool wins.
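
As a taste of what the Web API page covers, here is a minimal sketch of calling POST /parse from Python using only the standard library. Only the POST /parse route comes from this wiki; the base URL (http://localhost:8000), the multipart field name ("file"), and the JSON response body are assumptions made for illustration, and the helper names are hypothetical. Check the Web API page for the real request shape.

```python
import json
import urllib.request
import uuid

# MIME type for .xlsx workbooks.
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def build_parse_request(base_url: str, filename: str, payload: bytes,
                        field_name: str = "file") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to <base_url>/parse.

    ASSUMPTION: the server expects the workbook as a multipart upload in a
    field called "file". Only the POST /parse route is taken from the wiki.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {XLSX_MIME}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/parse",
        data=head + payload + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def parse_remote(base_url: str, path: str) -> dict:
    """Upload the workbook at `path` and decode the response.

    ASSUMPTION: the endpoint answers with a JSON object.
    """
    with open(path, "rb") as fh:
        request = build_parse_request(base_url, path.rsplit("/", 1)[-1], fh.read())
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the bundled app running locally, a call would look like parse_remote("http://localhost:8000", "report.xlsx"), where both the address and the file name are placeholders. Building the request separately from sending it keeps the upload format easy to inspect or adjust if the real field name differs.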

Related docs in the main repo

Community

  • 💬 Discord — fastest way to get a real answer from a human.
  • 🗣 GitHub Discussions — async Q&A and RFCs.
  • 🐞 Issues — bugs, feature requests, parser edge cases.

Something in the wiki out of date or confusing? Open a PR against docs/wiki/ — the wiki is rebuilt from that directory on every release.