
feat: add page number support to v1 html partition #4327

Merged
badGarnet merged 5 commits into main from feat/v1-html-page-number
Apr 8, 2026

@badGarnet
Collaborator

This PR adds page number support when partitioning HTML with the v1 parser.

  • Add page_number support to the v1 HTML parser by reading data-page-number attributes from ancestor elements, consistent with the v2 parser's behavior
  • Add a cached _page_number property on Flow that uses a parent-chain lookup (O(n) total, versus O(n*depth) for a per-element ancestor walk)
  • Wire the page number into all three element-creation paths: text elements, images, and tables
  • Skip malformed data-page-number values and fall back to the nearest valid ancestor
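
For reviewers who want the idea in isolation, here is a minimal sketch of the cached parent-chain lookup described above. It is not the code from this PR: the Node class, its attrs and parent fields, and the page_number property name are hypothetical stand-ins, and it assumes an element's own data-page-number is checked before its ancestors'.

```python
from __future__ import annotations

from functools import cached_property
from typing import Optional


class Node:
    """Hypothetical stand-in for an element in the parsed HTML tree."""

    def __init__(
        self,
        attrs: Optional[dict[str, str]] = None,
        parent: Optional[Node] = None,
    ) -> None:
        self.attrs = attrs or {}
        self.parent = parent

    @cached_property
    def page_number(self) -> Optional[int]:
        """Page number from this node or its nearest ancestor with a valid value."""
        raw = self.attrs.get("data-page-number")
        if raw is not None:
            try:
                return int(raw)
            except ValueError:
                pass  # malformed value: skip it and defer to the parent
        # The parent's answer is cached too, so each node is resolved once,
        # which keeps the total work linear in the number of nodes.
        return self.parent.page_number if self.parent is not None else None


# Usage: a malformed value on the middle node falls back to the root's page.
root = Node({"data-page-number": "2"})
middle = Node({"data-page-number": "abc"}, parent=root)
leaf = Node(parent=middle)
print(leaf.page_number)  # 2
```

Because every node delegates to its parent's cached result rather than re-walking the whole ancestor chain, repeated lookups across a deep tree avoid the O(n*depth) cost. The trade-off in this sketch is recursion: a very deep tree resolved leaf-first could approach Python's recursion limit.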

@cragwolfe
Contributor

@badGarnet badGarnet added this pull request to the merge queue Apr 8, 2026
Merged via the queue into main with commit d299095 Apr 8, 2026
53 checks passed
@badGarnet badGarnet deleted the feat/v1-html-page-number branch April 8, 2026 14:00
