
FIX: [DEV-15236] improvements to scheduler and table model predict speed.#856

Open
benleetownsend wants to merge 4 commits into development from fix/table_model_scheduler_and_performance

Conversation

@benleetownsend
Contributor

@benleetownsend benleetownsend commented Apr 24, 2026

Note

Medium Risk
Changes chunking/span-token accounting logic that influences how table text is split into model inputs; performance-focused but could subtly alter chunk boundaries and downstream predictions.

Overview
Improves table-model preprocessing/chunking performance by avoiding repeated token counting and by accelerating token-overlap calculation when token spans are monotonic.

Also makes get_axis_spans build axis span buckets in a single pass (skipping negative indices), which can slightly change how spans are grouped before chunking in edge cases.
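The first half of that speedup, avoiding repeated token counting, can be sketched as simple memoization. This is an illustrative sketch only: `tokenizer_encode` and `num_tokens` are stand-in names, not the actual finetune API.

```python
from functools import lru_cache

# Stand-in for whatever tokenizer finetune actually uses.
def tokenizer_encode(text):
    return text.split()  # toy whitespace "tokenizer"

@lru_cache(maxsize=None)
def num_tokens(text):
    # Cache the count so each unique cell text is tokenized once,
    # rather than once per chunk-boundary check that touches it.
    return len(tokenizer_encode(text))

print(num_tokens("first second third"))  # → 3
```

With caching in place, repeated boundary checks over the same table cells hit the cache instead of re-running the tokenizer.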

Reviewed by Cursor Bugbot for commit fc41b70. Bugbot is set up for automated code reviews on this repo.

-row_spans = [
-    [
+max_row = max(r[context_key] for r in context)
+row_spans = [[] for _ in range(max_row + 1)]
Contributor Author

This bucketing change is part of the chunker speedup. On the synthetic 150 x 20 table chunking benchmark, the chunker dropped from 55.443s to 0.263s (~99.5% faster) after the get_axis_spans / _make_chunks / combine_row_spans optimization pass.
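A minimal sketch of that single-pass bucketing, assuming spans are dicts carrying an axis index under `context_key`; this is hypothetical and the real `get_axis_spans` in finetune may differ in detail.

```python
def get_axis_spans(context, context_key):
    # Single-pass bucketing: size the bucket list up front, then drop
    # each span into its axis bucket, skipping negative indices.
    max_row = max(r[context_key] for r in context)
    row_spans = [[] for _ in range(max_row + 1)]
    for span in context:
        idx = span[context_key]
        if idx < 0:
            continue  # negative index: span not on this axis
        row_spans[idx].append(span)
    return row_spans

context = [
    {"row": 0, "text": "a"},
    {"row": 2, "text": "b"},
    {"row": -1, "text": "off-axis"},
    {"row": 2, "text": "c"},
]
buckets = get_axis_spans(context, "row")
print([len(b) for b in buckets])  # → [1, 0, 2]
```

The point of the change is that grouping is O(number of spans) rather than one pass over the context per axis index.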

Comment thread: finetune/util/table_labeler.py (Outdated)
-    [mark_token(t) for t in token_spans if overlaps_token(row_span, t)]
-)
+num_tokens = 0
+token_idx = bisect.bisect_left(token_ends, row_span["start"])
Contributor Author

This bisect-bounded token scan is another part of the same chunker win. The synthetic 150 x 20 chunking benchmark went from 55.443s to 0.263s (~99.5% faster) after the chunker changes, while preserving output digests.
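The bisect-bounded scan can be sketched as follows, assuming monotonic half-open `[start, end)` token spans and a precomputed sorted list of their end offsets; the names are illustrative, not the actual finetune helpers.

```python
import bisect

def count_overlapping_tokens(row_span, token_spans, token_ends):
    num_tokens = 0
    # Every token ending at or before row_span["start"] cannot overlap,
    # so binary-search straight past them instead of scanning from 0.
    token_idx = bisect.bisect_left(token_ends, row_span["start"])
    for t in token_spans[token_idx:]:
        if t["start"] >= row_span["end"]:
            break  # monotonic spans: nothing later can overlap either
        if t["end"] > row_span["start"]:
            num_tokens += 1
    return num_tokens

token_spans = [{"start": s, "end": s + 2} for s in range(0, 8, 2)]
token_ends = [t["end"] for t in token_spans]
print(count_overlapping_tokens({"start": 3, "end": 7}, token_spans, token_ends))  # → 3
```

Because both the entry point (via `bisect_left`) and the exit point (the early `break`) are bounded, each row span touches only the tokens near it instead of the whole token list.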

@@ -1,6 +1,7 @@
"""
Contributor Author

This file's changes are probably worth pulling in. Big improvements relative to the risk on larger tables.

Contributor Author

The other changes we could maybe drop as not being valuable enough for the risk at this point.

@madisonmay madisonmay self-requested a review April 30, 2026 12:55
"""
Finetune-style interface for running a pipeline of table and non-table models.
"""
import bisect
Contributor

I didn't know this was part of stdlib!
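It is: `bisect` has been in the standard library for a long time. A quick demo of `bisect_left`, the call used in this diff:

```python
import bisect

token_ends = [2, 4, 6, 8]  # sorted token end offsets
# bisect_left returns the index of the first element >= the probe,
# i.e. the insertion point that keeps the list sorted.
print(bisect.bisect_left(token_ends, 5))  # → 2
print(bisect.bisect_left(token_ends, 4))  # → 1
```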

@benleetownsend benleetownsend requested a review from madisonmay May 1, 2026 17:08