Skip to content

Expose error model sparsification to python module, update sinter decoders dict#257

Open
noajshu wants to merge 13 commits into
mainfrom
sparsify-errors-python
Open

Expose error model sparsification to python module, update sinter decoders dict#257
noajshu wants to merge 13 commits into
mainfrom
sparsify-errors-python

Conversation

@noajshu

@noajshu noajshu commented May 27, 2026

Copy link
Copy Markdown
Contributor

Exposes the per-shot error model sparsification functionality added in #254 to the Python module and Sinter compatibility layer.

Changes:

  • Added sparsification config fields for Python: sparsify_errors, sparsify_base_degree, sparsify_max_degree, and sparsify_reactivate_limit.
  • Added module-level tesseract.suggest_sparsify_reactivate_limit(num_detectors, sparsify_base_degree).
  • Resolves sparsify_reactivate_limit == -1 only when preparing the decoder.
  • Added validation for sparsification parameters in decoder initialization.
  • Kept CLI sparsification semantics explicit: user-provided --sparsify-reactivate-limit is preserved, while omitted M is stored as -1 until decoder initialization.
  • Changed the C++ CLI default detector ordering to DetIndex when no --det-order-* method is specified. --det-order-bfs remains available for the old BFS behavior.
  • Extended TesseractSinterDecoder to support sparsification config while preserving pickle/serialization compatibility for existing distributed Sinter runs.
  • Registered new sparsified decoders in the Sinter decoders dictionary: tesseract-long-beam-sparsify3, tesseract-long-beam-sparsify2, tesseract-short-beam-sparsify3, and tesseract-short-beam-sparsify2.
  • Added Python and C++ unit tests for sparsify config, validation, heuristic suggestion, Sinter propagation, and end-to-end decoding.

Verification:

  • bazelisk build --jobs=1 --local_resources=cpu=1 src:tesseract
  • bazelisk test --jobs=1 --local_resources=cpu=1 //src:tesseract_tests

TAG=agy
CONV=365755ec-6af0-4d9c-a7e2-4a363289b763

noajshu added 2 commits May 26, 2026 21:56
Exposes the per-shot error model sparsification functionality (added in PR 254)
to the Python module and Sinter compatibility layer.

Changes:
- Moved the sparsify reactivate limit (M) heuristic calculation from the CLI
  to `TesseractConfig::get_sparsify_reactivate_limit()` in C++ core.
- Added validation for sparsification parameters in decoder initialization.
- Simplified CLI sparsification parameter resolution.
- Bound sparsification parameters (`sparsify_errors`, `sparsify_base_degree`,
  `sparsify_max_degree`, `sparsify_reactivate_limit`) and the heuristic
  resolution method in `TesseractConfig` Python bindings.
- Extended `TesseractSinterDecoder` to support sparsification config,
  maintaining pickling/serialization compatibility for distributed Sinter runs.
- Registered new sparsified decoders in Sinter decoders dictionary:
  `tesseract-long-beam-sparsify3`, `tesseract-long-beam-sparsify2`,
  `tesseract-short-beam-sparsify3`, and `tesseract-short-beam-sparsify2`.
- Added comprehensive Python unit tests for core sparsify config, validation,
  and Sinter end-to-end decoding.
- Fixed pre-existing CMake build failure under strict compilers by upgrading
  external HiGHS dependency to v1.14.0.

TAG=agy
CONV=365755ec-6af0-4d9c-a7e2-4a363289b763
@noajshu noajshu changed the title feat(python): expose error model sparsification and add sinter decoders Expose error model sparsification to python module, update sinter decoders dict May 27, 2026
@noajshu noajshu marked this pull request as ready for review June 10, 2026 16:38
@noajshu noajshu requested a review from a team as a code owner June 10, 2026 16:38
@noajshu noajshu requested review from LalehB, aria-googler and mhucka and removed request for a team June 10, 2026 16:38

@mhucka mhucka left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not qualified to evaluate the algorithmic or theoretical correctness, but I tried to look at this from the a basic code perspective. I had a couple of minor items, noted in-line.

In addition, one thing that I don't see (and this may just be due to ignorance of Tesseract Decoder) is an update to the .pyi files. This PR does seem to introduce new properties on some objects, which suggests the .pyi files may become out of date. Not 100% sure – just flagging it for your attention.

Comment thread src/py/README.md
)
decoder = config3.compile_decoder()
print(
"Resolved sparsify reactivation limit:",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure, but I suspect there should be a space after the colon.

Suggested change
"Resolved sparsify reactivation limit:",
"Resolved sparsify reactivation limit: ",

Comment thread src/py/tesseract_test.py Outdated
Comment on lines +323 to +326
{"sparsify_reactivate_limit": -2},
"sparsify_reactivate_limit must be >= -1",
),
({"sparsify_max_degree": -2}, "sparsify_max_degree must be >= -1"),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, just a nit: the indentation seems inconsistent here.

Comment thread src/tesseract.cc Outdated
}
double k = sparsify_base_degree;
return static_cast<int>(
std::round((std::pow(4.5, k - 2.0) / 3.0) * static_cast<double>(num_detectors)));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a risk of overflow in the std::pow expression, if an unexpectedly large k is passed in? (Even if by accident.)

Comment thread src/tesseract_main.cc
}
DetOrder order = DetOrder::DetBFS;
if (det_order_index) {
DetOrder order = DetOrder::DetIndex;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's using a default detector ordering of DetOrder::DetIndex, but in src/tesseract_sinter_compat.pybind.h line 137, it looks like it may be using a default of DetOrder::DetBFS. Similarly, in src/tesseract.pybind.h it looks like DetOrder::DetBFS may be the default. Does this inconsistency matter?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants