Expose error model sparsification to Python module, update Sinter decoders dict #257
noajshu wants to merge 13 commits into
Exposes the per-shot error model sparsification functionality (added in PR 254) to the Python module and Sinter compatibility layer.

Changes:

- Moved the sparsify reactivate limit (M) heuristic calculation from the CLI to `TesseractConfig::get_sparsify_reactivate_limit()` in the C++ core.
- Added validation for sparsification parameters in decoder initialization.
- Simplified CLI sparsification parameter resolution.
- Bound the sparsification parameters (`sparsify_errors`, `sparsify_base_degree`, `sparsify_max_degree`, `sparsify_reactivate_limit`) and the heuristic resolution method in the `TesseractConfig` Python bindings.
- Extended `TesseractSinterDecoder` to support sparsification config, maintaining pickling/serialization compatibility for distributed Sinter runs.
- Registered new sparsified decoders in the Sinter decoders dictionary: `tesseract-long-beam-sparsify3`, `tesseract-long-beam-sparsify2`, `tesseract-short-beam-sparsify3`, and `tesseract-short-beam-sparsify2`.
- Added Python unit tests covering the core sparsify config, validation, and end-to-end Sinter decoding.
- Fixed a pre-existing CMake build failure under strict compilers by upgrading the external HiGHS dependency to v1.14.0.

TAG=agy CONV=365755ec-6af0-4d9c-a7e2-4a363289b763
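The pickling/serialization compatibility point in the description can be illustrated with a small sketch. `SinterDecoderSketch`, its `beam` field, and all default values except the `-1` sentinel for an omitted reactivate limit are hypothetical placeholders, not the actual `TesseractSinterDecoder`. The technique shown is one common way to stay compatible with state saved before new fields existed: fill any missing keys with defaults in `__setstate__`.

```python
# Hypothetical defaults; only the -1 "not yet resolved" sentinel for the
# reactivate limit is taken from the PR description.
_SPARSIFY_DEFAULTS = {
    "sparsify_errors": False,
    "sparsify_base_degree": 3,
    "sparsify_max_degree": -1,
    "sparsify_reactivate_limit": -1,
}


class SinterDecoderSketch:
    """Hypothetical stand-in for a picklable Sinter decoder wrapper."""

    def __init__(self, beam: int = 5, **sparsify_kwargs):
        unknown = set(sparsify_kwargs) - set(_SPARSIFY_DEFAULTS)
        if unknown:
            raise TypeError(f"unknown sparsify options: {sorted(unknown)}")
        self.beam = beam
        for name, default in _SPARSIFY_DEFAULTS.items():
            setattr(self, name, sparsify_kwargs.get(name, default))

    def __getstate__(self) -> dict:
        # pickle and copy serialize this dict.
        return dict(self.__dict__)

    def __setstate__(self, state: dict) -> None:
        # State saved before the sparsify_* fields existed lacks those keys,
        # so start from the defaults and overlay whatever was stored.
        merged = dict(_SPARSIFY_DEFAULTS)
        merged.update(state)
        self.__dict__.update(merged)
```

Because `pickle` calls the same `__getstate__`/`__setstate__` pair, a worker unpickling an older payload in this sketch gets the defaults instead of an `AttributeError`.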
Added recommendations for setting K in various code types.
mhucka left a comment
I'm not qualified to evaluate the algorithmic or theoretical correctness, but I tried to look at this from a basic code perspective. I had a couple of minor items, noted inline.
In addition, one thing that I don't see (and this may just be due to ignorance of Tesseract Decoder) is an update to the .pyi files. This PR does seem to introduce new properties on some objects, which suggests the .pyi files may become out of date. Not 100% sure – just flagging it for your attention.
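To make the .pyi point concrete, here is a hypothetical sketch of stub entries for the new members, limited to names that appear in this PR description: the four `sparsify_*` properties on `TesseractConfig` and the function listed as `tesseract.suggest_sparsify_reactivate_limit`. The annotated types are assumptions inferred from the validation messages and the C++ heuristic, not read from the bindings, and a real stub would also keep the existing members.

```python
# Hypothetical stub entries for the members added in this PR; a real .pyi
# would keep the existing members too. Types are assumptions.
class TesseractConfig:
    sparsify_errors: bool
    sparsify_base_degree: float  # assumed float; the bindings may use int
    sparsify_max_degree: int  # validated as >= -1
    sparsify_reactivate_limit: int  # -1 means "resolve at decoder initialization"


def suggest_sparsify_reactivate_limit(num_detectors: int, sparsify_base_degree: float) -> int: ...
```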
```python
)
decoder = config3.compile_decoder()
print(
    "Resolved sparsify reactivation limit:",
```
Not 100% sure, but I suspect there should be a space after the colon.
Suggested change:

```diff
-    "Resolved sparsify reactivation limit:",
+    "Resolved sparsify reactivation limit: ",
```
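For reference when weighing this suggestion: `print()` separates its positional arguments with `sep`, which defaults to a single space. If the resolved limit is passed as a second argument, as the excerpt suggests, the original string already prints one space after the colon and the suggested version prints two. The sketch below checks this with a placeholder value of `450`; `capture_print` is a hypothetical helper.

```python
import io
from contextlib import redirect_stdout


def capture_print(*args) -> str:
    """Return exactly what print(*args) writes with the default sep=' '."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print(*args)
    return buffer.getvalue()


limit = 450  # placeholder for the resolved reactivation limit
original = capture_print("Resolved sparsify reactivation limit:", limit)
suggested = capture_print("Resolved sparsify reactivation limit: ", limit)
```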
```python
        {"sparsify_reactivate_limit": -2},
        "sparsify_reactivate_limit must be >= -1",
    ),
    ({"sparsify_max_degree": -2}, "sparsify_max_degree must be >= -1"),
```
Not a big deal, just a nit: the indentation seems inconsistent here.
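The invalid-parameter cases in this test could be kept uniformly indented by writing each one on a single line. The sketch below does that against a hypothetical `validate_sparsify_params_sketch`; only the two `>= -1` rules and their messages come from the test excerpt, and the real validation may cover more parameters.

```python
def validate_sparsify_params_sketch(**params: int) -> None:
    """Hypothetical check mirroring the '>= -1' rules shown in the test excerpt."""
    for name in ("sparsify_reactivate_limit", "sparsify_max_degree"):
        value = params.get(name, -1)
        if value < -1:
            raise ValueError(f"{name} must be >= -1")


# One case per line: (kwargs, expected error message).
INVALID_CASES = [
    ({"sparsify_reactivate_limit": -2}, "sparsify_reactivate_limit must be >= -1"),
    ({"sparsify_max_degree": -2}, "sparsify_max_degree must be >= -1"),
]


def check_invalid_cases() -> int:
    """Return how many cases raised ValueError with the expected message."""
    passed = 0
    for kwargs, message in INVALID_CASES:
        try:
            validate_sparsify_params_sketch(**kwargs)
        except ValueError as err:
            if str(err) == message:
                passed += 1
    return passed
```

In the real test file, the same list could be handed to `pytest.mark.parametrize`.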
| } | ||
| double k = sparsify_base_degree; | ||
| return static_cast<int>( | ||
| std::round((std::pow(4.5, k - 2.0) / 3.0) * static_cast<double>(num_detectors))); |
Is there a risk of overflow in the `std::pow` expression if an unexpectedly large `k` is passed in (even by accident)?
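On the overflow question: in C++, `std::pow` returns `HUGE_VAL` (infinity) when the result is too large, and `static_cast<int>` of a value outside the `int` range (including infinity) is undefined behavior. The range is exceeded sooner than one might expect: K = 11 with 10,000 detectors already gives roughly 2.5e9, above 2^31 - 1. Below is a minimal Python sketch of a guarded version. `reactivate_limit_sketch` and `resolve_reactivate_limit_sketch` are hypothetical names, and clamping to the 32-bit `int` maximum is an assumed policy. The `-1` handling follows the PR description's note that an omitted M stays `-1` until decoder initialization.

```python
import math

INT32_MAX = 2**31 - 1  # assumed target range (a 32-bit C++ int)


def reactivate_limit_sketch(num_detectors: int, base_degree: float) -> int:
    """Hypothetical guarded form of M = round(4.5**(K - 2) / 3 * num_detectors).

    Rounds halves away from zero like std::round, and clamps to INT32_MAX
    instead of letting an oversized K overflow.
    """
    if num_detectors < 0:
        raise ValueError("num_detectors must be >= 0")
    if num_detectors == 0:
        return 0
    # Check the magnitude in log space first, so 4.5 ** (K - 2) is never
    # evaluated for a K large enough to raise OverflowError.
    log_m = (base_degree - 2.0) * math.log(4.5) - math.log(3.0) + math.log(num_detectors)
    if log_m >= math.log(INT32_MAX):
        return INT32_MAX
    m = 4.5 ** (base_degree - 2.0) / 3.0 * num_detectors
    return min(math.floor(m + 0.5), INT32_MAX)


def resolve_reactivate_limit_sketch(limit: int, num_detectors: int, base_degree: float) -> int:
    """Keep an explicit M; compute the heuristic only for the -1 sentinel."""
    if limit < -1:
        raise ValueError("sparsify_reactivate_limit must be >= -1")
    if limit == -1:
        return reactivate_limit_sketch(num_detectors, base_degree)
    return limit
```

Clamping keeps the result representable; rejecting K above some bound at validation time would be an equally reasonable policy.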
| } | ||
| DetOrder order = DetOrder::DetBFS; | ||
| if (det_order_index) { | ||
| DetOrder order = DetOrder::DetIndex; |
This looks like it's using a default detector ordering of `DetOrder::DetIndex`, but in `src/tesseract_sinter_compat.pybind.h` line 137, it looks like it may be using a default of `DetOrder::DetBFS`. Similarly, in `src/tesseract.pybind.h` it looks like `DetOrder::DetBFS` may be the default. Does this inconsistency matter?
Exposes the per-shot error model sparsification functionality added in #254 to the Python module and Sinter compatibility layer.
Changes:
- `sparsify_errors`, `sparsify_base_degree`, `sparsify_max_degree`, and `sparsify_reactivate_limit`.
- `tesseract.suggest_sparsify_reactivate_limit(num_detectors, sparsify_base_degree)`.
- `sparsify_reactivate_limit == -1` only when preparing the decoder.
- `--sparsify-reactivate-limit` is preserved, while omitted M is stored as `-1` until decoder initialization.
- `--det-order-*` method is specified.
- `--det-order-bfs` remains available for the old BFS behavior.
- `TesseractSinterDecoder` to support sparsification config while preserving pickle/serialization compatibility for existing distributed Sinter runs.
- `tesseract-long-beam-sparsify3`, `tesseract-long-beam-sparsify2`, `tesseract-short-beam-sparsify3`, and `tesseract-short-beam-sparsify2`.

Verification:
```shell
bazelisk build --jobs=1 --local_resources=cpu=1 src:tesseract
bazelisk test --jobs=1 --local_resources=cpu=1 //src:tesseract_tests
```

TAG=agy
CONV=365755ec-6af0-4d9c-a7e2-4a363289b763