A big flaw in Tesseract's API is that error indices always refer to
the internal `errors` vector, which does not correspond directly to
the errors in the (flattened) input DEM given by the user. This causes a
variety of problems. One is that a DEM generated with `--dem-out` (or
by similar manual processing steps in Python) may bear little
resemblance to the input DEM; for example, all the targets are
stripped of separators. This makes it annoying to use
Tesseract-calibrated error models for downstream tasks like
matching-based decoding.
Here we adopt the principle that the user interface to Tesseract/Simplex
decoders should always be in terms of the error indices from the
original flattened DEM as provided by the user. This is now true across
the C++, CLI, and Python APIs.
- Added index-mapping support to DEM preprocessing in common:
  - `merge_indistinguishable_errors(..., error_index_map)`
  - `remove_zero_probability_errors(..., error_index_map)`
  - `error_index_map` maps original error index to new preprocessed index
  - `error_index_map` maps removed / redundant errors to
    `std::numeric_limits<size_t>::max()`
- Updated both decoders (`TesseractDecoder`, `SimplexDecoder`) to maintain:
  - `dem_error_to_error` (original flattened DEM index -> internal index)
  - `error_to_dem_error` (internal error index -> original flattened DEM
    index)
- `predicted_errors_buffer` reports errors back with original flattened
  DEM error indices.
- `cost_from_errors` and observables-from-errors methods now:
  - accept original flattened DEM indices
  - throw on unmapped/removed indices
    (`std::numeric_limits<size_t>::max()`)
- Updated Python bindings to use the new helpers.
- Updated pybind common wrappers to pass required map args to common
  preprocessing functions.
- Updated `--dem-out` in both CLI binaries:
  - keep original flattened DEM in scope
  - emit updated probabilities by iterating original DEM instruction order
  - preserve original error instruction tags and arbitrary formatting
    (e.g. `D0 ^ D0 D1`) when writing estimated DEM output.
- Updated tests
- Added AGENTS guidance to run Python Bazel tests
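
The index bookkeeping described above can be illustrated with a small standalone Python sketch. It is not the code from this commit: the `preprocess`, `invert`, and `to_internal` helpers, the `(probability, symptoms)` tuple representation, and the XOR rule for combining merged probabilities are all assumptions made for illustration. Only the map semantics (original index to preprocessed index, with removed or redundant errors mapped to the `size_t` maximum) follow the description in this message.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: mirrors std::numeric_limits<size_t>::max() on a 64-bit platform.
REMOVED = 2**64 - 1

# Hypothetical stand-in for one flattened DEM error: (probability, symptoms).
Error = Tuple[float, FrozenSet[str]]


def preprocess(errors: List[Error]) -> Tuple[List[Error], List[int]]:
    """Drop zero-probability errors and merge errors with identical symptoms.

    Returns the kept errors plus error_index_map, where error_index_map[i]
    is the preprocessed index of original error i, or REMOVED.
    """
    kept: List[Error] = []
    first_seen: Dict[FrozenSet[str], int] = {}
    error_index_map: List[int] = []
    for p, symptoms in errors:
        if p == 0.0:
            # Zero-probability error: removed outright.
            error_index_map.append(REMOVED)
        elif symptoms in first_seen:
            # Redundant error: fold its probability into the first occurrence
            # (assumed rule: exactly one of the two mechanisms fires).
            j = first_seen[symptoms]
            q = kept[j][0]
            kept[j] = (p * (1 - q) + q * (1 - p), symptoms)
            error_index_map.append(REMOVED)
        else:
            first_seen[symptoms] = len(kept)
            error_index_map.append(len(kept))
            kept.append((p, symptoms))
    return kept, error_index_map


def invert(dem_error_to_error: List[int], num_internal: int) -> List[int]:
    """Build the internal index -> original flattened DEM index map."""
    error_to_dem_error = [REMOVED] * num_internal
    for dem_index, internal in enumerate(dem_error_to_error):
        if internal != REMOVED:
            error_to_dem_error[internal] = dem_index
    return error_to_dem_error


def to_internal(dem_indices: List[int], dem_error_to_error: List[int]) -> List[int]:
    """Translate user-facing DEM indices, rejecting removed errors."""
    internal = []
    for i in dem_indices:
        j = dem_error_to_error[i]
        if j == REMOVED:
            raise ValueError(f"DEM error {i} was removed during preprocessing")
        internal.append(j)
    return internal


errors: List[Error] = [
    (0.1, frozenset({"D0", "D1"})),  # kept as internal error 0
    (0.0, frozenset({"D2"})),        # removed: zero probability
    (0.2, frozenset({"D0", "D1"})),  # redundant with original error 0
    (0.3, frozenset({"D1", "L0"})),  # kept as internal error 1
]
kept, dem_error_to_error = preprocess(errors)
error_to_dem_error = invert(dem_error_to_error, len(kept))

# A decoder prediction in internal indices, reported in original DEM indices.
predicted_internal = [1, 0]
predicted_dem = [error_to_dem_error[j] for j in predicted_internal]
```

In this sketch only the first occurrence of a merged group keeps a valid index, so the two maps are exact inverses on the surviving errors and a prediction can always be reported in terms of the user's original DEM.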