
Feature: Stringzilla LUT — faster than OpenCV, float32 LUT/output, per-channel LUT #302

@ternaus


Describe what you are looking for

Feature request: Stringzilla LUT — faster than OpenCV, float32 LUT/output, per-channel LUT

Summary

We use Stringzilla’s LUT (via sz.translate / sz_lut) in Albucore and AlbumentationsX for uint8→uint8 lookup. We need:

  1. Faster than OpenCV — LUT apply should be faster than cv2.LUT (or at least not slower).
  2. uint8 image + float32 LUT → float32 image — so we can feed a float32 LUT and get float32 output (e.g. for normalization: uint8 image normalized to float32 in one pass without a separate convert-then-normalize path).
  3. Per-channel LUT without a loop — accept a LUT array of shape (C, N) (e.g. C channels, 256 entries) and apply channel c’s LUT to channel c in one call, instead of looping over channels in Python.

Current usage

Albucore

  • sz_lut(img, lut, inplace) — uint8 image, uint8 LUT (256 entries), uses sz.translate. Used for add/multiply/power LUT-based ops, and from Albumentations for any uint8 LUT path.
  • Per-channel LUT: when the LUT is per-channel (e.g. shape (256, C) or (C, 256)), the code loops over channels: result[..., i] = sz_lut(img[..., i], luts[i], inplace). apply_lut uses the same loop when its value argument is per-channel.
  • Normalization: For mean-std or min-max normalization on uint8, Albucore builds a float32 LUT and uses cv2.LUT (not Stringzilla) because sz_lut is uint8-only. So we need a path that accepts float32 LUT and outputs float32 image.

AlbumentationsX

We use sz_lut in several transforms. Categorized below. We also need LUT to accept any input shape: (H, W, C), (N, H, W, C), (D, H, W, C), or (N, D, H, W, C). Ideally without reshape; today we reshape to a contiguous last-dimension and it works, but native support for these shapes would be better.

Working well (single LUT, no loop) — no need to change:

  • HueSaturationValue — hue shift via single LUT on H channel.
  • Solarize — single threshold LUT.
  • Posterize — when same bits for all channels; single LUT.
  • Equalize — single-channel equalization (PIL and CV paths); one LUT per call.
  • MoveToneCurve — when tone curve is scalar (same for all channels); single LUT.
  • RandomGamma — single gamma LUT.
  • RandomDither — binary or single quantization LUT (same for all channels).

Application in loop (need improvement — per-channel LUT in one call):

  • Posterize — when bits are per-channel (e.g. bits=[3, 4, 5]); we loop over channels and call sz_lut per channel.
  • MoveToneCurve — when low_y / high_y are per-channel arrays; we build LUTs shape (C, 256) and loop: result[..., i] = sz_lut(img[..., i], luts[i], ...).
  • AutoContrast — per-channel LUT from histogram bounds; loop over channels.
  • EqualizeHistogram — per-channel equalization; loop over channels.
  • OrderedDither — multi-level dithering; we reuse the same LUT but still loop over channels.

Need float32 output (need improvement):

  • RandomBrightnessContrast / BrightnessContrast — we build a float32 LUT then cast to uint8 and call sz_lut; we lose precision. We want uint8 image + float32 LUT → float32 image when output is float.
  • Normalize — uint8 image normalized to float32 (mean-std or min-max); needs float32 LUT → float32 image in one pass (currently Albucore uses cv2.LUT for this path).
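The precision loss mentioned for BrightnessContrast can be illustrated with plain NumPy indexing. This is only a sketch: `alpha` and `beta` are hypothetical parameters picked for the example, and `lut[img]` stands in for the requested float32 LUT path, not for any existing Stringzilla call.

```python
import numpy as np

# Hypothetical contrast/brightness parameters, chosen only for illustration.
alpha, beta = 0.9, 0.5

# Float32 LUT covering every possible uint8 input value.
lut_f32 = (alpha * np.arange(256, dtype=np.float32) + beta).astype(np.float32)

# What a uint8-only LUT path forces today: clip and cast the LUT to uint8,
# which drops the fractional part of every entry.
lut_u8 = np.clip(lut_f32, 0, 255).astype(np.uint8)

img = np.array([[10, 128, 250]], dtype=np.uint8)

out_f32 = lut_f32[img]  # requested behavior: float32 output, fractions kept
out_u8 = lut_u8[img]    # current behavior: uint8 output, fractions truncated

# Per-pixel error introduced by the uint8 cast (always below 1.0 here).
error = np.abs(out_f32 - out_u8.astype(np.float32))
```

With a float32 LUT and float32 output, `error` would be zero by construction, because no intermediate cast to uint8 takes place.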

What we need

1. Performance

  • LUT apply should be faster than OpenCV cv2.LUT for the same input (uint8 image, 256-element LUT). We already use Stringzilla for uint8/uint8; keeping or improving that advantage is important.

2. uint8 image + float32 LUT → float32 image

  • Input: uint8 image of shape (H, W), (H, W, C), (N, H, W, C), (D, H, W, C), or (N, D, H, W, C); float32 LUT of length 256 (values in any range, e.g. normalized [0, 1] or mean-std normalized).
  • Output: float32 image of the same shape; out[..., c] = lut[img[..., c]] (last dimension is channel when present).
  • Use case: Normalization (mean-std, min-max) and any pipeline that wants to map uint8 indices through a float LUT without converting the whole image to float first. So we can do “uint8 in → float32 out” in one LUT call.
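As a reference for the semantics requested above, here is a minimal NumPy sketch. The helper names `normalize_lut` and `apply_float_lut` are hypothetical, and the mean/std values in the usage lines are example numbers. A native implementation would be expected to return the same values, only faster.

```python
import numpy as np

def normalize_lut(mean: float, std: float) -> np.ndarray:
    """Build a 256-entry float32 LUT for (x / 255 - mean) / std."""
    x = np.arange(256, dtype=np.float32) / 255.0
    return ((x - mean) / std).astype(np.float32)

def apply_float_lut(img: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Reference semantics: out[...] = lut[img[...]] for any image shape."""
    if img.dtype != np.uint8:
        raise ValueError("img must be uint8")
    if lut.shape != (256,):
        raise ValueError("lut must have exactly 256 entries")
    # Fancy indexing keeps the image shape and takes the LUT's dtype.
    return lut[img]

# Usage: a (N, D, H, W, C) batch goes through unchanged in shape.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(2, 3, 4, 5, 3), dtype=np.uint8)
lut = normalize_lut(mean=0.485, std=0.229)  # example values only
normalized = apply_float_lut(batch, lut)
```

The LUT is computed once for 256 values, so the per-pixel work is a single lookup rather than a convert, subtract, and divide over the whole image.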

3. Per-channel LUT: shape (C, N) in one call

  • Input: uint8 image of shape (H, W, C), (N, H, W, C), (D, H, W, C), or (N, D, H, W, C); LUT array of shape (C, 256) (or (C, N) with N=256): one LUT per channel.
  • Output: same shape uint8 (or float32 if LUT is float32); channel c is transformed by luts[c, :].
  • Current workaround: We do a Python loop: for c in range(C): result[..., c] = sz_lut(img[..., c], luts[c], ...). We want a single API call that applies each channel’s LUT to that channel (no loop), so the backend can vectorize over channels.
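The intended per-channel behavior can be pinned down with a NumPy sketch that compares the loop workaround against a single indexing expression. Both function names are hypothetical, and NumPy fancy indexing is used here only to define the expected result; it is not a claim about how Stringzilla would implement it.

```python
import numpy as np

def apply_per_channel_lut(img: np.ndarray, luts: np.ndarray) -> np.ndarray:
    """out[..., c] = luts[c, img[..., c]] without a Python loop over channels."""
    channels = img.shape[-1]
    if img.dtype != np.uint8:
        raise ValueError("img must be uint8")
    if luts.shape != (channels, 256):
        raise ValueError("luts must have shape (C, 256)")
    # arange(C) broadcasts against the trailing channel axis of img,
    # so each pixel in channel c is looked up in row c of luts.
    return luts[np.arange(channels), img]

def apply_per_channel_lut_loop(img: np.ndarray, luts: np.ndarray) -> np.ndarray:
    """The loop-based workaround described above, kept for comparison."""
    out = np.empty(img.shape, dtype=luts.dtype)
    for c in range(img.shape[-1]):
        out[..., c] = luts[c][img[..., c]]
    return out

# Usage: works for (H, W, C) as well as batched or volumetric shapes.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(2, 4, 5, 3), dtype=np.uint8)
luts_u8 = rng.integers(0, 256, size=(3, 256), dtype=np.uint8)
luts_f32 = rng.random((3, 256), dtype=np.float32)

same_u8 = np.array_equal(apply_per_channel_lut(img, luts_u8),
                         apply_per_channel_lut_loop(img, luts_u8))
same_f32 = np.array_equal(apply_per_channel_lut(img, luts_f32),
                          apply_per_channel_lut_loop(img, luts_f32))
```

The output dtype follows the LUT dtype, which matches the request that a float32 (C, 256) LUT should produce a float32 image.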

If the same API supports both “single LUT (1D)” and “per-channel LUT (C, 256)”, that would cover our use cases. All of the above should accept any input shape (H, W, C), (N, H, W, C), (D, H, W, C), or (N, D, H, W, C) without the caller having to reshape; we currently reshape to a contiguous last-dimension and it works, but native support for these shapes is preferred.


Summary table

Feature | Current | Request
Speed | Stringzilla used for uint8/uint8 | Stay faster than OpenCV cv2.LUT
Dtypes | uint8 image + uint8 LUT → uint8 | Add: uint8 image + float32 LUT → float32 image (e.g. for normalization)
Per-channel | Loop in Python: one sz_lut per channel | Accept LUT shape (C, 256) and apply in one call (no loop)

Why this helps

  • Normalization: One pass uint8→float32 with a float32 LUT instead of convert-to-float then normalize (or using cv2.LUT in a separate code path). Same pipeline, better performance and simpler code.
  • Per-channel ops: Equalize, tone curve, reduce_bits, and any per-channel LUT run in a loop today; a single (C, 256) LUT call would remove the loop and let the backend optimize (SIMD over channels, etc.).

If this is implemented in Stringzilla (or exposed in Albucore on top of it), we would use it in Albucore and AlbumentationsX to simplify code, improve performance, and support uint8→float32 LUT and per-channel LUT without Python loops.

Can you contribute to the implementation?

  • I can contribute

Is your feature request specific to a certain interface?

It applies to everything

Contact Details

No response

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels: enhancement (New feature or request)
