Part 1 decode: marker-bit dequantization and block_states removal #317

@osamu620

Context

j2k_dequant() reads the per-sample decoded-bit-plane index p from block_states (block_dequant.cpp:61, 91):

```cpp
N_b = 30 - (state >> 3) + 1;
```

This is the only remaining dependency on block_states once #315 is done. If we can get p from somewhere else, the block_states buffer can be deleted entirely, saving one allocation and zero-fill per codeblock and its ~4 KB of memory traffic in the hot path.

Book recipe (§17.1.3)

Instead of storing p in a separate byte-per-sample array, embed it in sample_buf itself via a marker bit:

  • When a sample first becomes significant (σ transitions 0 → 1), set a marker bit in sample_buf[j2 + j1*sstride] at position pLSB - 1 (one below the current bit-plane's LSB). This happens in sigprop's samples[...] |= 1 << p path and cleanup's equivalent.
  • Dequant recovers p by counting trailing zeros of the sample (after masking off the sign bit), which identifies the lowest set bit — the marker.
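The two bullets above can be sketched as a pair of scalar helpers. This is a minimal illustration, not the actual block_decoding.cpp code: the names `mark_significant` and `recover_p` are hypothetical, the sample is assumed to be a 32-bit word with the sign in bit 31, and refinement passes (which would need to move the marker down as further bit-planes are decoded) are left out. `__builtin_ctz` is a GCC/Clang builtin; `std::countr_zero` is the C++20 equivalent.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; these names are not from the codebase.
constexpr std::uint32_t kSignBit = 0x80000000u;  // assumed sign position (bit 31)

// Called when a sample first becomes significant at bit-plane p (p >= 1).
// Sets the magnitude bit at p and the marker bit one position below it.
inline void mark_significant(std::int32_t &sample, int p) {
  std::uint32_t v = static_cast<std::uint32_t>(sample);
  v |= 1u << p;        // magnitude bit for the current bit-plane
  v |= 1u << (p - 1);  // marker bit, one below the current bit-plane
  sample = static_cast<std::int32_t>(v);
}

// Recovers p from the lowest set bit of the magnitude.
// Returns -1 for a sample that never became significant.
inline int recover_p(std::int32_t sample) {
  std::uint32_t mag = static_cast<std::uint32_t>(sample) & ~kSignBit;
  if (mag == 0) return -1;        // ctz is undefined for zero
  return __builtin_ctz(mag) + 1;  // marker sits at p - 1
}
```

Masking the sign bit first matters only for a word whose magnitude is zero; for any significant sample the marker is always below bit 31, so the trailing-zero count is unaffected by the sign.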

Scope

  • block_decoding.cpp: at each σ=1 transition, also OR the marker bit (1 << (p - 1) is fine in the natural flow since p >= 1 at decode time).
  • block_dequant.cpp + NEON/AVX2/AVX512 variants: replace the block_states byte load with a trailing-zero-count on the decoded sample.
  • coding_units.cpp + subband_row_buf.cpp: delete block_states / blkstate_stride / all allocation and zeroing for the Part 1 decode path. HT keeps its own block_states (different bit semantics, out of scope).
  • j2k_codeblock: delete block_states / blkstate_stride fields (Part 1 only; HT uses a separate allocation path after #315, "Part 1 decode: full consumer port of σ/σ̄/π/χ̂ onto packed stripe-column word", migrates them).
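For the dequant item, a scalar sketch of the swap might look like the following. It assumes that the p held in `state >> 3` is the same quantity as the p implied by the marker position (marker at p - 1), which this issue suggests but does not spell out. The function names are hypothetical, and the SIMD kernels would need the vector equivalent of a trailing-zero count.

```cpp
#include <cassert>
#include <cstdint>

// Current form: p is read from the upper bits of the per-sample state byte.
inline int n_b_from_state(std::uint8_t state) {
  return 30 - (state >> 3) + 1;
}

// Proposed form (sketch): p is recovered from the marker bit in the sample.
// Assumes a 32-bit sample with the sign in bit 31 and the marker at p - 1.
inline int n_b_from_sample(std::int32_t sample) {
  std::uint32_t mag = static_cast<std::uint32_t>(sample) & 0x7FFFFFFFu;
  if (mag == 0) return 0;  // placeholder: a zero sample dequantizes to zero
  int p = __builtin_ctz(mag) + 1;  // GCC/Clang builtin
  return 30 - p + 1;
}
```

Under these assumptions the two forms agree: a state byte of `5 << 3` and a sample of `(1 << 5) | (1 << 4)` both give N_b = 26.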

Why this is not shipped yet

Without #315 in place, block_states still carries σ/σ̄/π/χ̂ for several consumers, so deleting it isn't possible. D alone on top of the current branch would save only the single block_states byte load in dequant — dequant is already SIMD'd, so the ceiling is ~2–5 % and the cost in touching all dequant kernels isn't justified.

D becomes attractive once #315 is merged: at that point block_states holds only p-index, and D lets us delete the whole buffer.

Dependency graph

Acceptance
