Expand TRT decoder YAML config for composite decoding [depends on PR #524] #536

Open
wsttiger wants to merge 10 commits into NVIDIA:main from wsttiger:update_trt_decoder_yaml

Conversation

wsttiger (Collaborator) commented May 8, 2026

Add YAML/config support for TRT decoder runtime options including batch size,
CUDA graph execution, global decoder selection, and PyMatching-specific global
decoder parameters. Wire realtime decoder construction so TRT configs receive
the top-level observable matrix from O_sparse, and pass the same O matrix into
PyMatching global decoder params for composite observable decoding.
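
The O_sparse-to-O wiring described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: it assumes O_sparse is a flat list of column indices in which -1 terminates each row, and that the runtime parameters behave like a plain dictionary. The helper names `sparse_rows_to_dense` and `inject_observables`, and the decoder name string `"pymatching"`, are hypothetical; the keys `batch_size`, `global_decoder`, `global_decoder_params`, and `O` follow the constructor parameters named in this PR.

```python
def sparse_rows_to_dense(sparse, num_cols):
    """Expand a flat, row-terminated index list into a dense 0/1 matrix.

    Assumed encoding (not confirmed by the PR text): each row lists the
    column indices of its nonzero entries, and -1 ends the row.
    """
    rows = []
    current = [0] * num_cols
    for idx in sparse:
        if idx == -1:
            rows.append(current)
            current = [0] * num_cols
        elif 0 <= idx < num_cols:
            current[idx] = 1
        else:
            raise ValueError(f"column index {idx} out of range")
    if any(current):
        raise ValueError("last row is missing its -1 terminator")
    return rows


def inject_observables(trt_params, o_sparse, block_size):
    """Return a copy of trt_params with O set for the TRT decoder and,
    when a global decoder is configured, for its parameter map as well."""
    dense_o = sparse_rows_to_dense(o_sparse, block_size)
    out = dict(trt_params)
    out["O"] = dense_o
    if out.get("global_decoder"):
        global_params = dict(out.get("global_decoder_params", {}))
        global_params["O"] = dense_o
        out["global_decoder_params"] = global_params
    return out


# Hypothetical parameter map: two observables over a block of three.
params = {"batch_size": 4, "global_decoder": "pymatching",
          "global_decoder_params": {}}
wired = inject_observables(params, [0, 2, -1, 1, -1], block_size=3)
```

Returning a copy rather than mutating the input keeps the parsed YAML config unchanged, which makes round-trip tests easier to reason about.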

Expose the new config fields through Python bindings and heterogeneous_map
round-tripping. Extend YAML tests for TRT config round-trip, runtime parameter
conversion, and O_sparse-to-O injection.

Update test_trt_decoder_composite to support an optional --config-yaml path,
allowing the existing composite demo to construct and run a real TRT+PyMatching
decoder directly from YAML while preserving the original manual CLI path.

bmhowe23 and others added 5 commits April 29, 2026 23:57
…output

Add a "predecoder" execution mode to the TensorRT decoder so it can be
chained with a second decoder (e.g. PyMatching) and return logical-frame
observables directly. The TRT model is assumed to emit a single output
that concatenates [pre_L (num_observables entries), residual_dets (rest)].

New constructor parameters:
- "batch_size": required when the ONNX model has a dynamic batch dim.
  Used to size the optimization profile and pre-allocate I/O buffers.
- "global_decoder" + "global_decoder_params": optional decoder name and
  params for a follow-up decoder run on the residual_dets portion of
  the TRT output. Created with the same H passed to trt_decoder.
- "O": observables matrix (num_observables x block_size). Enables
  decode()/decode_batch() to return the predicted logical frame.
  Number of observables is inferred from O.shape()[0].

Decode behavior matrix:
- no global_decoder, no O   -> raw TRT output (unchanged).
- no global_decoder, O      -> return the pre_L prefix only.
- global_decoder, no O      -> entire output -> global_decoder.result.
- global_decoder, O         -> residual -> global_decoder; return
                               pre_L XOR global_decoder.logical_frame.
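
The four cases in the decode behavior matrix can be sketched for a single sample as below. This is an illustration under stated assumptions, not the decoder's code: the TRT output is taken to be a list of hard 0/1 integers, the global decoder is modeled as a plain callable that returns a list of bits, and the function name `composite_decode` is hypothetical.

```python
def composite_decode(trt_output, num_observables=None, global_decode=None):
    """Dispatch one sample's TRT output according to the behavior matrix.

    num_observables is None when no O matrix was supplied.
    global_decode is None when no global decoder was configured.
    """
    if num_observables is None:
        if global_decode is None:
            # no global_decoder, no O: raw TRT output
            return list(trt_output)
        # global_decoder, no O: the entire output goes to the global decoder
        return global_decode(list(trt_output))

    # With O, the output is laid out as [pre_L | residual_dets].
    pre_l = list(trt_output[:num_observables])
    residual = list(trt_output[num_observables:])
    if global_decode is None:
        # no global_decoder, O: return the pre_L prefix only
        return pre_l
    # global_decoder, O: decode the residual, then XOR into pre_L
    frame = global_decode(residual)
    return [a ^ b for a, b in zip(pre_l, frame)]


# Stand-in global decoder for illustration only: parity of its input.
def parity_decoder(dets):
    return [sum(dets) % 2]


combined = composite_decode([1, 0, 1, 0], num_observables=1,
                            global_decode=parity_decoder)
```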

Constructor validation when O is set:
- output_size_per_sample >= num_observables, and
- when global_decoder_ is set,
  output_size_per_sample == num_observables + global_decoder.syndrome_size.
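
As a rough sketch, the two size checks could look like the following. The function name and error messages are hypothetical, and the real constructor may report failures differently.

```python
def validate_output_size(output_size_per_sample, num_observables,
                         global_syndrome_size=None):
    """Check the TRT output width against O and the optional global decoder.

    global_syndrome_size is None when no global decoder is configured.
    """
    if output_size_per_sample < num_observables:
        raise ValueError(
            f"TRT output has {output_size_per_sample} entries per sample, "
            f"fewer than the {num_observables} observables in O")
    if global_syndrome_size is not None:
        expected = num_observables + global_syndrome_size
        if output_size_per_sample != expected:
            raise ValueError(
                f"TRT output has {output_size_per_sample} entries per "
                f"sample, expected {expected} "
                f"({num_observables} observables + "
                f"{global_syndrome_size} residual detectors)")


# One observable plus 24 residual detectors: a 25-wide output passes.
validate_output_size(25, 1, global_syndrome_size=24)
```

The equality check in the second case matters because the residual slice is handed to the global decoder unchanged, so any width mismatch would otherwise surface later as a wrong-sized syndrome.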

Other changes:
- Dynamic batch support: setInputShape per call when the model's batch
  dim is -1; ONNX builder now installs a min/opt/max optimization
  profile when "batch_size" is provided.
- Split decode_batch into a typed decode_batch_impl<float|uint8_t> for
  cleaner dtype dispatch (engine I/O dtypes float32 / uint8 unchanged).
- Better INFO logging: total non-zero input vs residual detector counts
  per batch to help diagnose predecoder behavior.

Signed-off-by: Ben Howe <bhowe@nvidia.com>
Add a realtime test/demo that initializes the TensorRT decoder from an ONNX
predecoder model with PyMatching configured as the global decoder. The driver
loads detector, observable, parity-check, and prior data from the
Stim export bundle, decodes samples through the composite TRT+PyMatching path,
and reports latency, throughput, correctness, and residual-syndrome diagnostics.

Register the new test_trt_decoder_composite target when TensorRT, realtime,
and the TRT decoder plugin are available.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
Add YAML/config support for TRT decoder runtime options including batch size,
CUDA graph execution, global decoder selection, and PyMatching-specific global
decoder parameters. Wire realtime decoder construction so TRT configs receive
the top-level observable matrix from O_sparse, and pass the same O matrix into
PyMatching global decoder params for composite observable decoding.

Expose the new config fields through Python bindings and heterogeneous_map
round-tripping. Extend YAML tests for TRT config round-trip, runtime parameter
conversion, and O_sparse-to-O injection.

Update test_trt_decoder_composite to support an optional --config-yaml path,
allowing the existing composite demo to construct and run a real TRT+PyMatching
decoder directly from YAML while preserving the original manual CLI path.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
…yaml

# Conflicts:
#	libs/qec/unittests/realtime/CMakeLists.txt
#	libs/qec/unittests/realtime/test_trt_decoder_composite.cpp
wsttiger marked this pull request as ready for review May 11, 2026 22:10
wsttiger added 5 commits May 12, 2026 00:45
Replace the TRT decoder's hardcoded optional PyMatching global decoder params
with a tagged global_decoder_config variant. Preserve PyMatching as the current
supported concrete config while using std::monostate for the unset case.

Update heterogeneous-map conversion, YAML mapping, and Python bindings so the
existing PyMatching YAML/Python surface continues to round-trip. Extend the YAML
unit test to verify the PyMatching variant arm is selected and still produces
the expected runtime parameter map.
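
The tagged-config idea can be approximated in Python as shown below. This is an analogy only: the PR itself uses a C++ variant with std::monostate for the unset case, whereas this sketch uses `None` in that role. The class name `PyMatchingConfig`, its `merge_strategy` field, and the `to_runtime_params` helper are hypothetical stand-ins.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Union


@dataclass
class PyMatchingConfig:
    # Hypothetical field; the real config may carry different options.
    merge_strategy: Optional[str] = None


# None plays the role of the "unset" arm of the variant.
GlobalDecoderConfig = Union[None, PyMatchingConfig]


def to_runtime_params(cfg: GlobalDecoderConfig) -> dict:
    """Convert whichever arm is selected into a runtime parameter map."""
    if cfg is None:
        return {}
    if isinstance(cfg, PyMatchingConfig):
        # Only emit fields the user actually set.
        return {k: v for k, v in asdict(cfg).items() if v is not None}
    raise TypeError(f"unsupported global decoder config: {type(cfg).__name__}")


runtime = to_runtime_params(PyMatchingConfig(merge_strategy="smallest_weight"))
```

Dispatching on the selected arm means a second global decoder could later be supported by adding a new config class and one more branch, without touching the PyMatching path.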

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
…yaml

# Conflicts:
#	libs/qec/python/bindings/py_decoding_config.cpp
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
wsttiger force-pushed the update_trt_decoder_yaml branch from 6c2eefc to 26be6b4 May 29, 2026 17:15
wsttiger requested a review from melody-ren May 29, 2026 18:51