Skip to content

Adapt DALI to nvImageCodec 0.8.0#6293

Merged
jantonguirao merged 3 commits intoNVIDIA:mainfrom
jantonguirao:nvimgcodec_0_8_0_pr
Apr 21, 2026
Merged

Adapt DALI to nvImageCodec 0.8.0#6293
jantonguirao merged 3 commits intoNVIDIA:mainfrom
jantonguirao:nvimgcodec_0_8_0_pr

Conversation

@jantonguirao
Copy link
Copy Markdown
Contributor

@jantonguirao jantonguirao commented Apr 14, 2026

Category:

New feature (non-breaking change which adds functionality)

Description:

Adapts DALI to the nvImageCodec 0.8.0 API. The new version introduces breaking API changes that
require updates to DALI's integration layer.

Key changes:

  • Bump the required nvImageCodec version range from 0.7.x to 0.8.x
  • Add nvImageCodec 0.8.0 headers under third_party/nvimgcodec/include/ as a local fallback
    when the system package is not found, before attempting the remote download
  • Update nvimgcodecCodeStreamCreateFromHostMem call to match the new API signature
  • Rework qa/TL1_decoder_perf/test.sh to collect nsys profiles on failure for easier debugging

Additional information:

Affected modules and functionalities:

  • cmake/Dependencies.common.cmake — version bump and local header fallback logic
  • dali/operators/imgcodec/util/nvimagecodec_types.cc — API call signature update
  • third_party/nvimgcodec/include/ — new local copy of 0.8.0 headers
  • qa/TL1_decoder_perf/test.sh — improved failure diagnostics with nsys profiles

Key points relevant for the review:

  • The nvimgcodecCodeStreamCreateFromHostMem API gained an additional nullable parameter; we pass nullptr to preserve existing behavior.

Tests:

  • Existing tests apply
    • qa/TL1_decoder_perf/test.sh — decoder performance tests (also reworked in this PR)
  • N/A

Checklist

Documentation

  • N/A

DALI team only

Requirements

  • Affects existing requirements

REQ IDs: N/A

JIRA TASK: N/A

Copilot AI review requested due to automatic review settings April 14, 2026 10:36
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates DALI’s nvImageCodec integration to work with nvImageCodec 0.8.0 by adjusting build-time dependency logic, updating the API call site, and improving decoder perf QA diagnostics (nsys capture on failure).

Changes:

  • Bump nvImageCodec required version range to >= 0.8.0, < 0.9.0 and add a local-header fallback under third_party/nvimgcodec/include/.
  • Update nvimgcodecCodeStreamCreateFromHostMem usage to the new 0.8.x signature (additional nullable parameter).
  • Rework qa/TL1_decoder_perf/test.sh to re-run with Nsight Systems profiling on performance-check failure.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
cmake/Dependencies.common.cmake Updates required nvImageCodec version range and adds local-header fallback logic (before remote download).
dali/operators/imgcodec/util/nvimagecodec_types.cc Adjusts DALI’s nvimgcodec host-mem codestream creation call to match the 0.8.x API.
qa/TL1_decoder_perf/test.sh Adds nsys install step and failure-time profiling re-runs; expands perf matrix (incl. 32-stream runs).
third_party/nvimgcodec/include/nvimgcodec.h Adds vendored nvImageCodec 0.8.0 header as a local fallback when system package isn’t found.
third_party/nvimgcodec/include/nvimgcodec_version.h Adds vendored nvImageCodec version header for the local fallback headers.
Comments suppressed due to low confidence (1)

cmake/Dependencies.common.cmake:330

  • The fallback download still fetches nvImageCodec 0.7.0 headers (URL + SHA), but this PR bumps the required version range to 0.8.x and the code now calls the 0.8 API. If the system package isn’t found and the local headers are missing, the build will pull 0.7 headers and fail to compile (signature mismatch) or compile against the wrong API. Update the downloaded archive URL/HASH to a 0.8.x release (and adjust the extracted include path if needed).
        URL      https://developer.download.nvidia.com/compute/nvimgcodec/redist/nvimgcodec/linux-x86_64/nvimgcodec-linux-x86_64-0.8.0.22-archive.tar.xz
        URL_HASH SHA512=2a400f75c619a10c3dbcd298a83ef3307f6e08453b2cfb5040f6b22c64c7be0ac4552a2a80ed057afe7657cf0bb8cc2d54cdccf8bc50ffdf34cfd05b45082978
      )
      FetchContent_Populate(nvimgcodec_headers)
      set(nvimgcodec_INCLUDE_DIR "${nvimgcodec_headers_SOURCE_DIR}/${CUDA_VERSION_MAJOR}/include")
      if (NOT EXISTS "${nvimgcodec_INCLUDE_DIR}/nvimgcodec.h")
        message(FATAL_ERROR "nvimgcodec.h not found in ${nvimgcodec_INCLUDE_DIR} - something went wrong with the download")
      endif()
    endif()
    message(STATUS "Using nvimgcodec_INCLUDE_DIR=${nvimgcodec_INCLUDE_DIR}")
    include_directories(SYSTEM ${nvimgcodec_INCLUDE_DIR})

    # Setting default installation path for dynamic loading

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cmake/Dependencies.common.cmake
Comment thread qa/TL1_decoder_perf/test.sh
Comment thread qa/TL1_decoder_perf/test.sh Outdated
Comment on lines +144 to +147
NSYS_REP="decoder_perf_legacy.nsys-rep" run_bench "${LOG1}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50
NSYS_REP="decoder_perf_nvimgcodec.nsys-rep" run_bench "${LOG2}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 --experimental_decoder
NSYS_REP="decoder_perf_ndd_legacy.nsys-rep" run_bench "${LOG1_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50
NSYS_REP="decoder_perf_ndd_nvimgcodec.nsys-rep" run_bench "${LOG2_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 --experimental_decoder
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failure path says it will “re-run all benchmarks with nsys profiling”, but it only profiles 4 runs (legacy/nvimgcodec and NDD variants). Since the pass/fail condition also depends on the new-thread-pool runs and the 32-stream runs, a failure in those cases won’t produce an nsys profile for the failing configuration. Consider profiling the same set of benchmarks that were executed in run_all_benchmarks (including DALI_USE_NEW_THREAD_POOL=1 and NVIMGCODEC_DEFAULT_NUM_CUDA_STREAMS=32 variants) so the collected traces correspond to the actual failing check.

Suggested change
NSYS_REP="decoder_perf_legacy.nsys-rep" run_bench "${LOG1}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50
NSYS_REP="decoder_perf_nvimgcodec.nsys-rep" run_bench "${LOG2}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 --experimental_decoder
NSYS_REP="decoder_perf_ndd_legacy.nsys-rep" run_bench "${LOG1_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50
NSYS_REP="decoder_perf_ndd_nvimgcodec.nsys-rep" run_bench "${LOG2_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 --experimental_decoder
NSYS_REP="decoder_perf_legacy.nsys-rep" run_bench "${LOG1}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50
NSYS_REP="decoder_perf_nvimgcodec.nsys-rep" run_bench "${LOG2}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 --experimental_decoder
NSYS_REP="decoder_perf_ndd_legacy.nsys-rep" run_bench "${LOG1_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50
NSYS_REP="decoder_perf_ndd_nvimgcodec.nsys-rep" run_bench "${LOG2_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 --experimental_decoder
NSYS_REP="decoder_perf_legacy_new_tp.nsys-rep" DALI_USE_NEW_THREAD_POOL=1 run_bench "dali_legacy_new_tp.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50
NSYS_REP="decoder_perf_nvimgcodec_new_tp.nsys-rep" DALI_USE_NEW_THREAD_POOL=1 run_bench "dali_nvimgcodec_new_tp.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 --experimental_decoder
NSYS_REP="decoder_perf_ndd_legacy_new_tp.nsys-rep" DALI_USE_NEW_THREAD_POOL=1 run_bench "dali_ndd_legacy_new_tp.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50
NSYS_REP="decoder_perf_ndd_nvimgcodec_new_tp.nsys-rep" DALI_USE_NEW_THREAD_POOL=1 run_bench "dali_ndd_nvimgcodec_new_tp.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 --experimental_decoder
NSYS_REP="decoder_perf_legacy_32_streams.nsys-rep" NVIMGCODEC_DEFAULT_NUM_CUDA_STREAMS=32 run_bench "dali_legacy_32_streams.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50
NSYS_REP="decoder_perf_nvimgcodec_32_streams.nsys-rep" NVIMGCODEC_DEFAULT_NUM_CUDA_STREAMS=32 run_bench "dali_nvimgcodec_32_streams.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 --experimental_decoder
NSYS_REP="decoder_perf_ndd_legacy_32_streams.nsys-rep" NVIMGCODEC_DEFAULT_NUM_CUDA_STREAMS=32 run_bench "dali_ndd_legacy_32_streams.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50
NSYS_REP="decoder_perf_ndd_nvimgcodec_32_streams.nsys-rep" NVIMGCODEC_DEFAULT_NUM_CUDA_STREAMS=32 run_bench "dali_ndd_nvimgcodec_32_streams.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 --experimental_decoder

Copilot uses AI. Check for mistakes.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 14, 2026

Greptile Summary

This PR adapts DALI to the nvImageCodec 0.8.0 API by bumping version ranges in cmake and conda configs, updating the nvimgcodecCodeStreamCreateFromHostMem call to include the new nullable parameter (passed as nullptr), and reworking the decoder performance test script to collect nsys profiles automatically on benchmark failure.

Confidence Score: 5/5

Safe to merge — all changes are mechanical version bumps and a one-line API adaptation with a well-understood nullptr default.

The only finding is a P2 style note about missing pipefail in the nsys pipe, which affects diagnostic convenience on failure but not correctness of the performance tests themselves. All version-bump and API changes are internally consistent.

No files require special attention.

Important Files Changed

Filename Overview
cmake/Dependencies.common.cmake Version range bumped from 0.7.x to 0.8.x; remote download URL/hash and static GIT_TAG updated consistently to 0.8.0.
dali/operators/imgcodec/util/nvimagecodec_types.cc Added nullptr for the new nullable parameter in nvimgcodecCodeStreamCreateFromHostMem; preserves existing behavior correctly.
conda/third_party/dali_nvimagecodec/recipe/meta.yaml Simple version bump from 0.7.0 to 0.8.0 for both build_version and git_rev.
qa/TL1_decoder_perf/test.sh Adds do_once for nsys install, refactors benchmarks into run_all_benchmarks, and adds nsys profiling on failure; PERF_RESULT1_TP/PERF_RESULT2_TP removed from pass/fail gate (now informational); one P2: nsys pipe failure silently masked without pipefail.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Build with BUILD_NVIMAGECODEC] --> B{WITH_DYNAMIC_NVIMGCODEC?}
    B -- Yes --> C[find_package nvimgcodec 0.8.0...<0.9.0]
    C -- Found --> E[Use system headers]
    C -- Not Found --> D[FetchContent: download 0.8.0.22 tarball]
    D --> E
    B -- No --> F[ExternalProject: clone v0.8.0 from GitHub]
    E --> G[nvimgcodecCodeStreamCreateFromHostMem\ninstance, handle, data, length, nullptr]
    F --> G
Loading

Reviews (4): Last reviewed commit: "Update decoder performance test script" | Re-trigger Greptile

Comment thread qa/TL1_decoder_perf/test.sh Outdated
apt update && apt install -y --no-install-recommends gnupg
echo "deb http://developer.download.nvidia.com/devtools/repos/ubuntu$(source /etc/lsb-release && echo "$DISTRIB_RELEASE" | tr -d .)/$(dpkg --print-architecture) /" \
| tee /etc/apt/sources.list.d/nvidia-devtools.list
apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 security Insecure HTTP key fetch in do_once

The GPG key for the apt repository is fetched over plain HTTP (http://), making it vulnerable to MITM substitution — an attacker on-path could serve a rogue key, then sign and inject malicious packages (e.g., nsight-systems-cli) that get executed during CI. Use HTTPS instead.

Suggested change
apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | apt-key add -

Comment thread qa/TL1_decoder_perf/test.sh Outdated
Comment on lines +9 to +12
echo "deb http://developer.download.nvidia.com/devtools/repos/ubuntu$(source /etc/lsb-release && echo "$DISTRIB_RELEASE" | tr -d .)/$(dpkg --print-architecture) /" \
| tee /etc/apt/sources.list.d/nvidia-devtools.list
apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
apt update && apt install -y nsight-systems-cli
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Hardcoded OS/arch in GPG key URL

The devtools apt repo URL is dynamically constructed from lsb_release and dpkg --print-architecture, but the GPG key URL is hardcoded to ubuntu1804/x86_64. On Ubuntu 20.04+ or ARM runners the key URL and the repo URL will reference different distributions, and apt-key is deprecated on Ubuntu 22.04+. Consider aligning the key source to the detected distro or switching to gpg --dearmor.

Comment thread qa/TL1_decoder_perf/test.sh Outdated
Comment on lines +144 to +147
NSYS_REP="decoder_perf_legacy.nsys-rep" run_bench "${LOG1}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50
NSYS_REP="decoder_perf_nvimgcodec.nsys-rep" run_bench "${LOG2}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 --experimental_decoder
NSYS_REP="decoder_perf_ndd_legacy.nsys-rep" run_bench "${LOG1_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50
NSYS_REP="decoder_perf_ndd_nvimgcodec.nsys-rep" run_bench "${LOG2_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 --experimental_decoder
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 nsys re-run omits thread-pool and 32-stream benchmarks

On a failure, the nsys profiling pass re-runs only the four baseline benchmarks (LOG1, LOG2, LOG1_NDD, LOG2_NDD), but the success gate also checks PERF_RESULT1_TP, PERF_RESULT2_TP, PERF_RESULT2_32STREAMS, and PERF_RESULT2_NDD_32STREAMS. If a failure is caused by one of those omitted cases, no nsys profile will be captured for it, defeating the diagnostic purpose.

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48491354]: BUILD STARTED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48499746]: BUILD STARTED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48499746]: BUILD FAILED

@jantonguirao jantonguirao force-pushed the nvimgcodec_0_8_0_pr branch 2 times, most recently from 7619784 to 3298041 Compare April 14, 2026 17:32
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48519658]: BUILD STARTED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48519658]: BUILD PASSED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48491354]: BUILD FAILED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48791256]: BUILD STARTED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48791256]: BUILD PASSED

- Bump required nvImageCodec version range from 0.7.x to 0.8.x
- Update nvimgcodec download URL and hash to 0.8.0.22
- Update nvimgcodecCodeStreamCreateFromHostMem call to match the new
  0.8.0 API signature (added nullable parameter)
- Rework qa/TL1_decoder_perf/test.sh to collect nsys profiles on
  failure for easier debugging
…2-stream benchmarks

PERF_RESULT1_TP and PERF_RESULT2_TP are now informational only since the
new thread pool is still experimental. The 32-stream benchmark variants
are removed as they are no longer needed.
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48992654]: BUILD STARTED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48992654]: BUILD PASSED

@jantonguirao jantonguirao merged commit 32768d6 into NVIDIA:main Apr 21, 2026
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants