Adapt DALI to nvImageCodec 0.8.0#6293
Conversation
3bbd756 to
e3ab81b
Compare
There was a problem hiding this comment.
Pull request overview
Updates DALI’s nvImageCodec integration to work with nvImageCodec 0.8.0 by adjusting build-time dependency logic, updating the API call site, and improving decoder perf QA diagnostics (nsys capture on failure).
Changes:
- Bump nvImageCodec required version range to
>= 0.8.0, < 0.9.0and add a local-header fallback underthird_party/nvimgcodec/include/. - Update
nvimgcodecCodeStreamCreateFromHostMemusage to the new 0.8.x signature (additional nullable parameter). - Rework
qa/TL1_decoder_perf/test.shto re-run with Nsight Systems profiling on performance-check failure.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
cmake/Dependencies.common.cmake |
Updates required nvImageCodec version range and adds local-header fallback logic (before remote download). |
dali/operators/imgcodec/util/nvimagecodec_types.cc |
Adjusts DALI’s nvimgcodec host-mem codestream creation call to match the 0.8.x API. |
qa/TL1_decoder_perf/test.sh |
Adds nsys install step and failure-time profiling re-runs; expands perf matrix (incl. 32-stream runs). |
third_party/nvimgcodec/include/nvimgcodec.h |
Adds vendored nvImageCodec 0.8.0 header as a local fallback when system package isn’t found. |
third_party/nvimgcodec/include/nvimgcodec_version.h |
Adds vendored nvImageCodec version header for the local fallback headers. |
Comments suppressed due to low confidence (1)
cmake/Dependencies.common.cmake:330
- The fallback download still fetches nvImageCodec 0.7.0 headers (URL + SHA), but this PR bumps the required version range to 0.8.x and the code now calls the 0.8 API. If the system package isn’t found and the local headers are missing, the build will pull 0.7 headers and fail to compile (signature mismatch) or compile against the wrong API. Update the downloaded archive URL/HASH to a 0.8.x release (and adjust the extracted include path if needed).
URL https://developer.download.nvidia.com/compute/nvimgcodec/redist/nvimgcodec/linux-x86_64/nvimgcodec-linux-x86_64-0.8.0.22-archive.tar.xz
URL_HASH SHA512=2a400f75c619a10c3dbcd298a83ef3307f6e08453b2cfb5040f6b22c64c7be0ac4552a2a80ed057afe7657cf0bb8cc2d54cdccf8bc50ffdf34cfd05b45082978
)
FetchContent_Populate(nvimgcodec_headers)
set(nvimgcodec_INCLUDE_DIR "${nvimgcodec_headers_SOURCE_DIR}/${CUDA_VERSION_MAJOR}/include")
if (NOT EXISTS "${nvimgcodec_INCLUDE_DIR}/nvimgcodec.h")
message(FATAL_ERROR "nvimgcodec.h not found in ${nvimgcodec_INCLUDE_DIR} - something went wrong with the download")
endif()
endif()
message(STATUS "Using nvimgcodec_INCLUDE_DIR=${nvimgcodec_INCLUDE_DIR}")
include_directories(SYSTEM ${nvimgcodec_INCLUDE_DIR})
# Setting default installation path for dynamic loading
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| NSYS_REP="decoder_perf_legacy.nsys-rep" run_bench "${LOG1}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 | ||
| NSYS_REP="decoder_perf_nvimgcodec.nsys-rep" run_bench "${LOG2}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 --experimental_decoder | ||
| NSYS_REP="decoder_perf_ndd_legacy.nsys-rep" run_bench "${LOG1_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 | ||
| NSYS_REP="decoder_perf_ndd_nvimgcodec.nsys-rep" run_bench "${LOG2_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 --experimental_decoder |
There was a problem hiding this comment.
The failure path says it will “re-run all benchmarks with nsys profiling”, but it only profiles 4 runs (legacy/nvimgcodec and NDD variants). Since the pass/fail condition also depends on the new-thread-pool runs and the 32-stream runs, a failure in those cases won’t produce an nsys profile for the failing configuration. Consider profiling the same set of benchmarks that were executed in run_all_benchmarks (including DALI_USE_NEW_THREAD_POOL=1 and NVIMGCODEC_DEFAULT_NUM_CUDA_STREAMS=32 variants) so the collected traces correspond to the actual failing check.
| NSYS_REP="decoder_perf_legacy.nsys-rep" run_bench "${LOG1}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 | |
| NSYS_REP="decoder_perf_nvimgcodec.nsys-rep" run_bench "${LOG2}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 --experimental_decoder | |
| NSYS_REP="decoder_perf_ndd_legacy.nsys-rep" run_bench "${LOG1_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 | |
| NSYS_REP="decoder_perf_ndd_nvimgcodec.nsys-rep" run_bench "${LOG2_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 --experimental_decoder | |
| NSYS_REP="decoder_perf_legacy.nsys-rep" run_bench "${LOG1}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 | |
| NSYS_REP="decoder_perf_nvimgcodec.nsys-rep" run_bench "${LOG2}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 --experimental_decoder | |
| NSYS_REP="decoder_perf_ndd_legacy.nsys-rep" run_bench "${LOG1_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 | |
| NSYS_REP="decoder_perf_ndd_nvimgcodec.nsys-rep" run_bench "${LOG2_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 --experimental_decoder | |
| NSYS_REP="decoder_perf_legacy_new_tp.nsys-rep" DALI_USE_NEW_THREAD_POOL=1 run_bench "dali_legacy_new_tp.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 | |
| NSYS_REP="decoder_perf_nvimgcodec_new_tp.nsys-rep" DALI_USE_NEW_THREAD_POOL=1 run_bench "dali_nvimgcodec_new_tp.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 --experimental_decoder | |
| NSYS_REP="decoder_perf_ndd_legacy_new_tp.nsys-rep" DALI_USE_NEW_THREAD_POOL=1 run_bench "dali_ndd_legacy_new_tp.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 | |
| NSYS_REP="decoder_perf_ndd_nvimgcodec_new_tp.nsys-rep" DALI_USE_NEW_THREAD_POOL=1 run_bench "dali_ndd_nvimgcodec_new_tp.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 --experimental_decoder | |
| NSYS_REP="decoder_perf_legacy_32_streams.nsys-rep" NVIMGCODEC_DEFAULT_NUM_CUDA_STREAMS=32 run_bench "dali_legacy_32_streams.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 | |
| NSYS_REP="decoder_perf_nvimgcodec_32_streams.nsys-rep" NVIMGCODEC_DEFAULT_NUM_CUDA_STREAMS=32 run_bench "dali_nvimgcodec_32_streams.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 --experimental_decoder | |
| NSYS_REP="decoder_perf_ndd_legacy_32_streams.nsys-rep" NVIMGCODEC_DEFAULT_NUM_CUDA_STREAMS=32 run_bench "dali_ndd_legacy_32_streams.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 | |
| NSYS_REP="decoder_perf_ndd_nvimgcodec_32_streams.nsys-rep" NVIMGCODEC_DEFAULT_NUM_CUDA_STREAMS=32 run_bench "dali_ndd_nvimgcodec_32_streams.nsys.log" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 --experimental_decoder |
e3ab81b to
3a108de
Compare
Greptile SummaryThis PR adapts DALI to the nvImageCodec 0.8.0 API by bumping version ranges in cmake and conda configs, updating the Confidence Score: 5/5Safe to merge — all changes are mechanical version bumps and a one-line API adaptation with a well-understood The only finding is a P2 style note about missing No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Build with BUILD_NVIMAGECODEC] --> B{WITH_DYNAMIC_NVIMGCODEC?}
B -- Yes --> C[find_package nvimgcodec 0.8.0...<0.9.0]
C -- Found --> E[Use system headers]
C -- Not Found --> D[FetchContent: download 0.8.0.22 tarball]
D --> E
B -- No --> F[ExternalProject: clone v0.8.0 from GitHub]
E --> G[nvimgcodecCodeStreamCreateFromHostMem\ninstance, handle, data, length, nullptr]
F --> G
Reviews (4): Last reviewed commit: "Update decoder performance test script" | Re-trigger Greptile |
| apt update && apt install -y --no-install-recommends gnupg | ||
| echo "deb http://developer.download.nvidia.com/devtools/repos/ubuntu$(source /etc/lsb-release && echo "$DISTRIB_RELEASE" | tr -d .)/$(dpkg --print-architecture) /" \ | ||
| | tee /etc/apt/sources.list.d/nvidia-devtools.list | ||
| apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub |
There was a problem hiding this comment.
Insecure HTTP key fetch in
do_once
The GPG key for the apt repository is fetched over plain HTTP (http://), making it vulnerable to MITM substitution — an attacker on-path could serve a rogue key, then sign and inject malicious packages (e.g., nsight-systems-cli) that get executed during CI. Use HTTPS instead.
| apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | |
| curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | apt-key add - |
| echo "deb http://developer.download.nvidia.com/devtools/repos/ubuntu$(source /etc/lsb-release && echo "$DISTRIB_RELEASE" | tr -d .)/$(dpkg --print-architecture) /" \ | ||
| | tee /etc/apt/sources.list.d/nvidia-devtools.list | ||
| apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | ||
| apt update && apt install -y nsight-systems-cli |
There was a problem hiding this comment.
Hardcoded OS/arch in GPG key URL
The devtools apt repo URL is dynamically constructed from lsb_release and dpkg --print-architecture, but the GPG key URL is hardcoded to ubuntu1804/x86_64. On Ubuntu 20.04+ or ARM runners the key URL and the repo URL will reference different distributions, and apt-key is deprecated on Ubuntu 22.04+. Consider aligning the key source to the detected distro or switching to gpg --dearmor.
| NSYS_REP="decoder_perf_legacy.nsys-rep" run_bench "${LOG1}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 | ||
| NSYS_REP="decoder_perf_nvimgcodec.nsys-rep" run_bench "${LOG2}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p rn50 --experimental_decoder | ||
| NSYS_REP="decoder_perf_ndd_legacy.nsys-rep" run_bench "${LOG1_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 | ||
| NSYS_REP="decoder_perf_ndd_nvimgcodec.nsys-rep" run_bench "${LOG2_NDD}" ${TASKSET} python hw_decoder_bench.py ${BENCH_ARGS} -p ndd_rn50 --experimental_decoder |
There was a problem hiding this comment.
nsys re-run omits thread-pool and 32-stream benchmarks
On a failure, the nsys profiling pass re-runs only the four baseline benchmarks (LOG1, LOG2, LOG1_NDD, LOG2_NDD), but the success gate also checks PERF_RESULT1_TP, PERF_RESULT2_TP, PERF_RESULT2_32STREAMS, and PERF_RESULT2_NDD_32STREAMS. If a failure is caused by one of those omitted cases, no nsys profile will be captured for it, defeating the diagnostic purpose.
|
CI MESSAGE: [48491354]: BUILD STARTED |
|
CI MESSAGE: [48499746]: BUILD STARTED |
|
CI MESSAGE: [48499746]: BUILD FAILED |
7619784 to
3298041
Compare
|
CI MESSAGE: [48519658]: BUILD STARTED |
|
CI MESSAGE: [48519658]: BUILD PASSED |
|
CI MESSAGE: [48491354]: BUILD FAILED |
|
CI MESSAGE: [48791256]: BUILD STARTED |
|
CI MESSAGE: [48791256]: BUILD PASSED |
- Bump required nvImageCodec version range from 0.7.x to 0.8.x - Update nvimgcodec download URL and hash to 0.8.0.22 - Update nvimgcodecCodeStreamCreateFromHostMem call to match the new 0.8.0 API signature (added nullable parameter) - Rework qa/TL1_decoder_perf/test.sh to collect nsys profiles on failure for easier debugging
…2-stream benchmarks PERF_RESULT1_TP and PERF_RESULT2_TP are now informational only since the new thread pool is still experimental. The 32-stream benchmark variants are removed as they are no longer needed.
72b3c22 to
a76e547
Compare
|
CI MESSAGE: [48992654]: BUILD STARTED |
|
CI MESSAGE: [48992654]: BUILD PASSED |
Category:
New feature (non-breaking change which adds functionality)
Description:
Adapts DALI to the nvImageCodec 0.8.0 API. The new version introduces breaking API changes that
require updates to DALI's integration layer.
Key changes:
third_party/nvimgcodec/include/as a local fallbackwhen the system package is not found, before attempting the remote download
nvimgcodecCodeStreamCreateFromHostMemcall to match the new API signatureqa/TL1_decoder_perf/test.shto collect nsys profiles on failure for easier debuggingAdditional information:
Affected modules and functionalities:
cmake/Dependencies.common.cmake— version bump and local header fallback logicdali/operators/imgcodec/util/nvimagecodec_types.cc— API call signature updatethird_party/nvimgcodec/include/— new local copy of 0.8.0 headersqa/TL1_decoder_perf/test.sh— improved failure diagnostics with nsys profilesKey points relevant for the review:
nvimgcodecCodeStreamCreateFromHostMemAPI gained an additional nullable parameter; we passnullptrto preserve existing behavior.Tests:
qa/TL1_decoder_perf/test.sh— decoder performance tests (also reworked in this PR)Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A