[26.04_linux-nvidia] NVIDIA: SAUCE: ovl: keep err zero after successful ovl_cache_get()#425

Closed
nirmoy wants to merge 1 commit into NVIDIA:26.04_linux-nvidia from nirmoy:codex/nvbug-6144764-ovl-ptrerr-26.04

@nirmoy

@nirmoy nirmoy commented May 15, 2026

Collaborator

Summary

Fix NVBug 6144764 on 26.04_linux-nvidia by keeping err zero after a successful ovl_cache_get() in ovl_iterate_merged().

The installer crash is an overlayfs readdir failure that occurs while rsync reads through overlayfs during BaseOS/DGX OS installation. The faulty path is the same one reported by syzbot (a16fb0cce329a320661c): a valid, non-error cache pointer is passed to PTR_ERR(), which truncates the pointer bits into a bogus int that can later be returned as a non-errno value.
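
To make the truncation concrete, here is a minimal userspace sketch, assuming an LP64 target (64-bit pointers and long, 32-bit int). The three macros-turned-functions only mimic the kernel helpers of the same names; err_from_pointer(), is_valid_return(), show() and the sample address used below are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins modelled on the kernel helpers of the same names. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical helper: what lands in an int err after "err = PTR_ERR(ptr)". */
static inline int err_from_pointer(const void *ptr)
{
	return (int)PTR_ERR(ptr); /* keeps only the low 32 bits on LP64 */
}

/* Hypothetical helper: true for 0 or a negative errno, the values a caller expects. */
static inline int is_valid_return(int v)
{
	return v <= 0 && v >= -MAX_ERRNO;
}

/* Print how one pointer looks after passing through PTR_ERR() into an int. */
static inline void show(const char *label, const void *ptr)
{
	int err = err_from_pointer(ptr);

	printf("%-8s IS_ERR=%d err=%d valid_return=%d\n",
	       label, IS_ERR(ptr), err, is_valid_return(err));
}
```

Calling `show("error", ERR_PTR(-12))` reports a clean -12, whereas `show("valid", (void *)(uintptr_t)0xffff000012345678ULL)` reports a positive value that is neither zero nor an errno. Note also that wrapping such a value back up with ERR_PTR() yields a pointer that IS_ERR() does not recognise as an error.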

Sibling BOS PR: #423

Bug Links

Validation

  • Cherry-picked cleanly onto upstream/26.04_linux-nvidia.
  • git show --check --format=short HEAD: clean.
  • scripts/checkpatch.pl --strict --ignore COMMIT_LOG_USE_LINK,COMMIT_LOG_LONG_LINE --git HEAD: 0 errors, 0 warnings.
  • Earlier validation on arm64 virtme/KVM KASAN:
    • unpatched / Amir-only controls reproduced the overlayfs crash.
    • patched v2 completed 5/5 runs clean with OVL_SYZ_DONE rc=0 and no Oops/KASAN/panic markers.

@nirmoy nirmoy marked this pull request as draft May 15, 2026 16:42
@github-actions

github-actions Bot commented May 15, 2026

Contributor

PR Validation Report

PR Lint ✅ All checks passed

Details
Checking 1 commits...

Cherry-pick digest:
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ b6d5ce0fbbd5 │ ovl: keep err zero after successful ovl_cache_get()              │ match      │ found   │ ok, backporter: nirmoyd   │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘

Lint: all checks passed.

@nirmoy

nirmoy commented May 16, 2026

Collaborator Author

Boro review

Latest watcher review: open review

Head: b6d5ce0fbbd5

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.

@nvmochs

nvmochs commented May 18, 2026

Collaborator

@nirmoy Sounds like this should be in -next tomorrow and make next week's 7.1-rc. Our target build is Thursday, so we should be able to pick it from -next tomorrow (assuming it shows up).

@nvmochs

nvmochs commented May 20, 2026

Collaborator

@nirmoy Can you fix up the trailers in this commit?

@nirmoy nirmoy force-pushed the codex/nvbug-6144764-ovl-ptrerr-26.04 branch from f3e1344 to e7767e5 May 20, 2026 13:49
@nirmoy nirmoy marked this pull request as ready for review May 20, 2026 13:55
@nvmochs

nvmochs commented May 20, 2026

Collaborator

@nirmoy

All of these should be removed, no?

Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>

I would think the trailers would look like:

Fixes: https://github.com/NVIDIA/NV-Kernels/commit/d25e4b739f8378419f990983f2542160e79738c5 ("ovl: refactor ovl_iterate() and port to cred guard")
Reported-by: syzbot+a16fb0cce329a320661c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a16fb0cce329a320661c
Cc: stable@vger.kernel.org
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
(backported from https://lore.kernel.org/r/20260514144258.3068715-1-nirmoyd@nvidia.com)
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>

Then once we review, Brad will add the Acks and his sign-off when applying the patch.

BugLink: https://bugs.launchpad.net/bugs/2150640

ovl_iterate_merged() stores PTR_ERR(cache) in err before checking
IS_ERR(cache). On success err holds the truncated cache pointer and
can be returned as a bogus non-zero error.

The syzbot reproducer reaches this through overlay-on-overlay readdir:

  getdents64
    iterate_dir(outer overlay file)
      ovl_iterate_merged()
        ovl_cache_get()
          ovl_dir_read_merged()
            ovl_dir_read()
              iterate_dir(inner overlay file)
                ovl_iterate_merged()

Only compute PTR_ERR(cache) on the error path.

Fixes: d25e4b7 ("ovl: refactor ovl_iterate() and port to cred guard")
Reported-by: syzbot+a16fb0cce329a320661c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a16fb0cce329a320661c
Cc: stable@vger.kernel.org
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
(backported from https://lore.kernel.org/r/20260514144258.3068715-1-nirmoyd@nvidia.com)
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
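
The change described in the commit message above can be sketched in userspace as follows. This is not the overlayfs source: struct dir_cache, cache_get(), iterate_buggy() and iterate_fixed() are hypothetical stand-ins for the cache type, ovl_cache_get() and the before/after shape of ovl_iterate_merged(), and the pointer helpers only mimic their kernel namesakes. An LP64 target is assumed.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins modelled on the kernel helpers of the same names. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct dir_cache; /* opaque, hypothetical stand-in for the overlayfs dir cache */

/* Hypothetical stand-in for ovl_cache_get(): hands back a cache or an ERR_PTR(). */
static inline struct dir_cache *cache_get(struct dir_cache *backing, int fail)
{
	if (fail)
		return ERR_PTR(-ENOMEM);
	return backing;
}

/* Pre-fix shape: err is assigned before the IS_ERR() check, so it is stale on success. */
static inline int iterate_buggy(struct dir_cache *backing, int fail)
{
	struct dir_cache *cache = cache_get(backing, fail);
	int err = (int)PTR_ERR(cache);

	if (IS_ERR(cache))
		return err;
	/* ... entries would be walked here; nothing resets err ... */
	return err; /* truncated pointer bits, typically non-zero */
}

/* Post-fix shape: PTR_ERR() is evaluated only after IS_ERR() confirms an error. */
static inline int iterate_fixed(struct dir_cache *backing, int fail)
{
	struct dir_cache *cache = cache_get(backing, fail);
	int err = 0;

	if (IS_ERR(cache))
		return (int)PTR_ERR(cache);
	/* ... entries would be walked here ... */
	return err; /* stays 0 unless a later step fails */
}
```

Both variants return the same errno when cache_get() fails; they differ only on success, where the fixed variant returns 0 and the buggy one returns whatever the low bits of the cache address happen to be.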
@nirmoy nirmoy force-pushed the codex/nvbug-6144764-ovl-ptrerr-26.04 branch from e7767e5 to b6d5ce0 May 20, 2026 14:24
@nvmochs

nvmochs commented May 20, 2026

Collaborator

Thanks Nirmoy, no further issues from me!

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

@clsotog clsotog left a comment

Collaborator

Acked-by: Carol L Soto <csoto@nvidia.com>

@nvmochs

nvmochs commented May 20, 2026

Collaborator

Merged, closing PR.

7df64053ee60 (nresolute/main-next) NVIDIA: SAUCE: ovl: keep err zero after successful ovl_cache_get()

@nvmochs nvmochs closed this May 20, 2026