Skip to content

restorer: unmap vDSO target before ARCH_MAP_VDSO#3012

Open
cjolivier01 wants to merge 1 commit into
checkpoint-restore:criu-devfrom
cjolivier01:fix-vdso-map-occupied-address
Open

restorer: unmap vDSO target before ARCH_MAP_VDSO#3012
cjolivier01 wants to merge 1 commit into
checkpoint-restore:criu-devfrom
cjolivier01:fix-vdso-map-occupied-address

Conversation

@cjolivier01
Copy link
Copy Markdown

@cjolivier01 cjolivier01 commented Apr 28, 2026

Summary

When the restorer uses ARCH_MAP_VDSO_* instead of parking CRIU's own runtime vDSO, unmap the reserved runtime vDSO tail before asking the kernel to map a fresh vDSO there.

CRIU already reserves vdso_rt_size bytes at the end of the bootstrap mapping and records that address in vdso_rt_parked_at. The non-ARCH_MAP_VDSO path moves CRIU's current vDSO/VVAR pair into that reserved range with vdso_do_park(). The ARCH_MAP_VDSO path skips parking and asks the kernel to create a new vDSO/VVAR pair at vdso_rt_parked_at instead.

The problem is that the bootstrap tail is still mapped at that point. If the requested address is occupied, the kernel does not necessarily install the vDSO/VVAR pair at the requested address. On current x86 kernels the syscall can still succeed, but the mapping is placed elsewhere. CRIU then continues with map_vdso() recording vdso_rt_parked_at as the runtime vDSO location, even though the actual vDSO/VVAR pair is somewhere else.

That leaves vdso_maps_rt inconsistent with the restored address space and can break later vDSO handling, including gettimeofday setup and vDSO proxification decisions.

Root cause

unmap_old_vmas() keeps the bootstrap mapping intact and excludes the runtime vDSO tail from the range it unmaps. That is correct for the parking path, because vdso_do_park() needs a reserved destination range.

For the ARCH_MAP_VDSO path, however, the same reserved range has to be free before calling arch_map_vdso(). The kernel treats the supplied address as a requested mapping location; if it is already occupied, the call can succeed while choosing a different address. CRIU does not observe that actual address and instead assumes the requested address was used.

This patch frees the reserved tail immediately before map_vdso() when args->can_map_vdso is true.

Reproducer

This reproducer models the CRIU state at the failing point:

  1. reserve a mapping to stand in for the bootstrap tail at vdso_rt_parked_at
  2. remove the process's existing VDSO/VVAR mappings, matching the restorer after old VMAs are unmapped
  3. call arch_prctl(ARCH_MAP_VDSO_64, vdso_rt_parked_at) while that address is still occupied
  4. repeat with the address unmapped first

On an Ubuntu 22.04 x86_64 kernel (5.19.0-45-generic) the occupied case succeeds but maps VVAR/VDSO elsewhere:

case: requested address is still mapped before ARCH_MAP_VDSO_64
requested address: 0x7ffff7f99000
arch_prctl(ARCH_MAP_VDSO_64, 0x7ffff7f99000) returned 8192 errno 0
  7ffff7f99000-7ffff7fa9000 rw-p 00000000 00:00 0
  7ffff7fbd000-7ffff7fc1000 r--p 00000000 00:00 0                          [vvar]
  7ffff7fc1000-7ffff7fc3000 r-xp 00000000 00:00 0                          [vdso]
requested address used for vvar/vdso start: no
reproduced: syscall succeeded, but vvar/vdso was mapped elsewhere

case: requested address was unmapped before ARCH_MAP_VDSO_64
requested address: 0x7ffff7f99000
arch_prctl(ARCH_MAP_VDSO_64, 0x7ffff7f99000) returned 8192 errno 0
  7ffff7f99000-7ffff7f9d000 r--p 00000000 00:00 0                          [vvar]
  7ffff7f9d000-7ffff7f9f000 r-xp 00000000 00:00 0                          [vdso]
requested address used for vvar/vdso start: yes
control: unmapping the target first lets ARCH_MAP_VDSO_64 use it

Both cases behaved as expected.

A minimal standalone reproducer is:

#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef ARCH_MAP_VDSO_64
#define ARCH_MAP_VDSO_64 0x2003
#endif

static bool is_vdso_line(const char *line)
{
	return strstr(line, "[vdso]") || strstr(line, "[vvar]") || strstr(line, "[vvar_vclock]");
}

static void unmap_current_vdso(void)
{
	FILE *fp = fopen("/proc/self/maps", "r");
	unsigned long start[8], end[8];
	char line[512];
	int nr = 0;

	while (fgets(line, sizeof(line), fp)) {
		if (is_vdso_line(line) && sscanf(line, "%lx-%lx", &start[nr], &end[nr]) == 2)
			nr++;
	}
	fclose(fp);

	while (nr--)
		munmap((void *)start[nr], end[nr] - start[nr]);
}

static void dump_vdso_lines(unsigned long target)
{
	FILE *fp = fopen("/proc/self/maps", "r");
	char line[512];

	while (fgets(line, sizeof(line), fp)) {
		unsigned long start, end;

		if (sscanf(line, "%lx-%lx", &start, &end) != 2)
			continue;
		if (is_vdso_line(line) || (start <= target && target < end))
			printf("  %s", line);
	}
	fclose(fp);
}

static void run_case(bool leave_occupied)
{
	void *tail = mmap(NULL, 64 * 1024, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	unsigned long target = (unsigned long)tail;
	long ret;

	printf("\ncase: target %s before ARCH_MAP_VDSO_64\n",
	       leave_occupied ? "occupied" : "unmapped");
	printf("target: %#lx\n", target);

	if (!leave_occupied)
		munmap(tail, 64 * 1024);

	unmap_current_vdso();
	errno = 0;
	ret = syscall(SYS_arch_prctl, ARCH_MAP_VDSO_64, target);
	printf("arch_prctl returned %ld errno %d\n", ret, errno);
	dump_vdso_lines(target);
}

int main(void)
{
	if (!fork()) {
		run_case(true);
		exit(0);
	}
	wait(NULL);

	if (!fork()) {
		run_case(false);
		exit(0);
	}
	wait(NULL);

	return 0;
}

User-visible impact

The bug is only on the restore path where CRIU uses ARCH_MAP_VDSO_* instead of parking the runtime vDSO/VVAR mapping. It is architecture and kernel behavior dependent, but when it triggers CRIU's runtime vDSO bookkeeping no longer describes the actual mappings in the restored process.

The fix keeps the existing allocation model and only frees the already-reserved runtime VDSO tail at the point where it must become the destination for ARCH_MAP_VDSO_*.

Validation

  • git diff --check
  • compiled and ran the standalone reproducer above

@cjolivier01 cjolivier01 marked this pull request as ready for review April 28, 2026 20:19
Copilot AI review requested due to automatic review settings April 28, 2026 20:19
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an x86 restore-path issue where using ARCH_MAP_VDSO_* could successfully map a new vDSO/VVAR pair at an address different from CRIU’s requested vdso_rt_parked_at when the reserved bootstrap tail is still mapped there, leaving CRIU’s runtime vDSO bookkeeping inconsistent with the actual restored address space.

Changes:

  • Unmaps the reserved “runtime vDSO tail” region at vdso_rt_parked_at before calling map_vdso() when args->can_map_vdso is enabled.
  • Adds an explanatory comment describing why the tail must be freed for ARCH_MAP_VDSO_* to honor the requested address.

Comment thread criu/pie/restorer.c Outdated
Comment thread criu/pie/restorer.c Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a restore-path correctness issue in restorer when using ARCH_MAP_VDSO_*: the reserved “runtime vDSO tail” at vdso_rt_parked_at must be unmapped before requesting the kernel to map a new vDSO/VVAR there, otherwise the kernel may place it at a different address and CRIU’s bookkeeping becomes inconsistent.

Changes:

  • Unmap the reserved runtime vDSO parking range (vdso_rt_parked_at..+vdso_rt_size) immediately before calling map_vdso() when args->can_map_vdso is set.
  • Add error handling/logging to abort restore if the unmap fails.

Comment thread criu/pie/restorer.c Outdated
@avagin
Copy link
Copy Markdown
Member

avagin commented Apr 28, 2026

@cjolivier01 thank you for working on this. please write a detailed commit message.

@cjolivier01 cjolivier01 force-pushed the fix-vdso-map-occupied-address branch from 89a7df1 to 9b40bc6 Compare April 28, 2026 21:25
@cjolivier01
Copy link
Copy Markdown
Author

@cjolivier01 thank you for working on this. please write a detailed commit message.

done

@3idey
Copy link
Copy Markdown
Contributor

3idey commented Apr 28, 2026

@cjolivier01 the DCO check is failing, you need a Signed-off-by line in your commit.

@cjolivier01 cjolivier01 force-pushed the fix-vdso-map-occupied-address branch 3 times, most recently from e26141c to 5b57c35 Compare April 28, 2026 23:37
When the kernel supports ARCH_MAP_VDSO, the restorer asks the kernel to map a fresh runtime vDSO/VVAR pair at task_args->vdso_rt_parked_at. That address is also the tail of the bootstrap mapping reserved for the runtime vDSO when the restorer cannot use ARCH_MAP_VDSO.

The bootstrap unmap helper keeps this tail mapped so a parked runtime vDSO remains reachable. On the ARCH_MAP_VDSO path, leaving the tail mapped means the requested address is still occupied before arch_prctl(). Kernels that use MAP_FIXED_NOREPLACE for ARCH_MAP_VDSO fail with EEXIST there, and any alternate placement would leave CRIU's runtime vDSO bookkeeping pointing at the wrong address.

Unmap the reserved tail before calling map_vdso(). Check the sys_munmap() return value and abort restore with a clear error if the parking area cannot be released, instead of continuing into map_vdso() with inconsistent assumptions.

Validated with:

- make -j$(nproc) criu/criu

- sudo env PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python ./test/zdtm.py run -t zdtm/static/vdso00 -t zdtm/static/vdso01 -t zdtm/static/vdso02 -t zdtm/static/vdso-proxy --criu-bin ./criu/criu --pycriu-search-path ./lib --keep-going --ignore-taint -k failed

Signed-off-by: Chris Olivier <colivier@tesla.com>
@cjolivier01 cjolivier01 force-pushed the fix-vdso-map-occupied-address branch from 5b57c35 to 644f7b9 Compare April 28, 2026 23:38
@cjolivier01
Copy link
Copy Markdown
Author

@cjolivier01 the DCO check is failing, you need a Signed-off-by line in your commit.

fixed

@cjolivier01
Copy link
Copy Markdown
Author

Normally I don't resolve my own PR's comments, so unresolve if that's not the procedure. I figured since none were resolved that maybe I was supposed to do it.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a restore-time vDSO placement inconsistency when the restorer uses ARCH_MAP_VDSO_*: it ensures the reserved runtime vDSO “parking” tail of the bootstrap mapping is unmapped before requesting the kernel to map a fresh vDSO/VVAR pair at that address.

Changes:

  • Unmap the reserved runtime vDSO tail (vdso_rt_parked_at / vdso_rt_size) before calling map_vdso() when args->can_map_vdso is true.
  • Add an error path that aborts restore if the unmap of the reserved area fails, preventing map_vdso() from proceeding with an occupied target.

@avagin
Copy link
Copy Markdown
Member

avagin commented Apr 29, 2026

unmap_old_vmas() keeps the bootstrap mapping intact and excludes the runtime vDSO tail from the range it unmaps. That is correct for the parking path, because vdso_do_park() needs a reserved destination range.

The bootstrap region is allocated by restorer_get_vma_hint. restorer_get_vma_hint is looking for a unused region in the current address space. It means that we don't expect to see any unexpected mappings in the bootstrap area. @cjolivier01 could you give more details where this issue has been triggered. What environments was it, what criu verion was used, etc.

@avagin
Copy link
Copy Markdown
Member

avagin commented May 6, 2026

unmap_old_vmas() keeps the bootstrap mapping intact and excludes the runtime vDSO tail from the range it unmaps. That is correct for the parking path, because vdso_do_park() needs a reserved destination range.

The bootstrap region is allocated by restorer_get_vma_hint. restorer_get_vma_hint is looking for a unused region in the current address space. It means that we don't expect to see any unexpected mappings in the bootstrap area. @cjolivier01 could you give more details where this issue has been triggered. What environments was it, what criu verion was used, etc.

@cjolivier01 friendly ping. We need to understand what can be mapped there. I think you change can hide a deeper issue.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.25%. Comparing base (9a1453b) to head (644f7b9).
⚠️ Report is 1 commits behind head on criu-dev.

Additional details and impacted files
@@             Coverage Diff              @@
##           criu-dev    #3012      +/-   ##
============================================
+ Coverage     57.22%   57.25%   +0.02%     
============================================
  Files           154      154              
  Lines         40440    40440              
  Branches       8863     8863              
============================================
+ Hits          23142    23154      +12     
+ Misses        17034    17022      -12     
  Partials        264      264              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@cjolivier01
Copy link
Copy Markdown
Author

cjolivier01 commented May 6, 2026 via email

@avagin
Copy link
Copy Markdown
Member

avagin commented May 6, 2026

@cjolivier01 thank you. While I was reviewing your patch, I found and fixed one issue #3017. It might be related to your issue.

@avagin
Copy link
Copy Markdown
Member

avagin commented May 18, 2026

@cjolivier01 any update?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants