Intro
Hi!
I am a student working on embodied AI and robotics simulation. I use MuJoCo via LIBERO for large‑scale reinforcement learning in headless Docker containers with multiple NVIDIA GPUs.
My setup
- MuJoCo 3.4.0, Python, x86_64, Ubuntu 22.04 Docker (--gpus all)
- 7× RTX 3090, Driver 550.163.01, CUDA 12.4
What's happening? What did you expect?
When I restrict a process to a specific GPU with CUDA_VISIBLE_DEVICES and then create an
EGL context with MuJoCo (MUJOCO_GL=egl), the rendering load lands on a different physical
GPU. This happens for nearly every GPU index I tried, not just one (only index 1 maps to itself).
For example, with CUDA_VISIBLE_DEVICES=2 the rendering appears on GPU 0; with
CUDA_VISIBLE_DEVICES=3 it appears on GPU 6. The mapping is deterministic on a given
machine but differs across servers.
I expected CUDA_VISIBLE_DEVICES=2 to select physical GPU 2 unambiguously, as it does for CUDA computation. Instead, to render on physical GPU X, I have to set CUDA_VISIBLE_DEVICES to a different, "translated" value that ends up on the intended GPU after EGL's internal remapping. The mapping below was measured on a 7×RTX 3090 node:
Desired physical GPU → CUDA_VISIBLE_DEVICES value that actually reaches it
0 → 2
1 → 1
2 → 0
3 → 5
4 → 6
5 → 4
6 → 3
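The table above can be wrapped in a small lookup helper. This is only a sketch of the workaround, not a fix: the mapping is the one measured on this particular node and will differ on other servers, and the names `PHYSICAL_TO_TRANSLATED`, `translated_device` and `observed_device` are hypothetical.

```python
import os

# Mapping measured on one 7x RTX 3090 node (copied from the table above).
# Key:   physical GPU that should do the EGL rendering.
# Value: CUDA_VISIBLE_DEVICES value that actually lands on that GPU.
PHYSICAL_TO_TRANSLATED = {0: 2, 1: 1, 2: 0, 3: 5, 4: 6, 5: 4, 6: 3}


def translated_device(physical_gpu: int) -> str:
    """Return the CUDA_VISIBLE_DEVICES value that targets `physical_gpu`."""
    if physical_gpu not in PHYSICAL_TO_TRANSLATED:
        raise ValueError(f"no measured mapping for GPU {physical_gpu}")
    return str(PHYSICAL_TO_TRANSLATED[physical_gpu])


def observed_device(cuda_visible: int) -> int:
    """Inverse lookup: the physical GPU where rendering shows up."""
    inverse = {v: k for k, v in PHYSICAL_TO_TRANSLATED.items()}
    return inverse[cuda_visible]


if __name__ == "__main__":
    # Must be set before the EGL context is created.
    os.environ["CUDA_VISIBLE_DEVICES"] = translated_device(2)
    print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```

The table has to be re-measured on every server, which is why this only works as a stopgap.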
Without this translation table, there is no reliable way to target a specific GPU for EGL
rendering.
Example: inference workers use GPUs 0 and 1; environment workers are assigned to GPUs 2 and 3.
GPU memory usage reported by nvidia-smi:
OSMesa mode:
Only the inference workers appear, on GPUs 0 and 1.

EGL mode:
The environment workers incorrectly run on GPUs 0 and 6.

Steps for reproduction
- On a multi-GPU machine, run the script below with CUDA_VISIBLE_DEVICES=2.
- Monitor nvidia-smi in another terminal while the script sleeps.
- Observe that the GPU whose memory usage increases is not GPU 2 (on my machine it is GPU 0).
- Repeat with other values to build the translation table.
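The "monitor nvidia-smi" step can be automated by comparing two memory snapshots, one taken before the EGL context is created and one while it is alive. The sketch below assumes `nvidia-smi` is on the PATH and that no other job changes GPU memory between the two snapshots; the helper names are hypothetical.

```python
import subprocess

# Machine-readable query: one "index, memory.used" row per GPU, in MiB.
QUERY = ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"]


def parse_memory(csv_text: str) -> dict[int, int]:
    """Parse 'index, memory.used' CSV rows into {gpu_index: used_MiB}."""
    usage = {}
    for row in csv_text.strip().splitlines():
        index, used = (field.strip() for field in row.split(","))
        usage[int(index)] = int(used)
    return usage


def snapshot() -> dict[int, int]:
    """Run nvidia-smi once and return per-GPU memory usage in MiB."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory(result.stdout)


def gpu_with_largest_increase(before: dict[int, int],
                              after: dict[int, int]) -> int:
    """Return the GPU index whose used memory grew the most."""
    return max(after, key=lambda gpu: after[gpu] - before.get(gpu, 0))
```

Calling `snapshot()` before starting the reproduction script and again during its 30-second sleep, then passing both results to `gpu_with_largest_increase`, gives one row of the translation table per run.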
Minimal model for reproduction
The issue is in EGL initialization, so no MJCF model is needed.
Code required for reproduction
import os
import time

DEVICE_INDEX = 2  # EGL device index to test; change as needed

# Set the environment before importing MuJoCo's EGL bindings.
os.environ["MUJOCO_GL"] = "egl"
os.environ["MUJOCO_EGL_DEVICE_ID"] = str(DEVICE_INDEX)

from mujoco.egl import egl_ext as EGL
import OpenGL.EGL as EGL2

# Enumerate the EGL devices and open a display on the selected one.
devices = EGL.eglQueryDevicesEXT()
print(f"Devices: {len(devices)}")
d = devices[DEVICE_INDEX]
disp = EGL.eglGetPlatformDisplayEXT(EGL.EGL_PLATFORM_DEVICE_EXT, d, None)
EGL.eglInitialize(disp, None, None)
print("Vendor:", EGL2.eglQueryString(disp, EGL2.EGL_VENDOR))

# Request an 8-bit RGB pbuffer config with OpenGL support.
attrs = [EGL.EGL_SURFACE_TYPE, EGL.EGL_PBUFFER_BIT,
         EGL.EGL_RENDERABLE_TYPE, EGL.EGL_OPENGL_BIT,
         EGL.EGL_RED_SIZE, 8, EGL.EGL_GREEN_SIZE, 8, EGL.EGL_BLUE_SIZE, 8,
         EGL.EGL_NONE]
cfg = (EGL.EGLConfig * 1)()
n = EGL.c_int()
EGL.eglChooseConfig(disp, attrs, cfg, 1, n)

# Create a context and a 256x256 pbuffer surface, then make them current.
ctx = EGL.eglCreateContext(disp, cfg[0], EGL.EGL_NO_CONTEXT, None)
surf = EGL.eglCreatePbufferSurface(
    disp, cfg[0], [EGL.EGL_WIDTH, 256, EGL.EGL_HEIGHT, 256, EGL.EGL_NONE])
EGL.eglMakeCurrent(disp, surf, surf, ctx)

# Keep the context alive long enough to inspect nvidia-smi.
print(f"Context bound to device {DEVICE_INDEX}. Check nvidia-smi for 30s...")
time.sleep(30)
EGL.eglTerminate(disp)
print("Done.")
Confirmations