Update to Linux 6.9 drivers #361

Merged
dumbbell merged 791 commits into freebsd:master from dumbbell:update-to-linux-6.9
Aug 9, 2025

@dumbbell
Member

dumbbell commented Jun 23, 2025

This is the backport of the DRM drivers from Linux 6.9.

Progress:

Changes in Linux 6.9

You can read this Phoronix article to learn about the changes in the DRM drivers in Linux 6.9:
https://www.phoronix.com/news/Linux-6.9-DRM

Patches to linuxkpi

This update depends on the following patches to linuxkpi in FreeBSD.

These patches are maintained in the following repository and branch:
https://github.com/dumbbell/freebsd-src/tree/drm-related-linuxkpi-changes

Patches were submitted for review:

Firmware updates

There is no associated firmware update for now (to be checked).

How to test

You need to run a recent FreeBSD 15-CURRENT to test it.

Here are some instructions:

  1. You need to check out the FreeBSD src branch I mentioned, drm-related-linuxkpi-changes, and compile a kernel from that branch:

    git clone -b drm-related-linuxkpi-changes https://github.com/dumbbell/freebsd-src.git
    cd freebsd-src
    make -j8 buildkernel DEBUG_FLAGS=-g
    
    # This installs the kernel under another name, `kernel.drm`. Thus, you keep the default kernel
    # in case of trouble.
    sudo make installkernel DEBUG_FLAGS=-g INSTKERNNAME=kernel.drm
  2. You need to check out the branch referenced in this pull request and compile it:

    git clone -b update-to-linux-6.9 https://github.com/dumbbell/drm-kmod.git
    cd drm-kmod
    make -j8 DEBUG_FLAGS=-g SYSDIR=/path/to/freebsd-src-from-step1/sys
    sudo make install DEBUG_FLAGS=-g SYSDIR=/path/to/freebsd-src-from-step1/sys KMODDIR=/boot/kernel.drm
    
  3. Load the relevant driver(s) as you usually do.

@centromere

Hi @dumbbell, thank you for doing this work! I tested your changes with the latest upstream firmware, but I was unsuccessful in starting sway:

00:00:00.000 [INFO] [sway/main.c:321] Sway version 1.11-rc4
00:00:00.000 [INFO] [sway/main.c:322] wlroots version 0.19.0
00:00:00.003 [INFO] [sway/main.c:78] FreeBSD logos 15.0-CURRENT FreeBSD 15.0-CURRENT #0 n278189-b7d1c89bd2c6: Tue Jun 24 20:12:32 EDT 2025     root@logos:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
00:00:00.003 [INFO] [sway/main.c:94] Contents of /etc/os-release:
00:00:00.003 [INFO] [sway/main.c:78] NAME=FreeBSD
00:00:00.003 [INFO] [sway/main.c:78] VERSION="15.0-CURRENT"
00:00:00.003 [INFO] [sway/main.c:78] VERSION_ID="15.0"
00:00:00.003 [INFO] [sway/main.c:78] ID=freebsd
00:00:00.003 [INFO] [sway/main.c:78] ANSI_COLOR="0;31"
00:00:00.003 [INFO] [sway/main.c:78] PRETTY_NAME="FreeBSD 15.0-CURRENT"
00:00:00.003 [INFO] [sway/main.c:78] CPE_NAME="cpe:/o:freebsd:freebsd:15.0"
00:00:00.003 [INFO] [sway/main.c:78] HOME_URL="https://FreeBSD.org/"
00:00:00.003 [INFO] [sway/main.c:78] BUG_REPORT_URL="https://bugs.FreeBSD.org/"
00:00:00.003 [INFO] [sway/main.c:66] LD_LIBRARY_PATH=
00:00:00.003 [INFO] [sway/main.c:66] LD_PRELOAD=
00:00:00.003 [INFO] [sway/main.c:66] PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/home/alex/bin
00:00:00.003 [INFO] [sway/main.c:66] SWAYSOCK=
00:00:00.003 [INFO] [sway/main.c:351] Starting sway version 1.11-rc4
00:00:00.003 [DEBUG] [sway/server.c:236] Initializing Wayland server
00:00:00.003 [INFO] [wlr] [libseat] [libseat/libseat.c:77] Seat opened with backend 'seatd'
00:00:00.003 [INFO] [wlr] [libseat] [libseat/backend/seatd.c:212] Enabling seat
00:00:00.003 [INFO] [wlr] [backend/session/session.c:108] Successfully loaded libseat session
00:00:00.013 [INFO] [wlr] [backend/backend.c:248] Found 1 GPUs
00:00:00.014 [INFO] [wlr] [backend/drm/backend.c:225] Initializing DRM backend for /dev/dri/card0 (i915)
00:00:00.014 [DEBUG] [wlr] [backend/drm/drm.c:110] Using atomic DRM interface
00:00:00.014 [DEBUG] [wlr] [backend/drm/drm.c:132] ADDFB2 modifiers supported
00:00:00.014 [INFO] [wlr] [backend/drm/drm.c:310] Found 4 DRM CRTCs
00:00:00.014 [INFO] [wlr] [backend/drm/drm.c:268] Found 24 DRM planes
00:00:00.016 [INFO] [wlr] [render/egl.c:205] Supported EGL client extensions: EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query EGL_EXT_platform_base EGL_KHR_client_get_all_proc_addresses EGL_EXT_client_extensions EGL_KHR_debug EGL_EXT_platform_device EGL_EXT_explicit_device EGL_EXT_platform_wayland EGL_KHR_platform_wayland EGL_EXT_platform_x11 EGL_KHR_platform_x11 EGL_EXT_platform_xcb EGL_MESA_platform_gbm EGL_KHR_platform_gbm EGL_MESA_platform_surfaceless
00:00:00.016 [DEBUG] [wlr] [render/egl.c:523] Using EGL device /dev/dri/card0
MESA: warning: Could not get intel_device_info.
libEGL warning: egl: failed to create dri2 screen
00:00:00.075 [ERROR] [wlr] [EGL] command: eglInitialize, error: EGL_NOT_INITIALIZED (0x3001), message: "DRI2: failed to create screen"
MESA: warning: Could not get intel_device_info.
libEGL warning: egl: failed to create dri2 screen
00:00:00.129 [ERROR] [wlr] [EGL] command: eglInitialize, error: EGL_NOT_INITIALIZED (0x3001), message: "DRI2: failed to create screen"
MESA: warning: Could not get intel_device_info.
libEGL warning: egl: failed to create dri2 screen
00:00:00.186 [ERROR] [wlr] [EGL] command: eglInitialize, error: EGL_NOT_INITIALIZED (0x3001), message: "DRI2: failed to create screen"
00:00:00.186 [ERROR] [wlr] [EGL] command: eglInitialize, error: EGL_NOT_INITIALIZED (0x3001), message: "eglInitialize"
00:00:00.186 [ERROR] [wlr] [render/egl.c:268] Failed to initialize EGL
00:00:00.186 [ERROR] [wlr] [render/egl.c:609] Failed to initialize EGL context
00:00:00.186 [ERROR] [wlr] [render/gles2/renderer.c:499] Could not initialize EGL
00:00:00.186 [DEBUG] [wlr] [render/wlr_renderer.c:199] Failed to create a GLES2 renderer. Skipping!
00:00:00.186 [INFO] [wlr] [render/vulkan/renderer.c:2490] The vulkan renderer is only experimental and not expected to be ready for daily use
00:00:00.186 [INFO] [wlr] [render/vulkan/renderer.c:2492] Run with VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation to enable the validation layer
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_device_group_creation v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_display v23
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_external_fence_capabilities v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_external_memory_capabilities v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_external_semaphore_capabilities v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_get_display_properties2 v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_get_physical_device_properties2 v2
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_get_surface_capabilities2 v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_surface v25
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_surface_protected_capabilities v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_wayland_surface v6
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_xcb_surface v6
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_xlib_surface v6
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_EXT_acquire_drm_display v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_EXT_acquire_xlib_display v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_EXT_debug_report v10
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_EXT_debug_utils v2
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_EXT_direct_mode_display v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_EXT_display_surface_counter v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_EXT_headless_surface v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_EXT_surface_maintenance1 v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_EXT_swapchain_colorspace v4
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_KHR_portability_enumeration v1
00:00:00.265 [DEBUG] [wlr] [render/vulkan/vulkan.c:118] Vulkan instance extension VK_LUNARG_direct_driver_loading v1
MESA: warning: Could not get intel_device_info.
00:00:00.268 [INFO] [wlr] [render/vulkan/vulkan.c:253] Vulkan device: 'llvmpipe (LLVM 19.1.7, 256 bits)'
00:00:00.268 [INFO] [wlr] [render/vulkan/vulkan.c:254]   Device type: 'cpu'
00:00:00.268 [INFO] [wlr] [render/vulkan/vulkan.c:255]   Supported API version: 1.3.278
00:00:00.268 [INFO] [wlr] [render/vulkan/vulkan.c:256]   Driver version: 0.0.1
00:00:00.268 [INFO] [wlr] [render/vulkan/vulkan.c:344]   Driver name: llvmpipe (Mesa 24.1.7 (LLVM 19.1.7))
00:00:00.268 [DEBUG] [wlr] [render/vulkan/vulkan.c:352]   Ignoring physical device "llvmpipe (LLVM 19.1.7, 256 bits)": VK_EXT_physical_device_drm not supported
00:00:00.268 [ERROR] [wlr] [render/vulkan/renderer.c:2503] Could not match drm and vulkan device
00:00:00.268 [DEBUG] [wlr] [render/wlr_renderer.c:199] Failed to create a Vulkan renderer. Skipping!
00:00:00.268 [ERROR] [wlr] [render/wlr_renderer.c:279] Could not initialize renderer
00:00:00.268 [ERROR] [sway/server.c:255] Failed to create renderer

dmesg | grep drm:

[drm] Got Intel graphics stolen memory base 0x0, size 0x0
drmn0: <drmn> on vgapci0
vgapci0: child drmn0 requested pci_enable_io
vgapci0: child drmn0 requested pci_enable_io
drmn0: [drm] GT0: Incompatible option enable_guc=-1 - undocumented flag
drmn0: [drm] GT1: Incompatible option enable_guc=-1 - undocumented flag
drmn0: [drm] *ERROR* Unexpected child device config size 40 (expected 39 for VBT version 258)
drmn0: successfully loaded firmware image 'i915/mtl_dmc.bin'
drmn0: [drm] Finished loading DMC firmware i915/mtl_dmc.bin (v2.23)
lkpi_iic0: <LinuxKPI I2C> on drmn0
lkpi_iic1: <LinuxKPI I2C> on drmn0
lkpi_iic2: <LinuxKPI I2C> on drmn0
lkpi_iic3: <LinuxKPI I2C> on drmn0
lkpi_iic4: <LinuxKPI I2C> on drmn0
lkpi_iic5: <LinuxKPI I2C> on drmn0
lkpi_iic6: <LinuxKPI I2C> on drmn0
lkpi_iic7: <LinuxKPI I2C> on drmn0
lkpi_iic8: <LinuxKPI I2C> on drmn0
drmn0: successfully loaded firmware image 'i915/mtl_guc_70.bin'
drmn0: successfully loaded firmware image 'i915/mtl_guc_70.bin'
drmn0: successfully loaded firmware image 'i915/mtl_huc_gsc.bin'
drmn0: successfully loaded firmware image 'i915/mtl_gsc_1.bin'
drmn0: [drm] GT0: GuC firmware i915/mtl_guc_70.bin version 70.36.0
drmn0: [drm] GT0: GUC: submission enabled
drmn0: [drm] GT0: GUC: SLPC enabled
drmn0: [drm] GT0: GUC: RC enabled
drmn0: [drm:0xffffffff83f3e750s] CI tainted:0 by 0xffffffff83f3e750S
lkpi_iic9: <LinuxKPI I2C> on drm1
lkpi_iic10: <LinuxKPI I2C> on drm3
lkpi_iic11: <LinuxKPI I2C> on drm4
lkpi_iic12: <LinuxKPI I2C> on drm5
[drm] Initialized i915 1.6.0 20230929 for drmn0 on minor 0
VT: Replacing driver "efifb" with new "drmfb".
name=drmn0 id=i915drmfb flags=0x0 stride=7680
[drm] Got Intel graphics stolen memory base 0x0, size 0x0
drmn0: <drmn> on vgapci0
vgapci0: child drmn0 requested pci_enable_io
vgapci0: child drmn0 requested pci_enable_io
drmn0: [drm] GT0: Incompatible option enable_guc=-1 - undocumented flag
drmn0: [drm] GT1: Incompatible option enable_guc=-1 - undocumented flag
drmn0: [drm] *ERROR* Unexpected child device config size 40 (expected 39 for VBT version 258)
drmn0: successfully loaded firmware image 'i915/mtl_dmc.bin'
drmn0: [drm] Finished loading DMC firmware i915/mtl_dmc.bin (v2.23)
lkpi_iic0: <LinuxKPI I2C> on drmn0
lkpi_iic1: <LinuxKPI I2C> on drmn0
lkpi_iic2: <LinuxKPI I2C> on drmn0
lkpi_iic3: <LinuxKPI I2C> on drmn0
lkpi_iic4: <LinuxKPI I2C> on drmn0
lkpi_iic5: <LinuxKPI I2C> on drmn0
lkpi_iic6: <LinuxKPI I2C> on drmn0
lkpi_iic7: <LinuxKPI I2C> on drmn0
lkpi_iic8: <LinuxKPI I2C> on drmn0
drmn0: successfully loaded firmware image 'i915/mtl_guc_70.bin'
drmn0: successfully loaded firmware image 'i915/mtl_guc_70.bin'
drmn0: successfully loaded firmware image 'i915/mtl_huc_gsc.bin'
drmn0: successfully loaded firmware image 'i915/mtl_gsc_1.bin'
drmn0: [drm] GT0: GuC firmware i915/mtl_guc_70.bin version 70.44.1
drmn0: [drm] GT0: GUC: submission enabled
drmn0: [drm] GT0: GUC: SLPC enabled
drmn0: [drm] GT0: GUC: RC enabled
drmn0: [drm:0xffffffff83f3e750s] CI tainted:0 by 0xffffffff83f3e750S
lkpi_iic9: <LinuxKPI I2C> on drm1
lkpi_iic10: <LinuxKPI I2C> on drm3
lkpi_iic11: <LinuxKPI I2C> on drm4
lkpi_iic12: <LinuxKPI I2C> on drm5
[drm] Initialized i915 1.6.0 20230929 for drmn0 on minor 0
VT: Replacing driver "efifb" with new "drmfb".
name=drmn0 id=i915drmfb flags=0x0 stride=7680

Hardware info: https://bsd-hardware.info/?probe=fd3536cb97

@dumbbell
Member Author

Thank you @centromere for the test and feedback!

Could you please share the entire dmesg without filtering it? Some DRM-related messages do not contain "drm".

Could you also please run sway(1) inside truss(1) like this:

truss -fd -o truss.log sway -d -c /usr/local/etc/sway/config > sway.log 2>&1

... and share both truss.log and sway.log? This should give us a bit more detail about the ioctls sent to the driver.

@centromere

dmesg.log
sway.log
truss.log

@centromere

@dumbbell, is there anything else I can provide to be of assistance to you?

@dumbbell
Member Author

> @dumbbell, is there anything else I can provide to be of assistance to you?

I still haven’t had a chance to look deeply at the code to understand what it is trying to do here. Thank you for sharing the log files, by the way! They should be enough to investigate, though it’s unlikely I will be able to do so in the coming week.

@dumbbell
Member Author

dumbbell commented Jul 8, 2025

I just looked at the firmware: the Linux 6.9 drivers don’t use newer firmware files. However, I pushed a branch that updates the existing files. I don’t know whether this can fix some of the bugs people saw, though.

Here is the branch:
https://github.com/dumbbell/drm-kmod-firmware/pull/new/update-firmwares-to-20250708

@dumbbell
Member Author

dumbbell commented Jul 8, 2025

@centromere: Here is my current understanding:

Mesa performs a series of ioctl(2) calls to the device. You can find the breakdown below, based on the sources of Mesa 24.1.7 currently available in the Ports tree, and the truss.log file you shared:

14070: 0.623884751 ioctl(15,0xc0406400 { IORW 0x64('d'), 0, 64 },0xdad4947d800) = 0 (0x0) # DRM_IOCTL_VERSION, query length (mesa 24.1.7, src/intel/dev/intel_device_info.c:1684 -> libdrm)
14070: 0.623972419 ioctl(15,0xc0406400 { IORW 0x64('d'), 0, 64 },0xdad4947d800) = 0 (0x0) # DRM_IOCTL_VERSION, fetch data
14070: 0.624050465 ioctl(15,0xc0106479 { IORW 0x64('d'), 121, 16 },0x820d5a848) = 0 (0x0) # DRM_IOCTL_I915_QUERY=DRM_I915_QUERY_HWCONFIG_BLOB, query length (mesa 24.1.7, src/intel/dev/i915/intel_device_info.c:557)
14070: 0.624124719 ioctl(15,0xc0106479 { IORW 0x64('d'), 121, 16 },0x820d5a848) = 0 (0x0) # DRM_IOCTL_I915_QUERY=DRM_I915_QUERY_HWCONFIG_BLOB, fetch data
14070: 0.624757613 ioctl(15,0xc0106446 { IORW 0x64('d'), 70, 16 },0x820d5a8e0) = 0 (0x0)  # DRM_IOCTL_I915_GETPARAM=I915_PARAM_CS_TIMESTAMP_FREQUENCY (mesa, src/intel/dev/i915/intel_device_info.c:565)
14070: 0.624839915 ioctl(15,0xc0106446 { IORW 0x64('d'), 70, 16 },0x820d5a8e0) = 0 (0x0)  # DRM_IOCTL_I915_GETPARAM=I915_PARAM_REVISION (mesa, src/intel/dev/i915/intel_device_info.c:573)
14070: 0.624908270 ioctl(15,0xc0106479 { IORW 0x64('d'), 121, 16 },0x820d5a848) = 0 (0x0) # DRM_IOCTL_I915_QUERY=DRM_I915_QUERY_TOPOLOGY_INFO, query length (mesa, src/intel/dev/i915/intel_device_info.c:301)
14070: 0.624978644 ioctl(15,0xc0106479 { IORW 0x64('d'), 121, 16 },0x820d5a848) = 0 (0x0) # DRM_IOCTL_I915_QUERY=DRM_I915_QUERY_TOPOLOGY_INFO, fetch data
14070: 0.625047200 ioctl(15,0xc0106479 { IORW 0x64('d'), 121, 16 },0x820d5a848) = 0 (0x0) # DRM_IOCTL_I915_QUERY=DRM_I915_QUERY_GEOMETRY_SUBSLICES, query length (mesa, src/intel/dev/i915/intel_device_info.c:306) (or DRM_I915_QUERY_MEMORY_REGIONS? src/intel/dev/i915/intel_device_info.c:333)
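
The request codes in the truss lines above pack the direction, parameter length, group and command number into one 32-bit value. The following is a minimal sketch of that decoding, assuming the bit layout of FreeBSD's <sys/ioccom.h>. The `SKETCH_*` constants and the `sketch_*` names are illustrative and are not part of truss, libdrm or Mesa.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit layout assumed to match FreeBSD's <sys/ioccom.h>; the constants are
 * redefined locally so the sketch builds on any platform. */
#define SKETCH_IOCPARM_MASK 0x1fffu      /* 13-bit parameter length */
#define SKETCH_IOC_OUT      0x40000000u  /* kernel copies data out */
#define SKETCH_IOC_IN       0x80000000u  /* kernel copies data in */

static_assert((SKETCH_IOC_IN | SKETCH_IOC_OUT) == 0xc0000000u,
              "a read/write request sets both direction bits");

/* Hypothetical container for the decoded fields. */
struct sketch_ioctl_fields {
    int copies_in;      /* 1 if the IN direction bit is set */
    int copies_out;     /* 1 if the OUT direction bit is set */
    char group;         /* 'd' for DRM */
    unsigned int nr;    /* command number within the group */
    unsigned int len;   /* size of the argument structure in bytes */
};

struct sketch_ioctl_fields sketch_decode_ioctl(uint32_t request)
{
    struct sketch_ioctl_fields f;

    f.copies_in = (request & SKETCH_IOC_IN) != 0;
    f.copies_out = (request & SKETCH_IOC_OUT) != 0;
    f.len = (request >> 16) & SKETCH_IOCPARM_MASK;
    f.group = (char)((request >> 8) & 0xffu);
    f.nr = request & 0xffu;
    return f;
}

/* Print the fields in roughly the order truss shows them. */
void sketch_print_ioctl(uint32_t request)
{
    struct sketch_ioctl_fields f = sketch_decode_ioctl(request);

    printf("0x%08x: in=%d out=%d group='%c' nr=%u len=%u\n",
           (unsigned int)request, f.copies_in, f.copies_out,
           f.group, f.nr, f.len);
}
```

Decoding 0xc0106479 this way gives group 'd', command number 121 and a 16-byte argument, which agrees with the `{ IORW 0x64('d'), 121, 16 }` annotation printed by truss.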

I’m not sure about the last DRM_IOCTL_I915_QUERY because it depends on the result of the previous DRM_IOCTL_I915_QUERY. This ioctl(2) is normally called twice in a row: once to query the size of the data to fetch, then once to actually fetch the data after allocating a buffer of the queried size. That last DRM_IOCTL_I915_QUERY is called only once, not twice. The system call returned 0, so it succeeded. This makes me think that the allocation failed, because Mesa never fetched the data.

I suppose that the returned size is incorrect and triggered an error from calloc(3). I would like to know the params of that last DRM_IOCTL_I915_QUERY: the number of the parameter and the returned size.
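
The size-then-fetch pattern described above can be sketched with a mock in place of the kernel. This is not Mesa's code: `struct sketch_item`, the two mock handlers and `sketch_query_alloc` are hypothetical names, and the blob content is made up. It only illustrates why an unusable size from the first call means the second call never happens.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one query item; not the real i915 uAPI struct. */
struct sketch_item {
    int32_t length;  /* in: buffer size, 0 means "report the size"; out: size */
    void *data;      /* in: destination buffer for the fetch call */
};

typedef int (*sketch_query_fn)(struct sketch_item *item);

int sketch_calls;    /* counts how often the mock "kernel" was entered */

/* Mock that behaves well: reports the size first, then fills the buffer. */
int sketch_kernel_ok(struct sketch_item *item)
{
    static const char blob[] = "topology";

    sketch_calls++;
    if (item->length == 0) {
        item->length = (int32_t)sizeof(blob);
        return 0;
    }
    if (item->length < (int32_t)sizeof(blob) || item->data == NULL) {
        item->length = -1;
        return 0;
    }
    memcpy(item->data, blob, sizeof(blob));
    item->length = (int32_t)sizeof(blob);
    return 0;
}

/* Mock that succeeds as a call but reports an unusable size. */
int sketch_kernel_bad_size(struct sketch_item *item)
{
    sketch_calls++;
    item->length = -1;  /* arbitrary unusable value for the sketch */
    return 0;
}

/* Two-call pattern: ask for the size, allocate, then fetch the data. */
void *sketch_query_alloc(sketch_query_fn query, int32_t *out_len)
{
    struct sketch_item item = { .length = 0, .data = NULL };
    void *buf;

    assert(out_len != NULL);
    if (query(&item) != 0 || item.length <= 0) {
        *out_len = item.length;
        return NULL;            /* the fetch call is never issued */
    }
    buf = calloc(1, (size_t)item.length);
    if (buf == NULL) {
        *out_len = 0;
        return NULL;
    }
    item.data = buf;
    if (query(&item) != 0 || item.length <= 0) {
        *out_len = item.length;
        free(buf);
        return NULL;
    }
    *out_len = item.length;
    return buf;
}
```

With `sketch_kernel_ok` the helper enters the mock twice and returns a filled buffer. With `sketch_kernel_bad_size` it enters the mock once and returns NULL, which is the single-call shape seen at the end of the truss excerpt.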

Are you able to recompile Mesa from the Ports tree after applying a patch? Here is a diff that adds an fprintf() to help me understand what the kernel returned:

--- work/mesa-24.1.7/src/intel/common/i915/intel_gem.h.orig	2025-07-09 01:06:36.275143000 +0200
+++ work/mesa-24.1.7/src/intel/common/i915/intel_gem.h	2025-07-09 01:09:28.522240000 +0200
@@ -75,6 +75,7 @@
    };
 
    int ret = intel_ioctl(fd, DRM_IOCTL_I915_QUERY, &args);
+   fprintf(stderr, "DRM_IOCTL_I915_QUERY: query_id=%lu ret=%d length=%d\n", query_id, ret, item.length);
    if (ret != 0)
       return -errno;
    else if (item.length < 0)

You can apply it to graphics/mesa-dri in the Ports tree after compiling the port once to make sure the build succeeds without the patch. Then install the port. Note that I only compile-tested the patch; I didn’t run it. Don’t hesitate to ask if something is unclear!

@centromere

I have collected the requested information.

sway-stderr.log

@dumbbell
Member Author

dumbbell commented Jul 9, 2025

Thank you!

Indeed, the returned length looks like an error code (-22 = -EINVAL). In fact, this is a legit way of returning an error for this ioctl because you can perform several queries in a single system call. Mesa checks if the length is negative and uses it as an error code if it’s the case.
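
A minimal sketch of that convention follows, using a mock batch handler rather than the real driver. The struct, the function names and the item layout are hypothetical. Only the idea is taken from the comment above: the call as a whole returns 0 while a failed item carries a negative errno in its length field.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified query item; the real uAPI struct has more fields. */
struct sketch_query_item {
    uint64_t query_id;
    int32_t length;   /* out: data size, or a negative errno for this item */
};

/* Mock batch handler: the call itself always succeeds, and each item reports
 * its own status. Query id 6 is made to fail, as in the log discussed above;
 * the size 64 for other items is an arbitrary illustrative value. */
int sketch_batch_query(struct sketch_query_item *items, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (items[i].query_id == 6)
            items[i].length = -EINVAL;
        else
            items[i].length = 64;
    }
    return 0;
}

/* Caller-side check: a failed call is reported through errno, a failed item
 * through its negative length. Returns 0 or a negative errno. */
int sketch_item_status(int call_ret, const struct sketch_query_item *item)
{
    assert(item != NULL);
    if (call_ret != 0)
        return -errno;
    if (item->length < 0)
        return item->length;
    return 0;
}
```

Because each item has its own status, one bad query in a batch does not hide the results of the others.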

query_id=6 is DRM_I915_QUERY_GEOMETRY_SUBSLICES. Now, why does this query return -EINVAL?

In the i915 driver, this would be this code (drivers/gpu/drm/i915/i915_query.c:111):

	classinstance = *((struct i915_engine_class_instance *)&query_item->flags);

	engine = intel_engine_lookup_user(i915, (u8)classinstance.engine_class,
					  (u8)classinstance.engine_instance);

	if (!engine)
		return -EINVAL;

	if (engine->class != RENDER_CLASS)
		return -EINVAL;

So the lookup fails or returns an unexpected engine, based on the query flags passed by Mesa. I can’t dig deeper right now, but I will follow that lead.

@dumbbell
Member Author

dumbbell commented Jul 9, 2025

@centromere: Also, I forgot to ask, is it a regression compared to previous DRM drivers?

@centromere

> @centromere: Also, I forgot to ask, is it a regression compared to previous DRM drivers?

No. This is brand new hardware which has never successfully had graphics with FreeBSD. Graphics do work with Ubuntu 24.04, however.

@dumbbell
Member Author

dumbbell commented Jul 9, 2025

Do you know which version of the kernel Ubuntu 24.04 uses?

@centromere

6.8.0-63-generic, Mesa version 24.2.8.

@centromere

I've applied the following change:

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index c3a2d15c84..5b537887d5 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -112,12 +112,17 @@ static int query_geometry_subslices(struct drm_i915_private *i915,
 
 	engine = intel_engine_lookup_user(i915, (u8)classinstance.engine_class,
 					  (u8)classinstance.engine_instance);
+	i915_report_error(i915, "query_geometry_subslices: engine_class=%hhu, engine_instance=%hhu\n", (u8)classinstance.engine_class, (u8)classinstance.engine_instance);
 
-	if (!engine)
+	if (!engine) {
+		i915_report_error(i915, "!engine\n");
 		return -EINVAL;
+	}
 
-	if (engine->class != RENDER_CLASS)
+	if (engine->class != RENDER_CLASS) {
+		i915_report_error(i915, "engine->class = %hhu\n", engine->class);
 		return -EINVAL;
+	}
 
 	sseu = &engine->gt->info.sseu;
 

which resulted in the following dmesg output:

drmn0: query_geometry_subslices: engine_class=0, engine_instance=0
drmn0: Please file a bug on drm/i915; see https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html for details.drmn0: engine->class = 5
drmn0: query_geometry_subslices: engine_class=0, engine_instance=0
drmn0: engine->class = 5
drmn0: query_geometry_subslices: engine_class=0, engine_instance=0
drmn0: engine->class = 5
drmn0: query_geometry_subslices: engine_class=0, engine_instance=0
drmn0: engine->class = 5

@dumbbell
Member Author

dumbbell commented Jul 9, 2025

Thank you!

The class and instance being 0 is logical as Mesa sets the flags to 0.
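
To illustrate why zero flags decode to class 0 and instance 0: the driver code quoted earlier reinterprets the 32-bit flags word as a pair of 16-bit fields. The sketch below shows that reinterpretation with memcpy instead of a pointer cast. The struct is a local stand-in assumed to have the same two 16-bit members as `struct i915_engine_class_instance`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in, assumed to mirror the two 16-bit members of
 * struct i915_engine_class_instance. */
struct sketch_class_instance {
    uint16_t engine_class;
    uint16_t engine_instance;
};

static_assert(sizeof(struct sketch_class_instance) == sizeof(uint32_t),
              "the pair must occupy exactly the 32-bit flags word");

/* Reinterpret the flags word as a class/instance pair. */
struct sketch_class_instance sketch_decode_flags(uint32_t flags)
{
    struct sketch_class_instance ci;

    memcpy(&ci, &flags, sizeof(ci));
    return ci;
}

/* Inverse operation, as a caller would do to select a specific engine. */
uint32_t sketch_encode_flags(struct sketch_class_instance ci)
{
    uint32_t flags;

    memcpy(&flags, &ci, sizeof(flags));
    return flags;
}
```

An all-zero flags word therefore always selects class 0, instance 0, whatever the byte order of the machine.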

I looked at the diff between Mesa 24.1.7 and 24.2.8. I didn't spot anything relevant. Next, I will look at rbtree in linuxkpi and fixes committed to 6.8.x and 6.9.x in Linux.

@centromere

I've discovered that intel_engine_lookup_user uses uabi_class for its comparison, but query_geometry_subslices uses class. The intel_engine_cs struct is defined here.

intel_engine_lookup_user succeeds because uabi_class is 0. query_geometry_subslices fails because class is 5 (not RENDER_CLASS, which is 0).
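
The mismatch can be reproduced in a small model. This is only a sketch: the real intel_engine_lookup_user does not use a linear scan over an array, and `struct sketch_engine` with its `internal_class` member (standing in for `class`) is hypothetical. The values 0 and 5 come from the dmesg output above; the uabi class value 4 in the usage note is an arbitrary nonzero number.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RENDER_CLASS 0u  /* RENDER_CLASS is 0 according to the thread */

/* Hypothetical, heavily reduced model of an engine. */
struct sketch_engine {
    uint8_t internal_class;  /* stands in for engine->class */
    uint8_t uabi_class;      /* user-visible class */
    uint8_t uabi_instance;   /* user-visible instance */
};

/* Lookup keyed on the user-visible fields only. */
const struct sketch_engine *
sketch_lookup_user(const struct sketch_engine *engines, size_t n,
                   uint8_t uabi_class, uint8_t uabi_instance)
{
    assert(engines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (engines[i].uabi_class == uabi_class &&
            engines[i].uabi_instance == uabi_instance)
            return &engines[i];
    }
    return NULL;
}

/* Same two checks as the quoted query code: the lookup uses the uabi fields,
 * the validation uses the internal class. */
int sketch_query_geometry_subslices(const struct sketch_engine *engines,
                                    size_t n, uint8_t uabi_class,
                                    uint8_t uabi_instance)
{
    const struct sketch_engine *engine =
        sketch_lookup_user(engines, n, uabi_class, uabi_instance);

    if (engine == NULL)
        return -EINVAL;
    if (engine->internal_class != SKETCH_RENDER_CLASS)
        return -EINVAL;
    return 0;
}
```

With a single engine `{ 5, 0, 0 }`, a lookup for (0, 0) finds it but the class check rejects it, so the query returns -EINVAL. With `{ 0, 0, 0 }` and `{ 5, 4, 0 }`, the same lookup finds the render engine and the query succeeds.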

@dumbbell
Member Author

The driver populates debug information through lindebugfs. In particular, we can get details about the engines in lindebugfs/dri/1/i915_engine_info. Unfortunately, it doesn’t work currently. I will look into this. This should help us learn more about your problem.

@dumbbell dumbbell force-pushed the update-to-linux-6.9 branch from dee696a to 09683dc Compare July 10, 2025 23:06
@dumbbell
Member Author

I pushed two fixes to my freebsd-src branch, as well as a small patch to drm-kmod. Together, they fix debugfs support for me.

Could you please try to mount lindebugfs, then share the content of /path/to/lindebugfs/dri/0/i915_engine_info? Can you do the same from Ubuntu?

@centromere

Ubuntu 24.04

Linux logos 6.11.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun 26 14:16:59 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

i915_engine_info-ubuntu.txt

@centromere

When I attempted to cat the file on FreeBSD with your patches, I received a kernel panic:
IMG_20250710_205532_681

@dumbbell
Member Author

Sorry for the delay, @centromere, it’s been a couple busy weeks.

I fixed the implementation of hex_dump_to_buffer(). The original implementation didn’t honor some of the properties of the Linux implementation. For me, the output is now equivalent to the output on Linux, according to the i915_engine_info-ubuntu.txt file you shared.

I pushed everything to my freebsd-src branch. Could you please give it another try?

That said, the panic you reported seems unrelated because it happens further down the call stack. I noticed in your message that you are running Linux 6.11, not 6.8. So perhaps 6.10/6.11 contain fixes or improvements that we are missing?

@dumbbell dumbbell force-pushed the update-to-linux-6.9 branch from 09683dc to 11774df Compare July 26, 2025 16:49
@centromere

I have given it another try:

IMG_20250728_141226_212

@amshafer
Contributor

amshafer commented Aug 3, 2025

Tracked down the cause of the class/uabi_class lookup failure today. It happens because intel_gt_init fails, which prevents us from calling intel_engines_driver_register. Engine initialization is oddly split: uabi_class is not set until intel_engines_driver_register runs, so it is erroneously left at zero and the lookup no longer works as intended.

It looks like intel_gt_init fails because intel_gt_retire_requests_timeout times out and returns -ETIME. I'm not sure why this happens: the GuC seems to be initialized properly, and with this PR I don't see the GPU hang warnings I've seen on earlier versions. I think something wedges during initialization, and then when we go through gt->timelines to wait for things here, the fences don't get signaled.

@centromere

Thank you @amshafer. Is there anything I can do to be of service at this time?

@amshafer
Contributor

amshafer commented Aug 4, 2025

I think one good test would be confirming that the same Linux 6.9 version actually supports this chip. I think we have only confirmed 6.11 so far. Given that there's an issue with requests timing out, it could be a common issue that got fixed in 6.10 or 6.11.

@centromere

It works on Ubuntu 24.04, which runs kernel version 6.8.

Meenakshikumar Somasundaram and others added 20 commits August 9, 2025 16:31

[Why]
During DP tunnel creation, CM preallocates BW and reduces
estimated BW of other DPIA. CM release preallocation only
when allocation is complete. Display mode validation logic
validates timings based on bw available per host router.
In multi display setup, this causes bw allocation failure
when allocation greater than estimated bw.

[How]
Do zero alloc to make the CM to release preallocation and
update estimated BW correctly for all DPIAs per host router.

Reviewed-by: PeiChen Huang <peichen.huang@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move
on same heap. The basic problem here is that after the move the old
location is simply not available any more.

Some fixes were suggested, but essentially we should call the move
notification before actually moving things because only this way we have
the correct order for DMA-buf and VM move notifications as well.

Also rework the statistic handling so that we don't update the eviction
counter before the move.

v2: add missing NULL check

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org

Add VCO speed parameters in the bounding box array.

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

[Why && How]
Screen flickering saw on 4K@60 eDP with high refresh rate external
monitor when booting up in DC mode. DC Mode Capping is disabled
which caused wrong UCLK being used.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Leo Ma <hanghong.ma@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

[why]
preOS will not support display mode programming and link training
for UHBR rates.

[how]
If we detect a sink that's UHBR capable, disable seamless boot

Reviewed-by: Anthony Koo <anthony.koo@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Sung Joon Kim <sungjoon.kim@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

This patch adds a missed handling of PL domain doorbell while
handling VRAM faults.

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults")
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Panel replay was enabled by default in commit 5950efe25ee0
("drm/amd/display: Enable Panel Replay for static screen use case"), but
it isn't working properly at least on some BOE and AUO panels.  Instead
of being static the screen is solid black when active.  As it's a new
feature that was just introduced that regressed VRR disable it for now
so that problem can be properly root caused.

Cc: Tom Chung <chiahsuan.chung@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3344
Fixes: 5950efe25ee0 ("drm/amd/display: Enable Panel Replay for static screen use case")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

We missed setting the CCS mode during resume and engine resets.
Create a workaround to be added in the engine's workaround list.
This workaround sets the XEHP_CCS_MODE value at every reset.

The issue can be reproduced by running:

  $ clpeak --kernel-latency

Without resetting the CCS mode, we encounter a fence timeout:

  Fence expiration time out i915-0000:03:00.0:clpeak[2387]:2!

Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload")
Reported-by: Gnattu OC <gnattuoc@me.com>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10895
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: <stable@vger.kernel.org> # v6.2+
Tested-by: Gnattu OC <gnattuoc@me.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Tested-by: Krzysztof Gibala <krzysztof.gibala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240426000723.229296-1-andi.shyti@linux.intel.com
(cherry picked from commit 4cfca03f76413db115c3cc18f4370debb1b81b2b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Intel hardware is capable of programming the Maud/Naud SDPs on its
own based on real-time clocks. While doing so, it takes care
of any deviations from the theoretical values. Programming the registers
explicitly with static values can interfere with this logic. Therefore,
let the HW decide the Maud and Naud SDPs on it's own.

Cc: stable@vger.kernel.org # v5.17
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8097
Co-developed-by: Kai Vehmanen <kai.vehmanen@intel.com>
Signed-off-by: Kai Vehmanen <kai.vehmanen@intel.com>
Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240430091825.733499-1-chaitanya.kumar.borah@intel.com
(cherry picked from commit 8e056b50d92ae7f4d6895d1c97a69a2a953cf97b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Starting BDB version 239, hdr_dpcd_refresh_timeout is introduced to
backlight BDB data. Commit 700034566d68 ("drm/i915/bios: Define more BDB
contents") updated the backlight BDB data accordingly. This broke the
parsing of backlight BDB data in VBT for versions 236 - 238 (both
inclusive) and hence the backlight controls are not responding on units
with the concerned BDB version.

backlight_control information has been present in backlight BDB data
from at least BDB version 191 onwards, if not before. Hence this patch
extracts the backlight_control information for BDB version 191 or newer.
Tested on Chromebooks using Jasperlake SoC (reports bdb->version = 236).
Tested on Chromebooks using Raptorlake SoC (reports bdb->version = 251).
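
A minimal sketch of the version gating described above, not the actual i915 parser: each backlight field is enabled from its own minimum BDB version (191 and 239, as stated in the text). The struct and function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of which backlight fields a given BDB version carries. */
struct backlight_fields {
	bool has_backlight_control;        /* present from BDB version 191 onwards */
	bool has_hdr_dpcd_refresh_timeout; /* introduced in BDB version 239 */
};

/* Gate each field on its own minimum version, not on the newest field's. */
struct backlight_fields backlight_fields_for(uint16_t bdb_version)
{
	struct backlight_fields f = {
		.has_backlight_control = bdb_version >= 191,
		.has_hdr_dpcd_refresh_timeout = bdb_version >= 239,
	};
	return f;
}

/* Checks for the two BDB versions reported as tested above. */
void backlight_fields_example(void)
{
	assert(backlight_fields_for(236).has_backlight_control);
	assert(!backlight_fields_for(236).has_hdr_dpcd_refresh_timeout);
	assert(backlight_fields_for(251).has_backlight_control);
	assert(backlight_fields_for(251).has_hdr_dpcd_refresh_timeout);
}
```

With this gating, versions 236 to 238 still expose backlight_control even though they predate hdr_dpcd_refresh_timeout.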

v2: removed checking the block size of the backlight BDB data
    [vsyrjala: this is completely safe thanks to commit e163cfb4c96d
     ("drm/i915/bios: Make copies of VBT data blocks")]

Fixes: 700034566d68 ("drm/i915/bios: Define more BDB contents")
Cc: stable@vger.kernel.org
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Karthikeyan Ramasubramanian <kramasub@chromium.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240221180622.v2.1.I0690aa3e96a83a43b3fc33f50395d334b2981826@changeid
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
(cherry picked from commit c286f6a973c66c0d993ecab9f7162c790e7064c8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The debug print clearly lacks a \n at the end. Add it.

Fixes: 8f86c82aba8b ("drm/connector: demote connector force-probes for non-master clients")
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502153234.1.I2052f01c8d209d9ae9c300b87c6e4f60bd3cc99e@changeid
[Why]
Underflow occurs when running Netflix in a 4k144 eDP + 4k60 HDMI FRL
setup. It is caused by latency varying based on the DCFCLK/FCLK state.

[How]
Enable urgent latency adjustment and match the reference to existing
ASICs that also see increased latency at low FCLK.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Nicholas Susanto <nicholas.susanto@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
This fixes a bug introduced by commit c53655545141 ("drm/amd/display: dsc
mst re-compute pbn for changes on hub").
The change caused light-up issues with a second display that required
DSC on some MST docks.

[How]
Use Virtual DPCD for DSC caps in MST case.

[Limitations]
This change only affects MST DSC devices that follow specifications;
additional changes are required to check for old MST DSC devices such as
ones which do not check for Virtual DPCD registers.

Reviewed-by: Swapnil Patel <swapnil.patel@amd.com>
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Agustin Gutierrez <agustin.gutierrez@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
…ual eDP

[Why]
Idle optimizations are blocked if there's more than one eDP connector
on the board - blocking S0i3 and IPS2 for static screen.

[How]
Fix the checks to correctly detect the number of active eDP connectors.
Also restrict the eDP support to panels that have correct feature
support.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Some older MST hubs do not report DPCD registers according to
specification.

[How]
This change re-applies commit c53655545141 ("drm/amd/display: dsc mst
re-compute pbn for changes on hub"),
with an additional check for these older MST devices.

Reviewed-by: Swapnil Patel <swapnil.patel@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Agustin Gutierrez <agustin.gutierrez@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
….11 users

Limit the workaround introduced by commit 31729e8c21ec ("drm/amd/pm: fixes
a random hang in S4 for SMU v13.0.4/11") to only run in the S4 path.

Cc: Tim Huang <Tim.Huang@amd.com>
Fixes: 31729e8c21ec ("drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3351
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It incorrectly claimed a resource isn't CPU visible if it's located at
the very end of CPU visible VRAM.
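
A minimal sketch of this kind of boundary check, not the amdgpu code itself: it assumes a resource described by a start offset and a size, and a CPU-visible window that begins at offset 0. All names are hypothetical. One way such a bug can arise is a strict comparison that rejects a range ending exactly at the window limit; the sketch uses an inclusive one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if [start, start + size) lies entirely inside
 * the CPU-visible window [0, visible_size).
 */
bool range_is_cpu_visible(uint64_t start, uint64_t size, uint64_t visible_size)
{
	if (size > visible_size)
		return false;
	/*
	 * Inclusive comparison: a range ending exactly at visible_size is
	 * still visible. Written as a subtraction so that start + size
	 * cannot overflow.
	 */
	return start <= visible_size - size;
}

/* Example checks around the end of an assumed 256 MiB visible window. */
void range_is_cpu_visible_example(void)
{
	const uint64_t visible = 256u << 20;
	const uint64_t page = 4096;

	assert(range_is_cpu_visible(visible - page, page, visible));      /* last page */
	assert(!range_is_cpu_visible(visible - page + 1, page, visible)); /* crosses the end */
	assert(!range_is_cpu_visible(visible, page, visible));            /* fully outside */
}
```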

Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3343
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reported-and-Tested-by: Jeremy Day <jsday@noreason.ca>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
…reeBSD]

`hexdump()` was already redefined earlier in the source file.

Sponsored by:	The FreeBSD Foundation
It is implemented in FreeBSD as of ...

Sponsored by:	The FreeBSD Foundation
@dumbbell dumbbell force-pushed the update-to-linux-6.9 branch from b230476 to 2b36b05 Compare August 9, 2025 14:32
@dumbbell dumbbell merged commit 2b36b05 into freebsd:master Aug 9, 2025
@dumbbell dumbbell deleted the update-to-linux-6.9 branch August 9, 2025 14:41
@centromere

@dumbbell Where is the best place to continue troubleshooting? Should I open a new issue here on GitHub?

@dumbbell
Member Author

The problem doesn't seem to be related to any version update, so it might be easier to track it in a dedicated issue indeed.

@ltning

ltning commented Oct 3, 2025

I tried drm-latest-kmod on stable/15 as of last night, with .. interesting results. This is a framework 13", intel 13th gen afaik (not ultra). Works fine with 6.6.

@emaste
Member

emaste commented Oct 3, 2025

I tried drm-latest-kmod on stable/15 as of last night, with .. interesting results. This is a framework 13", intel 13th gen afaik (not ultra). Works fine with 6.6.

Try switching to a console vty and back and see if it goes away. There are reports of various sorts of corruption in the discussion in #332. On 6.10 I still have corrupted colours at startup sometimes, which are gone after a switch to console vty and back.

@ltning

ltning commented Oct 3, 2025

Ah, I should have mentioned - even the console vty is broken with only green kernel messages being visible, and even that only partially.

While at it, I tried to pkg upgrade my framework 16" (amd), since it's on pkgbase with snapshot releases. Currently at ~15alpha4, and with drm-66 or drm-latest built from ports it panics (and I have dumps to prove it :). Where should I drop those core.txt files and whatnot?

@dumbbell
Member Author

Ah, I should have mentioned - even the console vty is broken with only green kernel messages being visible, and even that only partially.

I have the same problem locally. I started to look at this issue.

Where should I drop those core.txt files and whatnot?

You can share here if that’s ok with you?
