tpci_kernel: Fix Kernel Panic in test_assign_resources caused by BAR address shifts #1303

@xiaotaonotrouble

1. Environment:

Architecture: aarch64
Firmware: ACPI / UEFI
Kernel Version: 5.10 and 6.6 (Reproduced on both)
Hardware: Huawei Kunpeng Server with HiSilicon Hi1822 Ethernet Controller (or other devices with large SR-IOV VF BARs)
LTP Version: version 4.1.0 d447424

2. Problem Description

When running the test_assign_resources function in the ltp_tpci module, the test iterates over device BARs, sequentially calling pci_release_resource() followed by pci_assign_resource().

On certain enterprise platforms (especially ARM64 ACPI systems with complex firmware allocations), this operation causes the BAR's physical MMIO address to be reassigned to a completely different address. This breaks the synchronization between the hardware state and the kernel software state. The immediate consequence is severe device malfunction, and a kernel panic is possible.

3. Root Cause Analysis & Severe Impact

Through debugging the kernel source and dmesg logs, we identified the exact sequence of events leading to this crash:

1) Firmware Inheritance:

During early boot, the ACPI subsystem natively parses the UEFI/BIOS allocations. The BIOS may not ... The kernel successfully claims these resources via pci_bus_claim_resources().

2) Topology Destruction:

The LTP test forcefully calls pci_release_resource(), which clears the resource's parent pointer and detaches it from the iomem tree. (Note that the driver's resource node is still attached to the BAR resource node as a child, and its state is untouched.)

3) First-Fit Address Shift:

The test then immediately calls pci_assign_resource(). The kernel's allocator (find_resource()) uses a first-fit strategy: it searches for a free slot starting from the base of the bridge window. If the BIOS allocation left a large enough hole at a lower address, the allocator picks that hole, so the BAR ends up at a physical address different from its original BIOS allocation.

4) Device Malfunction and Kernel Panic:

Malfunction: The hardware MMIO address shifts, but the loaded endpoint driver's ioremap mappings still point to the old address. The device driver now reads and writes through a stale mapping, rendering the device completely unresponsive.

Dangling Pointer Panic: Under this broken state, driver unbinding/unloading fails to cleanly release the resource. Crucially, many Linux drivers assign a static string literal from their module memory (e.g., .rodata) to resource->name rather than duplicating the string. When the driver module is unloaded, the memory backing that string is freed. The stranded resource node in the iomem tree now holds a dangling pointer. Any subsequent cat /proc/iomem triggers an invalid memory access and an immediate Kernel Panic.
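
The address shift in step 3) can be reproduced with a small user-space model. This is only a sketch of a first-fit search, not the kernel's find_resource() implementation; struct range, align_up() and first_fit() are hypothetical names, and the window end is treated as exclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one range already allocated inside a bridge window. */
struct range {
    uint64_t start;
    uint64_t size;
};

/* Round addr up to the next multiple of align (align is a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * First-fit search: walk the window from its base and return the lowest
 * aligned address where `size` bytes fit without overlapping any entry of
 * `used` (sorted by start, all inside the window). Returns UINT64_MAX if
 * no hole is large enough.
 */
static uint64_t first_fit(uint64_t win_start, uint64_t win_end,
                          const struct range *used, size_t n,
                          uint64_t size, uint64_t align)
{
    uint64_t cand;
    size_t i;

    assert(align != 0 && (align & (align - 1)) == 0);

    cand = align_up(win_start, align);
    for (i = 0; i < n; i++) {
        if (cand + size <= used[i].start)
            return cand;    /* the hole before used[i] is big enough */
        if (used[i].start + used[i].size > cand)
            cand = align_up(used[i].start + used[i].size, align);
    }
    return (cand + size <= win_end) ? cand : UINT64_MAX;
}
```

As an illustration with made-up numbers: take a window of [0x80000000, 0x90000000), a neighbouring 1 MiB allocation at 0x80000000, and a 1 MiB BAR that firmware originally placed at 0x88000000. Once that BAR is released, only the neighbour remains in `used`, and first_fit() returns 0x80100000 for the same request, which is not the firmware address.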

4. Proposed Solution

pci_release_resource() and pci_assign_resource() are exported symbols, so a module can call them, but doing so on an endpoint device that has a driver bound violates the Linux device model.
Before modifying the underlying physical memory resources of an endpoint device, the caller must guarantee that no driver is actively managing it. The proposed solution is to explicitly call device_release_driver() to unbind the driver before releasing the resource, and then call device_attach() to rebind it after the new resource is successfully assigned. (Bridge devices are not handled for now.)
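
Why the ordering matters can be shown with a toy user-space model. Everything here is hypothetical (struct model_dev and the model_* helpers are not kernel APIs); it only assumes that a driver records the BAR address when it binds, in the same way a real driver keeps its ioremap mapping, and that binding always succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an endpoint device and its bound driver. */
struct model_dev {
    uint64_t bar_addr;     /* address currently programmed into the BAR */
    bool bound;            /* true while a driver is bound */
    uint64_t mapped_addr;  /* address the driver mapped when it bound */
};

/* Models device_attach(): the driver maps whatever the BAR holds now. */
static void model_bind(struct model_dev *d)
{
    assert(!d->bound);
    d->bound = true;
    d->mapped_addr = d->bar_addr;
}

/* Models device_release_driver(): the driver drops its mapping. */
static void model_unbind(struct model_dev *d)
{
    d->bound = false;
    d->mapped_addr = 0;
}

/* Current test behaviour: the BAR moves underneath a bound driver. */
static void reassign_unsafe(struct model_dev *d, uint64_t new_addr)
{
    d->bar_addr = new_addr;
}

/* Proposed ordering: unbind, move the BAR, then attach again. */
static void reassign_safe(struct model_dev *d, uint64_t new_addr)
{
    if (d->bound)
        model_unbind(d);
    d->bar_addr = new_addr;
    model_bind(d);
}

/* True when a bound driver still uses an address the BAR no longer has. */
static bool driver_view_is_stale(const struct model_dev *d)
{
    return d->bound && d->mapped_addr != d->bar_addr;
}
```

In this model, binding at 0x88000000 and then calling reassign_unsafe() with 0x80100000 leaves driver_view_is_stale() true, while reassign_safe() leaves it false because the driver maps the BAR again after the move.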

Proposed Patch:

diff --git a/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c b/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c
index 660b3a423..4612f3512 100644
--- a/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c
+++ b/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c
@@ -442,9 +442,26 @@ static int test_assign_resources(void)
 
         if (r->flags & IORESOURCE_MEM &&
             r->flags & IORESOURCE_PREFETCH) {
+           
+            if (dev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
+                if (dev->dev.driver) {
+                    device_release_driver(&dev->dev);
+                }
+            }
+
             pci_release_resource(dev, i);
             ret = pci_assign_resource(dev, i);
             prk_info("assign resource to '%d', ret '%d'", i, ret);
+
+            if (ret == 0) {
+                if (dev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
+                    int attach_ret = device_attach(&dev->dev);
+                    if (attach_ret < 0) {
+                        prk_info("device_attach failed for endpoint, ret: %d", attach_ret);
+                    }
+                }
+            }
+
             rc |= (ret < 0 && ret != -EBUSY) ? TFAIL : TPASS;
         }
     }
-- 
2.43.0

Note: if this patch is applied, some devices (such as the main NIC) may need to be excluded so that the test can run successfully; otherwise the SSH connection will be disconnected.

I also have another patch that takes bridge devices into consideration:

diff --git a/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c b/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c
index 660b3a423..362fc9758 100644
--- a/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c
+++ b/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c
@@ -442,9 +442,47 @@ static int test_assign_resources(void)
 
 		if (r->flags & IORESOURCE_MEM &&
 			r->flags & IORESOURCE_PREFETCH) {
+			
+			if (dev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
+                if (dev->dev.driver) {
+                    device_release_driver(&dev->dev);
+                }
+            } else if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+                if (i < PCI_BRIDGE_RESOURCES) {
+                    if (dev->dev.driver) {
+                        device_release_driver(&dev->dev);
+                    }
+                } else {
+                    struct pci_bus *bus = dev->subordinate;
+                    struct pci_dev *child, *tmp;
+
+                    if (bus) {
+                        list_for_each_entry_safe(child, tmp, &bus->devices, bus_list) {
+                            pci_stop_and_remove_bus_device(child);
+                        }
+                    }
+                }
+            }
+
 			pci_release_resource(dev, i);
 			ret = pci_assign_resource(dev, i);
 			prk_info("assign resource to '%d', ret '%d'", i, ret);
+
+			if (ret == 0) {
+				if (dev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
+                    device_attach(&dev->dev);
+                } else if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+                    if (i < PCI_BRIDGE_RESOURCES) {
+                        device_attach(&dev->dev);
+                    } else {
+                        struct pci_bus *bus = dev->subordinate;
+                        if (bus) {
+                            pci_rescan_bus(bus);
+                        }
+                    }
+                }
+			}
+
 			rc |= (ret < 0 && ret != -EBUSY) ? TFAIL : TPASS;
 		}
 	}
-- 
2.43.0

The logic is as follows:
for bridge devices, if the resource is one of the bridge's own device BARs, unbind and rebind the driver as for a normal endpoint device;
if the resource is a bridge memory window, recursively remove all downstream devices and, after pci_assign_resource(), call pci_rescan_bus() to add them back.

This operation is heavy, and I'm not sure it is appropriate for a test suite. So I prefer the version that only considers endpoint devices.
