[Deepin-Kernel-SIG] [linux 6.6-y] [Upstream] [Intel] Intel: Backport to support QuickAssist Technology (QAT) live migration for in-tree driver #837
ANBZ: #9185 commit 909f4ab upstream. Intel-SIG: commit 909f4ab iommu: Add new iommu op to create domains owned by userspace Backport to support Intel QAT live migration for in-tree driver Introduce a new iommu_domain op to create domains owned by userspace, e.g. through IOMMUFD. These domains have a few different properties compared to kernel-owned domains: - They may be PAGING domains, but created with special parameters. For instance aperture size changes/number of levels, different IOPTE formats, or other things necessary to make a vIOMMU work - We have to track all the memory allocations with GFP_KERNEL_ACCOUNT to make the cgroup sandbox stronger - Device-specialty domains, such as NESTED domains, can be created by IOMMUFD. The new op clearly says the domain is being created by IOMMUFD, that the domain is intended for userspace use, and it provides a way to pass user flags or a driver specific uAPI structure to customize the created domain to exactly what the vIOMMU userspace driver requires. iommu drivers that cannot support VFIO/IOMMUFD should not support this op. This includes any driver that cannot provide a fully functional PAGING domain. This new op for now is only supposed to be used by IOMMUFD, hence no wrapper for it. IOMMUFD would call the callback directly. As for domain free, IOMMUFD would use iommu_domain_free(). Link: https://lore.kernel.org/r/20230928071528.26258-2-yi.l.liu@intel.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 7975b72 upstream. Intel-SIG: commit 7975b72 iommufd: Use the domain_alloc_user() op for domain allocation Backport to support Intel QAT live migration for in-tree driver Make IOMMUFD use iommu_domain_alloc_user() by default for iommu_domain creation. IOMMUFD needs to support iommu_domain allocation with parameters from userspace in nested support, and a driver is expected to implement everything under this op. If the iommu driver doesn't provide domain_alloc_user callback then IOMMUFD falls back to use iommu_domain_alloc() with an UNMANAGED type if possible. Link: https://lore.kernel.org/r/20230928071528.26258-3-yi.l.liu@intel.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
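The fallback described in commit 7975b72 can be pictured with a small userspace sketch. Everything here is a hypothetical stand-in (`mock_iommu_ops`, `mock_domain`, `hwpt_alloc_domain` are not kernel names), and rejecting user flags on the fallback path is an assumption of this sketch rather than something the commit log states.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
enum alloc_path { PATH_NONE, PATH_ALLOC_USER, PATH_FALLBACK_UNMANAGED };

struct mock_domain {
    enum alloc_path path;
    unsigned int flags;
};

struct mock_iommu_ops {
    /* NULL when the driver does not provide a domain_alloc_user callback. */
    int (*domain_alloc_user)(struct mock_domain *out, unsigned int flags);
    /* Stands in for iommu_domain_alloc() with an UNMANAGED type. */
    int (*domain_alloc_unmanaged)(struct mock_domain *out);
};

int mock_alloc_user(struct mock_domain *out, unsigned int flags)
{
    out->path = PATH_ALLOC_USER;
    out->flags = flags;
    return 0;
}

int mock_alloc_unmanaged(struct mock_domain *out)
{
    out->path = PATH_FALLBACK_UNMANAGED;
    out->flags = 0;
    return 0;
}

/* Prefer the userspace-aware op; otherwise fall back to the legacy allocation. */
int hwpt_alloc_domain(const struct mock_iommu_ops *ops, unsigned int flags,
                      struct mock_domain *out)
{
    if (ops->domain_alloc_user)
        return ops->domain_alloc_user(out, flags);

    /* Assumption of this sketch: the legacy path cannot honour user flags. */
    if (flags)
        return -EOPNOTSUPP;

    return ops->domain_alloc_unmanaged(out);
}
```

A driver table with a NULL `domain_alloc_user` takes the second branch, which is the only case where the legacy allocation is reached in this sketch.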
ANBZ: #9185 commit 89d6387 upstream. Intel-SIG: commit 89d6387 iommufd: Flow user flags for domain allocation to domain_alloc_user() Backport to support Intel QAT live migration for in-tree driver Extends iommufd_hw_pagetable_alloc() to accept user flags, the uAPI will provide the flags. Link: https://lore.kernel.org/r/20230928071528.26258-4-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 4ff5421 upstream. Intel-SIG: commit 4ff5421 iommufd: Support allocating nested parent domain Backport to support Intel QAT live migration for in-tree driver Extend IOMMU_HWPT_ALLOC to allocate domains to be used as parent (stage-2) in nested translation. Add IOMMU_HWPT_ALLOC_NEST_PARENT to the uAPI. Link: https://lore.kernel.org/r/20230928071528.26258-5-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit bb812e0 upstream. Intel-SIG: commit bb812e0 iommufd/selftest: Iterate idev_ids in mock_domain's alloc_hwpt test Backport to support Intel QAT live migration for in-tree driver The point in iterating variant->mock_domains is to test the idev_ids[0] and idev_ids[1]. So use it instead of keeping testing idev_ids[0] only. Link: https://lore.kernel.org/r/20230919011637.16483-1-nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 4086636 upstream. Intel-SIG: commit 4086636 iommufd/selftest: Add domain_alloc_user() support in iommu mock Backport to support Intel QAT live migration for in-tree driver Add mock_domain_alloc_user() and a new test case for IOMMU_HWPT_ALLOC_NEST_PARENT. Link: https://lore.kernel.org/r/20230928071528.26258-6-yi.l.liu@intel.com Co-developed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit c97d1b2 upstream. Intel-SIG: commit c97d1b2 iommu/vt-d: Add domain_alloc_user op Backport to support Intel QAT live migration for in-tree driver Add the domain_alloc_user() op implementation. It supports allocating domains to be used as parent under nested translation. Unlike other drivers VT-D uses only a single page table format so it only needs to check if the HW can support nesting. Link: https://lore.kernel.org/r/20230928071528.26258-7-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 266dcae upstream. Intel-SIG: commit 266dcae iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly Backport to support Intel QAT live migration for in-tree driver TEST_LENGTH passing ".size = sizeof(struct _struct) - 1" expects -EINVAL from "if (ucmd.user_size < op->min_size)" check in iommufd_fops_ioctl(). This has been working when min_size is exactly the size of the structure. However, if the size of the structure becomes larger than min_size, i.e. the passing size above is larger than min_size, that min_size sanity no longer works. Since the first test in TEST_LENGTH() was to test that min_size sanity routine, rework it to support a min_size calculation, rather than using the full size of the structure. Link: https://lore.kernel.org/r/20231015074648.24185-1-nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
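To see why `sizeof(struct) - 1` stops exercising the min_size check once a structure grows past min_size, consider this minimal sketch. The structure `mock_cmd`, the macro `MOCK_MIN_SIZE` and the function `mock_ioctl_size_check` are invented for illustration; only the comparison `user_size < min_size` comes from the commit log.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command: "flags" is the last mandatory field. */
struct mock_cmd {
    uint32_t size;
    uint32_t flags;
    uint64_t optional_extension; /* added later, lies beyond min_size */
};

/* min_size ends right after the last mandatory field. */
#define MOCK_MIN_SIZE (offsetof(struct mock_cmd, flags) + sizeof(uint32_t))

/* Mirrors the quoted check: reject sizes smaller than min_size. */
int mock_ioctl_size_check(size_t user_size, size_t min_size)
{
    if (user_size < min_size)
        return -EINVAL;
    return 0;
}
```

With this layout `sizeof(struct mock_cmd) - 1` is still larger than `MOCK_MIN_SIZE`, so the check passes and the old test would not see -EINVAL; passing `MOCK_MIN_SIZE - 1` does trigger it.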
ANBZ: #9185 commit 53f0b02 upstream. Intel-SIG: commit 53f0b02 vfio/iova_bitmap: Export more API symbols Backport to support Intel QAT live migration for in-tree driver In preparation for moving iova_bitmap into iommufd, export the remaining API symbols that could be used by modules, namely: iova_bitmap_alloc, iova_bitmap_free, iova_bitmap_for_each. Link: https://lore.kernel.org/r/20231024135109.73787-2-joao.m.martins@oracle.com Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 8c9c727 upstream. Intel-SIG: commit 8c9c727 vfio: Move iova_bitmap into iommufd Backport to support Intel QAT live migration for in-tree driver Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking the user bitmaps, so move it into IOMMUFD as the common dependency. In doing so, create the symbol IOMMUFD_DRIVER which designates the builtin code that will be used by drivers when selected. Today this means MLX5_VFIO_PCI and PDS_VFIO_PCI. IOMMU drivers will do the same (in future patches) when supporting dirty tracking and select IOMMUFD_DRIVER accordingly. Given that the symbol may be disabled, add header definitions in iova_bitmap.h for when IOMMUFD_DRIVER=n. Link: https://lore.kernel.org/r/20231024135109.73787-3-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 13578d4 upstream. Intel-SIG: commit 13578d4 iommufd/iova_bitmap: Move symbols to IOMMUFD namespace Backport to support Intel QAT live migration for in-tree driver Have the IOVA bitmap exported symbols adhere to the IOMMUFD symbol export convention i.e. using the IOMMUFD namespace. In doing so, import the namespace in the current users. This means VFIO and the vfio-pci drivers that use iova_bitmap_set(). Link: https://lore.kernel.org/r/20231024135109.73787-4-joao.m.martins@oracle.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 750e2e9 upstream. Intel-SIG: commit 750e2e9 iommu: Add iommu_domain ops for dirty tracking Backport to support Intel QAT live migration for in-tree driver Add to iommu domain operations a set of callbacks to perform dirty tracking, particularly to start and stop tracking and to read and clear the dirty data. Drivers are generally expected to dynamically change their translation structures to toggle the tracking and flush some form of control state structure that stands in the IOVA translation path. Though it's not mandatory, as drivers can also enable dirty tracking at boot, and just clear the dirty bits before setting dirty tracking. For each of the newly added IOMMU core APIs: iommu_cap::IOMMU_CAP_DIRTY_TRACKING: new device iommu_capable value when probing for capabilities of the device. .set_dirty_tracking(): an iommu driver is expected to change its translation structures and enable dirty tracking for the devices in the iommu_domain. For drivers making dirty tracking always-enabled, it should just return 0. .read_and_clear_dirty(): an iommu driver is expected to walk the pagetables for the iova range passed in and use iommu_dirty_bitmap_record() to record dirty info per IOVA. When detecting that a given IOVA is dirty it should also clear its dirty state from the PTE, *unless* the flag IOMMU_DIRTY_NO_CLEAR is passed in -- flushing is steered from the caller of the domain_op via iotlb_gather. The iommu core APIs use the same data structure in use for dirty tracking for VFIO device dirty (struct iova_bitmap) abstracted by iommu_dirty_bitmap_record() helper function. domain::dirty_ops: IOMMU domains will store the dirty ops depending on whether the iommu device supports dirty tracking or not. iommu drivers can then use this field to figure if the dirty tracking is supported+enforced on attach. The enforcement is enabled via domain_alloc_user() which is done via IOMMUFD hwpt flag introduced later. 
Link: https://lore.kernel.org/r/20231024135109.73787-5-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
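The read-and-clear contract described for .read_and_clear_dirty() can be modelled in userspace as follows. This is only a sketch: the flat `ptes` array, the bit position `MOCK_PTE_DIRTY`, the flag `MOCK_DIRTY_NO_CLEAR` (standing in for IOMMU_DIRTY_NO_CLEAR) and the function name are all hypothetical, and IOTLB flushing, which the commit leaves to the caller, is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_PTE_DIRTY      (1u << 0) /* hypothetical dirty bit position */
#define MOCK_DIRTY_NO_CLEAR (1u << 0) /* stands in for IOMMU_DIRTY_NO_CLEAR */

/*
 * Walk npages mock PTEs. For every dirty PTE, set the matching bit in the
 * bitmap and clear the PTE dirty bit unless NO_CLEAR was requested.
 * Returns the number of dirty pages found.
 */
size_t mock_read_and_clear_dirty(uint32_t *ptes, size_t npages,
                                 unsigned int flags, uint64_t *bitmap)
{
    size_t found = 0;

    for (size_t i = 0; i < npages; i++) {
        if (!(ptes[i] & MOCK_PTE_DIRTY))
            continue;

        bitmap[i / 64] |= (uint64_t)1 << (i % 64); /* record dirty page i */
        found++;

        if (!(flags & MOCK_DIRTY_NO_CLEAR))
            ptes[i] &= ~MOCK_PTE_DIRTY;
    }
    return found;
}
```

Calling it twice with `MOCK_DIRTY_NO_CLEAR` reports the same pages both times, while a call without the flag leaves nothing for the next call to find.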
ANBZ: #9185 commit b5f9e63 upstream. Intel-SIG: commit b5f9e63 iommufd: Correct IOMMU_HWPT_ALLOC_NEST_PARENT description Backport to support Intel QAT live migration for in-tree driver The IOMMU_HWPT_ALLOC_NEST_PARENT flag is used to allocate a HWPT. Though a HWPT holds a domain in the core structure, it is still quite confusing to describe it using "domain" in the uAPI kdoc. Correct it to "HWPT". Fixes: 4ff5421 ("iommufd: Support allocating nested parent domain") Link: https://lore.kernel.org/r/20231017181552.12667-1-nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 5f9bdbf upstream. Intel-SIG: commit 5f9bdbf iommufd: Add a flag to enforce dirty tracking on attach Backport to support Intel QAT live migration for in-tree driver Throughout the lifetime of an IOMMU domain that wants to use dirty tracking, some guarantees are needed such that any device attached to the iommu_domain supports dirty tracking. The idea is to handle a case where the IOMMUs in the system are asymmetric feature-wise and thus the capability may not be supported for all devices. The enforcement is done by adding a flag into HWPT_ALLOC, namely IOMMU_HWPT_ALLOC_DIRTY_TRACKING, passed in the HWPT_ALLOC ioctl() flags. The enforcement is done by creating an iommu_domain via domain_alloc_user() and validating the requested flags against what the device IOMMU supports (and failing accordingly). Advertising the new IOMMU domain feature flag requires that the individual iommu driver capability is supported when a future device attachment happens. Link: https://lore.kernel.org/r/20231024135109.73787-6-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
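The enforcement idea in commit 5f9bdbf, a check at allocation time and again at every later attach, might be sketched like this. The types `mock_dev` and `mock_hwpt`, both function names and the specific error codes are assumptions made for illustration; the commit log only says the operations fail accordingly.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device: records whether its IOMMU can track dirty pages. */
struct mock_dev {
    bool cap_dirty_tracking;
};

/* Hypothetical HWPT: remembers that dirty tracking must be enforced. */
struct mock_hwpt {
    bool enforce_dirty;
};

/* Allocation validates the requested flag against the first device. */
int mock_hwpt_alloc(const struct mock_dev *dev, bool want_dirty,
                    struct mock_hwpt *out)
{
    if (want_dirty && !dev->cap_dirty_tracking)
        return -EOPNOTSUPP; /* error code is an assumption of this sketch */

    out->enforce_dirty = want_dirty;
    return 0;
}

/* Every later attach must satisfy the same guarantee. */
int mock_attach(const struct mock_hwpt *hwpt, const struct mock_dev *dev)
{
    if (hwpt->enforce_dirty && !dev->cap_dirty_tracking)
        return -EINVAL; /* error code is an assumption of this sketch */
    return 0;
}
```

On an asymmetric system, a HWPT allocated against a capable device would then refuse a second device whose IOMMU lacks the capability.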
ANBZ: #9185 commit e2a4b29 upstream. Intel-SIG: commit e2a4b29 iommufd: Add IOMMU_HWPT_SET_DIRTY_TRACKING Backport to support Intel QAT live migration for in-tree driver Every IOMMU driver should be able to implement the needed iommu domain ops to control dirty tracking. Connect a hw_pagetable to the IOMMU core dirty tracking ops, specifically the ability to enable/disable dirty tracking on an IOMMU domain (hw_pagetable id). To that end add an io_pagetable kernel API to toggle dirty tracking: * iopt_set_dirty_tracking(iopt, [domain], state) The intended caller of this is via the hw_pagetable object that is created. Internally it will ensure the leftover dirty state is cleared /right before/ dirty tracking starts. This is also useful for iommu drivers which may decide that dirty tracking is always-enabled at boot without wanting to toggle dynamically via corresponding iommu domain op. Link: https://lore.kernel.org/r/20231024135109.73787-7-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit b9a60d6 upstream. Intel-SIG: commit b9a60d6 iommufd: Add IOMMU_HWPT_GET_DIRTY_BITMAP Backport to support Intel QAT live migration for in-tree driver Connect a hw_pagetable to the IOMMU core dirty tracking read_and_clear_dirty iommu domain op. It exposes all of the functionality for the UAPI that reads the dirtied IOVAs while clearing the Dirty bits from the PTEs. In doing so, add an IO pagetable API iopt_read_and_clear_dirty_data() that performs the reading of dirty IOPTEs for a given IOVA range and then copying back to userspace bitmap. Underneath it uses the IOMMU domain kernel API which will read the dirty bits, as well as atomically clearing the IOPTE dirty bit and flushing the IOTLB at the end. The IOVA bitmaps usage takes care of the iteration of the bitmaps user pages efficiently and without copies. Within the iterator function we iterate over io-pagetable contiguous areas that have been mapped. Contrary to a past incarnation of a similar interface in VFIO, the IOVA range to be scanned is tied in to the bitmap size, thus the application needs to pass an appropriately sized bitmap address taking into account the iova range being passed *and* page size ... as opposed to allowing bitmap-iova != iova. Link: https://lore.kernel.org/r/20231024135109.73787-8-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
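Because the scanned IOVA range is tied to the bitmap size, userspace has to size the bitmap from the range length and the page size. A rough sizing helper could look like the following; the function name is hypothetical, and both the one-bit-per-page layout and the rounding to 64-bit words are assumptions of this sketch rather than statements from the commit log.

```c
#include <stdint.h>

/*
 * Bytes needed for a dirty bitmap covering `length` bytes of IOVA space at
 * `page_size` granularity, assuming one bit per page and 64-bit words.
 * Returns 0 for inputs this sketch treats as invalid.
 */
uint64_t dirty_bitmap_bytes(uint64_t length, uint64_t page_size)
{
    if (page_size == 0 || length == 0 || length % page_size != 0)
        return 0;

    uint64_t bits = length / page_size;
    uint64_t words = bits / 64 + (bits % 64 != 0); /* round up to u64 */

    return words * sizeof(uint64_t);
}
```

For a 1 GiB range at 4 KiB granularity this gives 262144 bits, i.e. 32768 bytes of bitmap.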
ANBZ: #9185 commit 7623683 upstream. Intel-SIG: commit 7623683 iommufd: Add capabilities to IOMMU_GET_HW_INFO Backport to support Intel QAT live migration for in-tree driver Extend IOMMUFD_CMD_GET_HW_INFO op to query generic iommu capabilities for a given device. Capabilities are IOMMU agnostic and use device_iommu_capable() API passing one of the IOMMU_CAP_*. Enumerate IOMMU_CAP_DIRTY_TRACKING for now in the out_capabilities field returned back to userspace. Link: https://lore.kernel.org/r/20231024135109.73787-9-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 6098481 upstream. Intel-SIG: commit 6098481 iommufd: Add a flag to skip clearing of IOPTE dirty Backport to support Intel QAT live migration for in-tree driver VFIO has an operation where it unmaps an IOVA while returning a bitmap with the dirty data. In reality the operation doesn't quite query the IO pagetables for whether the PTE was dirty or not. Instead it marks as dirty anything that was mapped, and does so in one syscall. In IOMMUFD the equivalent is done in two operations by querying with GET_DIRTY_IOVA followed by UNMAP_IOVA. However, this would incur two TLB flushes given that after clearing dirty bits IOMMU implementations require invalidating their IOTLB, plus another invalidation needed for the UNMAP. To allow dirty bits to be queried faster, add a flag (IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR) that requests to not clear the dirty bits from the PTE (but just reading them), under the expectation that the next operation is the unmap. An alternative is to unmap and just perpetually mark as dirty as that's the same behaviour as today. So here equivalent functionality can be provided with unmap alone, and if real dirty info is required it will amortize the cost while querying. There's still a race against DMA where in theory the unmap of the IOVA (when the guest invalidates the IOTLB via emulated iommu) would race against the VF performing DMA on the same IOVA. As discussed in [0], we are accepting to resolve this race as throwing away the DMA and it doesn't matter if it hit physical DRAM or not, the VM can't tell if we threw it away because the DMA was blocked or because we failed to copy the DRAM. 
[0] https://lore.kernel.org/linux-iommu/20220502185239.GR8364@nvidia.com/ Link: https://lore.kernel.org/r/20231024135109.73787-10-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 1342881 upstream. Intel-SIG: commit 1342881 iommu/amd: Add domain_alloc_user based domain allocation Backport to support Intel QAT live migration for in-tree driver Add the domain_alloc_user op implementation. To that end, refactor amd_iommu_domain_alloc() to receive a dev pointer and flags, while renaming it too, such that it becomes a common function shared with domain_alloc_user() implementation. The sole difference with domain_alloc_user() is that we initialize also other fields that iommu_domain_alloc() does. It lets it return the iommu domain correctly initialized in one function. This is in preparation to add dirty enforcement on AMD implementation of domain_alloc_user. Link: https://lore.kernel.org/r/20231024135109.73787-11-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 421a511 upstream. Intel-SIG: commit 421a511 iommu/amd: Access/Dirty bit support in IOPTEs Backport to support Intel QAT live migration for in-tree driver IOMMU advertises Access/Dirty bits if the extended feature register reports it. Relevant AMD IOMMU SDM ref[0] "1.3.8 Enhanced Support for Access and Dirty Bits" To enable it set the DTE flag in bits 7 and 8 to enable access, or access+dirty. With that, the IOMMU starts marking the D and A flags on every Memory Request or ATS translation request. It is on the VMM side to steer whether to enable dirty tracking or not, rather than wrongly doing in IOMMU. Relevant AMD IOMMU SDM ref [0], "Table 7. Device Table Entry (DTE) Field Definitions" particularly the entry "HAD". To actually toggle on and off it's relatively simple as it's setting 2 bits on DTE and flush the device DTE cache. To get what's dirtied use existing AMD io-pgtable support, by walking the pagetables over each IOVA, with fetch_pte(). The IOTLB flushing is left to the caller (much like unmap), and iommu_dirty_bitmap_record() is the one adding page-ranges to invalidate. This allows the caller to batch the flush over a big span of IOVA space, without the iommu wondering about when to flush. Worthwhile sections from AMD IOMMU SDM: "2.2.3.1 Host Access Support" "2.2.3.2 Host Dirty Support" For details on how IOMMU hardware updates the dirty bit, and what it expects from its subsequent clearing by the CPU, see: "2.2.7.4 Updating Accessed and Dirty Bits in the Guest Address Tables" "2.2.7.5 Clearing Accessed and Dirty Bits" Quoting the SDM: "The setting of accessed and dirty status bits in the page tables is visible to both the CPU and the peripheral when sharing guest page tables. The IOMMU interlocked operations to update A and D bits must be 64-bit operations and naturally aligned on a 64-bit boundary" .. and for the IOMMU update sequence to Dirty bit, it essentially states: 1. Decodes the read and write intent from the memory access. 2. If P=0 in the page descriptor, fail the access. 3. Compare the A & D bits in the descriptor with the read and write intent in the request. 4. If the A or D bits need to be updated in the descriptor: * Start atomic operation. * Read the descriptor as a 64-bit access. * If the descriptor no longer appears to require an update, release the atomic lock with no further action and continue to step 5. * Calculate the new A & D bits. * Write the descriptor as a 64-bit access. * End atomic operation. 5. Continue to the next stage of translation or to the memory access. Access/Dirty bits readout also need to consider the non-default page-sizes (aka replicated PTEs as mentioned by the manual), as AMD supports all powers of two (except 512G) page sizes. Select IOMMUFD_DRIVER only if IOMMUFD is enabled considering that IOMMU dirty tracking requires IOMMUFD. Link: https://lore.kernel.org/r/20231024135109.73787-12-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
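The five-step A/D update sequence quoted from the SDM can be approximated in software with a 64-bit compare-and-swap loop. This is a software analogue using C11 atomics, not how the IOMMU hardware is implemented; the bit positions `DESC_P`, `DESC_A`, `DESC_D` and the function name are illustrative assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; not taken from the AMD IOMMU specification. */
#define DESC_P (UINT64_C(1) << 0)
#define DESC_A (UINT64_C(1) << 5)
#define DESC_D (UINT64_C(1) << 6)

/* Returns false when the access must fail (P=0), true otherwise. */
bool mock_update_ad(_Atomic uint64_t *desc, bool is_write)
{
    /* Step 1: derive the bits implied by the read/write intent. */
    uint64_t want = DESC_A | (is_write ? DESC_D : 0);
    uint64_t old = atomic_load(desc);

    for (;;) {
        /* Step 2: a non-present descriptor fails the access. */
        if (!(old & DESC_P))
            return false;

        /* Step 3: nothing to do if A/D already match the intent. */
        if ((old & want) == want)
            return true;

        /*
         * Step 4: 64-bit read-modify-write. If another agent changed the
         * descriptor, the CAS fails, reloads `old`, and the loop re-checks
         * whether an update is still required.
         */
        if (atomic_compare_exchange_weak(desc, &old, old | want))
            return true; /* Step 5: continue with the access. */
    }
}
```

A read sets only the A bit, a write sets both A and D, and a descriptor with P=0 is left untouched.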
ANBZ: #9185 commit f35f22c upstream. Intel-SIG: commit f35f22c iommu/vt-d: Access/Dirty bit support for SS domains Backport to support Intel QAT live migration for in-tree driver IOMMU advertises Access/Dirty bits for second-stage page table if the extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS). The first stage table is compatible with CPU page table thus A/D bits are implicitly supported. Relevant Intel IOMMU SDM ref for first stage table "3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second stage table "3.7.2 Accessed and Dirty Flags". First stage page table is enabled by default so it's allowed to set dirty tracking and no control bits needed, it just returns 0. To use SSADS, set bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB via pasid_flush_caches() following the manual. Relevant SDM refs: "3.7.2 Accessed and Dirty Flags" "6.5.3.3 Guidance to Software for Invalidations, Table 23. Guidance to Software for Invalidations" PTE dirty bit is located in bit 9 and it's cached in the IOTLB so flush IOTLB to make sure IOMMU attempts to set the dirty bit again. Note that iommu_dirty_bitmap_record() will add the IOVA to iotlb_gather and thus the caller of the iommu op will flush the IOTLB. Relevant manuals over the hardware translation is chapter 6 with some special mention to: "6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations" "6.2.4 IOTLB" Select IOMMUFD_DRIVER only if IOMMUFD is enabled, given that IOMMU dirty tracking requires IOMMUFD. Link: https://lore.kernel.org/r/20231024135109.73787-13-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit e04b23c upstream. Intel-SIG: commit e04b23c iommufd/selftest: Expand mock_domain with dev_flags Backport to support Intel QAT live migration for in-tree driver Expand mock_domain test to be able to manipulate the device capabilities. This allows testing with mockdev without dirty tracking support advertised and thus make sure enforce_dirty test does the expected. To avoid breaking IOMMUFD_TEST UABI replicate the mock_domain struct and thus add an input dev_flags at the end. Link: https://lore.kernel.org/r/20231024135109.73787-14-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 266ce58 upstream. Intel-SIG: commit 266ce58 iommufd/selftest: Test IOMMU_HWPT_ALLOC_DIRTY_TRACKING Backport to support Intel QAT live migration for in-tree driver In order to selftest the iommu domain dirty enforcing, implement the necessary mock_domain support and add a new dev_flags to test that the hwpt_alloc/attach_device fails as expected. Expand the existing mock_domain fixture with an enforce_dirty test that exercises the hwpt_alloc and device attachment. Link: https://lore.kernel.org/r/20231024135109.73787-15-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 7adf267 upstream. Intel-SIG: commit 7adf267 iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY_TRACKING Backport to support Intel QAT live migration for in-tree driver Change mock_domain to support dirty tracking and add tests to exercise the new SET_DIRTY_TRACKING API in the iommufd_dirty_tracking selftest fixture. Link: https://lore.kernel.org/r/20231024135109.73787-16-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit a9af47e upstream. Intel-SIG: commit a9af47e iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP Backport to support Intel QAT live migration for in-tree driver Add a new test ioctl for simulating the dirty IOVAs in the mock domain, and implement the mock iommu domain ops that get the dirty tracking supported. The selftest exercises the usual main workflow of: 1) Setting dirty tracking from the iommu domain 2) Read and clear dirty IOPTEs Different fixtures will test different IOVA range sizes, that exercise corner cases of the bitmaps. Link: https://lore.kernel.org/r/20231024135109.73787-17-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit ae36fe7 upstream. Intel-SIG: commit ae36fe7 iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO Backport to support Intel QAT live migration for in-tree driver Enumerate the capabilities from the mock device and test whether it advertises as expected. Include it as part of the iommufd_dirty_tracking fixture. Link: https://lore.kernel.org/r/20231024135109.73787-18-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 0795b30 upstream. Intel-SIG: commit 0795b30 iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag Backport to support Intel QAT live migration for in-tree driver Change test_mock_dirty_bitmaps() to pass a flag where it specifies the flag under test. The test does the same thing as the GET_DIRTY_BITMAP regular test. Except that it tests whether the dirtied bits are fetched all the same a second time, as opposed to observing them cleared. Link: https://lore.kernel.org/r/20231024135109.73787-19-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit a2cdecd upstream. Intel-SIG: commit a2cdecd iommu/vt-d: Enhance capability check for nested parent domain allocation Backport to support Intel QAT live migration for in-tree driver This adds the scalable mode check before allocating the nested parent domain as checking nested capability is not enough. User may turn off scalable mode which also means no nested support even if the hardware supports it. Fixes: c97d1b2 ("iommu/vt-d: Add domain_alloc_user op") Link: https://lore.kernel.org/r/20231024150011.44642-1-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 2e22aac upstream. Intel-SIG: commit 2e22aac iommufd/selftest: Fix page-size check in iommufd_test_dirty() Backport to support Intel QAT live migration for in-tree driver iommufd_test_dirty()/IOMMU_TEST_OP_DIRTY sets the dirty bits in the mock domain implementation that the userspace side validates against what it obtains via the UAPI. However, when iommufd_test_dirty() was introduced it did not validate page_size against 0, leading to two possible divide-by-zero problems: one at the beginning when calculating @max and another while calculating the IOVA in the XArray PFN tracking list. While at it, validate the length to require non-zero value as well, as we can't be allocating a 0-sized bitmap. Link: https://lore.kernel.org/r/20231030113446.7056-1-joao.m.martins@oracle.com Reported-by: syzbot+25dc7383c30ecdc83c38@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-iommu/00000000000005f6aa0608b9220f@google.com/ Fixes: a9af47e ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP") Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 9859418 upstream. Intel-SIG: commit 9859418 iommufd/selftest: Fix _test_mock_dirty_bitmaps() Backport to support Intel QAT live migration for in-tree driver The ASSERT_EQ() macro sneakily expands to two statements, so the loop here needs braces to ensure it captures both and actually terminates the test upon failure. Where these tests are currently failing on my arm64 machine, this reduces the number of logged lines from a rather unreasonable ~197,000 down to 10. While we're at it, we can also clean up the tautologous "count" calculations whose assertions can never fail unless mathematics and/or the C language become fundamentally broken. Fixes: a9af47e ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP") Link: https://lore.kernel.org/r/90e083045243ef407dd592bb1deec89cd1f4ddf2.1700153535.git.robin.murphy@arm.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Tested-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
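The pitfall behind this fix can be reproduced with a toy macro. The sketch below is not the kselftest harness: `CHECK_EQ()` is a hypothetical macro that, like the `ASSERT_EQ()` described above, expands to two statements (a check, then a bail-out).

```c
static int checks; /* number of comparisons executed */
static int failed; /* set once any comparison fails */

/* Hypothetical two-statement macro: the check, then the bail-out. */
#define CHECK_EQ(a, b) \
	do { checks++; if ((a) != (b)) failed = 1; } while (0); \
	if (failed) return checks

/* Only the first statement is the loop body; the bail-out runs after the loop. */
static int run_without_braces(const int *v, int n)
{
	checks = 0;
	failed = 0;
	for (int i = 0; i < n; i++)
		CHECK_EQ(v[i], 0);
	return checks;
}

/* Braces capture both statements, so the first failure stops the loop. */
static int run_with_braces(const int *v, int n)
{
	checks = 0;
	failed = 0;
	for (int i = 0; i < n; i++) {
		CHECK_EQ(v[i], 0);
	}
	return checks;
}
```

For an input whose first element already fails, the unbraced loop still performs every comparison, while the braced loop stops after the first one. That is the same effect as the drop in logged lines reported in the commit message.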
ANBZ: #9185 commit e378c7d upstream. Intel-SIG: commit e378c7d iommu/vt-d: Set variable intel_dirty_ops to static Backport to support Intel QAT live migration for in-tree driver Fix the following warning: drivers/iommu/intel/iommu.c:302:30: warning: symbol 'intel_dirty_ops' was not declared. Should it be static? This variable is only used in its defining file, so it should be static. Fixes: f35f22c ("iommu/vt-d: Access/Dirty bit support for SS domains") Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20231120101025.1103404-1-chentao@kylinos.cn Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 1894cb1 upstream. Intel-SIG: commit 1894cb1 crypto: qat - adf_get_etr_base() helper Backport to support Intel QAT live migration for in-tree driver Add and use the new helper function adf_get_etr_base() which retrieves the virtual address of the ring bar. This will be used extensively when adding support for Live Migration. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Xin Zeng <xin.zeng@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 1f8d6a1 upstream. Intel-SIG: commit 1f8d6a1 crypto: qat - relocate and rename 4xxx PF2VM definitions Backport to support Intel QAT live migration for in-tree driver Move and rename ADF_4XXX_PF2VM_OFFSET and ADF_4XXX_VM2PF_OFFSET to ADF_GEN4_PF2VM_OFFSET and ADF_GEN4_VM2PF_OFFSET respectively. These definitions are moved from adf_gen4_pfvf.c to adf_gen4_hw_data.h as they are specific to GEN4 and not just to qat_4xxx. This change is made in anticipation of their use in live migration. This does not introduce any functional change. Signed-off-by: Xin Zeng <xin.zeng@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 867e801 upstream. Intel-SIG: commit 867e801 crypto: qat - move PFVF compat checker to a function Backport to support Intel QAT live migration for in-tree driver Move the code that implements VF version compatibility on the PF side to a separate function so that it can be reused when doing VM live migration. This does not introduce any functional change. Signed-off-by: Xin Zeng <xin.zeng@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 680302d upstream. Intel-SIG: commit 680302d crypto: qat - relocate CSR access code Backport to support Intel QAT live migration for in-tree driver As the common hw_data files are growing and the adf_hw_csr_ops is going to be extended with new operations, move all logic related to ring CSRs to the newly created adf_gen[2|4]_hw_csr_data.[c|h] files. This does not introduce any functional change. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Xin Zeng <xin.zeng@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 84058ff upstream. Intel-SIG: commit 84058ff crypto: qat - rename get_sla_arr_of_type() Backport to support Intel QAT live migration for in-tree driver The function get_sla_arr_of_type() returns a pointer to an SLA type specific array. Rename it and expose it as it will be used externally to this module. This does not introduce any functional change. Signed-off-by: Siming Wan <siming.wan@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Damian Muszynski <damian.muszynski@intel.com> Signed-off-by: Xin Zeng <xin.zeng@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 3fa1057 upstream. Intel-SIG: commit 3fa1057 crypto: qat - expand CSR operations for QAT GEN4 devices Backport to support Intel QAT live migration for in-tree driver Extend the CSR operations for QAT GEN4 devices to allow saving and restoring the rings state. The new operations will be used as a building block for implementing the state save and restore of Virtual Functions necessary for VM live migration. This adds the following operations:
- read ring status register
- read ring underflow/overflow status register
- read ring nearly empty status register
- read ring nearly full status register
- read ring full status register
- read ring complete status register
- read ring exception status register
- read/write ring exception interrupt mask register
- read ring configuration register
- read ring base register
- read/write ring interrupt enable register
- read ring interrupt flag register
- read/write ring interrupt source select register
- read ring coalesced interrupt enable register
- read ring coalesced interrupt control register
- read ring flag and coalesced interrupt enable register
- read ring service arbiter enable register
- get ring coalesced interrupt control enable mask
Signed-off-by: Siming Wan <siming.wan@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Xin Zeng <xin.zeng@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit bbfdde7 upstream. Intel-SIG: commit bbfdde7 crypto: qat - add bank save and restore flows Backport to support Intel QAT live migration for in-tree driver Add logic to save, restore, quiesce and drain a ring bank for QAT GEN4 devices. This allows to save and restore the state of a Virtual Function (VF) and will be used to implement VM live migration. Signed-off-by: Siming Wan <siming.wan@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Xin Zeng <xin.zeng@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 0fce55e upstream. Intel-SIG: commit 0fce55e crypto: qat - add interface for live migration Backport to support Intel QAT live migration for in-tree driver Extend the driver with a new interface to be used for VF live migration. This allows to create and destroy a qat_mig_dev object that contains a set of methods to allow to save and restore the state of QAT VF. This interface will be used by the qat-vfio-pci module. Signed-off-by: Xin Zeng <xin.zeng@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit f0bbfc3 upstream. Intel-SIG: commit f0bbfc3 crypto: qat - implement interface for live migration Backport to support Intel QAT live migration for in-tree driver Add logic to implement the interface for live migration defined in qat/qat_mig_dev.h. This is specific for QAT GEN4 Virtual Functions (VFs). This introduces a migration data manager which is used to handle the device state during migration. The manager ensures that the device state is stored in a format that can be restored in the destination node. The VF state is organized into a hierarchical structure that includes a preamble, a general state section, a MISC bar section and an ETR bar section. The latter contains the state of the 4 ring pairs contained on a VF. Here is a graphical representation of the state:

    preamble | general state section | leaf state
             | MISC bar state section| leaf state
             | ETR bar state section | bank0 state section | leaf state
                                     | bank1 state section | leaf state
                                     | bank2 state section | leaf state
                                     | bank3 state section | leaf state

In addition to the implementation of the qat_migdev_ops interface and the state manager framework, add a mutex in pfvf to avoid pf2vf messages during migration. Signed-off-by: Xin Zeng <xin.zeng@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
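The hierarchy described in this commit (preamble, general state, MISC bar, and an ETR bar holding four banks) can be modelled as a small tree. The sketch below is purely illustrative: `struct mig_sect`, the section names and `count_leaf_sections()` are hypothetical and say nothing about the driver's actual in-memory or serialized format.

```c
#include <stddef.h>

/* Hypothetical model of one state section; leaves have no sub-sections. */
struct mig_sect {
	const char *name;
	const struct mig_sect *subs;
	size_t nsubs;
};

/* The ETR bar section holds one section per ring bank (4 on a VF). */
static const struct mig_sect etr_banks[] = {
	{ "bank0", NULL, 0 }, { "bank1", NULL, 0 },
	{ "bank2", NULL, 0 }, { "bank3", NULL, 0 },
};

static const struct mig_sect top_sects[] = {
	{ "general", NULL, 0 },
	{ "misc_bar", NULL, 0 },
	{ "etr_bar", etr_banks, 4 },
};

/* The preamble sits in front of the three top-level sections. */
static const struct mig_sect vf_state = { "preamble", top_sects, 3 };

/* Count the sections that directly carry leaf state. */
static size_t count_leaf_sections(const struct mig_sect *s)
{
	size_t n = 0;

	if (!s->nsubs)
		return 1;
	for (size_t i = 0; i < s->nsubs; i++)
		n += count_leaf_sections(&s->subs[i]);
	return n;
}
```

Walking the tree this way yields six leaf-carrying sections: general, MISC bar, and the four banks.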
ANBZ: #9185 commit bb20881 upstream. Intel-SIG: commit bb20881 vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices Backport to support Intel QAT live migration for in-tree driver Add vfio pci variant driver for Intel QAT SR-IOV VF devices. This driver registers to the vfio subsystem through the interfaces exposed by the subsystem. It follows the live migration protocol v2 defined in uapi/linux/vfio.h and interacts with Intel QAT PF driver through a set of interfaces defined in qat/qat_mig_dev.h to support live migration of Intel QAT VF devices. This version only covers migration for Intel QAT GEN4 VF devices. Co-developed-by: Yahui Cao <yahui.cao@intel.com> Signed-off-by: Yahui Cao <yahui.cao@intel.com> Signed-off-by: Xin Zeng <xin.zeng@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240426064051.2859652-1-xin.zeng@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 140e4c8 upstream. Intel-SIG: commit 140e4c8 crypto: qat - Avoid -Wflex-array-member-not-at-end warnings Backport to support Intel QAT live migration for in-tree driver -Wflex-array-member-not-at-end is coming in GCC-14, and we are getting ready to enable it globally. Use the `__struct_group()` helper to separate the flexible array from the rest of the members in flexible `struct qat_alg_buf_list`, through tagged `struct qat_alg_buf_list_hdr`, and avoid embedding the flexible-array member in the middle of `struct qat_alg_fixed_buf_list`. Also, use `container_of()` whenever we need to retrieve a pointer to the flexible structure. So, with these changes, fix the following warnings: drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] 
drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Link: KSPP/linux#202 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
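The pattern this commit describes, splitting the fixed header from the flexible array and recovering the full structure with `container_of()`, can be sketched in plain C. The structure and field names below are simplified, hypothetical stand-ins, and a named header member is used where the kernel uses the `__struct_group()` helper.

```c
#include <stddef.h>
#include <stdint.h>

struct buf_desc {
	uint64_t addr;
	uint32_t len;
};

/* Header fields only: no flexible array member here. */
struct buf_list_hdr {
	uint64_t resrvd;
	uint32_t num_bufs;
	uint32_t num_mapped_bufs;
};

/* Flexible structure: header followed by a variable number of descriptors. */
struct buf_list {
	struct buf_list_hdr hdr;
	struct buf_desc bufs[];
};

/*
 * The fixed-size variant embeds only the header, then its own array, so no
 * flexible array member ends up in the middle of another structure.
 */
struct fixed_buf_list {
	struct buf_list_hdr sgl_hdr;
	struct buf_desc descriptors[4];
};

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the flexible structure from a pointer to its header. */
static struct buf_list *to_buf_list(struct buf_list_hdr *hdr)
{
	return container_of(hdr, struct buf_list, hdr);
}
```

Because the header is the first member in both layouts, the pointer recovered from the header equals the address of the enclosing object, and the descriptor arrays start at the same offset.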
ANBZ: #9185 commit f5c2cf9 upstream. Intel-SIG: commit f5c2cf9 crypto: qat - Fix spelling mistake "Invalide" -> "Invalid" Backport to support Intel QAT live migration for in-tree driver There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 5d5bd24 upstream. Intel-SIG: commit 5d5bd24 crypto: qat - implement dh fallback for primes > 4K Backport to support Intel QAT live migration for in-tree driver The Intel QAT driver provides support for the Diffie-Hellman (DH) algorithm, limited to prime numbers up to 4K. This driver is used by default on platforms with integrated QAT hardware for all DH requests. This has led to failures with algorithms requiring larger prime sizes, such as ffdhe6144. alg: ffdhe6144(dh): test failed on vector 1, err=-22 alg: self-tests for ffdhe6144(qat-dh) (ffdhe6144(dh)) failed (rc=-22) Implement a fallback mechanism when an unsupported request is received. Signed-off-by: Damian Muszynski <damian.muszynski@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
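The fallback decision described above amounts to a size check before dispatch. The sketch below assumes that "4K" means a 4096-bit prime; `HW_MAX_PRIME_BITS`, `enum dh_backend` and `dh_pick_backend()` are hypothetical names and do not reflect the driver's actual code.

```c
#include <stddef.h>

/* Assumed hardware limit: primes up to 4096 bits ("4K"). */
#define HW_MAX_PRIME_BITS 4096

enum dh_backend {
	DH_BACKEND_HW,          /* offload to the accelerator */
	DH_BACKEND_SW_FALLBACK, /* hand over to a software implementation */
};

/* Pick a backend from the prime size in bytes. */
static enum dh_backend dh_pick_backend(size_t p_size_bytes)
{
	if (p_size_bytes * 8 > HW_MAX_PRIME_BITS)
		return DH_BACKEND_SW_FALLBACK;
	return DH_BACKEND_HW;
}
```

Under that assumption a 768-byte prime, as used by ffdhe6144, is routed to the fallback instead of failing with an invalid-argument error.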
ANBZ: #9185 commit 4a4fc6c upstream. Intel-SIG: commit 4a4fc6c crypto: qat - improve error message in adf_get_arbiter_mapping() Backport to support Intel QAT live migration for in-tree driver Improve error message to be more readable. Fixes: 5da6a2d ("crypto: qat - generate dynamically arbiter mappings") Signed-off-by: Adam Guerin <adam.guerin@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit d281a28 upstream. Intel-SIG: commit d281a28 crypto: qat - improve error logging to be consistent across features Backport to support Intel QAT live migration for in-tree driver Improve error logging in rate limiting feature. Staying consistent with the error logging found in the telemetry feature. Fixes: d9fb840 ("crypto: qat - add rate limiting feature to qat_4xxx") Signed-off-by: Adam Guerin <adam.guerin@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185 commit 483fd65 upstream. Intel-SIG: commit 483fd65 crypto: qat - validate slices count returned by FW Backport to support Intel QAT live migration for in-tree driver The function adf_send_admin_tl_start() enables the telemetry (TL) feature on a QAT device by sending the ICP_QAT_FW_TL_START message to the firmware. This triggers the FW to start writing TL data to a DMA buffer in memory and returns an array containing the number of accelerators of each type (slices) supported by this HW. The pointer to this array is stored in the adf_tl_hw_data data structure called slice_cnt. The array slice_cnt is then used in the function tl_print_dev_data() to report in debugfs only statistics about the supported accelerators. An incorrect value of the elements in slice_cnt might lead to an out of bounds memory read. At the moment, there isn't an implementation of FW that returns a wrong value, but for robustness validate the slice count array returned by FW. Fixes: 69e7649 ("crypto: qat - add support for device telemetry") Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com> Reviewed-by: Damian Muszynski <damian.muszynski@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
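A bounds check of the kind this commit describes could look like the sketch below. `validate_slice_counts()` and its parameters are hypothetical; the idea of comparing against a per-device maximum is an assumption, loosely based on the `max_sl_cnt` field mentioned elsewhere in this PR.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical validation: every per-type slice count reported by firmware
 * must stay within the maximum the driver has storage for. Rejecting bad
 * values here avoids out-of-bounds reads when the counts are later used as
 * loop limits.
 */
static int validate_slice_counts(const uint8_t *slice_cnt, size_t ntypes,
				 uint8_t max_sl_cnt)
{
	for (size_t i = 0; i < ntypes; i++) {
		if (slice_cnt[i] > max_sl_cnt)
			return -EINVAL;
	}
	return 0;
}
```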
[APPROVALNOTIFIER] This PR is NOT APPROVED
Reviewer's Guide
This PR backports upstream IOMMUFD dirty-tracking support and Intel QAT live-migration to the in-tree driver by extending IOMMUFD and iommu core interfaces, enriching self-tests, refactoring QAT CSR ops, and adding new migration infrastructure in the QAT PF and VF drivers.
Sequence Diagram: Intel QAT VF Live Migration - State Save
sequenceDiagram
participant Hypervisor as HV
participant QAT_PF_Driver as PF
participant QAT_VF_Driver as VF
participant QAT_Hardware as HW
HV->>PF: Initiate VF Suspend for Live Migration
PF->>VF: Request State Save (e.g., via qat_migdev_ops.suspend)
VF->>HW: Quiesce Bank Operations
loop for each bank
VF->>HW: Read Bank CSRs (using adf_hw_csr_ops)
VF->>VF: Store bank_state data
end
VF->>PF: Send Saved State (bank_state data)
PF->>HV: VF State Saved
Sequence Diagram: Intel QAT VF Live Migration - State Restore
sequenceDiagram
participant Hypervisor as HV
participant QAT_PF_Driver as PF
participant QAT_VF_Driver as VF
participant QAT_Hardware as HW
HV->>PF: Initiate VF Resume for Live Migration
PF->>VF: Send Saved State (e.g., via qat_migdev_ops.resume, qat_migdev_ops.load_state)
loop for each bank
VF->>VF: Load bank_state data
VF->>HW: Restore Bank CSRs (using adf_hw_csr_ops)
end
VF->>HW: Resume Bank Operations
VF->>PF: State Restore Complete
PF->>HV: VF State Restored
Sequence Diagram: IOMMUFD Dirty Tracking Operations
sequenceDiagram
actor UserSpace as US
participant IOMMUFD
participant IOMMU_Core as Core
participant HW_IOMMU as HW
US->>IOMMUFD: IOMMU_HWPT_ALLOC (dev_id, pt_id, flags=IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
IOMMUFD->>Core: domain_alloc_user(dev, flags)
Core->>HW: Allocate/Configure Domain for Dirty Tracking
HW-->>Core: Domain Context
Core-->>IOMMUFD: iommu_domain with dirty_ops
IOMMUFD-->>US: out_hwpt_id
US->>IOMMUFD: IOMMU_HWPT_SET_DIRTY_TRACKING (hwpt_id, flags=IOMMU_HWPT_DIRTY_TRACKING_ENABLE)
IOMMUFD->>Core: iommu_dirty_ops.set_dirty_tracking(domain, true)
Core->>HW: Enable A/D bits in IOMMU tables (e.g., PASID SLADE, DTE_FLAG_HAD)
Core-->>IOMMUFD: Success/Failure
IOMMUFD-->>US: Success/Failure
US->>IOMMUFD: IOMMU_HWPT_GET_DIRTY_BITMAP (hwpt_id, iova, length, page_size, data_ptr)
IOMMUFD->>Core: iommu_dirty_ops.read_and_clear_dirty(domain, iova, length, flags, &bitmap_data)
Core->>HW: Read A/D bits from IOMMU tables
Core->>Core: Populate user bitmap
Core->>HW: Clear A/D bits (if not NO_CLEAR flag)
HW-->>Core: Dirty Info
Core-->>IOMMUFD: Populated Bitmap
IOMMUFD-->>US: Bitmap Data
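For the `IOMMU_HWPT_GET_DIRTY_BITMAP` step in the sequence above, userspace has to size a bitmap with one bit per `page_size` chunk of the IOVA range. The sketch below shows that arithmetic only. Both helper names are hypothetical, and rounding the buffer up to whole 64-bit words is an assumption of this sketch rather than a documented requirement.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the bitmap bit that tracks the page containing addr. */
static uint64_t dirty_bit_index(uint64_t base_iova, uint64_t addr,
				uint64_t page_size)
{
	assert(page_size != 0 && addr >= base_iova);
	return (addr - base_iova) / page_size;
}

/*
 * Bytes needed for a bitmap covering `length` bytes of IOVA space,
 * rounded up to whole 64-bit words (assumption of this sketch).
 */
static uint64_t dirty_bitmap_bytes(uint64_t length, uint64_t page_size)
{
	uint64_t bits;

	assert(page_size != 0);
	bits = (length + page_size - 1) / page_size;
	return ((bits + 63) / 64) * 8;
}
```

For example, a 1 MiB range tracked at 4 KiB granularity needs 256 bits, which is 32 bytes.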
Class Diagram: IOMMU Core Dirty Tracking API Changes
classDiagram
class iommu_domain {
+const struct iommu_domain_ops *ops
+const struct iommu_dirty_ops *dirty_ops // New field
+unsigned long pgsize_bitmap
+string "...other fields"
}
class iommu_ops {
+bool (*capable)(struct device *dev, enum iommu_cap cap)
+struct iommu_domain* (*domain_alloc_user)(struct device *dev, u32 flags) // New method
+struct iommu_domain* (*domain_alloc)(unsigned int type)
+string "...other existing methods"
}
class iommu_dirty_ops { // New struct
+int (*set_dirty_tracking)(struct iommu_domain *domain, bool enabled)
+int (*read_and_clear_dirty)(struct iommu_domain *domain, unsigned long iova, size_t size, unsigned long flags, struct iommu_dirty_bitmap *dirty)
}
class iommu_dirty_bitmap { // New struct
+struct iova_bitmap *bitmap
+struct iommu_iotlb_gather *gather
}
class iova_bitmap { // New utility struct
+iova_bitmap_alloc(unsigned long iova, size_t length, unsigned long page_size, u64 __user *data)
+iova_bitmap_free()
+iova_bitmap_for_each(void* opaque, iova_bitmap_fn_t fn)
+iova_bitmap_set(unsigned long iova, size_t length)
}
iommu_domain --> iommu_dirty_ops : uses (optional)
iommu_dirty_ops ..> iommu_dirty_bitmap : uses
iommu_dirty_bitmap ..> iova_bitmap : uses
iommu_ops ..> iommu_domain : creates/manages
Class Diagram: QAT Driver Structures for Live Migration
classDiagram
class adf_hw_device_data {
+char* dev_class_name
+struct adf_hw_csr_ops csr_ops
+struct qat_migdev_ops vfmig_ops // New field
+int (*bank_state_save)(struct adf_accel_dev*, u32, struct bank_state*) // New field
+int (*bank_state_restore)(struct adf_accel_dev*, u32, struct bank_state*) // New field
+string "...other fields"
}
class adf_hw_csr_ops {
+string "Existing CSR accessors (read/write_csr_ring_head etc.)"
+string "--- Expanded with many new members for detailed CSR access, e.g.: ---"
+u32 (*read_csr_stat)(void __iomem*, u32 bank)
+u32 (*read_csr_e_stat)(void __iomem*, u32 bank)
+dma_addr_t (*read_csr_ring_base)(void __iomem*, u32 bank, u32 ring)
+void (*write_csr_int_en)(void __iomem*, u32 bank, u32 value)
+u32 (*read_csr_int_col_ctl)(void __iomem*, u32 bank)
+u32 (*get_int_col_ctl_enable_mask)(void)
+string "(Consolidates previously static/macro CSR ops)"
}
class qat_migdev_ops { // New struct
+int (*init)(struct qat_mig_dev *mdev)
+void (*cleanup)(struct qat_mig_dev *mdev)
+int (*suspend)(struct qat_mig_dev *mdev)
+int (*resume)(struct qat_mig_dev *mdev)
+int (*save_state)(struct qat_mig_dev *mdev)
+int (*load_state)(struct qat_mig_dev *mdev)
+string "...other migration lifecycle ops"
}
class bank_state { // New struct for QAT bank state
+u32 ringstat0
+u32 ringuostat
+u32 ringestat
+u32 iaintflagen
+struct ring_config rings[ADF_ETR_MAX_RINGS_PER_BANK]
+string "...other saved CSR values"
}
class ring_config { // New struct for ring configuration within bank_state
+u64 base
+u32 config
+u32 head
+u32 tail
}
class adf_accel_vf_info {
+struct adf_accel_dev *accel_dev
+struct mutex pfvf_mig_lock // New field
+void *mig_priv // New field (e.g., for adf_gen4_vfmig)
+string "...other fields"
}
class adf_gen4_vfmig { // New struct for Gen4 VF migration state
+struct adf_mstate_mgr *mstate_mgr
+bool bank_stopped[]
}
adf_hw_device_data o-- adf_hw_csr_ops : uses
adf_hw_device_data o-- qat_migdev_ops : uses
adf_hw_device_data ..> bank_state : uses in bank_state_save/restore methods
bank_state o-- ring_config : contains
adf_accel_vf_info ..> adf_gen4_vfmig : mig_priv may point to
Pull Request Overview
This PR backports Intel QAT live migration support and adds IOMMUFD Dirty Tracking improvements to the in-tree QAT driver while aligning with upstream changes.
- Adds new state save/restore, bank drain, and coalesced timer functions for enhanced migration support.
- Introduces new CSR data files and updates macro names and function calls in gen4 and gen2 driver code.
- Integrates new migration operations into the driver and updates Makefile to include additional migration-related objects.
Reviewed Changes
Copilot reviewed 76 out of 76 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c | Assigns new max_sl_cnt field in TL data |
| drivers/crypto/intel/qat/qat_common/adf_gen4_pfvf.c | Updates offset macros to use new naming conventions |
| drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.h | Removes old transport access macros; adds new PF2VM/VM2PF macros and bank state functions' prototypes |
| drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.c | Refactors ring pair reset to use adf_get_etr_base(), and adds bank state handling functions |
| drivers/crypto/intel/qat/qat_common/adf_gen4_hw_csr_data.[hc] | New CSR definitions and inline functions for gen4 hardware data |
| drivers/crypto/intel/qat/qat_common/adf_accel_devices.h | Adds new migration operations structure and related members |
| Makefile and various qat_* files | Update build objects to include migration and state management sources |
Comments suppressed due to low confidence (3)
drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.h:122
- Ensure that the newly introduced PF2VM and corresponding VM2PF macros are fully aligned with the hardware specifications and that their usage is consistent across the driver. This change should be cross-checked against legacy definitions to prevent any potential conflicts.
#define ADF_GEN4_PF2VM_OFFSET(i) (0x40B010 + (i) * 0x20)
drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.c:228
- Verify that the call to adf_get_etr_base() returns the correct memory-mapped base address for the ETR bar, replacing the legacy GET_BARS() access; ensure this modification does not affect other parts of the driver that expect the previous behavior.
void __iomem *csr = adf_get_etr_base(accel_dev);
drivers/crypto/intel/qat/qat_common/adf_accel_devices.h:262
- [nitpick] Confirm that the new migration operations (qat_migdev_ops) and the accompanying synchronization via pfvf_mig_lock are correctly implemented and invoked during driver initialization to ensure thread-safe migration handling.
struct qat_migdev_ops {
Checkdepends: commit 2780025 commit 2780025 commit cf1e515 commit 8541323 commit 2760c51 commit ec61f82 commit 9560393 commit ffa3c79 commit 694a6f5 commit a5d8922 commit 9283b73
bugzilla: https://bugzilla.openanolis.cn/show_bug.cgi?id=9185
Intel® QuickAssist Technology (Intel® QAT) provides hardware acceleration for offloading security, authentication and compression services from the CPU, thus significantly increasing the performance and efficiency of standard platform solutions.
The Intel QAT in-tree driver is supported in PR2954.
This PR adds Intel QAT live migration support to the in-tree driver (51 commits in total).
Since IOMMUFD Dirty Tracking support can improve the performance of QAT live migration, it is also included in this PR.
Upstream commit list for IOMMUFD Dirty Tracking support from kernel v6.7 (31 commits in total):
e378c7d iommu/vt-d: Set variable intel_dirty_ops to static
9859418 iommufd/selftest: Fix _test_mock_dirty_bitmaps()
2e22aac iommufd/selftest: Fix page-size check in iommufd_test_dirty()
a2cdecd iommu/vt-d: Enhance capability check for nested parent domain allocation
0795b30 iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag
ae36fe7 iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO
a9af47e iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP
7adf267 iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY_TRACKING
266ce58 iommufd/selftest: Test IOMMU_HWPT_ALLOC_DIRTY_TRACKING
e04b23c iommufd/selftest: Expand mock_domain with dev_flags
f35f22c iommu/vt-d: Access/Dirty bit support for SS domains
421a511 iommu/amd: Access/Dirty bit support in IOPTEs
1342881 iommu/amd: Add domain_alloc_user based domain allocation
6098481 iommufd: Add a flag to skip clearing of IOPTE dirty
7623683 iommufd: Add capabilities to IOMMU_GET_HW_INFO
b9a60d6 iommufd: Add IOMMU_HWPT_GET_DIRTY_BITMAP
e2a4b29 iommufd: Add IOMMU_HWPT_SET_DIRTY_TRACKING
5f9bdbf iommufd: Add a flag to enforce dirty tracking on attach
b5f9e63 iommufd: Correct IOMMU_HWPT_ALLOC_NEST_PARENT description
750e2e9 iommu: Add iommu_domain ops for dirty tracking
13578d4 iommufd/iova_bitmap: Move symbols to IOMMUFD namespace
8c9c727 vfio: Move iova_bitmap into iommufd
53f0b02 vfio/iova_bitmap: Export more API symbols
266dcae iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly
c97d1b2 iommu/vt-d: Add domain_alloc_user op
4086636 iommufd/selftest: Add domain_alloc_user() support in iommu mock
bb812e0 iommufd/selftest: Iterate idev_ids in mock_domain's alloc_hwpt test
4ff5421 iommufd: Support allocating nested parent domain
89d6387 iommufd: Flow user flags for domain allocation to domain_alloc_user()
7975b72 iommufd: Use the domain_alloc_user() op for domain allocation
909f4ab iommu: Add new iommu op to create domains owned by userspace
One commit to enable IOMMUFD as a kernel module for IOMMUFD Dirty Tracking:
b852ef2d6d2c49c525719df8d517d784c0359385 x86: configs: Add kernel config required for IOMMUFD Dirty Tracking
(deepin: Skipped b852ef2d6d2c49c525719df8d517d784c0359385 x86: configs: Add kernel config required for IOMMUFD Dirty Tracking.)
Upstream commit list for QAT live migration from kernel v6.10 (10 commits in total):
bb20881 vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices
f0bbfc3 crypto: qat - implement interface for live migration
0fce55e crypto: qat - add interface for live migration
bbfdde7 crypto: qat - add bank save and restore flows
3fa1057 crypto: qat - expand CSR operations for QAT GEN4 devices
84058ff crypto: qat - rename get_sla_arr_of_type()
680302d crypto: qat - relocate CSR access code
867e801 crypto: qat - move PFVF compat checker to a function
1f8d6a1 crypto: qat - relocate and rename 4xxx PF2VM definitions
1894cb1 crypto: qat - adf_get_etr_base() helper
Upstream commit list for QAT bug fixes from kernel v6.10 (8 commits in total):
d3b17c6 crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak
a3dc1f2 crypto: qat - specify firmware files for 402xx
483fd65 crypto: qat - validate slices count returned by FW
d281a28 crypto: qat - improve error logging to be consistent across features
4a4fc6c crypto: qat - improve error message in adf_get_arbiter_mapping()
5d5bd24 crypto: qat - implement dh fallback for primes > 4K
f5c2cf9 crypto: qat - Fix spelling mistake "Invalide" -> "Invalid"
140e4c8 crypto: qat - Avoid -Wflex-array-member-not-at-end warnings
(deepin: Dropped d3b17c6 ("crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak") and a3dc1f2 ("crypto: qat - specify firmware files for 402xx") because they were already merged.)
One commit to enable QAT live migration as a kernel module:
6013cbb2826decd8fb5546764a5f401c7751e0d2 x86: configs: Add kernel config to support Intel QAT live migration
(deepin: Skipped 6013cbb2826decd8fb5546764a5f401c7751e0d2 x86: configs: Add kernel config to support Intel QAT live migration.)
Test
Tests pass on SPR, EMR, GNR and SRF platforms:
The kernel tools/testing/selftests/iommu test cases pass.
QAT Live migration succeeds with cpa_sample_code in qatlib running on guest.
Configs
CONFIG_IOMMUFD=m
CONFIG_QAT_VFIO_PCI=m
Summary by Sourcery
Add support for IOMMU dirty tracking and live migration for Intel QAT in-tree driver by backporting upstream commits and extending IOMMUFD, QAT common, and VFIO code. Introduce new ioctl interfaces, kernel domain dirty tracking ops, selftests, and a state manager for QAT VF migration.