[Deepin-Kernel-SIG] [linux 6.6-y] [Upstream] [Intel] Intel: Backport to support QuickAssist Technology(QAT) live migration for in-tree driver #837

Merged
opsiff merged 47 commits into deepin-community:linux-6.6.y from Avenger-285714:QAT_live_migration on Jun 3, 2025

Avenger-285714 commented Jun 2, 2025

bugzilla: https://bugzilla.openanolis.cn/show_bug.cgi?id=9185

Intel® QuickAssist Technology (Intel® QAT) provides hardware acceleration for offloading security, authentication and compression services from the CPU, thus significantly increasing the performance and efficiency of standard platform solutions.

The Intel QAT in-tree driver is supported in PR2954.
This PR adds Intel QAT live migration support for the in-tree driver (51 commits in total).
Because IOMMUFD Dirty Tracking support can improve the performance of QAT live migration, it is included in this PR.

Upstream commit list for IOMMUFD Dirty Tracking support from kernel v6.7 (31 commits in total):
e378c7d iommu/vt-d: Set variable intel_dirty_ops to static
9859418 iommufd/selftest: Fix _test_mock_dirty_bitmaps()
2e22aac iommufd/selftest: Fix page-size check in iommufd_test_dirty()
a2cdecd iommu/vt-d: Enhance capability check for nested parent domain allocation
0795b30 iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag
ae36fe7 iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO
a9af47e iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP
7adf267 iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY_TRACKING
266ce58 iommufd/selftest: Test IOMMU_HWPT_ALLOC_DIRTY_TRACKING
e04b23c iommufd/selftest: Expand mock_domain with dev_flags
f35f22c iommu/vt-d: Access/Dirty bit support for SS domains
421a511 iommu/amd: Access/Dirty bit support in IOPTEs
1342881 iommu/amd: Add domain_alloc_user based domain allocation
6098481 iommufd: Add a flag to skip clearing of IOPTE dirty
7623683 iommufd: Add capabilities to IOMMU_GET_HW_INFO
b9a60d6 iommufd: Add IOMMU_HWPT_GET_DIRTY_BITMAP
e2a4b29 iommufd: Add IOMMU_HWPT_SET_DIRTY_TRACKING
5f9bdbf iommufd: Add a flag to enforce dirty tracking on attach
b5f9e63 iommufd: Correct IOMMU_HWPT_ALLOC_NEST_PARENT description
750e2e9 iommu: Add iommu_domain ops for dirty tracking
13578d4 iommufd/iova_bitmap: Move symbols to IOMMUFD namespace
8c9c727 vfio: Move iova_bitmap into iommufd
53f0b02 vfio/iova_bitmap: Export more API symbols
266dcae iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly
c97d1b2 iommu/vt-d: Add domain_alloc_user op
4086636 iommufd/selftest: Add domain_alloc_user() support in iommu mock
bb812e0 iommufd/selftest: Iterate idev_ids in mock_domain's alloc_hwpt test
4ff5421 iommufd: Support allocating nested parent domain
89d6387 iommufd: Flow user flags for domain allocation to domain_alloc_user()
7975b72 iommufd: Use the domain_alloc_user() op for domain allocation
909f4ab iommu: Add new iommu op to create domains owned by userspace

One commit to enable IOMMUFD as kernel module for IOMMUFD Dirty Tracking:
b852ef2d6d2c49c525719df8d517d784c0359385 x86: configs: Add kernel config required for IOMMUFD Dirty Tracking

(deepin: Skipped b852ef2d6d2c49c525719df8d517d784c0359385 x86: configs: Add kernel config required for IOMMUFD Dirty Tracking.)

Upstream commit list for QAT live migration from kernel v6.10 (10 commits in total):
bb20881 vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices
f0bbfc3 crypto: qat - implement interface for live migration
0fce55e crypto: qat - add interface for live migration
bbfdde7 crypto: qat - add bank save and restore flows
3fa1057 crypto: qat - expand CSR operations for QAT GEN4 devices
84058ff crypto: qat - rename get_sla_arr_of_type()
680302d crypto: qat - relocate CSR access code
867e801 crypto: qat - move PFVF compat checker to a function
1f8d6a1 crypto: qat - relocate and rename 4xxx PF2VM definitions
1894cb1 crypto: qat - adf_get_etr_base() helper

Upstream commit list for QAT bug fixes from kernel v6.10 (8 commits in total):
d3b17c6 crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak
a3dc1f2 crypto: qat - specify firmware files for 402xx
483fd65 crypto: qat - validate slices count returned by FW
d281a28 crypto: qat - improve error logging to be consistent across features
4a4fc6c crypto: qat - improve error message in adf_get_arbiter_mapping()
5d5bd24 crypto: qat - implement dh fallback for primes > 4K
f5c2cf9 crypto: qat - Fix spelling mistake "Invalide" -> "Invalid"
140e4c8 crypto: qat - Avoid -Wflex-array-member-not-at-end warnings

(deepin: Dropped d3b17c6 ("crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak") and a3dc1f2 ("crypto: qat - specify firmware files for 402xx") because they were already merged.)

One commit to enable QAT live migration as kernel module:
6013cbb2826decd8fb5546764a5f401c7751e0d2 x86: configs: Add kernel config to support Intel QAT live migration

(deepin: Skipped 6013cbb2826decd8fb5546764a5f401c7751e0d2 x86: configs: Add kernel config to support Intel QAT live migration)

Test
Tests pass on SPR, EMR, GNR and SRF platforms:
The kernel tools/testing/selftests/iommu test cases pass.
QAT live migration succeeds with cpa_sample_code from qatlib running in the guest.

Configs
CONFIG_IOMMUFD=m
CONFIG_QAT_VFIO_PCI=m

Summary by Sourcery

Add support for IOMMU dirty tracking and live migration for Intel QAT in-tree driver by backporting upstream commits and extending IOMMUFD, QAT common, and VFIO code. Introduce new ioctl interfaces, kernel domain dirty tracking ops, selftests, and a state manager for QAT VF migration.

New Features:

  • Expose IOMMUFD dirty tracking via new ioctls to enable or disable hardware dirty tracking and retrieve dirty bitmaps
  • Introduce IOMMU domain dirty tracking operations in the kernel and in Intel/AMD IOMMU drivers
  • Add live migration support to the in-tree Intel QAT driver, including VF state save/restore and a new QAT VFIO PCI migration driver
  • Add user API flags for HW pagetable allocation to specify nesting and dirty tracking capabilities

Enhancements:

  • Refactor IOVA bitmap and IO pagetable code to integrate dirty tracking operations
  • Extract QAT Gen4 CSR operations into a dedicated source file and unify hardware base address access
  • Introduce a generic QAT migration state manager (adf_mstate_mgr) for hierarchical device state snapshots and restore sequences

Build:

  • Update Kconfig and Makefiles to include new dirty tracking and QAT migration modules (iova_bitmap, iommufd, qat_mig_dev, qat_vfio_pci)

Tests:

  • Extend IOMMUFD selftests to cover dirty tracking enable/disable and bitmap retrieval

yiliu1765 and others added 30 commits June 2, 2025 12:14
ANBZ: #9185

commit 909f4ab upstream.

Intel-SIG: commit 909f4ab iommu: Add new iommu op to create domains owned by userspace
Backport to support Intel QAT live migration for in-tree driver

Introduce a new iommu_domain op to create domains owned by userspace,
e.g. through IOMMUFD. These domains have a few different properties
compared to kernel owned domains:

 - They may be PAGING domains, but created with special parameters.
   For instance aperture size changes/number of levels, different
   IOPTE formats, or other things necessary to make a vIOMMU work

 - We have to track all the memory allocations with GFP_KERNEL_ACCOUNT
   to make the cgroup sandbox stronger

 - Device-specialty domains, such as NESTED domains can be created by
   IOMMUFD.

The new op clearly says the domain is being created by IOMMUFD, that the
domain is intended for userspace use, and it provides a way to pass user
flags or a driver specific uAPI structure to customize the created domain
to exactly what the vIOMMU userspace driver requires.

iommu drivers that cannot support VFIO/IOMMUFD should not support this
op. This includes any driver that cannot provide a fully functional PAGING
domain.

This new op for now is only supposed to be used by IOMMUFD, hence no
wrapper for it. IOMMUFD would call the callback directly. As for domain
free, IOMMUFD would use iommu_domain_free().

Link: https://lore.kernel.org/r/20230928071528.26258-2-yi.l.liu@intel.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 7975b72 upstream.

Intel-SIG: commit 7975b72 iommufd: Use the domain_alloc_user() op for domain allocation
Backport to support Intel QAT live migration for in-tree driver

Make IOMMUFD use iommu_domain_alloc_user() by default for iommu_domain
creation. IOMMUFD needs to support iommu_domain allocation with parameters
from userspace in nested support, and a driver is expected to implement
everything under this op.

If the iommu driver doesn't provide domain_alloc_user callback then
IOMMUFD falls back to use iommu_domain_alloc() with an UNMANAGED type if
possible.
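
The fallback order described in this commit message can be modelled in a few lines of userspace Python. This is only a sketch, not the kernel code: MockIommuOps and alloc_domain are hypothetical names, and domains are represented as plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MockIommuOps:
    # Hypothetical stand-in for a driver's op table; either callback may be absent.
    domain_alloc_user: Optional[Callable[[], str]] = None
    domain_alloc: Optional[Callable[[str], str]] = None


def alloc_domain(ops: MockIommuOps) -> str:
    """Prefer domain_alloc_user(); otherwise fall back to an UNMANAGED domain."""
    if ops.domain_alloc_user is not None:
        return ops.domain_alloc_user()
    if ops.domain_alloc is not None:
        return ops.domain_alloc("UNMANAGED")
    raise NotImplementedError("driver cannot provide a domain")


new_style = MockIommuOps(domain_alloc_user=lambda: "user-owned")
old_style = MockIommuOps(domain_alloc=lambda kind: "kernel-" + kind)
print(alloc_domain(new_style), alloc_domain(old_style))
```

A driver that offers both callbacks is served by domain_alloc_user() in this model, which matches the "by default" wording above.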

Link: https://lore.kernel.org/r/20230928071528.26258-3-yi.l.liu@intel.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 89d6387 upstream.

Intel-SIG: commit 89d6387 iommufd: Flow user flags for domain allocation to domain_alloc_user()
Backport to support Intel QAT live migration for in-tree driver

Extend iommufd_hw_pagetable_alloc() to accept user flags; the uAPI will
provide the flags.

Link: https://lore.kernel.org/r/20230928071528.26258-4-yi.l.liu@intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 4ff5421 upstream.

Intel-SIG: commit 4ff5421 iommufd: Support allocating nested parent domain
Backport to support Intel QAT live migration for in-tree driver

Extend IOMMU_HWPT_ALLOC to allocate domains to be used as parent (stage-2)
in nested translation.

Add IOMMU_HWPT_ALLOC_NEST_PARENT to the uAPI.

Link: https://lore.kernel.org/r/20230928071528.26258-5-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit bb812e0 upstream.

Intel-SIG: commit bb812e0 iommufd/selftest: Iterate idev_ids in mock_domain's alloc_hwpt test
Backport to support Intel QAT live migration for in-tree driver

The point in iterating variant->mock_domains is to test the idev_ids[0]
and idev_ids[1]. So use it instead of keeping testing idev_ids[0] only.

Link: https://lore.kernel.org/r/20230919011637.16483-1-nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 4086636 upstream.

Intel-SIG: commit 4086636 iommufd/selftest: Add domain_alloc_user() support in iommu mock
Backport to support Intel QAT live migration for in-tree driver

Add mock_domain_alloc_user() and a new test case for
IOMMU_HWPT_ALLOC_NEST_PARENT.

Link: https://lore.kernel.org/r/20230928071528.26258-6-yi.l.liu@intel.com
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit c97d1b2 upstream.

Intel-SIG: commit c97d1b2 iommu/vt-d: Add domain_alloc_user op
Backport to support Intel QAT live migration for in-tree driver

Add the domain_alloc_user() op implementation. It supports allocating
domains to be used as parent under nested translation.

Unlike other drivers VT-D uses only a single page table format so it only
needs to check if the HW can support nesting.

Link: https://lore.kernel.org/r/20230928071528.26258-7-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 266dcae upstream.

Intel-SIG: commit 266dcae iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly
Backport to support Intel QAT live migration for in-tree driver

TEST_LENGTH passing ".size = sizeof(struct _struct) - 1" expects -EINVAL
from "if (ucmd.user_size < op->min_size)" check in iommufd_fops_ioctl().
This has been working when min_size is exactly the size of the structure.

However, if the size of the structure becomes larger than min_size, i.e.
the passing size above is larger than min_size, that min_size sanity no
longer works.

Since the first test in TEST_LENGTH() was to test that min_size sanity
routine, rework it to support a min_size calculation, rather than using
the full size of the structure.
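
To see why the old test stopped exercising the check, here is a small Python model of the min_size comparison quoted above. The sizes (a 24-byte structure with a 16-byte min_size) and the helper name check_user_size are made up for illustration; only the comparison itself comes from the commit text, and any other validation the real ioctl path performs is not modelled.

```python
import errno


def check_user_size(user_size: int, min_size: int) -> int:
    """Model of 'if (ucmd.user_size < op->min_size)': 0 on pass, -EINVAL on fail."""
    return -errno.EINVAL if user_size < min_size else 0


# Hypothetical layout: the structure grew to 24 bytes, min_size stayed at 16.
STRUCT_SIZE = 24
MIN_SIZE = 16

old_probe = STRUCT_SIZE - 1   # what TEST_LENGTH used to pass
new_probe = MIN_SIZE - 1      # what the reworked test passes

print(check_user_size(old_probe, MIN_SIZE))  # the check no longer fires
print(check_user_size(new_probe, MIN_SIZE))  # the check fires again
```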

Link: https://lore.kernel.org/r/20231015074648.24185-1-nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 53f0b02 upstream.

Intel-SIG: commit 53f0b02 vfio/iova_bitmap: Export more API symbols
Backport to support Intel QAT live migration for in-tree driver

In preparation to move iova_bitmap into iommufd, export the rest of the API
symbols that could be used by modules, namely:

	iova_bitmap_alloc
	iova_bitmap_free
	iova_bitmap_for_each

Link: https://lore.kernel.org/r/20231024135109.73787-2-joao.m.martins@oracle.com
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 8c9c727 upstream.

Intel-SIG: commit 8c9c727 vfio: Move iova_bitmap into iommufd
Backport to support Intel QAT live migration for in-tree driver

Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
the user bitmaps, so move the common dependency into IOMMUFD.  In doing
so, create the symbol IOMMUFD_DRIVER which designates the builtin code that
will be used by drivers when selected. Today this means MLX5_VFIO_PCI and
PDS_VFIO_PCI. IOMMU drivers will do the same (in future patches) when
supporting dirty tracking and select IOMMUFD_DRIVER accordingly.

Given that the symbol may be disabled, add header definitions in
iova_bitmap.h for when IOMMUFD_DRIVER=n

Link: https://lore.kernel.org/r/20231024135109.73787-3-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 13578d4 upstream.

Intel-SIG: commit 13578d4 iommufd/iova_bitmap: Move symbols to IOMMUFD namespace
Backport to support Intel QAT live migration for in-tree driver

Have the IOVA bitmap exported symbols adhere to the IOMMUFD symbol
export convention i.e. using the IOMMUFD namespace. In doing so,
import the namespace in the current users. This means VFIO and the
vfio-pci drivers that use iova_bitmap_set().

Link: https://lore.kernel.org/r/20231024135109.73787-4-joao.m.martins@oracle.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 750e2e9 upstream.

Intel-SIG: commit 750e2e9 iommu: Add iommu_domain ops for dirty tracking
Backport to support Intel QAT live migration for in-tree driver

Add to iommu domain operations a set of callbacks to perform dirty
tracking, particularly to start and stop tracking and to read and clear the
dirty data.

Drivers are generally expected to dynamically change their translation
structures to toggle the tracking and flush some form of control state
structure that stands in the IOVA translation path. Though it's not
mandatory, as drivers can also enable dirty tracking at boot, and just
clear the dirty bits before setting dirty tracking. For each of the newly
added IOMMU core APIs:

iommu_cap::IOMMU_CAP_DIRTY_TRACKING: new device iommu_capable value when
probing for capabilities of the device.

.set_dirty_tracking(): an iommu driver is expected to change its
translation structures and enable dirty tracking for the devices in the
iommu_domain. For drivers making dirty tracking always-enabled, it should
just return 0.

.read_and_clear_dirty(): an iommu driver is expected to walk the pagetables
for the iova range passed in and use iommu_dirty_bitmap_record() to record
dirty info per IOVA. When detecting that a given IOVA is dirty it should
also clear its dirty state from the PTE, *unless* the flag
IOMMU_DIRTY_NO_CLEAR is passed in -- flushing is steered from the caller of
the domain_op via iotlb_gather. The iommu core APIs use the same data
structure in use for dirty tracking for VFIO device dirty (struct
iova_bitmap) abstracted by iommu_dirty_bitmap_record() helper function.
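
The read-and-clear contract above can be illustrated with a toy page table in Python. This is a sketch under stated assumptions: the page table is a plain dict from IOVA to a dirty flag, the function returns a list instead of filling a struct iova_bitmap, IOTLB flushing is not modelled, and the numeric value given to IOMMU_DIRTY_NO_CLEAR here is illustrative.

```python
IOMMU_DIRTY_NO_CLEAR = 1 << 0  # flag name from the commit text; value is an assumption


def read_and_clear_dirty(ptes: dict, iova: int, size: int,
                         page_size: int, flags: int = 0) -> list:
    """Walk [iova, iova + size) and report dirty pages.

    Dirty state is cleared as it is reported unless IOMMU_DIRTY_NO_CLEAR is set.
    """
    dirty = []
    for addr in range(iova, iova + size, page_size):
        if ptes.get(addr, False):
            dirty.append(addr)  # stands in for iommu_dirty_bitmap_record()
            if not (flags & IOMMU_DIRTY_NO_CLEAR):
                ptes[addr] = False
    return dirty


ptes = {0x0000: True, 0x1000: False, 0x2000: True}
peek = read_and_clear_dirty(ptes, 0, 0x3000, 0x1000, IOMMU_DIRTY_NO_CLEAR)
first = read_and_clear_dirty(ptes, 0, 0x3000, 0x1000)
second = read_and_clear_dirty(ptes, 0, 0x3000, 0x1000)
print(peek, first, second)
```

The peek leaves the table untouched, the first normal read reports the same pages and clears them, and the second normal read finds nothing.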

domain::dirty_ops: IOMMU domains will store the dirty ops depending on
whether the iommu device supports dirty tracking or not. iommu drivers can
then use this field to figure out whether dirty tracking is supported+enforced
on attach. The enforcement is enabled via domain_alloc_user(), which is done
via an IOMMUFD hwpt flag introduced later.

Link: https://lore.kernel.org/r/20231024135109.73787-5-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit b5f9e63 upstream.

Intel-SIG: commit b5f9e63 iommufd: Correct IOMMU_HWPT_ALLOC_NEST_PARENT description
Backport to support Intel QAT live migration for in-tree driver

The IOMMU_HWPT_ALLOC_NEST_PARENT flag is used to allocate a HWPT. Though
a HWPT holds a domain in the core structure, it is still quite confusing
to describe it using "domain" in the uAPI kdoc. Correct it to "HWPT".

Fixes: 4ff5421 ("iommufd: Support allocating nested parent domain")
Link: https://lore.kernel.org/r/20231017181552.12667-1-nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 5f9bdbf upstream.

Intel-SIG: commit 5f9bdbf iommufd: Add a flag to enforce dirty tracking on attach
Backport to support Intel QAT live migration for in-tree driver

Throughout the lifetime of an IOMMU domain that wants to use dirty tracking,
some guarantees are needed such that any device attached to the iommu_domain
supports dirty tracking.

The idea is to handle a case where the IOMMUs in the system are asymmetric
feature-wise and thus the capability may not be supported for all devices.
The enforcement is done by adding a flag into HWPT_ALLOC namely:

	IOMMU_HWPT_ALLOC_DIRTY_TRACKING

... passed in the HWPT_ALLOC ioctl() flags. The enforcement is done by creating
an iommu_domain via domain_alloc_user() and validating the requested flags
against what the device IOMMU advertises as supported (and failing accordingly).
Advertising the new IOMMU domain feature flag requires that the individual
iommu driver capability is supported when a future device attachment
happens.
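
A minimal Python model of this enforcement, assuming each device is reduced to a single "its IOMMU can do dirty tracking" boolean. Hwpt, hwpt_alloc and hwpt_attach are hypothetical names, and the flag value is taken from memory of the uAPI header, so treat it as an assumption.

```python
from typing import Optional

IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1  # assumed value, for illustration only


class Hwpt:
    """Toy hardware page table that remembers whether dirty tracking is enforced."""

    def __init__(self, flags: int) -> None:
        self.enforce_dirty = bool(flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING)


def hwpt_alloc(dev_dirty_capable: bool, flags: int) -> Optional[Hwpt]:
    """Fail allocation (None) if dirty tracking is requested but not supported."""
    if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) and not dev_dirty_capable:
        return None
    return Hwpt(flags)


def hwpt_attach(hwpt: Hwpt, dev_dirty_capable: bool) -> bool:
    """A later attach succeeds only if the device meets the enforced capability."""
    return dev_dirty_capable or not hwpt.enforce_dirty


hwpt = hwpt_alloc(True, IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
print(hwpt_attach(hwpt, True), hwpt_attach(hwpt, False))
```

In this model a domain allocated without the flag accepts any device, while one allocated with it rejects a device behind an IOMMU that lacks the capability.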

Link: https://lore.kernel.org/r/20231024135109.73787-6-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit e2a4b29 upstream.

Intel-SIG: commit e2a4b29 iommufd: Add IOMMU_HWPT_SET_DIRTY_TRACKING
Backport to support Intel QAT live migration for in-tree driver

Every IOMMU driver should be able to implement the needed iommu domain ops
to control dirty tracking.

Connect a hw_pagetable to the IOMMU core dirty tracking ops, specifically
the ability to enable/disable dirty tracking on an IOMMU domain
(hw_pagetable id). To that end add an io_pagetable kernel API to toggle
dirty tracking:

* iopt_set_dirty_tracking(iopt, [domain], state)

The intended caller of this is via the hw_pagetable object that is created.

Internally it will ensure the leftover dirty state is cleared /right
before/ dirty tracking starts. This is also useful for iommu drivers which
may decide that dirty tracking is always-enabled at boot without wanting to
toggle dynamically via corresponding iommu domain op.
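
The "clear leftover state right before tracking starts" behaviour can be sketched as follows. This is a toy model, not iopt_set_dirty_tracking() itself: the domain is a hypothetical dict holding a tracking flag and a per-IOVA dirty map.

```python
def set_dirty_tracking(domain: dict, enable: bool) -> None:
    """Toggle tracking; when enabling, wipe stale dirty bits first."""
    if enable:
        for iova in domain["dirty"]:
            domain["dirty"][iova] = False  # leftover state must not leak into the new run
    domain["tracking"] = enable


# Hypothetical domain whose hardware left a dirty bit set before tracking was requested.
domain = {"tracking": False, "dirty": {0x0000: True, 0x1000: False}}
set_dirty_tracking(domain, True)
print(domain)
```

This also covers the always-enabled case mentioned above: even if the hardware has been setting dirty bits since boot, userspace only sees bits set after it asked for tracking.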

Link: https://lore.kernel.org/r/20231024135109.73787-7-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit b9a60d6 upstream.

Intel-SIG: commit b9a60d6 iommufd: Add IOMMU_HWPT_GET_DIRTY_BITMAP
Backport to support Intel QAT live migration for in-tree driver

Connect a hw_pagetable to the IOMMU core dirty tracking
read_and_clear_dirty iommu domain op. It exposes all of the functionality
for the UAPI that reads the dirtied IOVAs while clearing the Dirty bits from
the PTEs.

In doing so, add an IO pagetable API iopt_read_and_clear_dirty_data() that
performs the reading of dirty IOPTEs for a given IOVA range and then
copying back to userspace bitmap.

Underneath it uses the IOMMU domain kernel API which will read the dirty
bits, as well as atomically clearing the IOPTE dirty bit and flushing the
IOTLB at the end. The IOVA bitmaps usage takes care of the iteration of the
bitmaps user pages efficiently and without copies. Within the iterator
function we iterate over io-pagetable contiguous areas that have been
mapped.

Contrary to past incarnations of a similar interface in VFIO, the IOVA range
to be scanned is tied in to the bitmap size, thus the application needs to
pass an appropriately sized bitmap address taking into account the iova
range being passed *and* page size ... as opposed to allowing bitmap-iova
!= iova.
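
As a worked example of that sizing rule, the following Python helper computes how large a bitmap an application would need for a given IOVA range and page size, at one bit per page. The helper name is hypothetical, and rounding the result up to whole 64-bit words is an assumption made here for illustration rather than something stated in the commit text.

```python
def dirty_bitmap_bytes(length: int, page_size: int) -> int:
    """Bytes needed for one dirty bit per page, rounded up to 64-bit words."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page_size must be a power of two")
    pages = -(-length // page_size)   # ceiling division
    words = -(-pages // 64)           # assumed 64-bit granularity
    return words * 8


# 1 GiB of IOVA space tracked at 4 KiB granularity.
print(dirty_bitmap_bytes(1 << 30, 4096))
# The same range tracked at 2 MiB granularity needs a far smaller bitmap.
print(dirty_bitmap_bytes(1 << 30, 2 << 20))
```

For 1 GiB at 4 KiB pages this gives 262144 bits, i.e. 32768 bytes; at 2 MiB pages it gives 512 bits, i.e. 64 bytes.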

Link: https://lore.kernel.org/r/20231024135109.73787-8-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 7623683 upstream.

Intel-SIG: commit 7623683 iommufd: Add capabilities to IOMMU_GET_HW_INFO
Backport to support Intel QAT live migration for in-tree driver

Extend IOMMUFD_CMD_GET_HW_INFO op to query generic iommu capabilities for a
given device.

Capabilities are IOMMU agnostic and use device_iommu_capable() API passing
one of the IOMMU_CAP_*. Enumerate IOMMU_CAP_DIRTY_TRACKING for now in the
out_capabilities field returned back to userspace.

Link: https://lore.kernel.org/r/20231024135109.73787-9-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 6098481 upstream.

Intel-SIG: commit 6098481 iommufd: Add a flag to skip clearing of IOPTE dirty
Backport to support Intel QAT live migration for in-tree driver

VFIO has an operation where it unmaps an IOVA while returning a bitmap with
the dirty data. In reality the operation doesn't quite query the IO
pagetables for whether the PTE was dirty or not. Instead it marks as dirty
anything that was mapped, and does so in one syscall.

In IOMMUFD the equivalent is done in two operations by querying with
GET_DIRTY_IOVA followed by UNMAP_IOVA. However, this would incur two TLB
flushes given that after clearing dirty bits IOMMU implementations require
invalidating their IOTLB, plus another invalidation needed for the UNMAP.
To allow dirty bits to be queried faster, add a flag
(IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR) that requests to not clear the dirty
bits from the PTE (but just reading them), under the expectation that the
next operation is the unmap. An alternative is to unmap and just
perpetually mark as dirty as that's the same behaviour as today. So here
equivalent functionality can be provided with unmap alone, and if real dirty
info is required it will amortize the cost while querying.

There's still a race against DMA where in theory the unmap of the IOVA
(when the guest invalidates the IOTLB via emulated iommu) would race
against the VF performing DMA on the same IOVA. As discussed in [0], we are
accepting to resolve this race as throwing away the DMA and it doesn't
matter if it hit physical DRAM or not, the VM can't tell if we threw it
away because the DMA was blocked or because we failed to copy the DRAM.

[0] https://lore.kernel.org/linux-iommu/20220502185239.GR8364@nvidia.com/

Link: https://lore.kernel.org/r/20231024135109.73787-10-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 1342881 upstream.

Intel-SIG: commit 1342881 iommu/amd: Add domain_alloc_user based domain allocation
Backport to support Intel QAT live migration for in-tree driver

Add the domain_alloc_user op implementation. To that end, refactor
amd_iommu_domain_alloc() to receive a dev pointer and flags, while renaming
it too, such that it becomes a common function shared with
domain_alloc_user() implementation. The sole difference with
domain_alloc_user() is that we initialize also other fields that
iommu_domain_alloc() does. It lets it return the iommu domain correctly
initialized in one function.

This is in preparation to add dirty enforcement on AMD implementation of
domain_alloc_user.

Link: https://lore.kernel.org/r/20231024135109.73787-11-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 421a511 upstream.

Intel-SIG: commit 421a511 iommu/amd: Access/Dirty bit support in IOPTEs
Backport to support Intel QAT live migration for in-tree driver

IOMMU advertises Access/Dirty bits if the extended feature register reports
it. Relevant AMD IOMMU SDM ref[0] "1.3.8 Enhanced Support for Access and
Dirty Bits"

To enable it set the DTE flag in bits 7 and 8 to enable access, or
access+dirty. With that, the IOMMU starts marking the D and A flags on
every Memory Request or ATS translation request. It is on the VMM side to
steer whether to enable dirty tracking or not, rather than wrongly doing in
IOMMU. Relevant AMD IOMMU SDM ref [0], "Table 7. Device Table Entry (DTE)
Field Definitions" particularly the entry "HAD".

To actually toggle it on and off is relatively simple, as it's setting 2 bits
on the DTE and flushing the device DTE cache.
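
The two-bit toggle can be written out as plain bit arithmetic. This Python sketch assumes that setting both bits 7 and 8 selects access+dirty and that clearing both disables the feature; the constant and function names are hypothetical, the DTE is reduced to a single integer word, and the DTE cache flush is not modelled.

```python
DTE_HAD_SHIFT = 7                       # bit position taken from the commit text
DTE_HAD_MASK = 0b11 << DTE_HAD_SHIFT    # bits 7 and 8


def dte_set_dirty_tracking(dte_word: int, enable: bool) -> int:
    """Return the DTE word with both HAD bits set (enable) or cleared (disable)."""
    if enable:
        return dte_word | DTE_HAD_MASK
    return dte_word & ~DTE_HAD_MASK


enabled = dte_set_dirty_tracking(0x0, True)
disabled = dte_set_dirty_tracking(0xFFFF, False)
print(hex(enabled), hex(disabled))
```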

To get what's dirtied use existing AMD io-pgtable support, by walking the
pagetables over each IOVA, with fetch_pte().  The IOTLB flushing is left to
the caller (much like unmap), and iommu_dirty_bitmap_record() is the one
adding page-ranges to invalidate. This allows caller to batch the flush
over a big span of IOVA space, without the iommu wondering about when to
flush.

Worthwhile sections from AMD IOMMU SDM:

"2.2.3.1 Host Access Support"
"2.2.3.2 Host Dirty Support"

For details on how IOMMU hardware updates the dirty bit, and what it expects
from its subsequent clearing by the CPU, see:

"2.2.7.4 Updating Accessed and Dirty Bits in the Guest Address Tables"
"2.2.7.5 Clearing Accessed and Dirty Bits"

Quoting the SDM:

"The setting of accessed and dirty status bits in the page tables is
visible to both the CPU and the peripheral when sharing guest page tables.
The IOMMU interlocked operations to update A and D bits must be 64-bit
operations and naturally aligned on a 64-bit boundary"

... and for the IOMMU update sequence of the Dirty bit, it essentially states:

1. Decodes the read and write intent from the memory access.
2. If P=0 in the page descriptor, fail the access.
3. Compare the A & D bits in the descriptor with the read and write
   intent in the request.
4. If the A or D bits need to be updated in the descriptor:
   * Start atomic operation.
   * Read the descriptor as a 64-bit access.
   * If the descriptor no longer appears to require an update, release the
     atomic lock with no further action and continue to step 5.
   * Calculate the new A & D bits.
   * Write the descriptor as a 64-bit access.
   * End atomic operation.
5. Continue to the next stage of translation or to the memory access.
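
The five steps can be traced with a small software model. This Python sketch is only an illustration of the sequence: the bit positions chosen for P, A and D are hypothetical, and a threading.Lock stands in for the hardware's interlocked 64-bit operation.

```python
import threading

# Hypothetical bit positions, chosen only for this model.
P_BIT, A_BIT, D_BIT = 1 << 0, 1 << 5, 1 << 6
_atomic = threading.Lock()


def iommu_access(table: list, index: int, is_write: bool) -> bool:
    """Model one memory access; returns False if the access faults."""
    desc = table[index]
    if not (desc & P_BIT):                            # step 2: not present
        return False
    want = A_BIT | (D_BIT if is_write else 0)         # steps 1 and 3
    if (desc & want) != want:                         # step 4: update needed
        with _atomic:                                 # start atomic operation
            desc = table[index]                       # re-read the descriptor
            if (desc & want) != want:                 # still needs an update?
                table[index] = desc | want            # write new A and D bits
    return True                                       # step 5: continue


table = [P_BIT]
iommu_access(table, 0, is_write=False)
after_read = table[0]
iommu_access(table, 0, is_write=True)
print(bin(after_read), bin(table[0]))
```

A read sets only the accessed bit; a later write to the same page additionally sets the dirty bit, and a non-present descriptor is left unmodified.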

Access/Dirty bits readout also needs to consider the non-default page-sizes
(aka replicated PTEs as mentioned by the manual), as AMD supports all powers of
two (except 512G) page sizes.

Select IOMMUFD_DRIVER only if IOMMUFD is enabled considering that IOMMU
dirty tracking requires IOMMUFD.

Link: https://lore.kernel.org/r/20231024135109.73787-12-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit f35f22c upstream.

Intel-SIG: commit f35f22c iommu/vt-d: Access/Dirty bit support for SS domains
Backport to support Intel QAT live migration for in-tree driver

IOMMU advertises Access/Dirty bits for second-stage page table if the
extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS).
The first stage table is compatible with CPU page table thus A/D bits are
implicitly supported. Relevant Intel IOMMU SDM ref for first stage table
"3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second stage table
"3.7.2 Accessed and Dirty Flags".

The first-stage page table is enabled by default, so setting dirty tracking
is allowed and no control bits are needed; it just returns 0. To use SSADS,
set bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB
via pasid_flush_caches() following the manual. Relevant SDM refs:

"3.7.2 Accessed and Dirty Flags"
"6.5.3.3 Guidance to Software for Invalidations,
 Table 23. Guidance to Software for Invalidations"

The PTE dirty bit is located in bit 9 and is cached in the IOTLB, so flush
the IOTLB to make sure the IOMMU attempts to set the dirty bit again. Note
that iommu_dirty_bitmap_record() will add the IOVA to iotlb_gather and thus
the caller of the iommu op will flush the IOTLB. The relevant part of the
manual on hardware translation is chapter 6, with special mention of:

"6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations"
"6.2.4 IOTLB"
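As an illustration of the read-and-clear step on second-stage PTEs, here is a minimal user-space sketch. It assumes only what the text states, namely that the dirty flag is bit 9; the helper name `read_and_clear_dirty()`, the flat PTE array and the plain bitmap are hypothetical simplifications, and the sketch ignores the atomicity a real driver needs when hardware updates the same entries concurrently.

```c
#include <stddef.h>
#include <stdint.h>

#define SS_PTE_DIRTY (1ULL << 9) /* bit 9, per the commit text */

/*
 * Hypothetical helper: scan npte PTEs, set bit i of bitmap for every
 * dirty PTE i and clear its dirty bit. The return value is the number
 * of PTEs that were cleared, i.e. how many translations the caller
 * would have to invalidate in the IOTLB afterwards.
 */
static size_t read_and_clear_dirty(uint64_t *ptes, size_t npte,
				   uint64_t *bitmap)
{
	size_t cleared = 0;

	for (size_t i = 0; i < npte; i++) {
		if (!(ptes[i] & SS_PTE_DIRTY))
			continue;
		bitmap[i / 64] |= 1ULL << (i % 64);
		ptes[i] &= ~SS_PTE_DIRTY;
		cleared++;
	}
	return cleared;
}
```

A non-zero return value corresponds to the case where the IOTLB must be flushed so that the next write sets the dirty bit again.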

Select IOMMUFD_DRIVER only if IOMMUFD is enabled, given that IOMMU dirty
tracking requires IOMMUFD.

Link: https://lore.kernel.org/r/20231024135109.73787-13-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit e04b23c upstream.

Intel-SIG: commit e04b23c iommufd/selftest: Expand mock_domain with dev_flags
Backport to support Intel QAT live migration for in-tree driver

Expand the mock_domain test to be able to manipulate the device capabilities.
This allows testing with a mockdev that does not advertise dirty tracking
support, and thus makes sure the enforce_dirty test behaves as expected.

To avoid breaking the IOMMUFD_TEST UABI, replicate the mock_domain struct
and add an input dev_flags at the end.

Link: https://lore.kernel.org/r/20231024135109.73787-14-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 266ce58 upstream.

Intel-SIG: commit 266ce58 iommufd/selftest: Test IOMMU_HWPT_ALLOC_DIRTY_TRACKING
Backport to support Intel QAT live migration for in-tree driver

In order to selftest the iommu domain dirty enforcing, implement the
necessary mock_domain support and add a new dev_flags to test that
hwpt_alloc/attach_device fails as expected.

Expand the existing mock_domain fixture with an enforce_dirty test that
exercises the hwpt_alloc and device attachment.

Link: https://lore.kernel.org/r/20231024135109.73787-15-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 7adf267 upstream.

Intel-SIG: commit 7adf267 iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY_TRACKING
Backport to support Intel QAT live migration for in-tree driver

Change mock_domain to support dirty tracking and add tests to exercise
the new SET_DIRTY_TRACKING API in the iommufd_dirty_tracking selftest
fixture.

Link: https://lore.kernel.org/r/20231024135109.73787-16-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit a9af47e upstream.

Intel-SIG: commit a9af47e iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP
Backport to support Intel QAT live migration for in-tree driver

Add a new test ioctl for simulating the dirty IOVAs in the mock domain, and
implement the mock iommu domain ops that get the dirty tracking supported.

The selftest exercises the usual main workflow of:

1) Set dirty tracking on the iommu domain
2) Read and clear dirty IOPTEs

Different fixtures test different IOVA range sizes, which exercise
corner cases of the bitmaps.

Link: https://lore.kernel.org/r/20231024135109.73787-17-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit ae36fe7 upstream.

Intel-SIG: commit ae36fe7 iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO
Backport to support Intel QAT live migration for in-tree driver

Enumerate the capabilities from the mock device and test whether it
advertises as expected. Include it as part of the iommufd_dirty_tracking
fixture.

Link: https://lore.kernel.org/r/20231024135109.73787-18-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 0795b30 upstream.

Intel-SIG: commit 0795b30 iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag
Backport to support Intel QAT live migration for in-tree driver

Change test_mock_dirty_bitmaps() to take a flag that specifies the flag
under test. The test does the same thing as the regular GET_DIRTY_BITMAP
test, except that it checks whether the dirtied bits are fetched all the
same a second time, as opposed to observing them cleared.
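The difference between the regular read and the NO_CLEAR variant can be shown with a tiny model. This is a sketch only: `struct mock_domain` and `mock_get_dirty()` below are hypothetical stand-ins invented for illustration and are not the selftest's actual mock domain or ioctl wrapper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mock: one bit per page, a set bit means "dirty". */
struct mock_domain {
	uint64_t dirty;
};

/*
 * Report the dirty bits. With no_clear set the state is left intact,
 * so a second call observes the same bits; otherwise the bits are
 * cleared once they have been reported.
 */
static uint64_t mock_get_dirty(struct mock_domain *d, bool no_clear)
{
	uint64_t seen = d->dirty;

	if (!no_clear)
		d->dirty = 0;
	return seen;
}
```

Two consecutive calls with `no_clear` return identical bitmaps, whereas a call without it leaves nothing for the next caller to see.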

Link: https://lore.kernel.org/r/20231024135109.73787-19-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit a2cdecd upstream.

Intel-SIG: commit a2cdecd iommu/vt-d: Enhance capability check for nested parent domain allocation
Backport to support Intel QAT live migration for in-tree driver

This adds the scalable mode check before allocating the nested parent domain
as checking nested capability is not enough. User may turn off scalable mode
which also means no nested support even if the hardware supports it.

Fixes: c97d1b2 ("iommu/vt-d: Add domain_alloc_user op")
Link: https://lore.kernel.org/r/20231024150011.44642-1-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 2e22aac upstream.

Intel-SIG: commit 2e22aac iommufd/selftest: Fix page-size check in iommufd_test_dirty()
Backport to support Intel QAT live migration for in-tree driver

iommufd_test_dirty()/IOMMU_TEST_OP_DIRTY sets the dirty bits in the mock
domain implementation that the userspace side validates against what it
obtains via the UAPI.

However, when iommufd_test_dirty() was introduced, it did not validate
page_size against 0, leading to two possible divide-by-zero problems: one
at the beginning when calculating @max, and another while calculating the
IOVA in the XArray PFN tracking list.

While at it, validate the length to require non-zero value as well, as we
can't be allocating a 0-sized bitmap.
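The kind of guard described above can be sketched as follows. The helper `dirty_args_to_npages()` is a hypothetical name used only for illustration; it shows the two zero checks the text mentions placed ahead of the division, and does not reproduce the actual selftest code.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical validation helper: reject a zero page size (which would
 * divide by zero below) and a zero length (which would ask for a
 * zero-sized bitmap) before computing the number of pages.
 */
static int dirty_args_to_npages(uint64_t length, uint64_t page_size,
				uint64_t *npages)
{
	if (!page_size || !length)
		return -EINVAL;

	*npages = length / page_size;
	return 0;
}
```

Rejecting both values up front keeps every later division and allocation on a well-defined path.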

Link: https://lore.kernel.org/r/20231030113446.7056-1-joao.m.martins@oracle.com
Reported-by: syzbot+25dc7383c30ecdc83c38@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-iommu/00000000000005f6aa0608b9220f@google.com/
Fixes: a9af47e ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 9859418 upstream.

Intel-SIG: commit 9859418 iommufd/selftest: Fix _test_mock_dirty_bitmaps()
Backport to support Intel QAT live migration for in-tree driver

The ASSERT_EQ() macro sneakily expands to two statements, so the loop here
needs braces to ensure it captures both and actually terminates the test
upon failure. Where these tests are currently failing on my arm64 machine,
this reduces the number of logged lines from a rather unreasonable
~197,000 down to 10. While we're at it, we can also clean up the
tautologous "count" calculations whose assertions can never fail unless
mathematics and/or the C language become fundamentally broken.
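The pitfall with a macro that expands to two statements can be reproduced in a few lines. The `CHECK_EQ()` macro below is a hypothetical stand-in written for this sketch, not the kselftest definition of ASSERT_EQ(); it only mimics the "log, then bail out" shape described above.

```c
#include <stdbool.h>

struct state {
	int logged;	/* number of failure lines "printed" */
	bool failed;
};

/* Hypothetical macro that expands to TWO statements: log, then bail. */
#define CHECK_EQ(st, a, b) \
	if ((a) != (b)) { (st)->logged++; (st)->failed = true; } \
	if ((st)->failed) return (st)->logged

/* Without braces only the first statement is the loop body. */
static int scan_unbraced(struct state *st, const int *v, int n)
{
	for (int i = 0; i < n; i++)
		CHECK_EQ(st, v[i], 0);
	return st->logged;
}

/* With braces both statements run per iteration and the loop stops early. */
static int scan_braced(struct state *st, const int *v, int n)
{
	for (int i = 0; i < n; i++) {
		CHECK_EQ(st, v[i], 0);
	}
	return st->logged;
}
```

For an input with several mismatching elements, the unbraced loop logs every one of them before bailing out, while the braced loop stops after the first.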

Fixes: a9af47e ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP")
Link: https://lore.kernel.org/r/90e083045243ef407dd592bb1deec89cd1f4ddf2.1700153535.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
KunWuChan and others added 17 commits June 2, 2025 12:15
ANBZ: #9185

commit e378c7d upstream.

Intel-SIG: commit e378c7d iommu/vt-d: Set variable intel_dirty_ops to static
Backport to support Intel QAT live migration for in-tree driver

Fix the following warning:
drivers/iommu/intel/iommu.c:302:30: warning: symbol
 'intel_dirty_ops' was not declared. Should it be static?

This variable is only used in its defining file, so it should be static.

Fixes: f35f22c ("iommu/vt-d: Access/Dirty bit support for SS domains")
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20231120101025.1103404-1-chentao@kylinos.cn
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 1894cb1 upstream.

Intel-SIG: commit 1894cb1 crypto: qat - adf_get_etr_base() helper
Backport to support Intel QAT live migration for in-tree driver

Add and use the new helper function adf_get_etr_base() which retrieves
the virtual address of the ring bar.

This will be used extensively when adding support for Live Migration.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 1f8d6a1 upstream.

Intel-SIG: commit 1f8d6a1 crypto: qat - relocate and rename 4xxx PF2VM definitions
Backport to support Intel QAT live migration for in-tree driver

Move and rename ADF_4XXX_PF2VM_OFFSET and ADF_4XXX_VM2PF_OFFSET to
ADF_GEN4_PF2VM_OFFSET and ADF_GEN4_VM2PF_OFFSET respectively.
These definitions are moved from adf_gen4_pfvf.c to adf_gen4_hw_data.h
as they are specific to GEN4 and not just to qat_4xxx.

This change is made in anticipation of their use in live migration.

This does not introduce any functional change.

Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 867e801 upstream.

Intel-SIG: commit 867e801 crypto: qat - move PFVF compat checker to a function
Backport to support Intel QAT live migration for in-tree driver

Move the code that implements VF version compatibility on the PF side to
a separate function so that it can be reused when doing VM live
migration.

This does not introduce any functional change.

Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 680302d upstream.

Intel-SIG: commit 680302d crypto: qat - relocate CSR access code
Backport to support Intel QAT live migration for in-tree driver

As the common hw_data files are growing and the adf_hw_csr_ops is going
to be extended with new operations, move all logic related to ring CSRs
to the newly created adf_gen[2|4]_hw_csr_data.[c|h] files.

This does not introduce any functional change.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 84058ff upstream.

Intel-SIG: commit 84058ff crypto: qat - rename get_sla_arr_of_type()
Backport to support Intel QAT live migration for in-tree driver

The function get_sla_arr_of_type() returns a pointer to an SLA type
specific array.
Rename it and expose it as it will be used externally to this module.

This does not introduce any functional change.

Signed-off-by: Siming Wan <siming.wan@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 3fa1057 upstream.

Intel-SIG: commit 3fa1057 crypto: qat - expand CSR operations for QAT GEN4 devices
Backport to support Intel QAT live migration for in-tree driver

Extend the CSR operations for QAT GEN4 devices to allow saving and
restoring the rings state.

The new operations will be used as a building block for implementing the
state save and restore of Virtual Functions necessary for VM live
migration.

This adds the following operations:
 - read ring status register
 - read ring underflow/overflow status register
 - read ring nearly empty status register
 - read ring nearly full status register
 - read ring full status register
 - read ring complete status register
 - read ring exception status register
 - read/write ring exception interrupt mask register
 - read ring configuration register
 - read ring base register
 - read/write ring interrupt enable register
 - read ring interrupt flag register
 - read/write ring interrupt source select register
 - read ring coalesced interrupt enable register
 - read ring coalesced interrupt control register
 - read ring flag and coalesced interrupt enable register
 - read ring service arbiter enable register
 - get ring coalesced interrupt control enable mask

Signed-off-by: Siming Wan <siming.wan@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit bbfdde7 upstream.

Intel-SIG: commit bbfdde7 crypto: qat - add bank save and restore flows
Backport to support Intel QAT live migration for in-tree driver

Add logic to save, restore, quiesce and drain a ring bank for QAT GEN4
devices.
This allows saving and restoring the state of a Virtual Function (VF) and
will be used to implement VM live migration.

Signed-off-by: Siming Wan <siming.wan@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 0fce55e upstream.

Intel-SIG: commit 0fce55e crypto: qat - add interface for live migration
Backport to support Intel QAT live migration for in-tree driver

Extend the driver with a new interface to be used for VF live migration.
This allows creating and destroying a qat_mig_dev object that contains
a set of methods to save and restore the state of a QAT VF.
This interface will be used by the qat-vfio-pci module.

Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit f0bbfc3 upstream.

Intel-SIG: commit f0bbfc3 crypto: qat - implement interface for live migration
Backport to support Intel QAT live migration for in-tree driver

Add logic to implement the interface for live migration defined in
qat/qat_mig_dev.h. This is specific for QAT GEN4 Virtual Functions
(VFs).

This introduces a migration data manager which is used to handle the
device state during migration. The manager ensures that the device state
is stored in a format that can be restored in the destination node.

The VF state is organized into a hierarchical structure that includes a
preamble, a general state section, a MISC bar section and an ETR bar
section. The latter contains the state of the 4 ring pairs contained on
a VF. Here is a graphical representation of the state:

    preamble | general state section | leaf state
             | MISC bar state section| leaf state
             | ETR bar state section | bank0 state section | leaf state
                                     | bank1 state section | leaf state
                                     | bank2 state section | leaf state
                                     | bank3 state section | leaf state

In addition to the implementation of the qat_migdev_ops interface and
the state manager framework, add a mutex in pfvf to avoid pf2vf messages
during migration.
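One way to picture the hierarchical state described above is a buffer of nested sections, each introduced by a small header that records the size of what follows. The sketch below is an assumption-laden illustration: `struct sect_hdr`, `struct writer` and the `sect_*()` helpers are hypothetical and do not reflect the driver's actual state format or its manager API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical section header; the real layout is not given in the text. */
struct sect_hdr {
	uint32_t id;
	uint32_t size;	/* bytes that follow the header, subsections included */
};

struct writer {
	uint8_t *buf;
	size_t cap;
	size_t len;
};

/* Open a section; returns the header offset, or SIZE_MAX if full. */
static size_t sect_open(struct writer *w, uint32_t id)
{
	struct sect_hdr h = { .id = id, .size = 0 };
	size_t off = w->len;

	if (w->cap - w->len < sizeof(h))
		return SIZE_MAX;
	memcpy(w->buf + off, &h, sizeof(h));
	w->len += sizeof(h);
	return off;
}

/* Append leaf state to the currently open section. */
static int sect_put(struct writer *w, const void *data, size_t n)
{
	if (w->cap - w->len < n)
		return -1;
	memcpy(w->buf + w->len, data, n);
	w->len += n;
	return 0;
}

/* Close a section by patching its size, covering nested subsections. */
static void sect_close(struct writer *w, size_t off)
{
	struct sect_hdr h;

	memcpy(&h, w->buf + off, sizeof(h));
	h.size = (uint32_t)(w->len - off - sizeof(h));
	memcpy(w->buf + off, &h, sizeof(h));
}

/* Read back the recorded size of the section at offset off. */
static uint32_t sect_size(const struct writer *w, size_t off)
{
	struct sect_hdr h;

	memcpy(&h, w->buf + off, sizeof(h));
	return h.size;
}
```

An ETR-like section would be opened once, four bank subsections would be opened, filled and closed inside it, and closing the outer section would then record the total size of all four.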

Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit bb20881 upstream.

Intel-SIG: commit bb20881 vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices
Backport to support Intel QAT live migration for in-tree driver

Add vfio pci variant driver for Intel QAT SR-IOV VF devices. This driver
registers to the vfio subsystem through the interfaces exposed by the
subsystem. It follows the live migration protocol v2 defined in
uapi/linux/vfio.h and interacts with Intel QAT PF driver through a set
of interfaces defined in qat/qat_mig_dev.h to support live migration of
Intel QAT VF devices.

This version only covers migration for Intel QAT GEN4 VF devices.

Co-developed-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240426064051.2859652-1-xin.zeng@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 140e4c8 upstream.

Intel-SIG: commit 140e4c8 crypto: qat - Avoid -Wflex-array-member-not-at-end warnings
Backport to support Intel QAT live migration for in-tree driver

-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting
ready to enable it globally.

Use the `__struct_group()` helper to separate the flexible array
from the rest of the members in flexible `struct qat_alg_buf_list`,
through tagged `struct qat_alg_buf_list_hdr`, and avoid embedding the
flexible-array member in the middle of `struct qat_alg_fixed_buf_list`.

Also, use `container_of()` whenever we need to retrieve a pointer to
the flexible structure.
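The header-plus-container_of() pattern described above can be sketched in user-space C. The struct names and members below are simplified, hypothetical stand-ins for the QAT buffer-list types, the `container_of()` macro is a local re-implementation, and the anonymous union is written out by hand to approximate what the kernel's `__struct_group()` helper generates.

```c
#include <stddef.h>
#include <stdint.h>

/* Local re-implementation of the kernel macro, for this sketch only. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct buf_desc {
	uint32_t len;
	uint64_t addr;
};

/* Tagged header: the fixed members, without the flexible array. */
struct buf_list_hdr {
	uint64_t resrvd;
	uint32_t num_bufs;
};

/* Flexible structure: the same members reachable directly or via hdr. */
struct buf_list {
	union {
		struct {
			uint64_t resrvd;
			uint32_t num_bufs;
		};
		struct buf_list_hdr hdr;
	};
	struct buf_desc bufs[];
};

/* Fixed-size variant embeds only the header, so no flexible array
 * member ends up in the middle of the structure. */
struct fixed_buf_list {
	struct buf_list_hdr hdr;
	struct buf_desc descriptors[4];
};

_Static_assert(offsetof(struct buf_list, bufs) ==
	       offsetof(struct fixed_buf_list, descriptors),
	       "layouts must agree for the conversion below");

/* Recover the flexible view from a pointer to the embedded header. */
static struct buf_list *to_buf_list(struct buf_list_hdr *hdr)
{
	return container_of(hdr, struct buf_list, hdr);
}
```

Code that previously took the address of an embedded flexible structure would take the address of the header and convert it with the helper when the flexible view is needed.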

So, with these changes, fix the following warnings:
drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Link: KSPP/linux#202
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit f5c2cf9 upstream.

Intel-SIG: commit f5c2cf9 crypto: qat - Fix spelling mistake "Invalide" -> "Invalid"
Backport to support Intel QAT live migration for in-tree driver

There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 5d5bd24 upstream.

Intel-SIG: commit 5d5bd24 crypto: qat - implement dh fallback for primes > 4K
Backport to support Intel QAT live migration for in-tree driver

The Intel QAT driver provides support for the Diffie-Hellman (DH)
algorithm, limited to prime numbers up to 4K. This driver is used
by default on platforms with integrated QAT hardware for all DH requests.
This has led to failures with algorithms requiring larger prime sizes,
such as ffdhe6144.

  alg: ffdhe6144(dh): test failed on vector 1, err=-22
  alg: self-tests for ffdhe6144(qat-dh) (ffdhe6144(dh)) failed (rc=-22)

Implement a fallback mechanism when an unsupported request is received.
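The dispatch decision behind such a fallback can be sketched in a few lines. This assumes that "4K" means a 4096-bit prime; the enum, the constant and `dh_pick_backend()` are hypothetical names for illustration and are not the driver's API.

```c
#include <stddef.h>

#define HW_MAX_PRIME_BITS 4096 /* assumed meaning of "up to 4K" */

enum dh_backend {
	DH_HW,		/* offload to the accelerator */
	DH_SW_FALLBACK,	/* hand over to a software implementation */
};

/* Pick the backend from the size of the prime, given in bytes. */
static enum dh_backend dh_pick_backend(size_t prime_bytes)
{
	if (prime_bytes * 8 > HW_MAX_PRIME_BITS)
		return DH_SW_FALLBACK;
	return DH_HW;
}
```

Under this assumption a 512-byte prime (ffdhe4096) stays on the hardware path, while a 768-byte prime (ffdhe6144) is routed to the fallback instead of failing.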

Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 4a4fc6c upstream.

Intel-SIG: commit 4a4fc6c crypto: qat - improve error message in adf_get_arbiter_mapping()
Backport to support Intel QAT live migration for in-tree driver

Improve error message to be more readable.

Fixes: 5da6a2d ("crypto: qat - generate dynamically arbiter mappings")
Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit d281a28 upstream.

Intel-SIG: commit d281a28 crypto: qat - improve error logging to be consistent across features
Backport to support Intel QAT live migration for in-tree driver

Improve error logging in the rate limiting feature, staying consistent with
the error logging found in the telemetry feature.

Fixes: d9fb840 ("crypto: qat - add rate limiting feature to qat_4xxx")
Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
ANBZ: #9185

commit 483fd65 upstream.

Intel-SIG: commit 483fd65 crypto: qat - validate slices count returned by FW
Backport to support Intel QAT live migration for in-tree driver

The function adf_send_admin_tl_start() enables the telemetry (TL)
feature on a QAT device by sending the ICP_QAT_FW_TL_START message to
the firmware. This triggers the FW to start writing TL data to a DMA
buffer in memory and returns an array containing the number of
accelerators of each type (slices) supported by this HW.
The pointer to this array is stored in the adf_tl_hw_data data
structure called slice_cnt.

The array slice_cnt is then used in the function tl_print_dev_data()
to report in debugfs only statistics about the supported accelerators.
An incorrect value of the elements in slice_cnt might lead to an out
of bounds memory read.
At the moment, there isn't an implementation of FW that returns a wrong
value, but for robustness validate the slice count array returned by FW.
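A bounds check of this kind might look as follows. It is a minimal sketch: `validate_slice_cnt()` and the per-type maximum array are hypothetical, since the text does not say which limits the driver compares the firmware values against.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: every per-type slice count reported by firmware
 * must not exceed the maximum the driver has room for, otherwise later
 * indexing based on the count could read out of bounds.
 */
static int validate_slice_cnt(const uint8_t *cnt, const uint8_t *max,
			      size_t ntypes)
{
	for (size_t i = 0; i < ntypes; i++) {
		if (cnt[i] > max[i])
			return -EINVAL;
	}
	return 0;
}
```

Running the check once, right after the firmware reply is received, means later consumers of the counts can index their tables without re-validating.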

Fixes: 69e7649 ("crypto: qat - add support for device telemetry")
Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
@Avenger-285714 Avenger-285714 requested a review from Copilot June 2, 2025 04:19
@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from avenger-285714. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sourcery-ai

sourcery-ai Bot commented Jun 2, 2025

Reviewer's Guide

This PR backports upstream IOMMUFD dirty-tracking support and Intel QAT live-migration to the in-tree driver by extending IOMMUFD and iommu core interfaces, enriching self-tests, refactoring QAT CSR ops, and adding new migration infrastructure in the QAT PF and VF drivers.

Sequence Diagram: Intel QAT VF Live Migration - State Save

sequenceDiagram
    participant Hypervisor as HV
    participant QAT_PF_Driver as PF
    participant QAT_VF_Driver as VF
    participant QAT_Hardware as HW

    HV->>PF: Initiate VF Suspend for Live Migration
    PF->>VF: Request State Save (e.g., via qat_migdev_ops.suspend)
    VF->>HW: Quiesce Bank Operations
    loop for each bank
        VF->>HW: Read Bank CSRs (using adf_hw_csr_ops)
        VF->>VF: Store bank_state data
    end
    VF->>PF: Send Saved State (bank_state data)
    PF->>HV: VF State Saved

Sequence Diagram: Intel QAT VF Live Migration - State Restore

sequenceDiagram
    participant Hypervisor as HV
    participant QAT_PF_Driver as PF
    participant QAT_VF_Driver as VF
    participant QAT_Hardware as HW

    HV->>PF: Initiate VF Resume for Live Migration
    PF->>VF: Send Saved State (e.g., via qat_migdev_ops.resume, qat_migdev_ops.load_state)
    loop for each bank
        VF->>VF: Load bank_state data
        VF->>HW: Restore Bank CSRs (using adf_hw_csr_ops)
    end
    VF->>HW: Resume Bank Operations
    VF->>PF: State Restore Complete
    PF->>HV: VF State Restored

Sequence Diagram: IOMMUFD Dirty Tracking Operations

sequenceDiagram
    actor UserSpace as US
    participant IOMMUFD
    participant IOMMU_Core as Core
    participant HW_IOMMU as HW

    US->>IOMMUFD: IOMMU_HWPT_ALLOC (dev_id, pt_id, flags=IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
    IOMMUFD->>Core: domain_alloc_user(dev, flags)
    Core->>HW: Allocate/Configure Domain for Dirty Tracking
    HW-->>Core: Domain Context
    Core-->>IOMMUFD: iommu_domain with dirty_ops
    IOMMUFD-->>US: out_hwpt_id

    US->>IOMMUFD: IOMMU_HWPT_SET_DIRTY_TRACKING (hwpt_id, flags=IOMMU_HWPT_DIRTY_TRACKING_ENABLE)
    IOMMUFD->>Core: iommu_dirty_ops.set_dirty_tracking(domain, true)
    Core->>HW: Enable A/D bits in IOMMU tables (e.g., PASID SLADE, DTE_FLAG_HAD)
    Core-->>IOMMUFD: Success/Failure
    IOMMUFD-->>US: Success/Failure

    US->>IOMMUFD: IOMMU_HWPT_GET_DIRTY_BITMAP (hwpt_id, iova, length, page_size, data_ptr)
    IOMMUFD->>Core: iommu_dirty_ops.read_and_clear_dirty(domain, iova, length, flags, &bitmap_data)
    Core->>HW: Read A/D bits from IOMMU tables
    Core->>Core: Populate user bitmap
    Core->>HW: Clear A/D bits (if not NO_CLEAR flag)
    HW-->>Core: Dirty Info
    Core-->>IOMMUFD: Populated Bitmap
    IOMMUFD-->>US: Bitmap Data

Class Diagram: IOMMU Core Dirty Tracking API Changes

classDiagram
    class iommu_domain {
      +const struct iommu_domain_ops *ops
      +const struct iommu_dirty_ops *dirty_ops // New field
      +unsigned long pgsize_bitmap
      +string "...other fields"
    }
    class iommu_ops {
      +bool (*capable)(struct device *dev, enum iommu_cap cap)
      +struct iommu_domain* (*domain_alloc_user)(struct device *dev, u32 flags) // New method
      +struct iommu_domain* (*domain_alloc)(unsigned int type)
      string "...other existing methods"
    }
    class iommu_dirty_ops { // New struct
      +int (*set_dirty_tracking)(struct iommu_domain *domain, bool enabled)
      +int (*read_and_clear_dirty)(struct iommu_domain *domain, unsigned long iova, size_t size, unsigned long flags, struct iommu_dirty_bitmap *dirty)
    }
    class iommu_dirty_bitmap { // New struct
      +struct iova_bitmap *bitmap
      +struct iommu_iotlb_gather *gather
    }
    class iova_bitmap { // New utility struct
      +iova_bitmap_alloc(unsigned long iova, size_t length, unsigned long page_size, u64 __user *data)
      +iova_bitmap_free()
      +iova_bitmap_for_each(void* opaque, iova_bitmap_fn_t fn)
      +iova_bitmap_set(unsigned long iova, size_t length)
    }
    iommu_domain --> iommu_dirty_ops : uses (optional)
    iommu_dirty_ops ..> iommu_dirty_bitmap : uses
    iommu_dirty_bitmap ..> iova_bitmap : uses
    iommu_ops ..> iommu_domain : creates/manages

Class Diagram: QAT Driver Structures for Live Migration

classDiagram
    class adf_hw_device_data {
        +char* dev_class_name
        +struct adf_hw_csr_ops csr_ops
        +struct qat_migdev_ops vfmig_ops // New field
        +int (*bank_state_save)(struct adf_accel_dev*, u32, struct bank_state*) // New field
        +int (*bank_state_restore)(struct adf_accel_dev*, u32, struct bank_state*) // New field
        +string "...other fields"
    }
    class adf_hw_csr_ops {
      +string "Existing CSR accessors (read/write_csr_ring_head etc.)"
      +string "--- Expanded with many new members for detailed CSR access, e.g.: ---"
      +u32 (*read_csr_stat)(void __iomem*, u32 bank)
      +u32 (*read_csr_e_stat)(void __iomem*, u32 bank)
      +dma_addr_t (*read_csr_ring_base)(void __iomem*, u32 bank, u32 ring)
      +void (*write_csr_int_en)(void __iomem*, u32 bank, u32 value)
      +u32 (*read_csr_int_col_ctl)(void __iomem*, u32 bank)
      +u32 (*get_int_col_ctl_enable_mask)(void)
      +string "(Consolidates previously static/macro CSR ops)"
    }
    class qat_migdev_ops { // New struct
        +int (*init)(struct qat_mig_dev *mdev)
        +void (*cleanup)(struct qat_mig_dev *mdev)
        +int (*suspend)(struct qat_mig_dev *mdev)
        +int (*resume)(struct qat_mig_dev *mdev)
        +int (*save_state)(struct qat_mig_dev *mdev)
        +int (*load_state)(struct qat_mig_dev *mdev)
        +string "...other migration lifecycle ops"
    }
    class bank_state { // New struct for QAT bank state
        +u32 ringstat0
        +u32 ringuostat
        +u32 ringestat
        +u32 iaintflagen
        +struct ring_config rings[ADF_ETR_MAX_RINGS_PER_BANK]
        +string "...other saved CSR values"
    }
    class ring_config { // New struct for ring configuration within bank_state
        +u64 base
        +u32 config
        +u32 head
        +u32 tail
    }
    class adf_accel_vf_info {
        +struct adf_accel_dev *accel_dev
        +struct mutex pfvf_mig_lock // New field
        +void *mig_priv // New field (e.g., for adf_gen4_vfmig)
        +string "...other fields"
    }
    class adf_gen4_vfmig { // New struct for Gen4 VF migration state
        +struct adf_mstate_mgr *mstate_mgr
        +bool bank_stopped[]
    }

    adf_hw_device_data o-- adf_hw_csr_ops : uses
    adf_hw_device_data o-- qat_migdev_ops : uses
    adf_hw_device_data ..> bank_state : uses in bank_state_save/restore methods
    bank_state o-- ring_config : contains
    adf_accel_vf_info ..> adf_gen4_vfmig : mig_priv may point to
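
As a rough illustration of how an ops table shaped like `qat_migdev_ops` can be driven, the user-space sketch below wires mock callbacks into a function-pointer struct and walks a save-side sequence. The names `mock_mig_dev`, `mock_migdev_ops`, and `mock_save_flow`, and the call order itself, are assumptions made for this example; in the real driver the ordering is decided by the VFIO migration state machine and may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct qat_mig_dev; not the kernel definition. */
struct mock_mig_dev {
	char trace[64];	/* records the order in which callbacks ran */
	int suspended;
};

/* Same shape as a subset of the ops listed in the diagram above. */
struct mock_migdev_ops {
	int (*init)(struct mock_mig_dev *mdev);
	void (*cleanup)(struct mock_mig_dev *mdev);
	int (*suspend)(struct mock_mig_dev *mdev);
	int (*save_state)(struct mock_mig_dev *mdev);
};

/* Append a one-letter step marker to the trace, never overflowing it. */
static void note(struct mock_mig_dev *mdev, const char *step)
{
	strncat(mdev->trace, step,
		sizeof(mdev->trace) - strlen(mdev->trace) - 1);
}

static int mock_init(struct mock_mig_dev *mdev)
{
	note(mdev, "I");
	return 0;
}

static void mock_cleanup(struct mock_mig_dev *mdev)
{
	note(mdev, "C");
}

static int mock_suspend(struct mock_mig_dev *mdev)
{
	mdev->suspended = 1;
	note(mdev, "S");
	return 0;
}

static int mock_save_state(struct mock_mig_dev *mdev)
{
	/* In this sketch, state may only be saved from a quiesced device. */
	if (!mdev->suspended)
		return -1;
	note(mdev, "V");
	return 0;
}

static const struct mock_migdev_ops mock_ops = {
	.init = mock_init,
	.cleanup = mock_cleanup,
	.suspend = mock_suspend,
	.save_state = mock_save_state,
};

/* Save-side walk: stop at the first error, clean up once init succeeded. */
static int mock_save_flow(const struct mock_migdev_ops *ops,
			  struct mock_mig_dev *mdev)
{
	int ret = ops->init(mdev);

	if (ret)
		return ret;
	ret = ops->suspend(mdev);
	if (!ret)
		ret = ops->save_state(mdev);
	ops->cleanup(mdev);
	return ret;
}
```

Keeping the callbacks behind a table lets a generation-specific backend (Gen4 in this PR) supply its own implementation without the caller changing.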

File-Level Changes

Change Details Files
Add IOMMUFD dirty-tracking support
  • Define new IOCTLs: HWPT_SET_DIRTY_TRACKING and HWPT_GET_DIRTY_BITMAP
  • Extend domain_alloc_user to accept flags for nesting and dirty-tracking
  • Implement set_dirty_tracking and read_and_clear_dirty in iommu, VFIO and io-pagetable layers
  • Introduce struct iommu_dirty_ops and integrate into intel/amd domain_alloc_user, io-pagetable, iova_bitmap and uapi headers
  • Update IOMMUFD to handle new IOCTLs and propagate flags
drivers/iommu/iommufd
include/uapi/linux/iommufd.h
drivers/iommu
include/linux/iommu.h
Extend IOMMUFD self-tests for dirty tracking
  • Add iommufd_dirty_tracking fixture variants with multiple buffer sizes
  • Implement tests for enforce_dirty, set_dirty_tracking, get_dirty_bitmap, and no-clear scenarios
  • Enhance mock domain and test utils to support dirty flags and dirty operations
  • Update test length macros to cover new ioctls
tools/testing/selftests/iommu
drivers/iommu/iommufd/selftest.c
tools/testing/selftests/iommu/iommufd_utils.h
Backport QAT live-migration flow
  • Introduce qat_mig_dev abstraction and ops for PF/VF migration
  • Implement adf_mstate_mgr to manage hierarchical state segments
  • Add adf_gen4_vf_mig for PF-side save/load of VF state (CSR, ETR, misc)
  • Create VFIO PCI driver qat_vfio_pci.c to drive live-migration state machine
  • Wire migration ops into QAT hw_data and PFVF utils
drivers/crypto/intel/qat
drivers/vfio/pci/qat
include/linux/qat/qat_mig_dev.h
Refactor QAT CSR ops into separate modules
  • Move hardware CSR helper functions from adf_genX_hw_data.c into adf_genX_hw_csr_data.{h,c}
  • Replace manual macros with structured adf_hw_csr_ops initializers
  • Remove duplicated inline CSR builds across Gen2/Gen4
  • Use adf_get_etr_base helper for ETR bar accesses
drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.c
drivers/crypto/intel/qat/qat_common/adf_gen2_hw_data.c
drivers/crypto/intel/qat/qat_common/adf_genX_hw_csr_data.*
Update configs and build scripts
  • Add Kconfig options for CONFIG_IOMMUFD and CONFIG_QAT_VFIO_PCI live-migration
  • Enable CONFIG_IOMMUFD, CONFIG_QAT_VFIO_PCI modules
  • Adjust Makefiles to include new source files (iova_bitmap, migration, CSR data)
  • Skip deepin-specific config commits
drivers/iommu/Kconfig
drivers/crypto/intel/qat/**/*.Makefile
drivers/vfio/pci/Makefile

Copilot AI left a comment

Pull Request Overview

This PR backports Intel QAT live migration support for the in-tree QAT driver, together with the IOMMUFD dirty tracking support it relies on for performance, in line with upstream.

  • Adds new state save/restore, bank drain, and coalesced timer functions for enhanced migration support.
  • Introduces new CSR data files and updates macro names and function calls in gen4 and gen2 driver code.
  • Integrates new migration operations into the driver and updates Makefile to include additional migration-related objects.

Reviewed Changes

Copilot reviewed 76 out of 76 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c | Assigns new max_sl_cnt field in TL data |
| drivers/crypto/intel/qat/qat_common/adf_gen4_pfvf.c | Updates offset macros to use new naming conventions |
| drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.h | Removes old transport access macros; adds new PF2VM/VM2PF macros and prototypes for the bank state functions |
| drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.c | Refactors ring pair reset to use adf_get_etr_base(), and adds bank state handling functions |
| drivers/crypto/intel/qat/qat_common/adf_gen4_hw_csr_data.[hc] | New CSR definitions and inline functions for gen4 hardware data |
| drivers/crypto/intel/qat/qat_common/adf_accel_devices.h | Adds new migration operations structure and related members |
| Makefile and various qat_* files | Update build objects to include migration and state management sources |
Comments suppressed due to low confidence (3)

drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.h:122

  • Ensure that the newly introduced PF2VM and corresponding VM2PF macros are fully aligned with the hardware specifications and that their usage is consistent across the driver. This change should be cross-checked against legacy definitions to prevent any potential conflicts.
#define ADF_GEN4_PF2VM_OFFSET(i)	(0x40B010 + (i) * 0x20)

drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.c:228

  • Verify that the call to adf_get_etr_base() returns the correct memory-mapped base address for the ETR bar, replacing the legacy GET_BARS() access; ensure this modification does not affect other parts of the driver that expect the previous behavior.
void __iomem *csr = adf_get_etr_base(accel_dev);

drivers/crypto/intel/qat/qat_common/adf_accel_devices.h:262

  • [nitpick] Confirm that the new migration operations (qat_migdev_ops) and the accompanying synchronization via pfvf_mig_lock are correctly implemented and invoked during driver initialization to ensure thread-safe migration handling.
struct qat_migdev_ops {

@Avenger-285714 Avenger-285714 requested a review from opsiff June 2, 2025 04:21
@opsiff
Member

opsiff commented Jun 3, 2025

Checkdepends:
commit 7a41dcb
Author: Jason Gunthorpe <jgg@nvidia.com>
Date: Thu Aug 29 21:06:12 2024 -0300

iommu/amd: Set the pgsize_bitmap correctly

When using io_pgtable the correct pgsize_bitmap is stored in the cfg, both
v1_alloc_pgtable() and v2_alloc_pgtable() set it correctly.

This fixes a bug where the v2 pgtable had the wrong pgsize as
protection_domain_init_v2() would set it and then do_iommu_domain_alloc()
immediately resets it.

Remove the confusing ops.pgsize_bitmap since that is not used if the
driver sets domain.pgsize_bitmap.

Fixes: 134288158a41 ("iommu/amd: Add domain_alloc_user based domain allocation")
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v2-831cdc4d00f3+1a315-amd_iopgtbl_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

commit 2780025
Author: Joao Martins <joao.m.martins@oracle.com>
Date: Fri Feb 2 13:34:10 2024 +0000

iommufd/iova_bitmap: Handle recording beyond the mapped pages

IOVA bitmap is a zero-copy scheme of recording dirty bits that iterate the
different bitmap user pages at chunks of a maximum of
PAGE_SIZE/sizeof(struct page*) pages.

When the iterations are split up into 64G, the end of the range may be
broken up in a way that's aligned with a non base page PTE size. This
leads to only part of the huge page being recorded in the bitmap. Note
that in practice this is only a problem for IOMMU dirty tracking i.e. when
the backing PTEs are in IOMMU hugepages and the bitmap is in base page
granularity. So far this is not something that affects VF dirty trackers
(which reports and records at the same granularity).

To fix that, if there is a remainder of bits left to set in which the
current IOVA bitmap doesn't cover, make a copy of the bitmap structure and
iterate-and-set the rest of the bits remaining. Finally, when advancing
the iterator, skip all the bits that were set ahead.

Link: https://lore.kernel.org/r/20240202133415.23819-5-joao.m.martins@oracle.com
Reported-by: Avihai Horon <avihaih@nvidia.com>
Fixes: f35f22cc760e ("iommu/vt-d: Access/Dirty bit support for SS domains")
Fixes: 421a511a293f ("iommu/amd: Access/Dirty bit support in IOPTEs")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
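
The 64G figure in the commit message above can be reproduced with some arithmetic. The sketch assumes 4K base pages and 8-byte pointers (typical of x86-64, but assumptions here): one page of `struct page *` entries indexes 512 pinned bitmap pages, each bitmap page holds 4096 × 8 bits, and each bit stands for one 4K IOVA page. The helper name `iova_span_per_chunk` is hypothetical.

```c
#include <stdint.h>

/*
 * Bytes of IOVA space covered by one iteration over the pinned bitmap
 * pages. All three parameters are in bytes.
 */
static uint64_t iova_span_per_chunk(uint64_t page_size, uint64_t ptr_size,
				    uint64_t iova_pgsize)
{
	/* PAGE_SIZE / sizeof(struct page *) bitmap pages per iteration */
	uint64_t pinned_pages = page_size / ptr_size;
	/* dirty bits held across those bitmap pages */
	uint64_t bits = pinned_pages * page_size * 8;

	/* one bit per IOVA page */
	return bits * iova_pgsize;
}
```

With these inputs, `iova_span_per_chunk(4096, 8, 4096)` evaluates to 2^36 bytes, i.e. 64 GiB. A huge-page PTE that straddles such a boundary has only part of its bits inside the current chunk, which is the case the commit addresses.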

commit cf1e515
Author: Jinjie Ruan <ruanjinjie@huawei.com>
Date: Mon Aug 19 20:00:07 2024 +0800

iommufd/selftest: Make dirty_ops static

The sparse tool complains as follows:

drivers/iommu/iommufd/selftest.c:277:30: warning:
        symbol 'dirty_ops' was not declared. Should it be static?

This symbol is not used outside of selftest.c, so mark it static.

Fixes: 266ce58989ba ("iommufd/selftest: Test IOMMU_HWPT_ALLOC_DIRTY_TRACKING")
Link: https://patch.msgid.link/r/20240819120007.3884868-1-ruanjinjie@huawei.com
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit 8541323
Author: Jason Gunthorpe <jgg@nvidia.com>
Date: Thu Apr 4 21:05:14 2024 -0300

iommufd: Add missing IOMMUFD_DRIVER kconfig for the selftest

Some kconfigs don't automatically include this symbol, which results in
non-functional stub functions for some of the dirty tracking related
things. Thus the test suite will fail. Select IOMMUFD_DRIVER in
the IOMMUFD_TEST kconfig to fix it.

Fixes: a9af47e382a4 ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP")
Link: https://lore.kernel.org/r/20240327182050.GA1363414@ziepe.ca
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit 2760c51
Author: Muhammad Usama Anjum <usama.anjum@collabora.com>
Date: Mon Mar 25 14:00:48 2024 +0500

iommufd: Add config needed for iommufd_fail_nth

Add FAULT_INJECTION_DEBUG_FS and FAILSLAB configurations to the kconfig
fragment for the iommufd selftests. These kconfigs are needed by the
iommufd_fail_nth test.

Fixes: a9af47e382a4 ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP")
Link: https://lore.kernel.org/r/20240325090048.1423908-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit ec61f82
Author: Joao Martins <joao.m.martins@oracle.com>
Date: Thu Jun 27 12:00:55 2024 +0100

iommufd/selftest: Fix dirty bitmap tests with u8 bitmaps

With 64k base pages, the first 128k iova length test requires less than a
byte for a bitmap, exposing a bug in the tests that assume that bitmaps are
at least a byte.

Rather than dealing with bytes, have _test_mock_dirty_bitmaps() pass the
number of bits. The caller functions are adjusted to also use bits as well,
and converting to bytes when clearing, allocating and freeing the bitmap.

Link: https://lore.kernel.org/r/20240627110105.62325-2-joao.m.martins@oracle.com
Reported-by: Matt Ochs <mochs@nvidia.com>
Fixes: a9af47e382a4 ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Matt Ochs <mochs@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit 9560393
Author: Joao Martins <joao.m.martins@oracle.com>
Date: Thu Jun 27 12:00:56 2024 +0100

iommufd/selftest: Fix iommufd_test_dirty() to handle <u8 bitmaps

The calculation returns 0 if it sets less than the number of bits per
byte. For calculating memory allocation from bits, let's round it up to
one byte.

Link: https://lore.kernel.org/r/20240627110105.62325-3-joao.m.martins@oracle.com
Reported-by: Matt Ochs <mochs@nvidia.com>
Fixes: a9af47e382a4 ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Matt Ochs <mochs@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
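
The rounding problem described in the two selftest fixes above can be shown with a small sketch. The helper names are hypothetical and are not the ones used in the selftest; the point is only that truncating division yields zero bytes for a bitmap shorter than eight bits, while rounding up always reserves at least one byte.

```c
#include <stddef.h>

#define EX_BITS_PER_BYTE 8

/* Truncating division: a bitmap of fewer than 8 bits gets 0 bytes. */
static size_t bitmap_bytes_truncated(size_t nbits)
{
	return nbits / EX_BITS_PER_BYTE;
}

/* Round up so that even a 1-bit bitmap gets one byte of storage. */
static size_t bitmap_bytes_rounded(size_t nbits)
{
	return (nbits + EX_BITS_PER_BYTE - 1) / EX_BITS_PER_BYTE;
}
```

For a 2-bit bitmap, the truncating helper returns 0 and the rounding helper returns 1, so only the latter is safe to use as an allocation size.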

commit ffa3c79
Author: Joao Martins <joao.m.martins@oracle.com>
Date: Thu Jun 27 12:00:58 2024 +0100

iommufd/selftest: Fix tests to use MOCK_PAGE_SIZE based buffer sizes

commit a9af47e382a4 ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP")
added tests covering edge cases in the boundaries of iova bitmap. However,
it sized its buffers in terms of PAGE_SIZE (4K) rather than the
MOCK_PAGE_SIZE (2K) that is used in iommufd mock selftests. This meant that
it wasn't correctly exercising everything, specifically the u32 and 4K bitmap
test cases. Fix selftests buffer sizes to be based on mock page size.

Link: https://lore.kernel.org/r/20240627110105.62325-5-joao.m.martins@oracle.com
Reported-by: Kevin Tian <kevin.tian@intel.com>
Closes: https://lore.kernel.org/linux-iommu/96efb6cf-a41c-420f-9673-2f0b682cac8c@oracle.com/
Fixes: a9af47e382a4 ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Matt Ochs <mochs@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
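
To see why the page-size basis matters, consider a buffer meant to exercise a 32-bit bitmap. The sketch below uses the 4K and 2K values quoted in the commit message; the macro and function names are hypothetical and chosen only for this example.

```c
#include <stddef.h>

#define EX_PAGE_SIZE      4096u			/* assumed 4K base page */
#define EX_MOCK_PAGE_SIZE (EX_PAGE_SIZE / 2)	/* 2K, per the commit message */

/* Number of dirty bits needed to track a buffer at a given page size. */
static size_t dirty_bits(size_t buffer_size, size_t page_size)
{
	return buffer_size / page_size;
}
```

A buffer of 32 × `EX_PAGE_SIZE` tracked at `EX_MOCK_PAGE_SIZE` granularity needs 64 bits, so a test intended for the u32 case ends up exercising a u64-sized bitmap instead. Sizing the buffer as 32 × `EX_MOCK_PAGE_SIZE` gives the intended 32 bits.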

commit 694a6f5
Author: Svyatoslav Pankratov <svyatoslav.pankratov@intel.com>
Date: Thu Aug 15 16:47:23 2024 +0100

crypto: qat - fix "Full Going True" macro definition

The macro `ADF_RP_INT_SRC_SEL_F_RISE_MASK` is currently set to the value
`0100b` which means "Empty Going False". This might cause an incorrect
restore of the bank state during live migration.

Fix the definition of the macro to properly represent the "Full Going
True" state which is encoded as `0011b`.

Fixes: bbfdde7d195f ("crypto: qat - add bank save and restore flows")
Signed-off-by: Svyatoslav Pankratov <svyatoslav.pankratov@intel.com>
Reviewed-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

commit a5d8922
Author: Xin Zeng <xin.zeng@intel.com>
Date: Mon Jun 10 22:37:56 2024 +0800

crypto: qat - fix linking errors when PCI_IOV is disabled

When CONFIG_PCI_IOV=n, the build of the QAT vfio pci variant driver
fails reporting the following linking errors:

    ERROR: modpost: "qat_vfmig_open" [drivers/vfio/pci/qat/qat_vfio_pci.ko] undefined!
    ERROR: modpost: "qat_vfmig_resume" [drivers/vfio/pci/qat/qat_vfio_pci.ko] undefined!
    ERROR: modpost: "qat_vfmig_save_state" [drivers/vfio/pci/qat/qat_vfio_pci.ko] undefined!
    ERROR: modpost: "qat_vfmig_suspend" [drivers/vfio/pci/qat/qat_vfio_pci.ko] undefined!
    ERROR: modpost: "qat_vfmig_load_state" [drivers/vfio/pci/qat/qat_vfio_pci.ko] undefined!
    ERROR: modpost: "qat_vfmig_reset" [drivers/vfio/pci/qat/qat_vfio_pci.ko] undefined!
    ERROR: modpost: "qat_vfmig_save_setup" [drivers/vfio/pci/qat/qat_vfio_pci.ko] undefined!
    ERROR: modpost: "qat_vfmig_destroy" [drivers/vfio/pci/qat/qat_vfio_pci.ko] undefined!
    ERROR: modpost: "qat_vfmig_close" [drivers/vfio/pci/qat/qat_vfio_pci.ko] undefined!
    ERROR: modpost: "qat_vfmig_cleanup" [drivers/vfio/pci/qat/qat_vfio_pci.ko] undefined!
    WARNING: modpost: suppressed 1 unresolved symbol warnings because there were too many)

Make live migration helpers provided by QAT PF driver always available
even if CONFIG_PCI_IOV is not selected. This does not cause any side
effect.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Closes: https://lore.kernel.org/lkml/20240607153406.60355e6c.alex.williamson@redhat.com/T/
Fixes: bb208810b1ab ("vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices")
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

commit 9283b73
Author: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Date: Mon Oct 21 13:37:53 2024 +0100

vfio/qat: fix overflow check in qat_vf_resume_write()

The unsigned variable `size_t len` is cast to the signed type `loff_t`
when passed to the function check_add_overflow(). This function considers
the type of the destination, which is of type loff_t (signed),
potentially leading to an overflow. This issue is similar to the one
described in the link below.

Remove the cast.

Note that even if check_add_overflow() is bypassed, by setting `len` to
a value that is greater than LONG_MAX (which is considered as a negative
value after the cast), the function copy_from_user(), invoked a few lines
later, will not perform any copy and return `len` as (len > INT_MAX)
causing qat_vf_resume_write() to fail with -EFAULT.

Fixes: bb208810b1ab ("vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices")
CC: stable@vger.kernel.org # 6.10+
Link: https://lore.kernel.org/all/138bd2e2-ede8-4bcc-aa7b-f3d9de167a37@moroto.mountain
Reported-by: Zijie Zhao <zzjas98@gmail.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Xin Zeng <xin.zeng@intel.com>
Link: https://lore.kernel.org/r/20241021123843.42979-1-giovanni.cabiddu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
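
The effect of the cast can be reproduced in user space with the GCC/Clang builtin `__builtin_add_overflow()`, which checks whether the exact sum fits in the destination type. The sketch below is not the driver code: the function names are hypothetical, `uint64_t` and `int64_t` stand in for `size_t` and `loff_t` on a 64-bit build, and the unsigned-to-signed conversion is assumed to wrap as it does on GCC and Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Buggy shape: len is cast to a signed type before the check, so a huge
 * unsigned length turns negative and the sum appears to fit.
 */
static bool end_overflows_with_cast(uint64_t len, int64_t pos)
{
	int64_t end;

	return __builtin_add_overflow((int64_t)len, pos, &end);
}

/*
 * Fixed shape: len stays unsigned, and the builtin reports overflow
 * whenever the exact sum does not fit in the signed destination.
 */
static bool end_overflows_no_cast(uint64_t len, int64_t pos)
{
	int64_t end;

	return __builtin_add_overflow(len, pos, &end);
}
```

With `len = UINT64_MAX` and `pos = 0`, the cast variant reports no overflow (the length is seen as -1), while the variant without the cast reports overflow as intended.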

@opsiff opsiff merged commit 6fe43b2 into deepin-community:linux-6.6.y Jun 3, 2025
6 of 7 checks passed
@Avenger-285714 Avenger-285714 deleted the QAT_live_migration branch June 4, 2025 06:56
@Avenger-285714
Member Author

iommu/amd: Set the pgsize_bitmap correctly

#859
