NVIDIA: VR: SAUCE: cxl/region: Support multi-level interleaving with smaller granularities for lower levels
The CXL specification supports multi-level interleaving "as long as
all the levels use different, but consecutive, HPA bits to select the
target and no Interleave Set has more than 8 devices" (from 3.2).
Currently the kernel expects that a decoder's "interleave granularity
is a multiple of @parent_port granularity". That is, the granularity
of a lower level is bigger than that of its parent, and the lower
level uses the outer (higher-order) HPA bits as selector. This works,
e.g., for the following 8-way config:
 * cross-link (cross-hostbridge config in CFMWS):
   * 4-way
   * 256 granularity
   * Selector: HPA[8:9]
 * sub-link (CXL Host bridge config of the HDM):
   * 2-way
   * 1024 granularity
   * Selector: HPA[10]
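
As an illustration of how the selector bits above follow from the
interleave ways and granularity, here is a minimal sketch in plain C.
It is not kernel code: struct hpa_sel and selector_bits() are
hypothetical names, and the sketch assumes power-of-two ways and
granularities only (CXL also allows 3-, 6- and 12-way sets, which are
not handled here).

```c
#include <assert.h>

/* Hypothetical type: inclusive range of HPA bits used as target selector. */
struct hpa_sel {
	unsigned int lo;
	unsigned int hi;
};

static int is_pow2(unsigned int v)
{
	return v && !(v & (v - 1));
}

/* log2 of a power of two. */
static unsigned int log2_pow2(unsigned int v)
{
	unsigned int n = 0;

	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

/*
 * The lowest selector bit is log2(granularity); each doubling of the
 * number of ways adds one more selector bit above it.
 */
static struct hpa_sel selector_bits(unsigned int ways, unsigned int granularity)
{
	struct hpa_sel sel;

	assert(is_pow2(ways) && ways > 1);
	assert(is_pow2(granularity));

	sel.lo = log2_pow2(granularity);
	sel.hi = sel.lo + log2_pow2(ways) - 1;
	return sel;
}
```

With these assumptions, selector_bits(4, 256) yields bits 8 to 9 and
selector_bits(2, 1024) yields bit 10, matching the config above.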
Now, if the outer HPA bits are instead used for the cross-hostbridge
level, an 8-way config could look like this:
 * cross-link (cross-hostbridge config in CFMWS):
   * 4-way
   * 512 granularity
   * Selector: HPA[9:10]
 * sub-link (CXL Host bridge config of the HDM):
   * 2-way
   * 256 granularity
   * Selector: HPA[8]
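
Both 8-way layouts consume the same three bits, HPA[8:10], and only
differ in which level owns which bits. The following sketch (plain C,
not kernel code; struct layout, sel_index() and targets_hit() are
hypothetical names) walks eight consecutive 256-byte chunks and records
which (cross-link, sub-link) target pairs are selected. The base
address is taken from the region start in the log output and is
assumed to be aligned to the full 2048-byte interleave stride.

```c
#include <stdint.h>

/* Hypothetical description of a two-level interleave layout. */
struct layout {
	unsigned int cross_lo;   /* lowest selector bit, cross-link level */
	unsigned int cross_bits; /* number of selector bits, cross-link level */
	unsigned int sub_lo;     /* lowest selector bit, sub-link level */
	unsigned int sub_bits;   /* number of selector bits, sub-link level */
};

/* Lower level uses the outer bits: HPA[8:9] then HPA[10]. */
static const struct layout layout_increasing = { 8, 2, 10, 1 };
/* Lower level uses the inner bit: HPA[9:10] then HPA[8]. */
static const struct layout layout_decreasing = { 9, 2, 8, 1 };

/* Extract the target index encoded in nbits bits starting at bit lo. */
static unsigned int sel_index(uint64_t hpa, unsigned int lo, unsigned int nbits)
{
	return (unsigned int)((hpa >> lo) & ((1u << nbits) - 1));
}

/*
 * Return a bitmap with one bit per (cross, sub) target pair that is hit
 * by eight consecutive 256-byte chunks starting at base. This helper is
 * specific to the 4-way x 2-way example: pair id = cross * 2 + sub.
 */
static unsigned int targets_hit(const struct layout *l, uint64_t base)
{
	unsigned int seen = 0;
	unsigned int chunk;

	for (chunk = 0; chunk < 8; chunk++) {
		uint64_t hpa = base + (uint64_t)chunk * 256;
		unsigned int cross = sel_index(hpa, l->cross_lo, l->cross_bits);
		unsigned int sub = sel_index(hpa, l->sub_lo, l->sub_bits);

		seen |= 1u << (cross * 2 + sub);
	}
	return seen;
}
```

For both layouts all eight pairs are hit exactly once per 2048-byte
stride, so either assignment of bits to levels describes a complete
8-way interleave set.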
The enumeration of decoders for the second configuration then fails
with the following error:
  cxl region0: pci0000:00:port1 cxl_port_setup_targets expected iw: 2 ig: 1024 [mem 0x10000000000-0x1ffffffffff flags 0x200]
  cxl region0: pci0000:00:port1 cxl_port_setup_targets got iw: 2 ig: 256 state: enabled 0x10000000000:0x1ffffffffff
  cxl_port endpoint12: failed to attach decoder12.0 to region0: -6
Note that this happens only if firmware is setting up the decoders
(CXL_REGION_F_AUTO). For userspace region assembly the granularities
are chosen to increase from root down to the lower levels. That is,
outer HPA bits are always used for lower interleaving levels.
Rework the implementation to also support multi-level interleaving
with smaller granularities for lower levels. Determine the interleave
set of autodetected decoders. Check that it is a subset of the root
interleave.
The HPA selector bits are extracted for all decoders of the set, and
it is checked that they do not overlap and that the bits are
consecutive. All decoders can now be programmed to use any bit range
within the region's target selector.
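
The overlap and consecutiveness check can be pictured with bitmasks.
The sketch below is only an illustration under stated assumptions, not
the code of this patch: selector_mask(), selectors_valid() and
two_level_valid() are hypothetical names, and only power-of-two ways
and granularities are considered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mask of HPA selector bits of one decoder. For power-of-two values,
 * (ways - 1) * granularity sets exactly the selector bits, e.g.
 * 4-way with 256 granularity gives 0x300 (bits 8 and 9).
 */
static uint64_t selector_mask(unsigned int ways, unsigned int granularity)
{
	return (uint64_t)(ways - 1) * granularity;
}

/* True if no two masks overlap and their union is one contiguous run. */
static bool selectors_valid(const uint64_t *masks, size_t n)
{
	uint64_t all = 0, lowest, run;
	size_t i;

	for (i = 0; i < n; i++) {
		if (all & masks[i])
			return false;	/* two levels share a selector bit */
		all |= masks[i];
	}
	if (!all)
		return true;		/* no interleaving at all */

	lowest = all & (~all + 1);	/* isolate lowest set bit */
	run = all / lowest;		/* shift the run down to bit 0 */
	return (run & (run + 1)) == 0;	/* contiguous run of ones? */
}

/* Convenience wrapper for a root level plus one lower level. */
static bool two_level_valid(unsigned int root_ways, unsigned int root_gran,
			    unsigned int sub_ways, unsigned int sub_gran)
{
	const uint64_t masks[] = {
		selector_mask(root_ways, root_gran),
		selector_mask(sub_ways, sub_gran),
	};

	return selectors_valid(masks, 2);
}
```

In this sketch both example configs pass, regardless of which level
uses the outer bits: two_level_valid(4, 256, 2, 1024) and
two_level_valid(4, 512, 2, 256) are true. A config with a shared bit
such as two_level_valid(4, 512, 2, 512), or one with a gap such as
two_level_valid(4, 256, 2, 2048), is rejected.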
Signed-off-by: Robert Richter <rrichter@amd.com>
(backported from https://lore.kernel.org/all/20251028094754.72816-1-rrichter@amd.com/)
[jan: Resolved minor conflicts]
Signed-off-by: Jiandi An <jan@nvidia.com>