feat: Backport VRAM management patches for dmem cgroup (6.6.y)#1890
deepin-wm wants to merge 5 commits into
…otection
Backport the dmem cgroup controller from kernel 6.14 and the page_counter_calculate_protection function from kernel 6.11.
The dmem cgroup controller allows tracking and limiting device memory (such as GPU VRAM) consumption via cgroups. It uses the same min/low/max semantics as the memory cgroup.
page_counter_calculate_protection is needed by dmem to calculate effective memory protection values. This function was factored out of memcontrol.c in kernel 6.11.
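The idea behind effective protection can be illustrated with a deliberately simplified userspace model. This is a sketch, not the kernel's page_counter_calculate_protection(): it keeps only two rules as assumptions (a child claims at most what it actually uses, and claims are scaled down proportionally when siblings together claim more than the parent can pass on), and every name in it is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: smaller of two values. */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Simplified model of the effective protection of one child counter.
 *
 * usage:              current usage of the child
 * setting:            configured min (or low) value of the child
 * parent_effective:   protection the parent can pass down
 * siblings_protected: sum of min(usage, setting) over all children
 *
 * Large inputs could overflow the multiplication; a real implementation
 * would have to guard against that.
 */
static uint64_t effective_protection(uint64_t usage, uint64_t setting,
				     uint64_t parent_effective,
				     uint64_t siblings_protected)
{
	/* Protection that is not backed by usage is not claimed. */
	uint64_t claimed = min_u64(usage, setting);

	/* Children overcommit the parent: scale the claim down. */
	if (siblings_protected > parent_effective)
		return claimed * parent_effective / siblings_protected;

	return claimed;
}
```

With a parent that can pass down 100 units and children claiming 400 units in total, a child claiming 200 units ends up with an effective protection of 50 in this model.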
Callers can use this feedback to be more aggressive in making space for
allocations of a cgroup if they know it is protected.
These are counterparts to memcg's mem_cgroup_below_{min,low}.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
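As a rough illustration of what such a query boils down to, the sketch below models only the final comparison, using the same "effective protection >= usage" test that memcg's helpers use. The real dmem helpers take a root and a test pool and work on kernel page counters; the struct, its fields and the NULL handling here are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a dmem pool's counter state. */
struct pool_model {
	uint64_t usage; /* bytes currently charged */
	uint64_t emin;  /* effective min protection */
	uint64_t elow;  /* effective low protection */
};

/* True if the pool's usage is still covered by its min protection. */
static bool pool_below_min(const struct pool_model *pool)
{
	/* Assumption: an uncharged (NULL) pool is never protected. */
	return pool && pool->emin >= pool->usage;
}

/* True if the pool's usage is still covered by its low protection. */
static bool pool_below_low(const struct pool_model *pool)
{
	return pool && pool->elow >= pool->usage;
}
```

A caller that sees either query return true knows the allocating cgroup is within its protection and can justify evicting other buffers to make room.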
This helps to find a common subtree of two resources, which is important when determining whether it's helpful to evict one resource in favor of another. To facilitate this, add a common helper to find the ancestor of two cgroups using each cgroup's ancestor array.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
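The ancestor-array approach can be sketched in userspace as follows. The struct, the MAX_DEPTH bound and both function names are hypothetical; the only property taken from the description above is that each group records its ancestor at every level, so the deepest level at which both arrays agree is the common ancestor.

```c
#include <stddef.h>

#define MAX_DEPTH 8 /* hypothetical bound for this sketch */

struct cg_model {
	int level; /* depth in the hierarchy, root is 0 */
	/* ancestors[i] is the ancestor at level i; ancestors[level] is self */
	const struct cg_model *ancestors[MAX_DEPTH];
};

/* Initialise a group below parent (or as root when parent is NULL).
 * The caller must keep the depth below MAX_DEPTH. */
static void cg_init(struct cg_model *cg, const struct cg_model *parent)
{
	int i;

	cg->level = parent ? parent->level + 1 : 0;
	for (i = 0; parent && i <= parent->level; i++)
		cg->ancestors[i] = parent->ancestors[i];
	cg->ancestors[cg->level] = cg;
}

/* Walk up from the shallower level until both arrays agree. */
static const struct cg_model *common_ancestor(const struct cg_model *a,
					      const struct cg_model *b)
{
	int level;

	if (!a || !b)
		return NULL;

	level = a->level < b->level ? a->level : b->level;
	for (; level >= 0; level--)
		if (a->ancestors[level] == b->ancestors[level])
			return a->ancestors[level];

	return NULL; /* separate hierarchies */
}
```

The lookup costs at most one comparison per level of the shallower group and needs no parent-pointer chasing.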
Add dmem cgroup integration to TTM for kernel 6.6, adapting the VRAM management improvements from pixelcluster's dmemcg-aggressive-protect branch for the 6.6 TTM code structure.
Key changes:
- Add ttm_bo_alloc_state for tracking allocation state (charge_pool, limit_pool, in_evict, may_try_low)
- Add ttm_bo_alloc_at_place() for dmem-aware allocation attempts
- Add ttm_resource_try_charge() for pre-charging cgroups before resource allocation
- Split cgroup charge from resource allocation in ttm_resource_alloc
- Add dmem cgroup pool state (css) to ttm_resource
- Add dmem cgroup region (cg) to ttm_resource_manager
- Add ttm_mem_evict_first_dmem() that skips protected BOs during eviction
- Add ttm_bo_evict_valuable_dmem() for cgroup-aware eviction decisions using common ancestor for correct protection calculation
- Be more aggressive when allocating below dmem cgroup protection limits
- Retry eviction with low-protected BOs when may_try_low is set
This is a functional equivalent of pixelcluster's patches 3-6, adapted for 6.6's TTM eviction mechanism (ttm_mem_evict_first instead of ttm_bo_evict_alloc with ttm_lru_walk).
Original patches by Natalie Vock <natalie.vock@gmx.de>
Adapted for deepin-community/kernel linux-6.6.y branch.
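The "retry with low-protected BOs" behaviour can be modelled as a two-pass scan over an LRU list. The sketch below is a userspace approximation under stated assumptions: the enum, the structs and pick_victim() are hypothetical, min-protected buffers are assumed to be never evictable, and the try_low / hit_low pair is modelled on the parameters of dmem_cgroup_state_evict_valuable() as they appear in this PR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical protection class of a buffer's cgroup. */
enum prot { PROT_NONE, PROT_LOW, PROT_MIN };

struct bo_model {
	enum prot prot;
};

/* Decide whether one buffer may be evicted in the current pass. */
static bool evict_valuable(const struct bo_model *bo, bool try_low,
			   bool *hit_low)
{
	if (bo->prot == PROT_MIN)
		return false; /* assumed: hard protection, never evicted */

	if (bo->prot == PROT_LOW && !try_low) {
		*hit_low = true; /* remember that a second pass could help */
		return false;
	}

	return true;
}

/* Return the index of the first evictable buffer, or -1 if none. */
static int pick_victim(const struct bo_model *lru, size_t n, bool may_try_low)
{
	bool try_low = false;

	for (;;) {
		bool hit_low = false;
		size_t i;

		for (i = 0; i < n; i++)
			if (evict_valuable(&lru[i], try_low, &hit_low))
				return (int)i;

		/* Retry only if low-protected buffers were skipped. */
		if (try_low || !hit_low || !may_try_low)
			return -1;
		try_low = true;
	}
}
```

Unprotected buffers are always preferred; low-protected ones are considered only once the first pass has found nothing and the caller is allowed to try them.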
Reviewer's Guide
Backports the dmem cgroup controller and its page-counter protection logic from newer kernels into 6.6 and wires it into TTM VRAM allocation/eviction paths so that VRAM usage honors dmem.low/min protections, preferentially evicting unprotected BOs and keeping protected BOs in VRAM.
Sequence diagram for dmem-aware VRAM allocation and eviction in TTM
sequenceDiagram
participant Proc as Process
participant BO as ttm_bo_mem_space
participant Alloc as ttm_bo_alloc_at_place
participant RMCharge as ttm_resource_try_charge
participant DmemCharge as dmem_cgroup_try_charge
participant ResAlloc as ttm_resource_alloc
participant BelowMin as dmem_cgroup_below_min
participant BelowLow as dmem_cgroup_below_low
participant Force as ttm_bo_mem_force_space
participant EvictDmem as ttm_mem_evict_first_dmem
participant EvictVal as dmem_cgroup_state_evict_valuable
Proc->>BO: ttm_bo_mem_space(bo, placement, mem, ctx)
loop placement entries
BO->>Alloc: ttm_bo_alloc_at_place(bo, place, ctx, force_space=false)
Alloc->>RMCharge: ttm_resource_try_charge(bo, place, &charge_pool, &limit_pool)
RMCharge->>DmemCharge: dmem_cgroup_try_charge(region, size, &pool, &limit_pool)
DmemCharge-->>RMCharge: 0 or -EAGAIN
RMCharge-->>Alloc: 0 or -EAGAIN
alt charge succeeded
Alloc->>BelowMin: dmem_cgroup_below_min(NULL, charge_pool)
Alloc->>BelowLow: dmem_cgroup_below_low(NULL, charge_pool)
Alloc->>ResAlloc: ttm_resource_alloc(bo, place, res, charge_pool)
ResAlloc-->>Alloc: 0 or -ENOSPC
alt allocated
Alloc-->>BO: 0
BO-->>Proc: success
else no space but may evict
Alloc-->>BO: -EBUSY
end
else hit dmem limit (-EAGAIN)
Alloc-->>BO: mapped to -EBUSY or -ENOSPC
end
end
alt BO gets -EBUSY
BO->>Force: ttm_bo_mem_force_space(bo, place, mem, ctx, alloc_state)
loop evict until space
alt manager has cg
Force->>EvictDmem: ttm_mem_evict_first_dmem(bdev, man, place, ctx, ticket, alloc_state)
EvictDmem->>EvictVal: dmem_cgroup_state_evict_valuable(limit_pool, test_pool, try_low, &hit_low)
EvictVal-->>EvictDmem: true/false (evict or skip protected BO)
else
Force->>EvictDmem: ttm_mem_evict_first(bdev, man, place, ctx, ticket)
end
end
Force->>ResAlloc: ttm_resource_alloc(bo, place, mem, charge_pool)
ResAlloc-->>Force: 0
Force-->>BO: 0
BO-->>Proc: success
end
File-Level Changes
Hey - I've left some high level feedback:
- The signature change to ttm_resource_alloc (new charge_pool parameter) will break any out-of-tree users; consider adding a static inline wrapper with the old three-argument signature that forwards to the new helper with a NULL pool to preserve the existing API surface.
- In ttm_resource_free(), the uncharge uses bo->base.size rather than the resource size; if a manager allocates with alignment / padding or partial usage, this may mismatch the charged amount, so it would be safer to uncharge based on the resource’s actual size field instead of bo->base.size.
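The compatibility wrapper suggested in the first comment could look roughly like the sketch below. All types are stubs and both function names are hypothetical; the real ttm_resource_alloc() operates on TTM structures that are not reproduced here. The point is only the shape: the extended function takes the optional pool, and an inline wrapper with the old three-argument signature forwards NULL.

```c
#include <stddef.h>

/* Stub types standing in for the TTM and dmem structures. */
struct bo_stub    { size_t size; };
struct place_stub { int mem_type; };
struct pool_stub  { int id; };
struct res_stub {
	size_t size;                        /* size actually allocated */
	const struct pool_stub *charged_to; /* NULL when not charged */
};

/* Extended entry point (hypothetical name) taking an optional pool. */
static int resource_alloc_charged(struct bo_stub *bo,
				  const struct place_stub *place,
				  struct res_stub *res,
				  struct pool_stub *charge_pool)
{
	(void)place; /* placement is irrelevant for this sketch */
	res->size = bo->size;
	res->charged_to = charge_pool;
	return 0;
}

/* Old three-argument signature, kept for existing callers. */
static inline int resource_alloc(struct bo_stub *bo,
				 const struct place_stub *place,
				 struct res_stub *res)
{
	return resource_alloc_charged(bo, place, res, NULL);
}
```

Storing the allocated size in the resource, as res->size does here, is also what the second comment relies on: the free path can then uncharge exactly what the resource records instead of bo->base.size.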
Pull request overview
This PR backports the device-memory (dmem) cgroup controller plus prerequisite page-counter protection accounting, and integrates dmem charging / protection-aware eviction into TTM so VRAM allocation/eviction respects dmem.min/dmem.low on linux-6.6.y.
Changes:
- Add page_counter_calculate_protection() to compute effective min/low protection top-down in a hierarchy.
- Introduce the dmem cgroup controller (Kconfig, cgroup subsystem registration, controller implementation, and docs).
- Extend TTM resource/allocation/eviction paths to charge VRAM allocations to dmem cgroups and avoid evicting protected buffers.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| mm/page_counter.c | Adds effective protection calculation and exported helper for counters. |
| kernel/cgroup/Makefile | Builds the new dmem controller when enabled. |
| kernel/cgroup/dmem.c | Implements dmem controller: regions, pools, charging, protection queries, and eviction heuristics. |
| init/Kconfig | Adds CONFIG_CGROUP_DMEM option. |
| include/linux/page_counter.h | Declares page_counter_calculate_protection(). |
| include/linux/cgroup.h | Adds cgroup_common_ancestor() helper. |
| include/linux/cgroup_subsys.h | Registers the new dmem cgroup subsystem. |
| include/linux/cgroup_dmem.h | New public-internal header for dmem charging/eviction APIs. |
| include/drm/ttm/ttm_resource.h | Adds dmem region/pool tracking fields to TTM resource/manager APIs and structs. |
| drivers/gpu/drm/ttm/ttm_resource.c | Adds ttm_resource_try_charge() and plumbs charge ownership into resources/free path. |
| drivers/gpu/drm/ttm/ttm_bo.c | Implements dmem-aware allocation and eviction paths in TTM BO placement/eviction. |
| Documentation/admin-guide/cgroup-v2.rst | Documents the new dmem controller interface and semantics. |
    	/**
    	 * @css: cgroup state this resource is charged to
    	 */
    	struct dmem_cgroup_pool_state *css;
Fixed in b1f5900. Moved css field to replace DEEPIN_KABI_RESERVE(1) after lru instead of before it, preserving the offset of existing struct members.
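Why the placement of the new field matters can be shown with offsetof(). The structs below are hypothetical miniatures, not the real ttm_resource layout, and the reserved slot is assumed to be one unsigned long wide; the actual expansion of DEEPIN_KABI_RESERVE may differ.

```c
#include <stddef.h>

/* Original layout with a reserved slot at the end (hypothetical). */
struct res_v1 {
	unsigned long start;
	size_t size;
	void *lru;
	unsigned long kabi_reserved1; /* assumed shape of the reserve slot */
};

/* New field inserted before lru: every later member moves. */
struct res_shifted {
	unsigned long start;
	size_t size;
	void *css;
	void *lru;
	unsigned long kabi_reserved1;
};

/* New field takes over the reserved slot: existing offsets are kept. */
struct res_kabi_safe {
	unsigned long start;
	size_t size;
	void *lru;
	void *css; /* replaces kabi_reserved1 */
};
```

Code compiled against res_v1 still finds lru at the same offset in res_kabi_safe, whereas res_shifted would make such code read css where it expects lru.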
    	struct cgroup *ancestor_cgroup;
    	struct cgroup_subsys_state *ancestor_css;

    	if (!a || !b)
    		return NULL;

    	ancestor_cgroup = cgroup_common_ancestor(a->cs->css.cgroup, b->cs->css.cgroup);
    	if (!ancestor_cgroup)
    		return NULL;

    	ancestor_css = cgroup_e_css(ancestor_cgroup, &dmem_cgrp_subsys);
    	css_get(ancestor_css);

    	return get_cg_pool_unlocked(css_to_dmemcs(ancestor_css), a->region);
    }
Fixed in b1f5900. Added IS_ERR_OR_NULL check on get_cg_pool_unlocked return value, with css_put(ancestor_css) on error path. Returns NULL on failure.
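The pattern described in this fix (take a reference, look up, drop the reference again if the lookup fails) can be sketched in userspace. err_ptr() and is_err_or_null() are lowercase stand-ins written for this example, loosely modelled on the kernel's ERR_PTR() and IS_ERR_OR_NULL(); the structs and lookup functions are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (userspace stand-in). */
static void *err_ptr(intptr_t error)
{
	return (void *)error;
}

/* True for NULL and for pointers in the top MAX_ERRNO addresses. */
static int is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct css_model  { int refcount; };
struct pool_model { int id; };

static struct pool_model the_pool;

/* Two hypothetical lookups: one succeeds, one reports -ENOMEM. */
static struct pool_model *lookup_ok(struct css_model *css)
{
	(void)css;
	return &the_pool;
}

static struct pool_model *lookup_enomem(struct css_model *css)
{
	(void)css;
	return err_ptr(-ENOMEM);
}

/* Take a reference, look up the pool, undo the reference on failure. */
static struct pool_model *
get_ancestor_pool(struct css_model *css,
		  struct pool_model *(*lookup)(struct css_model *))
{
	struct pool_model *pool;

	css->refcount++; /* models css_get() */
	pool = lookup(css);
	if (is_err_or_null(pool)) {
		css->refcount--; /* models css_put() on the error path */
		return NULL;
	}

	return pool;
}
```

Callers then only have to test for NULL, and a failed lookup leaves the reference count where it started.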
    	region_name = strsep(&options, " \t");
    	if (!region_name[0])
    		continue;

    	rcu_read_lock();
    	region = dmemcg_get_region_by_name(region_name);
    	rcu_read_unlock();

    	if (!region)
    		return -EINVAL;

    	err = dmemcg_parse_limit(options, region, &new_limit);
    	if (err < 0)
Fixed in b1f5900. Added NULL check for options after strsep: if (!options) return -EINVAL;
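The underlying hazard is that strsep() sets its string pointer to NULL when no delimiter is found, so a line holding only a region name leaves options as NULL. The sketch below reproduces that behaviour with a small portable stand-in, split_token(), so it does not depend on strsep() being declared; parse_limit_line() and its simplified numeric parsing are hypothetical, and the real parser also accepts values such as "max".

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in with strsep()-like semantics: return the first token and
 * advance *stringp past the delimiter, or set *stringp to NULL when
 * the string contains no delimiter.
 */
static char *split_token(char **stringp, const char *delim)
{
	char *start = *stringp;
	char *end;

	if (!start)
		return NULL;

	end = start + strcspn(start, delim);
	if (*end) {
		*end = '\0';
		*stringp = end + 1;
	} else {
		*stringp = NULL;
	}

	return start;
}

/* Parse one "<region> <limit>" line (modifies the buffer in place). */
static int parse_limit_line(char *line, const char **region,
			    unsigned long long *limit)
{
	char *options = line;

	*region = split_token(&options, " \t");
	/* The kernel loop skips empty names; this sketch rejects them. */
	if (!(*region)[0])
		return -EINVAL;

	/* Only a region name was given: nothing left to parse. */
	if (!options)
		return -EINVAL;

	*limit = strtoull(options, NULL, 10);
	return 0;
}
```

Without the NULL check, the limit parser would be handed a NULL string for input such as a bare region name.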
    	val = fn(pool);
    	if (val < PAGE_COUNTER_MAX)
    		seq_printf(sf, " %lld\n", val);
    	else
    		seq_puts(sf, " max\n");
Fixed in b1f5900. Changed %lld to %llu for u64 format specifiers.
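A userspace equivalent of the corrected output path is sketched below. The sentinel LIMIT_MAX_SENTINEL and the function name are assumptions for the example (the kernel compares against PAGE_COUNTER_MAX and writes to a seq_file); the part that carries over is that an unsigned 64-bit value is printed with %llu.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical sentinel meaning "no limit" in this sketch. */
#define LIMIT_MAX_SENTINEL ULLONG_MAX

/* Format a limit the way the snippet does: a number or "max". */
static int format_limit(char *buf, size_t len, unsigned long long val)
{
	if (val < LIMIT_MAX_SENTINEL)
		return snprintf(buf, len, " %llu\n", val); /* unsigned */

	return snprintf(buf, len, " max\n");
}
```

Values above LLONG_MAX are where the signed conversion would go wrong, so the unsigned specifier is the one that matches the type.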
    	if (pool == ERR_PTR(-ENOMEM)) {
    		pool = NULL;
    		if (WARN_ON(allocpool))
    			continue;

    		allocpool = kzalloc(sizeof(*allocpool), GFP_KERNEL);
    		if (allocpool) {
    			pool = NULL;
    			continue;
    		}
    	}
Fixed in b1f5900. Added return ERR_PTR(-ENOMEM); when fallback kzalloc fails, preventing the infinite loop.
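The termination property this fix restores can be modelled with a small retry loop. Everything below is a hypothetical userspace model: install_pool() stands for the locked lookup that needs a preallocated object, the allocator is injectable so that failure can be simulated, and get_pool() returns an errno value instead of an ERR_PTR.

```c
#include <errno.h>
#include <stdlib.h>

struct pool_model { int id; };

typedef void *(*alloc_fn)(size_t);

/* Zeroing allocator, playing the role of kzalloc() in this model. */
static void *zalloc(size_t n)
{
	return calloc(1, n);
}

/* Allocator that always fails, to simulate memory pressure. */
static void *failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}

/* Models the locked step: it can only succeed with a spare object. */
static int install_pool(struct pool_model **spare, struct pool_model **out)
{
	if (!*spare)
		return -ENOMEM;

	*out = *spare;
	*spare = NULL;
	return 0;
}

/* Retry until installed, but give up when the fallback fails. */
static int get_pool(alloc_fn alloc, struct pool_model **out)
{
	struct pool_model *spare = NULL;

	for (;;) {
		if (!install_pool(&spare, out))
			return 0;

		/* Allocate outside the (modelled) lock, then retry. */
		spare = alloc(sizeof(*spare));
		if (!spare)
			return -ENOMEM; /* bail out instead of looping */
	}
}
```

With failing_alloc the loop returns -ENOMEM after one attempt; without the early return it would keep retrying a step that can never succeed.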
     * @size: Size (in bytes) to charge.
     * @ret_pool: On succesfull allocation, the pool that is charged.
     * @ret_limit_pool: On a failed allocation, the limiting pool.
Fixed in b1f5900. Corrected spelling in kernel-doc parameter description.
    	/**
    	 * @ref: References keeping the region alive.
    	 * Keeps the region reference alive after a succesful RCU lookup.
    	 */
    @@ -0,0 +1,91 @@
    /* SPDX-License-Identifier: MIT */
Fixed in b1f5900. Changed SPDX-License-Identifier from MIT to GPL-2.0 in include/linux/cgroup_dmem.h for consistency with other internal cgroup headers.
     	do {
    -		ret = ttm_resource_alloc(bo, place, mem);
    -		if (likely(!ret))
    +		ret = ttm_resource_alloc(bo, place, mem,
    +					 alloc_state ? alloc_state->charge_pool : NULL);
    +		if (likely(!ret)) {
Fixed in b1f5900. Added charge_pool initialization in ttm_bo_mem_force_space: when man->cg is set but charge_pool is NULL, we call ttm_resource_try_charge() before proceeding. This prevents bypassing dmem cgroup limits.
    	if (!alloc_state)
    		return true;

    	/* Skip BOs from the same cgroup when not trying low-protected ones */
    	if (!alloc_state->may_try_low &&
    	    bo->resource->css == alloc_state->charge_pool)
    		return false;
Fixed in b1f5900. Added early return with return true; when bo->resource->css is NULL in ttm_bo_evict_valuable_dmem(), treating uncharged resources as unprotected and evictable.
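Put together, the early-exit checks discussed in this thread can be modelled as a small predicate. The structs and the function name are hypothetical stand-ins, and the final return is a placeholder: the real ttm_bo_evict_valuable_dmem() goes on to consult the dmem protection logic, which is not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

struct pool_model { int id; };

/* Hypothetical subset of the allocation state used by the checks. */
struct alloc_state_model {
	const struct pool_model *charge_pool; /* may be NULL */
	bool may_try_low;
};

struct bo_model {
	const struct pool_model *css; /* NULL when the BO is uncharged */
};

static bool evict_candidate(const struct bo_model *bo,
			    const struct alloc_state_model *state)
{
	/* No dmem context: fall back to the plain eviction rules. */
	if (!state)
		return true;

	/* Uncharged resources are treated as unprotected. */
	if (!bo->css)
		return true;

	/* Skip BOs of the allocating cgroup unless trying low ones. */
	if (!state->may_try_low && state->charge_pool &&
	    bo->css == state->charge_pool)
		return false;

	/* Placeholder: the protection check would run here. */
	return true;
}
```

The charge_pool test in the third check keeps a NULL charge pool from matching an uncharged BO, and the second check means bo->css is never passed on as NULL.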
Fix issues identified by Copilot code review:
dmem.c:
- Add NULL test pointer check in dmem_cgroup_below_min/low to prevent NULL dereference when called with uncharged pools
- Fix dmem_cgroup_get_common_ancestor ERR_PTR handling: check for IS_ERR_OR_NULL return from get_cg_pool_unlocked and release css reference on failure
- Fix kernel-doc: @b parameter was incorrectly described as 'First pool'
- Fix get_cg_pool_unlocked infinite loop: return ERR_PTR(-ENOMEM) when fallback allocation fails instead of looping forever
- Add NULL options check in dmemcg_limit_write to prevent crash when line contains only a region name
- Fix %lld -> %llu format specifier for u64 values
- Fix kernel-doc mismatch: dmem_cgroup_evict_valuable -> dmem_cgroup_state_evict_valuable
cgroup_dmem.h:
- Change license from MIT to GPL-2.0 for consistency with other internal cgroup headers
ttm_resource.h:
- Move css field to DEEPIN_KABI_RESERVE slot after lru to preserve struct layout and avoid breaking KABI for out-of-tree users
ttm_bo.c:
- Guard dmem_cgroup_below_min/low calls with charge_pool check
- Only skip same-cgroup BOs during eviction when charge_pool is non-NULL
- Handle NULL css in eviction: treat uncharged resources as evictable
- Add charge_pool initialization in ttm_bo_mem_force_space for busy_placement path to prevent bypassing dmem cgroup limits
Summary
Backport VRAM management patches from pixelcluster's dmemcg-aggressive-protect branch to improve VRAM allocation for low-end GPUs, targeting the linux-6.6.y branch.
These patches fix AMDGPU's VRAM management so that applications protected by dmem cgroup limits (dmem.low/dmem.min) are more aggressive about evicting unprotected buffers, preventing protected application buffers from being forced into GTT (system RAM) even when they are within their protection limits.
Challenges
Kernel 6.6 does not have the dmem cgroup infrastructure that was introduced in 6.14. This PR backports all prerequisites in addition to the VRAM management patches:
Changes
Commit 1: cgroup/dmem: Add dmem cgroup controller and page_counter_calculate_protection
Commit 2: cgroup/dmem: Add queries for protection values (pixelcluster patch 1)
Commit 3: cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper (pixelcluster patch 2)
Commit 4: drm/ttm: Add dmem cgroup support for VRAM management
Adapted version of pixelcluster's patches 3-6 for 6.6's TTM code:
Notes
Source
Patches from: https://pixelcluster.github.io/VRAM-Mgmt-fixed/
Original commits by Natalie Vock <natalie.vock@gmx.de>
Summary by Sourcery
Backport device-memory (dmem) cgroup support and integrate it with TTM-based VRAM management to honor dmem protection when allocating and evicting GPU buffers.
New Features:
Enhancements:
Documentation: