
arm64: tlb: Optimize the performance of tlb flush#1714

Closed
xiaqian1486 wants to merge 4 commits into deepin-community:linux-6.6.y from xiaqian1486:tlb-6.6.y

Conversation


@xiaqian1486 xiaqian1486 commented May 14, 2026

This patch series optimizes the performance of TLB flush:

  1. Rename MAX_TLBI_OPS
  2. Allow range operation for MAX_TLBI_RANGE_PAGES
  3. Add __flush_tlb_range_limit_excess()
  4. Optimize flush tlb kernel range

Summary by Sourcery

Optimize arm64 TLB flush handling for kernel and user ranges to better leverage range-based operations and avoid excessive full-context flushes.

Enhancements:

  • Rename the TLB invalidation operation limit constant to clarify its association with DVM operations.
  • Introduce a helper to centralize logic for determining when a TLB range flush exceeds per-op limits and should fall back to full-context flushes.
  • Update user and kernel TLB range flush paths to use the shared limit helper and the generic range operation helper instead of manual per-page invalidation loops.

Perhaps unsurprisingly, I-cache invalidations suffer from performance
issues similar to TLB invalidations on certain systems. TLB and I-cache
maintenance all result in DVM on the mesh, which is where the real
bottleneck lies.

Rename the heuristic to point the finger at DVM, such that it may be
reused for limiting I-cache invalidations.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230920080133.944717-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
(cherry picked from commit ec1c3b9)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Wang Yinfeng <wangyinfeng@phytium.com.cn>
Signed-off-by: Xia Qian <xiaqian1486@phytium.com.cn>
MAX_TLBI_RANGE_PAGES pages are covered by SCALE#3 and NUM#31, which is
supported now. Allow the TLBI RANGE operation when the number of pages is
equal to MAX_TLBI_RANGE_PAGES in __flush_tlb_range_nosync().

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Link: https://lore.kernel.org/r/20240405035852.1532010-4-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 73301e4)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Wang Yinfeng <wangyinfeng@phytium.com.cn>
Signed-off-by: Xia Qian <xiaqian1486@phytium.com.cn>
The __flush_tlb_range_limit_excess() helper will be used when flushing
the kernel TLB range soon.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20240923131351.713304-2-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 7ffc13e)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Wang Yinfeng <wangyinfeng@phytium.com.cn>
Signed-off-by: Xia Qian <xiaqian1486@phytium.com.cn>
Currently the kernel TLB is flushed page by page if the target
VA range is less than MAX_DVM_OPS * PAGE_SIZE; otherwise we
brutally issue a TLBI ALL.

But we can optimize this when the CPU supports TLB range operations:
convert to __flush_tlb_range_op(), like the other TLB range flushes,
to improve performance.

Co-developed-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20240923131351.713304-3-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit a923705)
[KF: no lpa2 support]
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Wang Yinfeng <wangyinfeng@phytium.com.cn>
Signed-off-by: Xia Qian <xiaqian1486@phytium.com.cn>

sourcery-ai Bot commented May 14, 2026

Reviewer's Guide

Optimizes ARM64 TLB flush paths by renaming the generic per-op limit macro, centralizing the decision logic for when to fall back to full MM/all flushes, and reusing the range-based TLB invalidation helper for kernel address ranges to improve performance and consistency.

File-Level Changes

Change: Rename the TLBI operation count limit macro to better reflect its DVM-specific purpose and prepare for separate range limits.
  • Rename MAX_TLBI_OPS to MAX_DVM_OPS while keeping its value tied to PTRS_PER_PTE
  • Update comments to reference MAX_DVM_OPS semantics where the macro is used for non-range TLB operations
Files: arch/arm64/include/asm/tlbflush.h

Change: Factor out common logic that decides when a TLB range flush is too large and should fall back to a full MM/all flush.
  • Introduce __flush_tlb_range_limit_excess() that encapsulates the size/feature checks for range-based and per-page TLB operations
  • Define the behavior so that without TLB range support, up to (MAX_DVM_OPS - 1) pages are handled via discrete operations, and with range support up to MAX_TLBI_RANGE_PAGES pages can be handled
  • Use this helper in __flush_tlb_range() instead of open-coded conditional logic
Files: arch/arm64/include/asm/tlbflush.h

Change: Optimize flush_tlb_kernel_range() by aligning the range, sharing the range-limit logic, and using the generic range operation helper instead of a manual VA loop.
  • Align start/end to PAGE_SIZE boundaries and compute pages consistently with __flush_tlb_range()
  • Use __flush_tlb_range_limit_excess() to decide when to fall back to flush_tlb_all() instead of using a custom MAX_TLBI_OPS-based threshold
  • Replace the explicit vaale1is loop with a call to __flush_tlb_range_op(vaale1is, ...) after issuing dsb(ishst), followed by the standard dsb(ish)/isb barriers
Files: arch/arm64/include/asm/tlbflush.h


@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign opsiff for approval. For more information see the Code Review Process.


@deepin-ci-robot

Hi @xiaqian1486. Thanks for your PR.

I'm waiting for a deepin-community member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.



@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high level feedback:

  • In __flush_tlb_range_limit_excess(), pages is passed in even though it’s derivable from start, end, and stride; consider computing it internally to avoid potential inconsistencies between the arguments over time.
  • The new __flush_tlb_range_limit_excess() semantics for kernel ranges now use rounded start/end for the limit check, which may change when we fall back to flush_tlb_all() for unaligned ranges; it would be good to confirm this behavior is intentional or clarify it with a short comment.
  • The comment in __flush_tlb_range_limit_excess() talks about (MAX_DVM_OPS - 1) pages and MAX_TLBI_RANGE_PAGES pages being handled, while the code uses >= for the DVM bound and > for the range-bound; consider tightening the wording to explicitly match these boundary conditions to avoid confusion.


Copilot AI left a comment


Pull request overview

This PR updates the arm64 TLB flushing helpers to better leverage TLBI range operations and to centralize the “range too large, fall back to full flush” decision logic, with the goal of reducing per-page invalidation overhead and avoiding excessive full-context flushes.

Changes:

  • Rename the per-op threshold constant from MAX_TLBI_OPS to MAX_DVM_OPS.
  • Introduce __flush_tlb_range_limit_excess() to share limit/fallback logic between user and kernel range flush paths.
  • Rework flush_tlb_kernel_range() to use __flush_tlb_range_op() (and the shared limit helper) instead of a manual per-page TLBI loop.


Comment on lines +409 to +422

static inline bool __flush_tlb_range_limit_excess(unsigned long start,
		unsigned long end, unsigned long pages, unsigned long stride)
{
	/*
	 * When the system does not support TLB range based flush
	 * operation, (MAX_DVM_OPS - 1) pages can be handled. But
	 * with TLB range based operation, MAX_TLBI_RANGE_PAGES
	 * pages can be handled.
	 */
	if ((!system_supports_tlb_range() &&
	     (end - start) >= (MAX_DVM_OPS * stride)) ||
	    pages > MAX_TLBI_RANGE_PAGES)
		return true;

	return false;
}
@xiaqian1486 xiaqian1486 deleted the tlb-6.6.y branch May 14, 2026 09:20

3 participants