arm64: tlb: Optimize the performance of tlb flush #1714
Conversation
Perhaps unsurprisingly, I-cache invalidations suffer from performance issues similar to TLB invalidations on certain systems. TLB and I-cache maintenance all result in DVM on the mesh, which is where the real bottleneck lies. Rename the heuristic to point the finger at DVM, such that it may be reused for limiting I-cache invalidations.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230920080133.944717-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
(cherry picked from commit ec1c3b9)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Wang Yinfeng <wangyinfeng@phytium.com.cn>
Signed-off-by: Xia Qian <xiaqian1486@phytium.com.cn>
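For context, the rename itself is tiny; a minimal sketch of the header change, assuming the upstream definition where the limit is tied to `PTRS_PER_PTE`:

```c
/* arch/arm64/include/asm/tlbflush.h (sketch of the rename) */

/*
 * This threshold used to be called MAX_TLBI_OPS. Both TLBI and
 * I-cache invalidation broadcasts travel as DVM messages on the
 * interconnect, so the DVM name lets the same bound cap either
 * kind of maintenance.
 */
#define MAX_DVM_OPS	PTRS_PER_PTE
```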
MAX_TLBI_RANGE_PAGES pages are covered by SCALE#3 and NUM#31, and it's supported now. Allow TLBI RANGE operation when the number of pages is equal to MAX_TLBI_RANGE_PAGES in __flush_tlb_range_nosync().

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Link: https://lore.kernel.org/r/20240405035852.1532010-4-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 73301e4)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Wang Yinfeng <wangyinfeng@phytium.com.cn>
Signed-off-by: Xia Qian <xiaqian1486@phytium.com.cn>
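For reference, the limit follows directly from the range-TLBI encoding; a sketch assuming the upstream `__TLBI_RANGE_PAGES()` macro:

```c
/*
 * A range TLBI covers (NUM + 1) << (5 * SCALE + 1) pages. The
 * largest encodable operands are SCALE = 3 and NUM = 31, so:
 *
 *   MAX_TLBI_RANGE_PAGES = (31 + 1) << (5 * 3 + 1)
 *                        = 32 << 16
 *                        = 2097152 pages (8GB with 4K pages)
 */
#define __TLBI_RANGE_PAGES(num, scale)	((unsigned long)((num) + 1) << (5 * (scale) + 1))
#define MAX_TLBI_RANGE_PAGES		__TLBI_RANGE_PAGES(31, 3)
```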
The __flush_tlb_range_limit_excess() helper will be used when flushing the kernel TLB range in a subsequent patch.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20240923131351.713304-2-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 7ffc13e)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Wang Yinfeng <wangyinfeng@phytium.com.cn>
Signed-off-by: Xia Qian <xiaqian1486@phytium.com.cn>
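To show what the helper centralizes, here is a sketch of the user-side call site as it looks after the refactor; the surrounding function shape is assumed from the upstream patch and may differ slightly in this backport:

```c
static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
		unsigned long start, unsigned long end,
		unsigned long stride, bool last_level, int tlb_level)
{
	unsigned long pages;

	start = round_down(start, stride);
	end = round_up(end, stride);
	pages = (end - start) >> PAGE_SHIFT;

	/*
	 * For user ranges the "too big" fallback is a full-MM flush
	 * rather than flush_tlb_all(): only this mm's entries go.
	 */
	if (__flush_tlb_range_limit_excess(start, end, pages, stride)) {
		flush_tlb_mm(vma->vm_mm);
		return;
	}

	/* ... range-based invalidation via __flush_tlb_range_op() ... */
}
```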
Currently the kernel TLB is flushed page by page if the target VA range is less than MAX_DVM_OPS * PAGE_SIZE; otherwise we brutally issue a TLBI ALL. But we can optimize this when the CPU supports TLB range operations: convert to __flush_tlb_range_op(), as the other TLB range flushes do, to improve performance.

Co-developed-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20240923131351.713304-3-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit a923705)
[KF: no lpa2 support]
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Wang Yinfeng <wangyinfeng@phytium.com.cn>
Signed-off-by: Xia Qian <xiaqian1486@phytium.com.cn>
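A minimal sketch of the reworked kernel-range path, assuming the pre-LPA2 `__flush_tlb_range_op()` signature (per the "[KF: no lpa2 support]" backport note); the exact argument list is inferred from the upstream commit and may differ here:

```c
static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
	const unsigned long stride = PAGE_SIZE;
	unsigned long pages;

	start = round_down(start, stride);
	end = round_up(end, stride);
	pages = (end - start) >> PAGE_SHIFT;

	/* Too many pages for one batch of DVM ops: nuke everything. */
	if (__flush_tlb_range_limit_excess(start, end, pages, stride)) {
		flush_tlb_all();
		return;
	}

	dsb(ishst);
	/*
	 * TLBI VAALE1IS matches all ASIDs at the last level, like the
	 * old per-page loop did; asid = 0 and tlb_level = 0 (unknown).
	 */
	__flush_tlb_range_op(vaale1is, start, pages, stride, 0, 0, false);
	dsb(ish);
	isb();
}
```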
Reviewer's Guide

Optimizes ARM64 TLB flush paths by renaming the generic per-op limit macro, centralizing the decision logic for when to fall back to full MM/all flushes, and reusing the range-based TLB invalidation helper for kernel address ranges to improve performance and consistency.
[APPROVAL NOTIFIER] This PR is NOT APPROVED. This pull-request has not been approved by any approver yet. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing the appropriate command in a comment.
Hi @xiaqian1486. Thanks for your PR. I'm waiting for a deepin-community member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command. Once the patch is verified, the new status will be reflected by the label. I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hey - I've left some high level feedback:

- In `__flush_tlb_range_limit_excess()`, `pages` is passed in even though it's derivable from `start`, `end`, and `stride`; consider computing it internally to avoid potential inconsistencies between the arguments over time (see the sketch after this list).
- The new `__flush_tlb_range_limit_excess()` semantics for kernel ranges now use rounded `start`/`end` for the limit check, which may change when we fall back to `flush_tlb_all()` for unaligned ranges; it would be good to confirm this behavior is intentional or clarify it with a short comment.
- The comment in `__flush_tlb_range_limit_excess()` talks about `(MAX_DVM_OPS - 1)` pages and `MAX_TLBI_RANGE_PAGES` pages being handled, while the code uses `>=` for the DVM bound and `>` for the range bound; consider tightening the wording to explicitly match these boundary conditions to avoid confusion.
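A hypothetical version of the helper implementing the first suggestion, deriving `pages` internally rather than taking it as a parameter; this is the reviewer's idea sketched out, not code from the patch:

```c
/* Hypothetical refactor: derive pages from start/end instead of a parameter. */
static inline bool __flush_tlb_range_limit_excess(unsigned long start,
		unsigned long end, unsigned long stride)
{
	unsigned long pages = (end - start) >> PAGE_SHIFT;

	/*
	 * Without range TLBI, at most (MAX_DVM_OPS - 1) per-page ops are
	 * worthwhile; with range TLBI, up to MAX_TLBI_RANGE_PAGES pages
	 * can be covered in one batch.
	 */
	if (!system_supports_tlb_range() &&
	    (end - start) >= (MAX_DVM_OPS * stride))
		return true;

	return pages > MAX_TLBI_RANGE_PAGES;
}
```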
Pull request overview
This PR updates the arm64 TLB flushing helpers to better leverage TLBI range operations and to centralize the “range too large, fall back to full flush” decision logic, with the goal of reducing per-page invalidation overhead and avoiding excessive full-context flushes.
Changes:
- Rename the per-op threshold constant from `MAX_TLBI_OPS` to `MAX_DVM_OPS`.
- Introduce `__flush_tlb_range_limit_excess()` to share limit/fallback logic between user and kernel range flush paths.
- Rework `flush_tlb_kernel_range()` to use `__flush_tlb_range_op()` (and the shared limit helper) instead of a manual per-page TLBI loop.
```c
static inline bool __flush_tlb_range_limit_excess(unsigned long start,
		unsigned long end, unsigned long pages, unsigned long stride)
{
	/*
	 * When the system does not support TLB range based flush
	 * operation, (MAX_DVM_OPS - 1) pages can be handled. But
	 * with TLB range based operation, MAX_TLBI_RANGE_PAGES
	 * pages can be handled.
	 */
	if ((!system_supports_tlb_range() &&
	     (end - start) >= (MAX_DVM_OPS * stride)) ||
	    pages > MAX_TLBI_RANGE_PAGES)
		return true;

	return false;
}
```
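To make the two boundary conditions concrete (assuming 4K pages and MAX_DVM_OPS = PTRS_PER_PTE = 512, per the upstream definition): without range TLBI the helper returns true once `(end - start) >= 512 * 4096`, so at most 511 pages (just under 2 MiB) are ever flushed page by page; with range TLBI it returns true only for `pages > 2097152`, i.e. ranges beyond 8 GiB fall back to a full flush. This is exactly the `>=` versus `>` asymmetry that the review comment above asks the in-code comment to spell out.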
This patch series optimizes the performance of TLB flush.
Summary by Sourcery
Optimize arm64 TLB flush handling for kernel and user ranges to better leverage range-based operations and avoid excessive full-context flushes.
Enhancements: