[Deepin-Kernel-SIG] [linux 6.18.y] [Backport] sched/fair: Revert 6d71a9c ("sched/fair: Fix EEVDF entity placement b… …ug causing scheduling lag")#1909
This reverts commit 4ed4cc0.
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
…ug causing scheduling lag")
[ Upstream commit 101f3498b4bdfef97152a444847948de1543f692 ]
Zicheng Qu reported that, because avg_vruntime() always includes cfs_rq->curr, when ->on_rq, place_entity() doesn't work right.
Specifically, the lag scaling in place_entity() relies on avg_vruntime() being the state *before* placement of the new entity. However in this case avg_vruntime() will actually already include the entity, which breaks things.
Also, Zicheng Qu argues that avg_vruntime should be invariant under reweight. IOW commit 6d71a9c ("sched/fair: Fix EEVDF entity placement bug causing scheduling lag") was wrong!
The issue reported in 6d71a9c could possibly be explained by rounding artifacts -- notably the extreme weight '2' is outside of the range of avg_vruntime/sum_w_vruntime, since that uses scale_load_down(). By scaling vruntime by the real weight, but accounting it in vruntime with a factor 1024 more, the average moves significantly. However, that is now cured.
Tested by reverting 66951e4 ("sched/fair: Fix update_cfs_group() vs DELAY_DEQUEUE") and tracing vruntime and vlag figures again.
Reported-by: Zicheng Qu <quzicheng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Shubhang Kaushik <shubhang@os.amperecomputing.com>
Link: https://patch.msgid.link/20260219080625.066102672%40infradead.org
(cherry picked from commit 101f3498b4bdfef97152a444847948de1543f692)
[jstultz: Resolved minor collision in the revert against 6.18-stable]
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Reviewer's Guide
Backports an upstream revert of a previous EEVDF placement fix and reintroduces updated BORE handling by refactoring lag/deadline computation around avg_vruntime, adding a rescale helper for reweighting, and adjusting how vruntime, vlag, deadlines, and vprot are managed during reweight, placement, and delayed requeue.
Flow diagram for updated reweight_entity lag/deadline handling
flowchart TD
A[reweight_entity] --> B{se.on_rq?}
B -- no --> H[dequeue_load_avg]
B -- yes --> C[update_curr]
C --> D[avruntime = avg_vruntime]
D --> E[se.vlag = entity_lag cfs_rq se avruntime]
E --> F[se.deadline -= avruntime]
F --> G{curr && protect_slice se?}
G -- yes --> G1[se.vprot -= avruntime; rel_vprot = true]
G -- no --> H
G1 --> H[dequeue_load_avg]
H --> I[rescale_entity se weight rel_vprot]
I --> J[update_load_set se.load weight]
J --> K[recompute se.avg.load_avg]
K --> L[enqueue_load_avg]
L --> M{se.on_rq?}
M -- no --> Z[end]
M -- yes --> N{rel_vprot?}
N -- yes --> O[se.vprot += avruntime]
N -- no --> P
O --> P[se.deadline += avruntime]
P --> Q[se.rel_deadline = 0]
Q --> R[se.vruntime = avruntime - se.vlag]
R --> S[update_load_add cfs_rq.load se.load.weight]
S --> T{curr?}
T -- no --> U[__enqueue_entity]
T -- yes --> Z[end]
U --> Z[end]
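The order of steps in the diagram above can be traced with a small toy model. This is only a sketch under stated assumptions, not the kernel code: `struct toy_se`, `toy_avg_vruntime()` and `toy_reweight()` are hypothetical names, the entity is assumed to be queued, the rescale step is assumed to multiply virtual quantities by old weight over new weight, and vprot, load averages and tree operations are left out.

```c
#include <stdint.h>

/* Hypothetical toy entity; field names echo the diagram, not the kernel struct. */
struct toy_se {
	int64_t weight;
	int64_t vruntime;
	int64_t deadline;
	int64_t vlag;
};

/* Weighted average vruntime over n queued entities (plain integer division). */
static int64_t toy_avg_vruntime(const struct toy_se *se, int n)
{
	int64_t sum_wv = 0, sum_w = 0;

	for (int i = 0; i < n; i++) {
		sum_wv += se[i].weight * se[i].vruntime;
		sum_w += se[i].weight;
	}
	return sum_w ? sum_wv / sum_w : 0;
}

/* Reweight queued entity k, following the diagram's order of steps. */
static void toy_reweight(struct toy_se *se, int n, int k, int64_t new_weight)
{
	int64_t avruntime = toy_avg_vruntime(se, n);
	int64_t old_weight = se[k].weight;

	se[k].vlag = avruntime - se[k].vruntime;  /* lag against the average */
	se[k].deadline -= avruntime;              /* make the deadline relative */

	/* Assumed rescale rule: virtual quantities scale by old/new weight. */
	se[k].vlag = se[k].vlag * old_weight / new_weight;
	se[k].deadline = se[k].deadline * old_weight / new_weight;
	se[k].weight = new_weight;

	se[k].deadline += avruntime;              /* back to an absolute deadline */
	se[k].vruntime = avruntime - se[k].vlag;  /* place around the same average */
}
```

With three entities of weights 1024, 1024 and 2048 at vruntimes 1000, 3000 and 2000, the toy average is 2000 before and after halving the first entity's weight, which is the invariance the commit message argues for.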
Flow diagram for DELAY_ZERO requeue_delayed_entity with BORE handling
flowchart TD
A[requeue_delayed_entity] --> B{DELAY_ZERO feature?}
B -- no --> Z[end]
B -- yes --> C[update_entity_lag]
C --> D{se.vlag > 0?}
D -- no --> Z
D -- yes --> E[cfs_rq.nr_queued--]
E --> F{se != cfs_rq.curr?}
F -- yes --> G[__dequeue_entity]
F -- no --> H
G --> H[se.vlag = 0]
H --> I{CONFIG_SCHED_BORE && sched_bore_key && DELAY_ZERO?}
I -- yes --> J[flags or= ENQUEUE_WAKEUP]
I -- no --> K[flags = 0]
J --> L[place_entity cfs_rq se flags]
K --> L[place_entity cfs_rq se flags]
L --> M{se != cfs_rq.curr?}
M -- yes --> N[__enqueue_entity]
M -- no --> O
N --> O[cfs_rq.nr_queued++]
O --> Z[end]
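The core decision in the diagram above (drop positive lag and re-place the entity) can be sketched in a few lines. This is a toy under stated assumptions: `struct toy_delayed` and `toy_requeue_delayed()` are hypothetical, the queue average is treated as a fixed input, and the BORE flag handling, `nr_queued` accounting and tree operations are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy entity for the delayed-requeue path. */
struct toy_delayed {
	int64_t vruntime;
	int64_t vlag;
};

/*
 * Returns true if the entity was re-placed.
 * delay_zero stands in for the DELAY_ZERO feature check.
 */
static bool toy_requeue_delayed(struct toy_delayed *se, int64_t avruntime,
				bool delay_zero)
{
	if (!delay_zero)
		return false;

	/* Stand-in for the lag update step; no clamping in this toy. */
	se->vlag = avruntime - se->vruntime;
	if (se->vlag <= 0)
		return false;

	/* Positive lag is discarded, so placement lands on the average. */
	se->vlag = 0;
	se->vruntime = avruntime - se->vlag;
	return true;
}
```

An entity at vruntime 1000 against an average of 1500 is moved to 1500 with zero lag, while an entity at 2000 keeps its position and its negative lag.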
[APPROVALNOTIFIER] This PR is NOT APPROVED
Pull request overview
Backports an upstream revert in the CFS fair scheduler (EEVDF) to correct entity placement/lag behavior by making lag/deadline adjustments reference the cfs_rq average vruntime state, and by tightening when relative-deadline placement is applied.
Changes:
- Add entity_lag() helper to compute lag from a supplied average vruntime and reuse it for lag updates.
- Rework reweight_entity() to adjust vruntime/vlag/deadline/vprot relative to avg_vruntime(cfs_rq) instead of se->vruntime.
- Gate relative-deadline placement in place_entity() behind PLACE_REL_DEADLINE and simplify DELAY_ZERO delayed requeueing logic.
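The first item describes a helper that takes the average vruntime as an argument instead of recomputing it, so one snapshot of the average can be reused for lag, deadline and vprot. A minimal sketch of that shape follows. `toy_entity_lag()` is a hypothetical name, and the symmetric clamp with its `limit` parameter is an assumption of this sketch, since the overview does not spell out the helper's body.

```c
#include <stdint.h>

/*
 * Lag of an entity relative to a caller-supplied average vruntime.
 * The clamp to [-limit, limit] is an assumption of this sketch.
 */
static int64_t toy_entity_lag(int64_t avruntime, int64_t vruntime, int64_t limit)
{
	int64_t vlag = avruntime - vruntime;

	if (vlag > limit)
		return limit;
	if (vlag < -limit)
		return -limit;
	return vlag;
}
```

Passing the average in keeps the caller in control of which state the lag is measured against, which matters when the average would otherwise shift between two reads.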
| if (se->on_rq) { | ||
| /* commit outstanding execution time */ | ||
| update_curr(cfs_rq); | ||
| update_entity_lag(cfs_rq, se); | ||
| #ifdef CONFIG_SCHED_BORE | ||
| vlag_unscaled = se->vlag; | ||
| #endif /* !CONFIG_SCHED_BORE */ | ||
| se->deadline -= se->vruntime; | ||
| avruntime = avg_vruntime(cfs_rq); | ||
| se->vlag = entity_lag(cfs_rq, se, avruntime); | ||
| se->deadline -= avruntime; | ||
| se->rel_deadline = 1; | ||
| if (curr && protect_slice(se)) { | ||
| vprot = se->vprot - se->vruntime; | ||
| se->vprot -= avruntime; | ||
| rel_vprot = true; |
[ Upstream commit 101f3498b4bdfef97152a444847948de1543f692 ]
Zicheng Qu reported that, because avg_vruntime() always includes
cfs_rq->curr, when ->on_rq, place_entity() doesn't work right.
Specifically, the lag scaling in place_entity() relies on
avg_vruntime() being the state before placement of the new entity.
However in this case avg_vruntime() will actually already include the
entity, which breaks things.
Also, Zicheng Qu argues that avg_vruntime should be invariant under
reweight. IOW commit 6d71a9c ("sched/fair: Fix EEVDF entity
placement bug causing scheduling lag") was wrong!
The issue reported in 6d71a9c could possibly be explained by
rounding artifacts -- notably the extreme weight '2' is outside of the
range of avg_vruntime/sum_w_vruntime, since that uses
scale_load_down(). By scaling vruntime by the real weight, but
accounting it in vruntime with a factor 1024 more, the average moves
significantly. However, that is now cured.
Tested by reverting 66951e4 ("sched/fair: Fix update_cfs_group()
vs DELAY_DEQUEUE") and tracing vruntime and vlag figures again.
Reported-by: Zicheng Qu <quzicheng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Shubhang Kaushik <shubhang@os.amperecomputing.com>
Link: https://patch.msgid.link/20260219080625.066102672%40infradead.org
(cherry picked from commit 101f3498b4bdfef97152a444847948de1543f692)
[jstultz: Resolved minor collision in the revert against 6.18-stable]
Signed-off-by: John Stultz <jstultz@google.com>
This PR also reverts the previous BORE changes and applies the new BORE patch for this version.
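The rounding explanation in the commit message can be illustrated with a small calculation. This is a sketch under assumptions, not the kernel macro: `toy_scale_load_down()` assumes a 10-bit fixed-point shift (factor 1024) and a floor of 2 for nonzero weights, and `toy_overcount_factor()` is a hypothetical helper that shows how much heavier an entity counts in the average than its real weight implies.

```c
#include <stdint.h>

#define TOY_SHIFT 10 /* assumed fixed-point shift, i.e. a factor of 1024 */

/* Scaled-down weight: shifted right, but never below 2 when nonzero. */
static uint64_t toy_scale_load_down(uint64_t w)
{
	uint64_t down;

	if (!w)
		return 0;
	down = w >> TOY_SHIFT;
	return down < 2 ? 2 : down;
}

/* Ratio between the weight used in the average and the real weight. */
static uint64_t toy_overcount_factor(uint64_t w)
{
	return (toy_scale_load_down(w) << TOY_SHIFT) / w;
}
```

Under these assumptions a real weight of 2 is counted 1024 times heavier in the average than it should be, while a weight of 1048576 is counted exactly, which matches the "factor 1024 more" wording in the message.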
Summary by Sourcery
Revert a previous EEVDF entity placement fix in the fair scheduler and adjust vruntime, lag, deadline and protection handling around entity reweighting for the 6.18.y kernel backport.
Enhancements: