docs(kep): draft fail-fast restart budget and init-phase DNS for LWS#813
panpan0000 wants to merge 4 commits into
✅ Deploy Preview for kubernetes-sigs-lws canceled.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: panpan0000. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
Force-pushed from c21c1db to de083b3.
The 2nd commit (updating …
@panpan0000, can you create an issue that describes what this KEP is trying to address?
Also, to fix the failing tests, we need to wait for #814 to be completed.
Force-pushed from 820d7ff to de083b3.
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Force-pushed from de083b3 to 0590df4.
Rebased to trigger CI again after #814 merged, as @Edwinhr716 mentioned.
It seems CI passed. Please take a look when you have time, thank you! @Edwinhr716
ping
@Edwinhr716 I saw another problem: when the vLLM pod crashed and was restarted by the probe, nothing changed except the …
Force-pushed from ff477d0 to 0d142bd.
Force-pushed from 594394e to 758d435.
Relevant to kubeflow/trainer#3417.
Co-Author: GPT-5.4