(RHEL-137251) Add new RestartMode= option #419

Open
dtardon wants to merge 3 commits into redhat-plumbers:main from dtardon:RHEL-137251-RestartMode=

@dtardon
Member

@dtardon dtardon commented Apr 16, 2026

Resolves: RHEL-137251

@github-actions github-actions Bot changed the title from "Add new RestartMode= option" to "(RHEL-137251) Add new RestartMode= option" Apr 16, 2026
@github-actions github-actions Bot added the tracker/invalid-product, tracker/unapproved (formerly needs-acks), pr/needs-ci (formerly needs-ci), and pr/needs-review (formerly needs-review) labels Apr 16, 2026
@github-actions

github-actions Bot commented Apr 16, 2026

Commit validation

Tracker - RHEL-137251

The following commits meet all requirements

commit upstream
f62f09e - pid1: introduce new SERVICE_{DEAD|FAILED}_BEFORE_AUTO_RESTART service … systemd/systemd@a1d3157
70dca05 - service: add new RestartMode option systemd/systemd@e568fea
2dae6d1 - core/service: drop unneeded unit_add_to_gc_queue() systemd/systemd@818315a

Tracker validation

Success

🟢 Tracker RHEL-137251 has set desired product: rhel-9.9
🟢 Tracker RHEL-137251 has set desired component: systemd
🟢 Tracker RHEL-137251 has been approved
🟢 Tracker RHEL-137251 has set severity
🟠 Tracker RHEL-137251 is not linked to any backfill issue


Pull Request validation

Failed

🔴 Review - Missing review from a member (1 required)

Success

🟡 CI - Waived


Triggered by Workflow Run

poettering and others added 3 commits April 17, 2026 12:24
… substates

When a service deactivates and is then automatically restarted via
Restart= we currently quickly transition through
SERVICE_DEAD/SERVICE_FAILED, which is weird given it's not the
normal ("permanent") dead/failed state, but a transitory one we
immediately leave again. We do this so that software that looks for
failures/successes can take notice, even if we restart as a consequence
of the deactivation.

Let's clean this up a bit by introducing two new states,
SERVICE_DEAD_BEFORE_AUTO_RESTART and SERVICE_FAILED_BEFORE_AUTO_RESTART,
that are used for the transitory states. Both SERVICE_DEAD and
SERVICE_DEAD_BEFORE_AUTO_RESTART still map to the high-level
UNIT_INACTIVE state (and similarly for the respective failed
states), so the high-level state machine doesn't change,
only the low-level one.

This clearly separates the substates, which makes the state engine
cleaner, and allows clients to follow precisely whether we are in a
transitory dead/failed state, or a permanent one, by looking at the
service substate. Moreover, it allows us to remove the 'n_keep_fd_store'
field, which so far we used to ensure the fdstore was not released during
the transitory dead/failed state but only during the permanent one. Since
we can now distinguish these states properly we can just use that.
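
The substate mapping described above can be sketched roughly as follows. This is only an illustration, not the actual systemd code: the enums are cut down to the states discussed here (plus SERVICE_RUNNING for contrast), and the helper names `service_state_to_unit_state()`, `service_state_is_transitory()` and `service_may_release_fd_store()` are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced, illustrative subset of the low-level service substates. */
typedef enum {
    SERVICE_RUNNING,
    SERVICE_DEAD,
    SERVICE_FAILED,
    SERVICE_DEAD_BEFORE_AUTO_RESTART,
    SERVICE_FAILED_BEFORE_AUTO_RESTART,
} ServiceState;

/* Reduced, illustrative subset of the high-level unit states. */
typedef enum {
    UNIT_ACTIVE,
    UNIT_INACTIVE,
    UNIT_FAILED,
} UnitActiveState;

/* Hypothetical helper: permanent and transitory substates collapse
 * onto the same high-level state, so the high-level state machine
 * is unchanged. */
static UnitActiveState service_state_to_unit_state(ServiceState s) {
    switch (s) {
    case SERVICE_RUNNING:
        return UNIT_ACTIVE;
    case SERVICE_DEAD:
    case SERVICE_DEAD_BEFORE_AUTO_RESTART:
        return UNIT_INACTIVE;
    case SERVICE_FAILED:
    case SERVICE_FAILED_BEFORE_AUTO_RESTART:
        return UNIT_FAILED;
    }
    assert(!"unknown service substate");
    return UNIT_INACTIVE;
}

/* Hypothetical helper: true for the states we leave again immediately
 * because an automatic restart follows. */
static bool service_state_is_transitory(ServiceState s) {
    return s == SERVICE_DEAD_BEFORE_AUTO_RESTART ||
           s == SERVICE_FAILED_BEFORE_AUTO_RESTART;
}

/* Hypothetical helper: the fdstore is only released in the permanent
 * dead/failed states, which is what makes a separate keep-counter
 * unnecessary. */
static bool service_may_release_fd_store(ServiceState s) {
    return s == SERVICE_DEAD || s == SERVICE_FAILED;
}
```

A client that only looks at the high-level state sees no difference between SERVICE_FAILED and SERVICE_FAILED_BEFORE_AUTO_RESTART, while one that inspects the substate can tell them apart.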

This has been bugging me for a while. Let's clean this up.

Note that the unit restart logic is already nicely covered in the
test suite, hence this adds no new tests for that.

And yes, this could be considered a compat break, but so far we took the
liberty to make changes to the low-level state machine (i.e. SERVICE_xyz
states, sometimes called "substates") without considering this a bad
breakage; only the high-level state machine (i.e. UNIT_xyz states) should
be considered API that cannot be changed.

(cherry picked from commit a1d3157)

Related: RHEL-137251
When this option is set to direct, the service restarts without entering a failed
state. Dependent units are not notified of transitory failure.

This is useful for the following use case:

We have a target with Requires=my-service, After=my-service.
my-service.service is a oneshot service and has Restart=on-failure in
its definition.

my-service.service can get stuck for various reasons and time out, in
which case it is restarted. Currently, when it fails the first time, the
target fails, even though my-service is restarted.

The behavior we're looking for is that the target stays pending until
my-service.service either starts successfully or fails without being
restarted anymore.
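
A minimal sketch of the unit files for this use case might look like the fragment below. The target name `my-app.target`, the `ExecStart=` path and the `TimeoutStartSec=` value are assumptions made up for illustration; only `Requires=`, `After=`, the oneshot type, `Restart=on-failure` and `RestartMode=direct` come from the description above.

```ini
# my-app.target (hypothetical name)
[Unit]
Description=Target that must wait for my-service
Requires=my-service.service
After=my-service.service

# my-service.service
[Unit]
Description=Oneshot setup step that may time out

[Service]
Type=oneshot
# Hypothetical command and timeout, for illustration only.
ExecStart=/usr/local/bin/my-service-setup
TimeoutStartSec=30
Restart=on-failure
# Restart without passing through the failed state, so the
# target above is not failed by a transitory failure.
RestartMode=direct
```

With this option set, the target is expected to stay pending across restarts of my-service.service and to fail only once the service fails without being restarted again.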

(cherry picked from commit e568fea)

Resolves: RHEL-137251
Follow-up for a1d3157
and 6ac62d6

With the aforementioned commits, unit_release_resources()
is dispatched in a dedicated queue, and Service.n_keep_fd_store
has been dropped, hence the comment is outdated. Moreover,
the unit is added to GC queue in unit_notify() already.
No other unit types do this in corresponding _enter_dead()
functions, nor does Service need it anymore.

(cherry picked from commit 818315a)

Related: RHEL-137251
@dtardon dtardon force-pushed the RHEL-137251-RestartMode= branch from 662c5c8 to 2dae6d1 Compare April 17, 2026 10:24
@dtardon
Member Author

dtardon commented Apr 17, 2026

rpm-build:centos-stream-9-s390x is busted.

@github-actions github-actions Bot removed the pr/needs-ci (formerly needs-ci) label Apr 17, 2026
@github-actions github-actions Bot added and removed the tracker/missing (formerly needs-bz) label May 20, 2026

Labels

ci-waived, pr/needs-review (formerly needs-review)

4 participants