feat(deployment): add sd_notify integration with watchdog support#25072

Open
newtonne wants to merge 4 commits into vectordotdev:master from newtonne:systemd-notify
@newtonne

Summary

Adds systemd sd_notify integration to Vector, enabling enhanced service lifecycle management when running under systemd with Type=notify.

Vector now sends:

  • READY=1 when fully started and ready to process events
  • STOPPING=1 at the beginning of graceful shutdown
  • WATCHDOG=1 keepalive pings at half the configured WatchdogSec interval

The bundled vector.service and hardened-vector.service unit files are updated to use Type=notify, with an optional, commented-out WatchdogSec directive.

See: https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html
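
For readers unfamiliar with the protocol, the messages listed above are plain `KEY=VALUE` datagrams sent to the Unix socket named by `NOTIFY_SOCKET`. The following is a minimal standard-library sketch of that mechanism, not the code in this PR (the review thread shows the PR calling into `sd_notify`). The helper name `notify_to` is hypothetical, and the sketch assumes a filesystem socket path; abstract-namespace sockets (a leading `@`) are not handled.

```rust
use std::env;
use std::io;
use std::os::unix::net::UnixDatagram;
use std::path::Path;

/// Sends one state string (for example "READY=1") to the notify socket.
/// Returns Ok(false) without doing anything when no socket path is given,
/// which mirrors the "no-op when NOTIFY_SOCKET is unset" behaviour.
/// Hypothetical helper for illustration only.
fn notify_to(socket_path: Option<&Path>, state: &str) -> io::Result<bool> {
    let Some(path) = socket_path else {
        return Ok(false);
    };
    // Assumption: `path` is a filesystem socket. Abstract sockets
    // (paths starting with '@') would need extra handling.
    let socket = UnixDatagram::unbound()?;
    socket.send_to(state.as_bytes(), path)?;
    Ok(true)
}

fn main() -> io::Result<()> {
    // systemd exports NOTIFY_SOCKET for Type=notify units.
    let socket_path = env::var_os("NOTIFY_SOCKET");
    let socket_path = socket_path.as_deref().map(Path::new);

    for state in ["READY=1", "WATCHDOG=1", "STOPPING=1"] {
        let sent = notify_to(socket_path, state)?;
        println!("{state}: sent={sent}");
    }
    Ok(())
}
```

Passing the socket path as a parameter, rather than reading the environment inside the helper, keeps the no-op path easy to exercise in unit tests.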

Vector configuration

No config changes required.

How did you test this PR?

  • Tested manually on Linux with a systemd unit using Type=notify:
    • Verified READY=1 is sent after startup is completed
    • Verified STOPPING=1 is sent on systemctl stop
    • Verified WATCHDOG=1 pings are sent at the correct interval with WatchdogSec enabled
  • Verified all notify calls are no-ops when NOTIFY_SOCKET is not set, such as:
    • Unit tests
    • systemd unit using the default Type=simple
    • Standalone binary
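
The manual checks above can also be approximated without systemd by listening on a temporary datagram socket and reading whatever arrives. The sketch below plays the receiving side; the function name `receive_states` and the socket file name are hypothetical, and the in-process sender only stands in for a service that would be started with `NOTIFY_SOCKET` pointing at the same path.

```rust
use std::env;
use std::fs;
use std::io;
use std::os::unix::net::UnixDatagram;

/// Reads `count` datagrams from `listener` and returns them as strings.
/// This stands in for the role systemd plays on the other end of the socket.
fn receive_states(listener: &UnixDatagram, count: usize) -> io::Result<Vec<String>> {
    let mut states = Vec::with_capacity(count);
    let mut buf = [0u8; 256];
    for _ in 0..count {
        let len = listener.recv(&mut buf)?;
        states.push(String::from_utf8_lossy(&buf[..len]).into_owned());
    }
    Ok(states)
}

fn main() -> io::Result<()> {
    // Hypothetical socket path, made unique per process.
    let path = env::temp_dir().join(format!("fake-notify-{}.sock", std::process::id()));
    let _ = fs::remove_file(&path);
    let listener = UnixDatagram::bind(&path)?;

    // Stand-in for the service under test: send the three lifecycle states.
    let sender = UnixDatagram::unbound()?;
    for state in ["READY=1", "WATCHDOG=1", "STOPPING=1"] {
        sender.send_to(state.as_bytes(), &path)?;
    }

    for state in receive_states(&listener, 3)? {
        println!("received {state}");
    }
    fs::remove_file(&path)
}
```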

Before

2026-03-30T18:31:00.008911+00:00 systemd[1]: Starting vector.service - Vector...
...
2026-03-30T18:31:00.048406+00:00 systemd[1]: vector.service: Changed start-pre -> running
2026-03-30T18:31:00.048599+00:00 systemd[1]: vector.service: Job 11673 vector.service/start finished, result=done
2026-03-30T18:31:00.048886+00:00 systemd[1]: Started vector.service - Vector.
...
2026-03-30T18:31:00.069751+00:00 vector[13987]: 2026-03-30T18:31:00.068889Z  INFO vector::app: Log level is enabled. level="info"
2026-03-30T18:31:00.069751+00:00 vector[13987]: 2026-03-30T18:31:00.069647Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector.yaml"]
2026-03-30T18:31:00.074214+00:00 vector[13987]: 2026-03-30T18:31:00.074159Z  INFO vector::topology::running: Running healthchecks.
2026-03-30T18:31:00.074812+00:00 vector[13987]: 2026-03-30T18:31:00.074522Z  INFO vector::topology::builder: Healthcheck passed.
2026-03-30T18:31:00.074812+00:00 vector[13987]: 2026-03-30T18:31:00.074674Z  INFO vector: Vector has started. debug="false" version="0.54.0" arch="x86_64" revision="2b8b875 2026-03-10 15:47:37.284215410"
2026-03-30T18:31:00.074812+00:00 vector[13987]: 2026-03-30T18:31:00.074692Z  INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.

Note that systemd shows vector as started as soon as the process has forked.

After

2026-03-30T18:27:30.884285+00:00 systemd[1]: Starting vector.service - Vector...
...
2026-03-30T18:27:30.943735+00:00 vector[13467]: 2026-03-30T18:27:30.943609Z  INFO vector::app: Log level is enabled. level="info"
2026-03-30T18:27:30.945571+00:00 vector[13467]: 2026-03-30T18:27:30.945477Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector.yaml"]
2026-03-30T18:27:30.964246+00:00 vector[13467]: 2026-03-30T18:27:30.964157Z  INFO vector::topology::running: Running healthchecks.
2026-03-30T18:27:30.965250+00:00 vector[13467]: 2026-03-30T18:27:30.964605Z  INFO vector::topology::builder: Healthcheck passed.
2026-03-30T18:27:30.965250+00:00 vector[13467]: 2026-03-30T18:27:30.965080Z  INFO vector: Vector has started. debug="true" version="0.55.0" arch="x86_64" revision=""
2026-03-30T18:27:30.965368+00:00 systemd[1]: vector.service: Got notification message from PID 13467 (READY=1)
2026-03-30T18:27:30.965406+00:00 systemd[1]: vector.service: Changed start -> running
2026-03-30T18:27:30.965421+00:00 systemd[1]: vector.service: Job 11079 vector.service/start finished, result=done
2026-03-30T18:27:30.965453+00:00 systemd[1]: Started vector.service - Vector.

With this change, systemd reports vector as started only after Vector has finished starting up.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Changelog fragment added: changelog.d/systemd_notify.feature.md
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.

@newtonne newtonne requested a review from a team as a code owner March 30, 2026 18:25
@github-actions github-actions bot added the "domain: releasing" label (Anything related to releasing Vector) Mar 30, 2026
@github-actions
Contributor

github-actions bot commented Mar 30, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@newtonne
Author

I have read the CLA Document and I hereby sign the CLA

@newtonne newtonne changed the title from "feat(systemd): add sd_notify integration with watchdog support" to "feat(deployment): add sd_notify integration with watchdog support" Mar 31, 2026
Contributor

@jpds jpds left a comment

Worth adding RELOADING=1 support into reload_config_from_result.

Add systemd notify integration. Vector now sends `READY=1` when fully started, `STOPPING=1`
when beginning a graceful shutdown, and `WATCHDOG=1` pings at half the configured `WatchdogSec`
interval. The bundled `vector.service` and `hardened-vector.service` unit files are updated
to use `Type=notify`, with an optional `WatchdogSec` directive.
Contributor

You need to add an authors: line here, like the other changelog files.

Comment thread src/systemd.rs
    let Some(duration) = sd_notify::watchdog_enabled() else {
        return;
    };
    let mut ticker = interval(duration / 2);
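
To make the "half the configured WatchdogSec" rule concrete: systemd passes the watchdog timeout to the service as microseconds in the `WATCHDOG_USEC` environment variable. The sketch below derives the ping period from that value using only the standard library. It is an illustration of the arithmetic, not the PR's code; `keepalive_interval` is a hypothetical name, and the sketch ignores the companion `WATCHDOG_PID` check.

```rust
use std::env;
use std::time::Duration;

/// Returns the keepalive period (half the watchdog timeout), or None when
/// the watchdog is not configured or the value cannot be parsed.
/// Hypothetical helper for illustration only.
fn keepalive_interval(watchdog_usec: Option<&str>) -> Option<Duration> {
    let usec: u64 = watchdog_usec?.trim().parse().ok()?;
    if usec == 0 {
        // A zero timeout means there is nothing to keep alive.
        return None;
    }
    // Ping at half the timeout so one delayed tick does not trip the watchdog.
    Some(Duration::from_micros(usec) / 2)
}

fn main() {
    let raw = env::var("WATCHDOG_USEC").ok();
    match keepalive_interval(raw.as_deref()) {
        Some(period) => println!("ping every {period:?}"),
        None => println!("watchdog disabled; no pings"),
    }
}
```

For example, a unit with `WatchdogSec=30` would yield `WATCHDOG_USEC=30000000` and a 15-second ping period under this sketch.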
Contributor

@jpds jpds Apr 15, 2026

Also might be worth adding here:

    debug!(
        message = "Systemd watchdog keepalive started.",
        // `ticker` is an interval, so read its period before converting to seconds.
        interval_secs = ticker.period().as_secs_f64(),
    );

@newtonne
Author

Hey @jpds. Thanks for taking a look. I did consider this but then I discovered the notify-reload type, which is seemingly the way to go. From sd_notify (3) regarding RELOADING=1:

This message is particularly relevant for Type=notify-reload services, to inform the service manager that the request to reload the service has been received and is now being processed.

However, it doesn’t seem to be widely adopted and has its detractors, so rather than opening a can of worms, I decided to keep the scope tight, with the view that this can always be added later. What do you think?
