ipvs: tolerate distinct VSGs resolving to the same kernel service tuple#2691
Open
pierrecdn wants to merge 2 commits into
Open
ipvs: tolerate distinct VSGs resolving to the same kernel service tuple#2691pierrecdn wants to merge 2 commits into
pierrecdn wants to merge 2 commits into
Conversation
When several Virtual Service Groups resolve to the same (proto, addr, port) IPVS tuple (distinct VSG names with overlapping entries, common in dynamically generated configs), the kernel only stores one service for the tuple. Each VSG has its own per-VSG entry alive counter, so a clear_diff_vsg() that removes one VSG's entry can issue LVS_CMD_DEL on the shared service while the surviving VSG was just marked vsge->reloaded=true and will be skipped on the next LVS_CMD_ADD pass. From that point on every healthcheck transition logs: IPVS cmd IP_VS_SO_SET_ADDDEST(1159) error: No such file or directory(2) because the kernel cannot find the parent service to attach the dst to. This change makes ipvs_talk() self-heal: when ADDDEST returns ENOENT, re-create the service from the same srule and retry the dest add once. If the service recreation fails for another reason, the original errno is restored and let the existing error path log it. Signed-off-by: Pierre Cheynier <p.cheynier@criteo.com>
ipvs_group_cmd() used to skip LVS_CMD_ADD for any VSG entry with vsg_entry->reloaded set on a reload, and to additionally skip when !is_vsge_alive(). Both checks reflect keepalived's in-memory state, potentially different than the actual kernel state. When several VSGs (different vsgnames, possibly emitted by a provisioning template that does not deduplicate across frontends) resolve to the same IPVS tuple (proto, addr, tuple), the kernel only keeps one service. Any clear_diff_vsg() pass that removes one VSG's entry can issue LVS_CMD_DEL on the shared service while the surviving VSG is just marked "reloaded=true" and so is skipped on the next reloads. The kernel state then diverges from keepalived's view and stays wedged forever: re-skips the ADD because reloaded=true, every checker transition logs IP_VS_SO_SET_ADDDEST ENOENT and IP_VS_SO_SET_DELDEST ENOENT against the missing parent service. This commit always issue LVS_CMD_ADD: ipvs_talk() already turns EEXIST on ADD into success, so the operation is idempotent and self-healing. To prevent log noise from the new EEXIST flood (one per reloaded VSG entry per reload), suppress logging when ipvs_talk() turns EEXIST or ENOENT into success (the "benign" case). Signed-off-by: Pierre Cheynier <p.cheynier@criteo.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The docs (
doc/man/man5/keepalived.conf.5.in:2280) forbid two VSes using the same VSG with the same protocol/family.The same kernel constraint - one IPVS service per
(proto, vip, port)or(proto, family, fwmark)- applies when distinct VSGs emit entries that collide on that tuple.While that should not happen, in practice (provisioning streams, churn between renders on automation system) it sometimes does, and keepalived currently lets it silently desync from the kernel and never recovers.
Typically, every reload then logs
IP_VS_SO_SET_ADDDEST(1159) ENOENT(2)/IP_VS_SO_SET_DELDEST ENOENTagainst a service the kernel no longer has but the surviving VSG still expects.These patches makes keepalived hopefully deal with that case:
ipvs: recover from missing parent service on ADDDEST/EDITDEST: when the kernel returnsENOENTon a dest add, recreate the parent service from the samesruleand retry once. No behavior change is expected on healthy reloads.ipvs: do not trust per-vsge "reloaded" mark on LVS_CMD_ADD: drop thereloadedskip andis_vsge_aliveshortcut on the ADD path;EEXISTis already idempotent inipvs_talk(), so unconditionally re-issuing it prevents the wedge from forming.Reproducer
Two VSGs (
A,B) both containing e.g.2001:db8::1 80withprotocol tcp, alive RSes.Reload after dropping the entry from
Bonly. Without the series: the kernel service for(tcp, 2001:db8::1, 80)disappears, syslog starts streamingADDDEST/DELDEST ENOENT, and the wedge survives next reloads.With patch 1 the next
ADDDESTrepairs it. With patch 2 it never forms.