While investigating API scaling issues we noticed that our hard-coded
probe configuration is not optimal for scaling nova-api. Instead of
immediately killing pods when they do not respond within 30 seconds, we
should first remove overloaded pods from the load balancer, let them
work through their backlog, and only kill a pod if it hangs for an
excessive amount of time.

Another observation was that we allow configuring the APITimeout
parameter on our routes, but changing that value is not reflected in our
probe configs. So even if the customer decides that it is OK for nova-api
to respond more slowly by increasing the APITimeout, our probes do not
become more forgiving.

This patch changes the probe configuration of nova-api and nova-metadata
to:
* be quick to remove the pod from the load balancer if it is overloaded,
via the readiness probe config
* be very forgiving about slow responses and only kill the pod if it
is hanging for a long time, via the liveness probe
* scale both the readiness and liveness probe timeouts with the
APITimeout configuration

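As a rough illustration of the idea above, the following is a minimal
sketch and not the code of this patch. ProbeSettings and probesFor are
hypothetical names, ProbeSettings only mirrors a few fields of the
Kubernetes probe type, and the period and failure threshold values are
assumptions chosen to show a strict readiness probe next to a forgiving
liveness probe, both with a timeout derived from APITimeout.

```go
package main

import "fmt"

// ProbeSettings is a hypothetical stand-in that mirrors a few fields of a
// Kubernetes probe (timeoutSeconds, periodSeconds, failureThreshold).
type ProbeSettings struct {
	TimeoutSeconds   int32
	PeriodSeconds    int32
	FailureThreshold int32
}

// probesFor derives readiness and liveness settings from the APITimeout
// value given in seconds. The period and threshold numbers are illustrative
// assumptions, not the values used by the actual patch.
func probesFor(apiTimeout int32) (readiness, liveness ProbeSettings) {
	if apiTimeout < 1 {
		apiTimeout = 1 // probe timeouts must be at least one second
	}
	// Readiness: a single failed check takes the pod out of the load
	// balancer so it can work through its backlog.
	readiness = ProbeSettings{
		TimeoutSeconds:   apiTimeout,
		PeriodSeconds:    10,
		FailureThreshold: 1,
	}
	// Liveness: many consecutive failures are tolerated, so the pod is only
	// restarted when it has been unresponsive for a long time.
	liveness = ProbeSettings{
		TimeoutSeconds:   apiTimeout,
		PeriodSeconds:    10,
		FailureThreshold: 10,
	}
	return readiness, liveness
}

func main() {
	for _, apiTimeout := range []int32{60, 120} {
		r, l := probesFor(apiTimeout)
		fmt.Printf("APITimeout=%d readiness=%+v liveness=%+v\n", apiTimeout, r, l)
	}
}
```

With these assumed values, raising APITimeout lengthens both probe
timeouts, while the difference in failure thresholds keeps readiness
strict and liveness forgiving.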
Jira: OSPRH-25717
Jira: OSPRH-27192
Signed-off-by: Balazs Gibizer <gibi@redhat.com>