Fix NodeNames aggregation in AWS jobs #3843
Open
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
Context
In the GCE jobs, the nodes are given well-defined names (like “us-east-1-a”), so the per-node metrics have stable labels and can easily be compared across days. In the AWS job, the nodes seem to have randomly assigned names, so the metric labels end up being different every day and we can't compare them as easily.
Example:
name: control-plane-us-east1-b-jlvv,control-plane-us-east1-b-smv6
GCE tests truncate the last 4 characters when aggregating.
What you expected to happen:
The node metrics don't have consistent trends over time.
As a result, GCE tests show a trend over time:
https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=E2E&metricname=LoadResources&PodName=kube-proxy-nodes-us-east1-b%2Fkube-proxy&Resource=memory

AWS jobs don't show a trend because the node names differ from run to run:
https://perf-dash.k8s.io/#/?jobname=aws-5000Nodes&metriccategoryname=E2E&metricname=LoadResources&PodName=kube-proxy-i-094de2c600c60d0bb%2Fkube-proxy&Resource=memory
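To make the difference concrete, here is a minimal Python sketch of label normalization for both name styles. It is not the actual perf-tests aggregation code: the function names, the regexes, and the `node` placeholder are assumptions for illustration. The GCE pattern assumes the random part is a hyphen plus 4 characters, as in the example above; the AWS pattern relies on EC2 instance IDs being `i-` followed by 8 or 17 hex characters.

```python
import re

# Assumption: GCE node names end in "-" plus a 4-character random suffix,
# e.g. "control-plane-us-east1-b-jlvv".
GCE_SUFFIX = re.compile(r"-[a-z0-9]{4}$")

# EC2 instance IDs are "i-" followed by 8 or 17 hex characters,
# e.g. "i-094de2c600c60d0bb".
EC2_INSTANCE_ID = re.compile(r"\bi-(?:[0-9a-f]{17}|[0-9a-f]{8})\b")


def normalize_gce_node(name: str) -> str:
    """Drop the random suffix so nodes of one group share a label."""
    return GCE_SUFFIX.sub("", name)


def normalize_aws_pod(name: str, placeholder: str = "node") -> str:
    """Replace an embedded EC2 instance ID with a fixed placeholder."""
    return EC2_INSTANCE_ID.sub(placeholder, name)


if __name__ == "__main__":
    for n in ("control-plane-us-east1-b-jlvv", "control-plane-us-east1-b-smv6"):
        print(normalize_gce_node(n))
    print(normalize_aws_pod("kube-proxy-i-094de2c600c60d0bb/kube-proxy"))
```

Note that this collapses every AWS node into a single label. Grouping by availability zone, as the GCE names implicitly do, would need zone information that the instance ID alone does not carry.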
Context in sig-scalability: https://kubernetes.slack.com/archives/C09QZTRH7/p1771436311523919