Commit 4242c15
fix: eliminate per-scrape Kubernetes API call causing 15s+ metrics latency
Collect() was calling containerLister.Update() on every Prometheus scrape,
which issues a synchronous cluster-wide Pods().List() against the API server.
The watchAndFeedback goroutine already refreshes container state every 5s,
so Collect() only needs to read cached data.
Also remove nvml.Init() from Collect(): NVML is already initialized in
watchAndFeedback, and nvml.Init() should be called only once, at startup.
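The commit enforces the "initialize once" rule by leaving the call in watchAndFeedback only. Another way to express the same invariant, shown here purely as an illustration and not as what this commit does, is a `sync.Once` guard. `initDriver` is a hypothetical stand-in for an expensive one-time setup call such as nvml.Init(), and its `error` return type is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	initOnce  sync.Once
	initErr   error
	initCalls int // counts real initializations, for demonstration only
)

// initDriver is a hypothetical stand-in for a one-time setup call.
func initDriver() error {
	initCalls++
	return nil
}

// ensureInit runs initDriver at most once, however many callers reach it,
// and returns the result of that single run to every caller.
func ensureInit() error {
	initOnce.Do(func() { initErr = initDriver() })
	return initErr
}

func main() {
	for i := 0; i < 3; i++ {
		if err := ensureInit(); err != nil {
			fmt.Println("init failed:", err)
			return
		}
	}
	fmt.Println("init calls:", initCalls) // prints "init calls: 1"
}
```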
Fix a data race in ListContainers(): Update() holds the mutex while writing
the containers map, but the previous ListContainers() returned the raw map
with no lock. It now returns a snapshot under the mutex.
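The snapshot pattern described above can be sketched as follows. The `lister` type, its field names, and the map's key and value types are assumptions made for illustration; the real containerLister is not shown in this capture.

```go
package main

import (
	"fmt"
	"sync"
)

// lister is a hypothetical, simplified stand-in for the container lister.
type lister struct {
	mu         sync.Mutex
	containers map[string]string // container ID -> pod name (illustrative)
}

func newLister() *lister {
	return &lister{containers: make(map[string]string)}
}

// Update rewrites the map in place while holding the mutex.
func (l *lister) Update(fresh map[string]string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	for k := range l.containers {
		delete(l.containers, k)
	}
	for k, v := range fresh {
		l.containers[k] = v
	}
}

// ListContainers copies the map under the same mutex, so callers never
// iterate over a map that Update is concurrently writing.
func (l *lister) ListContainers() map[string]string {
	l.mu.Lock()
	defer l.mu.Unlock()
	snap := make(map[string]string, len(l.containers))
	for k, v := range l.containers {
		snap[k] = v
	}
	return snap
}

func main() {
	l := newLister()
	l.Update(map[string]string{"c1": "pod-a"})
	snap := l.ListContainers()
	l.Update(map[string]string{"c2": "pod-b", "c3": "pod-c"})
	// The earlier snapshot is unaffected by the later Update.
	fmt.Println(len(snap), len(l.ListContainers())) // prints "1 2"
}
```

Returning a copy costs one allocation per call, but it lets the caller range over the result without holding the lock. Running the real code under `go test -race` is the usual way to confirm that a race like this is gone.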
Co-Authored-By: charford <casey@caseyharford.com>
Signed-off-by: charford <casey@caseyharford.com>
1 parent 51ddae6 commit 4242c15
2 files changed
Lines changed: 7 additions & 8 deletions
File 1 of 2 (file name and line contents not captured): original lines 142-144 and 146-149 removed; 7 deletions.
File 2 of 2 (file name and line contents not captured): original line 115 replaced by new lines 115-121; 1 deletion, 7 additions.