Commit edd7144

Fix discovery bug that failed when there are many jobs that change frequently (#1938)
Use new Kubewatch advanced filters by default to improve performance
Fix alert export API docs
1 parent b066ce1 commit edd7144

4 files changed

Lines changed: 81 additions & 5 deletions

docs/configuration/exporting/alert-export-api.rst

Lines changed: 4 additions & 0 deletions
@@ -38,6 +38,10 @@ Query Parameters
      - string
      - The name of the alert to filter by (e.g., ``CrashLoopBackoff``).
      - No
+   * - ``namespace``
+     - string
+     - The namespace of the alert to filter by (e.g., ``monitoring``).
+     - No
 
 Example Request
 ^^^^^^^^^^^^^^^^^^^^^^^^^

docs/playbook-reference/triggers/kubernetes.rst

Lines changed: 44 additions & 0 deletions
@@ -261,6 +261,21 @@ Single Resource Triggers
 
 For triggers that fire only on Pod errors, see :ref:`Crashing Pod Triggers`.
 
+.. note::
+
+    By default, Robusta processes only **Pod** change events that are related to failures or modifications in the **Pod** spec.
+    Other types of **Pod** changes are considered less relevant and are filtered out to reduce noise.
+
+    To process **all Pod** change events, add the following configuration to your `generated_values.yaml` file.
+
+    .. code-block:: yaml
+
+        kubewatch:
+          additional_env_vars:
+            - name: ADVANCED_FILTERS
+              value: "false"
+
+
 .. jinja::
   :inline-ctx: { "resource_name" : "ReplicaSet", "related_actions" : ["related_pods"] }
   :header_update_levels:
@@ -296,6 +311,21 @@ For triggers that fire only on Pod errors, see :ref:`Crashing Pod Triggers`.
   :header_update_levels:
   :file: playbook-reference/triggers/_k8s-generic-triggers.jinja
 
+.. note::
+
+    By default, Robusta processes only **Event Create** events with the type ``Warning``.
+    **Events** with the type ``Normal`` are considered less relevant and are filtered out to reduce noise,
+    except for ``Normal`` events that indicate Pod evictions.
+
+    To process all Kubernetes **Event** change events, add the following configuration to your ``generated_values.yaml`` file.
+
+    .. code-block:: yaml
+
+        kubewatch:
+          additional_env_vars:
+            - name: ADVANCED_FILTERS
+              value: "false"
+
 .. jinja::
   :inline-ctx: { "resource_name" : "HorizontalPodAutoscaler", "related_actions" : [] }
   :header_update_levels:
@@ -321,6 +351,20 @@ For triggers that fire only on Pod errors, see :ref:`Crashing Pod Triggers`.
   :header_update_levels:
   :file: playbook-reference/triggers/_k8s-generic-triggers.jinja
 
+.. note::
+
+    By default, Robusta processes only **Job** change events that are related to failures or modifications in the **Job** spec.
+    Other types of **Job** changes are considered less relevant and are filtered out to reduce noise.
+
+    To process **all Job** change events, add the following configuration to your ``generated_values.yaml`` file.
+
+    .. code-block:: yaml
+
+        kubewatch:
+          additional_env_vars:
+            - name: ADVANCED_FILTERS
+              value: "false"
+
 .. jinja::
   :inline-ctx: { "resource_name" : "Namespace", "related_actions" : [] }
   :header_update_levels:
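
The Event note added in this file describes a simple rule: ``Warning`` events pass, ``Normal`` events are dropped unless they indicate a Pod eviction, and setting ``ADVANCED_FILTERS`` to ``"false"`` disables the filtering. The sketch below restates that rule in Python for illustration only. The Kubewatch implementation is not part of this commit, so ``KubeEvent`` and ``passes_default_event_filter`` are hypothetical names, and recognising an eviction by ``reason == "Evicted"`` is an assumption borrowed from the ``reason=Evicted`` scope of the ``PodEvictionReport`` playbook in the ``helm/robusta/values.yaml`` change.

```python
from dataclasses import dataclass


@dataclass
class KubeEvent:
    """Hypothetical, minimal stand-in for a Kubernetes Event object."""

    type: str    # "Warning" or "Normal"
    reason: str  # e.g. "BackOff", "Evicted", "Scheduled"


def passes_default_event_filter(event: KubeEvent, advanced_filters: bool = True) -> bool:
    """Return True if the event would be processed under the rule in the note.

    Assumption: a Pod eviction is recognised by reason == "Evicted".
    """
    if not advanced_filters:
        # ADVANCED_FILTERS="false": every Event is processed.
        return True
    if event.type == "Warning":
        return True
    # Normal events are filtered out, except for evictions.
    return event.type == "Normal" and event.reason == "Evicted"
```

With the defaults, ``KubeEvent("Normal", "Scheduled")`` is filtered out, while ``KubeEvent("Normal", "Evicted")`` and any ``Warning`` event pass.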

helm/robusta/values.yaml

Lines changed: 17 additions & 2 deletions
@@ -455,6 +455,19 @@ builtinPlaybooks:
 enablePlatformPlaybooks: false
 
 platformPlaybooks:
+- name: "PodEvictionReport"
+  triggers:
+    - on_event_create:
+        scope:
+          include:
+            - attributes:
+              - "reason=Evicted"
+  actions:
+    - create_event_finding:
+        aggregation_key: "PodEviction"
+    - event_resource_events: {}
+  sinks:
+    - "robusta_ui_sink"
 - name: "K8sWarningEventsReport"
   triggers:
     - on_kubernetes_warning_event_create:
@@ -590,7 +603,7 @@ image:
 # parameters for the robusta forwarder deployment
 kubewatch:
   image: ~ # image can be used to override image.registry/imageName
-  imageName: kubewatch:v2.11.0
+  imageName: kubewatch:v2.12.0
   imagePullPolicy: IfNotPresent
   revisionHistoryLimit: 10
   pprof: True
@@ -600,7 +613,9 @@ kubewatch:
       memory: 512Mi
     limits:
       cpu: ~
-  additional_env_vars: []
+  additional_env_vars:
+    - name: ADVANCED_FILTERS
+      value: "true"
   priorityClassName: ""
   tolerations: []
   annotations: {}

src/robusta/core/discovery/discovery.py

Lines changed: 16 additions & 3 deletions
@@ -649,9 +649,22 @@ def discovery_process() -> DiscoveryResults:
         try:
             continue_ref: Optional[str] = None
             for _ in range(DISCOVERY_MAX_BATCHES):
-                current_jobs: V1JobList = client.BatchV1Api().list_job_for_all_namespaces(
-                    limit=DISCOVERY_BATCH_SIZE, _continue=continue_ref
-                )
+                try:
+                    current_jobs: V1JobList = client.BatchV1Api().list_job_for_all_namespaces(
+                        limit=DISCOVERY_BATCH_SIZE, _continue=continue_ref
+                    )
+                except ApiException as e:
+                    if e.status == 410 and e.body:
+                        # Continue token expired, extract new token from error and continue
+                        import json
+                        error_body = json.loads(e.body)
+                        new_continue_token = error_body.get("metadata", {}).get("continue")
+                        if new_continue_token:
+                            logging.info("Continue token expired for jobs listing. Continuing")
+                            continue_ref = new_continue_token
+                            continue
+                    raise
+
                 for job in current_jobs.items:
                     job_pods = []
                     job_labels = {}
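
The retry pattern in this hunk can be exercised without a cluster. The following is a minimal sketch, not the Robusta implementation: ``ApiException`` here is a local stand-in exposing only the ``status`` and ``body`` attributes the diff reads, and ``list_all``, ``PageFetcher``, and ``make_fake_fetcher`` are hypothetical helpers. Stopping when no continue token is returned is an assumption, since that part of the loop is outside the hunk.

```python
import json
from typing import Callable, List, Optional, Tuple


class ApiException(Exception):
    """Local stand-in exposing the .status and .body attributes the diff relies on."""

    def __init__(self, status: int, body: Optional[str] = None):
        super().__init__(f"API error {status}")
        self.status = status
        self.body = body


# A page fetcher takes a continue token and returns (items, next_token).
PageFetcher = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def list_all(fetch_page: PageFetcher, max_batches: int = 10) -> List[str]:
    """Collect all pages, resuming with a fresh token when the old one expires."""
    items: List[str] = []
    continue_ref: Optional[str] = None
    for _ in range(max_batches):
        try:
            page, next_ref = fetch_page(continue_ref)
        except ApiException as e:
            if e.status == 410 and e.body:
                # Expired token: the error body may carry a replacement token.
                new_token = json.loads(e.body).get("metadata", {}).get("continue")
                if new_token:
                    continue_ref = new_token
                    continue  # retry; note this consumes one batch iteration
            raise  # any other error, or a 410 without a new token
        items.extend(page)
        continue_ref = next_ref
        if not continue_ref:  # assumption: no token means the listing is complete
            break
    return items


def make_fake_fetcher() -> PageFetcher:
    """Fake API: the token "stale" is rejected with 410 and a replacement token."""

    def fetch(token: Optional[str]) -> Tuple[List[str], Optional[str]]:
        if token is None:
            return ["job-a"], "stale"
        if token == "stale":
            raise ApiException(410, json.dumps({"metadata": {"continue": "fresh"}}))
        if token == "fresh":
            return ["job-b"], None
        raise ApiException(500)

    return fetch
```

Calling ``list_all(make_fake_fetcher())`` walks through one successful page, one 410 with a replacement token, and a final page. As in the diff, the retry counts against the batch limit, so a very small limit can end the listing early.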
