Commit 48c4766

Merge branch 'master' into demo-runbook
2 parents cc09e01 + 142ed7d commit 48c4766

61 files changed

Lines changed: 1008 additions & 415 deletions

.github/workflows/release.yaml

Lines changed: 4 additions & 4 deletions
@@ -105,10 +105,6 @@ jobs:
           name: helm-chart
           path: helm/robusta/
 
-      - name: Upload helm chart
-        run: |
-          cd helm && ./upload_chart.sh
-
       - name: Release Docker to Dockerhub
         run: |-
           docker buildx build \
@@ -118,3 +114,7 @@ jobs:
             --tag robustadev/robusta-runner:${{env.RELEASE_VER}} \
             --push \
             .
+
+      - name: Upload helm chart
+        run: |
+          cd helm && ./upload_chart.sh

docs/conf.py

Lines changed: 2 additions & 1 deletion
@@ -113,7 +113,8 @@
     "tutorials/playbook-failed-liveness.html": "/master/playbook-reference/kubernetes-examples/playbook-failed-liveness.html",
     "tutorials/playbook-track-secrets.html": "/master/playbook-reference/kubernetes-examples//playbook-track-secrets.html",
     "tutorials/alert-remediation.html": "/master/playbook-reference/prometheus-examples/alert-remediation.html",
-    "tutorials/alert-custom-enrichment.html": "/master/playbook-reference/prometheus-examples/alert-custom-enrichment.html"
+    "tutorials/alert-custom-enrichment.html": "/master/playbook-reference/prometheus-examples/alert-custom-enrichment.html",
+    "catalog/sinks/slack.html": "/master/configuration/sinks/slack.html"
 
 
 }

docs/configuration/ai-analysis.rst

Lines changed: 43 additions & 2 deletions
@@ -223,7 +223,7 @@ To use HolmesGPT with the Robusta UI, one further step may be necessary, dependi
 * If you store the Robusta UI token in a Kubernetes secret, follow the instructions below.
 
 Note: the same Robusta UI token is used for the Robusta UI sink and for HolmesGPT.
-
+
 Reading the Robusta UI Token from a secret in HolmesGPT
 ************************************************************
 
@@ -249,7 +249,7 @@ Reading the Robusta UI Token from a secret in HolmesGPT
 .. code-block:: yaml
 
     holmes:
-      additional_env_vars:
+      additionalEnvVars:
         ....
         - name: ROBUSTA_UI_TOKEN
           valueFrom:
@@ -428,3 +428,44 @@ Finally, after updating your ``generated_values.yaml``, apply the changes to you
     helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>
 
 This will update the deployment to use the custom Docker image, which includes the new binaries. The ``toolsets`` defined in the configuration will now be available for Holmes to use, including any new binaries like ``jq``.
+
+
+Adding Permissions for Additional Resources
+----------------------------------------------
+
+There are scenarios where HolmesGPT may require access to additional Kubernetes resources or CRDs to perform specific analyses or interact with external tools.
+
+You will need to extend its ClusterRole rules whenever HolmesGPT needs to access resources that are not included in its default configuration.
+
+Common Scenarios for Adding Permissions:
+
+* External Integrations and CRDs: When HolmesGPT needs to access custom resources (CRDs) in your cluster, like ArgoCD Application resources or Istio VirtualService resources.
+* Additional Kubernetes resources: By default, Holmes can only access a limited number of Kubernetes resources. For example, Holmes has no access to Kubernetes secrets by default. You can give Holmes access to more built-in cluster resources if it is useful for your use case.
+
+As an example, let's consider a case where we ask HolmesGPT to analyze the state of Argo CD applications and projects to troubleshoot issues related to application deployments managed by Argo CD, but it doesn't have access to the relevant CRDs.
+
+**Steps to Add Permissions for Argo CD:**
+
+1. **Update generated_values.yaml with Required Permissions:**
+
+   Add the following configuration under the ``customClusterRoleRules`` section:
+
+   .. code-block:: yaml
+
+       enableHolmesGPT: true
+       holmes:
+         customClusterRoleRules:
+           - apiGroups: ["argoproj.io"]
+             resources: ["applications", "appprojects"]
+             verbs: ["get", "list", "watch"]
+
+2. **Apply the Configuration:**
+
+   Deploy the updated configuration using Helm:
+
+   .. code-block:: bash
+
+       helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>
+
+This will grant HolmesGPT the necessary permissions to analyze Argo CD applications and projects.
+Now you can ask HolmesGPT questions like "What is the current status of all Argo CD applications in the cluster?" and it will be able to answer.
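
To make the effect of the ``customClusterRoleRules`` entry above more concrete, here is a minimal Python sketch of how a rule's ``apiGroups``, ``resources``, and ``verbs`` lists combine to allow or deny a request. It is an illustration only, not Robusta or Kubernetes code: the function names are hypothetical, and it assumes the basic matching semantics (exact match or the ``*`` wildcard) while ignoring ``resourceNames``, subresources, and ``nonResourceURLs``.

```python
"""Simplified model of ClusterRole rule matching (illustrative assumption)."""

# Rules copied from the customClusterRoleRules example in the diff above.
ARGO_RULES = [
    {
        "apiGroups": ["argoproj.io"],
        "resources": ["applications", "appprojects"],
        "verbs": ["get", "list", "watch"],
    }
]


def _matches(allowed, value):
    """Return True if the value is listed explicitly or covered by '*'."""
    return "*" in allowed or value in allowed


def rules_allow(rules, api_group, resource, verb):
    """Return True if at least one rule permits the (group, resource, verb) triple."""
    return any(
        _matches(rule.get("apiGroups", []), api_group)
        and _matches(rule.get("resources", []), resource)
        and _matches(rule.get("verbs", []), verb)
        for rule in rules
    )


if __name__ == "__main__":
    # Listing Argo CD applications is covered by the rule.
    print(rules_allow(ARGO_RULES, "argoproj.io", "applications", "list"))
    # Deleting them is not, because "delete" is not in the verbs list.
    print(rules_allow(ARGO_RULES, "argoproj.io", "applications", "delete"))
    # Core-group secrets are not covered by this rule at all.
    print(rules_allow(ARGO_RULES, "", "secrets", "get"))
```

The rule in the diff is read-only (``get``, ``list``, ``watch``), so under this model HolmesGPT could inspect Argo CD applications and projects but not modify them.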

docs/configuration/alertmanager-integration/embedded-prometheus.rst

Lines changed: 6 additions & 0 deletions
@@ -50,3 +50,9 @@ To allow the Grafana dashboard to persist after the Grafana instance restarts, y
           enabled: true
 
 Apply the change by performing a :ref:`Helm Upgrade <Simple Upgrade>`.
+
+Troubleshooting
+---------------------
+
+Encountering issues with your Prometheus? Follow this guide to resolve some :ref:`common errors <Common Errors>`.
+

docs/configuration/cluster-misconfigurations.rst

Lines changed: 0 additions & 126 deletions
This file was deleted.

docs/configuration/exporting/exporting-data.rst

Lines changed: 122 additions & 1 deletion
@@ -1,7 +1,7 @@
 Alert History Import and Export API
 ===================================
 
-GET https://api.robusta.dev/api/alerts
+GET https://api.robusta.dev/api/query/alerts
 --------------------------------------
 
 Use this endpoint to export alert history data. You can filter the results based on specific criteria using query parameters such as ``alert_name``, ``account_id``, and time range.
@@ -149,6 +149,127 @@ Response Fields
      - The node where the resource is located.
 
 
+GET `https://api.robusta.dev/api/query/report`
+--------------------------------------
+
+Use this endpoint to retrieve aggregated alert data, including the count of each type of alert during a specified time range. Filters can be applied using query parameters such as `account_id` and the time range.
+
+
+Query Parameters
+^^^^^^^^^^^^^^^
+
+.. list-table::
+   :widths: 20 10 70 10
+   :header-rows: 1
+
+   * - Parameter
+     - Type
+     - Description
+     - Required
+   * - ``account_id``
+     - string
+     - The unique account identifier (found in your ``generated_values.yaml`` file).
+     - Yes
+   * - ``start_ts``
+     - string
+     - Start timestamp for the query (in ISO 8601 format, e.g., ``2024-10-27T04:02:05.032Z``).
+     - Yes
+   * - ``end_ts``
+     - string
+     - End timestamp for the query (in ISO 8601 format, e.g., ``2024-11-27T05:02:05.032Z``).
+     - Yes
+
+
+Example Request
+^^^^^^^^^^^^^^^
+
+The following `curl` command demonstrates how to query aggregated alert data for a specified time range:
+
+.. code-block:: bash
+
+    curl --location 'https://api.robusta.dev/api/query/report?account_id=XXXXXX-XXXX_XXXX_XXXXX7&start_ts=2024-10-27T04:02:05.032Z&end_ts=2024-11-27T05:02:05.032Z' \
+    --header 'Authorization: Bearer TOKEN_HERE'
+
+
+In the command, make sure to replace the following placeholders:
+
+- **`account_id`**: Your account ID, which can be found in your `generated_values.yaml` file.
+- **`TOKEN_HERE`**: Your API token for authentication. Generate this token in the platform by navigating to **Settings** -> **API Keys** -> **New API Key**, and creating a key with the "Read Alerts" permission.
+
+
+
+Request Headers
+^^^^^^^^^^^^^^^
+
+.. list-table::
+   :widths: 30 70
+   :header-rows: 1
+
+   * - Header
+     - Description
+   * - ``Authorization``
+     - Bearer token for authentication (e.g., ``Bearer TOKEN_HERE``). The token must have "Read Alerts" permission.
+
+Response Format
+^^^^^^^^^^^^^^^
+
+The API will return a JSON array of aggregated alerts, with each object containing:
+
+- **`aggregation_key`**: The unique identifier of the alert type (e.g., `KubeJobFailed`).
+- **`alert_count`**: The total count of occurrences of this alert type within the specified time range.
+
+Example Response
+^^^^^^^^^^^^^^^
+.. code-block:: json
+    [
+      {"aggregation_key": "KubeJobFailed", "alert_count": 17413},
+      {"aggregation_key": "KubePodNotReady", "alert_count": 11893},
+      {"aggregation_key": "KubeDeploymentReplicasMismatch", "alert_count": 2410},
+      {"aggregation_key": "KubeDeploymentRolloutStuck", "alert_count": 923},
+      {"aggregation_key": "KubePodCrashLooping", "alert_count": 921},
+      {"aggregation_key": "KubeContainerWaiting", "alert_count": 752},
+      {"aggregation_key": "PrometheusRuleFailures", "alert_count": 188},
+      {"aggregation_key": "KubeMemoryOvercommit", "alert_count": 187},
+      {"aggregation_key": "PrometheusOperatorRejectedResources", "alert_count": 102},
+      {"aggregation_key": "KubeletTooManyPods", "alert_count": 94},
+      {"aggregation_key": "NodeMemoryHighUtilization", "alert_count": 23},
+      {"aggregation_key": "TargetDown", "alert_count": 19},
+      {"aggregation_key": "test123", "alert_count": 7},
+      {"aggregation_key": "KubeAggregatedAPIDown", "alert_count": 4},
+      {"aggregation_key": "KubeAggregatedAPIErrors", "alert_count": 4},
+      {"aggregation_key": "KubeMemoryOvercommitTEST2", "alert_count": 1},
+      {"aggregation_key": "TestAlert", "alert_count": 1},
+      {"aggregation_key": "TestAlert2", "alert_count": 1},
+      {"aggregation_key": "dsafd", "alert_count": 1},
+      {"aggregation_key": "KubeMemoryOvercommitTEST", "alert_count": 1},
+      {"aggregation_key": "vfd", "alert_count": 1}
+    ]
+
+
+
+Response Fields
+^^^^^^^^^^^^^^^
+.. list-table::
+   :widths: 25 10 70
+   :header-rows: 1
+
+   * - Field
+     - Type
+     - Description
+   * - ``aggregation_key``
+     - string
+     - The unique key representing the type of alert (e.g., ``KubeJobFailed``).
+   * - ``alert_count``
+     - integer
+     - The number of times this alert occurred within the specified time range.
+
+Notes
+^^^^^^^^^^^^^^^
+
+- Ensure that the `start_ts` and `end_ts` parameters are in ISO 8601 format and are correctly set to cover the desired time range.
+- Use the correct `Authorization` token with sufficient permissions to access the alert data.
+
+
 POST https://api.robusta.dev/api/alerts
 --------------------------------------
 Use this endpoint to send alert data to Robusta. You can send up to 1000 alerts in a single request.
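
As a companion to the ``GET /api/query/report`` documentation added in this file, the following Python sketch shows one way a client might build the request URL with ISO 8601 timestamps and summarize the returned array. The endpoint URL, parameter names, header, and response fields come from the diff above; the helper names (``iso8601_utc``, ``build_report_url``, ``fetch_report``, ``top_alerts``, ``total_alerts``) are hypothetical, and error handling is omitted, so treat it as a starting point under those assumptions and not as an official client.

```python
import json
import urllib.request
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint taken from the documentation in the diff above.
REPORT_URL = "https://api.robusta.dev/api/query/report"


def iso8601_utc(dt):
    """Format a timezone-aware datetime like 2024-10-27T04:02:05.032Z."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def build_report_url(account_id, start, end):
    """Build the query URL from the three required parameters."""
    query = urlencode(
        {
            "account_id": account_id,
            "start_ts": iso8601_utc(start),
            "end_ts": iso8601_utc(end),
        },
        safe=":",  # keep colons readable, matching the curl example
    )
    return f"{REPORT_URL}?{query}"


def fetch_report(url, token):
    """Send the request with a Bearer token and decode the JSON array."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def top_alerts(report, n=3):
    """Return the n most frequent (aggregation_key, alert_count) pairs."""
    ranked = sorted(report, key=lambda item: item["alert_count"], reverse=True)
    return [(item["aggregation_key"], item["alert_count"]) for item in ranked[:n]]


def total_alerts(report):
    """Sum alert_count over every aggregation key."""
    return sum(item["alert_count"] for item in report)
```

A call such as ``fetch_report(build_report_url(account_id, start, end), token)`` would perform the network request; ``top_alerts`` and ``total_alerts`` work on any list shaped like the example response and need no network access.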

docs/configuration/sinks/Opsgenie.rst

Lines changed: 7 additions & 0 deletions
@@ -5,6 +5,12 @@ Robusta can report issues and events in your Kubernetes cluster to the OpsGenie
 
 To configure OpsGenie, We need an OpsGenie API key. It can be configured using the OpsGenie team integration.
 
+Customizing Opsgenie Extra Details
+------------------------------------------------
+
+We can add Prometheus alert labels into Opsgenie alert extra details by setting `extra_details_labels` to `true` in the `sinksConfig` section.
+
+
 Configuring the OpsGenie sink
 ------------------------------------------------
 
@@ -21,6 +27,7 @@ Configuring the OpsGenie sink
         - "sre"
         tags:
         - "prod a"
+        extra_details_labels: false # optional, default is false
 
 Save the file and run
 
