
Commit 4a16584

Merge branch 'master' into custom_recource_monitoring
2 parents 9541991 + 04282fa commit 4a16584

12 files changed

Lines changed: 338 additions & 8 deletions

docs/configuration/alertmanager-integration/grafana-alert-manager.rst

Lines changed: 84 additions & 8 deletions
@@ -1,25 +1,101 @@
Grafana AlertManager
****************************************

Grafana can send alerts to the Robusta timeline for visualization and AI investigation.

.. image:: /images/grafana-docs-robusta-ui.png
   :width: 600
   :align: center


This guide only covers sending alerts from Grafana Alerting to the Robusta timeline.
If you'd like Robusta to also query metrics from Grafana, refer to the general :ref:`metrics-integration docs for Prometheus <Integrating with Prometheus>`.

Send Alerts to Robusta's Timeline
===========================================

This integration lets you send Grafana alerts to Robusta's Timeline. To configure it:

1. Get your Robusta ``account_id`` from your ``generated_values.yaml`` file. It appears under the ``globalConfig`` section.
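
For reference, the relevant part of ``generated_values.yaml`` looks roughly like this (the value shown is a placeholder):

.. code-block:: yaml

   # Excerpt from generated_values.yaml - the value is a placeholder
   globalConfig:
     account_id: <your-account-id>  # copy this value into the webhook URL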

2. Create an ``API Key``.

In the Robusta UI, navigate to the ``Settings`` page, and select the ``API Keys`` tab.

.. image:: /images/robusta-api-keys.png
   :width: 600
   :align: center


Click ``New API Key``. Choose a name for your key, and check the ``Alerts Write`` capability.
Generate and save your new ``API Key``.

.. image:: /images/new-api-key.png
   :width: 600
   :align: center


3. In the Grafana UI, navigate to the ``Alerting`` tab, click ``Manage Contact Points``, and then ``Create contact point``.

Select ``Webhook`` from the Integration options.
Add the following URL, substituting your ``account_id``:

.. code-block::

   https://api.robusta.dev/integrations/alerts/grafana?account_id=YOUR_ACCOUNT_ID

.. image:: /images/robusta-contact-point-1.png
   :width: 600
   :align: center

Under the ``Optional Webhook settings``, add your ``API Key`` in the ``Bearer Token`` field:

.. image:: /images/robusta-contact-point-2.png
   :width: 600
   :align: center

Lastly, under the ``Notification settings``, check the ``Send resolved`` checkbox:

.. image:: /images/grafana-send-resolved.png
   :width: 600
   :align: center

Click the ``Test`` button. If successful, you will receive a notification in the Robusta UI under the ``external`` cluster.

Save your new ``Contact Point``.

4. Create a new ``Notification Policy``. Navigate to the ``Alerting`` tab, and click ``Manage notification policies``.
Create a new policy.

Add a policy without matchers, so that it handles all alerts. Disable grouping by specifying ``Group By = ...`` (Grafana's special ``...`` value, which delivers each alert individually).

.. image:: /images/robusta-new-notification-policy.png
   :width: 600
   :align: center


Save your new ``Notification Policy``.

That's it!

You can now see your Grafana alerts in the Robusta Timeline, and use AI to analyze them.

Kubernetes Alerts
=================================

If your alerts come from a Kubernetes cluster monitored by Robusta, and your alerts have a ``cluster`` label, make sure it matches the ``cluster_name`` that appears in Robusta's ``generated_values.yaml``.

**This is optional - you can send any alert to the Robusta timeline!**
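
For example, an alert rule in Grafana might carry labels like these (illustrative values):

.. code-block:: yaml

   # Labels on a Grafana alert rule - illustrative values
   labels:
     cluster: my-cluster   # must match cluster_name in Robusta's generated_values.yaml
     severity: warning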


Send Alerts to Robusta for Enrichment
===================================================================

You can use Robusta to enrich alerts with extra context, and to route them to other systems as well.

If you'd like to do that, this integration is for you.

To configure it:

docs/configuration/holmesgpt/builtin_toolsets.rst

Lines changed: 6 additions & 0 deletions
@@ -10,6 +10,7 @@ Builtin Toolsets
   toolsets/aws
   toolsets/confluence
   toolsets/coralogix_logs
   toolsets/datadog_logs
   toolsets/datetime
   toolsets/docker
   toolsets/grafanaloki

@@ -62,6 +63,11 @@ by the user by providing credentials or API keys to external systems.
   :link: toolsets/coralogix_logs
   :link-type: doc

.. grid-item-card:: :octicon:`cpu;1em;` Datadog logs
   :class-card: sd-bg-light sd-bg-text-light
   :link: toolsets/datadog_logs
   :link-type: doc

.. grid-item-card:: :octicon:`cpu;1em;` Datetime
   :class-card: sd-bg-light sd-bg-text-light
   :link: toolsets/datetime

docs/configuration/holmesgpt/toolsets/_toolsets_that_provide_logging.inc.rst

Lines changed: 1 addition & 0 deletions
@@ -2,5 +2,6 @@ HolmesGPT provides several out-of-the-box alternatives for log access. You can s

* :ref:`kubernetes/logs <toolset_kubernetes_logs>`: Access logs directly through Kubernetes. **This is the default toolset.**
* :ref:`coralogix/logs <toolset_coralogix_logs>`: Access logs through Coralogix.
* :ref:`datadog/logs <toolset_datadog_logs>`: Access logs through Datadog.
* :ref:`grafana/loki <toolset_grafana_loki>`: Access Loki logs by proxying through a Grafana instance.
* :ref:`opensearch/logs <toolset_opensearch_logs>`: Access logs through OpenSearch.

docs/configuration/holmesgpt/toolsets/datadog_logs.rst
Lines changed: 187 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,187 @@
.. _toolset_datadog_logs:

Datadog logs
============

By enabling this toolset, HolmesGPT will fetch pod logs from `Datadog <https://www.datadoghq.com/>`_.

You **should** enable this toolset, in place of the default :ref:`kubernetes/logs <toolset_kubernetes_logs>`
toolset, if all your Kubernetes pod logs are consolidated inside Datadog. It makes it easier for HolmesGPT
to fetch incident logs, including the ability to precisely query past logs.

.. include:: ./_toolsets_that_provide_logging.inc.rst

Configuration
^^^^^^^^^^^^^

.. md-tab-set::

   .. md-tab-item:: Robusta Helm Chart

      .. code-block:: yaml

         holmes:
           toolsets:
             datadog/logs:
               enabled: true
               config:
                 dd_api_key: <your-datadog-api-key> # Required. Your Datadog API key
                 dd_app_key: <your-datadog-app-key> # Required. Your Datadog Application key
                 site_api_url: https://api.datadoghq.com # Required. Your Datadog site URL (e.g. https://api.us3.datadoghq.com for US3)
                 indexes: ["*"] # Optional. List of Datadog indexes to search. Default: ["*"]
                 storage_tiers: ["indexes"] # Optional. Ordered list of storage tiers to query (fallback mechanism). Options: "indexes", "online-archives", "flex". Default: ["indexes"]
                 labels: # Optional. Map Datadog labels to Kubernetes resources
                   pod: "pod_name"
                   namespace: "kube_namespace"
                 page_size: 300 # Optional. Number of logs per API page. Default: 300
                 default_limit: 1000 # Optional. Default maximum logs to fetch when no limit is specified by the LLM. Default: 1000
                 request_timeout: 60 # Optional. API request timeout in seconds. Default: 60

             kubernetes/logs:
               enabled: false # HolmesGPT's default logging mechanism MUST be disabled

      .. include:: ./_toolset_configuration.inc.rst

   .. md-tab-item:: Holmes CLI

      Add the following to **~/.holmes/config.yaml**, creating the file if it doesn't exist:

      .. code-block:: yaml

         toolsets:
           datadog/logs:
             enabled: true
             config:
               dd_api_key: <your-datadog-api-key> # Required. Your Datadog API key
               dd_app_key: <your-datadog-app-key> # Required. Your Datadog Application key
               site_api_url: https://api.datadoghq.com # Required. Your Datadog site URL (e.g. https://api.us3.datadoghq.com for US3)
               indexes: ["*"] # Optional. List of Datadog indexes to search. Default: ["*"]
               storage_tiers: ["indexes"] # Optional. Ordered list of storage tiers to query (fallback mechanism). Options: "indexes", "online-archives", "flex". Default: ["indexes"]
               labels: # Optional. Map Datadog labels to Kubernetes resources
                 pod: "pod_name"
                 namespace: "kube_namespace"
               page_size: 300 # Optional. Number of logs per API page. Default: 300
               default_limit: 1000 # Optional. Default maximum logs to fetch when no limit is specified by the LLM. Default: 1000
               request_timeout: 60 # Optional. API request timeout in seconds. Default: 60

           kubernetes/logs:
             enabled: false # HolmesGPT's default logging mechanism MUST be disabled

Getting API and Application Keys
********************************

To use this toolset, you need both a Datadog API key and an Application key:

1. **API Key**: Go to Organization Settings > API Keys in your Datadog console.

   * The API key must have the ``logs_read_data`` permission scope
   * When creating a new key, ensure this permission is enabled

2. **Application Key**: Go to Organization Settings > Application Keys in your Datadog console.

For more information, see the `Datadog API documentation <https://docs.datadoghq.com/api/latest/authentication/>`_.

Configuring Site URL
********************

The ``site_api_url`` must match your Datadog site. Common values include:

* ``https://api.datadoghq.com`` - US1
* ``https://api.us3.datadoghq.com`` - US3
* ``https://api.us5.datadoghq.com`` - US5
* ``https://api.datadoghq.eu`` - EU
* ``https://api.ap1.datadoghq.com`` - AP1

For a complete list of site URLs, see the `Datadog site documentation <https://docs.datadoghq.com/getting_started/site/>`_.

Configuring Storage Tiers
*************************

Datadog offers different storage tiers for logs, with varying retention and costs:

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Storage Tier
     - Description
     - Use Case
   * - indexes
     - Hot storage for recent logs (default)
     - Real-time analysis and alerting
   * - online-archives
     - Warm storage for older logs
     - Historical log analysis
   * - flex
     - Cost-effective storage
     - Long-term retention

The toolset uses storage tiers as a fallback mechanism: each subsequent tier is queried only if the previous tier returned no results.
For example, if the toolset is configured with ``storage_tiers: ["indexes", "online-archives"]``, then:

* Holmes first runs the query against the ``indexes`` storage tier
* If there are no results at all, Holmes then queries ``online-archives``
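
In pseudocode terms, the fallback described above can be sketched like this (``query_datadog_logs`` is a hypothetical stand-in for the toolset's internal query function, not a real API):

.. code-block:: python

   def fetch_logs_with_fallback(query, storage_tiers, query_datadog_logs):
       """Try each storage tier in order; stop at the first tier with results."""
       for tier in storage_tiers:
           logs = query_datadog_logs(query, storage_tier=tier)
           if logs:  # a non-empty result ends the fallback chain
               return logs
       return []  # no tier returned anything

   # Stubbed example: only the archive tier has matching logs
   def fake_query(query, storage_tier):
       return ["log line"] if storage_tier == "online-archives" else []

   print(fetch_logs_with_fallback("pod_name:my-pod",
                                  ["indexes", "online-archives"],
                                  fake_query))

Note that a tier that returns even a single log line stops the fallback; later tiers are never consulted.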

Handling Rate Limits
********************

If you encounter rate limiting issues with Datadog (visible as warning messages in the Holmes logs), you can adjust the following parameters:

* **page_size**: Reduce this value to fetch fewer logs per API request. This helps avoid hitting rate limits on individual requests.
* **default_limit**: Lower this value to reduce the total number of logs fetched when no explicit limit is specified.

Example configuration for rate-limited environments:

.. code-block:: yaml

   toolsets:
     datadog/logs:
       enabled: true
       config:
         page_size: 100 # Reduced from the default 300
         default_limit: 500 # Reduced from the default 1000

When rate limiting occurs, Holmes will automatically retry with exponential backoff. You'll see warnings like:
``DataDog logs toolset is rate limited/throttled. Waiting X.Xs until reset time``
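
The retry behavior described above works roughly like the following sketch (the names ``RateLimited`` and ``call_with_backoff`` are illustrative, not the toolset's actual code):

.. code-block:: python

   import time

   class RateLimited(Exception):
       """Illustrative stand-in for a 429 response from the Datadog API."""
       def __init__(self, reset_after):
           self.reset_after = reset_after  # seconds until the rate limit resets

   def call_with_backoff(request, max_attempts=5):
       """Retry a rate-limited call, waiting until the advertised reset time."""
       delay = 0.05
       for _ in range(max_attempts):
           try:
               return request()
           except RateLimited as exc:
               # Wait until the reset time the server advertised, falling back
               # to our own delay if it gave no hint
               wait = exc.reset_after or delay
               print(f"rate limited/throttled, waiting {wait:.1f}s until reset time")
               time.sleep(wait)
               delay *= 2  # exponential backoff between attempts
       raise RuntimeError("still rate limited after retries")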

Configuring Labels
******************

You can customize the labels used by the toolset to identify Kubernetes resources. This is **optional**, and only needed if your
Datadog logs use different field names than the defaults.

.. code-block:: yaml

   toolsets:
     datadog/logs:
       enabled: true
       config:
         labels:
           pod: "pod_name" # The field name for the Kubernetes pod name in your Datadog logs
           namespace: "kube_namespace" # The field name for the Kubernetes namespace in your Datadog logs

To find the correct field names in your Datadog logs:

1. Go to Logs > Search in your Datadog console
2. View a sample log entry
3. Identify the field names used for pod name and namespace
4. Update the labels configuration accordingly

.. include:: ./_disable_default_logging_toolset.inc.rst


Capabilities
^^^^^^^^^^^^

.. include:: ./_toolset_capabilities.inc.rst

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Tool Name
     - Description
   * - fetch_pod_logs
     - Retrieve logs from Datadog, with support for filtering, time ranges, and multiple storage tiers
docs/images/new-api-key.png (38.6 KB)

docs/images/robusta-api-keys.png (51.8 KB)

0 commit comments
