
Commit 603f252

nherment and arikalon1 authored
ROB-1723 datadog logs toolset (#1868)
* doc: add docs for datadog/logs toolset
* doc: add docs for datadog/logs toolset
* doc: fix incorrect indentation

---------

Co-authored-by: arik <alon.arik@gmail.com>
1 parent 4104fa1 commit 603f252

3 files changed

Lines changed: 194 additions & 0 deletions


docs/configuration/holmesgpt/builtin_toolsets.rst

Lines changed: 6 additions & 0 deletions
@@ -10,6 +10,7 @@ Builtin Toolsets
     toolsets/aws
     toolsets/confluence
     toolsets/coralogix_logs
+    toolsets/datadog_logs
     toolsets/datetime
     toolsets/docker
     toolsets/grafanaloki
@@ -62,6 +63,11 @@ by the user by providing credentials or API keys to external systems.
         :link: toolsets/coralogix_logs
         :link-type: doc
 
+    .. grid-item-card:: :octicon:`cpu;1em;` Datadog logs
+        :class-card: sd-bg-light sd-bg-text-light
+        :link: toolsets/datadog_logs
+        :link-type: doc
+
     .. grid-item-card:: :octicon:`cpu;1em;` Datetime
         :class-card: sd-bg-light sd-bg-text-light
         :link: toolsets/datetime

docs/configuration/holmesgpt/toolsets/_toolsets_that_provide_logging.inc.rst

Lines changed: 1 addition & 0 deletions
@@ -2,5 +2,6 @@ HolmesGPT provides several out-of-the-box alternatives for log access. You can s
 
 * :ref:`kubernetes/logs <toolset_kubernetes_logs>`: Access logs directly through Kubernetes. **This is the default toolset.**
 * :ref:`coralogix/logs <toolset_coralogix_logs>`: Access logs through Coralogix.
+* :ref:`datadog/logs <toolset_datadog_logs>`: Access logs through Datadog.
 * :ref:`grafana/loki <toolset_grafana_loki>`: Access Loki logs by proxying through a Grafana instance.
 * :ref:`opensearch/logs <toolset_opensearch_logs>`: Access logs through OpenSearch.
Lines changed: 187 additions & 0 deletions
@@ -0,0 +1,187 @@
.. _toolset_datadog_logs:

Datadog logs
============

By enabling this toolset, HolmesGPT will fetch pod logs from `Datadog <https://www.datadoghq.com/>`_.

You **should** enable this toolset to replace the default :ref:`kubernetes/logs <toolset_kubernetes_logs>`
toolset if all your Kubernetes pod logs are consolidated inside Datadog. Doing so makes it easier for HolmesGPT
to fetch incident logs, including the ability to query past logs precisely.


.. include:: ./_toolsets_that_provide_logging.inc.rst

Configuration
^^^^^^^^^^^^^

.. md-tab-set::

    .. md-tab-item:: Robusta Helm Chart

        .. code-block:: yaml

            holmes:
                toolsets:
                    datadog/logs:
                        enabled: true
                        config:
                            dd_api_key: <your-datadog-api-key> # Required. Your Datadog API key
                            dd_app_key: <your-datadog-app-key> # Required. Your Datadog Application key
                            site_api_url: https://api.datadoghq.com # Required. Your Datadog site URL (e.g. https://api.us3.datadoghq.com for US3)
                            indexes: ["*"] # Optional. List of Datadog indexes to search. Default: ["*"]
                            storage_tiers: ["indexes"] # Optional. Ordered list of storage tiers to query (fallback mechanism). Options: "indexes", "online-archives", "flex". Default: ["indexes"]
                            labels: # Optional. Map Datadog labels to Kubernetes resources
                                pod: "pod_name"
                                namespace: "kube_namespace"
                            page_size: 300 # Optional. Number of logs per API page. Default: 300
                            default_limit: 1000 # Optional. Default maximum logs to fetch when limit not specified by the LLM. Default: 1000
                            request_timeout: 60 # Optional. API request timeout in seconds. Default: 60

                    kubernetes/logs:
                        enabled: false # HolmesGPT's default logging mechanism MUST be disabled


        .. include:: ./_toolset_configuration.inc.rst

    .. md-tab-item:: Holmes CLI

        Add the following to **~/.holmes/config.yaml**, creating the file if it doesn't exist:

        .. code-block:: yaml

            toolsets:
                datadog/logs:
                    enabled: true
                    config:
                        dd_api_key: <your-datadog-api-key> # Required. Your Datadog API key
                        dd_app_key: <your-datadog-app-key> # Required. Your Datadog Application key
                        site_api_url: https://api.datadoghq.com # Required. Your Datadog site URL (e.g. https://api.us3.datadoghq.com for US3)
                        indexes: ["*"] # Optional. List of Datadog indexes to search. Default: ["*"]
                        storage_tiers: ["indexes"] # Optional. Ordered list of storage tiers to query (fallback mechanism). Options: "indexes", "online-archives", "flex". Default: ["indexes"]
                        labels: # Optional. Map Datadog labels to Kubernetes resources
                            pod: "pod_name"
                            namespace: "kube_namespace"
                        page_size: 300 # Optional. Number of logs per API page. Default: 300
                        default_limit: 1000 # Optional. Default maximum logs to fetch when limit not specified by the LLM. Default: 1000
                        request_timeout: 60 # Optional. API request timeout in seconds. Default: 60

                kubernetes/logs:
                    enabled: false # HolmesGPT's default logging mechanism MUST be disabled

Getting API and Application Keys
********************************

To use this toolset, you need both a Datadog API key and an Application key:

1. **API Key**: Go to Organization Settings > API Keys in your Datadog console

   * The API key must have the ``logs_read_data`` permission scope
   * When creating a new key, ensure this permission is enabled

2. **Application Key**: Go to Organization Settings > Application Keys in your Datadog console

For more information, see the `Datadog API documentation <https://docs.datadoghq.com/api/latest/authentication/>`_.

Configuring Site URL
********************

The ``site_api_url`` must match your Datadog site. Common values include:

* ``https://api.datadoghq.com`` - US1
* ``https://api.us3.datadoghq.com`` - US3
* ``https://api.us5.datadoghq.com`` - US5
* ``https://api.datadoghq.eu`` - EU
* ``https://api.ap1.datadoghq.com`` - AP1

For a complete list of site URLs, see the `Datadog site documentation <https://docs.datadoghq.com/getting_started/site/>`_.
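
For instance, an organization hosted on the EU site listed above would set ``site_api_url`` as follows. This is a minimal fragment for illustration; the other ``config`` keys, including the required API and Application keys, are omitted:

.. code-block:: yaml

    toolsets:
        datadog/logs:
            enabled: true
            config:
                site_api_url: https://api.datadoghq.eu # EU site, taken from the list above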

Configuring Storage Tiers
*************************

Datadog offers different storage tiers for logs with varying retention and costs:

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Storage Tier
     - Description
     - Use Case
   * - indexes
     - Hot storage for recent logs (default)
     - Real-time analysis and alerting
   * - online-archives
     - Warm storage for older logs
     - Historical log analysis
   * - flex
     - Cost-effective storage
     - Long-term retention

The toolset uses storage tiers as a fallback mechanism. Each subsequent tier is queried only if the previous tier yielded no results.
For example, if the toolset is configured with ``storage_tiers`` set to ``["indexes", "online-archives"]``, then:

* Holmes first runs a query against the ``indexes`` storage tier
* If there are no results at all, Holmes will then query ``online-archives``
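
The two-tier fallback in this example corresponds to a configuration along the following lines. Treat it as a sketch: only ``storage_tiers`` is shown, and the other ``config`` keys (including the required ones) are omitted:

.. code-block:: yaml

    toolsets:
        datadog/logs:
            enabled: true
            config:
                storage_tiers: ["indexes", "online-archives"] # Queried in order; "online-archives" is used only if "indexes" returns no results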

Handling Rate Limits
********************

If you encounter rate limiting issues with Datadog (visible as warning messages in Holmes logs), you can adjust the following parameters:

* **page_size**: Reduce this value to fetch fewer logs per API request. This helps avoid hitting rate limits on individual requests.
* **default_limit**: Lower this value to reduce the total number of logs fetched when no explicit limit is specified.

Example configuration for rate-limited environments:

.. code-block:: yaml

    toolsets:
        datadog/logs:
            enabled: true
            config:
                page_size: 100 # Reduced from default 300
                default_limit: 500 # Reduced from default 1000

When rate limiting occurs, Holmes will automatically retry with exponential backoff. You'll see warnings like:
``DataDog logs toolset is rate limited/throttled. Waiting X.Xs until reset time``

Configuring Labels
******************

You can customize the labels used by the toolset to identify Kubernetes resources. This is **optional** and only needed if your
Datadog logs use different field names than the defaults.

.. code-block:: yaml

    toolsets:
        datadog/logs:
            enabled: true
            config:
                labels:
                    pod: "pod_name" # The field name for Kubernetes pod name in your Datadog logs
                    namespace: "kube_namespace" # The field name for Kubernetes namespace in your Datadog logs

To find the correct field names in your Datadog logs:

1. Go to Logs > Search in your Datadog console
2. View a sample log entry
3. Identify the field names used for pod name and namespace
4. Update the labels configuration accordingly
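
As an illustration, suppose the sample log entry shows the pod name under ``k8s_pod`` and the namespace under ``k8s_namespace``. Both field names are hypothetical and chosen only for this example; substitute the names you actually find in your logs:

.. code-block:: yaml

    toolsets:
        datadog/logs:
            enabled: true
            config:
                labels:
                    pod: "k8s_pod" # Hypothetical field name, replace with yours
                    namespace: "k8s_namespace" # Hypothetical field name, replace with yours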

.. include:: ./_disable_default_logging_toolset.inc.rst


Capabilities
^^^^^^^^^^^^

.. include:: ./_toolset_capabilities.inc.rst

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Tool Name
     - Description
   * - fetch_pod_logs
     - Retrieve logs from Datadog with support for filtering, time ranges, and multiple storage tiers
