|
| 1 | +.. _toolset_datadog_logs: |
| 2 | + |
| 3 | +Datadog logs |
| 4 | +============ |
| 5 | + |
| 6 | +When this toolset is enabled, HolmesGPT fetches pod logs from `Datadog <https://www.datadoghq.com/>`_. |
| 7 | + |
| 8 | +You **should** enable this toolset in place of the default :ref:`kubernetes/logs <toolset_kubernetes_logs>` |
| 9 | +toolset if all your Kubernetes pod logs are consolidated in Datadog. It makes it easier for HolmesGPT |
| 10 | +to fetch incident logs and lets it query past logs over precise time ranges. |
| 11 | + |
| 12 | + |
| 13 | +.. include:: ./_toolsets_that_provide_logging.inc.rst |
| 14 | + |
| 15 | +Configuration |
| 16 | +^^^^^^^^^^^^^ |
| 17 | + |
| 18 | +.. md-tab-set:: |
| 19 | + |
| 20 | + .. md-tab-item:: Robusta Helm Chart |
| 21 | + |
| 22 | + .. code-block:: yaml |
| 23 | +
|
| 24 | + holmes: |
| 25 | +   toolsets: |
| 26 | +     datadog/logs: |
| 27 | +       enabled: true |
| 28 | +       config: |
| 29 | +         dd_api_key: <your-datadog-api-key> # Required. Your Datadog API key |
| 30 | +         dd_app_key: <your-datadog-app-key> # Required. Your Datadog Application key |
| 31 | +         site_api_url: https://api.datadoghq.com # Required. Your Datadog site URL (e.g. https://api.us3.datadoghq.com for US3) |
| 32 | +         indexes: ["*"] # Optional. List of Datadog indexes to search. Default: ["*"] |
| 33 | +         storage_tiers: ["indexes"] # Optional. Ordered list of storage tiers to query (fallback mechanism). Options: "indexes", "online-archives", "flex". Default: ["indexes"] |
| 34 | +         labels: # Optional. Map Datadog labels to Kubernetes resources |
| 35 | +           pod: "pod_name" |
| 36 | +           namespace: "kube_namespace" |
| 37 | +         page_size: 300 # Optional. Number of logs per API page. Default: 300 |
| 38 | +         default_limit: 1000 # Optional. Default maximum logs to fetch when limit not specified by the LLM. Default: 1000 |
| 39 | +         request_timeout: 60 # Optional. API request timeout in seconds. Default: 60 |
| 40 | +
|
| 41 | +     kubernetes/logs: |
| 42 | +       enabled: false # HolmesGPT's default logging mechanism MUST be disabled |
| 43 | +
|
| 44 | +
|
| 45 | + .. include:: ./_toolset_configuration.inc.rst |
| 46 | + |
| 47 | + .. md-tab-item:: Holmes CLI |
| 48 | + |
| 49 | + Add the following to **~/.holmes/config.yaml**, creating the file if it doesn't exist: |
| 50 | + |
| 51 | + .. code-block:: yaml |
| 52 | +
|
| 53 | + toolsets: |
| 54 | +   datadog/logs: |
| 55 | +     enabled: true |
| 56 | +     config: |
| 57 | +       dd_api_key: <your-datadog-api-key> # Required. Your Datadog API key |
| 58 | +       dd_app_key: <your-datadog-app-key> # Required. Your Datadog Application key |
| 59 | +       site_api_url: https://api.datadoghq.com # Required. Your Datadog site URL (e.g. https://api.us3.datadoghq.com for US3) |
| 60 | +       indexes: ["*"] # Optional. List of Datadog indexes to search. Default: ["*"] |
| 61 | +       storage_tiers: ["indexes"] # Optional. Ordered list of storage tiers to query (fallback mechanism). Options: "indexes", "online-archives", "flex". Default: ["indexes"] |
| 62 | +       labels: # Optional. Map Datadog labels to Kubernetes resources |
| 63 | +         pod: "pod_name" |
| 64 | +         namespace: "kube_namespace" |
| 65 | +       page_size: 300 # Optional. Number of logs per API page. Default: 300 |
| 66 | +       default_limit: 1000 # Optional. Default maximum logs to fetch when limit not specified by the LLM. Default: 1000 |
| 67 | +       request_timeout: 60 # Optional. API request timeout in seconds. Default: 60 |
| 68 | +
|
| 69 | +   kubernetes/logs: |
| 70 | +     enabled: false # HolmesGPT's default logging mechanism MUST be disabled |
| 71 | +
|
| 72 | +Getting API and Application Keys |
| 73 | +******************************** |
| 74 | + |
| 75 | +To use this toolset, you need both a Datadog API key and Application key: |
| 76 | + |
| 77 | +1. **API Key**: Go to Organization Settings > API Keys in your Datadog console |
| 78 | + |
| 79 | + * The API key must have the ``logs_read_data`` permission scope |
| 80 | + * When creating a new key, ensure this permission is enabled |
| 81 | + |
| 82 | +2. **Application Key**: Go to Organization Settings > Application Keys in your Datadog console |
| 83 | + |
| 84 | +For more information, see the `Datadog API documentation <https://docs.datadoghq.com/api/latest/authentication/>`_. |
| 85 | + |
| 86 | +Configuring Site URL |
| 87 | +******************** |
| 88 | + |
| 89 | +The ``site_api_url`` must match your Datadog site. Common values include: |
| 90 | + |
| 91 | +* ``https://api.datadoghq.com`` - US1 |
| 92 | +* ``https://api.us3.datadoghq.com`` - US3 |
| 93 | +* ``https://api.us5.datadoghq.com`` - US5 |
| 94 | +* ``https://api.datadoghq.eu`` - EU |
| 95 | +* ``https://api.ap1.datadoghq.com`` - AP1 |
| 96 | + |
| 97 | +For a complete list of site URLs, see the `Datadog site documentation <https://docs.datadoghq.com/getting_started/site/>`_. |
| 98 | + |
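As an illustration, a minimal sketch of a Holmes CLI configuration for an organization hosted on the EU site might look like the following. The key placeholders are the same ones used in the configuration section; only ``site_api_url`` differs from the US1 example.

.. code-block:: yaml

    toolsets:
      datadog/logs:
        enabled: true
        config:
          dd_api_key: <your-datadog-api-key>
          dd_app_key: <your-datadog-app-key>
          site_api_url: https://api.datadoghq.eu  # EU site, taken from the list above
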
| 99 | +Configuring Storage Tiers |
| 100 | +************************* |
| 101 | + |
| 102 | +Datadog offers different storage tiers for logs with varying retention and costs: |
| 103 | + |
| 104 | +.. list-table:: |
| 105 | + :header-rows: 1 |
| 106 | + :widths: 20 40 40 |
| 107 | + |
| 108 | + * - Storage Tier |
| 109 | + - Description |
| 110 | + - Use Case |
| 111 | + * - indexes |
| 112 | + - Hot storage for recent logs (default) |
| 113 | + - Real-time analysis and alerting |
| 114 | + * - online-archives |
| 115 | + - Warm storage for older logs |
| 116 | + - Historical log analysis |
| 117 | + * - flex |
| 118 | + - Cost-effective storage |
| 119 | + - Long-term retention |
| 120 | + |
| 121 | +The toolset uses storage tiers as a fallback mechanism: each tier in the list is queried only if the previous tier returned no results. |
| 122 | +For example, if the toolset is configured with ``storage_tiers: ["indexes", "online-archives"]``, then: |
| 123 | + |
| 124 | +* Holmes first queries the ``indexes`` tier |
| 125 | +* If that query returns no results at all, Holmes then queries ``online-archives`` |
| 126 | + |
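The two-tier fallback described above could be configured along these lines. This is a sketch in the Holmes CLI layout that omits the required keys for brevity; with the Robusta Helm chart, the same block would sit under ``holmes:``.

.. code-block:: yaml

    toolsets:
      datadog/logs:
        enabled: true
        config:
          # Queried in order: "online-archives" is used only when
          # "indexes" returns no results.
          storage_tiers: ["indexes", "online-archives"]
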
| 127 | +Handling Rate Limits |
| 128 | +******************** |
| 129 | + |
| 130 | +If you encounter rate limiting issues with Datadog (visible as warning messages in Holmes logs), you can adjust the following parameters: |
| 131 | + |
| 132 | +* **page_size**: Reduce this value to fetch fewer logs per API request. This helps avoid hitting rate limits on individual requests. |
| 133 | +* **default_limit**: Lower this value to reduce the total number of logs fetched when no explicit limit is specified. |
| 134 | + |
| 135 | +Example configuration for rate-limited environments: |
| 136 | + |
| 137 | +.. code-block:: yaml |
| 138 | +
|
| 139 | + toolsets: |
| 140 | +   datadog/logs: |
| 141 | +     enabled: true |
| 142 | +     config: |
| 143 | +       page_size: 100 # Reduced from default 300 |
| 144 | +       default_limit: 500 # Reduced from default 1000 |
| 145 | +
|
| 146 | +When rate limiting occurs, Holmes will automatically retry with exponential backoff. You'll see warnings like: |
| 147 | +``DataDog logs toolset is rate limited/throttled. Waiting X.Xs until reset time`` |
| 148 | + |
| 149 | +Configuring Labels |
| 150 | +****************** |
| 151 | + |
| 152 | +You can customize the labels used by the toolset to identify Kubernetes resources. This is **optional** and only needed if your |
| 153 | +Datadog logs use different field names than the defaults. |
| 154 | + |
| 155 | +.. code-block:: yaml |
| 156 | +
|
| 157 | + toolsets: |
| 158 | +   datadog/logs: |
| 159 | +     enabled: true |
| 160 | +     config: |
| 161 | +       labels: |
| 162 | +         pod: "pod_name" # The field name for Kubernetes pod name in your Datadog logs |
| 163 | +         namespace: "kube_namespace" # The field name for Kubernetes namespace in your Datadog logs |
| 164 | +
|
| 165 | +To find the correct field names in your Datadog logs: |
| 166 | + |
| 167 | +1. Go to Logs > Search in your Datadog console |
| 168 | +2. View a sample log entry |
| 169 | +3. Identify the field names used for pod name and namespace |
| 170 | +4. Update the labels configuration accordingly |
| 171 | + |
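To make the last step concrete, suppose a sample log entry showed the pod name under ``k8s_pod`` and the namespace under ``k8s_namespace``. Both field names are hypothetical and chosen only for illustration; use whatever names appear in your own logs. The override might then look like this:

.. code-block:: yaml

    toolsets:
      datadog/logs:
        enabled: true
        config:
          labels:
            pod: "k8s_pod"              # hypothetical field name; replaces the default "pod_name"
            namespace: "k8s_namespace"  # hypothetical field name; replaces the default "kube_namespace"
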
| 172 | +.. include:: ./_disable_default_logging_toolset.inc.rst |
| 173 | + |
| 174 | + |
| 175 | +Capabilities |
| 176 | +^^^^^^^^^^^^ |
| 177 | + |
| 178 | +.. include:: ./_toolset_capabilities.inc.rst |
| 179 | + |
| 180 | +.. list-table:: |
| 181 | + :header-rows: 1 |
| 182 | + :widths: 30 70 |
| 183 | + |
| 184 | + * - Tool Name |
| 185 | + - Description |
| 186 | + * - fetch_pod_logs |
| 187 | + - Retrieve logs from Datadog with support for filtering, time ranges, and multiple storage tiers |