airflow/README.md
Lines changed: 61 additions & 34 deletions
@@ -28,8 +28,8 @@ All steps below are needed for the Airflow integration to work properly. Before
 
 There are two parts of the Airflow integration:
 
-- The Datadog Agent portion, which makes requests to a provided endpoint for Airflow to report whether it can connect and is healthy. The Agent integration also queries Airflow to produce some of its own metrics.
-- The Airflow StatsD portion, where Airflow can be configured to send metrics to the Datadog Agent, which can remap the Airflow notation to a Datadog notation.
+- The Datadog Agent portion, which makes requests to a provided endpoint for Airflow to report whether it can connect and is healthy. The Agent integration also queries Airflow to produce some of its own metrics. *Supports Airflow V1 and V2.*
+- The Airflow StatsD portion, where Airflow can be configured to send metrics to the Datadog Agent, which can remap the Airflow notation to a Datadog notation. *Supports Airflow V1, V2, and V3.*
 
 The Airflow integration's [metrics](#metrics) come from both the Agent and StatsD portions.
 
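The version support stated in the added bullets above can be captured in a small lookup, for example in a deployment script that decides which parts of the integration to configure. A minimal sketch; `supported_portions` and the portion labels are hypothetical, not part of Airflow or the Datadog Agent, and it assumes a dotted version string such as `2.8.4`:

```python
# Hypothetical support matrix, transcribed from the two bullets above.
AGENT_CHECK_MAJORS = {1, 2}  # Agent `airflow` check: Airflow V1 and V2
STATSD_MAJORS = {1, 2, 3}    # StatsD metrics: Airflow V1, V2, and V3


def supported_portions(airflow_version: str) -> list[str]:
    """Return the integration portions that apply to an Airflow version string."""
    # Only the major component matters here, e.g. "2.8.4" -> 2.
    major = int(airflow_version.split(".")[0])
    portions = []
    if major in AGENT_CHECK_MAJORS:
        portions.append("agent_check")
    if major in STATSD_MAJORS:
        portions.append("statsd")
    return portions


for version in ("1.10.15", "2.8.4", "3.0.2"):
    print(version, supported_portions(version))
```

For an Airflow V3 deployment, only the StatsD portion applies.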
@@ -40,7 +40,9 @@ The Airflow integration's [metrics](#metrics) come from both the Agent and Stats
 
 ##### Configure Datadog Agent Airflow integration
 
-Configure the Airflow check included in the [Datadog Agent][4] package to collect health metrics and service checks. This can be done by editing the `url` within the `airflow.d/conf.yaml` file, in the `conf.d/` folder at the root of your Agent's configuration directory, to start collecting your Airflow service checks. See the [sample airflow.d/conf.yaml][5] for all available configuration options.
+**Note:** The Datadog Agent's `airflow` integration does not support Airflow V3.
+
+Configure the Agent's `airflow` check included in the [Datadog Agent][4] package to collect health metrics and service checks. To do this, edit the `url` in the `airflow.d/conf.yaml` file, located in the `conf.d/` folder at the root of your Agent's configuration directory, to start collecting your Airflow service checks. See the [sample airflow.d/conf.yaml][5] for all available configuration options.
 
 Ensure that `url` matches your Airflow [webserver `base_url`][19], the URL used to connect to your Airflow instance.
 
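Because the Agent's `url` must match the webserver `base_url`, a quick comparison before restarting the Agent can catch mistakes such as a wrong port. A minimal sketch with the standard library's `urllib.parse`; `urls_match` is a hypothetical helper, not something the Agent runs, and it assumes that a trailing slash and an omitted default port (80 or 443) are insignificant:

```python
from urllib.parse import urlsplit

# Ports implied when a URL omits them.
DEFAULT_PORTS = {"http": 80, "https": 443}


def _normalize(url: str) -> tuple[str, str, int, str]:
    """Reduce a URL to (scheme, host, port, path) for comparison."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # .hostname is already lower-cased
    port = parts.port or DEFAULT_PORTS.get(scheme, 0)
    path = parts.path.rstrip("/")  # treat "/airflow/" and "/airflow" alike
    return scheme, host, port, path


def urls_match(agent_url: str, base_url: str) -> bool:
    """True if the Agent's `url` and Airflow's `base_url` point at the same place."""
    return _normalize(agent_url) == _normalize(base_url)


print(urls_match("http://localhost:8080/", "http://localhost:8080"))  # True
print(urls_match("http://localhost:8080", "http://localhost:9090"))   # False
```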
@@ -61,19 +63,18 @@ Connect Airflow to DogStatsD (included in the Datadog Agent) by using the Airflo
 
 2. Update the Airflow configuration file `airflow.cfg` by adding the following configs:
 
-<div class="alert alert-warning"> Do not set `statsd_datadog_enabled` to true. Enabling `statsd_datadog_enabled` can create conflicts. To prevent issues, ensure that the variable is set to `False`.</div>
-
 ```conf
-[scheduler]
+[metrics]
 statsd_on = True
 # Hostname or IP of server running the Datadog Agent
 statsd_host = localhost
 # DogStatsD port configured in the Datadog Agent
 statsd_port = 8125
 statsd_prefix = airflow
 ```
+Do not set `statsd_datadog_enabled` without first [installing the Datadog DogStatsD package](#datadog-dogstatsd-package-and-origin-detection).
 
-3. Update the [Datadog Agent main configuration file][9] `datadog.yaml` by adding the following configs:
+3. Update the [Datadog Agent main configuration file][9] `datadog.yaml` by adding the following configuration to remap the Airflow notation to Datadog notation:
 
 ```yaml
 # dogstatsd_mapper_cache_size: 1000 # default to 1000
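The hunk above moves the StatsD options from `[scheduler]`, used by Airflow 1.10, to `[metrics]`, used by Airflow 2.0 and later. A minimal sketch that flags an existing `airflow.cfg` still setting them under `[scheduler]`; `find_misplaced_statsd_keys` is a hypothetical helper built on the standard library's `configparser`, and the sample config is illustrative:

```python
import configparser

STATSD_KEYS = ("statsd_on", "statsd_host", "statsd_port", "statsd_prefix")


def find_misplaced_statsd_keys(cfg_text: str) -> list[str]:
    """Return the StatsD keys set under [scheduler] instead of [metrics]."""
    # interpolation=None so '%' characters elsewhere in airflow.cfg are not parsed.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("scheduler"):
        return []
    return [key for key in STATSD_KEYS if parser.has_option("scheduler", key)]


sample = """
[scheduler]
statsd_on = True
statsd_port = 8125

[metrics]
statsd_prefix = airflow
"""

print(find_misplaced_statsd_keys(sample))  # ['statsd_on', 'statsd_port']
```

In practice, pass it the contents of your own `airflow.cfg` and move any reported keys into `[metrics]`.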
@@ -295,7 +296,7 @@ _Available for Agent versions >6.0_
 pattern: \[\d{4}\-\d{2}\-\d{2}
 ```
 
-3. [Restart the Agent][11].
+3. [Restart the Agent][10].
 
 <!-- xxz tab xxx -->
 <!-- xxx tab "Containerized" xxx -->
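The `pattern` in the context lines above marks the start of a log record. Assuming it belongs to a `multi_line` log processing rule, as in the Agent's log collection examples, a line that matches it starts a new entry and a line that does not (for example, a traceback line) is appended to the previous one. A rough Python emulation for trying the pattern against sample lines; it approximates the Agent's behavior rather than reproducing it, anchors the match at the start of each line, and uses made-up log lines:

```python
import re

# Start-of-record pattern from the log configuration context above.
NEW_RECORD = re.compile(r"\[\d{4}\-\d{2}\-\d{2}")


def group_records(lines: list[str]) -> list[str]:
    """Approximate multi_line aggregation: a matching line starts a new record."""
    records: list[str] = []
    for line in lines:
        # re.match only matches at the start of the line (this sketch's anchoring choice).
        if NEW_RECORD.match(line) or not records:
            records.append(line)
        else:
            # Continuation line, such as a traceback: join it to the previous record.
            records[-1] += "\n" + line
    return records


sample = [
    "[2024-05-01 10:00:00,123] {example.py:10} INFO - Started",
    "[2024-05-01 10:00:01,456] {example.py:20} ERROR - Failed",
    "Traceback (most recent call last):",
    '  File "example.py", line 20, in <module>',
    "[2024-05-01 10:00:02,789] {example.py:30} INFO - Retrying",
]
for record in group_records(sample):
    print(record)
    print("---")
```

The five sample lines collapse into three records, with the traceback kept together with the `ERROR` line that precedes it.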
@@ -304,6 +305,8 @@ _Available for Agent versions >6.0_
 
 ##### Configure Datadog Agent Airflow integration
 
+**Note:** The Datadog Agent's `airflow` integration does not support Airflow V3.
+
 For containerized environments, see the [Autodiscovery Integration Templates][8] for guidance on applying the parameters below.
 
 | Parameter | Value |
@@ -314,31 +317,24 @@ For containerized environments, see the [Autodiscovery Integration Templates][8]
 
 Ensure that `url` matches your Airflow [webserver `base_url`][19], the URL used to connect to your Airflow instance. Replace `localhost` with the template variable `%%host%%`.
 
-If you are using Airflow's Helm chart, this [exposes the webserver as a ClusterIP service][22] that you should use in the `url` parameter.
+If you are using the [official Airflow Helm chart][24], apply this configuration to the `webserver` pod and its `webserver` container. For example, when set through [`webserver.podAnnotations`][22], your Autodiscovery annotations may look like the following:
 
-For example, your Autodiscovery annotations may look like the following:
-Replace `<CONTAINER_IDENTIFIER>` with the container's name within the pod (the value returned by `.name`).
+Adjust the `ad.datadoghq.com/<CONTAINER_NAME>.checks` annotation accordingly if your container name differs.
 
 ##### Connect Airflow to DogStatsD
 
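The Autodiscovery annotation value is a JSON document, which is easy to mis-quote by hand. A minimal sketch that builds it with Python's `json` module; `airflow_check_annotation` is hypothetical, and it assumes the v2 annotation key `ad.datadoghq.com/<CONTAINER_NAME>.checks`, a container named `webserver`, and Airflow's default webserver port `8080`:

```python
import json


def airflow_check_annotation(container_name: str, port: int = 8080) -> dict[str, str]:
    """Build a v2 Autodiscovery annotation for the Agent's `airflow` check."""
    # %%host%% is left as-is: the Agent resolves this template variable.
    checks = {
        "airflow": {
            "init_config": {},
            "instances": [{"url": f"http://%%host%%:{port}"}],
        }
    }
    return {f"ad.datadoghq.com/{container_name}.checks": json.dumps(checks)}


for key, value in airflow_check_annotation("webserver").items():
    # Single-quote the JSON so it can be pasted as a YAML annotation value.
    print(f"{key}: '{value}'")
```

The printed line can be placed under `webserver.podAnnotations` in your Helm values; change the container name or port if your deployment differs.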
@@ -349,27 +345,28 @@ Connect Airflow to DogStatsD (included in the Datadog Agent) by using the Airflo
 
 **Note**: Presence or absence of StatsD metrics reported by Airflow might vary depending on the Airflow Executor used. For example: `airflow.ti_failures/successes`, `airflow.operator_failures/successes`, `airflow.dag.task.duration` are [not reported for `KubernetesExecutor`][20].
 
-**Note**: The environment variables used for Airflow may differ between versions. For example in Airflow `2.0.0` this utilizes the environment variable `AIRFLOW__METRICS__STATSD_HOST`, whereas Airflow `1.10.15` utilizes `AIRFLOW__SCHEDULER__STATSD_HOST`.
-
-The Airflow StatsD configuration can be enabled with the following environment variables in a Kubernetes Deployment:
+The Airflow StatsD configuration can be enabled with the following environment variables in your Airflow Helm Chart values:
 
 ```yaml
 env:
-  - name: AIRFLOW__SCHEDULER__STATSD_ON
+  - name: AIRFLOW__METRICS__STATSD_ON
     value: "True"
-  - name: AIRFLOW__SCHEDULER__STATSD_PORT
+  - name: AIRFLOW__METRICS__STATSD_PORT
     value: "8125"
-  - name: AIRFLOW__SCHEDULER__STATSD_PREFIX
+  - name: AIRFLOW__METRICS__STATSD_PREFIX
     value: "airflow"
-  - name: AIRFLOW__SCHEDULER__STATSD_HOST
+extraEnv: |
+  - name: AIRFLOW__METRICS__STATSD_HOST
     valueFrom:
       fieldRef:
         fieldPath: status.hostIP
 ```
 
-The environment variable for the host endpoint `AIRFLOW__SCHEDULER__STATSD_HOST` is supplied with the node's host IP address to route the StatsD data to the Datadog Agent pod on the same node as the Airflow pod. This setup also requires the Agent to have a `hostPort` open for this port `8125` and accepting non-local StatsD traffic. For more information, see [DogStatsD on Kubernetes Setup][12].
+**Note**: The [Airflow Helm Chart][24] requires `valueFrom`-based environment variables to be set with `extraEnv`. Do not set `AIRFLOW__METRICS__STATSD_DATADOG_ENABLED` without first [installing the Datadog package](#datadog-dogstatsd-package-and-origin-detection).
+
+The environment variable for the metrics endpoint, `AIRFLOW__METRICS__STATSD_HOST`, is supplied with the node's host IP address to route the StatsD data to the Datadog Agent pod on the same node as the Airflow pod. This setup also requires the Agent to expose port `8125` as a `hostPort` and to accept non-local StatsD traffic. For more information, see [DogStatsD on Kubernetes Setup][12]. With this configuration, StatsD traffic from the Airflow container reaches a Datadog Agent ready to accept the incoming data.
 
-This should direct the StatsD traffic from the Airflow container to a Datadog Agent ready to accept the incoming data. The last portion is to update the Datadog Agent with the corresponding `dogstatsd_mapper_profiles` . This can be done by copying the `dogstatsd_mapper_profiles` provided in the [Host installation][13] into your `datadog.yaml` file. Or by deploying your Datadog Agent with the equivalent JSON configuration in the environment variable `DD_DOGSTATSD_MAPPER_PROFILES`. With respect to Kubernetes the equivalent environment variable notation is:
+You must also update the Datadog Agent with the corresponding `dogstatsd_mapper_profiles`. To do this, copy the `dogstatsd_mapper_profiles` provided in the [Host installation][13] into your `datadog.yaml` file. Alternatively, deploy your Datadog Agent with the equivalent JSON configuration in the environment variable `DD_DOGSTATSD_MAPPER_PROFILES`. For Kubernetes, the equivalent environment variable notation is:
 
 ```yaml
 env:
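The rename from `AIRFLOW__SCHEDULER__STATSD_*` to `AIRFLOW__METRICS__STATSD_*` in the hunk above follows Airflow's `AIRFLOW__{SECTION}__{KEY}` convention for overriding `airflow.cfg` options with environment variables. A minimal sketch that derives those names and renders the static `env` entries; `airflow_env_name` and `helm_env_entries` are hypothetical helpers, and `STATSD_HOST` is left out because its `valueFrom` entry goes in `extraEnv`:

```python
def airflow_env_name(section: str, key: str) -> str:
    """Map an airflow.cfg (section, key) pair to its environment variable name."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def helm_env_entries(settings: dict[str, str], section: str = "metrics") -> list[dict[str, str]]:
    """Render name/value pairs in the shape used by the chart's `env` list."""
    return [
        {"name": airflow_env_name(section, key), "value": value}
        for key, value in settings.items()
    ]


statsd = {"statsd_on": "True", "statsd_port": "8125", "statsd_prefix": "airflow"}
for entry in helm_env_entries(statsd):
    print(entry)
```

Passing `section="scheduler"` reproduces the pre-2.0 names that this change removes.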
@@ -429,6 +426,33 @@ See [service_checks.json][18] for a list of service checks provided by this inte
 
 You may need to configure parameters for the Datadog Agent to make authenticated requests to Airflow's API. Use one of the available [configuration options][23].
 
+### Datadog DogStatsD package and origin detection
+
+Airflow can use its own StatsD library as well as the Datadog Python DogStatsD logger. The Datadog DogStatsD logger provides extra tagging options, including [Origin Detection][27] in Kubernetes.
+
+However, the Datadog library is **not** installed by default in Airflow. You need to install the [Datadog provider package][25]. For host installations, you can install it directly with `pip install apache-airflow-providers-datadog`.
+
+For containerized environments, [Airflow recommends][26] building a custom image with this package installed. For example, you can use the following `Dockerfile`, replacing `<VERSION>` with your desired version tag (for example, `2.8.4` or `3.0.2`):
+
+```dockerfile
+FROM apache/airflow:<VERSION>
+RUN pip install apache-airflow-providers-datadog
+```
+
+After the custom image is running, set the following environment variable on your Airflow containers to enable the DogStatsD logger:
+
+```yaml
+- name: AIRFLOW__METRICS__STATSD_DATADOG_ENABLED
+  value: "true"
+```
+
+Because this option switches Airflow from the Airflow StatsD library to the Datadog DogStatsD library, it supports Datadog tagging options, including Origin Detection, out of the box on the Airflow side. To match, you also need to enable [Origin Detection on the Datadog Agent][27] side.
+
+If you try to enable the DogStatsD logger without this package installed, no metrics are sent, and an error like the following occurs:
+
+> {stats.py:42} ERROR - Could not configure StatsClient: No module named 'datadog', using NoStatsLogger instead.
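Since the missing-module error above only surfaces at runtime, it can help to check the Airflow image before setting `AIRFLOW__METRICS__STATSD_DATADOG_ENABLED`. A minimal sketch using the standard library's `importlib.util.find_spec`; the helper names are hypothetical, and the check is only meaningful when run with the same Python interpreter that Airflow uses:

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if the current interpreter can import the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


def datadog_statsd_status() -> str:
    """Describe whether it is safe to enable statsd_datadog_enabled here."""
    # The Datadog provider package depends on the `datadog` library, whose
    # absence causes the "No module named 'datadog'" error shown above.
    if module_available("datadog"):
        return "datadog module found: statsd_datadog_enabled can be turned on"
    return "datadog module missing: install apache-airflow-providers-datadog first"


print(datadog_statsd_status())
```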