
Commit 7613a6e

Update Airflow docs for adding V3, removing V1, and add Datadog plugin (DataDog#21215)
* Update integration for Airflow V3, deprecate V1, add details for Datadog plugin
* grammar fixes

---------

Co-authored-by: cecilia saixue watt <cecilia.watt@datadoghq.com>
1 parent 2aeb4bc commit 7613a6e

1 file changed

Lines changed: 61 additions & 34 deletions


airflow/README.md

````diff
@@ -28,8 +28,8 @@ All steps below are needed for the Airflow integration to work properly. Before

 There are two parts of the Airflow integration:

-- The Datadog Agent portion, which makes requests to a provided endpoint for Airflow to report whether it can connect and is healthy. The Agent integration also queries Airflow to produce some of its own metrics.
-- The Airflow StatsD portion, where Airflow can be configured to send metrics to the Datadog Agent, which can remap the Airflow notation to a Datadog notation.
+- The Datadog Agent portion, which makes requests to a provided endpoint for Airflow to report whether it can connect and is healthy. The Agent integration also queries Airflow to produce some of its own metrics. *Support for Airflow V1 and V2*.
+- The Airflow StatsD portion, where Airflow can be configured to send metrics to the Datadog Agent, which can remap the Airflow notation to a Datadog notation. *Support for Airflow V1, V2, and V3*.

 The Airflow integration's [metrics](#metrics) come from both the Agent and StatsD portions.

````
````diff
@@ -40,7 +40,9 @@ The Airflow integration's [metrics](#metrics) come from both the Agent and Stats

 ##### Configure Datadog Agent Airflow integration

-Configure the Airflow check included in the [Datadog Agent][4] package to collect health metrics and service checks. This can be done by editing the `url` within the `airflow.d/conf.yaml` file, in the `conf.d/` folder at the root of your Agent's configuration directory, to start collecting your Airflow service checks. See the [sample airflow.d/conf.yaml][5] for all available configuration options.
+**Note:** The Datadog Agent's `airflow` integration does not support Airflow V3.
+
+Configure the Agent's `airflow` check included in the [Datadog Agent][4] package to collect health metrics and service checks. This can be done by editing the `url` within the `airflow.d/conf.yaml` file, in the `conf.d/` folder at the root of your Agent's configuration directory, to start collecting your Airflow service checks. See the [sample airflow.d/conf.yaml][5] for all available configuration options.

 Ensure that `url` matches your Airflow [webserver `base_url`][19], the URL used to connect to your Airflow instance.

````
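The hunk above keeps the rule that the Agent check's `url` must match the Airflow webserver `base_url` (relevant to Airflow V2 only, since the Agent check does not support V3). As a quick pre-flight check, here is a minimal Python sketch that compares the two values. The normalization rules (lowercased scheme and host, default ports filled in, trailing slash ignored) and the `urls_match` name are assumptions for illustration, not part of the Datadog integration.

```python
from urllib.parse import urlsplit

# Ports assumed when a URL does not state one explicitly.
_DEFAULT_PORTS = {"http": 80, "https": 443}


def _normalize(url: str) -> tuple:
    """Reduce a URL to (scheme, host, port, path) for comparison.

    Hypothetical helper: urlsplit already lowercases scheme and hostname;
    the default port is filled in and a trailing slash on the path is ignored.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme
    host = parts.hostname or ""
    port = parts.port or _DEFAULT_PORTS.get(scheme)
    path = parts.path.rstrip("/")
    return scheme, host, port, path


def urls_match(check_url: str, base_url: str) -> bool:
    """True if the Agent check `url` and Airflow `base_url` point at the same place."""
    return _normalize(check_url) == _normalize(base_url)


if __name__ == "__main__":
    # `url` from airflow.d/conf.yaml vs. [webserver] base_url from airflow.cfg
    print(urls_match("http://localhost:8080", "http://localhost:8080/"))
    print(urls_match("http://localhost:8080", "http://localhost:8793"))
```

This only compares strings; it does not contact Airflow, so it cannot confirm that the endpoint is reachable from the Agent.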
````diff
@@ -61,19 +63,18 @@ Connect Airflow to DogStatsD (included in the Datadog Agent) by using the Airflo

 2. Update the Airflow configuration file `airflow.cfg` by adding the following configs:

-<div class="alert alert-warning"> Do not set `statsd_datadog_enabled` to true. Enabling `statsd_datadog_enabled` can create conflicts. To prevent issues, ensure that the variable is set to `False`.</div>
-
 ```conf
-[scheduler]
+[metrics]
 statsd_on = True
 # Hostname or IP of server running the Datadog Agent
 statsd_host = localhost
 # DogStatsD port configured in the Datadog Agent
 statsd_port = 8125
 statsd_prefix = airflow
 ```
+Do not set `statsd_datadog_enabled` without first [installing the Datadog DogStatsD package](#datadog-dogstatsd-package-and-origin-detection).

-3. Update the [Datadog Agent main configuration file][9] `datadog.yaml` by adding the following configs:
+3. Update the [Datadog Agent main configuration file][9] `datadog.yaml` by adding the following configuration to remap the Airflow notation to Datadog notation:

 ```yaml
 # dogstatsd_mapper_cache_size: 1000 # default to 1000
````
````diff
@@ -295,7 +296,7 @@ _Available for Agent versions >6.0_
 pattern: \[\d{4}\-\d{2}\-\d{2}
 ```

-3. [Restart the Agent][11].
+3. [Restart the Agent][10].

 <!-- xxz tab xxx -->
 <!-- xxx tab "Containerized" xxx -->
````
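In the `airflow.cfg` hunk of this commit, the StatsD keys move from `[scheduler]` to `[metrics]`. Airflow maps each config option to an environment variable named `AIRFLOW__{SECTION}__{KEY}`, which is why the Kubernetes example in this commit switches from `AIRFLOW__SCHEDULER__*` to `AIRFLOW__METRICS__*`. The following sketch, using only the Python standard library, flags StatsD keys still left under the legacy `[scheduler]` section and lists the `[metrics]` values as environment variables. The function names are hypothetical helpers, not Airflow APIs.

```python
import configparser

# StatsD options shown in the [metrics] example of this commit
STATSD_KEYS = ("statsd_on", "statsd_host", "statsd_port", "statsd_prefix")


def env_var_name(section: str, key: str) -> str:
    """Airflow's environment-variable form of a config option: AIRFLOW__{SECTION}__{KEY}."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def check_statsd_config(cfg_text: str) -> tuple[list[str], dict[str, str]]:
    """Return (StatsD keys still under [scheduler], env vars for keys under [metrics])."""
    # interpolation=None reads values literally (no '%' expansion)
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # has_option() returns False when the section does not exist
    legacy = [k for k in STATSD_KEYS if parser.has_option("scheduler", k)]
    env = {
        env_var_name("metrics", k): parser.get("metrics", k)
        for k in STATSD_KEYS
        if parser.has_option("metrics", k)
    }
    return legacy, env


if __name__ == "__main__":
    sample = """
[metrics]
statsd_on = True
# Hostname or IP of server running the Datadog Agent
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
"""
    legacy, env = check_statsd_config(sample)
    print("legacy [scheduler] keys:", legacy)
    for name, value in env.items():
        print(f"{name}={value}")
```

Running it against an older `airflow.cfg` that still uses `[scheduler]` would list those keys in `legacy`, which is a hint to move them under `[metrics]`.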
````diff
@@ -304,6 +305,8 @@ _Available for Agent versions >6.0_

 ##### Configure Datadog Agent Airflow integration

+**Note:** The Datadog Agent's `airflow` integration does not support Airflow V3.
+
 For containerized environments, see the [Autodiscovery Integration Templates][8] for guidance on applying the parameters below.

 | Parameter | Value |
````
````diff
@@ -314,31 +317,24 @@ For containerized environments, see the [Autodiscovery Integration Templates][8]

 Ensure that `url` matches your Airflow [webserver `base_url`][19], the URL used to connect to your Airflow instance. Replace `localhost` with the template variable `%%host%%`.

-If you are using Airflow's Helm chart, this [exposes the webserver as a ClusterIP service][22] that you should use in the `url` parameter.
+If you are using the [official Airflow Helm chart][24], this should be applied on the `webserver` pod and its `webserver` container. For example, with the [`webserver.podAnnotations`][22], your Autodiscovery Annotations may look like the following:

-For example, your Autodiscovery annotations may look like the following:
-
-```
-apiVersion: v1
-kind: Pod
-# (...)
-metadata:
-  name: '<POD_NAME>'
-  annotations:
-    ad.datadoghq.com/<CONTAINER_IDENTIFIER>.checks: |
+```yaml
+webserver:
+  podAnnotations:
+    ad.datadoghq.com/webserver.checks: |
       {
         "airflow": {
           "instances": [
             {
-              "url": "http://airflow-ui.%%kube_namespace%%.svc.cluster.local:8080"
+              "url": "http://%%host%%:8080"
             }
           ]
         }
       }
-# (...)
 ```

-Replace `<CONTAINER_IDENTIFIER>` with the container's name within the pod (the value returned by `.name`).
+Adjust the `ad.datadoghq.com/<CONTAINER_NAME>.checks` annotation accordingly if your container name differs.

 ##### Connect Airflow to DogStatsD

````
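The annotation in the hunk above is a JSON document embedded in a YAML block scalar, so a stray comma or quote silently breaks Autodiscovery. If you generate Helm values from a script, a minimal sketch like the following builds the `ad.datadoghq.com/webserver.checks` value with `json.dumps` so the JSON is always well-formed. The helper name and its parameters are hypothetical; the container name `webserver`, port `8080`, and the literal `%%host%%` template variable follow the example above. This applies to Airflow V2 only, since the Agent check does not support V3.

```python
import json


def airflow_check_annotation(container: str = "webserver", port: int = 8080) -> dict[str, str]:
    """Build {annotation_key: json_string} for the Agent's `airflow` check.

    Hypothetical helper, intended for `webserver.podAnnotations` in the
    Airflow Helm chart values.
    """
    checks = {
        "airflow": {
            "instances": [
                # %%host%% is an Autodiscovery template variable resolved by the Agent;
                # inside an f-string, "%%" stays a literal "%%".
                {"url": f"http://%%host%%:{port}"}
            ]
        }
    }
    key = f"ad.datadoghq.com/{container}.checks"
    return {key: json.dumps(checks, indent=2)}


if __name__ == "__main__":
    # Print in the same block-scalar shape used under podAnnotations
    for key, value in airflow_check_annotation().items():
        print(f"{key}: |")
        for line in value.splitlines():
            print(f"  {line}")
```

If your container is not named `webserver`, pass its name as `container` so the annotation key matches, as the hunk above advises.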
````diff
@@ -349,27 +345,28 @@ Connect Airflow to DogStatsD (included in the Datadog Agent) by using the Airflo

 **Note**: Presence or absence of StatsD metrics reported by Airflow might vary depending on the Airflow Executor used. For example: `airflow.ti_failures/successes`, `airflow.operator_failures/successes`, `airflow.dag.task.duration` are [not reported for `KubernetesExecutor`][20].

-**Note**: The environment variables used for Airflow may differ between versions. For example in Airflow `2.0.0` this utilizes the environment variable `AIRFLOW__METRICS__STATSD_HOST`, whereas Airflow `1.10.15` utilizes `AIRFLOW__SCHEDULER__STATSD_HOST`.
-
-The Airflow StatsD configuration can be enabled with the following environment variables in a Kubernetes Deployment:
+The Airflow StatsD configuration can be enabled with the following environment variables in the Airflow Helm chart values:

 ```yaml
 env:
-  - name: AIRFLOW__SCHEDULER__STATSD_ON
+  - name: AIRFLOW__METRICS__STATSD_ON
     value: "True"
-  - name: AIRFLOW__SCHEDULER__STATSD_PORT
+  - name: AIRFLOW__METRICS__STATSD_PORT
     value: "8125"
-  - name: AIRFLOW__SCHEDULER__STATSD_PREFIX
+  - name: AIRFLOW__METRICS__STATSD_PREFIX
     value: "airflow"
-  - name: AIRFLOW__SCHEDULER__STATSD_HOST
+extraEnv: |
+  - name: AIRFLOW__METRICS__STATSD_HOST
     valueFrom:
       fieldRef:
         fieldPath: status.hostIP
 ```

-The environment variable for the host endpoint `AIRFLOW__SCHEDULER__STATSD_HOST` is supplied with the node's host IP address to route the StatsD data to the Datadog Agent pod on the same node as the Airflow pod. This setup also requires the Agent to have a `hostPort` open for this port `8125` and accepting non-local StatsD traffic. For more information, see [DogStatsD on Kubernetes Setup][12].
+**Note**: The [Airflow Helm Chart][24] requires the `valueFrom` based environment variables to be set with `extraEnv`. Do not set `AIRFLOW__METRICS__STATSD_DATADOG_ENABLED` without first [installing the Datadog package](#datadog-dogstatsd-package-and-origin-detection).
+
+The environment variable for the metrics endpoint `AIRFLOW__METRICS__STATSD_HOST` is supplied with the node's host IP address to route the StatsD data to the Datadog Agent pod on the same node as the Airflow pod. This setup also requires the Agent to have a `hostPort` open on port `8125` and to accept non-local StatsD traffic. For more information, see [DogStatsD on Kubernetes Setup][12]. This should direct the StatsD traffic from the Airflow container to a Datadog Agent ready to accept the incoming data.

-This should direct the StatsD traffic from the Airflow container to a Datadog Agent ready to accept the incoming data. The last portion is to update the Datadog Agent with the corresponding `dogstatsd_mapper_profiles` . This can be done by copying the `dogstatsd_mapper_profiles` provided in the [Host installation][13] into your `datadog.yaml` file. Or by deploying your Datadog Agent with the equivalent JSON configuration in the environment variable `DD_DOGSTATSD_MAPPER_PROFILES`. With respect to Kubernetes the equivalent environment variable notation is:
+You must also update the Datadog Agent with the corresponding `dogstatsd_mapper_profiles`. To do this, copy the `dogstatsd_mapper_profiles` provided in the [Host installation][13] into your `datadog.yaml` file. Alternatively, you can also deploy your Datadog Agent with the equivalent JSON configuration in the environment variable `DD_DOGSTATSD_MAPPER_PROFILES`. For Kubernetes, the equivalent environment variable notation is:

 ```yaml
 env:
````
````diff
@@ -429,6 +426,33 @@ See [service_checks.json][18] for a list of service checks provided by this inte

 You may need to configure parameters for the Datadog Agent to make authenticated requests to Airflow's API. Use one of the available [configuration options][23].

+### Datadog DogStatsD package and origin detection
+
+Airflow can use its own StatsD library, as well as the Datadog Python DogStatsD logger. Using the Datadog Python DogStatsD library can provide extra tagging options, including [Origin Detection][27] in Kubernetes.
+
+However, this does **not** come installed by default in Airflow. You need to install the [Datadog provider package][25]. For host installations, you can install it directly with `pip install apache-airflow-providers-datadog`.
+
+For containerized environments, [Airflow recommends][26] building a custom image with this package installed. For example, you can use the following `Dockerfile`, replacing `<VERSION>` with your desired version tag (for example, `2.8.4` or `3.0.2`):
+
+```
+FROM apache/airflow:<VERSION>
+RUN pip install apache-airflow-providers-datadog
+```
+
+After the custom image is running, set the following environment variable on your Airflow containers to enable the Datadog StatsD client:
+
+```yaml
+- name: AIRFLOW__METRICS__STATSD_DATADOG_ENABLED
+  value: "true"
+```
+
+Because this option switches Airflow from using the Airflow StatsD library to the Datadog DogStatsD library, this option supports Datadog tagging options, including Origin Detection out-of-the-box on the Airflow side. You need to enable [Origin Detection on the Datadog Agent][27] side to match.
+
+If you try to enable the DogStatsD plugin without this package installed, no metrics are sent, and an error like the following occurs:
+
+> {stats.py:42} ERROR - Could not configure StatsClient: No module named 'datadog', using NoStatsLogger instead.
+
+

 Need help? Contact [Datadog support][11].

 [1]: https://airflow.apache.org/docs/stable/metrics.html
````
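The "Datadog DogStatsD package and origin detection" section added above warns that setting `AIRFLOW__METRICS__STATSD_DATADOG_ENABLED` without the `datadog` Python module makes Airflow fall back to `NoStatsLogger`. A small pre-flight check, run inside the Airflow image, can catch this before any metrics go missing. This is a sketch under two assumptions: that the provider package brings in the importable `datadog` module named in the error message, and that only the string `true` (any case) counts as enabled, which is a simplification of Airflow's own boolean parsing. The function names are hypothetical.

```python
import importlib.util
import os
from typing import Mapping, Optional


def module_available(module: str = "datadog") -> bool:
    """Return True if `module` can be imported in this Python environment."""
    # find_spec returns None when a top-level module is not installed
    return importlib.util.find_spec(module) is not None


def check_datadog_statsd(env: Optional[Mapping[str, str]] = None, module: str = "datadog") -> str:
    """Report whether enabling the Datadog StatsD client would work.

    Simplification: only "true" (case-insensitive) is treated as enabled.
    """
    env = os.environ if env is None else env
    flag = env.get("AIRFLOW__METRICS__STATSD_DATADOG_ENABLED", "")
    if flag.lower() != "true":
        return "ok: Datadog StatsD client not enabled"
    if not module_available(module):
        return f"error: statsd_datadog_enabled is set but '{module}' is not installed"
    return "ok: Datadog StatsD client enabled and importable"


if __name__ == "__main__":
    # Run inside the Airflow container, for example as a one-off debug command
    print(check_datadog_statsd())
```

An `error:` result corresponds to the `NoStatsLogger` fallback quoted above; rebuilding the image with `apache-airflow-providers-datadog` installed should clear it.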
````diff
@@ -449,9 +473,12 @@ Need help? Contact [Datadog support][11].
 [16]: https://airflow.apache.org/docs/apache-airflow-providers-datadog/stable/_modules/airflow/providers/datadog/hooks/datadog.html
 [17]: https://github.com/DataDog/integrations-core/blob/master/airflow/metadata.csv
 [18]: https://github.com/DataDog/integrations-core/blob/master/airflow/assets/service_checks.json
-[19]: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#base-url
+[19]: https://airflow.apache.org/docs/apache-airflow/2.11.0/configurations-ref.html#base-url
 [20]: https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html
 [21]: http://docs.datadoghq.com/resources/json/airflow_ust.json
-[22]: https://github.com/apache/airflow/blob/main/chart/values.yaml#L1522-L1529
+[22]: https://github.com/apache/airflow/blob/helm-chart/1.16.0/chart/values.yaml#L1583
 [23]: https://github.com/DataDog/integrations-core/blob/master/airflow/datadog_checks/airflow/data/conf.yaml.example#L84-L118
-
+[24]: https://airflow.apache.org/docs/helm-chart/stable/index.html
+[25]: https://airflow.apache.org/docs/apache-airflow-providers-datadog/stable/index.html
+[26]: https://airflow.apache.org/docs/docker-stack/entrypoint.html#installing-additional-requirements
+[27]: https://docs.datadoghq.com/developers/dogstatsd/?tab=cgroups#origin-detection
````
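The containerized hunk (`@@ -349,27 +345,28 @@`) offers `DD_DOGSTATSD_MAPPER_PROFILES` as an alternative to editing `datadog.yaml`, with the same `dogstatsd_mapper_profiles` list encoded as JSON. A minimal sketch, assuming you keep the profile as Python data, that serializes it into a single-line JSON value for a Kubernetes `env` entry. The one mapping shown is illustrative only and is not the complete Airflow profile from the Host installation section; copy that full profile for real use. The helper name is hypothetical.

```python
import json

# Illustrative profile with a single mapping; NOT the full Airflow profile.
AIRFLOW_MAPPER_PROFILES = [
    {
        "name": "airflow",
        "prefix": "airflow.",
        "mappings": [
            {
                "match": "airflow.*_start",
                "name": "airflow.job.start",
                "tags": {"job_name": "$1"},
            }
        ],
    }
]


def mapper_profiles_env(profiles: list) -> str:
    """Serialize mapper profiles into a value for DD_DOGSTATSD_MAPPER_PROFILES."""
    # Compact separators keep the JSON on one line for a Kubernetes `value:` field.
    return json.dumps(profiles, separators=(",", ":"))


if __name__ == "__main__":
    # Single quotes are safe here because the JSON itself contains none.
    print("- name: DD_DOGSTATSD_MAPPER_PROFILES")
    print(f"  value: '{mapper_profiles_env(AIRFLOW_MAPPER_PROFILES)}'")
```

Generating the value from structured data avoids hand-escaping JSON inside YAML, which is the usual source of mapper profiles that the Agent fails to parse.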
