Commit cf3b0fa

Chore: Deprecate Airflow integration (#4180)
1 parent c13c40f commit cf3b0fa

106 files changed

Lines changed: 34 additions & 7788 deletions


.circleci/continue_config.yml

Lines changed: 0 additions & 40 deletions
@@ -143,39 +143,6 @@ jobs:
           name: Run tests
           command: npm --prefix web/client run test
 
-  airflow_docker_tests:
-    machine:
-      image: ubuntu-2204:2022.10.2
-      docker_layer_caching: true
-    resource_class: large
-    environment:
-      PYTEST_XDIST_AUTO_NUM_WORKERS: 8
-      SQLMESH__DISABLE_ANONYMIZED_ANALYTICS: "1"
-    steps:
-      - checkout
-      - run:
-          name: Install envsubst
-          command: sudo apt-get update && sudo apt-get install gettext-base
-      - run:
-          name: Setup python env
-          command: |
-            pip3 install --upgrade pip
-            pip3 install ruamel.yaml==0.16.0
-            python3 --version
-      - run:
-          name: Run Airflow slow tests
-          command: make airflow-docker-test-with-env
-          no_output_timeout: 15m
-      - run:
-          name: Collect Airflow logs
-          command: |
-            tar -czf ./airflow_logs.tgz -C ./examples/airflow/logs .
-            mkdir -p /tmp/airflow_logs
-            cp ./airflow_logs.tgz /tmp/airflow_logs/
-          when: on_fail
-      - store_artifacts:
-          path: /tmp/airflow_logs
-
   trigger_private_tests:
     docker:
       - image: cimg/python:3.12.0

@@ -281,13 +248,6 @@ workflows:
           - "3.10"
           - "3.11"
           - "3.12"
-      - airflow_docker_tests:
-          requires:
-            - style_and_cicd_tests
-          filters:
-            branches:
-              only:
-                - main
       - engine_tests_docker:
           name: engine_<< matrix.engine >>
           matrix:

.gitignore

Lines changed: 0 additions & 9 deletions
@@ -138,15 +138,6 @@ dmypy.json
 *~
 *#
 
-# Airflow example
-examples/airflow/Dockerfile
-examples/airflow/docker-compose.yaml
-examples/airflow/airflow.sh
-examples/airflow/.env
-examples/airflow/logs
-examples/airflow/plugins
-examples/airflow/warehouse
-
 *.duckdb
 *.duckdb.wal

Makefile

Lines changed: 3 additions & 38 deletions
@@ -19,7 +19,7 @@ ui-style:
 	SKIP=ruff,ruff-format,mypy pre-commit run --all-files
 
 doc-test:
-	PYTEST_PLUGINS=tests.common_fixtures python -m pytest --doctest-modules sqlmesh/core sqlmesh/utils
+	python -m pytest --doctest-modules sqlmesh/core sqlmesh/utils
 
 package:
 	pip3 install build && python3 -m build

@@ -33,24 +33,6 @@ package-tests:
 publish-tests: package-tests
 	pip3 install twine && python3 -m twine upload -r tobiko-private tests/dist/*
 
-airflow-init:
-	export AIRFLOW_ENGINE_OPERATOR=spark && make -C ./examples/airflow init
-
-airflow-run:
-	make -C ./examples/airflow run
-
-airflow-stop:
-	make -C ./examples/airflow stop
-
-airflow-clean:
-	make -C ./examples/airflow clean
-
-airflow-psql:
-	make -C ./examples/airflow psql
-
-airflow-spark-sql:
-	make -C ./examples/airflow spark-sql
-
 docs-serve:
 	mkdocs serve

@@ -91,27 +73,10 @@ cicd-test:
 	pytest -n auto -m "fast or slow" --junitxml=test-results/junit-cicd.xml && pytest -m "isolated"
 
 core-fast-test:
-	pytest -n auto -m "fast and not web and not github and not dbt and not airflow and not jupyter"
+	pytest -n auto -m "fast and not web and not github and not dbt and not jupyter"
 
 core-slow-test:
-	pytest -n auto -m "(fast or slow) and not web and not github and not dbt and not airflow and not jupyter"
-
-airflow-fast-test:
-	pytest -n auto -m "fast and airflow"
-
-airflow-test:
-	pytest -n auto -m "(fast or slow) and airflow"
-
-airflow-local-test:
-	export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost/airflow && \
-	pytest -n 1 -m "docker and airflow"
-
-airflow-docker-test:
-	make -C ./examples/airflow docker-test
-
-airflow-local-test-with-env: install-dev airflow-clean airflow-init airflow-run airflow-local-test airflow-stop
-
-airflow-docker-test-with-env: install-dev airflow-clean airflow-init airflow-run airflow-docker-test airflow-stop
+	pytest -n auto -m "(fast or slow) and not web and not github and not dbt and not jupyter"
 
 engine-slow-test:
 	pytest -n auto -m "(fast or slow) and engine"

docs/comparisons.md

Lines changed: 0 additions & 1 deletion
@@ -37,7 +37,6 @@ SQLMesh aims to be dbt format-compatible. Importing existing dbt projects with m
 | `Virtual Data Environments` | ❌ | [✅](../concepts/environments)
 | `Open-source CI/CD bot` | ❌ | [✅](../integrations/github)
 | `Data consistency enforcement` | ❌ | ✅
-| `Native Airflow integration` | ❌ | [✅](../integrations/airflow)
 | Interfaces
 | `CLI` | ✅ | [✅](../reference/cli)
 | `Paid UI` | ✅ | ❌

docs/concepts/models/sql_models.md

Lines changed: 1 addition & 1 deletion
@@ -281,7 +281,7 @@ JOIN countries
 
 SQLMesh will detect that the model depends on both `employees` and `countries`. When executing this model, it will ensure that `employees` and `countries` are executed first.
 
-External dependencies not defined in SQLMesh are also supported. SQLMesh can either depend on them implicitly through the order in which they are executed, or through signals if you are using [Airflow](../../integrations/airflow.md).
+External dependencies not defined in SQLMesh are also supported. SQLMesh can either depend on them implicitly through the order in which they are executed, or through [signals](../../guides/signals.md).
 
 Although automatic dependency detection works most of the time, there may be specific cases for which you want to define dependencies manually. You can do so in the `MODEL` DDL with the [dependencies property](./overview.md#properties).
 
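The new link above points to SQLMesh's signals guide rather than the Airflow integration. As a rough, unverified sketch of that mechanism — the `signal` decorator, the `DatetimeRanges` type, and a project-level `signals.py` module are assumptions to confirm against the linked guide — a signal gating a model on an external dependency might look like this:

```python
# signals.py - hedged sketch; decorator and types assumed from the signals guide.
from sqlmesh import signal, DatetimeRanges


@signal()
def external_table_ready(batch: DatetimeRanges) -> bool:
    # `batch` holds the (start, end) intervals SQLMesh wants to process.
    # A real implementation would poll the external system for each interval;
    # returning True unconditionally just means "always ready".
    return True
```

A model would then reference the signal from its `MODEL` DDL (see the signals guide for the exact syntax), so its intervals are only processed once the external data is available.
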
docs/concepts/overview.md

Lines changed: 1 addition & 3 deletions
@@ -68,6 +68,4 @@ SQLMesh automatically runs audits when you apply a `plan` to an environment, or
 ## Infrastructure and orchestration
 Every company's data infrastructure is different. SQLMesh is flexible with regard to which engines and orchestration frameworks you use &mdash; its only requirement is access to the target SQL/analytics engine.
 
-SQLMesh keeps track of model versions and processed data intervals using your existing infrastructure. If SQLMesh is configured without an external orchestrator (such as Airflow), it automatically creates a `sqlmesh` schema in your data warehouse for its internal metadata.
-
-If SQLMesh is configured with Airflow, then it will store all its metadata in the Airflow database. Read more about how [SQLMesh integrates with Airflow](../integrations/airflow.md).
+SQLMesh keeps track of model versions and processed data intervals using your existing infrastructure. SQLMesh automatically creates a `sqlmesh` schema in your data warehouse for its internal metadata.

docs/development.md

Lines changed: 0 additions & 4 deletions
@@ -25,10 +25,6 @@ Run more comprehensive tests that run on each commit:
 ```bash
 make slow-test
 ```
-Run Airflow tests that will run when PR is merged to main:
-```bash
-make airflow-docker-test-with-env
-```
 Install docs dependencies:
 ```bash
 make install-doc

docs/faq/faq.md

Lines changed: 5 additions & 2 deletions
@@ -167,14 +167,17 @@
 ## Scheduling
 
 ??? question "How do I run SQLMesh models on a schedule?"
-    You can run SQLMesh models using the [built-in scheduler](../guides/scheduling.md#built-in-scheduler) or with the native [Airflow integration](../integrations/airflow.md).
+    You can run SQLMesh models using the [built-in scheduler](../guides/scheduling.md#built-in-scheduler) or using [Tobiko Cloud](../cloud/features/scheduler/scheduler.md).
 
     Both approaches use each model's `cron` parameter to determine when the model should run - see the [question about `cron` above](#cron-question) for more information.
 
     The built-in scheduler works by executing the command `sqlmesh run`. A sensible approach to running your project on a schedule is to use Linux’s `cron` tool to execute `sqlmesh run` on a cadence at least as frequent as your briefest SQLMesh model `cron` parameter. For example, if your most frequent model’s `cron` is hourly, the `cron` tool should execute `sqlmesh run` at least every hour.
 
 ??? question "How do I use SQLMesh with Airflow?"
-    SQLMesh has first-class support for Airflow - learn more [here](../integrations/airflow.md).
+    Tobiko Cloud offers first-class support for Airflow - learn more [here](../cloud/features/scheduler/airflow.md).
+
+??? question "How do I use SQLMesh with Dagster?"
+    Tobiko Cloud offers first-class support for Dagster - learn more [here](../cloud/features/scheduler/dagster.md).
 
 ## Warnings and Errors
 
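The answer above boils down to invoking `sqlmesh run` on a cadence at least as frequent as your tightest model `cron`. As an illustrative sketch (the project path is a placeholder), the same run can be triggered from a small Python wrapper that a cron entry or any other scheduler could invoke:

```python
# run_sqlmesh.py - minimal sketch of a wrapper a cron entry could call.
from sqlmesh import Context

context = Context(paths="/path/to/your/project")  # placeholder path
context.run()  # processes any missing intervals; programmatic counterpart of `sqlmesh run`
```
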
docs/guides/configuration.md

Lines changed: 2 additions & 85 deletions
@@ -501,7 +501,7 @@ These pages describe the connection configuration options for each execution engine
 
 Configuration for the state backend connection if different from the data warehouse connection.
 
-The data warehouse connection is used to store SQLMesh state if the `state_connection` key is not specified, unless the configuration uses an Airflow or Google Cloud Composer scheduler. If using one of those schedulers, the scheduler's database is used (not the data warehouse) unless an [Airflow Connection has been configured](../integrations/airflow.md#state-connection).
+The data warehouse connection is used to store SQLMesh state if the `state_connection` key is not specified.
 
 Unlike data transformations, storing state information requires database transactions. Data warehouses aren’t optimized for executing transactions, and storing state information in them can slow down your project or produce corrupted data due to simultaneous writes to the same table. Therefore, production SQLMesh deployments should use a dedicated state connection.
 

@@ -675,7 +675,7 @@ Configuration for a connection used to run unit tests. An in-memory DuckDB datab
 
 ### Scheduler
 
-Identifies which scheduler backend to use. The scheduler backend is used both for storing metadata and for executing [plans](../concepts/plans.md). By default, the scheduler type is set to `builtin`, which uses the existing SQL engine to store metadata. Use the `airflow` type integrate with Airflow.
+Identifies which scheduler backend to use. The scheduler backend is used both for storing metadata and for executing [plans](../concepts/plans.md). By default, the scheduler type is set to `builtin`, which uses the existing SQL engine to store metadata.
 
 These options are in the [scheduler](../reference/configuration.md#scheduler) section of the configuration reference page.
 

@@ -716,89 +716,6 @@ Example configuration:
 
 No additional configuration options are supported by this scheduler type.
 
-#### Airflow
-
-Example configuration:
-
-=== "YAML"
-
-    ```yaml linenums="1"
-    gateways:
-      my_gateway:
-        scheduler:
-          type: airflow
-          airflow_url: <airflow_url>
-          username: <username>
-          password: <password>
-    ```
-
-=== "Python"
-
-    An Airflow scheduler is specified with an `AirflowSchedulerConfig` object.
-
-    ```python linenums="1"
-    from sqlmesh.core.config import (
-        Config,
-        ModelDefaultsConfig,
-        GatewayConfig,
-        AirflowSchedulerConfig,
-    )
-
-    config = Config(
-        model_defaults=ModelDefaultsConfig(dialect=<dialect>),
-        gateways={
-            "my_gateway": GatewayConfig(
-                scheduler=AirflowSchedulerConfig(
-                    airflow_url=<airflow_url>,
-                    username=<username>,
-                    password=<password>,
-                ),
-            ),
-        }
-    )
-    ```
-
-See [Airflow Integration Guide](../integrations/airflow.md) for information about how to integrate Airflow with SQLMesh. See the [configuration reference page](../reference/configuration.md#airflow) for a list of all parameters.
-
-#### Cloud Composer
-
-The Google Cloud Composer scheduler type shares the same configuration options as the `airflow` type, except for `username` and `password`. Cloud Composer relies on `gcloud` authentication, so the `username` and `password` options are not required.
-
-Example configuration:
-
-=== "YAML"
-
-    ```yaml linenums="1"
-    gateways:
-      my_gateway:
-        scheduler:
-          type: cloud_composer
-          airflow_url: <airflow_url>
-    ```
-
-=== "Python"
-
-    An Google Cloud Composer scheduler is specified with an `CloudComposerSchedulerConfig` object.
-
-    ```python linenums="1"
-    from sqlmesh.core.config import (
-        Config,
-        ModelDefaultsConfig,
-        GatewayConfig,
-        CloudComposerSchedulerConfig,
-    )
-
-    config = Config(
-        model_defaults=ModelDefaultsConfig(dialect=<dialect>),
-        gateways={
-            "my_gateway": GatewayConfig(
-                scheduler=CloudComposerSchedulerConfig(
-                    airflow_url=<airflow_url>,
-                ),
-            ),
-        }
-    )
-    ```
 
 ### Gateway/connection defaults
 
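With the Airflow and Cloud Composer scheduler sections removed, the configuration surface that remains is the default `builtin` scheduler plus an optional dedicated `state_connection`. A sketch in the style of the removed Python examples — the class names are taken from the surviving builtin-scheduler docs, and all connection values are placeholders:

```python
from sqlmesh.core.config import (
    Config,
    ModelDefaultsConfig,
    GatewayConfig,
    BuiltInSchedulerConfig,
    DuckDBConnectionConfig,
)

config = Config(
    model_defaults=ModelDefaultsConfig(dialect="duckdb"),
    gateways={
        "my_gateway": GatewayConfig(
            # Data warehouse connection (placeholder values).
            connection=DuckDBConnectionConfig(database="warehouse.duckdb"),
            # Dedicated state connection; production deployments should point
            # this at a transactional database rather than the warehouse itself.
            state_connection=DuckDBConnectionConfig(database="state.duckdb"),
            # The default scheduler; it takes no additional options.
            scheduler=BuiltInSchedulerConfig(),
        ),
    },
)
```

The YAML form is analogous: omit the `scheduler` block entirely (builtin is the default) and place a `state_connection` entry next to `connection` in the gateway.
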
docs/guides/connections.md

Lines changed: 1 addition & 3 deletions
@@ -2,8 +2,6 @@
 
 ## Overview
 
-**Note:** The following guide only applies when using the built-in scheduler. Connections are configured differently when using an external scheduler such as Airflow. See the [Scheduling guide](scheduling.md) for more details.
-
 In order to deploy models and to apply changes to them, you must configure a connection to your Data Warehouse and, optionally, connection to the database where the SQLMesh state is stored. This can be done in either the `config.yaml` file in your project folder, or the one in `~/.sqlmesh`.
 
 Each connection is configured as part of a gateway which has a unique name associated with it. The gateway name can be used to select a specific combination of connection settings when using the CLI. For example:

@@ -23,7 +21,7 @@ sqlmesh --gateway local_db plan
 
 ## State connection
 
-By default, the data warehouse connection is also used to store the SQLMesh state, unless the configuration uses an Airflow or Google Cloud Composer scheduler. If using one of those schedulers, the state connection defaults to the scheduler's database.
+By default, the data warehouse connection is also used to store the SQLMesh state.
 
 The state connection can be changed by providing different connection settings in the `state_connection` key of the gateway configuration:
 
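For readers following the surviving text above: a gateway is a named bundle of connection settings that the CLI selects with `--gateway`. A sketch with two gateways — the names, credentials, and `default_gateway` choice are illustrative placeholders, not part of this commit:

```python
from sqlmesh.core.config import (
    Config,
    ModelDefaultsConfig,
    GatewayConfig,
    DuckDBConnectionConfig,
    PostgresConnectionConfig,
)

config = Config(
    model_defaults=ModelDefaultsConfig(dialect="duckdb"),
    default_gateway="local_db",
    gateways={
        # Local development against a DuckDB file.
        "local_db": GatewayConfig(
            connection=DuckDBConnectionConfig(database="local.duckdb"),
        ),
        # Shared warehouse; state is stored in this same connection by default.
        "warehouse": GatewayConfig(
            connection=PostgresConnectionConfig(
                host="warehouse.internal",  # placeholder
                port=5432,
                user="sqlmesh",
                password="***",
                database="analytics",
            ),
        ),
    },
)
```

With this shape, `sqlmesh --gateway warehouse plan` would target the Postgres connection instead of the local DuckDB file.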