fix(docs): fix Sphinx build warnings across documentation (#80)
- Add autosectionlabel_prefix_document = True to eliminate ~80 duplicate label warnings
- Fix broken :doc: and :ref: cross-references (Groups 4 & 5)
- Fix code-block language directives causing lexer failures (Group 6)
- Fix all "Title underline too short" warnings across 27 files
- Remove stale toctree entries and deduplicate files in multiple toctrees (Groups 1 & 3)
- Fix CRITICAL unexpected section title in feature_framework_configuration.rst
- Add patterns_streaming_flow_groups to toctree (Group 2)
- Remove deploy_enterprise.rst (empty orphan)
- Rename deployment anchors in framework and pipeline bundle docs to avoid cross-page label collisions
- Mark standalone reference/guide pages as :orphan: to suppress toc.not_included warnings when they are linked but not listed in a toctree.
- Enable Sphinx spell checking as a make target (make spelling); fix spelling issues and add a word list.
- Fix docs dependency typo by replacing gitsphinx-autoapi==3.8.0 with sphinx-autoapi==3.8.0 in requirements-docs.txt.
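The two Sphinx settings named in this commit message can be illustrated with a minimal conf.py excerpt. Treat it as a sketch rather than the project's actual configuration: the spelling options belong to the sphinxcontrib-spelling extension, and the word-list file name is a hypothetical placeholder, since the commit does not name the file.

```python
# Excerpt of a Sphinx docs/source/conf.py: a sketch, not the project's file.
# Sphinx executes conf.py as plain Python and reads these module-level names.

# sphinx.ext.autosectionlabel ships with Sphinx; sphinxcontrib.spelling is
# provided by the separate sphinxcontrib-spelling package (built on PyEnchant).
extensions = [
    "sphinx.ext.autosectionlabel",
    "sphinxcontrib.spelling",
]

# Prefix each auto-generated section label with its document name, so the
# same heading text on two pages yields two distinct labels instead of a
# duplicate-label warning. Labels then look like "concepts:Flows Explained".
autosectionlabel_prefix_document = True

# sphinxcontrib-spelling settings. The word-list file name below is a
# hypothetical placeholder: the commit adds a word list but does not name it.
spelling_lang = "en_US"
spelling_word_list_filename = "spelling_wordlist.txt"
spelling_show_suggestions = True
```

With the prefix enabled, a section is referenced as document:Section Title, for example :ref:`concepts:Flows Explained`, while a whole page is linked with :doc:`concepts`. If the Makefile is the stock one generated by sphinx-quickstart, its catch-all target passes make spelling through to sphinx-build -M spelling; a dedicated target in the project's Makefile may look different.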
docs/source/build_pipeline_bundle_steps.rst
Lines changed: 10 additions & 8 deletions
@@ -134,13 +134,14 @@ The databricks.yml needs to be adjusted to include the following configurations:
  Based on the Use Case and the standards defined in your Org, select the appropriate bundle structure. See the :doc:`build_pipeline_bundle_structure` section for guidance.

  4. Select your Data Flow Specification Language / Format
  Based on the implementation and standards in your Org, you can select the appropriate specification language / format. See the :doc:`feature_spec_format` section details.

  Be aware that:
+
  - The default format is `JSON`.
- - The format may have alrteady been enforced globally at Framework level per you orgs standards.
+ - The format may have already been enforced globally at Framework level per your org standards.
  - If enabled at Framework level, you can set the format at the Pipeline Bundle level.
  - You cannot mix and match formats in the same bundle, it's important to ensure consistency for engineers working on the same bundle.
@@ -149,7 +150,7 @@ Be aware that:
  If you haven't already done so, familiarize yourself with the :doc:`feature_substitutions` feature of the Framework.

- If you need to use substitutions and the substitutions you require have not been configure globally at the Framework level, you need to now setup your substitutions file. See the :doc:`substitutions` section for guidance.
+ If you need to use substitutions and the substitutions you require have not been configure globally at the Framework level, you need to now setup your substitutions file. See the :doc:`feature_substitutions` section for guidance.

  .. note::
  This step is optional and only required if substitutions are required to deploy the same pipeline bundle to multiple environments with different resources names. This step can also be actioned later in the build process after the Data Flow Specs have been created.
@@ -182,8 +183,8 @@ Iterate over the following steps to create each individual Data Flow:
  If your pipeline requires Spark configuration, event hook registration, or any one-time setup that must run outside of Data Flow logic, add ``.py`` scripts to:

- - ``src/init/pre/`` — run **before** SDP dataflow declarations
- - ``src/init/post/`` — run **after** SDP dataflow declarations
+ - ``src/init/pre/`` — run **before** SDP data flow declarations
+ - ``src/init/post/`` — run **after** SDP data flow declarations

  Scripts are executed in sorted filename order. Files whose names begin with ``_`` are skipped. Use a numeric prefix (e.g. ``01_setup.py``) to control execution order.
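The ordering rule stated in the context lines of this hunk can be sketched in Python. This is an illustration under stated assumptions, not the Framework's loader: discover_init_scripts is a hypothetical name, only discovery and ordering are shown (not execution), and limiting the scan to .py files is an assumption drawn from the instruction to add .py scripts.

```python
import tempfile
from pathlib import Path


def discover_init_scripts(directory: Path) -> list[Path]:
    """Return init scripts from ``directory`` in the order they would run.

    Hypothetical helper illustrating the documented rules:
    * scripts run in sorted filename order;
    * files whose names begin with "_" are skipped.
    Restricting the scan to ".py" files is an assumption based on the docs
    asking for ".py" scripts.
    """
    if not directory.is_dir():
        return []  # e.g. a bundle that defines no src/init/pre/ folder
    scripts = [
        path
        for path in directory.iterdir()
        if path.is_file()
        and path.suffix == ".py"
        and not path.name.startswith("_")
    ]
    # Plain string sort on the file name: "01_" < "02_" < "10_".
    return sorted(scripts, key=lambda path: path.name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pre = Path(tmp) / "src" / "init" / "pre"
        pre.mkdir(parents=True)
        for name in ["10_event_hooks.py", "01_setup.py", "_helpers.py", "notes.txt"]:
            (pre / name).write_text("# placeholder\n")
        print([path.name for path in discover_init_scripts(pre)])
        # prints ['01_setup.py', '10_event_hooks.py']
```

Because the sort compares file names as strings, a file named 2_setup.py would sort after 10_event_hooks.py; zero-padded prefixes such as 01_, 02_ and 10_ keep the string order aligned with the numeric intent.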
@@ -199,6 +200,7 @@ Iterate over the following steps to create each individual Data Flow:
  If necessary, create a new folder in the ``src/dataflows`` directory based on your selected bundle strategy.

  b. Create Data Flow Spec file(s):
+
  * Refer to the :doc:`dataflow_spec_reference` section to build your Data Flow Spec
  * Refer to the :doc:`patterns` section for high level patterns and sample code.
  * Refer to the :doc:`deploy_samples` section on how to deploy the samples so you can reference the sample code.
@@ -208,7 +210,7 @@ Iterate over the following steps to create each individual Data Flow:
  Create your schema JSON / DDL files in the ``schema`` sub-directory of your Data Flow Spec's home folder:

  * You should in general always specify a schema for your source and target tables, unless you want schema evolution to happen automatically in Bronze.
- * Schemas are optional for staging tables.
+ * Schema definitions are optional for staging tables.
  * Each schema must be defined in it's own individual file.
  * Each schema must be referenced by the appropriate object(s) in your Data Flow Spec JSON file(s).
@@ -280,7 +282,7 @@ To create a single Pipeline definition, follow these steps:
  3. **Add any required Data Flow filters:**

- By default, if you don't specify any Data Flow filters, the pipleine will execute all Data Flows in you Pipeline Bundle.
+ By default, if you don't specify any Data Flow filters, the pipeline will execute all Data Flows in your Pipeline Bundle.

  If you are creating more than one Pipeline definition in your bundle, you may want your Pipeline(s) to only execute specific Data Flows.
@@ -309,7 +311,7 @@ To create a single Pipeline definition, follow these steps:
  You can add the appropriate Data Flow filter options described above to the Pipeline definition, as show below:
docs/source/build_pipeline_bundle_structure.rst
Lines changed: 3 additions & 3 deletions
@@ -15,7 +15,7 @@ Ultimately you will need to determine the best way to scope your Pipeline Bundle
  .. important::

- Per the :ref:`concepts` section of this documentation:
+ Per the :doc:`concepts` section of this documentation:

  * A data flow, and its Data Flow Spec, defines the source(s) and logic required to generate a **single target table**.
  * A Pipeline Bundle can contain multiple Data Flow Specs, and a Pipeline deployed by the bundle may execute the logic for one or more Data Flow Specs.
@@ -36,7 +36,7 @@ Some of the most common groupings strategies are shown below:
  * - Logical Grouping
  - Description
  * - Monolithic
- - A single Pipeline Bundle containing all ata flows and Pipeline definitions. Only suitable for smaller and simpler deployments.
+ - A single Pipeline Bundle containing all data flows and Pipeline definitions. Only suitable for smaller and simpler deployments.
  * - Bronze
  - * Source System - A Pipeline per Source System or application
  * - Silver / Enterprise Models
@@ -116,7 +116,7 @@ The ``src/`` directories serve distinct purposes:
  It is the structure of the ``src/dataflows`` directory that is flexible and can be organised in the way that best suits your standards and ways of working. The Framework will:

- * Read all the Data Flow Spec files under the ``src/dataflows`` directory, regardless of the folder structure. Filtering of the Dataflows is done when defining your Pipeline and is discussed in the :doc:`build_pipeline_bundle_steps` section.
+ * Read all the Data Flow Spec files under the ``src/dataflows`` directory, regardless of the folder structure. Filtering of the data flows is done when defining your Pipeline and is discussed in the :doc:`build_pipeline_bundle_steps` section.
  * Expect that the schemas, transforms and expectations related to a Data Flow Spec are located in their respective ``schemas``, ``dml`` and ``expectations`` sub-directories within the Data Flow Spec's home directory.

  The most common ways to organize your ``src/dataflows`` directory are:
docs/source/concepts.rst
Lines changed: 13 additions & 13 deletions
@@ -11,9 +11,9 @@ The below diagram illustrates some of the key concepts of the Framework, which a
  .. _concepts_dabs:

- Databricks Asset Bundles (DABs)
- ===============================
- Databricks Asset Bundles (DABs) are a way to package and deploy Databricks assets such as source code, Spark Declarative Pipelines notebooks and libraries.
+ Declarative Automation Bundles (DABs)
+ =====================================
+ Declarative Automation Bundles (DABs) are a way to package and deploy Databricks assets such as source code, Spark Declarative Pipelines notebooks and libraries.
  This concept is core to how the Lakeflow Framework has been designed and implemented.

  Detailed documentation on DABs can be found at: https://docs.databricks.com/en/dev-tools/bundles/index.html
@@ -256,12 +256,12 @@ In the Lakeflow Framework the following types of data flows can be defined in a
  2. **Flow**

- Flows data flows allow you to create simple or complex data flows, using the different components of a flow as building blocks. They implement the :doc:`feature_multi_source_streaming` feature of DLT.
+ Flows data flows allow you to create simple or complex data flows, using the different components of a flow as building blocks. They implement the :doc:`feature_multi_source_streaming` feature of SDP.
  Flows are useful for Silver and Gold scenarios, and where multiple sources and transformations are required.

  3. **Materialized Views**

- Materialized Views are the precomputed results of a query stored in a Table. They are useful for Gold scenarios, and where complex transformations are required.
+ Materialized Views are the pre-computed results of a query stored in a Table. They are useful for Gold scenarios, and where complex transformations are required.
  * - :ref:`Flow Groups <dataflow-spec-flows-flow-groups-configuration>`
- - Contains the flow groups for the dataflow.
+ - Contains the flow groups for the data flow.

  * A flow group can contain one or more flows.
- * flows implements the :doc:`feature_multi_source_streaming` feature of DLT.'
+ * flows implements the :doc:`feature_multi_source_streaming` feature of SDP.


  Flow Groups Explained
@@ -400,7 +400,7 @@ Some key points to note:
  * Staging tables are optional.
  * Staging tables can be referenced as a source or target in any of the flows defined in the flow group.
- * In some cases for very large and complex data flows, you may want to decompose your dataflow into a smaller more manageable data flows. In this instance staging tables may in fact become target tables in smaller more manageable data flows. In these cases they can only be used as a source in downstream Pipelines. This however really depend on the design practices you choose to follow.
+ * In some cases for very large and complex data flows, you may want to decompose your data flow into smaller, more manageable data flows. In this instance staging tables may in fact become target tables in smaller, more manageable data flows. In these cases they can only be used as a source in downstream Pipelines. This, however, really depends on the design practices you choose to follow.

  When defining a staging table, you can specify the following:
@@ -413,7 +413,7 @@ When defining a staging table, you can specify the following:
  Flows Explained
  ~~~~~~~~~~~~~~~

- Flows are the building blocks of a Data Flow and they implement the :doc:`feature_multi_source_streaming` feature of DLT.
+ Flows are the building blocks of a Data Flow and they implement the :doc:`feature_multi_source_streaming` feature of SDP.

  Flows can be defined in one of two ways:
@@ -436,14 +436,14 @@ Flows can be defined in one of two ways:
  * **append_view** - Uses a source view to append data to a staging or target table.
  * **append_sql** - Uses a raw SQL statement to append data to a staging or target table.
- * **merge** - Uses the :ref:`CDC API's <feature_cdc>` to merge data from a source view to a staging or target table.
+ * **merge** - Uses the :doc:`CDC API's <feature_cdc>` to merge data from a source view to a staging or target table.
  - Views are used to define the source and any additional transformations for a flow. The different types of views are documented in the following sections:

- * :doc:`feature_source_target_types`
+ * :doc:`feature_source_types`
  * :ref:`dataflow-spec-flows-view-configuration`

  .. important::
@@ -457,7 +457,7 @@ Flows can be defined in one of two ways:
  Materialized Views
  -------------------

- Materialized Views are the precomputed results of a query stored in a Table. They are typically used for Gold scenarios, and where complex transformations are required.
+ Materialized Views are the pre-computed results of a query stored in a Table. They are typically used for Gold scenarios, and where complex transformations are required.

  Data Flow Spec Components:
@@ -491,7 +491,7 @@ Data Flow Spec Components:
  - Specifies any additional configuration for the target table, its configuration and properties.
  Once you have cloned the Lakeflow Framework repository, you'll need to follow the steps below to set up the framework.
@@ -36,10 +36,10 @@ Once you have cloned the Lakeflow Framework repository, you'll need to follow th
  ./scripts/generate_lockfiles.sh

  See :doc:`contributor_dev_docs` for more details on the lockfiles.
- 2. **Set up VS Code extentions**
+ 2. **Set up VS Code extensions**

  Once you open the Lakeflow Framework workspace in VS Code for the first time, VS Code will prompt you to install the recommended extensions.
- If you missed this prompt, you can review and install the recommended extensions with the Extensions: Show Recommended Extensions command or by clicking on the extentions tab on left side of the window and selecting "Workspace Recommendations".
+ If you missed this prompt, you can review and install the recommended extensions with the Extensions: Show Recommended Extensions command or by clicking on the extensions tab on left side of the window and selecting "Workspace Recommendations".
docs/source/contributor_dev_git.rst
Lines changed: 4 additions & 4 deletions
@@ -4,7 +4,7 @@ GIT
  Branching Strategy
  ------------------

- The project follows the Gitflow branching model for version control. This model provides a robust framework for managing larger projects with scheduled releases.
+ The project follows the GitFlow branching model for version control. This model provides a robust framework for managing larger projects with scheduled releases.

  Our repository maintains the following primary branches:
@@ -15,7 +15,7 @@ Our repository maintains the following primary branches: