Commit 07c9a0b

fix(docs): fix Sphinx build warnings across documentation (#80)
- Add autosectionlabel_prefix_document = True to eliminate ~80 duplicate label warnings
- Fix broken :doc: and :ref: cross-references (Groups 4 & 5)
- Fix code-block language directives causing lexer failures (Group 6)
- Fix all "Title underline too short" warnings across 27 files
- Remove stale toctree entries and deduplicate files in multiple toctrees (Groups 1 & 3)
- Fix CRITICAL unexpected section title in feature_framework_configuration.rst
- Add patterns_streaming_flow_groups to toctree (Group 2)
- Remove deploy_enterprise.rst (empty orphan)
- Rename deployment anchors in framework and pipeline bundle docs to avoid cross-page label collisions
- mark standalone reference/guide pages as :orphan: to suppress toc.not_included warnings when they are linked but not listed in a toctree.
- Enabled sphinx spell checking as make target - make spelling. Fixed spelling issues and added word list.
- fix docs dependency typo by replacing gitsphinx-autoapi==3.8.0 with sphinx-autoapi==3.8.0 in requirements-docs.txt.
1 parent 8952d11 commit 07c9a0b

67 files changed

Lines changed: 410 additions & 477 deletions

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-v0.16.0
+v0.16.1

docs/Makefile

Lines changed: 5 additions & 1 deletion
@@ -13,7 +13,7 @@ IMAGEDIR = source/images
 help:
 @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
 
-.PHONY: help Makefile markdown
+.PHONY: help Makefile markdown spelling
 
 # Add custom markdown build command
 md:
@@ -26,6 +26,10 @@ md:
 @cp -r "$(SOURCEDIR)/_static"/* "$(BUILDDIR)/markdown/_static"
 @echo "Build finished. The markdown files are in $(BUILDDIR)/markdown."
 
+# Add explicit spelling target
+spelling:
+@$(SPHINXBUILD) -M spelling "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile

docs/conf.py

Lines changed: 4 additions & 1 deletion
@@ -23,9 +23,12 @@
 'sphinx_design',
 'myst_parser',
 'sphinx_tabs.tabs',
-'custom_markdown_builder'
+'custom_markdown_builder',
+"sphinxcontrib.spelling"
 ]
 
+autosectionlabel_prefix_document = True
+
 templates_path = ['source/_templates']
 exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
 
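
Note on the setting added above: `autosectionlabel_prefix_document` belongs to Sphinx's `sphinx.ext.autosectionlabel` extension. When it is `True`, each auto-generated section label is prefixed with the name of its document and a colon, so sections with the same title in different files no longer produce duplicate labels. The sketch below only models that naming rule. `section_label` is a hypothetical helper, not a Sphinx API, and the shared heading "Overview" is an assumed example.

```python
def section_label(docname: str, title: str, prefix_document: bool) -> str:
    """Model how a section label is named (illustrative, not Sphinx code)."""
    return f"{docname}:{title}" if prefix_document else title


# Two pages from this commit; the shared heading is a hypothetical example.
pages = ["contributor_dev_git", "contributor_dev_steps"]
title = "Overview"

unprefixed = [section_label(page, title, False) for page in pages]
prefixed = [section_label(page, title, True) for page in pages]

print(len(set(unprefixed)))  # 1: both sections share one label, so they collide
print(len(set(prefixed)))    # 2: each section gets its own label
```

With prefixing enabled, a cross-reference has to include the document name, for example ``:ref:`contributor_dev_git:Overview` `` without the trailing space, which is likely why this commit also touches a number of `:ref:` and `:doc:` targets.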

docs/source/build_pipeline_bundle_steps.rst

Lines changed: 10 additions & 8 deletions
@@ -134,13 +134,14 @@ The databricks.yml needs to be adjusted to include the following configurations:
 Based on the Use Case and the standards defined in your Org, select the appropriate bundle structure. See the :doc:`build_pipeline_bundle_structure` section for guidance.
 
 4. Select your Data Flow Specification Language / Format
--------------------------------------------------------
+--------------------------------------------------------
 
 Based on the implementation and standards in your Org, you can select the appropriate specification language / format. See the :doc:`feature_spec_format` section details.
 
 Be aware that:
+
 - The default format is `JSON`.
-- The format may have alrteady been enforced globally at Framework level per you orgs standards.
+- The format may have already been enforced globally at Framework level per your org standards.
 - If enabled at Framework level, you can set the format at the Pipeline Bundle level.
 - You cannot mix and match formats in the same bundle, it's important to ensure consistency for engineers working on the same bundle.
 
@@ -149,7 +150,7 @@ Be aware that:
 
 If you haven't already done so, familiarize yourself with the :doc:`feature_substitutions` feature of the Framework.
 
-If you need to use substitutions and the substitutions you require have not been configure globally at the Framework level, you need to now setup your substitutions file. See the :doc:`substitutions` section for guidance.
+If you need to use substitutions and the substitutions you require have not been configure globally at the Framework level, you need to now setup your substitutions file. See the :doc:`feature_substitutions` section for guidance.
 
 .. note::
 This step is optional and only required if substitutions are required to deploy the same pipeline bundle to multiple environments with different resources names. This step can also be actioned later in the build process after the Data Flow Specs have been created.
@@ -182,8 +183,8 @@ Iterate over the following steps to create each individual Data Flow:
 
 If your pipeline requires Spark configuration, event hook registration, or any one-time setup that must run outside of Data Flow logic, add ``.py`` scripts to:
 
-- ``src/init/pre/`` — run **before** SDP dataflow declarations
-- ``src/init/post/`` — run **after** SDP dataflow declarations
+- ``src/init/pre/`` — run **before** SDP data flow declarations
+- ``src/init/post/`` — run **after** SDP data flow declarations
 
 Scripts are executed in sorted filename order. Files whose names begin with ``_`` are skipped. Use a numeric prefix (e.g. ``01_setup.py``) to control execution order.
 
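
The ordering rule quoted in the hunk above (sorted filename order, names starting with ``_`` skipped) can be sketched as follows. This is a minimal illustration of the documented behaviour, not the Framework's actual loader. `scripts_to_run` is a hypothetical helper, restricting to ``.py`` files is an assumption based on the wording "add ``.py`` scripts", and every file name except ``01_setup.py`` is made up.

```python
def scripts_to_run(filenames):
    """Return init scripts in the order the documentation describes.

    Assumption: only .py files count; names starting with "_" are skipped.
    """
    runnable = [
        name for name in filenames
        if name.endswith(".py") and not name.startswith("_")
    ]
    # Plain string sort, which is why a zero-padded numeric prefix controls order.
    return sorted(runnable)


example = ["10_hooks.py", "_helpers.py", "01_setup.py", "README.md"]
print(scripts_to_run(example))  # ['01_setup.py', '10_hooks.py']
```

Because the sort is lexicographic, zero-padding matters: ``2_x.py`` would sort after ``10_x.py``, whereas ``02_x.py`` sorts before it.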

@@ -199,6 +200,7 @@ Iterate over the following steps to create each individual Data Flow:
 If necessary, create a new folder in the ``src/dataflows`` directory based on your selected bundle strategy.
 
 b. Create Data Flow Spec file(s):
+
 * Refer to the :doc:`dataflow_spec_reference` section to build your Data Flow Spec
 * Refer to the :doc:`patterns` section for high level patterns and sample code.
 * Refer to the :doc:`deploy_samples` section on how to deploy the samples so you can reference the sample code.
@@ -208,7 +210,7 @@ Iterate over the following steps to create each individual Data Flow:
 Create your schema JSON / DDL files in the ``schema`` sub-directory of your Data Flow Spec's home folder:
 
 * You should in general always specify a schema for your source and target tables, unless you want schema evolution to happen automatically in Bronze.
-* Schemas are optional for staging tables.
+* Schema definitions are optional for staging tables.
 * Each schema must be defined in it's own individual file.
 * Each schema must be referenced by the appropriate object(s) in your Data Flow Spec JSON file(s).
 
@@ -280,7 +282,7 @@ To create a single Pipeline definition, follow these steps:
 
 3. **Add any required Data Flow filters:**
 
-By default, if you don't specify any Data Flow filters, the pipleine will execute all Data Flows in you Pipeline Bundle.
+By default, if you don't specify any Data Flow filters, the pipeline will execute all Data Flows in your Pipeline Bundle.
 
 If you are creating more than one Pipeline definition in your bundle, you may want your Pipeline(s) to only execute specific Data Flows.
 
@@ -309,7 +311,7 @@ To create a single Pipeline definition, follow these steps:
 You can add the appropriate Data Flow filter options described above to the Pipeline definition, as show below:
 
 .. code-block:: yaml
-:emphasize-lines: 20-23
+:emphasize-lines: 19-22
 
 resources:
 pipelines:

docs/source/build_pipeline_bundle_structure.rst

Lines changed: 3 additions & 3 deletions
@@ -15,7 +15,7 @@ Ultimately you will need to determine the best way to scope your Pipeline Bundle
 
 .. important::
 
-Per the :ref:`concepts` section of this documentation:
+Per the :doc:`concepts` section of this documentation:
 
 * A data flow, and its Data Flow Spec, defines the source(s) and logic required to generate a **single target table**.
 * A Pipeline Bundle can contain multiple Data Flow Specs, and a Pipeline deployed by the bundle may execute the logic for one or more Data Flow Specs.
@@ -36,7 +36,7 @@ Some of the most common groupings strategies are shown below:
 * - Logical Grouping
 - Description
 * - Monolithic
-- A single Pipeline Bundle containing all ata flows and Pipeline definitions. Only suitable for smaller and simpler deployments.
+- A single Pipeline Bundle containing all data flows and Pipeline definitions. Only suitable for smaller and simpler deployments.
 * - Bronze
 - * Source System - A Pipeline per Source System or application
 * - Silver / Enterprise Models
@@ -116,7 +116,7 @@ The ``src/`` directories serve distinct purposes:
 
 It is the structure of the ``src/dataflows`` directory that is flexible and can be organised in the way that best suits your standards and ways of working. The Framework will:
 
-* Read all the Data Flow Spec files under the ``src/dataflows`` directory, regardless of the folder structure. Filtering of the Dataflows is done when defining your Pipeline and is discussed in the :doc:`build_pipeline_bundle_steps` section.
+* Read all the Data Flow Spec files under the ``src/dataflows`` directory, regardless of the folder structure. Filtering of the data flows is done when defining your Pipeline and is discussed in the :doc:`build_pipeline_bundle_steps` section.
 * Expect that the schemas, transforms and expectations related to a Data Flow Spec are located in their respective ``schemas``, ``dml`` and ``expectations`` sub-directories within the Data Flow Spec's home directory.
 
 The most common ways to organize your ``src/dataflows`` directory are:

docs/source/concepts.rst

Lines changed: 13 additions & 13 deletions
@@ -11,9 +11,9 @@ The below diagram illustrates some of the key concepts of the Framework, which a
 
 .. _concepts_dabs:
 
-Databricks Asset Bundles (DABs)
-===============================
-Databricks Asset Bundles (DABs) are a way to package and deploy Databricks assets such as source code, Spark Declarative Pipelines notebooks and libraries.
+Declarative Automation Bundles (DABs)
+=====================================
+Declarative Automation Bundles (DABs) are a way to package and deploy Databricks assets such as source code, Spark Declarative Pipelines notebooks and libraries.
 This concept is core to how the Lakeflow Framework has been designed and implemented.
 
 Detailed documentation on DABs can be found at: https://docs.databricks.com/en/dev-tools/bundles/index.html
@@ -256,12 +256,12 @@ In the Lakeflow Framework the following types of data flows can be defined in a
 
 2. **Flow**
 
-Flows data flows allow you to create simple or complex data flows, using the different components of a flow as building blocks. They implement the :doc:`feature_multi_source_streaming` feature of DLT.
+Flows data flows allow you to create simple or complex data flows, using the different components of a flow as building blocks. They implement the :doc:`feature_multi_source_streaming` feature of SDP.
 Flows are useful for Silver and Gold scenarios, and where multiple sources and transformations are required.
 
 3. **Materialized Views**
 
-Materialized Views are the precomputed results of a query stored in a Table. They are useful for Gold scenarios, and where complex transformations are required.
+Materialized Views are the pre-computed results of a query stored in a Table. They are useful for Gold scenarios, and where complex transformations are required.
 
 .. important::
 
@@ -363,10 +363,10 @@ Data Flow Spec Components:
 * - :ref:`Table Migration Details (optional) <dataflow-spec-flows-table-migration-configuration>`
 - The details of the table being migrated from.
 * - :ref:`Flow Groups <dataflow-spec-flows-flow-groups-configuration>`
-- Contains the flow groups for the dataflow.
+- Contains the flow groups for the data flow.
 
 * A flow group can contain one or more flows.
-* flows implements the :doc:`feature_multi_source_streaming` feature of DLT.'
+* flows implements the :doc:`feature_multi_source_streaming` feature of SDP.
 
 
 Flow Groups Explained
@@ -400,7 +400,7 @@ Some key points to note:
 
 * Staging tables are optional.
 * Staging tables can be referenced as a source or target in any of the flows defined in the flow group.
-* In some cases for very large and complex data flows, you may want to decompose your dataflow into a smaller more manageable data flows. In this instance staging tables may in fact become target tables in smaller more manageable data flows. In these cases they can only be used as a source in downstream Pipelines. This however really depend on the design practices you choose to follow.
+* In some cases for very large and complex data flows, you may want to decompose your data flow into smaller, more manageable data flows. In this instance staging tables may in fact become target tables in smaller, more manageable data flows. In these cases they can only be used as a source in downstream Pipelines. This, however, really depends on the design practices you choose to follow.
 
 When defining a staging table, you can specify the following:
 
@@ -413,7 +413,7 @@ When defining a staging table, you can specify the following:
 Flows Explained
 ~~~~~~~~~~~~~~~
 
-Flows are the building blocks of a Data Flow and they implement the :doc:`feature_multi_source_streaming` feature of DLT.
+Flows are the building blocks of a Data Flow and they implement the :doc:`feature_multi_source_streaming` feature of SDP.
 
 Flows can be defined in one of two ways:
 
@@ -436,14 +436,14 @@ Flows can be defined in one of two ways:
 
 * **append_view** - Uses a source view to append data to a staging or target table.
 * **append_sql** - Uses a raw SQL statement to append data to a staging or target table.
-* **merge** - Uses the :ref:`CDC API's <feature_cdc>` to merge data from a source view to a staging or target table.
+* **merge** - Uses the :doc:`CDC API's <feature_cdc>` to merge data from a source view to a staging or target table.
 
 * - :ref:`Flow Details <dataflow-spec-flows-flow-configuration>`
 - Defines the source and target of the flow and any additional properties required for the flow type.
 * - :ref:`Views <dataflow-spec-flows-flow-configuration>` (optional)
 - Views are used to define the source and any additional transformations for a flow. The different types of views are documented in the following sections:
 
-* :doc:`feature_source_target_types`
+* :doc:`feature_source_types`
 * :ref:`dataflow-spec-flows-view-configuration`
 
 .. important::
@@ -457,7 +457,7 @@ Flows can be defined in one of two ways:
 Materialized Views
 -------------------
 
-Materialized Views are the precomputed results of a query stored in a Table. They are typically used for Gold scenarios, and where complex transformations are required.
+Materialized Views are the pre-computed results of a query stored in a Table. They are typically used for Gold scenarios, and where complex transformations are required.
 
 Data Flow Spec Components:
 
@@ -491,7 +491,7 @@ Data Flow Spec Components:
 - Specifies any additional configuration for the target table, its configuration and properties.
 * - :ref:`Data Quality Expectations (optional) <dataflow-spec-materialized-view-data-quality-configuration>`
 - Enable expectations and specify the location of the expectations file(s).
-* - :ref:`Quarantine Details (optional) <dataflow-spec-materialized-view-quarantine-configuration>`
+* - :doc:`Quarantine Details (optional) <dataflow_spec_ref_data_quality>`
 - Set the quarantine mode and if the mode is ``table`` the details of the quarantine table.
 
 Patterns

docs/source/contributor_dev_docs.rst

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ to install the documentation dependencies separately —
 ``requirements-docs.lock``.
 
 Updating dependencies / regenerating the lockfile
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 If you add, remove, or bump a documentation dependency, edit
 ``requirements-docs.txt`` (the unhashed source-of-truth file) and then

docs/source/contributor_dev_env.rst

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ Development Environment Setup
44
The sections below assumes the Lakeflow Framework repository has been cloned from git and you are in the root directory. If not please do so first.
55

66
Setting up for development as a contributor to the Lakeflow Framework
7-
================================================================
7+
=====================================================================
88

99
Once you have cloned the Lakeflow Framework repository, you'll need to follow the steps below to set up the framework.
1010

@@ -36,10 +36,10 @@ Once you have cloned the Lakeflow Framework repository, you'll need to follow th
3636
./scripts/generate_lockfiles.sh
3737
3838
See :doc:`contributor_dev_docs` for more details on the lockfiles.
39-
2. **Set up VS Code extentions**
39+
2. **Set up VS Code extensions**
4040

4141
Once you open the Lakeflow Framework workspace in VS Code for the first time, VS Code will prompt you to install the recommended extensions.
42-
If you missed this prompt, you can review and install the recommended extensions with the Extensions: Show Recommended Extensions command or by clicking on the extentions tab on left side of the window and selecting "Workspace Recommendations".
42+
If you missed this prompt, you can review and install the recommended extensions with the Extensions: Show Recommended Extensions command or by clicking on the extensions tab on left side of the window and selecting "Workspace Recommendations".
4343

4444
.. note::
4545

docs/source/contributor_dev_git.rst

Lines changed: 4 additions & 4 deletions
@@ -4,7 +4,7 @@ GIT
 Branching Strategy
 ------------------
 
-The project follows the Gitflow branching model for version control. This model provides a robust framework for managing larger projects with scheduled releases.
+The project follows the GitFlow branching model for version control. This model provides a robust framework for managing larger projects with scheduled releases.
 
 Our repository maintains the following primary branches:
 
@@ -15,7 +15,7 @@ Our repository maintains the following primary branches:
 * ``fix`` - Short-lived branches for bug fixes
 
 Feature Branch Guidelines
-^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^
 
 * Create feature branches from ``develop``
 * Branch naming: ``feature/descriptive-name`` (e.g. ``feature/add-cdc-support``)
@@ -24,7 +24,7 @@ Feature Branch Guidelines
 * Submit pull request to merge back into ``develop`` when complete
 
 Fix Branch Guidelines
-^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^
 
 * Create fix branches from ``develop`` for non-critical bugs
 * Branch naming: ``fix/issue-description`` (e.g. ``fix/logging-format``)
@@ -33,7 +33,7 @@ Fix Branch Guidelines
 * Submit pull request to merge back into ``develop`` when complete
 
 Release Strategy
-----------------
+----------------
 
 The project follows a structured release process aligned with the GitFlow branching model:
 
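
Several hunks in this commit only lengthen a section underline. Docutils reports "Title underline too short" when the adornment line is shorter than the title text above it. A rough way to find such lines before running the build is sketched below. `short_underlines` is a hypothetical helper, not part of Sphinx or docutils, and it only handles the simple title-plus-underline case (no overlines, no multi-byte width handling).

```python
import re

# A line made of one repeated punctuation character, as used for RST titles.
ADORNMENT = re.compile(r"^([=\-~^\"'`#*+:.])\1+$")


def short_underlines(text):
    """Yield (line_number, title) for underlines shorter than their title."""
    lines = [line.rstrip() for line in text.splitlines()]
    for i in range(1, len(lines)):
        title, underline = lines[i - 1], lines[i]
        is_title = bool(title.strip()) and not ADORNMENT.match(title)
        if is_title and ADORNMENT.match(underline) and len(underline) < len(title):
            yield i + 1, title  # 1-based line number of the underline


sample = (
    "Release Strategy\n" + "-" * 15 + "\n\n"      # 16-char title, 15-char underline
    "Branching Strategy\n" + "-" * 18 + "\n"      # 18-char title, 18-char underline
)
print(list(short_underlines(sample)))  # [(2, 'Release Strategy')]
```

The first sample heading reproduces the "Release Strategy" case fixed in the last hunk above; the second is already the correct length and is not reported.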

docs/source/contributor_dev_steps.rst

Lines changed: 2 additions & 5 deletions
@@ -26,7 +26,7 @@ Development Process
 1. Local Development
 
 - Follow coding standards and style guides
-- Ensure the yapf extention is installed and enabled in VS Code (refer to step 2 of :doc:`contributor_dev_env`)
+- Ensure the yapf extension is installed and enabled in VS Code (refer to step 2 of :doc:`contributor_dev_env`)
 - Use yapf to format your python code (right click and select 'Format Document With' then select yapf)
 - Stick to solid principles and object oriented design patterns
 - Deploy updated framework to Databricks to ensure it is working as expected
@@ -35,10 +35,7 @@ Development Process
 
 2. Unit Testing
 
-- Write unit tests per :doc:`contributor_unit_test`
-- Test both success and failure scenarios
-- Ensure test coverage meets requirements
-- Run existing test suite to check for regressions
+- Currently being redeveloped and will be added back in soon.
 
 3. Integration Testing / Samples
 
