
Commit aee118d

feat(docs): refresh docs IA, discoverability metadata, and Lakeflow p… (#100)
- Consolidate docs entry flow by replacing Introduction with what_is_lakeflow_framework and simplifying index.rst.
- Strengthen docs guidance across concepts, patterns, and getting_started for operating models, modelling paradigms, and Gold-layer pattern selection.
- Add search-indexing metadata/artifacts for hosted docs (html_baseurl, canonical/OG/Twitter template usage, sitemap, robots).
- Align root README.md with current Lakeflow SDP positioning, quick-start flow, and repo badges.
1 parent 4ba3ab2 commit aee118d

14 files changed

Lines changed: 228 additions & 57 deletions

README.md

Lines changed: 61 additions & 4 deletions
```diff
@@ -1,5 +1,10 @@
 # Databricks Lakeflow Framework
 
+[![Docs](https://img.shields.io/badge/docs-live-brightgreen)](https://databricks-solutions.github.io/lakeflow_framework/)
+[![Main Build](https://github.com/databricks-solutions/lakeflow_framework/actions/workflows/main-build.yml/badge.svg?branch=main&event=push)](https://github.com/databricks-solutions/lakeflow_framework/actions/workflows/main-build.yml?query=branch%3Amain)
+[![Release](https://img.shields.io/github/v/release/databricks-solutions/lakeflow_framework)](https://github.com/databricks-solutions/lakeflow_framework/releases)
+[![License](https://img.shields.io/badge/license-Databricks%20License-blue)](https://github.com/databricks-solutions/lakeflow_framework/blob/main/LICENSE.md)
+
 <!-- Top bar will be removed from PyPi packaged versions -->
 <!-- Dont remove: exclude package -->
 [Documentation](https://databricks-solutions.github.io/lakeflow_framework/) |
@@ -8,17 +13,69 @@
 
 ## Project Description
 
-The Lakeflow Framework is a meta-data driven framework designed to:
-- accelerate and simplify the deployment of Spark Declarative Pipelines, and support their deployment through your SDLC.
-- support a wide variety of patterns across the medallion architecture for both batch and streaming workloads.
+The Lakeflow Framework is a metadata-driven framework for building Databricks Lakeflow Spark Declarative Pipelines. It uses a configuration-driven, pattern-based approach to support both batch and streaming workloads across the medallion architecture.
+
+The framework supports centralized and domain-oriented operating models, and accommodates multiple modelling paradigms (including dimensional, Data Vault, and enterprise canonical models). It is designed for simplicity, performance, maintainability, and extensibility as the Databricks product evolves.
+
+## Why use Lakeflow Framework
+
+- Configuration-driven pattern based pipeline delivery with reusable implementation patterns
+- Support for batch and streaming pipelines across Bronze/Silver/Gold, aligned to your chosen modelling pattern
+- Flexible for centralized and domain-oriented operating models
+
+## Quick start
+
+```bash
+git clone https://github.com/databricks-solutions/lakeflow_framework.git
+cd lakeflow_framework
+pip install -r requirements-dev.txt
+```
+
+Then:
+
+1. Open the hosted docs: https://databricks-solutions.github.io/lakeflow_framework/
+2. Deploy the framework using the `Deploy Framework` guide
+3. Deploy samples from `samples/` using the documentation walkthroughs
+4. Build your first pipeline bundle using the `Build a Pipeline Bundle` guide
+
+## Prerequisites
+
+- Access to a Databricks workspace
+- Databricks CLI installed and configured
+- Python environment with project dependencies installed
+- Familiarity with Databricks Lakeflow Spark Declarative Pipelines concepts
 
-The Framework is designed for simplicity, performance and alignment to the Databricks Product Roadmap. The Framework is designed in such away to allow ease of maintenance and extensibility as the SDP product evolves.
+## Repository structure
+
+- `docs/` - Sphinx documentation and versioned docs build tooling
+- `samples/` - example framework and pipeline bundles
+- `src/` - framework source code and runtime components
+
+## Version compatibility
+
+This project tracks Databricks Lakeflow Spark Declarative Pipelines capabilities and evolves with platform changes. Validate runtime, feature, and API compatibility against your target Databricks workspace and the latest project documentation before production rollout.
+
+## Project status and support
+
+The framework is actively maintained. Databricks support does not cover this repository; issue support is best effort through GitHub issues.
+
+## Releases and changelog
+
+- Releases: https://github.com/databricks-solutions/lakeflow_framework/releases
+- Tags: https://github.com/databricks-solutions/lakeflow_framework/tags
 
 ## Documentation
 
 Please refer to the [documentation](https://databricks-solutions.github.io/lakeflow_framework/) for further details and an explanation of the samples.
 The documentation needs to be deployed as HTML or Markdown within your org before it can be used.
 
+### Local docs development (optional)
+
+```bash
+pip install -r requirements-docs.txt
+make -C docs html
+```
+
 ## How to get help
 
 Databricks support doesn't cover this content. For questions or bugs, please open a GitHub issue and the team will help on a best effort basis.
```

VERSION

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-v0.17.0
+v0.17.1
```

docs/README.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -160,6 +160,11 @@ On push to `main` or release tags:
 5. Deploy with GitHub Pages actions.
 
 The deployed root redirects to `current/index.html`.
+Build output also includes:
+
+- `sitemap.xml` for search-engine discovery.
+- `robots.txt` referencing the sitemap.
+- Canonical URL metadata on pages, based on `html_baseurl` in `docs/conf.py`.
 
 ## Notes and Troubleshooting
 
```
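The `robots.txt` described above can be sanity-checked with Python's standard `urllib.robotparser`. The sketch below rebuilds the same text that `docs/scripts/build_versioned_docs.py` writes (the `DOCS_BASEURL` value is copied from that script) and confirms that a crawler sees the sitemap URL and may fetch a docs page. The `current/index.html` path and the `Googlebot` user agent are illustrative choices, not something the build itself checks; `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Value copied from DOCS_BASEURL in docs/scripts/build_versioned_docs.py.
DOCS_BASEURL = "https://databricks-solutions.github.io/lakeflow_framework/"

# Same lines the build script joins and writes to robots.txt.
robots_txt = "\n".join(
    [
        "User-agent: *",
        "Allow: /",
        f"Sitemap: {DOCS_BASEURL.rstrip('/')}/sitemap.xml",
        "",
    ]
)

parser = RobotFileParser()
# parse() accepts the file's lines and also marks the rules as loaded.
parser.parse(robots_txt.splitlines())

# site_maps() returns the Sitemap URLs declared in the file.
print(parser.site_maps())
# -> ['https://databricks-solutions.github.io/lakeflow_framework/sitemap.xml']

# The wildcard "Allow: /" rule lets any crawler fetch docs pages.
print(parser.can_fetch("Googlebot", DOCS_BASEURL + "current/index.html"))
# -> True
```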

docs/_templates/layout.html

Lines changed: 18 additions & 0 deletions
```diff
@@ -0,0 +1,18 @@
+{% extends "!layout.html" %}
+
+{% block extrahead %}
+{{ super() }}
+{%- set _base = (html_baseurl or '').rstrip('/') %}
+{%- set _version = docs_current_version|default('current') %}
+{%- set _page = pagename ~ '.html' %}
+{%- set _canonical = _base ~ '/' ~ _version ~ '/' ~ _page %}
+<link rel="canonical" href="{{ _canonical }}" />
+<meta property="og:type" content="website" />
+<meta property="og:site_name" content="Lakeflow Framework" />
+<meta property="og:title" content="{{ title|striptags|e }}" />
+<meta property="og:url" content="{{ _canonical }}" />
+<meta property="og:description" content="Metadata-driven framework for Databricks Lakeflow Spark Declarative Pipelines." />
+<meta name="twitter:card" content="summary" />
+<meta name="twitter:title" content="{{ title|striptags|e }}" />
+<meta name="twitter:description" content="Metadata-driven framework for Databricks Lakeflow Spark Declarative Pipelines." />
+{% endblock %}
```
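A minimal Python sketch of the canonical-URL logic in this template can help when reasoning about what each page will emit. `canonical_url` is a hypothetical helper, not part of the repository, and it assumes Jinja's `default` filter semantics: the `'current'` fallback applies only when `docs_current_version` is undefined, which `None` stands in for here. The version name `v0.17.1` is an assumed example; the real directory names come from the versioned build script.

```python
from __future__ import annotations


def canonical_url(html_baseurl: str | None, pagename: str, version: str | None = None) -> str:
    """Hypothetical Python mirror of the canonical-URL logic in docs/_templates/layout.html."""
    # (html_baseurl or '').rstrip('/'): tolerate a missing base URL and a trailing slash.
    base = (html_baseurl or "").rstrip("/")
    # docs_current_version|default('current'): None stands in for an undefined variable.
    if version is None:
        version = "current"
    # pagename ~ '.html': Sphinx page names carry no extension, e.g. "index" or "concepts".
    return base + "/" + version + "/" + pagename + ".html"


BASE = "https://databricks-solutions.github.io/lakeflow_framework/"
print(canonical_url(BASE, "index"))
# -> https://databricks-solutions.github.io/lakeflow_framework/current/index.html
print(canonical_url(BASE, "concepts", "v0.17.1"))
# -> https://databricks-solutions.github.io/lakeflow_framework/v0.17.1/concepts.html
print(canonical_url(None, "index"))
# -> /current/index.html
```

With an empty `html_baseurl`, the template falls back to a root-relative path such as `/current/index.html`; setting `html_baseurl` in `docs/conf.py`, as this commit does, is what makes the canonical and `og:url` values absolute.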

docs/conf.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -55,6 +55,7 @@
 }
 
 html_theme = "sphinx_rtd_theme"
+html_baseurl = "https://databricks-solutions.github.io/lakeflow_framework/"
 
 # Inject the version switcher into the sidebar.
 html_sidebars = {
@@ -175,5 +176,4 @@ def _current_version_meta(versions: list[dict[str, str]], current: str) -> dict[
 def setup(app):
     app.set_translator("markdown", CustomMarkdownTranslator)
     app.add_css_file('custom.css')
-    app.add_css_file('custom.css')
     #app.add_builder(MarkdownBuilder)
```

docs/decisions/0007-scripted-versioned-docs-and-ui-scope.md

Lines changed: 0 additions & 2 deletions
```diff
@@ -18,8 +18,6 @@ We need stable, versioned Sphinx documentation published to GitHub Pages with:
 An earlier `sphinx-multiversion` approach proved brittle in this repository environment.
 This branch moved to a script-driven build pipeline and then iterated on selector placement and styling.
 
-Existing prior ADR numbering already covers package/runtime architecture decisions. This ADR is limited to documentation build/deploy and docs UI scope.
-
 ## Decision
 
 ### 1) Use script-driven versioned docs builds (no `sphinx-multiversion`)
```

docs/scripts/build_versioned_docs.py

Lines changed: 31 additions & 0 deletions
```diff
@@ -18,6 +18,8 @@
 import sys
 from pathlib import Path
 
+DOCS_BASEURL = "https://databricks-solutions.github.io/lakeflow_framework/"
+
 
 def _run(command: list[str], *, cwd: Path | None = None, env: dict[str, str] | None = None) -> None:
     subprocess.run(command, cwd=cwd, env=env, check=True)
@@ -188,6 +190,35 @@ def main() -> None:
     """
     (output_root / "index.html").write_text(index_html, encoding="utf-8")
 
+    # Basic crawl artifacts for search engines.
+    sitemap_entries: list[tuple[str, str]] = [("", _release_date_for_ref(repo_root, "main"))]
+    for item in links:
+        sitemap_entries.append((f"{item['name']}/index.html", item.get("release_date", "")))
+
+    sitemap_xml = [
+        '<?xml version="1.0" encoding="UTF-8"?>',
+        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
+    ]
+    for rel_path, lastmod in sitemap_entries:
+        url = DOCS_BASEURL.rstrip("/") + "/" + rel_path
+        sitemap_xml.append(" <url>")
+        sitemap_xml.append(f" <loc>{url}</loc>")
+        if lastmod:
+            sitemap_xml.append(f" <lastmod>{lastmod}</lastmod>")
+        sitemap_xml.append(" </url>")
+    sitemap_xml.append("</urlset>")
+    (output_root / "sitemap.xml").write_text("\n".join(sitemap_xml) + "\n", encoding="utf-8")
+
+    robots_txt = "\n".join(
+        [
+            "User-agent: *",
+            "Allow: /",
+            f"Sitemap: {DOCS_BASEURL.rstrip('/')}/sitemap.xml",
+            "",
+        ]
+    )
+    (output_root / "robots.txt").write_text(robots_txt, encoding="utf-8")
+
     _safe_remove(worktrees_root)
     _run(["git", "worktree", "prune"], cwd=repo_root)
 
```
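The build script assembles `sitemap.xml` by string concatenation, which works for simple version names but does not XML-escape characters such as `&` in a path. Below is a sketch of the same output built with the standard library's `xml.etree.ElementTree` instead. `build_sitemap` is a hypothetical helper, the `(relative path, lastmod)` tuple shape is taken from `sitemap_entries` in the diff, and the date and version name are placeholders rather than real release data. `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

DOCS_BASEURL = "https://databricks-solutions.github.io/lakeflow_framework/"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[tuple[str, str]], base_url: str = DOCS_BASEURL) -> str:
    """Return sitemap XML for (relative path, lastmod) pairs; an empty lastmod is omitted."""
    # xmlns set as a plain attribute, so child tags serialize without a prefix.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for rel_path, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text content when serializing.
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + rel_path
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    ET.indent(urlset, space="  ")  # pretty-print in place (Python 3.9+)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


# Placeholder entries: the root with a made-up date, plus one versioned index without a date.
xml_text = build_sitemap([("", "2025-01-01"), ("v0.17.1/index.html", "")])
print(xml_text)
```

Parsing the result back with `ET.fromstring` is a cheap way to confirm the file is well-formed before it is published.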

docs/source/concepts.rst

Lines changed: 19 additions & 1 deletion
```diff
@@ -1,7 +1,25 @@
 Framework Concepts
 ##################
 
-The purpose of the Framework is to provide a standard metadata driven approach to creating Databricks Spark Declarative Pipelines.
+The purpose of the Framework is to provide a standard metadata driven approach to creating Databricks Lakeflow Spark Declarative (SDP) Pipelines.
+
+Operating Models and Modeling Paradigms
+=======================================
+
+The Lakeflow Framework is designed to support different organizational and data architecture approaches without changing the core framework model.
+
+**Operating models**
+
+* Centralized data engineering and platform teams
+* Federated domain-aligned teams, including data mesh and data product approaches
+* Hybrid operating models that combine centralized governance with domain ownership
+
+**Modelling paradigms**
+
+* Medallion architecture (Bronze, Silver, Gold)
+* Common modelling patterns, including dimensional (facts/dimensions), Data Vault, enterprise canonical/3NF, and other custom enterprise models
+
+These approaches can be implemented using the same Data Flow Spec-driven patterns, deployment process, and framework features documented in this guide.
 
 The below diagram illustrates some of the key concepts of the Framework, which are explained in more detail in the following sections.
 
```

docs/source/getting_started.rst

Lines changed: 5 additions & 0 deletions
```diff
@@ -3,6 +3,8 @@ Getting Started
 
 The following section is a quick start guide on how to get started with the Lakeflow Framework as a data engineer.
 
+If you are evaluating fit first, start with :doc:`what_is_lakeflow_framework`.
+
 Prerequisites
 -------------
 
@@ -21,6 +23,9 @@ Follow the below steps to get yourself setup to learn and use the Lakeflow Frame
 
 Understanding the Framework
 ---------------------------
+
+The framework supports both centralized and domain-oriented delivery models; use :doc:`concepts` for operating model guidance and :doc:`patterns` to select the right implementation pattern for your modelling approach.
+
 1. :doc:`concepts`
 2. Step through the ``feature-samples`` bundle — run the ``feature_samples_run_job`` and inspect the resulting tables in the ``{namespace}_feature`` schema. This is the simplest entry point as all features share a single schema.
 3. :doc:`features`
```

docs/source/index.rst

Lines changed: 3 additions & 10 deletions
```diff
@@ -6,22 +6,15 @@
 Lakeflow Framework documentation
 =================================
 
-The Lakeflow Framework is a metadata-driven data engineering framework built for Databricks. It accelerates and simplifies the deployment of Spark Declarative Pipelines (SDP) while supporting your entire software development life cycle.
+The Lakeflow Framework is a metadata-driven data engineering framework built for Databricks Lakeflow Spark Declarative Pipelines (SDP). It accelerates and simplifies deployment through a configuration-driven, pattern-based approach that supports both centralized and domain-oriented operating models across the medallion architecture.
 
-**Key Capabilities:**
-
-* Build robust data pipelines using a configuration-driven, Lego-block approach
-* Support batch and streaming workloads across the medallion architecture (Bronze, Silver, Gold)
-* Deploy seamlessly with Declarative Automation Bundles (DABS)—no wheel files or control tables required
-* Extend and maintain easily as your data platform evolves
-
-This documentation covers everything from getting started to advanced orchestration patterns. Explore the sections below to begin building reliable, maintainable data pipelines.
+Start with :doc:`what_is_lakeflow_framework` for the full overview, then use the sections below to get hands-on.
 
 .. toctree::
    :maxdepth: 4
    :caption: Contents:
 
-   Introduction <introduction>
+   what_is_lakeflow_framework
    getting_started
    Concepts <concepts>
    Features <features>
```
