feat(docs): refresh docs IA, discoverability metadata, and Lakeflow p… (#100)

rederik76 · web-flow · commit aee118dd6e51 · 2026-06-21T16:49:08.000+10:00
- Consolidate docs entry flow by replacing Introduction with what_is_lakeflow_framework and simplify index.rst.
- Strengthen docs guidance across concepts, patterns, and getting_started for operating models, modelling paradigms, and Gold-layer pattern selection.
- Add search-indexing metadata/artifacts for hosted docs (html_baseurl, canonical/OG/Twitter template usage, sitemap, robots).
- Align root README.md with current Lakeflow SDP positioning, quick-start flow, and repo badges.
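
For reviewers, the crawl-artifact and canonical-URL logic in this change can be sketched in isolation as below. This is a simplified, standalone approximation of what `docs/scripts/build_versioned_docs.py` and `docs/_templates/layout.html` do, not the committed code: the function names `canonical_url`, `build_sitemap`, and `build_robots` are hypothetical, and the dates in the usage note are made up for illustration.

```python
DOCS_BASEURL = "https://databricks-solutions.github.io/lakeflow_framework/"


def canonical_url(base_url: str, version: str, pagename: str) -> str:
    """Compose <base>/<version>/<page>.html, mirroring the template's _canonical."""
    return f"{base_url.rstrip('/')}/{version}/{pagename}.html"


def build_sitemap(base_url: str, entries: list[tuple[str, str]]) -> str:
    """Render a sitemap from (relative_path, lastmod) pairs.

    An empty lastmod string means the <lastmod> element is omitted.
    """
    root = base_url.rstrip("/")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for rel_path, lastmod in entries:
        lines.append("  <url>")
        lines.append(f"    <loc>{root}/{rel_path}</loc>")
        if lastmod:
            lines.append(f"    <lastmod>{lastmod}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def build_robots(base_url: str) -> str:
    """Render a permissive robots.txt that points crawlers at the sitemap."""
    return "\n".join(
        [
            "User-agent: *",
            "Allow: /",
            f"Sitemap: {base_url.rstrip('/')}/sitemap.xml",
            "",
        ]
    )
```

As a usage example, `build_sitemap(DOCS_BASEURL, [("", "2026-06-21"), ("current/index.html", "")])` yields one `<url>` entry for the site root with a `<lastmod>` and one for `current/index.html` without, and `canonical_url(DOCS_BASEURL, "current", "concepts")` gives the URL the template would emit for the Concepts page when `docs_current_version` falls back to `current`.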
diff --git a/README.md b/README.md
@@ -1,5 +1,10 @@
 # Databricks Lakeflow Framework
 
+[![Docs](https://img.shields.io/badge/docs-live-brightgreen)](https://databricks-solutions.github.io/lakeflow_framework/)
+[![Main Build](https://github.com/databricks-solutions/lakeflow_framework/actions/workflows/main-build.yml/badge.svg?branch=main&event=push)](https://github.com/databricks-solutions/lakeflow_framework/actions/workflows/main-build.yml?query=branch%3Amain)
+[![Release](https://img.shields.io/github/v/release/databricks-solutions/lakeflow_framework)](https://github.com/databricks-solutions/lakeflow_framework/releases)
+[![License](https://img.shields.io/badge/license-Databricks%20License-blue)](https://github.com/databricks-solutions/lakeflow_framework/blob/main/LICENSE.md)
+
 <!-- Top bar will be removed from PyPi packaged versions -->
 <!-- Dont remove: exclude package -->
 [Documentation](https://databricks-solutions.github.io/lakeflow_framework/) |
@@ -8,17 +13,69 @@
 
 ## Project Description
 
-The Lakeflow Framework is a meta-data driven framework designed to:
-- accelerate and simplify the deployment of Spark Declarative Pipelines, and support their deployment through your SDLC.
-- support a wide variety of patterns across the medallion architecture for both batch and streaming workloads.
+The Lakeflow Framework is a metadata-driven framework for building Databricks Lakeflow Spark Declarative Pipelines. It uses a configuration-driven, pattern-based approach to support both batch and streaming workloads across the medallion architecture.
+
+The framework supports centralized and domain-oriented operating models, and accommodates multiple modelling paradigms (including dimensional, Data Vault, and enterprise canonical models). It is designed for simplicity, performance, maintainability, and extensibility as the Databricks product evolves.
+
+## Why use Lakeflow Framework
+
+- Configuration-driven, pattern-based pipeline delivery with reusable implementation patterns
+- Support for batch and streaming pipelines across Bronze/Silver/Gold, aligned to your chosen modelling pattern
+- Flexible for centralized and domain-oriented operating models
+
+## Quick start
+
+```bash
+git clone https://github.com/databricks-solutions/lakeflow_framework.git
+cd lakeflow_framework
+pip install -r requirements-dev.txt
+```
+
+Then:
+
+1. Open the hosted docs: https://databricks-solutions.github.io/lakeflow_framework/
+2. Deploy the framework using the `Deploy Framework` guide
+3. Deploy samples from `samples/` using the documentation walkthroughs
+4. Build your first pipeline bundle using the `Build a Pipeline Bundle` guide
+
+## Prerequisites
+
+- Access to a Databricks workspace
+- Databricks CLI installed and configured
+- Python environment with project dependencies installed
+- Familiarity with Databricks Lakeflow Spark Declarative Pipelines concepts
 
-The Framework is designed for simplicity, performance and alignment to the Databricks Product Roadmap. The Framework is designed in such away to allow ease of maintenance and extensibility as the SDP product evolves.
+## Repository structure
+
+- `docs/` - Sphinx documentation and versioned docs build tooling
+- `samples/` - example framework and pipeline bundles
+- `src/` - framework source code and runtime components
+
+## Version compatibility
+
+This project tracks Databricks Lakeflow Spark Declarative Pipelines capabilities and evolves with platform changes. Validate runtime, feature, and API compatibility against your target Databricks workspace and the latest project documentation before production rollout.
+
+## Project status and support
+
+The framework is actively maintained. Databricks support does not cover this repository; help is provided on a best-effort basis through GitHub issues.
+
+## Releases and changelog
+
+- Releases: https://github.com/databricks-solutions/lakeflow_framework/releases
+- Tags: https://github.com/databricks-solutions/lakeflow_framework/tags
 
 ## Documentation
 
 Please refer to the [documentation](https://databricks-solutions.github.io/lakeflow_framework/) for further details and an explanation of the samples.
 The documentation needs to be deployed as HTML or Markdown within your org before it can be used.
 
+### Local docs development (optional)
+
+```bash
+pip install -r requirements-docs.txt
+make -C docs html
+```
+
 ## How to get help
 
 Databricks support doesn't cover this content. For questions or bugs, please open a GitHub issue and the team will help on a best effort basis.
diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-v0.17.0
+v0.17.1
diff --git a/docs/README.md b/docs/README.md
@@ -160,6 +160,11 @@ On push to `main` or release tags:
 5. Deploy with GitHub Pages actions.
 
 The deployed root redirects to `current/index.html`.
+Build output also includes:
+
+- `sitemap.xml` for search-engine discovery.
+- `robots.txt` referencing the sitemap.
+- Canonical URL metadata on pages, based on `html_baseurl` in `docs/conf.py`.
 
 ## Notes and Troubleshooting
 
diff --git a/docs/_templates/layout.html b/docs/_templates/layout.html
@@ -0,0 +1,18 @@
+{% extends "!layout.html" %}
+
+{% block extrahead %}
+  {{ super() }}
+  {%- set _base = (html_baseurl or '').rstrip('/') %}
+  {%- set _version = docs_current_version|default('current') %}
+  {%- set _page = pagename ~ '.html' %}
+  {%- set _canonical = _base ~ '/' ~ _version ~ '/' ~ _page %}
+  <link rel="canonical" href="{{ _canonical }}" />
+  <meta property="og:type" content="website" />
+  <meta property="og:site_name" content="Lakeflow Framework" />
+  <meta property="og:title" content="{{ title|striptags|e }}" />
+  <meta property="og:url" content="{{ _canonical }}" />
+  <meta property="og:description" content="Metadata-driven framework for Databricks Lakeflow Spark Declarative Pipelines." />
+  <meta name="twitter:card" content="summary" />
+  <meta name="twitter:title" content="{{ title|striptags|e }}" />
+  <meta name="twitter:description" content="Metadata-driven framework for Databricks Lakeflow Spark Declarative Pipelines." />
+{% endblock %}
diff --git a/docs/conf.py b/docs/conf.py
@@ -55,6 +55,7 @@
 }
 
 html_theme = "sphinx_rtd_theme"
+html_baseurl = "https://databricks-solutions.github.io/lakeflow_framework/"
 
 # Inject the version switcher into the sidebar.
 html_sidebars = {
@@ -175,5 +176,4 @@ def _current_version_meta(versions: list[dict[str, str]], current: str) -> dict[
 def setup(app):
     app.set_translator("markdown", CustomMarkdownTranslator)
     app.add_css_file('custom.css')
-    app.add_css_file('custom.css')
     #app.add_builder(MarkdownBuilder)
diff --git a/docs/decisions/0007-scripted-versioned-docs-and-ui-scope.md b/docs/decisions/0007-scripted-versioned-docs-and-ui-scope.md
@@ -18,8 +18,6 @@ We need stable, versioned Sphinx documentation published to GitHub Pages with:
 An earlier `sphinx-multiversion` approach proved brittle in this repository environment.  
 This branch moved to a script-driven build pipeline and then iterated on selector placement and styling.
 
-Existing prior ADR numbering already covers package/runtime architecture decisions. This ADR is limited to documentation build/deploy and docs UI scope.
-
 ## Decision
 
 ### 1) Use script-driven versioned docs builds (no `sphinx-multiversion`)
diff --git a/docs/scripts/build_versioned_docs.py b/docs/scripts/build_versioned_docs.py
@@ -18,6 +18,8 @@
 import sys
 from pathlib import Path
 
+DOCS_BASEURL = "https://databricks-solutions.github.io/lakeflow_framework/"
+
 
 def _run(command: list[str], *, cwd: Path | None = None, env: dict[str, str] | None = None) -> None:
     subprocess.run(command, cwd=cwd, env=env, check=True)
@@ -188,6 +190,35 @@ def main() -> None:
 """
     (output_root / "index.html").write_text(index_html, encoding="utf-8")
 
+    # Basic crawl artifacts for search engines.
+    sitemap_entries: list[tuple[str, str]] = [("", _release_date_for_ref(repo_root, "main"))]
+    for item in links:
+        sitemap_entries.append((f"{item['name']}/index.html", item.get("release_date", "")))
+
+    sitemap_xml = [
+        '<?xml version="1.0" encoding="UTF-8"?>',
+        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
+    ]
+    for rel_path, lastmod in sitemap_entries:
+        url = DOCS_BASEURL.rstrip("/") + "/" + rel_path
+        sitemap_xml.append("  <url>")
+        sitemap_xml.append(f"    <loc>{url}</loc>")
+        if lastmod:
+            sitemap_xml.append(f"    <lastmod>{lastmod}</lastmod>")
+        sitemap_xml.append("  </url>")
+    sitemap_xml.append("</urlset>")
+    (output_root / "sitemap.xml").write_text("\n".join(sitemap_xml) + "\n", encoding="utf-8")
+
+    robots_txt = "\n".join(
+        [
+            "User-agent: *",
+            "Allow: /",
+            f"Sitemap: {DOCS_BASEURL.rstrip('/')}/sitemap.xml",
+            "",
+        ]
+    )
+    (output_root / "robots.txt").write_text(robots_txt, encoding="utf-8")
+
     _safe_remove(worktrees_root)
     _run(["git", "worktree", "prune"], cwd=repo_root)
 
diff --git a/docs/source/concepts.rst b/docs/source/concepts.rst
@@ -1,7 +1,25 @@
 Framework Concepts
 ##################
 
-The purpose of the Framework is to provide a standard metadata driven approach to creating Databricks Spark Declarative Pipelines.
+The purpose of the Framework is to provide a standard metadata-driven approach to creating Databricks Lakeflow Spark Declarative Pipelines (SDP).
+
+Operating Models and Modelling Paradigms
+========================================
+
+The Lakeflow Framework is designed to support different organizational and data architecture approaches without changing the core framework model.
+
+**Operating models**
+
+* Centralized data engineering and platform teams
+* Federated domain-aligned teams, including data mesh and data product approaches
+* Hybrid operating models that combine centralized governance with domain ownership
+
+**Modelling paradigms**
+
+* Medallion architecture (Bronze, Silver, Gold)
+* Common modelling patterns, including dimensional (facts/dimensions), Data Vault, enterprise canonical/3NF, and other custom enterprise models
+
+These approaches can be implemented using the same Data Flow Spec-driven patterns, deployment process, and framework features documented in this guide.
 
 The below diagram illustrates some of the key concepts of the Framework, which are explained in more detail in the following sections.
 
diff --git a/docs/source/getting_started.rst b/docs/source/getting_started.rst
@@ -3,6 +3,8 @@ Getting Started
 
 The following section is a quick start guide on how to get started with the Lakeflow Framework as a data engineer.
 
+If you are evaluating fit first, start with :doc:`what_is_lakeflow_framework`.
+
 Prerequisites
 -------------
 
@@ -21,6 +23,9 @@ Follow the below steps to get yourself setup to learn and use the Lakeflow Frame
 
 Understanding the Framework
 ---------------------------
+
+The framework supports both centralized and domain-oriented delivery models; use :doc:`concepts` for operating model guidance and :doc:`patterns` to select the right implementation pattern for your modelling approach.
+
 1. :doc:`concepts`
 2. Step through the ``feature-samples`` bundle — run the ``feature_samples_run_job`` and inspect the resulting tables in the ``{namespace}_feature`` schema. This is the simplest entry point as all features share a single schema.
 3. :doc:`features`
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -6,22 +6,15 @@
 Lakeflow Framework documentation
 =================================
 
-The Lakeflow Framework is a metadata-driven data engineering framework built for Databricks. It accelerates and simplifies the deployment of Spark Declarative Pipelines (SDP) while supporting your entire software development life cycle.
+The Lakeflow Framework is a metadata-driven data engineering framework built for Databricks Lakeflow Spark Declarative Pipelines (SDP). It accelerates and simplifies deployment through a configuration-driven, pattern-based approach that supports both centralized and domain-oriented operating models across the medallion architecture.
 
-**Key Capabilities:**
-
-* Build robust data pipelines using a configuration-driven, Lego-block approach
-* Support batch and streaming workloads across the medallion architecture (Bronze, Silver, Gold)
-* Deploy seamlessly with Declarative Automation Bundles (DABS)—no wheel files or control tables required
-* Extend and maintain easily as your data platform evolves
-
-This documentation covers everything from getting started to advanced orchestration patterns. Explore the sections below to begin building reliable, maintainable data pipelines.
+Start with :doc:`what_is_lakeflow_framework` for the full overview, then use the sections below to get hands-on.
 
 .. toctree::
    :maxdepth: 4
    :caption: Contents:
    
-   Introduction <introduction>
+   what_is_lakeflow_framework
    getting_started
    Concepts <concepts> 
    Features <features>
diff --git a/docs/source/introduction.rst b/docs/source/introduction.rst
diff --git a/docs/source/patterns.rst b/docs/source/patterns.rst
@@ -6,7 +6,25 @@ Data Flow and Pipeline Patterns
 Patterns Overview 
 =================
 
-Below we summarize the core patterns that can be used to design and build out your data flows and pipelines.
+Below we summarize a set of core reference patterns used to design and build out data flows and pipelines. These are not the only patterns supported by the framework.
+
+With the exception of the Basic 1:1 pattern, the examples on this page are primarily aimed at more complex streaming requirements (for example, multi-source joins, CDC-driven updates, and mixed stream/static topologies).
+
+For the underlying Lakeflow Spark Declarative Pipelines concepts (datasets, flows, and pipeline semantics), refer to the Databricks documentation: `What is Lakeflow Spark Declarative Pipelines - Key concepts <https://docs.databricks.com/aws/en/ldp/concepts/#key-concepts>`_.
+
+For Gold-layer workloads, Materialized Views should generally be the first choice for dimensional modelling, batch processing, and aggregation-centric serving tables. Prefer streaming-first Gold patterns when lower-latency or less-aggregated use cases are required.
+
+Choosing patterns by operating model
+------------------------------------
+
+These patterns can be applied across centralized platform teams, domain-aligned ownership models (including data mesh/data products), and hybrid approaches.
+
+When selecting a pattern, start with:
+
+* Your ownership model (centralized, domain-aligned, or hybrid)
+* The target modelling approach (medallion, dimensional, Data Vault, enterprise canonical/3NF, or other enterprise models)
+* Source characteristics (streaming, static, CDC, and key alignment)
+* Latency and change-propagation requirements for downstream consumers
 
 .. important::
 
diff --git a/docs/source/spelling_wordlist.txt b/docs/source/spelling_wordlist.txt
@@ -113,3 +113,7 @@ walkthrough
 yapf
 yaml
 yml
+centric
+modelling
+roadmap
+topologies
diff --git a/docs/source/what_is_lakeflow_framework.rst b/docs/source/what_is_lakeflow_framework.rst
@@ -0,0 +1,61 @@
+What is the Lakeflow Framework?
+===============================
+
+The Lakeflow Framework is a metadata-driven framework for building Databricks Lakeflow Spark Declarative Pipelines. It uses a configuration-driven, pattern-based approach to support both batch and streaming workloads across the medallion architecture. Pipelines are deployed with Declarative Automation Bundles, keeping delivery consistent across environments. The framework is designed for simplicity, extensibility, and long-term alignment with the Databricks product roadmap.
+
+Why teams use it
+----------------
+
+* Faster delivery through reusable pipeline patterns
+* Consistent configuration model across environments
+* Native alignment with Declarative Automation Bundles (DABs)
+* Simplicity, extensibility, and long-term alignment with the Databricks product roadmap
+* Lower maintenance overhead as platform features evolve
+
+Core outcomes
+-------------
+
+* Build and deploy reliable Databricks Lakeflow Spark Declarative Pipelines
+* Support Bronze/Silver/Gold medallion workloads
+* Support centralized and domain-oriented operating models, including data mesh and data product approaches
+* Accommodate multiple modelling paradigms, including dimensional, Data Vault, and enterprise canonical models
+* Keep implementation extensible without heavy custom scaffolding
+
+Core concepts
+-------------
+
+* **Pattern-based pipeline design**: reusable building blocks standardize implementation and reduce duplication.
+* **Two-layer architecture**:
+
+  * SDP wrapper components expose Spark Declarative Pipelines APIs directly, keeping behavior explicit and close to the platform.
+  * The Data Flow Spec abstraction layer composes those components into consistent, configuration-driven pipeline definitions.
+
+* **Deployment and operations principles**:
+
+  * DABs-native deployment model
+  * No artifacts or wheel files required
+  * Minimal third-party dependencies
+  * No control tables
+  * Extensible framework structure
+  * Flexible bundle-based delivery across environments
+
+Medallion pipeline patterns
+---------------------------
+
+The Lakeflow Framework supports common Databricks medallion architecture patterns for both batch and streaming workloads:
+
+* Bronze ingestion pipelines for raw landing
+* Silver refinement pipelines for modelling, quality, and conformance
+* Gold serving pipelines for consumption-ready datasets
+* Mixed static/streaming and CDC-oriented topologies
+
+The framework composes Spark Declarative Pipelines into repeatable, configuration-driven patterns while keeping implementation behavior explicit and maintainable.
+
+Next steps
+----------
+
+* Start with :doc:`getting_started`
+* Review architecture in :doc:`concepts`
+* Explore practical implementations in :doc:`patterns`
+* Review spec-level options in :doc:`dataflow_spec_reference`
+* Build and deploy your first bundle in :doc:`build_pipeline_bundle`