2 changes: 1 addition & 1 deletion docs/source/feature_builder_parallelization.rst
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@ Configuration

Global Configuration
~~~~~~~~~~~~~~~~~~~~~
Configure these parameters globally for all pipelines in your ``src/config/global.json|yaml`` file:
Configure these parameters globally for all pipelines in your ``src/config/default/global.json|yaml`` file:

.. tabs::

69 changes: 69 additions & 0 deletions docs/source/feature_framework_configuration.rst
@@ -0,0 +1,69 @@
Framework configuration
=======================

.. list-table::
:header-rows: 0

* - **Applies To:**
- :bdg-info:`Framework Bundle`
* - **Configuration Scope:**
- :bdg-info:`Global`
* - **Databricks Docs:**
- NA

Framework-level settings (global JSON/YAML, substitutions, secrets, spec mappings, operational metadata) live under **one** active directory. The framework chooses between a **default** tree and an optional **override** tree; everything else reads paths relative to that choice.

Configuration
-------------

| **Scope: Global (framework bundle)**
| **Default:** ``./config/default/`` (for example ``src/config/default/`` when the framework root is ``src``).
| **Override:** ``./config/override/`` (for example ``src/config/override/``). Optional; see **Override** below.

Under the active directory you normally have:

* exactly one global file: ``global.json``, ``global.yaml``, or ``global.yml``
* a ``dataflow_spec_mapping/`` directory (see :doc:`feature_versioning_dataflow_spec`)
* optional per-target substitution and secrets files (see :doc:`feature_substitutions`, :doc:`feature_secrets`)
* optional ``operational_metadata_<layer>.json`` (see :doc:`feature_operational_metadata`)

Mandatory
---------

* **Global file:** exactly one of ``global.json``, ``global.yaml``, ``global.yml``. More than one is an error.
* **Mappings:** the ``dataflow_spec_mapping/`` directory must exist.
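The "exactly one global file" rule can be sketched as below. This is a minimal illustration, not the framework's own loader: ``find_global_config`` is a hypothetical helper and the exception types are assumptions. Only the three file names come from the list above.

```python
import os
import tempfile

# Basenames taken from the rule above.
GLOBAL_CONFIG_NAMES = ("global.json", "global.yaml", "global.yml")


def find_global_config(config_root: str) -> str:
    """Hypothetical helper: return the single global config file under config_root."""
    found = [
        os.path.join(config_root, name)
        for name in GLOBAL_CONFIG_NAMES
        if os.path.isfile(os.path.join(config_root, name))
    ]
    if not found:
        raise FileNotFoundError(f"No global config file found under {config_root}")
    if len(found) > 1:
        # Assumed error type; the text only states that more than one is an error.
        raise ValueError(f"More than one global config file found: {found}")
    return found[0]


def demo() -> str:
    """Create a throwaway config root holding one global file and resolve it."""
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "global.yaml"), "w").close()
        return os.path.basename(find_global_config(root))
```

Calling ``demo()`` resolves ``global.yaml``; adding a second global file to the same directory would raise instead.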

Optional
--------

Inside the global file, all top-level keys are optional. Common ones:

.. list-table::
:header-rows: 1
:widths: 30 70

* - Key
- See
* - ``pipeline_bundle_spec_format``
- :doc:`feature_spec_format`
* - ``mandatory_table_properties``
- :doc:`feature_mandatory_table_properties`
* - ``spark_config``
- :doc:`feature_spark_configuration`
* - ``table_migration_state_volume_path``
- :doc:`feature_table_migration`
* - ``dataflow_spec_version``
- :doc:`feature_versioning_dataflow_spec`
* - ``override_max_workers`` / ``pipeline_builder_disable_threading``
- :doc:`feature_builder_parallelization`

Override
--------

* If ``./config/override/`` has **no** non-hidden files (only names starting with ``.``, such as ``.gitkeep``), the framework uses ``./config/default/``.
* If it has **any** non-hidden file or folder, the framework uses ``./config/override/`` instead—but then that directory must already contain **both** a valid global file and a ``dataflow_spec_mapping/`` directory. Otherwise startup fails with a message to copy the full layout from ``./config/default/``.
* If **neither** directory has non-hidden content, startup fails: add configuration under ``./config/default/``.

.. tip::

Leave ``config/override`` empty until you can mirror the whole ``config/default`` tree.
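The three Override rules can be walked through with a small sketch. It is illustrative only: ``choose_config_root`` and ``has_visible_children`` are hypothetical names, the error messages are shortened, and for simplicity the sketch returns the full directory path of the chosen tree.

```python
import os
import tempfile

GLOBAL_CONFIG_NAMES = ("global.json", "global.yaml", "global.yml")
MAPPING_DIR = "dataflow_spec_mapping"


def has_visible_children(directory: str) -> bool:
    """True if directory exists and holds at least one name not starting with '.'."""
    if not os.path.isdir(directory):
        return False
    return any(not name.startswith(".") for name in os.listdir(directory))


def choose_config_root(framework_root: str) -> str:
    """Illustrative only: apply the three Override rules and return the active directory."""
    default_dir = os.path.join(framework_root, "config", "default")
    override_dir = os.path.join(framework_root, "config", "override")

    if not has_visible_children(override_dir):
        if not has_visible_children(default_dir):
            raise FileNotFoundError("add configuration under ./config/default/")
        return default_dir

    has_global = any(
        os.path.isfile(os.path.join(override_dir, name)) for name in GLOBAL_CONFIG_NAMES
    )
    has_mapping = os.path.isdir(os.path.join(override_dir, MAPPING_DIR))
    if not (has_global and has_mapping):
        raise FileNotFoundError("copy the full layout from ./config/default/")
    return override_dir


def demo() -> list:
    """Walk through the rules on a temporary framework root."""
    results = []
    with tempfile.TemporaryDirectory() as root:
        default_dir = os.path.join(root, "config", "default")
        override_dir = os.path.join(root, "config", "override")
        os.makedirs(os.path.join(default_dir, MAPPING_DIR))
        os.makedirs(override_dir)
        open(os.path.join(default_dir, "global.json"), "w").close()
        open(os.path.join(override_dir, ".gitkeep"), "w").close()

        # Only a hidden file in override: the default tree is used.
        results.append(os.path.basename(choose_config_root(root)))

        # A lone visible file makes override active but incomplete.
        open(os.path.join(override_dir, "global.yaml"), "w").close()
        try:
            choose_config_root(root)
        except FileNotFoundError:
            results.append("error")

        # Completing the layout makes override the active directory.
        os.makedirs(os.path.join(override_dir, MAPPING_DIR))
        results.append(os.path.basename(choose_config_root(root)))
    return results
```

The middle step is the case the tip warns about: a single visible file in ``config/override`` is enough to switch trees, so a partial copy fails at startup.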
2 changes: 1 addition & 1 deletion docs/source/feature_mandatory_table_properties.rst
Expand Up @@ -17,7 +17,7 @@ Configuration
-------------

| **Scope: Global**
| Mandatory table properties are defined in the global configuration file located at ``src/config/global.json|yaml`` under the ``mandatory_table_properties`` section.
| Mandatory table properties are defined in the global configuration file located at ``src/config/default/global.json|yaml`` under the ``mandatory_table_properties`` section.

Configuration Schema
--------------------
2 changes: 1 addition & 1 deletion docs/source/feature_operational_metadata.rst
Expand Up @@ -29,7 +29,7 @@ Configuration
-------------

| **Scope: Global**
| In the Framework bundle, operational metadata columns are defined in JSON configuration files at Lakehouse layer level (e.g. bronze, silver, gold). The configuration files are locate at and must be named as follows: ``src/config/operational_metadata_<layer>.json``
| In the Framework bundle, operational metadata columns are defined in JSON configuration files at Lakehouse layer level (e.g. bronze, silver, gold). The configuration files are located at, and must be named, as follows: ``src/config/default/operational_metadata_<layer>.json``

.. admonition:: Layer Config
:class: note
2 changes: 1 addition & 1 deletion docs/source/feature_spark_configuration.rst
Expand Up @@ -17,7 +17,7 @@ Configuration
-------------

| **Scope: Global**
| In the Framework bundle, Spark configurations are defined in the global configuration file located at: ``src/config/global.json|yaml`` under the ``spark_config`` section.
| In the Framework bundle, Spark configurations are defined in the global configuration file located at: ``src/config/default/global.json|yaml`` under the ``spark_config`` section.

| **Scope: Bundle**
| In a Pipeline bundle, Spark configurations are defined in the global configuration file located at: ``src/pipeline_configs/global.json|yaml`` under the ``spark_config`` section.
8 changes: 4 additions & 4 deletions docs/source/feature_spec_format.rst
Expand Up @@ -40,7 +40,7 @@ Framework-Level Configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

| **Scope: Framework**
| The global specification format is defined in the Framework's global configuration file: ``src/config/global.json|yaml``
| The global specification format is defined in the Framework's global configuration file: ``src/config/default/global.json|yaml``

.. tabs::

Expand Down Expand Up @@ -235,7 +235,7 @@ Configuration Examples
Example 1: Framework Enforces JSON Format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Framework Configuration** (``src/config/global.json|yaml``):
**Framework Configuration** (``src/config/default/global.json|yaml``):

.. tabs::

Expand Down Expand Up @@ -263,7 +263,7 @@ Example 1: Framework Enforces JSON Format
Example 2: Framework Allows Format Flexibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Framework Configuration** (``src/config/global.json|yaml``):
**Framework Configuration** (``src/config/default/global.json|yaml``):

.. tabs::

Expand Down Expand Up @@ -312,7 +312,7 @@ Example 2: Framework Allows Format Flexibility
Example 3: Framework Defaults to YAML
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Framework Configuration** (``src/config/global.json|yaml``):
**Framework Configuration** (``src/config/default/global.json|yaml``):

.. tabs::

4 changes: 2 additions & 2 deletions docs/source/feature_substitutions.rst
Expand Up @@ -35,8 +35,8 @@ Configuration
-------------

| **Scope: Global**
| In the Framework bundle, substitutions are defined in the following configuration file: ``src/config/<deployment environment/target>_substitutions.json|yaml``
| e.g. ``src/config/dev_substitutions.json|yaml``
| In the Framework bundle, substitutions are defined in the following configuration file: ``src/config/default/<deployment environment/target>_substitutions.json|yaml``
| e.g. ``src/config/default/dev_substitutions.json|yaml``

| **Scope: Pipeline**
| In a Pipeline bundle, substitutions are defined in the following configuration file: ``src/pipeline_configs/<deployment environment/target>_substitutions.json|yaml``
2 changes: 1 addition & 1 deletion docs/source/feature_table_migration.rst
Expand Up @@ -40,7 +40,7 @@ Set as an attribute when creating your Data Flow Spec, refer to the :doc:`datafl

**Required Global Configuration**

When table migration is enabled, you must specify the volume path for checkpoint state storage in your ``global.json|yaml`` configuration file at either the framework level (``src/config/global.json|yaml``) or pipeline bundle level (``src/pipeline_configs/global.json|yaml``):
When table migration is enabled, you must specify the volume path for checkpoint state storage in your ``global.json|yaml`` configuration file at either the framework level (``src/config/default/global.json|yaml``) or pipeline bundle level (``src/pipeline_configs/global.json|yaml``):

.. tabs::

4 changes: 2 additions & 2 deletions docs/source/feature_versioning_dataflow_spec.rst
Expand Up @@ -27,7 +27,7 @@ The versioning system applies transformation mappings that can rename fields, mo
Mapping File Structure
----------------------
DataFlow specification mappings are stored in version-specific directories under:
``src/config/dataflow_spec_mapping/[version]/dataflow_spec_mapping.json``
``src/config/default/dataflow_spec_mapping/[version]/dataflow_spec_mapping.json``

Each mapping file contains transformation rules organized by:

Expand Down Expand Up @@ -242,7 +242,7 @@ Best Practices
Version Management
------------------
1. Mapping versions should follow semantic versioning (MAJOR.MINOR.PATCH)
2. Each mapping version should be stored in its own directory under ``src/config/dataflow_spec_mapping/``
2. Each mapping version should be stored in its own directory under ``src/config/default/dataflow_spec_mapping/``
3. Maintain documentation of what each version transforms and why
4. Keep mapping files immutable once deployed to ensure consistency
5. Create new mapping versions rather than modifying existing ones
1 change: 1 addition & 0 deletions docs/source/features.rst
Expand Up @@ -11,6 +11,7 @@ Framework Features
feature_data_quality_expectations
feature_data_quality_quarantine
feature_direct_publishing_mode
feature_framework_configuration
feature_liquid_clustering
feature_logging
feature_logical_environment
2 changes: 1 addition & 1 deletion scripts/README.md
Expand Up @@ -33,7 +33,7 @@ python scripts/validate_dataflows.py -v

**Version Mapping (enabled by default):**
- Automatically detects `dataFlowVersion` property in spec files
- Applies version-specific transformations from `src/config/dataflow_spec_mapping/{version}/`
- Applies version-specific transformations from `src/config/default/dataflow_spec_mapping/{version}/`
- Transforms old property names to current schema (e.g., `cdcApplyChanges` → `cdcSettings`)
- Useful for validating legacy spec files against the current schema
- Shows which files had mappings applied with a version indicator `[v0.1.0]`
File renamed without changes.
8 changes: 8 additions & 0 deletions src/config/override/.gitkeep
@@ -0,0 +1,8 @@
Use this ./config/override folder for your own customised framework-level config, using the same layout as the default ./config/default folder.
Doing so isolates your fork from upstream changes to ./config/default in the open-source project, so merges do not overwrite or conflict with your settings.
When this folder contains any non-hidden files, the framework reads from here instead of ./config/default for:
* global config
* substitutions
* secrets
* dataflow spec mappings
* operational metadata.
20 changes: 13 additions & 7 deletions src/constants.py
Expand Up @@ -39,26 +39,32 @@ class FrameworkPaths:
"""
FrameworkPaths is a class that contains constants for various paths and file masks used in the Lakeflow Framework.

CONFIG_PATH and CONFIG_OVERRIDE_PATH are static path segments (./config/default and ./config/override).
At runtime, which root to use for framework config files should be chosen using
utility.resolve_framework_config_path(framework_path).

Attributes:
CONFIG_PATH (str): Path to the config directory.
CONFIG_PATH (str): Path to the default config directory (./config/default).
CONFIG_OVERRIDE_PATH (str): Path to the optional override config directory (./config/override).
EXTENSIONS_PATH (str): The path for extensions.
GLOBAL_CONFIG (tuple): Paths to the global configuration files.
GLOBAL_CONFIG (tuple): Basenames of global configuration files (under the resolved config root).
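GLOBAL_SUBSTITUTIONS (tuple): Filename suffixes of the global substitutions files (appended to the deployment target name).
GLOBAL_SECRETS (tuple): Filename suffixes of the global secrets files (appended to the deployment target name).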
DATAFLOW_SPEC_MAPPING_PATH (str): Path to the dataflow spec mapping directory.
DATAFLOW_SPEC_MAPPING (str): Directory segment for dataflow spec mapping (under the resolved root).
MAIN_SPEC_SCHEMA_PATH (str): Path to the main specification schema file.
FLOW_GROUP_SPEC_SCHEMA_PATH (str): Path to the flow group specification schema file.
EXPECTATIONS_SPEC_SCHEMA_PATH (str): Path to the expectations specification schema file.
SECRETS_SCHEMA_PATH (str): Path to the secrets specification schema file.
TEMPLATE_DEFINITION_SPEC_SCHEMA_PATH (str): Path to the template definition specification schema file.
TEMPLATE_SPEC_SCHEMA_PATH (str): Path to the template specification schema file.
"""
CONFIG_PATH: str = "./config"
CONFIG_PATH: str = "./config/default"
CONFIG_OVERRIDE_PATH: str = "./config/override"
EXTENSIONS_PATH: str = "./extensions"
GLOBAL_CONFIG: tuple = ("./config/global.json", "./config/global.yaml", "./config/global.yml")
GLOBAL_CONFIG: tuple = ("global.json", "global.yaml", "global.yml")
GLOBAL_SUBSTITUTIONS: tuple = ("_substitutions.json", "_substitutions.yaml", "_substitutions.yml")
GLOBAL_SECRETS: tuple = ("_secrets.json", "_secrets.yaml", "_secrets.yml")
DATAFLOW_SPEC_MAPPING_PATH: str = "./config/dataflow_spec_mapping"
DATAFLOW_SPEC_MAPPING: str = "dataflow_spec_mapping"
REQUIREMENTS_FILE: str = "requirements.txt"

# Spec schema definitions paths
Expand All @@ -69,7 +75,7 @@ class FrameworkPaths:
SECRETS_SCHEMA_PATH: str = "./schemas/secrets.json"
TEMPLATE_DEFINITION_SPEC_SCHEMA_PATH: str = "./schemas/spec_template_definition.json"
TEMPLATE_SPEC_SCHEMA_PATH: str = "./schemas/spec_template.json"


class SupportedSpecFormat(str, Enum):
"""Supported specification file formats."""
4 changes: 3 additions & 1 deletion src/dataflow_spec_builder/spec_mapper.py
Expand Up @@ -44,6 +44,7 @@ def __init__(self, framework_path: str, max_workers: int = 1):
max_workers: Maximum parallel workers for processing
"""
self.framework_path = framework_path
self._framework_config_path = utility.resolve_framework_config_path(framework_path)
self.max_workers = max_workers
self._mapping_cache: Dict[str, Dict] = {}

Expand Down Expand Up @@ -121,7 +122,8 @@ def get_mapping(self, version: str) -> Dict:

mapping_path = os.path.join(
self.framework_path,
FrameworkPaths.DATAFLOW_SPEC_MAPPING_PATH,
self._framework_config_path,
FrameworkPaths.DATAFLOW_SPEC_MAPPING,
version,
"dataflow_spec_mapping.json"
)
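Because the resolved config root keeps its leading ``./`` segment, the joined mapping path contains a ``/./`` component. That is harmless for file access, and normalising it gives a cleaner value for logs. The sketch below is a hedged illustration: ``build_mapping_path`` and the ``/fw/src`` root are hypothetical, and ``posixpath`` is used instead of ``os.path`` only so the result is the same on every platform.

```python
import posixpath

DATAFLOW_SPEC_MAPPING = "dataflow_spec_mapping"


def build_mapping_path(framework_path: str, config_root: str, version: str) -> str:
    """Join the framework root, the resolved config root, and the mapping file location."""
    return posixpath.join(
        framework_path,
        config_root,
        DATAFLOW_SPEC_MAPPING,
        version,
        "dataflow_spec_mapping.json",
    )


# Hypothetical framework root; "./config/override" mirrors the constant in the diff.
raw = build_mapping_path("/fw/src", "./config/override", "0.1.0")
clean = posixpath.normpath(raw)  # collapses the "/./" segment
```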
16 changes: 11 additions & 5 deletions src/dlt_pipeline_builder.py
Expand Up @@ -10,7 +10,7 @@
from pyspark.sql import SparkSession
from typing import Dict, Any

from constants import(
from constants import (
FrameworkPaths, FrameworkSettings, PipelineBundlePaths, DLTPipelineSettingKeys, SupportedSpecFormat
)
from dataflow import DataFlow
Expand Down Expand Up @@ -114,6 +114,7 @@ def _init_configurations(self) -> None:

self.bundle_path = config_values[DLTPipelineSettingKeys.BUNDLE_SOURCE_PATH]
self.framework_path = config_values[DLTPipelineSettingKeys.FRAMEWORK_SOURCE_PATH]
self._framework_config_path = utility.resolve_framework_config_path(self.framework_path)
self.workspace_host = config_values[DLTPipelineSettingKeys.WORKSPACE_HOST]

# Load optional parameters
Expand Down Expand Up @@ -186,7 +187,10 @@ def _init_pipeline_components(self) -> None:

def _load_framework_global_config_file(self) -> Dict[str, Any]:
"""Load a global config file"""
global_config_paths = [os.path.join(self.framework_path, path) for path in FrameworkPaths.GLOBAL_CONFIG]
global_config_paths = [
os.path.join(self.framework_path, self._framework_config_path, path)
for path in FrameworkPaths.GLOBAL_CONFIG
]

# Check if more than one global config exists
existing_configs = [path for path in global_config_paths if os.path.exists(path)]
Expand Down Expand Up @@ -284,7 +288,7 @@ def _init_substitution_manager(self) -> None:

# Build framework substitutions paths
framework_subs_paths = [
os.path.join(self.framework_path, FrameworkPaths.CONFIG_PATH, workspace_env + path)
os.path.join(self.framework_path, self._framework_config_path, workspace_env + path)
for path in FrameworkPaths.GLOBAL_SUBSTITUTIONS
]
self.logger.info("Framework substitutions paths: %s", framework_subs_paths)
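For reference, the candidate list built here is the deployment target name concatenated with each suffix, placed under the resolved config root. A minimal sketch, assuming a target named ``dev`` and a hypothetical ``/fw/src`` framework root (``substitution_candidates`` is not a framework function, and ``posixpath`` is used only for platform-independent output):

```python
import posixpath

# Suffixes as listed in FrameworkPaths.GLOBAL_SUBSTITUTIONS.
GLOBAL_SUBSTITUTIONS = ("_substitutions.json", "_substitutions.yaml", "_substitutions.yml")


def substitution_candidates(framework_path: str, config_root: str, workspace_env: str) -> list:
    """Return one candidate path per supported suffix for the given target."""
    return [
        posixpath.normpath(posixpath.join(framework_path, config_root, workspace_env + suffix))
        for suffix in GLOBAL_SUBSTITUTIONS
    ]


candidates = substitution_candidates("/fw/src", "./config/default", "dev")
```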
Expand Down Expand Up @@ -315,7 +319,7 @@ def _init_secrets_manager(self) -> None:

# Build framework secrets paths
framework_secrets_config_paths = [
os.path.join(self.framework_path, FrameworkPaths.CONFIG_PATH, workspace_env + path)
os.path.join(self.framework_path, self._framework_config_path, workspace_env + path)
for path in FrameworkPaths.GLOBAL_SECRETS
]

Expand Down Expand Up @@ -370,7 +374,9 @@ def _setup_operational_metadata(self) -> None:
return

self.logger.info("Operational Metadata: layer set to %s", layer)
metadata_path = os.path.join(self.framework_path, f"config/operational_metadata_{layer}.json")
metadata_path = os.path.join(
self.framework_path, self._framework_config_path, f"operational_metadata_{layer}.json"
)
self.logger.info("Operational Metadata Path: %s", metadata_path)
metadata_json = utility.get_json_from_file(metadata_path, False)
self.operational_metadata_schema = (
Expand Down
52 changes: 51 additions & 1 deletion src/utility.py
Expand Up @@ -18,7 +18,8 @@
from constants import (
SupportedSpecFormat,
PipelineBundleSuffixesJson,
PipelineBundleSuffixesYaml
PipelineBundleSuffixesYaml,
FrameworkPaths,
)


Expand Down Expand Up @@ -526,3 +527,52 @@ def set_logger(logger_name: str, log_level: str = "INFO") -> logging.Logger:
logger.addHandler(console_output_handler)

return logger


def _has_visible_children(directory: str) -> bool:
    """
    Return True if `directory` exists and contains at least one child name not prefixed with `.`
    """
    if not os.path.isdir(directory):
        return False
    try:
        names = os.listdir(directory)
    except OSError:
        return False
    return any(not n.startswith(".") for n in names)


def resolve_framework_config_path(framework_path: str) -> str:
    """
    Return FrameworkPaths.CONFIG_OVERRIDE_PATH when the override directory has at least one
    non-hidden child and mirrors the required layout; otherwise FrameworkPaths.CONFIG_PATH.

    Raises:
        FileNotFoundError: If neither default nor override config roots contain valid files,
            or if the override root is active but incomplete.
    """
    config_dir = os.path.join(framework_path, FrameworkPaths.CONFIG_PATH)
    override_dir = os.path.join(framework_path, FrameworkPaths.CONFIG_OVERRIDE_PATH)
    if not _has_visible_children(override_dir):
        if not _has_visible_children(config_dir):
            raise FileNotFoundError(
                f"No valid files found under {FrameworkPaths.CONFIG_PATH} or "
                f"{FrameworkPaths.CONFIG_OVERRIDE_PATH} in the framework bundle "
                f"({framework_path!s}). Please add framework configuration under "
                f"{FrameworkPaths.CONFIG_PATH} (for example a global config file, "
                f"the {FrameworkPaths.DATAFLOW_SPEC_MAPPING} directory, and related files)."
            )
        return FrameworkPaths.CONFIG_PATH

    mapping_dir = os.path.join(override_dir, FrameworkPaths.DATAFLOW_SPEC_MAPPING)
    global_paths = [
        os.path.join(override_dir, name) for name in FrameworkPaths.GLOBAL_CONFIG
    ]
    if not os.path.isdir(mapping_dir) or not any(os.path.isfile(p) for p in global_paths):
        raise FileNotFoundError(
            f"Using {FrameworkPaths.CONFIG_OVERRIDE_PATH} requires both a global config file "
            f"({' or '.join(FrameworkPaths.GLOBAL_CONFIG)}) and the "
            f"{FrameworkPaths.DATAFLOW_SPEC_MAPPING} directory under that path. "
            f"Copy the full {FrameworkPaths.CONFIG_PATH} tree into {FrameworkPaths.CONFIG_OVERRIDE_PATH}."
        )
    return FrameworkPaths.CONFIG_OVERRIDE_PATH