Kindling supports a hierarchical configuration system with multiple layers of YAML configuration files. This allows you to organize settings from most general (base settings) to most specific (environment overrides).
See docs/config_reference.md for an exhaustive list of Kindling config keys and provider tag settings.
Configuration files are loaded in the following order (lowest to highest priority):
- settings.yaml - Base framework settings (lowest priority)
- platform_{platform}.yaml - Platform-specific settings (fabric, synapse, databricks)
- workspace_{workspace_id}.yaml - Workspace-specific settings
- env_{environment}.yaml - Environment-specific settings (dev, prod, etc.)
- SparkConf - spark.kindling.* pool/session settings
- Bootstrap Config - In-memory overrides from BOOTSTRAP_CONFIG dict (highest priority)
Each layer can override values from previous layers, allowing precise control over configuration at different organizational levels.
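The override behaviour can be pictured as a recursive dictionary merge applied in load order. The following is a minimal sketch, not Kindling's actual loader: `deep_merge` and `merge_layers` are hypothetical helpers, and the sketch assumes that nested mappings are merged key by key while scalars and lists are replaced wholesale.

```python
from copy import deepcopy
from typing import Any, Dict, Iterable


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale (assumption).
            merged[key] = deepcopy(value)
    return merged


def merge_layers(layers: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold layers from lowest to highest priority."""
    result: Dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Two already-parsed layers: settings.yaml, then env_prod.yaml.
base = {"kindling": {"telemetry": {"logging": {"level": "INFO", "print": True}}}}
env_prod = {"kindling": {"telemetry": {"logging": {"level": "WARN"}}}}

effective = merge_layers([base, env_prod])
print(effective["kindling"]["telemetry"]["logging"])
```

In this sketch the production layer changes only the log level; the `print` flag from the base layer survives because the merge descends into nested mappings instead of replacing them.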
Universal framework settings that apply across all platforms, workspaces, and environments.
kindling:
  version: "0.6.0"
  bootstrap:
    load_lake: true
    load_local: true
  required_packages: []
  extensions: []
  ignored_folders: [".git", "__pycache__", ".vscode"]
  delta:
    tablerefmode: "forName"
    optimize_write: true
  telemetry:
    logging:
      level: "INFO"
      print: true
    tracing:
      enabled: false
      print: false

Upload to: {artifacts_storage_path}/config/settings.yaml
Settings specific to a platform (Fabric, Synapse, or Databricks). These override base settings.
Platforms:
- platform_fabric.yaml - Microsoft Fabric settings
- platform_synapse.yaml - Azure Synapse Analytics settings
- platform_databricks.yaml - Databricks settings
Example: platform_fabric.yaml
kindling:
  platform:
    name: fabric
  TELEMETRY:
    logging:
      level: DEBUG # More verbose for Fabric diagnostic emitters
  SPARK_CONFIGS:
    spark.sql.adaptive.enabled: "true"
  extensions:
    - kindling-otel-azure>=0.3.0

Upload to: {artifacts_storage_path}/config/platform_fabric.yaml
Kindling's default convention is name-based Delta tables (forName).
- Databricks: Use Unity Catalog / metastore-managed table names. Databricks manages the physical table location.
- Synapse: Create your target database with an explicit LOCATION so saveAsTable("schema.table") creates tables under that storage location by default.
- Fabric: Lakehouse tables are created under the Lakehouse Tables/ area; saveAsTable("table") uses the Lakehouse context.
Exceptions (cross-Lakehouse writes, streaming sink limitations, or platform-specific constraints) should be handled via per-entity overrides:
- provider.access_mode: forName | forPath | auto
- provider.table_name: fully qualified table name (when using forName)
- provider.path: explicit storage path (when using forPath)
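A small consistency check over these tags can catch an override that names a mode without the value it depends on. This is only a sketch: `check_provider_tags` is a hypothetical helper, the tags are assumed to arrive as a flat string dictionary, and the way Kindling resolves `auto` is not described here, so the sketch accepts it without further checks.

```python
from typing import Dict, List

VALID_ACCESS_MODES = ("forName", "forPath", "auto")


def check_provider_tags(tags: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the tags look consistent."""
    problems: List[str] = []
    # Assumption: an entity without an explicit mode follows the forName default.
    mode = tags.get("provider.access_mode", "forName")

    if mode not in VALID_ACCESS_MODES:
        problems.append(f"unknown provider.access_mode: {mode!r}")
    elif mode == "forPath" and not tags.get("provider.path"):
        # forPath needs an explicit storage path to write to.
        problems.append("provider.access_mode is forPath but provider.path is missing")
    return problems


print(check_provider_tags({"provider.access_mode": "forPath"}))
```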
See examples/config/ for complete examples of all platform configs.
Settings specific to a particular workspace. Useful for:
- Team-specific configurations
- Geographic region settings
- Cost center allocations
- Security policies per workspace
Workspace ID Format:
- Fabric: GUID format (e.g., workspace_12345678-1234-1234-1234-123456789abc.yaml)
- Synapse: Workspace name (e.g., workspace_mysynapsews.yaml)
- Databricks: Sanitized workspace URL (e.g., workspace_adb-123456789_azuredatabricks_net.yaml)
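To work out which file name a given workspace maps to, a helper along these lines may be useful. It is a sketch under assumptions: `workspace_config_filename` is hypothetical, and the sanitization rule (drop the URL scheme, then replace every character other than letters, digits and hyphens with an underscore) is inferred from the Databricks example above and is not confirmed as Kindling's exact rule.

```python
import re


def workspace_config_filename(workspace_id: str) -> str:
    """Build the workspace config file name for a detected workspace ID."""
    # Drop an optional URL scheme and trailing slash (assumed sanitization step).
    cleaned = re.sub(r"^https?://", "", workspace_id.strip()).rstrip("/")
    # Replace anything that is not a letter, digit or hyphen with "_".
    safe = re.sub(r"[^A-Za-z0-9-]", "_", cleaned)
    return f"workspace_{safe}.yaml"


for raw in (
    "12345678-1234-1234-1234-123456789abc",  # Fabric GUID
    "mysynapsews",                           # Synapse workspace name
    "adb-123456789.azuredatabricks.net",     # Databricks workspace URL
):
    print(workspace_config_filename(raw))
```

GUIDs and plain workspace names pass through unchanged, so only the Databricks URL is actually rewritten.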
Example: workspace_12345678-1234-1234-1234-123456789abc.yaml
kindling:
  workspace:
    name: "Production Workspace"
    team: "Data Engineering"
  TELEMETRY:
    logging:
      level: INFO
    tracing:
      tags:
        workspace: "production"
        team: "data-eng"
  DATA:
    bronze_layer: "abfss://bronze@mystorageaccount.dfs.core.windows.net/"
    silver_layer: "abfss://silver@mystorageaccount.dfs.core.windows.net/"

Upload to: {artifacts_storage_path}/config/workspace_{workspace_id}.yaml
Settings specific to an environment (dev, test, staging, prod). Highest priority among YAML files.
Example: env_prod.yaml
kindling:
  TELEMETRY:
    logging:
      level: WARN # Less verbose in production
  SPARK_CONFIGS:
    spark.executor.memory: "16g"
    spark.executor.cores: "4"
  DELTA:
    optimize_write: true
    auto_compact: true

Upload to: {artifacts_storage_path}/config/env_prod.yaml
Minimal setup with just base settings and environment overrides:
config/
├── settings.yaml # Base settings
├── env_dev.yaml # Development overrides
└── env_prod.yaml # Production overrides
Add platform-specific settings for multi-platform deployments:
config/
├── settings.yaml # Base settings
├── platform_fabric.yaml # Fabric-specific settings
├── platform_synapse.yaml # Synapse-specific settings
├── platform_databricks.yaml # Databricks-specific settings
├── env_dev.yaml # Development overrides
└── env_prod.yaml # Production overrides
Complete hierarchy with workspace-specific settings:
config/
├── settings.yaml # Base settings
├── platform_fabric.yaml # Fabric settings
├── platform_synapse.yaml # Synapse settings
├── workspace_abc123.yaml # Team A workspace
├── workspace_def456.yaml # Team B workspace
├── env_dev.yaml # Development env
├── env_staging.yaml # Staging env
└── env_prod.yaml # Production env
Workspaces aligned with environments (common pattern):
config/
├── settings.yaml # Base settings
├── platform_fabric.yaml # Fabric settings
├── workspace_dev-workspace-id.yaml # Dev workspace
├── workspace_staging-workspace-id.yaml # Staging workspace
├── workspace_prod-workspace-id.yaml # Prod workspace
├── env_dev.yaml # Dev env overrides
├── env_staging.yaml # Staging env overrides
└── env_prod.yaml # Prod env overrides
Kindling automatically detects platform and workspace information:
- Checks for explicit platform or platform_environment in bootstrap config
- Detects from storage utilities (mssparkutils, dbutils)
- Detects from Spark session configuration
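The detection steps above amount to a fallback chain: an explicit setting wins, otherwise each probe is tried until one yields an answer. The sketch below illustrates that pattern only. `detect_platform` and the probe callables are hypothetical stand-ins for the real checks, and it assumes the steps run in the order listed and that a failing probe is skipped, not treated as fatal.

```python
from typing import Callable, Mapping, Optional, Sequence

Probe = Callable[[], Optional[str]]


def detect_platform(bootstrap_config: Mapping[str, object],
                    probes: Sequence[Probe]) -> Optional[str]:
    """Return the platform name, preferring explicit bootstrap settings."""
    explicit = bootstrap_config.get("platform") or bootstrap_config.get("platform_environment")
    if explicit:
        # Lower-casing is an assumption so that "Fabric" and "fabric" match.
        return str(explicit).lower()

    for probe in probes:
        try:
            found = probe()
        except Exception:
            # A probe that is unavailable on this platform is simply skipped.
            continue
        if found:
            return found
    return None


def storage_utils_probe() -> Optional[str]:
    # Stand-in for the mssparkutils / dbutils check; fails outside a notebook.
    raise NameError("mssparkutils is not defined")


def spark_conf_probe() -> Optional[str]:
    # Stand-in for a check against Spark session configuration.
    return "databricks"


print(detect_platform({}, [storage_utils_probe, spark_conf_probe]))
print(detect_platform({"platform": "Fabric"}, [spark_conf_probe]))
```

Passing the probes in as callables keeps the ordering explicit and makes the chain testable without a live Spark session.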
Fabric:
# From notebookutils
notebookutils.runtime.context.get("currentWorkspaceId")
# From mssparkutils
mssparkutils.env.getWorkspaceId()

Synapse:
# From Spark config
spark.conf.get("spark.synapse.workspace.name")
# From mssparkutils
mssparkutils.env.getWorkspaceName()

Databricks:
# From dbutils context
dbutils.entry_point.getDbutils().notebook().getContext().workspaceId().get()
# From Spark config (fallback)
spark.conf.get("spark.databricks.workspaceUrl")

Bootstrap config (in-memory dict) always has highest priority and overrides all YAML and SparkConf settings:
BOOTSTRAP_CONFIG = {
    "artifacts_storage_path": "abfss://artifacts@mystorageaccount.dfs.core.windows.net/",
    "environment": "prod",
    "log_level": "DEBUG",  # Overrides all YAML log_level settings
    "use_lake_packages": True,
}
initialize_framework(BOOTSTRAP_CONFIG)

Spark pool/session config can also provide defaults via spark.kindling.*:
spark.kindling.bootstrap.environment=prod
spark.kindling.bootstrap.use_lake_packages=true
spark.kindling.extensions=["kindling-otel-azure>=0.3.0"]
Mapping rules:
- spark.kindling.bootstrap.<key> -> bootstrap key <key>
- spark.kindling.<key> -> kindling.<key>
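The two mapping rules can be sketched as a prefix split over SparkConf key/value pairs. This is an illustration, not Kindling's implementation: `map_spark_conf` is a hypothetical helper, values are kept as the raw strings Spark returns, and any type coercion (for example turning "true" into a boolean or parsing the extensions list) is left out because it is not specified here.

```python
from typing import Any, Dict, Iterable, Tuple

BOOTSTRAP_PREFIX = "spark.kindling.bootstrap."
KINDLING_PREFIX = "spark.kindling."


def map_spark_conf(pairs: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Split SparkConf pairs into bootstrap keys and a nested kindling.* tree."""
    bootstrap: Dict[str, str] = {}
    nested: Dict[str, Any] = {}
    for key, value in pairs:
        # Check the longer prefix first so bootstrap keys are not misrouted.
        if key.startswith(BOOTSTRAP_PREFIX):
            bootstrap[key[len(BOOTSTRAP_PREFIX):]] = value
        elif key.startswith(KINDLING_PREFIX):
            path = ["kindling"] + key[len(KINDLING_PREFIX):].split(".")
            node = nested
            for part in path[:-1]:
                node = node.setdefault(part, {})
            node[path[-1]] = value
        # Keys outside spark.kindling.* are ignored.
    return bootstrap, nested


pairs = [
    ("spark.kindling.bootstrap.environment", "prod"),
    ("spark.kindling.bootstrap.use_lake_packages", "true"),
    ("spark.kindling.extensions", '["kindling-otel-azure>=0.3.0"]'),
    ("spark.sql.adaptive.enabled", "true"),
]
bootstrap_keys, kindling_tree = map_spark_conf(pairs)
print(bootstrap_keys)
print(kindling_tree)
```

In a live PySpark session, the input pairs could come from `spark.sparkContext.getConf().getAll()`, which returns a list of key/value tuples.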
You have multiple teams using the same Fabric capacity with different workspaces:
# settings.yaml - Base for all teams
kindling:
  telemetry:
    logging:
      level: INFO

# platform_fabric.yaml - Fabric-specific
kindling:
  platform:
    name: fabric
  extensions:
    - kindling-otel-azure>=0.3.0

# workspace_team-a-workspace-id.yaml - Team A settings
kindling:
  workspace:
    team: "team-a"
  DATA:
    bronze: "abfss://bronze-team-a@..."

# workspace_team-b-workspace-id.yaml - Team B settings
kindling:
  workspace:
    team: "team-b"
  DATA:
    bronze: "abfss://bronze-team-b@..."

# env_prod.yaml - Production overrides
kindling:
  telemetry:
    logging:
      level: WARN

Result: Each team gets its own data paths but shares the platform settings and the production logging level.
Your app runs on both Fabric and Databricks with different configurations:
# settings.yaml - Universal settings
kindling:
  delta:
    tablerefmode: "forName"

# platform_fabric.yaml - Fabric tuning
kindling:
  TELEMETRY:
    logging:
      level: DEBUG # Need verbose logs for diagnostic emitters

# platform_databricks.yaml - Databricks tuning
kindling:
  TELEMETRY:
    logging:
      level: INFO # Less verbose, stdout is captured
  SPARK_CONFIGS:
    spark.databricks.delta.optimizeWrite.enabled: "true"

Result: Same app, optimized configs per platform.
Separate workspaces for dev and prod environments:
# settings.yaml - Base settings
kindling:
  delta:
    optimize_write: true

# workspace_dev-workspace.yaml - Dev workspace
kindling:
  TELEMETRY:
    logging:
      level: DEBUG
  LIMITS:
    max_executors: 5

# workspace_prod-workspace.yaml - Prod workspace
kindling:
  TELEMETRY:
    logging:
      level: ERROR
  LIMITS:
    max_executors: 50

# env_prod.yaml - Production environment overrides
kindling:
  SPARK_CONFIGS:
    spark.executor.memory: "32g"

Result: Dev workspace gets debug logging with limited executors; prod workspace gets error-only logging with high executor limits, plus larger memory from environment config.
- Keep settings.yaml minimal - Only truly universal settings
- Use platform configs for platform-specific tuning - Spark configs, resource limits, platform features
- Use workspace configs for organizational boundaries - Teams, regions, cost centers
- Use environment configs for deployment stages - Dev, staging, prod
- All configs are optional - Framework works with just bootstrap config if needed
- Document your hierarchy - Add comments explaining why settings are at each level
- Version control your configs - Track changes to configuration over time
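Because every file is optional, a loader can build the list of candidate file names in priority order and keep only those that are present. The sketch below shows that idea against a local directory. Both helpers are hypothetical, and a local `config/` folder is an assumption made for illustration; Kindling itself reads these files from the artifacts storage path.

```python
from pathlib import Path
from typing import List, Optional


def candidate_config_files(platform: Optional[str],
                           workspace_id: Optional[str],
                           environment: Optional[str]) -> List[str]:
    """List YAML file names from lowest to highest priority."""
    names = ["settings.yaml"]
    if platform:
        names.append(f"platform_{platform}.yaml")
    if workspace_id:
        names.append(f"workspace_{workspace_id}.yaml")
    if environment:
        names.append(f"env_{environment}.yaml")
    return names


def existing_config_files(config_dir: Path, names: List[str]) -> List[Path]:
    """Keep only the files that are present, preserving priority order."""
    return [config_dir / name for name in names if (config_dir / name).is_file()]


names = candidate_config_files("fabric", "abc123", "prod")
print(names)
print(existing_config_files(Path("config"), names))
```

A missing layer simply drops out of the list, so the remaining files still merge in the documented order.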
Check bootstrap output for config file status:
Using artifacts path: abfss://artifacts@...
Platform: fabric
Workspace ID: 12345678-1234-1234-1234-123456789abc
Environment: prod
✓ Downloaded: settings.yaml
✓ Downloaded: platform_fabric.yaml
✓ Downloaded: workspace_12345678-1234-1234-1234-123456789abc.yaml
✓ Downloaded: env_prod.yaml
If workspace-specific config isn't loading, check detection:
# Check what workspace ID is detected
from kindling.bootstrap import _get_workspace_id_for_platform
platform = "fabric"
workspace_id = _get_workspace_id_for_platform(platform)
print(f"Detected workspace ID: {workspace_id}")

Check final config values:
from kindling.injection import get_kindling_service
from kindling.spark_config import ConfigService
config = get_kindling_service(ConfigService)
# Check specific value
log_level = config.get("kindling.TELEMETRY.logging.level")
print(f"Effective log level: {log_level}")
# Check all config
all_config = config.get_all()
print(f"All config: {all_config}")

Before:
config/
└── settings.yaml # Everything in one file
After:
config/
├── settings.yaml # Base settings
├── platform_fabric.yaml # Fabric-specific
└── env_prod.yaml # Environment-specific
Migration Steps:
- Keep existing settings.yaml as base
- Extract platform-specific settings to platform_{platform}.yaml
- Extract environment-specific settings to env_{environment}.yaml
- Test with each layer to verify behavior
Before:
BOOTSTRAP_CONFIG = {
    "log_level": "INFO",
    "platform": "fabric",
    "artifacts_storage_path": "...",
    # ... 50 more settings
}

After:
# Minimal bootstrap with YAML doing the work
BOOTSTRAP_CONFIG = {
    "artifacts_storage_path": "...",
    "environment": "prod",
    # Only runtime-specific overrides here
}

Move settings to YAML files for better maintainability and version control.
- Platform API Architecture - Technical deep dive
- Setup Guide - Initialization and configuration workflow
- Example Configs - Complete working examples