This guide explains how to install, configure, and start using the Spark Kindling Framework across Microsoft Fabric, Azure Synapse Analytics, and Databricks environments.
For local development, scaffold a project with the CLI and run it without cloud credentials:

    pip install 'spark-kindling[standalone]' spark-kindling-cli
    kindling repo init my-app --output-dir ./my_app
    cd my_app
    kindling package init my-app
    kindling app init my-app --package my-app
    cd apps/my_app
    kindling app run . --env local

The scaffold uses in-memory entity providers by default, so no Azure storage account is needed.
Projects scaffolded with spark-kindling-cli include local Spark fixtures and tests so
entities, pipes, and notebooks can be developed before connecting to Fabric, Synapse, or
Databricks. See Local Python-First Development for the full guide.
- One of the following platforms:
  - Microsoft Fabric (with Spark runtime)
  - Azure Synapse Analytics (with Spark pools)
  - Databricks (Azure, AWS, or GCP)
- Python 3.10+
- Apache Spark 3.4+
- Delta Lake 2.0+
- Azure Storage Account (for artifacts storage)
- Azure Monitor workspace (for telemetry extension)
The framework is distributed as platform-specific wheels:
- Upload wheels to artifacts storage:

      # From releases or build output
      az storage blob upload \
        --account-name <storage-account> \
        --container-name artifacts \
        --name packages/kindling_fabric-0.2.0-py3-none-any.whl \
        --file dist/kindling_fabric-0.2.0-py3-none-any.whl

- Install in notebook:

      BOOTSTRAP_CONFIG = {
          "artifacts_storage_path": "abfss://artifacts@<storage>.dfs.core.windows.net/",
          "environment": "dev",
          "use_lake_packages": True,  # Install from artifacts
      }

      %run /path/to/kindling_bootstrap.py

If wheels are already deployed to your artifacts storage:

    # In your notebook
    BOOTSTRAP_CONFIG = {
        "artifacts_storage_path": "Files/artifacts",  # Or full abfss:// path
        "environment": "production",
        "use_lake_packages": True,
    }

## Configuration
### Hierarchical Configuration System
Kindling uses a layered configuration approach with YAML files:
**Priority (lowest → highest):**
1. `settings.yaml` - Base framework settings
2. `platform_{platform}.yaml` - Platform-specific (fabric/synapse/databricks)
3. `workspace_{workspace_id}.yaml` - Workspace-specific
4. `env_{environment}.yaml` - Environment-specific (dev/prod/etc)
5. `spark.kindling.*` - Spark pool/session config overrides
6. `BOOTSTRAP_CONFIG` - Runtime overrides (highest)
See [Hierarchical Configuration Guide](./platform_workspace_config.md) for complete details.
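
The precedence order above can be pictured as a series of dictionary merges in which each later layer overrides the earlier ones. The following is a minimal sketch of that idea, not Kindling's actual loader: `deep_merge`, `resolve_config`, and the sample layer contents are hypothetical names and values chosen for illustration.

```python
from copy import deepcopy
from typing import Any, Dict, Iterable


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which values from `override` win over `base`.

    Nested dicts are merged key by key; any other value is replaced outright.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_config(layers: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge layers given in priority order, lowest priority first."""
    resolved: Dict[str, Any] = {}
    for layer in layers:
        resolved = deep_merge(resolved, layer)
    return resolved


# Hypothetical layer contents, listed from lowest to highest priority.
layers = [
    # settings.yaml
    {"log_level": "WARNING", "telemetry": {"enabled": False, "sample_rate": 1.0}},
    # platform_fabric.yaml
    {"telemetry": {"enabled": True}},
    # env_dev.yaml
    {"log_level": "DEBUG"},
    # BOOTSTRAP_CONFIG
    {"log_level": "INFO"},
]

config = resolve_config(layers)
print(config)
```

In this sketch the `BOOTSTRAP_CONFIG` layer sets the final `log_level`, while `telemetry.sample_rate` survives from the base layer because no higher layer touches it.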
### Bootstrap Configuration
Minimal bootstrap example:
```python
BOOTSTRAP_CONFIG = {
    # Required
    'artifacts_storage_path': "abfss://artifacts@<storage>.dfs.core.windows.net/",
    'environment': 'dev',  # Loads env_dev.yaml if it exists

    # Package loading
    'use_lake_packages': True,    # Install from artifacts storage
    'load_local_packages': True,  # Load workspace notebooks as packages

    # Optional overrides
    'log_level': 'INFO',     # Override YAML log level
    'platform': 'fabric',    # Force platform (auto-detected if omitted)

    # Extensions (can also be in YAML)
    'extensions': ['kindling-otel-azure>=0.3.0'],

    # Spark configs
    'spark_configs': {
        'spark.sql.adaptive.enabled': 'true'
    }
}

%run environment_bootstrap
```

You can also set startup config in the pool or session Spark config using `spark.kindling.*` keys:

    spark.kindling.bootstrap.artifacts_storage_path=abfss://artifacts@<storage>.dfs.core.windows.net/
    spark.kindling.bootstrap.environment=dev
    spark.kindling.bootstrap.use_lake_packages=true
    spark.kindling.bootstrap.load_local=false
    spark.kindling.extensions=["kindling-otel-azure>=0.3.0"]

Mapping rules:
- `spark.kindling.bootstrap.<key>` -> bootstrap key `<key>` (with `load_lake`/`load_local` compatibility aliases)
- `spark.kindling.<key>` -> `kindling.<key>` config key

BOOTSTRAP_CONFIG still wins over SparkConf when both are present.
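
To make the mapping rules concrete, here is a rough sketch of how flat Spark config entries could be split into bootstrap keys and `kindling.*` config keys, with `BOOTSTRAP_CONFIG` applied last. It is illustrative only: `split_spark_conf` and `effective_bootstrap` are hypothetical helpers, and the alias targets (`load_lake` to `use_lake_packages`, `load_local` to `load_local_packages`) are an assumption, since this guide only states that the aliases exist.

```python
from typing import Any, Dict, Tuple

BOOTSTRAP_PREFIX = "spark.kindling.bootstrap."
KINDLING_PREFIX = "spark.kindling."

# Assumed alias targets; the guide only says these compatibility aliases exist.
BOOTSTRAP_ALIASES = {
    "load_lake": "use_lake_packages",
    "load_local": "load_local_packages",
}


def split_spark_conf(spark_conf: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split spark.kindling.* entries into bootstrap keys and kindling.* config keys."""
    bootstrap: Dict[str, str] = {}
    kindling: Dict[str, str] = {}
    for key, value in spark_conf.items():
        if key.startswith(BOOTSTRAP_PREFIX):
            # Check the longer prefix first so bootstrap keys are not misrouted.
            name = key[len(BOOTSTRAP_PREFIX):]
            bootstrap[BOOTSTRAP_ALIASES.get(name, name)] = value
        elif key.startswith(KINDLING_PREFIX):
            kindling["kindling." + key[len(KINDLING_PREFIX):]] = value
        # Any other Spark setting is ignored by this sketch.
    return bootstrap, kindling


def effective_bootstrap(spark_conf: Dict[str, str], bootstrap_config: Dict[str, Any]) -> Dict[str, Any]:
    """Combine both sources; explicit BOOTSTRAP_CONFIG entries win."""
    from_spark, _ = split_spark_conf(spark_conf)
    return {**from_spark, **bootstrap_config}


spark_conf = {
    "spark.kindling.bootstrap.environment": "dev",
    "spark.kindling.bootstrap.load_local": "false",
    "spark.kindling.extensions": '["kindling-otel-azure>=0.3.0"]',
    "spark.sql.adaptive.enabled": "true",
}

bootstrap_keys, kindling_keys = split_spark_conf(spark_conf)
merged = effective_bootstrap(spark_conf, {"environment": "production"})
print(bootstrap_keys, kindling_keys, merged)
```

Values are left as the raw strings Spark provides; any type conversion (for example `"false"` to `False`) is outside the scope of this sketch.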
The framework requires these Python packages:
- injector: For dependency injection
- delta-spark: For Delta Lake functionality
- dynaconf: For configuration management
- pytest: For testing (optional for production)
Implement a custom EntityPathLocator for your environment:

    @GlobalInjector.singleton_autobind()
    class MyEntityPathLocator(EntityPathLocator):
        def get_table_path(self, entity):
            # Example: Map entity IDs to cloud storage paths
            return f"abfss://data@storage.dfs.core.windows.net/tables/{entity.entityid}"

Implement a custom `EntityNameMapper` for your naming convention:

    @GlobalInjector.singleton_autobind()
    class MyEntityNameMapper(EntityNameMapper):
        def get_table_name(self, entity):
            # Example: Convert entity IDs to table names
            return entity.entityid.replace(".", "_")

Implement a custom `WatermarkEntityFinder` for watermark storage:

    @GlobalInjector.singleton_autobind()
    class MyWatermarkEntityFinder(WatermarkEntityFinder):
        def get_watermark_entity_for_entity(self, context):
            return "system.watermarks"

        def get_watermark_entity_for_layer(self, layer):
            return "system.watermarks"

TODO: Rewrite this hallucination

For optimal organization, structure your notebooks following this pattern:

    /workspace
      /project
        /bronze
          # Bronze layer transformation notebooks
        /silver
          # Silver layer transformation notebooks
        /gold
          # Gold layer transformation notebooks
        /common
          # Shared utility notebooks and entity definitions
        /orchestration
          # Pipeline orchestration notebooks

For better organization and discoverability:
- Entity IDs: `<domain>.<entity_name>` (e.g., `sales.transactions`)
- Pipe IDs: `<stage>.<domain>.<operation>` (e.g., `validate.sales.check_amounts`)

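
The naming conventions above can be checked mechanically, for example in a unit test that runs before deployment. The sketch below is one possible validator and is not part of Kindling: the function names are hypothetical, and the allowed character set (lowercase snake_case segments) is an assumption inferred from the two examples.

```python
import re

# One dot-separated segment: lowercase snake_case (assumed from the examples).
_SEGMENT = r"[a-z][a-z0-9_]*"

# <domain>.<entity_name>
ENTITY_ID_PATTERN = re.compile(rf"{_SEGMENT}\.{_SEGMENT}")
# <stage>.<domain>.<operation>
PIPE_ID_PATTERN = re.compile(rf"{_SEGMENT}\.{_SEGMENT}\.{_SEGMENT}")


def is_valid_entity_id(entity_id: str) -> bool:
    """True when the ID has exactly two segments: domain and entity name."""
    return ENTITY_ID_PATTERN.fullmatch(entity_id) is not None


def is_valid_pipe_id(pipe_id: str) -> bool:
    """True when the ID has exactly three segments: stage, domain, operation."""
    return PIPE_ID_PATTERN.fullmatch(pipe_id) is not None


for candidate in ["sales.transactions", "sales", "Sales.Transactions"]:
    print(candidate, is_valid_entity_id(candidate))

for candidate in ["validate.sales.check_amounts", "sales.transactions"]:
    print(candidate, is_valid_pipe_id(candidate))
```

If your project allows uppercase letters or extra segments, adjust `_SEGMENT` and the patterns accordingly.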
To enable testing:
- Create test notebooks for each component
- Configure test data paths
- Import the test framework:

      notebook_import("kindling.test_framework")

      # Define a test case
      @test_case("My test case")
      def test_my_pipe():
          # Test implementation: compute `result` and `expected` here
          assert result == expected

      # Run tests
      run_tests()

If you encounter issues with Delta table access, configure the appropriate access mode:

    # In your configuration
    'spark_configs': {
        'delta_table_access_mode': 'forName'  # or 'forPath', 'auto'
    }

To enable schema evolution for Delta tables:

    'spark_configs': {
        'spark.databricks.delta.schema.autoMerge.enabled': 'true'
    }

If you encounter dependency injection issues, check:
- Provider implementation and binding
- Import order in notebooks
- Provider scope (singleton vs. transient)
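
The three checks above map onto behaviours that most dependency injection containers share. The toy container below illustrates them using only the standard library; it is not Kindling's `GlobalInjector` and not the `injector` package, and `TinyContainer`, `PathLocator`, and `CloudPathLocator` are hypothetical names.

```python
from typing import Any, Callable, Dict


class TinyContainer:
    """Toy container showing binding, rebinding order, and scope."""

    def __init__(self) -> None:
        self._factories: Dict[type, Callable[[], Any]] = {}
        self._is_singleton: Dict[type, bool] = {}
        self._instances: Dict[type, Any] = {}

    def bind(self, interface: type, factory: Callable[[], Any], singleton: bool = False) -> None:
        # A later bind replaces an earlier one, which is why import order matters.
        self._factories[interface] = factory
        self._is_singleton[interface] = singleton
        self._instances.pop(interface, None)

    def get(self, interface: type) -> Any:
        if interface not in self._factories:
            # Symptom of a provider that was never implemented or bound.
            raise LookupError(f"No provider bound for {interface.__name__}")
        if not self._is_singleton[interface]:
            return self._factories[interface]()  # transient: new object each time
        if interface not in self._instances:
            self._instances[interface] = self._factories[interface]()
        return self._instances[interface]  # singleton: cached object


class PathLocator:
    pass


class CloudPathLocator(PathLocator):
    pass


container = TinyContainer()

# Singleton scope: every lookup returns the same instance.
container.bind(PathLocator, CloudPathLocator, singleton=True)
first = container.get(PathLocator)
second = container.get(PathLocator)
print(first is second)

# Rebinding later (for example from a notebook imported afterwards) replaces
# the provider; transient scope creates a fresh instance per lookup.
container.bind(PathLocator, PathLocator)
third = container.get(PathLocator)
fourth = container.get(PathLocator)
print(third is fourth, type(third).__name__)
```

When debugging a real setup, the equivalent questions are whether the provider was bound at all, which binding ran last, and whether shared state is expected across lookups.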
For additional assistance:
- Check the framework documentation
- Review the test notebooks for examples
- Open an issue in the GitHub repository