OpenAutoLoader is a high-performance, incremental data ingestion engine. It bridges the gap between raw cloud storage and production-ready Delta Lake tables using the Rust-based Polars engine.
Stop writing complex Spark jobs for simple file ingestion. OpenAutoLoader provides a "Databricks-style" Auto Loader experience in a lightweight Python package.
Traditional ingestion often requires heavy JVM clusters (Spark) or manual file tracking. OpenAutoLoader changes that:
- Zero-Spark Overhead: Runs on standard Python environments with Rust-level performance.
- Exactly-Once Processing: Integrated SQLite checkpointing ensures no duplicate data, even if a job restarts.
- Schema First: Automatically infers, saves, and enforces JSON schema contracts to prevent data corruption.
- Cloud Native: A single API for Local, S3, Azure Blob (ABFSS), and GCS.
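
To make the "Schema First" point concrete, here is a minimal standard-library sketch of a schema contract: the first file seen defines the contract, which is saved as JSON, and later files are rejected if their columns differ. The helper name `check_against_contract`, the file name `contract.json`, and the contract layout are illustrative assumptions, not OpenAutoLoader's actual internals.

```python
import json
import tempfile
from pathlib import Path


def check_against_contract(columns, schema_dir):
    """Create the contract on first use; afterwards require an exact column match.

    Hypothetical helper for illustration only.
    """
    contract_file = Path(schema_dir) / "contract.json"  # assumed file name
    if not contract_file.exists():
        # First file seen: infer the contract from its columns and save it.
        contract_file.parent.mkdir(parents=True, exist_ok=True)
        contract_file.write_text(json.dumps({"columns": list(columns)}))
        return "created"
    # Later files: enforce the saved contract.
    expected = json.loads(contract_file.read_text())["columns"]
    if list(columns) != expected:
        raise ValueError(f"schema drift: expected {expected}, got {list(columns)}")
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    first = check_against_contract(["user_id", "event", "ts"], tmp)
    second = check_against_contract(["user_id", "event", "ts"], tmp)
    try:
        check_against_contract(["user_id", "ts"], tmp)  # a column is missing
        drift_detected = False
    except ValueError:
        drift_detected = True
```

Failing loudly on drift, rather than silently coercing columns, is what keeps a malformed file from reaching the target table in this sketch.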
Install with pip:

    # Core (local files only)
    pip install open-auto-loader

    # Full cloud support (recommended)
    pip install "open-auto-loader[all]"

A minimal run that ingests from S3:

    from open_auto_loader import OpenAutoLoader

    # Define your cloud credentials
    storage_options = {
        "aws_access_key_id": "YOUR_ACCESS_KEY",
        "aws_secret_access_key": "YOUR_SECRET_KEY",
        "region": "ap-south-1"
    }

    # Initialize the loader
    loader = OpenAutoLoader(
        source="s3://my-raw-bucket/incoming_logs/",
        target="s3://my-silver-bucket/tables/user_logs",
        check_point="./metadata/checkpoints.db",
        schema_path="./metadata/schemas/",
        storage_options=storage_options
    )

    # Run the ingestion batch
    loader.run(batch_id="daily_run_2026_03_18")

Each run passes through five stages:

- Scanner: Uses `fsspec` to identify new files since the last successful `batch_id`.
- Schema Guard: Checks the file header against the stored JSON contract in `schema_path`.
- Polars Engine: Streams the data using `sink_delta()`, minimizing memory footprint.
- Metadata Injection: Automatically adds `_batch_id`, `_processed_at`, and `_source_file` to every row for full auditability.
- Committer: Updates the SQLite checkpoint only after a successful Delta write.

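
The scan, write, then commit ordering described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's real code: the table name `processed_files`, the functions `open_checkpoint` and `run_batch`, and the `write_rows` callback (which stands in for the Delta write) are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def open_checkpoint(db_path=":memory:"):
    """Open (or create) the checkpoint store. Table layout is an assumption."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_files ("
        "path TEXT PRIMARY KEY, batch_id TEXT, processed_at TEXT)"
    )
    return conn


def run_batch(conn, listed_files, batch_id, write_rows):
    """Process only unseen files; record them after the write succeeds."""
    # Scanner: keep only files that are not yet in the checkpoint.
    seen = {row[0] for row in conn.execute("SELECT path FROM processed_files")}
    new_files = [f for f in listed_files if f not in seen]
    if not new_files:
        return []
    now = datetime.now(timezone.utc).isoformat()
    # Metadata injection: one audit record per file stands in for per-row columns.
    rows = [
        {"_batch_id": batch_id, "_processed_at": now, "_source_file": f}
        for f in new_files
    ]
    write_rows(rows)  # if this raises, the checkpoint below is never updated
    # Committer: the context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO processed_files VALUES (?, ?, ?)",
            [(f, batch_id, now) for f in new_files],
        )
    return new_files


def failing_write(rows):
    raise RuntimeError("simulated write failure")


conn = open_checkpoint()
sink = []  # stands in for the target table
first = run_batch(conn, ["a.csv", "b.csv"], "batch_1", sink.extend)
second = run_batch(conn, ["a.csv", "b.csv", "c.csv"], "batch_2", sink.extend)
try:
    run_batch(conn, ["d.csv"], "batch_3", failing_write)
except RuntimeError:
    pass
retry = run_batch(conn, ["d.csv"], "batch_3", sink.extend)
```

Because the checkpoint is written only after `write_rows` returns, a failed write leaves `d.csv` unrecorded, so the retry picks it up again, while files from earlier batches are skipped.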
| Feature | Local | AWS S3 | Azure Blob | Google GCS |
|---|---|---|---|---|
| Incremental Loading | ✅ | ✅ | ✅ | ✅ |
| Schema Enforcement | ✅ | ✅ | ✅ | ✅ |
| Service Principal Auth | N/A | ✅ | ✅ | ✅ |
| Streaming Sink | ✅ | ✅ | ✅ | ✅ |
Contributions are welcome! Whether it's a bug fix, a new cloud provider, or performance tuning, feel free to open a PR.
Created with ❤️ by Nitish Katkade