Skip to content

Latest commit

 

History

History
328 lines (241 loc) · 7.9 KB

File metadata and controls

328 lines (241 loc) · 7.9 KB

SQLite Backend Storage

SQLite is a lightweight, embedded SQL database that provides zero-configuration storage for Pangolin's metadata.

Overview

SQLite stores all Pangolin catalog metadata in a single file with:

  • Zero configuration required
  • ACID compliance
  • Full SQL support
  • Extremely low resource usage
  • Perfect for development and embedded deployments

Pros and Cons

✅ Advantages

  • Zero Configuration: No separate database server needed
  • Single File: Entire database in one file
  • Lightweight: Minimal memory and CPU usage
  • Fast: Excellent performance for read-heavy workloads
  • Portable: Copy the file to move the database
  • Embedded: Perfect for edge devices and embedded systems
  • ACID Compliant: Full transaction support
  • Foreign Keys: Referential integrity support
  • No Network: No network latency

⚠️ Considerations

  • Single Writer: Limited concurrent write performance
  • No Replication: Manual backup/restore required
  • File-Based: Not suitable for distributed systems
  • Scalability: Limited to single machine
  • No User Management: File-level permissions only

Use Cases

Best For:

  • Local development
  • Testing and CI/CD
  • Embedded applications
  • Edge/IoT deployments
  • Single-user applications
  • Prototyping

Not Ideal For:

  • High-concurrency write workloads
  • Distributed systems
  • Multi-server deployments
  • When you need replication
  • Large-scale production (use PostgreSQL/MongoDB)

Prerequisites

  • None! SQLite is embedded in Pangolin

Installation

No installation required - SQLite is built into Pangolin.

Configuration

Environment Variables

# File-based database
DATABASE_URL=sqlite:///path/to/pangolin.db

# Relative path
DATABASE_URL=sqlite://./pangolin.db

# In-memory database (testing only, data lost on restart)
DATABASE_URL=sqlite::memory:

# Absolute path
DATABASE_URL=sqlite:////home/user/pangolin/data/pangolin.db

Schema Initialization

Pangolin automatically initializes the SQLite schema on first startup using the bundled sqlite_schema.sql.

The comprehensive schema includes:

Core Infrastructure:

  • tenants - Multi-tenant isolation records
  • warehouses - Storage configurations (S3, Azure, GCP)
  • catalogs - Iceberg catalog definitions
  • namespaces - Namespace hierarchies and properties
  • assets - Table and view metadata pointers
  • branches - Git-like branch definitions
  • tags - Immutable commit pointers (snapshots)
  • commits - Detailed operation history
  • metadata_locations - Physical Iceberg metadata file path tracking

Governance & Security:

  • users - Root and Tenant user accounts
  • roles - RBAC role definitions
  • user_roles - Role assignments to users
  • permissions - Direct (TBAC/RBAC) permission grants
  • access_requests - Data discovery access workflows
  • audit_logs - Comprehensive tamper-evident trail
  • active_tokens - Session tracking for active tokens

System & Maintenance:

  • business_metadata - Rich discovery metadata for assets
  • system_settings - Tenant-specific configuration overrides
  • federated_sync_stats - Status tracking for cross-tenant federation

Indexes:

  • Pragma-enforced Foreign Keys
  • Primary keys on all tables
  • Composite indexes for branch isolation (e.g., UNIQUE(tenant_id, catalog_name, branch_name, namespace_path, name))
  • Composite indexes for common listing operations

Manual Schema Application

# Download schema
curl -O https://raw.githubusercontent.com/AlexMercedCoder/Pangolin/main/pangolin/pangolin_store/sql/sqlite_schema.sql

# Apply schema
sqlite3 pangolin.db < sqlite_schema.sql

Performance Tuning

Recommended PRAGMA Settings

-- Enable foreign keys (required)
PRAGMA foreign_keys = ON;

-- Performance optimizations
PRAGMA journal_mode = WAL;           -- Write-Ahead Logging for better concurrency
PRAGMA synchronous = NORMAL;         -- Balance between safety and speed
PRAGMA cache_size = -64000;          -- 64MB cache
PRAGMA temp_store = MEMORY;          -- Use memory for temp tables
PRAGMA mmap_size = 30000000000;      -- 30GB memory-mapped I/O
PRAGMA page_size = 4096;             -- 4KB pages (default)

Monitoring

-- Database size
SELECT page_count * page_size as size FROM pragma_page_count(), pragma_page_size();

-- Table sizes
SELECT 
  name,
  SUM(pgsize) as size
FROM dbstat
GROUP BY name
ORDER BY size DESC;

-- Index usage
EXPLAIN QUERY PLAN SELECT * FROM catalogs WHERE tenant_id = '...';

Backup and Restore

File-Based Backup

# Simple file copy (database must be idle)
cp pangolin.db pangolin_backup.db

# Using SQLite backup command (safe while database is in use)
sqlite3 pangolin.db ".backup pangolin_backup.db"

# Dump to SQL
sqlite3 pangolin.db .dump > pangolin_backup.sql

# Restore from SQL
sqlite3 pangolin_new.db < pangolin_backup.sql

Automated Backups

#!/bin/bash
# backup_sqlite.sh

BACKUP_DIR="/backups/pangolin"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
DB_FILE="/path/to/pangolin.db"

mkdir -p $BACKUP_DIR
sqlite3 $DB_FILE ".backup $BACKUP_DIR/pangolin_$TIMESTAMP.db"

# Keep only last 7 days
find $BACKUP_DIR -name "pangolin_*.db" -mtime +7 -delete

High Availability

SQLite does not support built-in replication. For high availability:

  1. Use PostgreSQL or MongoDB for production
  2. File-level replication: Use tools like Litestream
  3. Regular backups: Automated backup scripts

Litestream (Continuous Replication)

# Install Litestream
curl -s https://api.github.com/repos/benbjohnson/litestream/releases/latest \
  | grep "browser_download_url.*linux-amd64.tar.gz" \
  | cut -d : -f 2,3 \
  | tr -d \" \
  | wget -qi -

# Configure litestream.yml
replicas:
  - url: s3://my-bucket/pangolin.db
    
# Run Litestream
litestream replicate pangolin.db

Security Best Practices

  1. File Permissions: Restrict access to database file

    chmod 600 pangolin.db
    chown pangolin:pangolin pangolin.db
  2. Encryption at Rest: Use filesystem encryption (LUKS, dm-crypt)

  3. Backup Encryption: Encrypt backup files

    sqlite3 pangolin.db .dump | gpg -c > pangolin_backup.sql.gpg
  4. Network Security: SQLite has no network access (inherently secure)

Migration

To PostgreSQL

# Export from SQLite
sqlite3 pangolin.db .dump > sqlite_dump.sql

# Convert to PostgreSQL (requires manual editing)
# Then import to PostgreSQL
psql -U pangolin -d pangolin -f converted_dump.sql

To MongoDB

Custom migration script required.

Troubleshooting

Database Locked

# Check for locks
lsof | grep pangolin.db

# Force unlock (use with caution)
fuser -k pangolin.db

Corruption

# Check integrity
sqlite3 pangolin.db "PRAGMA integrity_check;"

# Recover from corruption
sqlite3 pangolin.db ".recover" | sqlite3 pangolin_recovered.db

Performance Issues

-- Analyze database
ANALYZE;

-- Vacuum to reclaim space
VACUUM;

-- Rebuild indexes
REINDEX;

Test Results

All 6 SQLite tests passing:

  • test_sqlite_tenant_crud
  • test_sqlite_warehouse_crud
  • test_sqlite_catalog_crud
  • test_sqlite_namespace_operations
  • test_sqlite_asset_operations
  • test_sqlite_multi_tenant_isolation

Development Workflow

# Start Pangolin with SQLite
DATABASE_URL=sqlite://./dev.db cargo run --bin pangolin_api

# Inspect database
sqlite3 dev.db
sqlite> .tables
sqlite> .schema catalogs
sqlite> SELECT * FROM tenants;

# Reset database
rm dev.db
# Restart Pangolin to recreate schema

Additional Resources

Next Steps