PostgreSQL Backend Storage

PostgreSQL is a powerful, open-source relational database that provides excellent performance, reliability, and feature set for Pangolin's metadata storage.

Overview

PostgreSQL stores all Pangolin catalog metadata in a structured SQL schema with:

Strong ACID guarantees
Foreign key constraints for data integrity
Efficient indexing for fast queries
Built-in replication and backup tools
Mature ecosystem and tooling

Pros and Cons

✅ Advantages

Battle-tested: Decades of production use
ACID Compliance: Strong consistency guarantees
Rich Query Capabilities: Complex joins, aggregations, window functions
Foreign Keys: Automatic referential integrity
Managed Options: AWS RDS, Azure Database, Google Cloud SQL
Excellent Tooling: pgAdmin, DBeaver, psql, and many others
JSON Support: JSONB for flexible property storage
Full-text Search: Built-in search capabilities
Extensions: PostGIS, pg_partman, and hundreds more

⚠️ Considerations

Setup Complexity: Requires database server installation
Resource Usage: Higher memory footprint than SQLite
Vertical Scaling: Primarily scales up (bigger server)
Cost: Managed services have ongoing costs

Use Cases

Best For:

Production deployments
Multi-user environments
High-concurrency workloads
Enterprise deployments
When you need strong consistency

Not Ideal For:

Embedded applications
Edge/IoT devices
Serverless with cold starts
When you need horizontal scaling (consider MongoDB)

Prerequisites

PostgreSQL 12+ installed
Database created for Pangolin
User with full permissions on the database

Installation

Local Development (Docker)

# Start PostgreSQL
docker run -d \
  --name pangolin-postgres \
  -e POSTGRES_DB=pangolin \
  -e POSTGRES_USER=pangolin \
  -e POSTGRES_PASSWORD=your_password \
  -p 5432:5432 \
  postgres:16

# Verify connection
psql -h localhost -U pangolin -d pangolin

Production (AWS RDS)

# Create RDS instance via AWS Console or CLI
aws rds create-db-instance \
  --db-instance-identifier pangolin-prod \
  --db-instance-class db.t3.medium \
  --engine postgres \
  --engine-version 16.1 \
  --master-username pangolin \
  --master-user-password your_secure_password \
  --allocated-storage 100 \
  --storage-type gp3 \
  --vpc-security-group-ids sg-xxxxx \
  --db-subnet-group-name your-subnet-group \
  --backup-retention-period 7 \
  --preferred-backup-window "03:00-04:00" \
  --preferred-maintenance-window "mon:04:00-mon:05:00"

Configuration

Environment Variables

# Connection string format:
# postgresql://username:password@host:port/database
# postgres://username:password@host:port/database

# Local development
DATABASE_URL=postgres://pangolin:your_password@localhost:5432/pangolin

# Production (RDS)
DATABASE_URL=postgresql://pangolin:password@pangolin-prod.xxxxx.us-east-1.rds.amazonaws.com:5432/pangolin

# With SSL (recommended for production)
DATABASE_URL=postgresql://pangolin:password@host:5432/pangolin?sslmode=require

# Connection pool settings (optional)
DATABASE_MAX_CONNECTIONS=10
DATABASE_MIN_CONNECTIONS=2
DATABASE_CONNECT_TIMEOUT=30

Schema Initialization

Pangolin automatically initializes and migrates the PostgreSQL schema on startup using sqlx. Manual schema application is generally not required unless you are performing a custom installation or troubleshooting.

The comprehensive schema includes:

Core Infrastructure:

tenants - Multi-tenant isolation records
warehouses - Storage configurations (S3, Azure, GCP)
catalogs - Iceberg catalog definitions
namespaces - Namespace hierarchies and properties
assets - Table and view metadata pointers (Isolated by branch_name)
branches - Git-like branch definitions
tags - Immutable commit pointers (snapshots)
commits - Detailed operation history
metadata_locations - Physical Iceberg metadata file path tracking

Governance & Security:

users - Root and Tenant user accounts
service_users - Machine-to-machine API identities
roles - RBAC role definitions
user_roles - Role assignments to users
permissions - Direct (TBAC/RBAC) permission grants
access_requests - Data discovery access workflows
audit_logs - Comprehensive tamper-evident trail
merge_operations - Branch merge tracking
merge_conflicts - Conflict details for merges
revoked_tokens - Blacklisted JWT identifiers
active_tokens - (Internal) Session tracking for active tokens

System & Maintenance:

system_settings - Tenant-specific configuration overrides
federated_sync_stats - Status tracking for cross-tenant federation

Indexes:

Primary keys on all tables
Foreign key indexes for joining tenant/catalog/namespace context
Composite indexes for optimized listing (e.g., idx_assets_tenant_catalog)
GIN indexes for efficient JSONB and Array searches (PostgreSQL specific)

Manual Schema Application

Warning

Manual application of postgres_schema.sql may bypass migration tracking. It is recommended to let Pangolin handle initialization automatically.

If you must apply the schema manually:

# Download the current consolidated schema
curl -O https://raw.githubusercontent.com/AlexMercedCoder/Pangolin/main/pangolin/pangolin_store/sql/postgres_schema.sql

# Apply to your database
psql -h localhost -U pangolin -d pangolin -f postgres_schema.sql

Performance Tuning

Monitoring

-- Check active connections
SELECT count(*) FROM pg_stat_activity WHERE datname = 'pangolin';

-- Check table sizes
SELECT 
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Check index usage
SELECT 
  schemaname,
  tablename,
  indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

Backup and Restore

Automated Backups (RDS)

AWS RDS provides automated backups:

Daily snapshots
Point-in-time recovery
Cross-region replication

Manual Backups

# Backup
pg_dump -h localhost -U pangolin -d pangolin -F c -f pangolin_backup.dump

# Restore
pg_restore -h localhost -U pangolin -d pangolin -c pangolin_backup.dump

# Backup to SQL (human-readable)
pg_dump -h localhost -U pangolin -d pangolin > pangolin_backup.sql

# Restore from SQL
psql -h localhost -U pangolin -d pangolin < pangolin_backup.sql

High Availability

Replication

-- On primary server
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET max_wal_senders = 3;
ALTER SYSTEM SET wal_keep_size = '1GB';

-- Create replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'repl_password';

-- On replica server
# In recovery.conf or postgresql.auto.conf
primary_conninfo = 'host=primary_host port=5432 user=replicator password=repl_password'

Connection Pooling

Use PgBouncer for connection pooling:

# pgbouncer.ini
[databases]
pangolin = host=localhost port=5432 dbname=pangolin

[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25

Then connect to PgBouncer instead:

DATABASE_URL=postgresql://pangolin:password@localhost:6432/pangolin

Troubleshooting

Connection Issues

# Test connection
psql -h localhost -U pangolin -d pangolin -c "SELECT version();"

# Check if PostgreSQL is running
sudo systemctl status postgresql

# Check logs
sudo tail -f /var/log/postgresql/postgresql-16-main.log

Performance Issues

-- Find slow queries
SELECT 
  pid,
  now() - query_start AS duration,
  query,
  state
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - query_start > interval '5 seconds'
ORDER BY duration DESC;

-- Analyze table statistics
ANALYZE VERBOSE;

-- Rebuild indexes
REINDEX DATABASE pangolin;

Security Best Practices

Use SSL/TLS: Always use sslmode=require in production
Strong Passwords: Use long, random passwords
Least Privilege: Grant only necessary permissions
Network Security: Use VPC/security groups to restrict access
Regular Updates: Keep PostgreSQL updated
Audit Logging: Enable PostgreSQL audit logging
Encryption at Rest: Use encrypted storage (EBS encryption for RDS)

Migration from Other Backends

From SQLite

# Export from SQLite
sqlite3 pangolin.db .dump > sqlite_dump.sql

# Convert to PostgreSQL format (manual editing required)
# Then import:
psql -h localhost -U pangolin -d pangolin -f converted_dump.sql

From MongoDB

Custom migration script required. Contact support or see migration guide.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PostgreSQL Backend Storage

Overview

Pros and Cons

✅ Advantages

⚠️ Considerations

Use Cases

Prerequisites

Installation

Local Development (Docker)

Production (AWS RDS)

Configuration

Environment Variables

Schema Initialization

Manual Schema Application

Performance Tuning

Recommended PostgreSQL Settings

Monitoring

Backup and Restore

Automated Backups (RDS)

Manual Backups

High Availability

Replication

Connection Pooling

Troubleshooting

Connection Issues

Performance Issues

Security Best Practices

Migration from Other Backends

From SQLite

From MongoDB

Additional Resources

Next Steps

FilesExpand file tree

postgresql.md

Latest commit

History

postgresql.md

File metadata and controls

PostgreSQL Backend Storage

Overview

Pros and Cons

✅ Advantages

⚠️ Considerations

Use Cases

Prerequisites

Installation

Local Development (Docker)

Production (AWS RDS)

Configuration

Environment Variables

Schema Initialization

Manual Schema Application

Performance Tuning

Recommended PostgreSQL Settings

Monitoring

Backup and Restore

Automated Backups (RDS)

Manual Backups

High Availability

Replication

Connection Pooling

Troubleshooting

Connection Issues

Performance Issues

Security Best Practices

Migration from Other Backends

From SQLite

From MongoDB

Additional Resources

Next Steps