Pangolin is a Rust-based, multi-tenant, branch-aware lakehouse catalog. It is fully compatible with the Apache Iceberg REST API while providing enterprise-grade extensions for Git-like branching, unified discovery, and cross-catalog federation.
- Framework: Axum (Async Rust).
- Core Engine: Handles HTTP routing, JSON (de)serialization, and OpenAPI schema generation via `utoipa`.
- Key Modules:
  - `iceberg_handlers`: Faithful implementation of the Iceberg REST specification (Namespaces, Tables, Scans).
  - `branch_handlers`: Logic for Git-like workflows (Branching, Tagging, Merging).
  - `business_metadata_handlers`: Business catalog features (Search, Tags, Access Requests).
  - `auth_handlers`: Multi-mode authentication logic (JWT, API Keys, OAuth2).
  - `management_handlers`: Administrative CRUD for Tenants, Warehouses, and Users.
  - `federated_handlers`: Proxy logic for external REST catalogs.
- Responsibility: Defines the system's "Source of Truth" models and validation logic.
- Key Models:
  - `Tenant`: The root multi-tenancy unit; all data is isolated by Tenant ID.
  - `Asset`: Unified representation of `IcebergTable`, `View`, and other data resources.
  - `Branch`: References to commit chains (`Ingest` vs `Experimental` types).
  - `Commit`: Immutable snapshots of catalog state tracking `Put` and `Delete` operations.
  - `PermissionScope`: Granular target definitions (Tenant, Catalog, Namespace, Asset, Tag).
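To make the `Commit` model concrete, here is a minimal sketch of how a chain of immutable commits carrying `Put` and `Delete` operations could be replayed into the current catalog state. The field names, the `materialize` function, and the use of a metadata location string as the stored value are illustrative assumptions, not Pangolin's actual types.

```rust
use std::collections::BTreeMap;

/// A single change recorded in a commit (mirrors the `Put` / `Delete` operations).
/// Field names are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Operation {
    /// Create or replace the asset stored under `key`.
    Put { key: String, metadata_location: String },
    /// Remove the asset stored under `key`.
    Delete { key: String },
}

/// An immutable commit: a list of operations plus a link to its parent.
#[derive(Debug, Clone)]
pub struct Commit {
    pub id: u64,
    pub parent: Option<u64>,
    pub operations: Vec<Operation>,
}

/// Replays a commit chain (oldest first) into a map of
/// asset key -> metadata location.
pub fn materialize(chain: &[Commit]) -> BTreeMap<String, String> {
    let mut state = BTreeMap::new();
    for commit in chain {
        for op in &commit.operations {
            match op {
                Operation::Put { key, metadata_location } => {
                    state.insert(key.clone(), metadata_location.clone());
                }
                Operation::Delete { key } => {
                    state.remove(key);
                }
            }
        }
    }
    state
}
```

Because commits are never mutated, the state of any branch can in principle be rebuilt by replaying the chain its head points to.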
- Metadata Persistence: Abstracted via the `CatalogStore` trait.
  - Modular Backends: All backends are refactored into focused submodules (e.g., `tenants.rs`, `warehouses.rs`, `assets.rs`) for better maintainability.
  - `MemoryStore`: Concurrent in-memory store for rapid development/testing.
  - `PostgresStore`: SQL backend using `sqlx` for production scale.
  - `MongoStore`: Document backend for high-availability deployments.
  - `SqliteStore`: Embedded backend for local dev and edge use cases.
  - Performance: Direct `assets_by_id` lookup for O(1) authorization checks.
- Data Storage: Object storage (S3/GCS/Azure) via the `object_store` crate.
- Credential Vending: Integrated `Signer` trait to vend temporary tokens (AWS STS, Azure SAS, GCP Downscoped).
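The storage abstraction can be pictured with a small sketch: a trait that every backend implements, plus an in-memory backend whose `assets_by_id` map gives constant-time (average-case) lookups. Everything here is a simplified assumption for illustration. The real `CatalogStore` trait is likely async and far larger, and the method names `put_asset` and `get_asset` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

pub type TenantId = u32;

#[derive(Debug, Clone, PartialEq)]
pub struct Asset {
    pub id: u64,
    pub name: String,
}

/// Hypothetical, trimmed-down stand-in for the `CatalogStore` trait.
/// Every method takes a tenant so that queries are always tenant-scoped.
pub trait CatalogStore: Send + Sync {
    fn put_asset(&self, tenant: TenantId, asset: Asset);
    fn get_asset(&self, tenant: TenantId, asset_id: u64) -> Option<Asset>;
}

/// In-memory backend: a lock-protected hash map keyed by (tenant, asset id).
#[derive(Default)]
pub struct MemoryStore {
    assets_by_id: RwLock<HashMap<(TenantId, u64), Asset>>,
}

impl CatalogStore for MemoryStore {
    fn put_asset(&self, tenant: TenantId, asset: Asset) {
        let mut map = self.assets_by_id.write().unwrap();
        map.insert((tenant, asset.id), asset);
    }

    fn get_asset(&self, tenant: TenantId, asset_id: u64) -> Option<Asset> {
        // A single hash lookup: no scan over namespaces or tables.
        let map = self.assets_by_id.read().unwrap();
        map.get(&(tenant, asset_id)).cloned()
    }
}
```

Handlers written against `&dyn CatalogStore` (or a generic `S: CatalogStore`) do not need to know which backend is active, which is what makes swapping Postgres, Mongo, SQLite, or memory possible.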
- Authentication:
- JWT: Standard for UI and corporate identity access.
- API Keys: Managed via Service Users for machine-to-machine/CI-CD access. Includes automatic rotation and usage tracking.
- OAuth 2.0 / OIDC: Native integration with Google, Microsoft, GitHub, and custom providers.
- Authorization:
- RBAC: Role-based access control with 3 default tiers (Root, TenantAdmin, TenantUser).
- TBAC: Tag-based access control allowing permissions to flow to assets with specific business labels.
- Access Requests: Integrated workflow for users to request access to restricted assets via the UI.
- Tenant Isolation: Strictly enforced at the middleware layer; all store queries are scoped by `tenant_id`.
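The interplay of roles, scopes, and tags can be sketched as a single check. This is an assumption-laden illustration, not Pangolin's evaluator: the `User` and `AssetRef` structs, the `can_read` function, and the rule that `Root` bypasses tenant boundaries are all hypothetical choices made to keep the example small.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Root,
    TenantAdmin,
    TenantUser,
}

/// Targets a grant can point at (Tenant, Catalog, Namespace, Asset, Tag).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PermissionScope {
    Tenant,
    Catalog(String),
    Namespace { catalog: String, namespace: String },
    Asset(u64),
    Tag(String),
}

pub struct AssetRef {
    pub tenant_id: u32,
    pub id: u64,
    pub catalog: String,
    pub namespace: String,
    pub tags: HashSet<String>,
}

pub struct User {
    pub tenant_id: u32,
    pub role: Role,
    pub grants: Vec<PermissionScope>,
}

/// Returns true if `user` may read `asset`.
pub fn can_read(user: &User, asset: &AssetRef) -> bool {
    // Assumption: Root is a cross-tenant superuser.
    if user.role == Role::Root {
        return true;
    }
    // Tenant isolation comes before any grant evaluation.
    if user.tenant_id != asset.tenant_id {
        return false;
    }
    if user.role == Role::TenantAdmin {
        return true;
    }
    // TenantUser: any matching grant is sufficient.
    user.grants.iter().any(|scope| match scope {
        PermissionScope::Tenant => true,
        PermissionScope::Catalog(c) => *c == asset.catalog,
        PermissionScope::Namespace { catalog, namespace } => {
            *catalog == asset.catalog && *namespace == asset.namespace
        }
        PermissionScope::Asset(id) => *id == asset.id,
        // TBAC: the grant follows the business tag, not the asset's location.
        PermissionScope::Tag(tag) => asset.tags.contains(tag),
    })
}
```

With a `Tag` grant, newly created assets become readable as soon as they carry the tag, without any per-asset permission update.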
- Branching Engine: Supports full and partial catalog branching.
- Fork-on-Write: Writes to a branch create new commits without affecting the parent branch until merged.
- 3-Way Merging: Automated conflict detection using common ancestor (base commit) analysis.
- Conflict Types: Detects Schema, Data (partition overlap), Metadata, and Deletion conflicts.
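The common-ancestor analysis behind a 3-way merge can be sketched over simplified snapshots that map an asset key to a metadata pointer. This is a generic illustration under stated assumptions: the `Snapshot` type, the `three_way_merge` and `snapshot` functions, and the two-variant `Conflict` enum are hypothetical, and the sketch does not attempt the Schema, Data, or Metadata classifications listed above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified catalog state: asset key -> metadata pointer.
pub type Snapshot = BTreeMap<String, String>;

#[derive(Debug, PartialEq)]
pub enum Conflict {
    /// Both branches changed (or added) the same asset with different values.
    BothModified(String),
    /// One branch deleted an asset that the other branch modified.
    DeleteVsModify(String),
}

/// Convenience constructor for examples.
pub fn snapshot(pairs: &[(&str, &str)]) -> Snapshot {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Merges `ours` and `theirs` relative to their common ancestor `base`.
pub fn three_way_merge(
    base: &Snapshot,
    ours: &Snapshot,
    theirs: &Snapshot,
) -> Result<Snapshot, Vec<Conflict>> {
    let keys: BTreeSet<&String> =
        base.keys().chain(ours.keys()).chain(theirs.keys()).collect();
    let mut merged = Snapshot::new();
    let mut conflicts = Vec::new();

    for key in keys {
        let (b, o, t) = (base.get(key), ours.get(key), theirs.get(key));
        let winner = if o == t {
            o // both sides agree (including "both deleted")
        } else if o == b {
            t // only theirs changed
        } else if t == b {
            o // only ours changed
        } else {
            // Both sides diverged from the base in different ways.
            conflicts.push(if o.is_none() || t.is_none() {
                Conflict::DeleteVsModify(key.clone())
            } else {
                Conflict::BothModified(key.clone())
            });
            continue;
        };
        if let Some(value) = winner {
            merged.insert(key.clone(), value.clone());
        }
    }

    if conflicts.is_empty() { Ok(merged) } else { Err(conflicts) }
}
```

The base commit is what distinguishes "only one side changed" (safe to auto-merge) from "both sides changed" (a conflict); a plain two-way diff cannot tell these cases apart.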
- REST Proxy: Pangolin acts as a unified entry point, proxying requests to external Iceberg REST catalogs.
- Auth Translation: Translates Pangolin credentials to the authentication required by remote catalogs (Basic, Bearer, ApiKey).
- Global Governance: Apply Pangolin RBAC and Audit policies even to external federated data.
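Auth translation amounts to mapping the credential type configured for a remote catalog onto the header that catalog expects. The sketch below illustrates the idea with the three modes named above. The `RemoteAuth` enum, the `outbound_header` function, and in particular the `X-API-Key` header name are assumptions; the base64 encoder is hand-rolled only to keep the example free of external crates.

```rust
/// Credentials a remote catalog may require (Basic, Bearer, ApiKey).
pub enum RemoteAuth {
    Basic { username: String, password: String },
    Bearer { token: String },
    ApiKey { key: String },
}

/// Standard base64 (RFC 4648) encoder with padding.
fn base64_encode(input: &[u8]) -> String {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { ALPHABET[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Builds the (header name, header value) pair to attach to the proxied request.
pub fn outbound_header(auth: &RemoteAuth) -> (&'static str, String) {
    match auth {
        RemoteAuth::Basic { username, password } => {
            let raw = format!("{}:{}", username, password);
            ("Authorization", format!("Basic {}", base64_encode(raw.as_bytes())))
        }
        RemoteAuth::Bearer { token } => ("Authorization", format!("Bearer {}", token)),
        // Assumption: the remote catalog reads API keys from this header.
        RemoteAuth::ApiKey { key } => ("X-API-Key", key.clone()),
    }
}
```

The caller's own Pangolin credential is never forwarded in this model; only the credential stored for the federated catalog reaches the remote endpoint.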
- Stack: SvelteKit + Vanilla CSS (modern, glassmorphic design).
- Features: Visual management of Users, Permissions, Audit Logs, and the Data Discovery portal.
- pangolin-admin: High-level system administration (Tenants, Warehouses, Roles).
- pangolin-user: Engineering workflow tool (Branching, Code generation, Search).
- Auth Middleware: Resolves the user identity from the request header and validates the JWT or API Key.
- Tenant Middleware: Resolves `X-Pangolin-Tenant` or extracts it from the user session.
- Handler: Executes business logic. Interacts with `CatalogStore` for metadata.
- Vending (Optional): If requesting a table load, vends temporary S3/Cloud credentials.
- Audit Middleware: Asynchronously logs the operation result (Success/Failure) to the audit store.
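The lifecycle above can be sketched as a chain of plain functions, without any web framework. This is a simplified model under several assumptions: the `Pipeline`, `Request`, `Identity`, and `AuditEntry` types are hypothetical, token validation is replaced by a lookup table, header keys are stored lowercase, credential vending is omitted, and the audit entry is written synchronously rather than asynchronously.

```rust
use std::collections::HashMap;

pub struct Request {
    pub headers: HashMap<String, String>,
    pub path: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub user: String,
    pub session_tenant: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum ApiError {
    Unauthenticated,
    MissingTenant,
}

#[derive(Debug, PartialEq)]
pub struct AuditEntry {
    pub user: Option<String>,
    pub path: String,
    pub success: bool,
}

pub struct Pipeline {
    /// Stand-in for JWT / API-key validation: token -> identity.
    pub known_tokens: HashMap<String, Identity>,
    /// Stand-in for the audit store.
    pub audit_log: Vec<AuditEntry>,
}

impl Pipeline {
    /// Auth step: resolve the identity from the Authorization header.
    fn authenticate(&self, req: &Request) -> Result<Identity, ApiError> {
        req.headers
            .get("authorization")
            .and_then(|value| value.strip_prefix("Bearer "))
            .and_then(|token| self.known_tokens.get(token))
            .cloned()
            .ok_or(ApiError::Unauthenticated)
    }

    /// Tenant step: prefer the explicit header, fall back to the session.
    fn resolve_tenant(req: &Request, identity: &Identity) -> Result<String, ApiError> {
        req.headers
            .get("x-pangolin-tenant")
            .cloned()
            .or_else(|| identity.session_tenant.clone())
            .ok_or(ApiError::MissingTenant)
    }

    pub fn handle(&mut self, req: &Request) -> Result<String, ApiError> {
        let identity = self.authenticate(req);
        let user = identity.as_ref().ok().map(|i| i.user.clone());
        let result = identity.and_then(|identity| {
            let tenant = Self::resolve_tenant(req, &identity)?;
            // Handler step: real business logic would query the store here.
            Ok(format!("tenant={} user={} path={}", tenant, identity.user, req.path))
        });
        // Audit step: record the outcome whether the request succeeded or failed.
        self.audit_log.push(AuditEntry {
            user,
            path: req.path.clone(),
            success: result.is_ok(),
        });
        result
    }
}
```

Note that the audit entry is recorded for rejected requests as well, which matches the Success/Failure logging described in the last step.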
- Store Tests: Generic test suite run against all 4 backends (Memory, PG, Mongo, SQLite).
- API Tests: Axum integration tests covering all RBAC permutations.
- Live Verification: End-to-end scripts using PyIceberg, Spark, and the Pangolin CLI.