Commit e949a1f

inmemory store is now modularized, modularizing backends complete
1 parent 934f9fd commit e949a1f

62 files changed

Lines changed: 3184 additions & 761 deletions

.gitignore

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,3 +31,13 @@ ui_*.log
3131
# Test output files
3232
test_output*.txt
3333
*_output*.txt
34+
35+
# Databases
36+
*.db
37+
*.sqlite
38+
*.sqlite3
39+
40+
# Temporary/Debug files
41+
temp_*
42+
tmp_*
43+
debug_*

docs/api/README.md

Lines changed: 16 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -71,7 +71,9 @@ Authentication methods:
7171
- `GET/PUT /api/v1/access-requests/{id}` - Get/update access request
7272

7373
**Audit Logs**:
74-
- `GET /api/v1/audit` - List audit events
74+
- `GET /api/v1/audit` - List audit events (with filtering)
75+
- `GET /api/v1/audit/count` - Get audit event counts
76+
- `GET /api/v1/audit/{id}` - Get specific audit event
7577

7678
**Warehouse Management**:
7779
- `GET /api/v1/warehouses` - List warehouses
@@ -82,40 +84,19 @@ Authentication methods:
8284
- `GET /api/v1/catalogs` - List catalogs
8385
- `POST /api/v1/catalogs` - Create catalog
8486
- `GET /api/v1/catalogs/{name}` - Get catalog
85-
- `PUT /api/v1/catalogs/{name}` - Update catalog
86-
- `DELETE /api/v1/catalogs/{name}` - Delete catalog
87-
88-
**Federated Catalog Management**:
89-
- `GET /api/v1/federated-catalogs` - List federated catalogs
90-
- `POST /api/v1/federated-catalogs` - Create federated catalog
91-
- `GET /api/v1/federated-catalogs/{name}` - Get federated catalog
92-
- `DELETE /api/v1/federated-catalogs/{name}` - Delete federated catalog
93-
- `POST /api/v1/federated-catalogs/{name}/test` - Test federated connection
94-
95-
**Permission & Role Management**:
96-
- `GET /api/v1/roles` - List roles
97-
- `POST /api/v1/roles` - Create role
98-
- `GET /api/v1/permissions` - List permissions
99-
- `POST /api/v1/permissions` - Grant permission
100-
- `DELETE /api/v1/permissions/{id}` - Revoke permission
101-
102-
**Token Management**:
103-
- `POST /api/v1/tokens` - Generate JWT token
104-
- `POST /api/v1/auth/revoke` - Revoke current token
105-
- `POST /api/v1/auth/revoke/{token_id}` - Revoke specific token
106-
- `GET /api/v1/users/{user_id}/tokens` - List user tokens (Admin)
107-
- `DELETE /api/v1/tokens/{token_id}` - Delete specific token (Admin)
108-
109-
**System Configuration**:
110-
- `GET /api/v1/config/settings` - Get system settings (Admin)
111-
- `PUT /api/v1/config/settings` - Update system settings (Admin)
112-
113-
**Federated Catalog Operations**:
114-
- `POST /api/v1/federated-catalogs/{name}/sync` - Trigger sync
115-
- `GET /api/v1/federated-catalogs/{name}/stats` - Get sync stats
116-
117-
**Data Explorer**:
118-
- `GET /api/v1/catalogs/{prefix}/namespaces/tree` - Get namespace tree structure
87+
- `GET /api/v1/catalogs/{prefix}/namespaces/tree` - Get tree structure
88+
89+
**Service Users & API Keys**:
90+
- `POST /api/v1/service-users` - Create service user
91+
- `GET /api/v1/service-users` - List service users
92+
- `POST /api/v1/service-users/{id}/rotate` - Rotate API key
93+
94+
## Interactive Documentation
95+
96+
Pangolin provides an interactive Swagger UI for live API exploration:
97+
98+
- **Swagger UI**: `http://localhost:8080/swagger-ui`
99+
- **OpenAPI JSON**: `http://localhost:8080/api-docs/openapi.json`
119100

120101
## Contents
121102

docs/api/api_overview.md

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@ Pangolin exposes a multi-tenant REST API split into three core functional areas:
1313
| `tokens` | POST | Generate a long-lived JWT token for a specific user. |
1414
| `auth/revoke` | POST | Invalidate current user token. |
1515
| `service-users` | GET/POST | Manage programmatic API identities. |
16+
| `service-users/{id}` | GET/PUT/DELETE | Manage specific service user. |
1617
| `service-users/{id}/rotate` | POST | Rotate API key for a service user. |
1718

1819
---
@@ -62,10 +63,12 @@ Pangolin is 100% compliant with the [Apache Iceberg REST Specification](https://
6263
### 4. Search & Optimization
6364
| Endpoint | Method | Use Case |
6465
| :--- | :--- | :--- |
65-
| `search` | GET | Unified search across Catalogs, Namespaces, and Tables. |
66-
| `search/assets` | GET | Optimized lookup for specific tables/views by name. |
67-
| `validate/names` | POST | Check if an asset name is valid and available. |
68-
| `bulk/assets/delete` | POST | Bulk deletion of assets. |
66+
| `search` | GET | Unified search across Catalogs, Namespaces, Tables, and Branches. |
67+
| `search/assets` | GET | Optimized lookup for specific tables/views by name with permission filtering. |
68+
| `validate/names` | POST | Check if a catalog or warehouse name is available. |
69+
| `bulk/assets/delete` | POST | Bulk deletion of assets (up to 100). |
70+
| `dashboard/stats` | GET | Global platform statistics. |
71+
| `catalogs/{name}/summary` | GET | Catalog-specific statistics. |
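Annotation: the 100-asset cap on `bulk/assets/delete` means a client with more IDs has to split them. The sketch below shows one way to build the request bodies. The `asset_ids` key matches the curl example elsewhere in this commit; the helper names are hypothetical, and how the server reacts to an oversized request is not stated here.

```python
from typing import Dict, Iterator, List

MAX_BULK_DELETE = 100  # cap stated in the endpoint table


def batch_asset_ids(asset_ids: List[str], batch_size: int = MAX_BULK_DELETE) -> Iterator[List[str]]:
    """Yield consecutive slices holding at most batch_size IDs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(asset_ids), batch_size):
        yield asset_ids[start:start + batch_size]


def bulk_delete_payloads(asset_ids: List[str]) -> List[Dict[str, List[str]]]:
    """Build one JSON-ready body per request (hypothetical helper)."""
    return [{"asset_ids": batch} for batch in batch_asset_ids(asset_ids)]


if __name__ == "__main__":
    ids = [f"id-{i}" for i in range(250)]
    for body in bulk_delete_payloads(ids):
        print(len(body["asset_ids"]))
```

Each payload would be sent as its own `POST`; the sketch deliberately stops short of making network calls.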
6972

7073

7174
---

docs/api/authentication.md

Lines changed: 14 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -8,9 +8,9 @@ Pangolin implements a secure authentication system based on JSON Web Tokens (JWT
88
- **Root Login**: Omit `tenant-id` or set to `null`
99
- **Tenant-Scoped Login**: Include `tenant-id` with tenant UUID
1010
2. **Token Issuance**: Upon successful authentication, the server returns a signed JWT.
11-
3. **Authenticated Requests**: Clients must include this JWT in the `Authorization` header of subsequent requests:
11+
3. **API Key Authentication (Service Users)**: Machine accounts use a static API key passed in the `X-API-Key` header.
1212
```
13-
Authorization: Bearer <token>
13+
X-API-Key: <your-api-key>
1414
```
1515
1616
### Login Examples
@@ -47,20 +47,23 @@ Pangolin supports the following roles:
4747
- **TenantAdmin**: Tenant-level administration. Can manage warehouses, catalogs, and users within a tenant.
4848
- **TenantUser**: Standard access. Can read/write data based on catalog permissions.
4949

50-
## Token Generation
50+
## API Key Authentication (Service Users)
5151

52-
For automation and scripts, users can generate long-lived JWT tokens:
52+
Service users are intended for machine-to-machine communication (CI/CD, automated scripts). They do not use JWT tokens; instead, they use a persistent API key.
5353

5454
```bash
55-
curl -X POST http://localhost:8080/api/v1/tokens \
56-
-H "Content-Type: application/json" \
57-
-d '{
58-
"tenant_id": "your-tenant-id",
59-
"username": "your-username",
60-
"expires_in_hours": 720
61-
}'
55+
curl http://localhost:8080/api/v1/catalogs \
56+
-H "X-API-Key: pgl_key_abc123..." \
57+
-H "X-Pangolin-Tenant: <tenant-uuid>"
6258
```
6359

60+
> [!TIP]
61+
> API keys are only displayed once upon creation. If lost, the key must be rotated via the `/api/v1/service-users/{id}/rotate` endpoint.
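Annotation: the two credential types map to different request headers. The sketch below selects headers for a client, assuming a caller holds either a JWT or a service-user API key but never both. The function name `auth_headers` is hypothetical; the header names are the ones used in these docs.

```python
from typing import Dict, Optional


def auth_headers(
    *,
    jwt: Optional[str] = None,
    api_key: Optional[str] = None,
    tenant_id: Optional[str] = None,
) -> Dict[str, str]:
    """Return request headers for exactly one credential type."""
    if (jwt is None) == (api_key is None):
        raise ValueError("provide exactly one of jwt or api_key")
    headers: Dict[str, str] = {}
    if api_key is not None:
        # Service users: static key in X-API-Key.
        headers["X-API-Key"] = api_key
    else:
        # Human users: signed JWT as a Bearer token.
        headers["Authorization"] = f"Bearer {jwt}"
    if tenant_id is not None:
        headers["X-Pangolin-Tenant"] = tenant_id
    return headers


if __name__ == "__main__":
    print(auth_headers(api_key="pgl_key_example", tenant_id="tenant-uuid"))
```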
62+
63+
## Token Generation (Users)
64+
65+
For programmatic access by human users, you can generate long-lived JWT tokens:
66+
6467
## Token Revocation
6568

6669
Tokens can be revoked for security:

docs/api/curl_examples.md

Lines changed: 21 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -460,7 +460,8 @@ curl -X POST http://localhost:8080/api/v1/service-users \
460460
-d '{
461461
"name": "ci-cd-pipeline",
462462
"description": "Service user for CI/CD automation",
463-
"expires_at": "2026-12-31T23:59:59Z"
463+
"role": "TenantUser",
464+
"expires_in_days": 30
464465
}'
465466
```
466467

@@ -523,9 +524,12 @@ curl -X POST http://localhost:8080/api/v1/permissions \
523524
-H "Authorization: Bearer $TOKEN" \
524525
-H "Content-Type: application/json" \
525526
-d '{
526-
"user_id": "USER_ID",
527-
"scope": "catalog:production",
528-
"actions": ["read", "write"]
527+
"user-id": "USER_ID",
528+
"scope": {
529+
"type": "catalog",
530+
"catalog-id": "550e8400-e29b-41d4-a716-446655440000"
531+
},
532+
"actions": ["Read", "Write"]
529533
}'
530534
```
531535

@@ -775,7 +779,7 @@ curl -X POST http://localhost:8080/api/v1/conflicts/CONFLICT_ID/resolve \
775779
-H "Authorization: Bearer $TOKEN" \
776780
-H "Content-Type: application/json" \
777781
-d '{
778-
"strategy": "use_source"
782+
"strategy": "TakeSource"
779783
}'
780784
```
781785

@@ -1082,8 +1086,18 @@ curl -X POST http://localhost:8080/api/v1/validate/names \
10821086
-H "Authorization: Bearer $TOKEN" \
10831087
-H "Content-Type: application/json" \
10841088
-d '{
1085-
"name": "new_table_v2",
1086-
"type": "asset"
1089+
"resource_type": "catalog",
1090+
"names": ["new_catalog_v2"]
1091+
}'
1092+
```
1093+
### Bulk Asset Deletion
1094+
```bash
1095+
curl -X POST http://localhost:8080/api/v1/bulk/assets/delete \
1096+
-H "Authorization: Bearer $TOKEN" \
1097+
-H "Content-Type: application/json" \
1098+
-d '{
1099+
"asset_ids": ["uuid-1", "uuid-2"]
10871100
}'
10881101
```
1102+
10891103

docs/architecture/README.md

Lines changed: 18 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -2,17 +2,23 @@
22

33
This directory contains detailed technical documentation for the Pangolin architecture.
44

5-
## Core Components
6-
- **[Models](./models.md)**: Details the core data structures (Structs), including Tenant, Warehouse, and Asset definitions.
7-
- **[Enums](./enums.md)**: Exhaustive list of system enumerations like `AssetType`, `CatalogType`, and `VendingStrategy`.
8-
- **[Traits](./traits.md)**: Deep dive into the `CatalogStore` and `Signer` traits which define the storage and security interfaces.
9-
- **[Handlers](./handlers.md)**: Breakdown of the API handler modules and their functional domains.
5+
## 🏗️ Core Structure
6+
- **[High-Level Architecture](./architecture.md)**: Overall system design, component interaction, and multi-tenant isolation.
7+
- **[API Handlers](./handlers.md)**: Map of API endpoints categorized by functional domain (Iceberg, Versioning, Admin).
8+
- **[Models](./models.md)**: Comprehensive guide to core system structs (Tenant, Asset, Merge, User).
9+
- **[Enums](./enums.md)**: Exhaustive list of system enumerations and their serialized values.
1010

11-
## Logic & Patterns
12-
- **[Branching](./branching.md)**: Explanation of the "Git-for-Data" model, including branching, committing, and 3-way merge logic.
13-
- **[Caching](./caching.md)**: Details on the multi-layered caching strategy using `moka` (Metadata) and `DashMap` (ObjectStore).
14-
- **[Dependencies](./dependencies.md)**: Comprehensive list of libraries and frameworks used in Backend and Frontend.
11+
## 🔧 Interfaces & Logic
12+
- **[System Traits](./traits.md)**: In-depth look at `CatalogStore` and `Signer` interfaces.
13+
- **[Branching & Merging](./branching.md)**: Operational details of the "Git-for-Data" versioning model.
14+
- **[Caching Strategy](./caching.md)**: Multi-layered performance optimizations for metadata and cloud backends.
1515

16-
## Legacy Docs
17-
- `catalog-store-trait.md`: (Ref `traits.md`)
18-
- `signer-trait.md`: (Ref `traits.md`)
16+
## 🔐 Security & Operations
17+
- **[Authentication](./authentication.md)**: Deep dive into JWT, Service User API Keys, and RBAC.
18+
- **[Storage & Connectivity](./storage_and_connectivity.md)**: Cloud connectivity, modular store structure, and credential vending.
19+
- **[Dependencies](./dependencies.md)**: Final list of the technology stack and library versions.
20+
21+
---
22+
23+
## 📅 Status
24+
All documents in this directory were audited and updated in **December 2025** to reflect the modularized backend and enhanced security features.

docs/architecture/architecture.md

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -27,8 +27,9 @@ Pangolin is a Rust-based, multi-tenant, branch-aware lakehouse catalog. It is fu
2727

2828
### 3. Storage Layer (`pangolin_store`)
2929
- **Metadata Persistence**: Abstracted via the `CatalogStore` trait.
30+
- **Modular Backends**: All backends are refactored into focused submodules (e.g., `tenants.rs`, `warehouses.rs`, `assets.rs`) for better maintainability.
3031
- `MemoryStore`: Concurrent in-memory store for rapid development/testing.
31-
- `PostgresStore`: SQL backend using `sqlx` for relational scale.
32+
- `PostgresStore`: SQL backend using `sqlx` for production scale.
3233
- `MongoStore`: Document backend for high-availability deployments.
3334
- `SqliteStore`: Embedded backend for local dev and edge use cases.
3435
- **Performance**: Direct `assets_by_id` lookup for O(1) authorization checks.
@@ -37,13 +38,14 @@ Pangolin is a Rust-based, multi-tenant, branch-aware lakehouse catalog. It is fu
3738

3839
### 4. Security & Isolation
3940
- **Authentication**:
40-
- **JWT**: Standard for UI and human CLI access.
41-
- **API Keys**: Distributed via **Service Users** for programmatic/CI-CD access.
42-
- **OAuth 2.0**: OIDC integration with Google, Microsoft, GitHub, and Okta.
41+
- **JWT**: Standard for UI and corporate identity access.
42+
- **API Keys**: Managed via **Service Users** for machine-to-machine/CI-CD access. Includes automatic rotation and usage tracking.
43+
- **OAuth 2.0 / OIDC**: Native integration with Google, Microsoft, GitHub, and custom providers.
4344
- **Authorization**:
4445
- **RBAC**: Role-based access control with 3 default tiers (Root, TenantAdmin, TenantUser).
45-
- **TBAC**: Tag-based access control allowing permissions to flow to assets with specific labels.
46-
- **Tenant Isolation**: Strictly enforced at the middleware layer; storage queries always include `tenant_id`.
46+
- **TBAC**: Tag-based access control allowing permissions to flow to assets with specific business labels.
47+
- **Access Requests**: Integrated workflow for users to request access to restricted assets via the UI.
48+
- **Tenant Isolation**: Strictly enforced at the middleware layer; all store queries are scoped by `tenant_id`.
4749

4850
### 5. Git-like Data Lifecycle
4951
- **Branching Engine**: Supports full and partial catalog branching.

docs/architecture/authentication.md

Lines changed: 24 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -119,26 +119,41 @@ Tokens are signed JWTs containing:
119119
4. Extract claims and create session
120120
5. Insert session into request extensions
121121

122-
### 3. API Key Authentication
122+
### 3. API Key Authentication (Service Users Only)
123123

124124
**Location**: [auth_middleware.rs:L135-154](file:///home/alexmerced/development/personal/Personal/2026/pangolin/pangolin/pangolin_api/src/auth_middleware.rs#L135-154)
125125

126-
API keys provide long-lived authentication for programmatic access.
126+
API keys provide long-lived authentication specifically for **Service Users** (machine accounts). Regular human users cannot use API keys; they use Bearer tokens.
127127

128128
#### Header Format
129129
```
130130
X-API-Key: <api_key_value>
131131
```
132132

133133
#### How It Works
134-
1. Extract API key from `X-API-Key` header
135-
2. Look up key in store via `get_api_key_by_value`
136-
3. Verify key is not expired
137-
4. Create session from key's user context
138-
5. Insert session into request extensions
134+
1. Extract API key from `X-API-Key` header.
135+
2. Hash the key and look up the matching record in the store via `get_service_user_by_api_key_hash`.
136+
3. Verify the service account is active and not expired.
137+
4. Create a session with the service user's ID, tenant context, and role.
138+
5. Track usage via `update_service_user_last_used`.
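Annotation: the five steps above can be modeled in a few lines. This is a Python illustration, not Pangolin's Rust middleware. Only the two store method names come from the docs; their signatures, the `ServiceUser` and `Session` fields, the in-memory store, and the choice of SHA-256 are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class ServiceUser:
    id: str
    tenant_id: str
    role: str
    active: bool = True
    expires_at: Optional[datetime] = None
    last_used: Optional[datetime] = None


@dataclass(frozen=True)
class Session:
    user_id: str
    tenant_id: str
    role: str


def hash_api_key(api_key: str) -> str:
    # Assumption: SHA-256. The docs only say the key is hashed.
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


class InMemoryServiceUserStore:
    """Hypothetical stand-in for the real store."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, ServiceUser] = {}

    def add(self, api_key: str, user: ServiceUser) -> None:
        # Only the hash is kept, never the plaintext key.
        self._by_hash[hash_api_key(api_key)] = user

    def get_service_user_by_api_key_hash(self, key_hash: str) -> Optional[ServiceUser]:
        return self._by_hash.get(key_hash)

    def update_service_user_last_used(self, user_id: str, when: datetime) -> None:
        for user in self._by_hash.values():
            if user.id == user_id:
                user.last_used = when


def authenticate(
    headers: Dict[str, str],
    store: InMemoryServiceUserStore,
    now: Optional[datetime] = None,
) -> Optional[Session]:
    """Return a Session for a valid key, or None if any check fails."""
    api_key = headers.get("X-API-Key")  # step 1 (case-sensitive lookup for brevity)
    if not api_key:
        return None
    user = store.get_service_user_by_api_key_hash(hash_api_key(api_key))  # step 2
    if user is None or not user.active:  # step 3: active check
        return None
    now = now or datetime.now(timezone.utc)
    if user.expires_at is not None and user.expires_at <= now:  # step 3: expiry check
        return None
    session = Session(user.id, user.tenant_id, user.role)  # step 4
    store.update_service_user_last_used(user.id, now)  # step 5
    return session
```

Looking up by hash means the store never needs the plaintext key, which fits the note elsewhere in this commit that keys are shown only once at creation.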
139+
140+
> [!TIP]
141+
> Use Service Users for CI/CD pipelines, Spark clusters, or any non-interactive automation.
142+
143+
---
144+
145+
## Granular Permissions (RBAC)
146+
147+
**Location**: [permission.rs](file:///home/alexmerced/development/personal/Personal/2026/pangolin/pangolin/pangolin_core/src/permission.rs)
148+
149+
Beyond roles, Pangolin supports fine-grained permission grants at the Asset, Namespace, or Catalog level.
150+
151+
### Permission Structure
152+
- **Scope**: Where the permission applies (Tenant, Catalog, Namespace, Asset, or Tag).
153+
- **Actions**: What is allowed (`Read`, `Write`, `Create`, `Delete`, `ManageDiscovery`, etc.).
139154

140-
> [!NOTE]
141-
> API keys are managed through the `/api/v1/users/{user_id}/tokens` endpoints.
155+
### Multi-Tenant Isolation
156+
The auth middleware automatically extracts the tenant context from the user's session or the `X-Pangolin-Tenant` header (for Root users). All subsequent storage operations are strictly scoped to this tenant ID.
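Annotation: a small model of the tenant-scoping rule described above. It assumes Root sessions take their tenant from `X-Pangolin-Tenant` and all other roles are pinned to the tenant in their session, with the header ignored. The function and class names are hypothetical, and the real middleware is Rust.

```python
from typing import Dict, List, Optional, Tuple


def resolve_tenant_id(
    role: str,
    session_tenant_id: Optional[str],
    headers: Dict[str, str],
) -> Optional[str]:
    """Pick the tenant that scopes all later store calls."""
    if role == "Root":
        # Root users select a tenant per request; None means no tenant was chosen.
        return headers.get("X-Pangolin-Tenant")
    # Assumption: non-root callers cannot override their session tenant.
    return session_tenant_id


class TenantScopedCatalogs:
    """Toy store in which every row is keyed by (tenant_id, name)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], dict] = {}

    def create(self, tenant_id: str, name: str) -> None:
        self._rows[(tenant_id, name)] = {"name": name}

    def list(self, tenant_id: str) -> List[str]:
        # The tenant filter is part of every query, so rows never leak across tenants.
        return sorted(name for (tenant, name) in self._rows if tenant == tenant_id)
```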
142157

143158
## Authentication Flow
144159

docs/architecture/branching.md

Lines changed: 21 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -20,35 +20,24 @@ A **Commit** is an immutable record of a state change.
2020
### 3. Tags
2121
A **Tag** is an immutable reference to a specific commit. Useful for marking releases (e.g., `Q1_REPORT_FINAL`).
2222

23-
## Workflows
24-
25-
### Isolation (Experimentation)
26-
1. User creates branch `dev/experiment` from `main`.
27-
2. Changes made in `dev/experiment` are isolated. `main` users continue to see the old data.
28-
3. If the experiment fails, the branch is deleted. No impact on production.
29-
30-
### Zero-Copy Ingestion
31-
1. ETL job writes data to `etl-job-123` branch.
32-
2. Data quality checks run against this branch.
33-
3. If checks pass, `etl-job-123` is merged into `main` atomically.
34-
4. If checks fail, the branch is discarded.
35-
36-
## Merge Logic
37-
38-
Merging involves taking changes from a `source` branch and applying them to a `target` branch.
39-
40-
### 3-Way Merge Strategy
41-
Pangolin uses a 3-way merge algorithm to detect conflicts:
42-
1. Identify the **Base Commit** (common ancestor).
43-
2. Compare **Source** vs **Base** to find changes.
44-
3. Compare **Target** vs **Base** to find changes.
45-
4. Detect conflicts where both Source and Target modified the same asset.
46-
47-
### Conflict Types
48-
- **Schema Conflict**: Table schema evolved incompatibly in both branches.
49-
- **Data Conflict**: Both branches wrote to the same partitions.
50-
- **Metadata Conflict**: Table properties changed incompatibly.
51-
52-
### Resolution
53-
- **Auto-Merge**: If changes are orthogonal (e.g., Branch A touched Table X, Branch B touched Table Y), they are merged automatically.
54-
- **Manual**: If a conflict is detected, the merge operation pauses (`Conflicted` status) and requires API intervention to choose a winner (`TakeSource` or `TakeTarget`).
23+
---
24+
25+
## Merge Lifecycle
26+
27+
Merging is managed via the `MergeOperation` model, which tracks the transition from initiation to completion.
28+
29+
### 1. Initiation
30+
A merge is started between a `source` and `target` branch. Pangolin identifies the **Base Commit** (common ancestor) to perform a 3-way analysis.
31+
32+
### 2. Conflict Detection
33+
Pangolin automatically detects:
34+
- **Schema Conflict**: Incompatible evolution on both branches.
35+
- **Data Conflict**: Concurrent writes to overlapping partitions.
36+
- **Metadata Conflict**: Conflicting table property changes.
37+
38+
### 3. Resolution
39+
- **Auto-Merge**: Orthogonal changes are applied automatically, and the operation moves to `Completed`.
40+
- **Manual Intervention**: If conflicts are found, the operation enters `Conflicted` status. Users must use the API to resolve each conflict using a `ResolutionStrategy` (`TakeSource`, `TakeTarget`, or `ThreeWayMerge`).
41+
42+
### 4. Completion
43+
Once all conflicts are resolved, the merge is finalized, creating a new commit on the target branch.
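Annotation: the lifecycle above can be read as a small state machine. The Python model below is illustrative only. The `Conflicted` and `Completed` statuses and the three `ResolutionStrategy` values come from the docs; the `Pending` status, the field and method names, and the asset-to-version dictionaries are assumptions. It also assumes identical changes on both branches do not count as a conflict.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class MergeStatus(Enum):
    PENDING = "Pending"  # hypothetical initial state
    CONFLICTED = "Conflicted"
    COMPLETED = "Completed"


class ResolutionStrategy(Enum):
    TAKE_SOURCE = "TakeSource"
    TAKE_TARGET = "TakeTarget"
    THREE_WAY_MERGE = "ThreeWayMerge"


def find_conflicts(base: Dict[str, str], source: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """3-way check: an asset conflicts if both sides changed it, and differently."""
    assets = set(base) | set(source) | set(target)
    return sorted(
        a for a in assets
        if source.get(a) != base.get(a)
        and target.get(a) != base.get(a)
        and source.get(a) != target.get(a)
    )


@dataclass
class MergeOperation:
    source: str
    target: str
    conflicts: Dict[str, Optional[ResolutionStrategy]] = field(default_factory=dict)
    status: MergeStatus = MergeStatus.PENDING

    def detect(self, conflicting_assets: List[str]) -> None:
        self.conflicts = {asset: None for asset in conflicting_assets}
        # No conflicts means an auto-merge that completes immediately.
        self.status = MergeStatus.CONFLICTED if self.conflicts else MergeStatus.COMPLETED

    def resolve(self, asset: str, strategy: ResolutionStrategy) -> None:
        if self.status is not MergeStatus.CONFLICTED:
            raise ValueError("merge is not in Conflicted status")
        if asset not in self.conflicts:
            raise KeyError(asset)
        self.conflicts[asset] = strategy

    def complete(self) -> None:
        unresolved = [a for a, s in self.conflicts.items() if s is None]
        if unresolved:
            raise ValueError(f"unresolved conflicts: {unresolved}")
        self.status = MergeStatus.COMPLETED
```

In this model `complete()` refuses to finalize while any conflict lacks a strategy, mirroring the rule that the merge commit is created only after every conflict is resolved.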

docs/architecture/caching.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,8 @@ Reading Iceberg metadata files (snapshots, manifests) from S3/GCS is a high-late
1515
- **Value**: Byte vector (`Vec<u8>`).
1616

1717
### Usage
18-
- Used by all CatalogStore implementations (`MemoryStore`, `SqliteStore`, `PostgresStore`, `MongoStore`).
18+
- Used by all `CatalogStore` implementations (`MemoryStore`, `SqliteStore`, `PostgresStore`, `MongoStore`).
19+
- Shared across all modular sub-backends (e.g., `postgres/assets.rs` and `postgres/tenants.rs` both leverage the same cache instance).
1920
- Implements a `get_or_fetch` pattern to handle cache misses transparently.
2021

2122
```rust
