
Commit 6502423

docs
1 parent 6d7a9fe commit 6502423

69 files changed

Lines changed: 1817 additions & 3924 deletions


docs/README.md

Lines changed: 53 additions & 33 deletions
Original file line number | Diff line number | Diff line change
@@ -1,51 +1,71 @@
1-
# Pangolin Documentation Index
1+
# Pangolin Documentation
22

3-
Welcome to the comprehensive documentation for Pangolin. Use the index below to find guides for setup, core concepts, feature deep-dives, and tool references.
3+
Welcome to the comprehensive documentation for **Pangolin**, the cloud-native Apache Iceberg REST Catalog. Use the categories below to navigate the guides, feature deep-dives, and tool references.
4+
5+
---
46

57
## 🏁 1. Getting Started
6-
*Everything you need to get up and running.*
8+
*Quickest path from zero to a running lakehouse.*
79

10+
- **[Onboarding Index](./getting-started/README.md)** - **Start Here!**
811
- **[Installation Guide](./getting-started/getting_started.md)** - Run Pangolin in 5 minutes.
9-
- **[Authentication Modes](./authentication.md)** - Auth vs No-Auth and OAuth setup.
10-
- **[User Scopes](./getting-started/getting_started.md#user-scopes)** - Root, Tenant Admin, and Tenant User roles.
11-
- **[Environment Variables](./getting-started/env_vars.md)** - Complete reference for all settings.
12-
- **[Configuration](./getting-started/configuration.md)** - Server and storage configuration.
13-
- **[Client Configuration](./getting-started/client_configuration.md)** - Connecting PyIceberg, Spark, and Trino.
14-
- **[Deployment](./getting-started/deployment.md)** - Docker and production setup.
12+
- **[Evaluating Pangolin](./getting-started/evaluating-pangolin.md)** - Rapid local testing with `NO_AUTH` mode.
13+
- **[Deployment Guide](./getting-started/deployment.md)** - Local, Docker, and Production setup.
14+
- **[Environment Variables](./getting-started/env_vars.md)** - Complete system configuration reference.
15+
16+
---
1517

1618
## 🏗️ 2. Core Infrastructure
17-
*Managing warehouses, catalogs, and metadata storage.*
19+
*Managing the foundations: storage and metadata.*
20+
21+
- **[Infrastructure Features](./features/README.md)** - Index of all platform capabilities.
22+
- **[Warehouse Management](./warehouse/README.md)** - Configuring S3, Azure, and GCS storage.
23+
- **[Metadata Backends](./backend_storage/README.md)** - Memory, Postgres, MongoDB, and SQLite.
24+
- **[Asset Management](./features/asset_management.md)** - Tables, Views, and CRUD operations.
25+
- **[Federated Catalogs](./features/federated_catalogs.md)** - Proxying external REST catalogs.
26+
27+
---
1828

19-
- **[Warehouse Management](./warehouse/README.md)** - Configuring S3, Azure, and GCS.
20-
- **[Catalog Assets](./features/asset_management.md)** - Tables, Views, and Federated proxying.
21-
- **[Backend Metadata](./backend_storage/README.md)** - Postgres, MongoDB, and SQLite backends.
29+
## ⚖️ 3. Governance & Security
30+
*Multi-tenancy, RBAC, and auditing.*
2231

23-
## 🧪 3. Data & Governance
24-
*Managing data lifecycle and security.*
32+
- **[Security Concepts](./features/security_vending.md)** - Identity and Credential Vending principles.
33+
- **[Credential Vending (IAM Roles)](./features/iam_roles.md)** - Scoped cloud access (STS, SAS, Downscoped).
34+
- **[Permission System](./permissions.md)** - Understanding RBAC and granular grants.
35+
- **[Service Users](./features/service_users.md)** - Programmatic access and API key management.
36+
- **[Audit Logging](./features/audit_logs.md)** - Global action tracking and compliance.
2537

26-
- **[Git-like Branching](./features/branch_management.md)** - Branching, tagging, and forking logic.
27-
- **[Permission System](./permissions.md)** - RBAC, TBAC, and cascading grants.
28-
- **[Merge Operations](./features/merge_operations.md)** - Reconciliation and conflict handling.
29-
- **[Audit Logging](./features/audit_logs.md)** - Tracking actions via API, CLI, and UI.
30-
- **[Maintenance Utilities](./features/maintenance.md)** - Snapshots and orphan file management.
31-
- **[Business Metadata](./features/business_catalog.md)** - Tags, discovery, and search.
38+
---
39+
40+
## 🧪 4. Data Life Cycle
41+
*Git-for-Data and maintenance workflows.*
3242

33-
## 🛠️ 4. Tools & References
34-
*Direct guides for our interfaces and APIs.*
43+
- **[Branch Management](./features/branch_management.md)** - Working with isolated data environments.
44+
- **[Merge Operations](./features/merge_operations.md)** - The 3-way merge workflow.
45+
- **[Merge Conflicts](./features/merge_conflicts.md)** - Theory and resolution strategies.
46+
- **[Business Metadata & Discovery](./features/business_catalog.md)** - Search, tags, and access requests.
47+
- **[Maintenance Utilities](./features/maintenance.md)** - Snapshot expiration and compaction.
48+
49+
---
3550

36-
- **[CLI Reference](./cli/overview.md)** - Admin and User command guides.
37-
- **[API Reference](./api/api_overview.md)** - Iceberg REST and Management endpoints.
38-
- **[Management UI](./ui/overview.md)** - Visual guide to the administration portal.
39-
- **[Service Users](./service_users.md)** - Programmatic access and API keys.
51+
## 🛠️ 5. Interfaces & Integration
52+
*Connecting tools and using our management layers.*
53+
54+
- **[Management UI](./ui/README.md)** - Visual guide to the administration portal.
55+
- **[PyIceberg Integration](./pyiceberg/README.md)** - Native Python client configuration.
56+
- **[CLI Reference](./cli/overview.md)** - Documentation for `pangolin-admin` and `pangolin-user`.
57+
- **[API Reference](./api/api_overview.md)** - Iceberg REST and Management API specs.
58+
59+
---
4060

41-
## 🏗️ 5. Architecture & Extending
42-
*Internal design for developers.*
61+
## 🏗️ 6. Architecture & Internals
62+
*Deep-dives for developers and contributors.*
4363

44-
- **[System Architecture](./architecture/architecture.md)** - Design and components.
45-
- **[Data Models](./architecture/models.md)** - Core entity definitions.
46-
- **[Storage Abstractions](./architecture/catalog-store-trait.md)** - Implementation traits.
64+
- **[System Architecture](./architecture/architecture.md)** - Design and component interaction.
65+
- **[Data Models](./architecture/models.md)** - Understanding the internal schema.
66+
- **[CatalogStore Trait](./architecture/catalog-store-trait.md)** - Extending Pangolin storage.
4767

4868
---
4969

5070
**Last Updated**: December 2025
51-
**Version**: Alpha
71+
**Project Status**: Alpha

docs/api/api_overview.md

Lines changed: 52 additions & 92 deletions
Original file line numberDiff line numberDiff line change
@@ -1,119 +1,79 @@
1-
# API Overview
1+
# API Reference Overview
22

3-
Pangolin provides a rich set of APIs for catalog management, branching, merging, and authentication.
3+
Pangolin exposes a multi-tenant REST API split into three functional areas: the standard Iceberg REST API, the Pangolin Management API, and the Identity/Security API.
44

5-
## Authentication
5+
---
66

7-
| Endpoint | Method | Description |
8-
| :--- | :--- | :--- |
9-
| `/api/v1/users/login` | POST | Authenticate and receive a JWT token. |
10-
| `/api/v1/users/logout` | POST | Invalidate current session (optional). |
11-
| `/api/v1/users/me` | GET | Get current user information. |
12-
| `/api/v1/tokens` | POST | Generate a long-lived JWT token for a specific tenant/user. |
13-
14-
## Iceberg REST API
15-
16-
Pangolin implements the standard Apache Iceberg REST Catalog API.
7+
## 🔐 Authentication & Identity
8+
*Base Path: `/api/v1/`*
179

1810
| Endpoint | Method | Description |
1911
| :--- | :--- | :--- |
20-
| `/v1/config` | GET | Get Iceberg client configuration. |
21-
| `/v1/{tenant}/config` | GET | Get tenant-specific Iceberg client configuration. |
22-
| `/v1/{prefix}/namespaces` | GET/POST | List and create namespaces. |
23-
| `/v1/{prefix}/namespaces/{namespace}/tables` | GET/POST | List and create tables. |
24-
| `/v1/{prefix}/namespaces/{namespace}/tables/{table}` | GET/POST/DELETE | Manage table metadata and snapshots. |
12+
| `users/login` | POST | Authenticate and receive a JWT session. |
13+
| `tokens` | POST | Generate a long-lived JWT token for a specific user. |
14+
| `auth/revoke` | POST | Invalidate current user token. |
15+
| `service-users` | GET/POST | Manage programmatic API identities. |
16+
| `service-users/{id}/rotate` | POST | Rotate API key for a service user. |
2517

26-
**Note**: Branching is supported via the `table@branch` syntax (e.g., `GET .../tables/my_table@dev`).
18+
---
2719

28-
## Pangolin Extended APIs
20+
## 📋 Standard Iceberg API (REST Catalog)
21+
*Base Path: `/v1/`*
2922

30-
### Branch Operations
23+
Pangolin is 100% compliant with the [Apache Iceberg REST Specification](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml).
3124

3225
| Endpoint | Method | Description |
3326
| :--- | :--- | :--- |
34-
| `/api/v1/branches` | GET/POST | List all branches or create a new branch. |
35-
| `/api/v1/branches/merge` | POST | Initiate a merge from a source branch to a target branch. |
36-
| `/api/v1/branches/{name}/commits` | GET | List commit history for a specific branch. |
37-
38-
### Tag Management
39-
40-
| Endpoint | Method | Description |
41-
| :--- | :--- | :--- |
42-
| `/api/v1/tags` | GET/POST | List all tags or create a new tag. |
43-
| `/api/v1/tags/{name}` | GET/DELETE | View or delete a specific tag. |
44-
45-
### Merge Operations (Conflict Resolution)
46-
47-
| Endpoint | Method | Description |
48-
| :--- | :--- | :--- |
49-
| `/api/v1/catalogs/{catalog}/merge-operations` | GET | List merge operations for a catalog. |
50-
| `/api/v1/merge-operations/{id}` | GET | Get merge operation details and status. |
51-
| `/api/v1/merge-operations/{id}/conflicts` | GET | List conflicts for a merge operation. |
52-
| `/api/v1/conflicts/{id}/resolve` | POST | Resolve a specific conflict with a strategy. |
53-
| `/api/v1/merge-operations/{id}/complete` | POST | Complete a merge after resolving all conflicts. |
54-
| `/api/v1/merge-operations/{id}/abort` | POST | Abort a pending merge operation. |
55-
56-
### Management CRUD
57-
58-
| Entity | Endpoints | Methods |
59-
| :--- | :--- | :--- |
60-
| **Tenants** | `/api/v1/tenants` | GET, POST, PUT, DELETE |
61-
| **Warehouses** | `/api/v1/warehouses` | GET, POST, PUT, DELETE |
62-
| **Catalogs** | `/api/v1/catalogs` | GET, POST, PUT, DELETE |
63-
| **Federated Catalogs** | `/api/v1/federated-catalogs` | GET, POST, DELETE, TEST |
64-
| **Users** | `/api/v1/users` | GET, POST, PUT, DELETE |
65-
| **Service Users** | `/api/v1/service-users` | GET, POST, PUT, DELETE, ROTATE |
66-
| **Roles** | `/api/v1/roles` | GET, POST, PUT, DELETE |
67-
| **Permissions** | `/api/v1/permissions` | GET, POST, DELETE |
27+
| `config` | GET | Get client configuration. |
28+
| `{prefix}/namespaces` | GET/POST | Manage catalog namespaces. |
29+
| `{prefix}/namespaces/{ns}/tables` | GET/POST | Manage tables (Standard Iceberg). |
30+
| `{prefix}/namespaces/{ns}/tables/{table}` | GET/POST | Table metadata and schema evolution. |
6831

69-
### Token Management
32+
> [!TIP]
33+
> **Branching**: Use the `@` suffix in the table name (e.g., `my_table@dev`) to redirect Iceberg operations to a specific Pangolin branch.
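
As a minimal sketch of the `@` suffix convention, the helper below builds a table path under the `/v1/` base path with an optional branch. The function name and the `analytics`/`sales` values are hypothetical, and the sketch assumes a single-level namespace with no URL encoding applied.

```python
from typing import Optional


def branch_table_path(prefix: str, namespace: str, table: str,
                      branch: Optional[str] = None) -> str:
    """Return the Iceberg REST table path, optionally targeting a branch.

    Hypothetical helper: only the path template and the `table@branch`
    syntax come from the documentation.
    """
    # Append "@<branch>" to the table name only when a branch is requested.
    table_ref = f"{table}@{branch}" if branch else table
    return f"/v1/{prefix}/namespaces/{namespace}/tables/{table_ref}"


print(branch_table_path("analytics", "sales", "my_table", "dev"))
# prints "/v1/analytics/namespaces/sales/tables/my_table@dev"
```

Omitting the `branch` argument yields the plain table path, so the same helper covers both branch-aware and standard Iceberg requests.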
7034
71-
| Endpoint | Method | Description |
72-
| :--- | :--- | :--- |
73-
| `/api/v1/tokens` | POST | Generate a long-lived JWT token for scripts/automation. |
74-
| `/api/v1/auth/revoke` | POST | Revoke current user's token. |
75-
| `/api/v1/auth/revoke/{token_id}` | POST | Revoke a specific token (admin only). |
76-
| `/api/v1/auth/cleanup-tokens` | POST | Clean up expired tokens (admin only). |
77-
| `/api/v1/users/{user_id}/tokens` | GET | List all tokens for a specific user (admin only). |
78-
| `/api/v1/tokens/{token_id}` | DELETE | Delete a specific token by ID (admin only). |
35+
---
7936

80-
### System Configuration
37+
## 🏗️ Management & Data Governance
38+
*Base Path: `/api/v1/`*
8139

82-
| Endpoint | Method | Description |
40+
### 1. Multi-Tenancy (Root Only)
41+
| Endpoint | Method | Use Case |
8342
| :--- | :--- | :--- |
84-
| `/api/v1/config/settings` | GET | Get system configuration settings (admin only). |
85-
| `/api/v1/config/settings` | PUT | Update system configuration settings (admin only). |
86-
87-
### Federated Catalog Operations
43+
| `tenants` | GET/POST/PUT | Onboard new organizations to the platform. |
44+
| `config/settings` | GET/PUT | Manage global platform defaults. |
8845

89-
| Endpoint | Method | Description |
46+
### 2. Infrastructure & Data (Tenant Admin)
47+
| Endpoint | Method | Use Case |
9048
| :--- | :--- | :--- |
91-
| `/api/v1/federated-catalogs/{name}/sync` | POST | Trigger immediate metadata sync for federated catalog. |
92-
| `/api/v1/federated-catalogs/{name}/stats` | GET | Get sync statistics and status for federated catalog. |
49+
| `warehouses` | GET/POST/PUT | Configure cloud storage containers (S3/GCS/Azure). |
50+
| `catalogs` | GET/POST/PUT | Create local or federated Iceberg catalogs. |
51+
| `audit` | GET/POST | Query audit logs or retrieve audit log counts. |
52+
| `maintenance` | POST | Trigger snapshot expiration or orphan file removal. |
9353

94-
### Data Explorer
95-
96-
| Endpoint | Method | Description |
54+
### 3. Branching & Merging
55+
| Endpoint | Method | Use Case |
9756
| :--- | :--- | :--- |
98-
| `/api/v1/catalogs/{prefix}/namespaces/tree` | GET | Get hierarchical namespace tree structure for a catalog. |
57+
| `branches` | GET/POST | List or create feature/ingest branches. |
58+
| `branches/merge` | POST | Initiate a Git-like merge between two branches. |
59+
| `merge-operations` | GET | Track progress of active or past merges. |
60+
| `conflicts/{id}/resolve` | POST | Apply conflict resolution strategies (Source Wins/Target Wins). |
9961

100-
### OAuth
62+
---
10163

102-
| Endpoint | Method | Description |
103-
| :--- | :--- | :--- |
104-
| `/oauth/authorize/{provider}` | GET | Initiate OAuth flow (Google, GitHub, etc.). |
105-
| `/oauth/callback/{provider}` | GET | OAuth callback handler. |
64+
## 🧬 OAuth 2.0 Flow
65+
Pangolin supports Google, GitHub, and Microsoft OAuth.
10666

107-
## Auditing
67+
- **Initiate**: `GET /oauth/authorize/{provider}`
68+
- **Callback**: `GET /oauth/callback/{provider}`
10869

109-
| Endpoint | Method | Description |
110-
| :--- | :--- | :--- |
111-
| `/api/v1/audit-logs` | GET | Retrieve audit logs for the current tenant. |
70+
---
11271

113-
## Other APIs
72+
## 🚀 Quick Integration
73+
Every API request must include the `Authorization` header and the `X-Pangolin-Tenant` header, unless the server is running in `NO_AUTH` mode.
11474

115-
- **Credential Vending**: `/api/v1/credentials` (GET)
116-
- **S3 Presigning**: `/api/v1/presign` (POST)
117-
- **Business Metadata**: `/api/v1/metadata/search` (POST)
118-
- **Access Requests**: `/api/v1/access-requests` (GET/POST)
119-
- **App Config**: `/api/v1/app-config` (GET)
75+
```bash
76+
curl -X GET http://localhost:8080/api/v1/catalogs \
77+
-H "Authorization: Bearer <JWT_TOKEN>" \
78+
-H "X-Pangolin-Tenant: <TENANT_UUID>"
79+
```
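
The same request can be prepared from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper name `build_catalog_request` and the placeholder token and tenant values are illustrative and not part of Pangolin. It builds the request without sending it, so the URL and headers can be inspected first.

```python
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # same host and port as the curl example


def build_catalog_request(token: str, tenant_id: str,
                          base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a GET request for /api/v1/catalogs.

    Hypothetical helper: only the path and the two header names
    come from the documentation.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "X-Pangolin-Tenant": tenant_id,
    }
    return Request(f"{base_url}/api/v1/catalogs", headers=headers, method="GET")


req = build_catalog_request("<JWT_TOKEN>", "<TENANT_UUID>")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running server. Replace the placeholders with a real token and tenant UUID before doing so.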

docs/architecture/README.md

Lines changed: 20 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -4,16 +4,29 @@
44

55
Pangolin is a multi-tenant Apache Iceberg REST Catalog with advanced features including federated catalogs, credential vending, fine-grained permissions, and Git-like branching.
66

7-
## Architecture Documentation
7+
## Pangolin Architecture Documentation
88

9-
This directory contains comprehensive architecture documentation for the Pangolin project.
9+
This directory contains technical documentation for the Pangolin architecture, core data models, and primary abstraction layers.
1010

11-
### Core Documentation
11+
## Documentation Index
1212

13-
- **[System Architecture](./architecture.md)** - High-level system design and component overview
14-
- **[Data Models](./models.md)** - Complete catalog of all data models
15-
- **[CatalogStore Trait](./catalog-store-trait.md)** - Core storage abstraction interface
16-
- **[Signer Trait](./signer-trait.md)** - Credential vending and pre-signed URL generation
13+
### 1. [Architecture Overview](./architecture.md)
14+
High-level system design, core components, security logic, and request flow. Start here to understand how Pangolin works.
15+
16+
### 2. [Data Models](./models.md)
17+
Detailed reference for all core structs and enums across the `model`, `user`, `permission`, `business_metadata`, and `audit` domains.
18+
19+
### 3. [CatalogStore Trait](./catalog-store-trait.md)
20+
Reference for the primary storage abstraction layer, covering multi-tenant isolation and metadata lifecycle operations.
21+
22+
### 4. [Signer Trait](./signer-trait.md)
23+
Documentation of the cloud credential vending logic, used to provide temporary access to S3, GCS, and Azure storage.
24+
25+
## Target Audience
26+
This documentation is intended for:
27+
- **Core Contributors**: Developers extending the backend or adding new storage engines.
28+
- **Security Auditors**: Engineers reviewing tenant isolation and credential vending patterns.
29+
- **Enterprise Integrators**: Teams deploying Pangolin in custom cloud environments.
1730

1831
### Quick Links
1932
