Commit 5da8186

docs: add Iceberg security model (#3425)
Similar to apache/iceberg#16538. This PR publishes a security model for Iceberg and adds a detailed threat model, referenced from AGENTS.md, so that there is a clear project reference for how Iceberg distinguishes security issues from other bugs. The goal is to make Iceberg’s security boundaries explicit for both human triage and automated scanning, and to reduce false positives in future security reporting. This PR was drafted with AI assistance and then manually reviewed and edited to reflect the project’s current approach to security roles, trust boundaries, and the handling of recent security reports.
1 parent aa33ab4 commit 5da8186

2 files changed

Lines changed: 258 additions & 0 deletions

AGENTS.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
<!--
2+
Licensed to the Apache Software Foundation (ASF) under one
3+
or more contributor license agreements. See the NOTICE file
4+
distributed with this work for additional information
5+
regarding copyright ownership. The ASF licenses this file
6+
to you under the Apache License, Version 2.0 (the
7+
"License"); you may not use this file except in compliance
8+
with the License. You may obtain a copy of the License at
9+
10+
http://www.apache.org/licenses/LICENSE-2.0
11+
12+
Unless required by applicable law or agreed to in writing,
13+
software distributed under the License is distributed on an
14+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
KIND, either express or implied. See the License for the
16+
specific language governing permissions and limitations
17+
under the License.
18+
-->
19+
20+
# Apache Iceberg Python — Agent Instructions
21+
22+
This file provides repository-specific guidance for automated agents working
23+
in this repository.
24+
25+
## Security Model
26+
27+
When assessing potential vulnerabilities or calibrating automated security
28+
findings, use [`SECURITY-THREAT-MODEL.md`](SECURITY-THREAT-MODEL.md) as the
29+
authoritative detailed description of this repository's security boundaries,
30+
trust assumptions, and non-boundaries.

SECURITY-THREAT-MODEL.md

Lines changed: 228 additions & 0 deletions
@@ -0,0 +1,228 @@
1+
<!--
2+
Licensed to the Apache Software Foundation (ASF) under one
3+
or more contributor license agreements. See the NOTICE file
4+
distributed with this work for additional information
5+
regarding copyright ownership. The ASF licenses this file
6+
to you under the Apache License, Version 2.0 (the
7+
"License"); you may not use this file except in compliance
8+
with the License. You may obtain a copy of the License at
9+
10+
http://www.apache.org/licenses/LICENSE-2.0
11+
12+
Unless required by applicable law or agreed to in writing,
13+
software distributed under the License is distributed on an
14+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
KIND, either express or implied. See the License for the
16+
specific language governing permissions and limitations
17+
under the License.
18+
-->
19+
20+
# Apache Iceberg Python Security Threat Model
21+
22+
This document describes the detailed security threat model for Apache
23+
Iceberg Python. It is intended for maintainers and automated security triage.
24+
25+
## Purpose
26+
27+
Apache Iceberg Python is primarily a client library and implementation of the
28+
Iceberg table format and catalog interactions for Python applications and
29+
services. It is commonly embedded in larger systems that provide their own
30+
authentication, authorization, and credential management. Because of that
31+
deployment model, many bug classes that look security-relevant in the abstract
32+
are not actually security vulnerabilities in Iceberg Python itself.
33+
34+
This model is intended to answer:
35+
36+
- what Iceberg Python generally treats as a security vulnerability
37+
- what Iceberg Python generally treats as correctness, hardening, or
38+
deployment work
39+
- which boundaries are primarily owned by Iceberg Python versus the
40+
surrounding catalog, application, or service
41+
- which issue classes scanners should downgrade by default
42+
43+
## Scope
44+
45+
This model is scoped to the Apache Iceberg Python repository itself:
46+
47+
- table format and metadata handling
48+
- catalog and REST catalog clients
49+
- transport, credential, and configuration handling implemented in this repo
50+
- helper tooling shipped in this repo
51+
52+
It is not a general threat model for every Python service that embeds Iceberg
53+
Python.
54+
55+
In particular, it does not attempt to define the complete security model for:
56+
57+
- applications or services that embed Iceberg Python
58+
- storage-level authorization enforced outside Iceberg Python
59+
60+
## Security Goals
61+
62+
Iceberg Python should:
63+
64+
- avoid exposing secrets or delegated credentials to principals that were not
65+
already trusted with them
66+
- avoid creating new unauthorized capabilities in Iceberg Python-owned
67+
components
68+
- avoid violating trust boundaries that Iceberg Python itself owns, such as
69+
leaking auth, transport, or credential-bearing state across catalog or
70+
client boundaries in the same process
71+
72+
Iceberg Python does not aim to be the primary enforcement point for:
73+
74+
- user-to-user authorization inside the embedding application
75+
- storage-level authorization
76+
- service-side credential scoping performed by an external catalog
77+
78+
## Roles
79+
80+
### Operator
81+
82+
The operator configures the surrounding catalog, application, service, and
83+
storage integration around Iceberg Python. This role is trusted to choose
84+
endpoints, warehouses, storage integrations, and credentials.
85+
86+
### Catalog control plane
87+
88+
The catalog control plane resolves tables and supplies metadata, locations,
89+
configuration, and delegated credentials to Iceberg Python. It may be
90+
implemented by a REST catalog server or another catalog implementation.
91+
Iceberg Python assumes this control plane is trusted and outside its primary
92+
security boundary.
93+
94+
### REST catalog client
95+
96+
The REST catalog client consumes catalog-provided metadata, configuration, and
97+
credentials. Client-side bugs in routing, caching, or reuse may still be
98+
security-relevant if they leak credential-bearing state across boundaries that
99+
the Iceberg Python client is expected to preserve.
100+
101+
### Embedding application
102+
103+
Applications and services embedding Iceberg Python are responsible for their
104+
own user-facing authorization boundaries unless Iceberg Python explicitly
105+
documents otherwise.
106+
107+
### Table writer or maintainer
108+
109+
This role may already have legitimate power to write or replace table
110+
metadata, write or delete files, choose paths under an allowed warehouse or
111+
table location, and invoke destructive maintenance operations. If a report
112+
only shows a new way to achieve the same effect this role can already cause
113+
legitimately, it is usually not a security issue in Iceberg Python.
114+
115+
## Trust Boundaries
116+
117+
### Boundary 1: operator-trusted configuration
118+
119+
The following are generally treated as trusted operator or deployment inputs:
120+
121+
- catalog properties
122+
- endpoint configuration
123+
- warehouse and storage roots
124+
- transport wiring and credential configuration
125+
126+
If a report depends on the attacker controlling those values directly, it is
127+
usually not a vulnerability in Iceberg Python itself.
128+
129+
### Boundary 2: catalog-supplied metadata
130+
131+
Iceberg Python often accepts metadata locations, table properties, namespace
132+
properties, and related control-plane information from a catalog. By default,
133+
Iceberg Python treats those sources as trusted.
134+
135+
This means a malicious catalog supplying incorrect or hostile metadata is
136+
usually not an Iceberg Python vulnerability by itself.
137+
138+
### Boundary 3: REST catalog-supplied configuration and delegated storage access
139+
140+
In REST deployments, Iceberg Python may also accept service endpoints,
141+
configuration, and delegated storage access from the REST catalog server. By
142+
default, those are treated as trusted control-plane inputs unless Iceberg
143+
Python explicitly documents a stronger guarantee.
144+
145+
This means a malicious REST catalog server sending dangerous endpoints is
146+
usually not an Iceberg Python vulnerability by itself. It also means that
146+
credential-selection bugs are often correctness or specification issues rather
147+
than security-boundary failures.
149+
150+
The major exception is secret exposure. If Iceberg Python surfaces credentials
151+
or secrets to a new audience that was not already trusted with them, that is
152+
security-relevant.
153+
154+
### Boundary 4: storage-level authorization
155+
156+
Object store permissions are enforced by the storage provider and the
157+
credentials the surrounding deployment chooses to hand to Iceberg Python.
158+
Iceberg Python is not the root authority for bucket- or object-level
159+
authorization.
160+
161+
## In-Scope Security Vulnerabilities
162+
163+
The following categories are generally security-relevant in Iceberg Python
164+
when the report is credible and reproducible.
165+
166+
### 1. Secret or credential disclosure to a new audience
167+
168+
Examples include:
169+
170+
- catalog or storage credentials exposed through a user-visible surface
171+
- one catalog's credentials or auth state leaking into another catalog or
172+
client
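
As an illustration of the first example, a user-visible surface such as a log line or error message can mask credential-bearing values before they are emitted. The following is a minimal sketch only: `redact_properties`, `SENSITIVE_MARKERS`, and the sample property keys are hypothetical names chosen for this example, not an API of Iceberg Python.

```python
from typing import Mapping

# Hypothetical substrings that mark a property key as credential-bearing.
# A real deployment would choose its own list.
SENSITIVE_MARKERS = ("secret", "token", "password", "credential", "access-key")


def redact_properties(props: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of props that is safer to log or display.

    Values whose key contains a sensitive marker are replaced with "***";
    all other entries are copied unchanged.
    """
    return {
        key: "***" if any(m in key.lower() for m in SENSITIVE_MARKERS) else value
        for key, value in props.items()
    }


if __name__ == "__main__":
    sample = {"uri": "https://catalog.example", "token": "abc123"}
    print(redact_properties(sample))
```

Matching on key names is a heuristic and can miss secrets stored under unexpected keys, so it complements, rather than replaces, keeping secrets out of user-visible surfaces in the first place.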
173+
174+
### 2. Iceberg Python-owned trust-boundary violations
175+
176+
Security issues exist when Iceberg Python itself is expected to separate
177+
catalogs, clients, or principals and fails to do so.
178+
179+
Examples include:
180+
181+
- process-global auth or transport state crossing catalog instances
182+
- secret-bearing state from one principal reused for another principal within
183+
an Iceberg Python-owned boundary
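
The first example can be pictured with a small sketch in which auth state lives on each client instance rather than in module-level or class-level state. `CatalogClient` and its methods are hypothetical and do not correspond to the actual Iceberg Python classes.

```python
class CatalogClient:
    """Hypothetical client used only to illustrate per-instance auth state."""

    def __init__(self, name: str, token: str) -> None:
        self.name = name
        # Auth headers are stored on the instance. Keeping them in a
        # module-level or class-level variable would let one catalog's
        # token be sent on behalf of another catalog in the same process.
        self._headers = {"Authorization": f"Bearer {token}"}

    def request_headers(self) -> dict[str, str]:
        # Return a copy so callers cannot mutate this client's auth state.
        return dict(self._headers)


if __name__ == "__main__":
    first = CatalogClient("first", "tok-a")
    second = CatalogClient("second", "tok-b")
    print(first.request_headers() != second.request_headers())
```

Under this layout, constructing a second client cannot change the headers the first client sends, which is the separation property the examples above describe.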
184+
185+
## Usually Out of Scope or Non-Security by Default
186+
187+
These categories may still be real bugs worth fixing, but they are not usually
188+
security vulnerabilities in Iceberg Python itself.
189+
190+
### 1. Correctness bugs
191+
192+
Examples include incorrect metadata handling, ambiguous matching semantics,
193+
and logic bugs that do not create a new trust-boundary violation.
194+
195+
### 2. Parser hardening and malformed-input robustness
196+
197+
Malformed-input crashes, runtime exceptions, and memory amplification are
198+
usually treated as robustness or hardening work rather than security issues in
199+
Iceberg Python itself.
200+
201+
### 3. Malicious catalog or external service scenarios
202+
203+
Reports that require a malicious catalog or other external control-plane
204+
service are usually outside Iceberg Python's primary security boundary.
205+
206+
### 4. Equivalent-harm reports
207+
208+
If the actor already has a legitimate capability that can cause the same harm,
209+
the new path is usually not a security issue.
210+
211+
## Scanner Calibration Rules
212+
213+
A scanner targeting Iceberg Python should treat a finding as higher-confidence
214+
only if it plausibly shows one of the following:
215+
216+
- exposure of a secret or delegated credential to a new audience
217+
- creation of a new unauthorized capability in an Iceberg Python-owned
218+
component
219+
- violation of an Iceberg Python-owned trust boundary rather than a
220+
surrounding catalog, application, service, or operator boundary
221+
222+
A finding should be downgraded or rejected by default if it instead depends
223+
primarily on:
224+
225+
- malformed-input robustness or denial-of-service behavior
226+
- a malicious catalog or external service
227+
- a principal that already has equivalent power through legitimate write or
228+
maintenance capabilities
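
The rules above can be sketched as a small triage function. This is an illustrative reading, not a scanner implementation: the `Finding` fields and `Verdict` values are hypothetical labels, and the sketch assumes that any one of the three positive indicators is enough for higher confidence, while everything else is downgraded by default.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    HIGHER_CONFIDENCE = "higher-confidence"
    DOWNGRADE = "downgrade"


@dataclass(frozen=True)
class Finding:
    # Hypothetical flags a scanner might attach to a report.
    exposes_secret_to_new_audience: bool = False
    creates_unauthorized_capability: bool = False
    violates_owned_boundary: bool = False
    # Reasons that lead to a default downgrade; kept for reporting only.
    is_malformed_input_or_dos: bool = False
    requires_malicious_catalog: bool = False
    actor_has_equivalent_power: bool = False


def triage(finding: Finding) -> Verdict:
    """Return HIGHER_CONFIDENCE only if a positive indicator is present."""
    shows_in_scope_impact = (
        finding.exposes_secret_to_new_audience
        or finding.creates_unauthorized_capability
        or finding.violates_owned_boundary
    )
    if shows_in_scope_impact:
        return Verdict.HIGHER_CONFIDENCE
    # No positive indicator: downgrade by default, whatever the reason.
    return Verdict.DOWNGRADE


if __name__ == "__main__":
    print(triage(Finding(requires_malicious_catalog=True)).value)
```

A finding that carries both a positive indicator and a downgrade reason is treated as higher-confidence here; that choice is an assumption of the sketch, and such reports would still need human review.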
