Commit 18e055c

docs: add Iceberg security model (apache#688)
Similar to: apache/iceberg#16538

This PR publishes a security model for Iceberg and adds a detailed threat model referenced from AGENTS.md so there is a clear project reference for how Iceberg distinguishes security issues from other bugs. The goal is to make Iceberg’s security boundaries explicit for both human triage and automated scanning, and to reduce false positives in future security reporting.

This PR was drafted with AI assistance and then manually reviewed and edited to reflect the project’s current approach to security roles, trust boundaries, and the handling of recent security reports.

### Note:

`AGENTS.md` is currently in `.gitignore`, but this file was force committed to ensure that agents running automated security scans will be able to find the security model.
1 parent d6c1da0 commit 18e055c

2 files changed

Lines changed: 268 additions & 0 deletions

AGENTS.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
<!--
2+
Licensed to the Apache Software Foundation (ASF) under one
3+
or more contributor license agreements. See the NOTICE file
4+
distributed with this work for additional information
5+
regarding copyright ownership. The ASF licenses this file
6+
to you under the Apache License, Version 2.0 (the
7+
"License"); you may not use this file except in compliance
8+
with the License. You may obtain a copy of the License at
9+
10+
http://www.apache.org/licenses/LICENSE-2.0
11+
12+
Unless required by applicable law or agreed to in writing,
13+
software distributed under the License is distributed on an
14+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
KIND, either express or implied. See the License for the
16+
specific language governing permissions and limitations
17+
under the License.
18+
-->
19+
20+
# Apache Iceberg C++ — Agent Instructions
21+
22+
This file provides repository-specific guidance for automated agents working
23+
in this repository.
24+
25+
## Security Model
26+
27+
When assessing potential vulnerabilities or calibrating automated security
28+
findings, use [`SECURITY-THREAT-MODEL.md`](SECURITY-THREAT-MODEL.md) as the
29+
authoritative detailed description of this repository's security boundaries,
30+
trust assumptions, and non-boundaries.

SECURITY-THREAT-MODEL.md

Lines changed: 238 additions & 0 deletions
@@ -0,0 +1,238 @@
1+
<!--
2+
Licensed to the Apache Software Foundation (ASF) under one
3+
or more contributor license agreements. See the NOTICE file
4+
distributed with this work for additional information
5+
regarding copyright ownership. The ASF licenses this file
6+
to you under the Apache License, Version 2.0 (the
7+
"License"); you may not use this file except in compliance
8+
with the License. You may obtain a copy of the License at
9+
10+
http://www.apache.org/licenses/LICENSE-2.0
11+
12+
Unless required by applicable law or agreed to in writing,
13+
software distributed under the License is distributed on an
14+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
KIND, either express or implied. See the License for the
16+
specific language governing permissions and limitations
17+
under the License.
18+
-->
19+
20+
# Apache Iceberg C++ Security Threat Model
21+
22+
This document describes the detailed security threat model for Apache
23+
Iceberg C++. It is intended for maintainers and automated security triage.
24+
25+
## Purpose
26+
27+
Apache Iceberg C++ is primarily a native library implementing Iceberg table
28+
format handling, catalog interactions, and related tooling for C++
29+
applications and services. It is commonly embedded in larger systems that
30+
provide their own authentication, authorization, and credential management.
31+
Because of that deployment model, not every unsafe or surprising behavior is a
32+
security vulnerability in Iceberg C++ itself.
33+
34+
This model is intended to answer:
35+
36+
- what Iceberg C++ generally treats as a security vulnerability
37+
- what Iceberg C++ generally treats as correctness, hardening, or deployment
38+
work
39+
- which boundaries are primarily owned by Iceberg C++ versus the surrounding
40+
catalog, application, or service
41+
- which issue classes should be downgraded by default by scanners
42+
43+
## Scope
44+
45+
This model is scoped to the Apache Iceberg C++ repository itself:
46+
47+
- table format and metadata handling
48+
- catalog and REST catalog clients
49+
- transport, credential, and configuration handling implemented in this repo
50+
- native parsing, memory management, and helper tooling shipped in this repo
51+
52+
It is not a general threat model for every application that embeds Iceberg
53+
C++.
54+
55+
In particular, it does not attempt to define the complete security model for:
56+
57+
- applications or services that embed Iceberg C++
58+
- storage-level authorization enforced outside Iceberg C++
59+
60+
## Security Goals
61+
62+
Iceberg C++ should:
63+
64+
- avoid exposing secrets or delegated credentials to principals that were not
65+
already trusted with them
66+
- avoid creating new unauthorized capabilities in Iceberg C++-owned
67+
components
68+
- avoid violating trust boundaries that Iceberg C++ itself owns, such as
69+
leaking auth, transport, or credential-bearing state across catalog or
70+
client boundaries in the same process
71+
- avoid memory-safety violations triggered by untrusted input, including
72+
out-of-bounds access, use-after-free, and other memory corruption
73+
74+
Iceberg C++ does not aim to be the primary enforcement point for:
75+
76+
- user-to-user authorization inside the embedding application
77+
- storage-level authorization
78+
- service-side credential scoping performed by an external catalog
79+
80+
## Roles
81+
82+
### Operator
83+
84+
The operator configures the surrounding catalog, application, service, and
85+
storage integration around Iceberg C++. This role is trusted to choose
86+
endpoints, warehouses, storage integrations, and credentials.
87+
88+
### Catalog control plane
89+
90+
The catalog control plane resolves tables and supplies metadata, locations,
91+
configuration, and delegated credentials to Iceberg C++. It may be
92+
implemented by a REST catalog server or another catalog implementation.
93+
Iceberg C++ assumes this control plane is trusted and outside its primary
94+
security boundary.
95+
96+
### REST catalog client
97+
98+
The REST catalog client consumes catalog-provided metadata, configuration, and
99+
credentials. Client-side bugs in routing, caching, or reuse may still be
100+
security-relevant if they leak credential-bearing state across boundaries that
101+
the Iceberg C++ client is expected to preserve.
102+
103+
### Embedding application
104+
105+
Applications and services embedding Iceberg C++ are responsible for their own
106+
user-facing authorization boundaries unless Iceberg C++ explicitly documents
107+
otherwise.
108+
109+
### Table writer or maintainer
110+
111+
This role may already have legitimate power to write or replace table
112+
metadata, write or delete files, choose paths under an allowed warehouse or
113+
table location, and invoke destructive maintenance operations. If a report
114+
only shows a new way to achieve the same effect this role can already cause
115+
legitimately, it is usually not a security issue in Iceberg C++.
116+
117+
## Trust Boundaries
118+
119+
### Boundary 1: operator-trusted configuration
120+
121+
The following are generally treated as trusted operator or deployment inputs:
122+
123+
- catalog properties
124+
- endpoint configuration
125+
- warehouse and storage roots
126+
- transport wiring and credential configuration
127+
128+
If a report depends on the attacker controlling those values directly, it is
129+
usually not a vulnerability in Iceberg C++ itself.
130+
131+
### Boundary 2: catalog-supplied metadata
132+
133+
Iceberg C++ often accepts metadata locations, table properties, namespace
134+
properties, and related control-plane information from a catalog. By default,
135+
Iceberg C++ treats those sources as trusted.
136+
137+
This means a malicious catalog supplying incorrect or harmful metadata is
138+
usually not an Iceberg C++ vulnerability by itself.
139+
140+
### Boundary 3: REST catalog-supplied configuration and delegated storage access
141+
142+
In REST deployments, Iceberg C++ may also accept service endpoints,
143+
configuration, and delegated storage access from the REST catalog server. By
144+
default, those are treated as trusted control-plane inputs unless Iceberg C++
145+
explicitly documents a stronger guarantee.
146+
147+
This means a malicious REST catalog server sending dangerous endpoints is
148+
usually not an Iceberg C++ vulnerability by itself. It also means that
149+
credential-selection bugs are often correctness or specification issues rather
150+
than security boundary failures.
151+
152+
The major exception is secret exposure. If Iceberg C++ surfaces credentials or
153+
secrets to a new audience that was not already trusted with them, that is
154+
security-relevant.
155+
156+
### Boundary 4: storage-level authorization
157+
158+
Object store permissions are enforced by the storage provider and the
159+
credentials the surrounding deployment chooses to hand to Iceberg C++.
160+
Iceberg C++ is not the root authority for bucket- or object-level
161+
authorization.
162+
163+
## In-Scope Security Vulnerabilities
164+
165+
The following categories are generally security-relevant in Iceberg C++ when
166+
the report is credible and reproducible.
167+
168+
### 1. Secret or credential disclosure to a new audience
169+
170+
Examples include:
171+
172+
- catalog or storage credentials exposed through a user-visible surface
173+
- one catalog's credentials or auth state leaking into another catalog or
174+
client
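
As a rough illustration of the first example, the sketch below redacts credential-bearing values before a property map reaches a log line or other user-visible surface. The names `LooksSecret` and `RedactForDisplay`, the marker list, and the property keys used with it are assumptions made for this example; none of them is an Iceberg C++ API.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <map>
#include <string>
#include <string_view>

// Hypothetical helper, not part of Iceberg C++: returns true when a property
// key looks like it carries a secret. The marker list is an assumption.
inline bool LooksSecret(std::string_view key) {
  std::string lower(key);
  std::transform(lower.begin(), lower.end(), lower.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  constexpr std::array<std::string_view, 4> kMarkers = {
      "secret", "token", "password", "credential"};
  return std::any_of(kMarkers.begin(), kMarkers.end(), [&](std::string_view m) {
    return lower.find(m) != std::string::npos;
  });
}

// Returns a copy of the properties that is safe to show on a user-visible
// surface: values under secret-looking keys are replaced by a placeholder.
inline std::map<std::string, std::string> RedactForDisplay(
    const std::map<std::string, std::string>& props) {
  std::map<std::string, std::string> out;
  for (const auto& [key, value] : props) {
    out.emplace(key, LooksSecret(key) ? std::string("<redacted>") : value);
  }
  return out;
}
```

A key-name heuristic like this is only a backstop; keeping secrets in a dedicated type that has no printing path is usually the sturdier design.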
175+
176+
### 2. Iceberg C++-owned trust-boundary violations
177+
178+
Security issues exist when Iceberg C++ itself is expected to separate
179+
catalogs, clients, or principals and fails to do so.
180+
181+
Examples include:
182+
183+
- process-global auth or transport state crossing catalog instances
184+
- secret-bearing state from one principal reused for another principal within
185+
an Iceberg C++-owned boundary
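
The first example can be made concrete with a small sketch that keeps auth state per catalog instance instead of in process-global storage. `AuthSession` and `CatalogClient` are hypothetical types invented for this illustration and do not reflect the actual Iceberg C++ class layout.

```cpp
#include <string>
#include <utility>

// Hypothetical types for illustration only; not the Iceberg C++ API.
struct AuthSession {
  std::string bearer_token;
};

// Anti-pattern, shown only as a comment: a process-global session such as
//   static AuthSession g_session;
// would let the most recently configured catalog's token be sent by every
// other catalog instance in the same process.

class CatalogClient {
 public:
  CatalogClient(std::string endpoint, std::string token)
      : endpoint_(std::move(endpoint)), session_{std::move(token)} {}

  // Builds the Authorization header from this instance's own session only.
  std::string AuthorizationHeader() const {
    return "Bearer " + session_.bearer_token;
  }

  const std::string& endpoint() const { return endpoint_; }

 private:
  std::string endpoint_;
  AuthSession session_;  // owned per catalog instance, never shared globally
};
```

With this layout, two clients constructed with different tokens can coexist in one process without either one observing the other's credentials.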
186+
187+
### 3. Memory-safety violations from untrusted input
188+
189+
Out-of-bounds access, use-after-free, memory corruption, and similar native
190+
memory-safety issues triggered by untrusted input are generally security-
191+
relevant in Iceberg C++.
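
To illustrate the kind of defect this category covers, the following sketch reads a length-prefixed field and validates the untrusted length before touching the payload. The encoding (a 4-byte little-endian length followed by the payload) and the function name `ReadLengthPrefixed` are assumptions chosen for the example, not a format or API defined by Iceberg.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical parser: reads a 4-byte little-endian length at `offset`,
// then returns that many payload bytes, or std::nullopt if the input is
// too short. Omitting either check below would be an out-of-bounds read.
inline std::optional<std::string> ReadLengthPrefixed(
    const std::vector<std::uint8_t>& buf, std::size_t offset) {
  constexpr std::size_t kHeader = 4;
  // Compare against the remaining size instead of computing
  // offset + kHeader, which could wrap around for a huge offset.
  if (offset > buf.size() || buf.size() - offset < kHeader) {
    return std::nullopt;
  }

  std::uint32_t len = 0;
  for (std::size_t i = 0; i < kHeader; ++i) {
    len |= static_cast<std::uint32_t>(buf[offset + i]) << (8 * i);
  }

  // Validate the untrusted length against the bytes that actually remain.
  const std::size_t remaining = buf.size() - offset - kHeader;
  if (len > remaining) {
    return std::nullopt;
  }

  const char* begin =
      reinterpret_cast<const char*>(buf.data()) + offset + kHeader;
  return std::string(begin, len);
}
```

Returning `std::nullopt` on malformed input turns what would have been memory corruption into an ordinary parse failure, which under this model is robustness work and not a security issue.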
192+
193+
## Usually Out of Scope or Non-Security by Default
194+
195+
These categories may still be real bugs worth fixing, but they are not usually
196+
security vulnerabilities in Iceberg C++ itself.
197+
198+
### 1. Correctness bugs
199+
200+
Examples include incorrect metadata handling, ambiguous matching semantics,
201+
and logic bugs that do not create a new trust-boundary violation.
202+
203+
### 2. Parser hardening and malformed-input robustness without memory corruption
204+
205+
Malformed-input crashes, bounded allocation failures, and memory amplification
206+
without memory corruption are usually treated as robustness or hardening work
207+
rather than security issues in Iceberg C++ itself.
208+
209+
### 3. Malicious catalog or external service scenarios
210+
211+
Reports that require a malicious catalog or other external control-plane
212+
service are usually outside Iceberg C++'s primary security boundary.
213+
214+
### 4. Equivalent-harm reports
215+
216+
If the actor already has a legitimate capability that can cause the same harm,
217+
the new path is usually not a security issue.
218+
219+
## Scanner Calibration Rules
220+
221+
A scanner targeting Iceberg C++ should treat a finding as higher-confidence
222+
only if it plausibly shows one of the following:
223+
224+
- exposure of a secret or delegated credential to a new audience
225+
- creation of a new unauthorized capability in an Iceberg C++-owned component
226+
- violation of an Iceberg C++-owned trust boundary rather than a surrounding
227+
catalog, application, service, or operator boundary
228+
- memory corruption or other native memory-safety violations triggered by
229+
untrusted input
230+
231+
A finding should be downgraded or rejected by default if it instead depends
232+
primarily on:
233+
234+
- malformed-input robustness or denial-of-service behavior without memory
235+
corruption
236+
- a malicious catalog or external service
237+
- a principal that already has equivalent power through legitimate write or
238+
maintenance capabilities
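
The rules above could be encoded in a scanner roughly as follows. This is a sketch under stated assumptions: the `Finding` fields, the `Triage` enum, and the choice to evaluate the downgrade conditions before the higher-confidence signals are all illustrative and are not prescribed by this document.

```cpp
// Hypothetical description of a scanner finding; field names are assumptions.
struct Finding {
  bool exposes_secret_to_new_audience = false;
  bool creates_unauthorized_capability = false;
  bool violates_owned_trust_boundary = false;
  bool memory_corruption_from_untrusted_input = false;
  bool requires_malicious_catalog_or_service = false;
  bool actor_already_has_equivalent_power = false;
};

enum class Triage { kHigherConfidence, kDowngrade };

inline Triage Classify(const Finding& f) {
  // Assumption: downgrade conditions take precedence, because a report that
  // needs a malicious catalog or pre-existing equivalent power sits outside
  // the boundaries this model assigns to Iceberg C++.
  if (f.requires_malicious_catalog_or_service ||
      f.actor_already_has_equivalent_power) {
    return Triage::kDowngrade;
  }
  if (f.exposes_secret_to_new_audience || f.creates_unauthorized_capability ||
      f.violates_owned_trust_boundary ||
      f.memory_corruption_from_untrusted_input) {
    return Triage::kHigherConfidence;
  }
  // Everything else, including malformed-input robustness and
  // denial-of-service without memory corruption, is downgraded by default.
  return Triage::kDowngrade;
}
```

A finding with no flags set models a plain robustness report and is downgraded, which matches the default described in the list above.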
