Commit 62073eb

author
Maruan Sahyoun
committed
PDFBOX-6208: add AGENTS.md and SECURITY.md
git-svn-id: https://svn.apache.org/repos/asf/pdfbox/trunk@1934731 13f79535-47bb-0310-9956-ffa450edef68
1 parent ff93da3 commit 62073eb

2 files changed

Lines changed: 208 additions & 0 deletions

AGENTS.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
# Agent Guidance

This file is read by automated agents (security scanners, code analyzers,
AI assistants) operating on this repository. It points them at the
human-authored references they should consult before producing output.

## Project Overview

Apache PDFBox is a Java library for working with PDF documents. It is used
as a dependency (`pdfbox.jar`) in other Java projects and is accessed through
its public Java API. The project also ships several command-line utilities.

## Branches

| Branch | Status | Java requirement | Latest release |
|--------|--------|------------------|----------------|
| `trunk` | Future development (next major version, not yet released) | Java 11+ | |
| `3.0` | **Actively maintained** (current stable series) | Java 8+ | 3.0.7 |
| `2.0` | **Actively maintained** (legacy stable series) | Java 6+ | 2.0.36 |

When evaluating code or reporting issues, note which branch is in scope.
Security fixes are applied to both `3.0` and `2.0`. New features target
`trunk` and `3.0`.

## Sub-modules

All branches share the same multi-module Maven structure, except where noted:

- `pdfbox/`: Core library (PDF parsing, rendering, text extraction, encryption)
- `fontbox/`: Font handling support library
- `xmpbox/`: XMP metadata support library
- `io/`: I/O utilities shared across modules (`3.0` and `trunk` only)
- `tools/`: Command-line utilities
- `debugger/` / `debugger-app/`: PDF debugger application
- `examples/`: Standalone usage examples
- `benchmark/`: JMH benchmarks

## Building

The standard build command is:

```
mvn clean install
```

To run only the tests without a full install:

```
mvn test
```

To build or test a specific module, use the `-pl` flag from the root:

```
mvn -pl pdfbox test
```

Minimum Java version depends on the branch; see the table above.

## Sensitive Areas

The following areas have historically been the source of subtle bugs and
security issues. Changes here require extra care and regression testing.
Avoid large refactorings in these areas unless explicitly requested:

- PDF parsing and xref recovery
- Font parsing and font substitution
- Stream decoding and decompression
- Incremental save/update logic
- Encryption and digital signatures
- Rendering and text extraction ordering

## Security

Security model and scope: [SECURITY.md](SECURITY.md),
also published at <https://pdfbox.apache.org/security.html>.

Agents that scan this repository **must** read the security model before
reporting any finding. In particular, note:

- Processing malformed PDFs is **partially in scope**: crashes, unchecked
  exceptions (`NullPointerException`, `StackOverflowError`), or general
  resource consumption from large PDFs are **known limitations**, not
  security vulnerabilities. However, disproportionate resource consumption
  triggered by small, attacker-controlled inputs may be in scope; see
  `SECURITY.md` for the full scope definition.
- Remote code execution or privilege escalation from untrusted PDFs **is** in scope.
- Issues that require the attacker to control the Java application's classpath
  or configuration are **out of scope**.

For a list of known CVEs, see [SECURITY.md](SECURITY.md) or
<https://pdfbox.apache.org/security.html>.

To report a new vulnerability, send a plain-text email to <security@apache.org>.
Do **not** open a public JIRA issue for undisclosed vulnerabilities.

## Contribution Guidelines

- Pull requests on this GitHub repository are welcome.
- Bug reports and feature requests go in the
  [JIRA issue tracker](https://issues.apache.org/jira/browse/PDFBOX).
- Code must be compatible with the minimum Java version of the target branch
  (see table above).
- Follow the existing code style; a Checkstyle configuration is provided in
  `pdfbox-checkstyle-5.xml` and an Eclipse formatter in
  `pdfbox-eclipse-formatter.xml`.
- Parser, rendering, font, extraction, encryption, or signing fixes should
  include a minimal reproducer document where practical, along with regression
  tests covering the reported behavior.
- Avoid introducing new runtime dependencies unless necessary.
  Security-sensitive or cryptographic dependencies require maintainer review.
- For questions, use the [Users Mailing List](https://pdfbox.apache.org/mailinglists.html).

SECURITY.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
# Security Scan Guidance

This document is for AI security scanners and vulnerability researchers to
understand the security model of Apache PDFBox and to avoid false positives.

The canonical, human-authored version of this information is published at
<https://pdfbox.apache.org/security.html>.

## Intended Use

Apache PDFBox is a Java library (`pdfbox.jar`) for creating, manipulating,
and extracting content from PDF documents. It is accessed through its public
Java API by other Java applications. It also ships command-line utilities
built on top of that API.

## Security Model

### Processing untrusted PDFs

PDFBox supports processing of untrusted (potentially malicious or malformed)
PDF files **to a limited degree**:

- **In scope**: Remote code execution, privilege escalation, unauthorized data
  access, sandbox escape, or disproportionate resource amplification (i.e.
  small attacker-controlled inputs triggering catastrophic memory or CPU
  consumption) caused by processing an untrusted PDF document. These are
  genuine vulnerabilities and should be reported privately.

- **Known limitations (not vulnerabilities)**: Malformed PDFs may cause
  unchecked exceptions such as `NullPointerException` or `StackOverflowError`,
  or may consume significant memory or CPU relative to document size. General
  parser and resource exhaustion issues are treated as robustness limitations
  rather than security vulnerabilities. Pure resource exhaustion findings
  without privilege escalation, unauthorized access, or disproportionate
  amplification (e.g. tenant starvation, indexing-service disruption from a
  tiny crafted input) are generally out of scope.

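Because the known limitations above surface as unchecked throwables, an embedding application may want to contain them at the call site. The following is a minimal sketch, assuming Java 8 or later; `GuardedCall` and `tryRun` are hypothetical names and are not part of PDFBox, and the `Callable` stands in for whatever parsing call the application makes.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration only; not part of PDFBox. */
final class GuardedCall {

    private GuardedCall() {
    }

    /**
     * Runs a parsing task and turns the failure modes listed above into an
     * empty result instead of letting them propagate to the caller.
     */
    static <T> Optional<T> tryRun(Callable<T> task) {
        try {
            return Optional.ofNullable(task.call());
        } catch (StackOverflowError e) {
            // Deeply nested or self-referencing structures can exhaust the stack.
            return Optional.empty();
        } catch (Exception e) {
            // Covers checked exceptions (for example IOException) as well as
            // unchecked ones such as NullPointerException.
            return Optional.empty();
        }
    }
}
```

A caller might wrap its own document-handling method as `GuardedCall.tryRun(() -> extract(file))`, where `extract` is a hypothetical application method. This sketch does not address memory or CPU consumption.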
### Deployment and sandboxing

Applications embedding PDFBox are expected to apply appropriate operational
controls (such as timeouts, memory limits, process isolation, and sandboxing)
when processing untrusted documents at scale or in multi-tenant environments.
Resource exhaustion risks in those contexts are the responsibility of the
embedding application, not of PDFBox itself.

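As an illustration of the timeout control mentioned above, here is a minimal in-process sketch using only `java.util.concurrent`. `BoundedTask` and `runWithTimeout` are hypothetical names, not PDFBox API, and the `Callable` stands in for the application's own document-processing call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration only; not part of PDFBox. */
final class BoundedTask {

    private BoundedTask() {
    }

    /** Runs the task on a worker thread and gives up after timeoutMillis. */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Requests interruption of the worker thread; this only stops
                // tasks that respond to interruption.
                future.cancel(true);
                throw e;
            }
        } finally {
            // Release the worker thread whether the task finished or not.
            executor.shutdownNow();
        }
    }
}
```

A thread-level timeout like this cannot forcibly stop a task that ignores interruption and does not limit memory use, so it complements rather than replaces the process isolation and memory limits listed above.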
### Encryption and cryptography

PDFBox uses the Java Cryptography Architecture (JCA) and the Bouncy Castle
libraries to implement PDF encryption (RC4, AES-128, AES-256) and digital
signatures. Vulnerabilities in these dependencies should be reported to their
respective projects; PDFBox will incorporate fixes in its releases as needed.
Issues in how PDFBox *uses* those libraries (e.g. incorrect key derivation,
bypass of access permissions) are in scope.

### Classpath and configuration trust boundary

PDFBox is a library. The embedding application is assumed to control its own
JVM environment, including classpath, JVM arguments, installed security
providers, classloaders, and dependency resolution. Attacks that require the
attacker to influence any of these are **out of scope**.

### Command-line utilities

The command-line tools (`PDFToText`, `PDFDebugger`, etc.) are convenience
wrappers around the library API. Their security scope follows the library
scope above.

## Previously Disclosed Vulnerabilities

For a full list of disclosed CVEs, see <https://pdfbox.apache.org/security.html>.

Scanners should check there before reporting a finding to avoid duplicate reports.

## Reporting a Vulnerability

**Do not open a public JIRA issue for an undisclosed vulnerability.**

Report undisclosed vulnerabilities by sending a plain-text email to:

```
security@apache.org
```

Send one email per vulnerability. The PDFBox security team will work with
you privately to confirm and resolve the issue before public disclosure.

The typical handling process is:

1. Reporter sends details to <security@apache.org>.
2. The PDFBox security team acknowledges receipt and works privately with the
   reporter to validate and fix the issue.
3. A new release is prepared that includes the fix.
4. The vulnerability and its fix are publicly announced on the blog and in the CVE database.

For more detail on the Apache vulnerability handling process, see
<https://www.apache.org/security/committers.html>.
