| 1 | +# Agent Guidance |
| 2 | + |
| 3 | +This file is read by automated agents (security scanners, code analyzers, |
| 4 | +AI assistants) operating on this repository. It points them at the |
| 5 | +human-authored references they should consult before producing output. |
| 6 | + |
| 7 | +## Project Overview |
| 8 | + |
| 9 | +Apache PDFBox is a Java library for working with PDF documents. It is used |
| 10 | +as a dependency (`pdfbox.jar`) in other Java projects and is accessed through |
| 11 | +its public Java API. The project also ships several command-line utilities. |
| 12 | + |
| 13 | +## Branches |
| 14 | + |
| 15 | +| Branch | Status | Java requirement | Latest release | |
| 16 | +|--------|--------|-----------------|----------------| |
| 17 | +| `trunk` | Future development (next major version, not yet released) | Java 11+ | — | |
| 18 | +| `3.0` | **Actively maintained** — current stable series | Java 8+ | 3.0.7 | |
| 19 | +| `2.0` | **Actively maintained** — legacy stable series | Java 6+ | 2.0.36 | |
| 20 | + |
| 21 | +When evaluating code or reporting issues, note which branch is in scope. |
| 22 | +Security fixes are applied to both `3.0` and `2.0`. New features target |
| 23 | +`trunk` and `3.0`. |
| 24 | + |
| 25 | +## Sub-modules |
| 26 | + |
| 27 | +The branches share largely the same multi-module Maven structure; exceptions are noted inline: |
| 28 | + |
| 29 | +- `pdfbox/` — Core library (PDF parsing, rendering, text extraction, encryption) |
| 30 | +- `fontbox/` — Font handling support library |
| 31 | +- `xmpbox/` — XMP metadata support library |
| 32 | +- `io/` — I/O utilities shared across modules (`3.0` and `trunk` only) |
| 33 | +- `tools/` — Command-line utilities |
| 34 | +- `debugger/` / `debugger-app/` — PDF debugger application |
| 35 | +- `examples/` — Standalone usage examples |
| 36 | +- `benchmark/` — JMH benchmarks |
| 37 | + |
| 38 | +## Building |
| 39 | + |
| 40 | +The standard build command is: |
| 41 | + |
| 42 | +``` |
| 43 | +mvn clean install |
| 44 | +``` |
| 45 | + |
| 46 | +To run only the tests without a full install: |
| 47 | + |
| 48 | +``` |
| 49 | +mvn test |
| 50 | +``` |
| 51 | + |
| 52 | +To build or test a specific module, use the `-pl` flag from the root (add `-am` to also build the modules it depends on): |
| 53 | + |
| 54 | +``` |
| 55 | +mvn -pl pdfbox test |
| 56 | +``` |
| 57 | + |
| 58 | +The minimum Java version depends on the branch; see the table in the Branches section. |
| 59 | + |
| 60 | +## Sensitive Areas |
| 61 | + |
| 62 | +The following areas have historically been the source of subtle bugs and |
| 63 | +security issues. Changes here require extra care and regression testing. |
| 64 | +Avoid large refactorings in these areas unless explicitly requested: |
| 65 | + |
| 66 | +- PDF parsing and xref recovery |
| 67 | +- Font parsing and font substitution |
| 68 | +- Stream decoding and decompression |
| 69 | +- Incremental save/update logic |
| 70 | +- Encryption and digital signatures |
| 71 | +- Rendering and text extraction ordering |
| 72 | + |
| 73 | +## Security |
| 74 | + |
| 75 | +Security model and scope: [SECURITY.md](SECURITY.md), |
| 76 | +also published at <https://pdfbox.apache.org/security.html>. |
| 77 | + |
| 78 | +Agents that scan this repository **must** read the security model before |
| 79 | +reporting any finding. In particular, note: |
| 80 | + |
| 81 | +- Processing malformed PDFs is **partially in scope**: crashes, unchecked |
| 82 | + exceptions or errors (`NullPointerException`, `StackOverflowError`), or general |
| 83 | + resource consumption from large PDFs are **known limitations**, not |
| 84 | + security vulnerabilities. However, disproportionate resource consumption |
| 85 | + triggered by small, attacker-controlled inputs may be in scope — see |
| 86 | + `SECURITY.md` for the full scope definition. |
| 87 | +- Remote code execution or privilege escalation from untrusted PDFs **is** in scope. |
| 88 | +- Issues that require the attacker to control the Java application's classpath |
| 89 | + or configuration are **out of scope**. |
| 90 | + |
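For agents that triage findings automatically, the scope rules above could be encoded roughly as follows. This is only an illustrative sketch: `ScopeTriage`, its enums, and the `CHECK_SECURITY_MD` outcome are hypothetical names, not part of PDFBox or of `SECURITY.md`, and the mapping merely restates the bullets above. `SECURITY.md` remains the authoritative definition.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical triage helper; all names are illustrative, not PDFBox API. */
final class ScopeTriage {

    /** Kinds of findings named in the scope bullets. */
    enum Finding {
        CRASH_OR_UNCHECKED_EXCEPTION,
        RESOURCE_USE_FROM_LARGE_PDF,
        DISPROPORTIONATE_RESOURCE_USE_FROM_SMALL_INPUT,
        REMOTE_CODE_EXECUTION,
        PRIVILEGE_ESCALATION,
        REQUIRES_CLASSPATH_OR_CONFIG_CONTROL
    }

    /** Possible triage outcomes. */
    enum Scope { IN_SCOPE, KNOWN_LIMITATION, OUT_OF_SCOPE, CHECK_SECURITY_MD }

    private static final Map<Finding, Scope> RULES = new EnumMap<>(Finding.class);

    static {
        // Known limitations, not security vulnerabilities.
        RULES.put(Finding.CRASH_OR_UNCHECKED_EXCEPTION, Scope.KNOWN_LIMITATION);
        RULES.put(Finding.RESOURCE_USE_FROM_LARGE_PDF, Scope.KNOWN_LIMITATION);
        // "May be in scope": defer to the full definition in SECURITY.md.
        RULES.put(Finding.DISPROPORTIONATE_RESOURCE_USE_FROM_SMALL_INPUT, Scope.CHECK_SECURITY_MD);
        // Explicitly in scope when triggered by untrusted PDFs.
        RULES.put(Finding.REMOTE_CODE_EXECUTION, Scope.IN_SCOPE);
        RULES.put(Finding.PRIVILEGE_ESCALATION, Scope.IN_SCOPE);
        // Attacker already controls the application's classpath or configuration.
        RULES.put(Finding.REQUIRES_CLASSPATH_OR_CONFIG_CONTROL, Scope.OUT_OF_SCOPE);
    }

    /** Returns the triage outcome for a single kind of finding. */
    static Scope triage(Finding finding) {
        return RULES.get(finding);
    }

    public static void main(String[] args) {
        // Print the whole rule table for a quick review.
        for (Finding finding : Finding.values()) {
            System.out.println(finding + " -> " + triage(finding));
        }
    }
}
```

A finding that maps to `CHECK_SECURITY_MD` is deliberately left undecided: the guidance above says only that such cases "may be in scope", so the sketch defers to the full scope definition rather than inventing a threshold.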
| 91 | +For a list of known CVEs, see [SECURITY.md](SECURITY.md) or |
| 92 | +<https://pdfbox.apache.org/security.html>. |
| 93 | + |
| 94 | +To report a new vulnerability, send a plain-text email to <security@apache.org>. |
| 95 | +Do **not** open a public JIRA issue for undisclosed vulnerabilities. |
| 96 | + |
| 97 | +## Contribution Guidelines |
| 98 | + |
| 99 | +- Pull requests on this GitHub repository are welcome. |
| 100 | +- Bug reports and feature requests go in the |
| 101 | + [JIRA issue tracker](https://issues.apache.org/jira/browse/PDFBOX). |
| 102 | +- Code must be compatible with the minimum Java version of the target branch |
| 103 | + (see table above). |
| 104 | +- Follow the existing code style; a Checkstyle configuration is provided in |
| 105 | + `pdfbox-checkstyle-5.xml` and an Eclipse formatter in |
| 106 | + `pdfbox-eclipse-formatter.xml`. |
| 107 | +- Parser, rendering, font, extraction, encryption, or signing fixes should |
| 108 | + include a minimal reproducer document where practical, along with regression |
| 109 | + tests covering the reported behavior. |
| 110 | +- Avoid introducing new runtime dependencies unless necessary. |
| 111 | + Security-sensitive or cryptographic dependencies require maintainer review. |
| 112 | +- For questions, use the [Users Mailing List](https://pdfbox.apache.org/mailinglists.html). |