-
Clone
git clone https://github.com/Stirling-Tools/JPDFium.git cd JPDFium -
Install native dependencies (for full feature set)
Fedora / RHEL:
sudo dnf install -y gcc-c++ cmake pcre2-devel freetype-devel harfbuzz-devel \ libicu-devel qpdf-devel pugixml-devel libunibreak-develUbuntu / Debian:
sudo apt install -y g++ cmake libpcre2-dev libfreetype-dev libharfbuzz-dev \ libicu-dev libqpdf-dev libpugixml-dev libunibreak-dev -
Quick Try-Out (no PDFium or native dependencies needed)
./gradlew quickTry
This builds the stub bridge and runs all 87 samples in stub mode. Perfect for testing Java-layer changes.
-
Full Build with Real PDFium (recommended for production testing)
# One command: build PDFium from source, build real bridge, run all tests and samples ./gradlew fullBuildAndTestOr step by step:
./gradlew buildPdfium # Build PDFium from EmbedPDF fork (~15 GB, first time takes 15-60 min) ./gradlew buildRealBridge # Build native bridge against real PDFium ./gradlew test # Unit tests (stub mode) ./gradlew :jpdfium:integrationTest # Integration tests (real PDFium) ./gradlew runAllSamples # Run all 87 samples
Note: The PDFium build requires
git,python3, and ~15 GB disk space. On Fedora, also install:sudo dnf install clang lld pkg-config ninja-build. The build script installsdepot_tools(gclient/gn/ninja) automatically. Subsequent builds with--rebuildare much faster (incremental).
JPDFium uses the EmbedPDF fork (branch embedpdf/main) which adds
EPDF_* APIs for native encryption, annotation control, and redaction.
The build uses a component build (is_component_build=true) with use_allocator_shim=false:
- Component build produces
libpdfium.soplus dependency.sofiles (PartitionAlloc, ICU, zlib, abseil) use_allocator_shim=falseis required - without it, PartitionAlloc replaces the system allocator (malloc/free), causing crashes when loaded into a JVM that manages its own heapCOMPONENT_BUILD+FPDF_IMPLEMENTATIONdefines are set automatically, giving FPDF_EXPORT symbols__attribute__((visibility("default")))- no manual header patching needed
The build script applies these source patches:
base/BUILD.gn: Stub file (standalone PDFium lacks full//base)third_party/libpng/visibility.gni: Adds fpdfsdk to libpng visibility (EmbedPDF adds PNG export)cpdf_pagecontentgenerator.cpp: Fixes two null pointer crashes in PDFium's resource dict handling
The NativeLoader reads a native-libs.txt manifest to extract all component .so files to a temp
directory before loading. The dynamic linker resolves dependencies via RUNPATH=$ORIGIN.
- Open in IntelliJ IDEA - import as a Gradle project. Add
--enable-native-access=ALL-UNNAMEDto Run Configurations -> Templates -> Application -> VM Options.
| Task | Description |
|---|---|
./gradlew quickTry |
Quick try-out: stub bridge + unit tests + all samples (no PDFium) |
./gradlew buildPdfium |
Build PDFium from EmbedPDF fork source (~15 GB, first build 15-60 min) |
./gradlew buildRealBridge |
Build real native bridge with PDFium |
./gradlew buildStubBridge |
Build stub native bridge (no PDFium) |
./gradlew fullBuildAndTest |
Full end-to-end: PDFium + real bridge + all tests + samples |
./gradlew runAllSamples |
Run all 87 samples (requires real bridge for full features) |
./gradlew runSample -Psample=01 |
Run a specific sample (01-87) |
./gradlew test |
Run unit tests (stub mode) |
./gradlew :jpdfium:integrationTest |
Run integration tests (real PDFium required) |
| Sample | Feature | Sample | Feature | Sample | Feature |
|---|---|---|---|---|---|
| 01 | Render | 31 | Image Extract | 61 | Search Highlight |
| 02 | Text Extract | 32 | Page Objects | 62 | Page Split 2-Up |
| 03 | Text Search | 33 | Encryption | 63 | Page Labels |
| 04 | Metadata | 34 | Linearizer | 64 | Link Validation |
| 05 | Bookmarks | 35 | Overlay | 65 | Posterize |
| 06 | Redact Words | 36 | Annotation Builder | 66 | PDF Diff |
| 07 | Annotations | 37 | Path Drawer | 67 | Auto-Deskew |
| 08 | Full Pipeline | 38 | JavaScript Inspector | 68 | Font Audit |
| 09 | Flatten | 39 | Web Links | 69 | PDF/A Conversion |
| 10 | Signatures | 40 | Page Boxes | 70 | Page Scaling |
| 11 | Attachments | 41 | Version Converter | 71 | Margin Adjust |
| 12 | Links | 42 | Bounded Text | 72 | Selective Flatten |
| 13 | Page Import | 43 | Stream Optimizer | 73 | Annotation Export |
| 14 | Structure Tree | 44 | Flatten Rotation | 74 | Image Replace |
| 15 | Thumbnails (embedded) | 45 | Page Interleaver | 75 | Long Image |
| 16 | Page Editing | 46 | Named Destinations | 76 | Duplicate Detect |
| 17 | N-Up Layout | 47 | Blank Page Detector | 77 | Column Extract |
| 18 | Repair | 48 | EmbedPDF Annotations | 78 | Image DPI |
| 19 | PDF to Images | 49 | Native Encryption | 79 | Page Mirror |
| 20 | Images to PDF | 50 | Native Redaction | 80 | Background |
| 21 | Thumbnails (generated) | 51 | Compress | 81 | Reading Order |
| 22 | Merge/Split | 52 | Bookmark Editor | 82 | Resource Dedup |
| 23 | Watermark | 53 | Barcode Generate | 83 | TOC Generate |
| 24 | Table Extract | 54 | Page Reorder | 84 | Selective Raster |
| 25 | Page Geometry | 55 | Color Convert | 85 | Annotation Stats |
| 26 | Header/Footer | 56 | Booklet | 86 | Posterize Sizes |
| 27 | Security | 58 | Analytics | 87 | AutoCrop Margins |
| 28 | Doc Info | 59 | Form Fill | ||
| 29 | Render Options | 60 | AutoCrop | ||
| 30 | Form Reader |
Stub-only development (no PDFium or native libraries needed):
bash native/build-stub.sh
./gradlew testThe stub provides pass-through behavior for Java-layer testing only.
JPDFium/
native/
bridge/
include/
- jpdfium.h # Public C API (consumed by jextract)
- jpdfium_internal.h # DocWrapper, PageWrapper, helpers
src/
- jpdfium_document.cpp # Core document operations
- jpdfium_render.cpp # Page rendering
- jpdfium_text.cpp # Text extraction and search
- jpdfium_redact.cpp # Redaction (Object Fission)
- jpdfium_advanced.cpp # PII pipeline: PCRE2, FreeType, HarfBuzz, ICU, qpdf, pugixml
- jpdfium_repair.cpp # PDF repair cascade
- jpdfium_image.cpp # PDF to image conversion
- jpdfium_brotli.cpp # Brotli transcoding
- jpdfium_lcms.cpp # ICC color profile validation
- jpdfium_openjpeg.cpp # JPEG2000 validation
- jpdfium_pdfio.cpp # PDFio fallback repair
- jpdfium_unicode.cpp # Unicode text processing
- jpdfium_stub.cpp # Stub for testing without PDFium
- setup-pdfium.sh # Build PDFium from EmbedPDF fork source
- build-real.sh # Build bridge against real PDFium
- build-stub.sh # Build stub only
jpdfium/ # All Java source (main module)
src/
main/java/stirling/software/jpdfium/
- PdfDocument.java # Main document API
- PdfPage.java # Page API
- PdfImageConverter.java # PDF <-> Image conversion
- PdfMerge.java # Merge multiple PDFs
- PdfSplit.java # Split PDFs by strategy
- Watermark.java # Watermark builder
- WatermarkApplier.java # Apply watermarks
- HeaderFooter.java # Header/footer builder
- HeaderFooterApplier.java # Apply headers/footers
- doc/ # Document inspection APIs
- PdfMetadata.java
- PdfBookmarks.java
- PdfAnnotations.java
- PdfLinks.java
- PdfSignatures.java
- PdfAttachments.java
- PdfThumbnails.java
- PdfStructureTree.java
- PdfPageImporter.java
- PdfPageEditor.java
- PdfRepair.java
- PdfSecurity.java # Security hardening
- PdfTableExtractor.java
- PdfOverlay.java # Page overlay/stamp
- PdfLinearizer.java # Fast web view
- PdfStreamOptimizer.java # File size reduction
- PdfVersionConverter.java # PDF version
- PdfEncryption.java # AES-256 encrypt/decrypt (native + qpdf file-to-file)
- EmbedPdfAnnotations.java # Extended annotation operations (opacity, rotation, redaction, etc.)
- PdfPageInterleaver.java # Duplex scan merge
- PdfNamedDestinations.java # Named destination lookup
- PdfWebLinks.java # Web link enumeration
- PdfPageBoxes.java # All five page boxes
- PdfAnnotationBuilder.java # Create annotations
- PdfPathDrawer.java # Vector path drawing
- PdfFlattenRotation.java # Apply rotation transform
- PdfFormReader.java # Form field extraction
- PdfImageExtractor.java # Embedded image extraction
- PdfJavaScriptInspector.java # JS audit
- PdfBoundedText.java # Bounded text blocks
- BlankPageDetector.java # Blank page detection
- DocInfo.java # Comprehensive PDF audit
- RenderOptions.java # Advanced render options
- NUpLayout.java
- QpdfHelper.java
- PdfCompressor.java # PDF compression
- PdfBarcode.java # QR code generation
- PdfBookmarkEditor.java # Bookmark editing
- PdfPageReorder.java # Page reordering
- PdfColorConverter.java # Color space conversion
- PdfPrint.java # Booklet imposition
- PdfAnalytics.java # Document analytics
- PdfFormFiller.java # Form field filling
- ...
- exception/ # JPDFiumException hierarchy
- fonts/ # FontInfo, FontNormalizer
- model/ # Rect, PageSize, RenderResult, ImageFormat, PdfVersion, ColorScheme, FlattenMode, Position, ImageToPdfOptions, PdfToImageOptions
- panama/ # NativeLoader, JpdfiumLib, JpdfiumH (generated), DocBindings, PageEditBindings, AnnotationBindings, EmbedPdfAnnotationBindings, EmbedPdfDocumentBindings, TextPageBindings, RenderBindings, JavaScriptBindings, ActionBindings, FontLib, RepairLib, FfmHelper
- redact/ # PdfRedactor, RedactOptions, RedactResult, RedactionSession
- pii/ # PatternEngine, PiiCategory, EntityRedactor, GlyphRedactor, XmpRedactor
- text/ # PdfTextExtractor, PdfTextSearcher, PageText, TextLine, TextWord, TextChar, Table
- edit/ # TextEditor
- transform/ # PageOps, PdfPageGeometry, PdfPageBoxes
- convert/ # (PDF <-> Image conversion utilities)
- util/ # NativeJsonParser
test/java/stirling/software/jpdfium/
- PdfDocumentTest.java # Unit tests (stub native)
- RealPdfIntegrationTest.java # Integration tests (real PDFium)
- ManualTest.java # Quick smoke-test (right-click -> Run)
- samples/ # Numbered manual-test classes (S01-S87)
- RunAllSamples.java # Execute all 87 samples
- SampleBase.java # Base class for samples
- S01_Render.java ... S87_AutoCropMargins.java
- ...
jpdfium-natives/ # Platform-specific native JARs
- jpdfium-natives-linux-x64/
- ...
jpdfium-spring/ # Spring Boot auto-configuration
jpdfium-bom/ # Maven BOM
buildSrc/ # Gradle convention plugins
Adding a feature follows a consistent pattern across all layers:
Step 1 - C header (native/bridge/include/jpdfium.h):
JPDFIUM_EXPORT int32_t jpdfium_page_rotate(int64_t page, int32_t rotation);Rules:
- Return
int32_terror codes (0= OK, negative = error) - Opaque handles use
int64_t - Output params via pointer (
float* width) - Caller-freed buffers via double pointer + length (
uint8_t** data, int64_t* len) - Strings are UTF-8
const char*- convert to UTF-16LE internally for PDFium
Step 2 - Real implementation (native/bridge/src/jpdfium_document.cpp):
int32_t jpdfium_page_rotate(int64_t page, int32_t rotation) {
PageWrapper* pw = decodePage(page);
if (!pw || !pw->page) return JPDFIUM_ERR_INVALID;
FPDFPage_SetRotation(pw->page, rotation);
return JPDFIUM_OK;
}Step 3 - Stub (native/bridge/src/jpdfium_stub.cpp):
int32_t jpdfium_page_rotate(int64_t, int32_t) { return JPDFIUM_OK; }Step 4 - jextract function list (jpdfium/build.gradle.kts):
val jpdfiumFunctions = listOf(
...,
"jpdfium_page_rotate" // add here
)Regenerate: ./gradlew :jpdfium:generateBindings
Note on redaction functions:
jpdfium_redact_wordsdelegates tojpdfium_redact_words_exinternally. New redaction features should be added tojpdfium_redact_words_ex(and its stub) rather than creating a third variant.
Step 5 - Java wrapper (panama/JpdfiumLib.java):
public static void pageRotate(long page, int rotation) {
check(JpdfiumH.jpdfium_page_rotate(page, rotation), "pageRotate");
}Step 6 - High-level API (PdfPage.java):
public void rotate(int rotation) {
ensureOpen();
JpdfiumLib.pageRotate(handle, rotation);
}Step 7 - Tests - a unit test (stub, always runs in ./gradlew test) and an
integration test tagged with @EnabledIfSystemProperty(named = "jpdfium.integration", matches = "true").
Step 8 - Rebuild and verify:
# Rebuild stub (for unit tests)
bash native/build-stub.sh
./gradlew test # unit tests (stub)
# Rebuild real (for integration tests)
bash native/build-real.sh
./gradlew :jpdfium:integrationTest # integration tests (real PDFium)Features using PCRE2, FreeType, HarfBuzz, ICU, qpdf, pugixml live in jpdfium_advanced.cpp
and are conditionally compiled with #ifdef JPDFIUM_HAS_<LIB> preprocessor guards.
Architecture:
jpdfium.h (API) -> jpdfium_stub.cpp (stub) + jpdfium_advanced.cpp (real impl)
|
#ifdef JPDFIUM_HAS_PCRE2 -> pcre2.h
#ifdef JPDFIUM_HAS_FREETYPE -> ft2build.h
#ifdef JPDFIUM_HAS_HARFBUZZ -> hb/hb.h, hb-subset.h
#ifdef JPDFIUM_HAS_ICU -> unicode/unorm2.h, ubrk.h, ubidi.h
#ifdef JPDFIUM_HAS_QPDF -> qpdf/QPDF.hh
#ifdef JPDFIUM_HAS_PUGIXML -> pugixml.hpp
Steps to add a new native feature:
- Declare function in
jpdfium.h - Add stub in
jpdfium_stub.cpp(must compile without any external libraries) - Add real implementation in
jpdfium_advanced.cppguarded by#ifdef JPDFIUM_HAS_<LIB> - Add pkg-config detection in
CMakeLists.txt - Add Java FFM wrapper in
JpdfiumLib.java - Add function name to jextract list in
jpdfium/build.gradle.kts - Create high-level Java API class in the appropriate package
- Write tests (unit tests run against stub, integration against real libs)
Building with native libraries:
# CMake detects all libraries via pkg-config
bash native/build-real.shJava-side pattern for native features:
// High-level class in redact/pii/ or fonts/
public final class MyFeature {
private MyFeature() {} // static utility or AutoCloseable
public static Result doSomething(PdfDocument doc, ...) {
// Call JpdfiumLib.myNativeFunction(...)
// Parse JSON result from native layer
// Return typed Java record
}
}All native features must work gracefully in stub mode (return empty results, not throw) so Java-layer tests pass without installing native dependencies.
JPDFium uses the EmbedPDF fork of PDFium, which adds
native encryption, annotation, and redaction APIs (the EPDF_* / EPDFAnnot_* symbols).
No pre-built binaries are available - the library must be built from source.
# 1. Build PDFium from EmbedPDF fork source (first run: ~15 GB download + 15-60 min build)
bash native/setup-pdfium.sh
# Subsequent builds (incremental, much faster):
bash native/setup-pdfium.sh --rebuild
# Full clean rebuild:
bash native/setup-pdfium.sh --clean
# 2. Build real bridge with CMake (auto-detects native libraries)
bash native/build-real.sh
# 3. Integration tests
./gradlew :jpdfium:integrationTest| Requirement | Notes |
|---|---|
git, python3 |
Used by depot_tools (gclient/gn/ninja) |
clang, lld |
C++ compiler and linker (Fedora: dnf install clang lld) |
| ~15 GB disk space | Source checkout + build artifacts |
| Network access | First build downloads Chromium build toolchain |
The script installs depot_tools automatically. After a successful build, the
resulting libpdfium.so is placed in native/pdfium/lib/ with headers in
native/pdfium/include/. The build verifies that EPDF_* symbols are exported.
The samples package provides quick 1-click runnable classes for each feature:
Right-click any S01_Render ... S87_AutoCropMargins class in IntelliJ and hit Run.
RunAllSamples runs all 87 samples in sequence. Output lands in jpdfium/samples-output/.
See jpdfium/src/test/java/stirling/software/jpdfium/samples/ for details.
- C bridge, not raw jextract - PDFium's 400+ function API has platform-specific wide strings and complex lifetime rules. The bridge owns all of this.
- Handles as
int64_t- Opaque, no structs across the FFM boundary. - Bridge copies byte buffers -
jpdfium_doc_open_bytescopies data because the JavaArenafrees it before the document is closed. - BGRA -> RGBA in C - PDFium renders BGRA; the bridge swaps channels so Java always receives consistent RGBA.
- No
FPDF_ApplyRedactions- Does not exist in the public PDFium API. - Object Fission Algorithm - True text removal via
jpdfium_redact_words_ex:- Map text-page char indices -> page objects (spatial correlation, bounding-box centres). Characters with degenerate bounding boxes (synthetic spaces) are assigned to their nearest neighbor's object.
- Fully-contained objects ->
FPDFPage_RemoveObject+FPDFPageObj_Destroy. - Partially-overlapping objects -> split surviving characters into per-word text objects.
Each word gets original
a,b,c,d+e,ffromFPDFText_GetCharOriginof its first character, preserving exact inter-word positioning regardless of font advance widths. - Fission validation: if any fragment has degenerate bounds (Type 3 fonts), the plan is aborted and the original object is left for fallback removal.
- Fallback: objects unmapped by spatial correlation removed if >= 70% within match bbox.
- Paint filled rectangles at all match bboxes, then single
FPDFPage_GenerateContent.
- UTF-16LE for search -
FPDFText_FindStartexpects 2-byte UTF-16LE, not 4-bytewchar_t. Useutf8_to_utf16le()helper. - Wide regex for pattern redaction -
std::wregexon astd::wstringbuilt from PDFium codepoints handles accented, CJK, and Cyrillic characters correctly. - NativeLoader extracts to a temp directory - Both
libpdfium.soandlibjpdfium.soare extracted to the same temp dir so the$ORIGINrpath resolves at runtime. - Single
jpdfiummodule - All Java API lives in one module. No internal module split (core/bindings/document) to simplify dependency management for consumers.
- Java - Follow existing style. No Lombok. Records for value types. Javadoc on all public API.
- C++ - C++17. All exported functions prefixed
jpdfium_. RAII viaDocWrapper/PageWrapperdestructors.extern "C"injpdfium.his mandatory. - Commits - Conventional commits:
feat:,fix:,refactor:,docs:,test:,build:. - Tests - Every Java API method needs a unit test (stub) and an integration test (real PDFium).
- Stub implementation added for any new native function
- Implementation guarded by
#ifdef JPDFIUM_HAS_*(if using native libs) - Function added to jextract list in
jpdfium/build.gradle.kts - Unit tests pass:
./gradlew test - Integration tests pass:
./gradlew :jpdfium:integrationTest - New public API has Javadoc
- No memory leaks in C++ (check with valgrind when modifying native code)
- README updated if adding user-visible features