When rendering a PDF document containing large or high-resolution images using PDFBox, a java.lang.OutOfMemoryError: Java heap space occurs.
Summary:
The application exhausts Java heap memory while converting PDF images to RGB format during rendering. The failing allocation is the BufferedImage created in PDDeviceRGB.toRGBImage(), which is called from SampledImageReader.from8bit() during image conversion.
Full stack trace:
java.lang.OutOfMemoryError: Java heap space
at java.desktop/java.awt.image.DataBufferInt.<init>(DataBufferInt.java:76)
at java.desktop/java.awt.image.Raster.createPackedRaster(Raster.java:538)
at java.desktop/java.awt.image.DirectColorModel.createCompatibleWritableRaster(DirectColorModel.java:1032)
at java.desktop/java.awt.image.BufferedImage.<init>(BufferedImage.java:324)
at org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB.toRGBImage(PDDeviceRGB.java:85)
at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(SampledImageReader.java:506)
at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:217)
at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:465)
at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1106)
at org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:74)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:893)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:531)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:506)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:153)
at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:286)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:330)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:247)
at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:233)
at org.verapdf.wcag.algorithms.semanticalgorithms.consumers.ContrastRatioConsumer.renderPage(ContrastRatioConsumer.java:306)
at org.verapdf.wcag.algorithms.semanticalgorithms.consumers.ContrastRatioConsumer.getRenderPage(ContrastRatioConsumer.java:145)
at org.verapdf.wcag.algorithms.semanticalgorithms.consumers.ContrastRatioConsumer.getPageSubImage(ContrastRatioConsumer.java:223)
at org.opendataloader.pdf.utils.ImagesUtils.createImageFile(ImagesUtils.java:120)
at org.opendataloader.pdf.utils.ImagesUtils.writeImage(ImagesUtils.java:103)
at org.opendataloader.pdf.utils.ImagesUtils.writeFromContents(ImagesUtils.java:66)
at org.opendataloader.pdf.utils.ImagesUtils.write(ImagesUtils.java:59)
at org.opendataloader.pdf.processors.DocumentProcessor.generateOutputs(DocumentProcessor.java:333)
at org.opendataloader.pdf.processors.DocumentProcessor.processFile(DocumentProcessor.java:96)
at org.opendataloader.pdf.api.OpenDataLoaderPDF.processFile(OpenDataLoaderPDF.java:40)
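The DataBufferInt allocation at the top of the trace indicates that the converted image is stored as one 32-bit int per pixel, so the buffer grows as width × height × 4 bytes. A rough sketch of that arithmetic follows. The class and method names are illustrative and not part of PDFBox or OpenDataLoader, and the sample dimensions are made up.

```java
final class RgbBufferEstimate {

    // A packed-int RGB image stores one 32-bit int per pixel.
    // The multiplication is done in long to avoid int overflow on huge images.
    static long bytesForPackedRgb(int width, int height) {
        return (long) width * height * Integer.BYTES;
    }

    public static void main(String[] args) {
        // Hypothetical image dimensions in pixels, for illustration only.
        int[][] sizes = { {2000, 2000}, {6000, 8000}, {10000, 10000} };
        for (int[] s : sizes) {
            long bytes = bytesForPackedRgb(s[0], s[1]);
            System.out.printf("%d x %d px -> %.1f MiB%n",
                    s[0], s[1], bytes / (1024.0 * 1024.0));
        }
    }
}
```

This counts only the destination buffer. The source raster and the rendered page image are alive at the same time, so the real peak is likely higher.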
Steps to reproduce
- Prepare a PDF file containing high-resolution images (e.g., larger than 2000×2000 pixels) or multiple large images
- Process the PDF file using OpenDataLoaderPDF.processFile()
- Run the application with default or insufficient heap memory configuration (e.g., without setting -Xmx parameter)
- OOM error is triggered during image rendering (ImagesUtils.writeImage → ContrastRatioConsumer)
Call path:
DocumentProcessor.processFile() → generateOutputs() → ImagesUtils.write() → writeFromContents() → writeImage() → createImageFile() → ContrastRatioConsumer.getPageSubImage() → renderPage() → PDFRenderer.renderImageWithDPI() → OOM during image conversion
Version
- OpenDataLoader PDF: 2.2.1
- Apache PDFBox: (Please check your dependency version)
Java version
openjdk 21 2023-09-19 LTS
Additional information requested
To help resolve this issue more quickly, please provide:
- PDF characteristics: Dimensions and number of images in the PDF that triggers the error
- Current JVM arguments: Is -Xmx set? If so, what is the current value?
- PDFBox version: The exact version of org.apache.pdfbox in your dependency management (pom.xml or build.gradle)
- Expected workload: Number and size of PDFs being processed simultaneously
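One way to collect the JVM details requested above is to print them from inside the running process. This is a minimal sketch using only standard JDK calls; the class name JvmHeapReport and its helper methods are hypothetical, and the -Xmx check does not detect other heap flags such as -XX:MaxHeapSize.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class JvmHeapReport {

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // True if any JVM argument starts with -Xmx.
    static boolean hasExplicitXmx(List<String> jvmArgs) {
        return jvmArgs.stream().anyMatch(arg -> arg.startsWith("-Xmx"));
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("-Xmx set explicitly: " + hasExplicitXmx(jvmArgs));
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running this with the same launch command as the failing application shows the heap limit that was actually in effect.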
Temporary workarounds
While waiting for a fix, consider these workarounds:
- Increase heap memory:
  java -Xmx2g -jar your-application.jar
- Reduce rendering DPI (if configurable in your application)
- Process PDFs sequentially rather than in parallel to reduce peak memory usage
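If PDFs are submitted from several threads, the sequential workaround can be approximated by putting the processing call behind a gate that admits one job at a time. The sketch below uses a java.util.concurrent.Semaphore. RenderGate is a hypothetical helper, not part of OpenDataLoader, and the sleeping job stands in for a call such as OpenDataLoaderPDF.processFile().

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class RenderGate {
    private final Semaphore permits;

    RenderGate(int maxConcurrentJobs) {
        this.permits = new Semaphore(maxConcurrentJobs);
    }

    // Blocks until a permit is free, runs the job, then always releases the permit.
    void run(Runnable job) throws InterruptedException {
        permits.acquire();
        try {
            job.run();
        } finally {
            permits.release();
        }
    }

    // Demo helper: pushes `jobs` threads through a gate and returns the
    // highest number of jobs observed running at the same moment.
    static int peakConcurrency(int maxConcurrentJobs, int jobs) {
        RenderGate gate = new RenderGate(maxConcurrentJobs);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        Thread[] workers = new Thread[jobs];
        for (int i = 0; i < jobs; i++) {
            workers[i] = new Thread(() -> {
                try {
                    gate.run(() -> {
                        peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                        try {
                            Thread.sleep(20); // stand-in for the real PDF processing call
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                        running.decrementAndGet();
                    });
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("Peak concurrent jobs with 1 permit: " + peakConcurrency(1, 4));
    }
}
```

With a single permit, only one page raster and one decoded image buffer should be live at a time. Raising the permit count trades memory for throughput.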
Please let me know if you need me to help draft a fix, such as implementing streaming image processing or adding configurable DPI limits.
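As a starting point for the configurable DPI limit mentioned above, the requested DPI could be clamped so that the rendered page stays within a pixel budget. This is only a sketch under assumptions: DpiLimiter and the maxPixels budget are hypothetical, page size is taken in PDF points (1/72 inch), and how the capped value would be passed to ContrastRatioConsumer or PDFRenderer.renderImageWithDPI() is not shown.

```java
final class DpiLimiter {
    static final float POINTS_PER_INCH = 72f;

    // Returns the requested DPI, lowered if necessary so that a page of the
    // given size (in points) renders to at most maxPixels pixels.
    static float cappedDpi(float widthPt, float heightPt, float requestedDpi, long maxPixels) {
        double widthIn = widthPt / POINTS_PER_INCH;
        double heightIn = heightPt / POINTS_PER_INCH;
        // pixels = (widthIn * dpi) * (heightIn * dpi), solved for dpi
        double maxDpi = Math.sqrt(maxPixels / (widthIn * heightIn));
        return (float) Math.min(requestedDpi, maxDpi);
    }

    public static void main(String[] args) {
        // US Letter page (612 x 792 pt) with an illustrative 4-megapixel budget.
        float dpi = cappedDpi(612f, 792f, 300f, 4_000_000L);
        System.out.println("DPI to use: " + dpi);
    }
}
```

This bounds only the rendered page raster. The buffer in the stack trace is sized by the embedded image itself, so a DPI cap may not be sufficient on its own.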