
java.lang.OutOfMemoryError: Java heap space #458

@xiaoymin

Description

Rendering a PDF document that contains large or high-resolution images with PDFBox fails with java.lang.OutOfMemoryError: Java heap space.

Summary:
The application exhausts the Java heap while converting PDF images to RGB during rendering. The failing allocation is the BufferedImage created in PDDeviceRGB.toRGBImage(), which is reached from SampledImageReader.from8bit() during image conversion.

Full stack trace:

java.lang.OutOfMemoryError: Java heap space
	at java.desktop/java.awt.image.DataBufferInt.<init>(DataBufferInt.java:76)
	at java.desktop/java.awt.image.Raster.createPackedRaster(Raster.java:538)
	at java.desktop/java.awt.image.DirectColorModel.createCompatibleWritableRaster(DirectColorModel.java:1032)
	at java.desktop/java.awt.image.BufferedImage.<init>(BufferedImage.java:324)
	at org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB.toRGBImage(PDDeviceRGB.java:85)
	at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(SampledImageReader.java:506)
	at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:217)
	at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:465)
	at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:438)
	at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1106)
	at org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:74)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:893)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:531)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:506)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:153)
	at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:286)
	at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:330)
	at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:247)
	at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:233)
	at org.verapdf.wcag.algorithms.semanticalgorithms.consumers.ContrastRatioConsumer.renderPage(ContrastRatioConsumer.java:306)
	at org.verapdf.wcag.algorithms.semanticalgorithms.consumers.ContrastRatioConsumer.getRenderPage(ContrastRatioConsumer.java:145)
	at org.verapdf.wcag.algorithms.semanticalgorithms.consumers.ContrastRatioConsumer.getPageSubImage(ContrastRatioConsumer.java:223)
	at org.opendataloader.pdf.utils.ImagesUtils.createImageFile(ImagesUtils.java:120)
	at org.opendataloader.pdf.utils.ImagesUtils.writeImage(ImagesUtils.java:103)
	at org.opendataloader.pdf.utils.ImagesUtils.writeFromContents(ImagesUtils.java:66)
	at org.opendataloader.pdf.utils.ImagesUtils.write(ImagesUtils.java:59)
	at org.opendataloader.pdf.processors.DocumentProcessor.generateOutputs(DocumentProcessor.java:333)
	at org.opendataloader.pdf.processors.DocumentProcessor.processFile(DocumentProcessor.java:96)
	at org.opendataloader.pdf.api.OpenDataLoaderPDF.processFile(OpenDataLoaderPDF.java:40)

Steps to reproduce

  1. Prepare a PDF file containing high-resolution images (e.g., larger than 2000×2000 pixels) or multiple large images
  2. Process the PDF file using OpenDataLoaderPDF.processFile()
  3. Run the application with the default heap size or an insufficient one (e.g., without setting -Xmx)
  4. The OutOfMemoryError is thrown during image rendering (ImagesUtils.writeImage → ContrastRatioConsumer)

Call path:
DocumentProcessor.processFile() → generateOutputs() → ImagesUtils.write() → writeFromContents() → writeImage() → createImageFile() → ContrastRatioConsumer.getPageSubImage() → renderPage() → PDFRenderer.renderImageWithDPI() → OOM during image conversion
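
To give a sense of scale, here is a rough sketch of how much heap a single converted image needs. The trace ends in the DataBufferInt constructor, so the buffer holds one int (4 bytes) per pixel. The class and method names are made up for illustration, and the points-to-pixels formula (points × DPI / 72) is an assumption that may not match PDFBox's exact rounding. Treat the result as a lower bound, since the decoded source data is in memory at the same time.

```java
/** Rough estimate of the heap needed for packed-int RGB image buffers. */
final class ImageHeapEstimate {

    /** Bytes in the int[] backing a width x height packed-int image (4 bytes per pixel). */
    static long bytesForIntImage(int width, int height) {
        // Widen to long first so very large images do not overflow int arithmetic.
        return (long) width * height * Integer.BYTES;
    }

    /** Approximate pixel length of a page side given in PDF points (1/72 inch) at a DPI. */
    static int pixelsAtDpi(float points, float dpi) {
        return Math.max(1, (int) Math.floor(points * dpi / 72f));
    }

    public static void main(String[] args) {
        // Hypothetical inputs: one 2000 x 2000 embedded image and an A4-sized page at 300 DPI.
        long embedded = bytesForIntImage(2000, 2000);
        long page = bytesForIntImage(pixelsAtDpi(595f, 300f), pixelsAtDpi(842f, 300f));
        System.out.printf("embedded image buffer: %d bytes%n", embedded);
        System.out.printf("rendered page buffer:  %d bytes%n", page);
    }
}
```

For a 2000 x 2000 image this comes to 16,000,000 bytes for the converted copy alone, so a page with several such images can use a large share of a small default heap.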

Version

  • OpenDataLoader PDF: 2.2.1
  • Apache PDFBox: (Please check your dependency version)

Java version

openjdk 21 2023-09-19 LTS

Additional information requested

To help resolve this issue more quickly, please provide:

  1. PDF characteristics: Dimensions and number of images in the PDF that triggers the error
  2. Current JVM arguments: Is -Xmx set? If so, what is the current value?
  3. PDFBox version: The exact version of org.apache.pdfbox in your dependency management (pom.xml or build.gradle)
  4. Expected workload: Number and size of PDFs being processed simultaneously
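
If it is unclear whether -Xmx is in effect, a small check like the one below can be run inside the same JVM. It is only a sketch and the class name is hypothetical. It uses the standard Runtime and RuntimeMXBean APIs to report the maximum heap and any heap-related flags passed on the command line.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

/** Reports the effective maximum heap and the heap-related JVM flags. */
final class JvmHeapReport {

    /** Maximum heap the JVM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** JVM input arguments that influence heap size (empty if none were passed). */
    static List<String> heapFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(arg -> arg.startsWith("-Xmx")
                        || arg.startsWith("-Xms")
                        || arg.startsWith("-XX:MaxHeapSize")
                        || arg.startsWith("-XX:MaxRAMPercentage"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.printf("max heap: %d MiB%n", maxHeapBytes() / (1024 * 1024));
        System.out.println("heap flags: " + heapFlags());
    }
}
```

An empty flag list means the JVM chose its default heap size, which depends on the machine's memory.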

Temporary workarounds

While waiting for a fix, consider these workarounds:

  1. Increase heap memory:

    java -Xmx2g -jar your-application.jar

  2. Reduce rendering DPI (if configurable in your application)

  3. Process PDFs sequentially rather than in parallel to reduce peak memory usage
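
For the third workaround, one possible approach when the application already uses a thread pool is to put the PDF processing call behind a gate that admits one task at a time. The sketch below is an assumption about how that could look, not part of OpenDataLoader PDF. RenderGate is a hypothetical helper, and the Supplier stands in for whatever code calls OpenDataLoaderPDF.processFile().

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Limits how many memory-heavy tasks run at once (hypothetical helper). */
final class RenderGate {
    private final Semaphore permits;

    RenderGate(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the task once a permit is free and always returns the permit afterwards. */
    <T> T run(Supplier<T> task) {
        permits.acquireUninterruptibly();
        try {
            return task.get();
        } finally {
            permits.release();
        }
    }

    /** Number of tasks that could start right now without waiting. */
    int availablePermits() {
        return permits.availablePermits();
    }
}
```

With `new RenderGate(1)`, worker threads that wrap their processing call in `run(...)` execute it one after another, so only one page render is held in memory at a time.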


Please let me know if you need me to help draft a fix, such as implementing streaming image processing or adding configurable DPI limits.
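
As a starting point for the DPI limit idea, here is a minimal sketch of how a cap could be computed. It assumes a pixel budget per rendered page. DpiLimiter, cappedDpi and maxPixels are hypothetical names, and since the render call sits inside ContrastRatioConsumer, where such a cap would actually be applied is an open question.

```java
/** Computes a DPI that keeps a rendered page within a pixel budget (hypothetical helper). */
final class DpiLimiter {

    /**
     * Returns requestedDpi if the page fits within maxPixels at that DPI,
     * otherwise the largest DPI that stays within the budget.
     * Page dimensions are in PDF points (1/72 inch).
     */
    static float cappedDpi(float widthPt, float heightPt, float requestedDpi, long maxPixels) {
        double scale = requestedDpi / 72.0;
        double pixels = (widthPt * scale) * (heightPt * scale);
        if (pixels <= maxPixels) {
            return requestedDpi;
        }
        // Pixel count grows with the square of the DPI, so scale by the square root of the ratio.
        return (float) (requestedDpi * Math.sqrt(maxPixels / pixels));
    }

    public static void main(String[] args) {
        // Hypothetical budget: 4 million pixels, roughly 16 MB as a packed-int image.
        float dpi = cappedDpi(595f, 842f, 300f, 4_000_000L);
        System.out.println("DPI to use: " + dpi);
    }
}
```

The returned value could then be passed to PDFRenderer.renderImageWithDPI() in place of the fixed DPI.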
