Memory leak in Document constructor when using stream parameter #248

@moonheart

Summary

The Document(stream: byte[]) constructor leaks native (unmanaged) memory on every call. The Document(fileName: string) constructor does not leak.

Root Cause

In Document.cs, when the stream parameter is used:

IntPtr dataPtr = Marshal.AllocHGlobal(stream.Length);  // allocates unmanaged memory
Marshal.Copy(stream, 0, dataPtr, stream.Length);
SWIGTYPE_p_unsigned_char swigData = new SWIGTYPE_p_unsigned_char(dataPtr, true);
FzStream data = mupdf.mupdf.fz_open_memory(swigData, (uint)stream.Length);
// ...
data.Dispose();  // releases the fz_stream, but NOT dataPtr

dataPtr is allocated with Marshal.AllocHGlobal() but is never freed with Marshal.FreeHGlobal(). SWIGTYPE_p_unsigned_char has no finalizer and does not release the pointer. FzStream.Dispose() only releases the fz_stream struct, not the underlying buffer.
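
As a minimal sketch of the ownership rule involved (not MuPDF.NET code), every Marshal.AllocHGlobal() call needs a matching Marshal.FreeHGlobal(), and a try/finally guarantees the pairing. The class and method names below are hypothetical and only illustrate the pattern:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only; not part of MuPDF.NET.
static class UnmanagedCopy
{
    // Copies a managed array into unmanaged memory, hands the pointer to
    // the caller's delegate, and always frees the buffer afterwards.
    public static T WithUnmanagedCopy<T>(byte[] source, Func<IntPtr, int, T> use)
    {
        IntPtr buffer = Marshal.AllocHGlobal(source.Length);
        try
        {
            Marshal.Copy(source, 0, buffer, source.Length);
            return use(buffer, source.Length);
        }
        finally
        {
            // Without this call the buffer stays allocated for the life of
            // the process: the GC does not track AllocHGlobal memory.
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

This scoped form only works when the native side is finished with the pointer before the method returns. For Document the buffer has to outlive the constructor, so the release belongs in Dispose() rather than in a finally block.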

Reproduction

Environment

  • MuPDF.NET version: 3.2.16
  • .NET version: 8.0
  • OS: Windows 11

Steps

  1. Create a PDF file (any size, e.g. 5 MB).
  2. Run the following program:
using System.Diagnostics;
using MuPDF.NET;

string pdfFile = @"C:\temp\sample.pdf";
int iterations = 100;

Console.WriteLine("[TEST 1] new Document(fileName: path)  <-- NO LEAK");
RunTest(useStream: false);

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
Thread.Sleep(1000);

Console.WriteLine("[TEST 2] new Document(stream: bytes)   <-- LEAKS");
RunTest(useStream: true);

void RunTest(bool useStream)
{
    long before = GetPrivateMemoryMB();
    byte[]? pdfBytes = useStream ? File.ReadAllBytes(pdfFile) : null;

    for (int i = 0; i < iterations; i++)
    {
        Document doc = useStream
            ? new Document(stream: pdfBytes)
            : new Document(fileName: pdfFile);

        int count = doc.PageCount;  // trigger parsing
        doc.Close();

        if ((i + 1) % 10 == 0)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            long now = GetPrivateMemoryMB();  // sample once so both numbers agree
            Console.WriteLine($"  Iter {i + 1,3}: {now} MB  (+{now - before} MB)");
        }
    }
    long end = GetPrivateMemoryMB();  // sample once so both numbers agree
    Console.WriteLine($"End: {end} MB  (delta: +{end - before} MB)");
}

static long GetPrivateMemoryMB()
{
    using var p = Process.GetCurrentProcess();
    p.Refresh();
    return p.PrivateMemorySize64 / 1024 / 1024;
}

Expected Behavior

A Document created through either constructor should release all of its native memory once Document.Close() or Document.Dispose() is called.

Actual Behavior

Constructor                    Start    End (100 iterations)   Delta
new Document(fileName: path)   6 MB     47 MB                  +41 MB ✅ (baseline, stable after 1st iteration)
new Document(stream: bytes)    47 MB    606 MB                 +559 MB ❌ (~5.6 MB per iteration)

The leaked amount grows linearly with the number of constructor calls, at roughly stream.Length bytes per call.

Workaround

Use new Document(fileName: pdfFilePath) instead of new Document(stream: File.ReadAllBytes(pdfFilePath)).
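
When the PDF only exists as a byte[] (for example, downloaded or read from a database), one way to still go through the file-name constructor is to spill the bytes to a temporary file first. The helper below is a hypothetical sketch, not part of MuPDF.NET. It assumes the document is closed before the callback returns, so that the file can be deleted (on Windows the delete may fail while the file is still open):

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of MuPDF.NET.
static class TempFileWorkaround
{
    // Writes the bytes to a uniquely named temp file, passes the path to
    // the caller's delegate, and deletes the file afterwards.
    public static T WithTempFile<T>(byte[] bytes, Func<string, T> use)
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".pdf");
        File.WriteAllBytes(path, bytes);
        try
        {
            return use(path);
        }
        finally
        {
            File.Delete(path);
        }
    }
}

// Intended use with MuPDF.NET (requires the package, shown as a comment):
//
//   int pages = TempFileWorkaround.WithTempFile(pdfBytes, path =>
//   {
//       Document doc = new Document(fileName: path);
//       try { return doc.PageCount; }
//       finally { doc.Close(); }
//   });
```

This trades the leak for extra disk I/O, so it is only a stopgap until the stream constructor is fixed.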

Suggested Fix

Add a field to track the allocated buffer and free it in Document.Dispose():

// 1. Add field to Document class
private IntPtr _streamDataPtr;

// 2. In constructor, assign to the field instead of local variable
if (stream != null)
{
    _streamDataPtr = Marshal.AllocHGlobal(stream.Length);
    Marshal.Copy(stream, 0, _streamDataPtr, stream.Length);
    SWIGTYPE_p_unsigned_char swigData = new SWIGTYPE_p_unsigned_char(_streamDataPtr, true);
    FzStream data = mupdf.mupdf.fz_open_memory(swigData, (uint)stream.Length);
    // ... open document ...
    data.Dispose();
}

// 3. In Dispose(), free after _nativeDocument is disposed
public void Dispose()
{
    if (IsClosed)
        return;

    // ... existing cleanup ...

    lock (Utils.MuPDFLock)
    {
        _nativeDocument.Dispose();
    }
    _nativeDocument = null;

    if (_streamDataPtr != IntPtr.Zero)
    {
        Marshal.FreeHGlobal(_streamDataPtr);
        _streamDataPtr = IntPtr.Zero;
    }
}

Note: The buffer cannot be freed immediately after fz_open_document_with_stream because MuPDF keeps a reference to it. It must be freed only after _nativeDocument.Dispose() completes.
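
An alternative to a raw IntPtr field would be to wrap the buffer in a SafeHandle, so that the free happens exactly once and a finalizer acts as a backstop if Dispose() is never called. This is only a sketch under that assumption: HGlobalBuffer is a hypothetical class, not an existing MuPDF.NET type, and the ordering requirement from the note above still applies.

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Hypothetical wrapper for illustration only; not part of MuPDF.NET.
sealed class HGlobalBuffer : SafeHandleZeroOrMinusOneIsInvalid
{
    // Number of bytes copied into the unmanaged buffer.
    public int Length { get; }

    public HGlobalBuffer(byte[] source) : base(ownsHandle: true)
    {
        Length = source.Length;
        // Store the pointer in the handle first so it is released even if
        // the copy below were to throw.
        SetHandle(Marshal.AllocHGlobal(source.Length));
        Marshal.Copy(source, 0, handle, source.Length);
    }

    // Called once by SafeHandle, either from Dispose() or the finalizer.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }
}
```

With this design, Document would keep an HGlobalBuffer field, pass DangerousGetHandle() to the SWIGTYPE_p_unsigned_char constructor, and call Dispose() on the buffer after _nativeDocument.Dispose(), in the same place where the suggested fix calls Marshal.FreeHGlobal().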
