Skip to content

Fix binwalk (and entropy) file descriptor leak#749

Open
paulnoalhyt wants to merge 6 commits into
masterfrom
bugfix/binwalk_fds
Open

Fix binwalk (and entropy) file descriptor leak#749
paulnoalhyt wants to merge 6 commits into
masterfrom
bugfix/binwalk_fds

Conversation

@paulnoalhyt
Copy link
Copy Markdown
Collaborator

@paulnoalhyt paulnoalhyt commented May 22, 2026

  • I have reviewed the OFRAK contributor guide and attest that this pull request is in accordance with it.
  • I have made or updated a changelog entry for the changes in this pull request.

One sentence summary of this PR (This should go in the CHANGELOG!)

Fix binwalk and entropy file descriptor leak

Link to Related Issue(s)

None.

Please describe the changes in your request.

I was running into errors like this: OSError: [Errno 24] Too many open files: '/usr/local/lib/python3.9/site-packages/pygments/__init__.py'.

It turns out that binwalk creates a ProcessPoolExecutor, which opens a lot of file descriptors for pipes (depending on the CPU core count). When running tests that used the binwalk analyzer, it was creating a lot of FDs for each test case, which rapidly scaled and triggered the OSError: [Errno 24] Too many open files

A simple test I ran was this one:

import asyncio
import os

from ofrak import OFRAK
from ofrak.core.binwalk import BinwalkAttributes


def fd_count() -> int:
    return len(os.listdir(f"/proc/{os.getpid()}/fd"))

async def main(ofrak_context):
    resource = await ofrak_context.create_root_resource_from_file("/usr/bin/ls")
    before = fd_count()
    print(f"FDs before binwalk: {before}")
    await resource.analyze(BinwalkAttributes)
    after = fd_count()
    print(f"FDs after binwalk:  {after}  (delta = {after - before})")

if __name__ == "__main__":
    ofrak = OFRAK()
    ofrak.run(main)

Before my fix, this resulted in FDs after binwalk: 116 (delta = 96). With the proposed fix, I saw: FDs after binwalk: 14 (delta = -6).

I added a regression test for this.

Anyone you think should look at this, specifically?

@rbs-jacob ?

Comment thread ofrak_core/src/ofrak/core/binwalk.py Outdated
Comment thread ofrak_core/src/ofrak/core/binwalk.py Outdated
async def analyze(self, resource: Resource, config=None) -> BinwalkAttributes:
if not BINWALK_INSTALLED:
raise ComponentMissingDependencyError(self, BINWALK_TOOL)
pool = ProcessPoolExecutor()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pool = ProcessPoolExecutor()
pool = ProcessPoolExecutor(max_workers=1)

We only ever want this to spawn one direct child process, rather than a pool with one child for each CPU.

Since we only ever want to run this one thing concurrently, does it still make sense to use a process pool and not a thread pool (not sure on the state of this stuff relative to the GIL these days)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a usecase for the AbstractOfrakService -- you could add it in and implement run, shutdown to create and destroy this pool...

Copy link
Copy Markdown
Collaborator Author

@paulnoalhyt paulnoalhyt May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyitfor I think that having to start and stop a binwalk service wouldn't be very intuitive for users. We don't expect binwalk to keep running in the background when we're not using it. Also I don't think any other OFRAK analyzer is defined as an AbstractOfrakService

Comment thread ofrak_core/src/ofrak/core/binwalk.py Outdated
Comment thread ofrak_core/tests/components/test_binwalk_component.py
Comment thread ofrak_core/tests/components/test_binwalk_component.py Outdated
Comment thread ofrak_core/CHANGELOG.md Outdated
@paulnoalhyt paulnoalhyt changed the title Fix binwalk file descriptor leak Fix binwalk (and entropy) file descriptor leak May 26, 2026
Comment thread ofrak_core/src/ofrak/core/entropy/entropy.py
Comment on lines 69 to 71
offsets = await asyncio.get_running_loop().run_in_executor(
self.pool, _run_binwalk_on_file, temp_path
pool, _run_binwalk_on_file, temp_path
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be wrapped in a try like in the entropy components? Or should that try be removed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants