Skip to content
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions ofrak_core/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- Fix bug in Segment Injector: favor regions with data when injecting data ([#682](https://github.com/redballoonsecurity/ofrak/pull/682))
- Pass `usedforsecurity=False` to non-cryptographic `hashlib` calls to prevent failures when Python links against FIPS OpenSSL ([#744](https://github.com/redballoonsecurity/ofrak/pull/744))
- Fix `chdir` in UEFI components causing failing tests. ([#747](https://github.com/redballoonsecurity/ofrak/pull/747))
- Fix binwalk file descriptor leak. ([#749](https://github.com/redballoonsecurity/ofrak/pull/749))
Comment thread
rbs-jacob marked this conversation as resolved.
Outdated

## [3.3.0](https://github.com/redballoonsecurity/ofrak/compare/ofrak-v3.2.0...ofrak-v3.3.0) - 2025-10-03

Expand Down
5 changes: 3 additions & 2 deletions ofrak_core/src/ofrak/core/binwalk.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,17 +67,18 @@ def __init__(
resource_service: ResourceServiceInterface,
):
super().__init__(resource_factory, data_service, resource_service)
Comment thread
paulnoalhyt marked this conversation as resolved.
Outdated
self.pool = ProcessPoolExecutor()

async def analyze(self, resource: Resource, config=None) -> BinwalkAttributes:
if not BINWALK_INSTALLED:
raise ComponentMissingDependencyError(self, BINWALK_TOOL)
pool = ProcessPoolExecutor()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pool = ProcessPoolExecutor()
pool = ProcessPoolExecutor(max_workers=1)

We only ever want this to spawn one direct child process, rather than a pool with one child for each CPU.

Since we only ever want to run this one thing concurrently, does it still make sense to use a process pool and not a thread pool (not sure on the state of this stuff relative to the GIL these days)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a usecase for the AbstractOfrakService -- you could add it in and implement run, shutdown to create and destroy this pool...

Copy link
Copy Markdown
Collaborator Author

@paulnoalhyt paulnoalhyt May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyitfor I think that having to start and stop a binwalk service wouldn't be very intuitive for users. We don't expect binwalk to keep running in the background when we're not using it. Also I don't think any other OFRAK analyzer is defined as an AbstractOfrakService

async with resource.temp_to_disk() as temp_path:
# Should errors be handled the way they are in the `DataSummaryAnalyzer`? Likely to be
# overkill here.
offsets = await asyncio.get_running_loop().run_in_executor(
self.pool, _run_binwalk_on_file, temp_path
pool, _run_binwalk_on_file, temp_path
)
Comment on lines 69 to 71
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be wrapped in a try like in the entropy components? Or should that try be removed?

pool.shutdown(wait=True, cancel_futures=True)
Comment thread
rbs-jacob marked this conversation as resolved.
Outdated
return BinwalkAttributes(offsets)


Expand Down
29 changes: 29 additions & 0 deletions ofrak_core/tests/components/test_binwalk_component.py
Original file line number Diff line number Diff line change
Expand Up @@ -79,3 +79,32 @@ async def test_binwalk_component(ofrak_context, test_case):
if test_case.number_of_results is not None:
assert len(binwalk_offsets) == test_case.number_of_results
assert test_case.subset_of_results.items() <= binwalk_offsets.items()


@pytest.mark.skipif(
not os.path.isdir(f"/proc/{os.getpid()}/fd"),
reason="Requires /proc/<pid>/fd (Linux only)",
)
@pytest.mark.skipif_missing_deps([BinwalkAnalyzer])
async def test_binwalk_does_not_leak_fds(ofrak_context):
"""
Regression test for the ProcessPoolExecutor FD leak in BinwalkAnalyzer.
"""
fd_dir = f"/proc/{os.getpid()}/fd"
Comment thread
rbs-jacob marked this conversation as resolved.
asset_path = os.path.join(BINWALK_ASSETS_PATH, "dirtraversal.tar")

root_resource = await ofrak_context.create_root_resource_from_file(asset_path)
before = len(os.listdir(fd_dir))
await root_resource.analyze(BinwalkAttributes)
Comment thread
paulnoalhyt marked this conversation as resolved.
Outdated

iterations = 5
for _ in range(iterations):
root_resource = await ofrak_context.create_root_resource_from_file(asset_path)
await root_resource.analyze(BinwalkAttributes)
after = len(os.listdir(fd_dir))

delta = after - before
assert delta < 10, (
f"BinwalkAnalyzer leaked {delta} FDs across {iterations} iterations "
f"({before} -> {after})."
)
Loading