Fix binwalk (and entropy) file descriptor leak#749
Conversation
| async def analyze(self, resource: Resource, config=None) -> BinwalkAttributes: | ||
| if not BINWALK_INSTALLED: | ||
| raise ComponentMissingDependencyError(self, BINWALK_TOOL) | ||
| pool = ProcessPoolExecutor() |
There was a problem hiding this comment.
| pool = ProcessPoolExecutor() | |
| pool = ProcessPoolExecutor(max_workers=1) |
We only ever want this to spawn one direct child process, rather than a pool with one child for each CPU.
Since we only ever want to run this one thing concurrently, does it still make sense to use a process pool and not a thread pool (not sure on the state of this stuff relative to the GIL these days)?
There was a problem hiding this comment.
This seems like a usecase for the AbstractOfrakService -- you could add it in and implement run, shutdown to create and destroy this pool...
There was a problem hiding this comment.
@whyitfor I think that having to start and stop a binwalk service wouldn't be very intuitive for users. We don't expect binwalk to keep running in the background when we're not using it. Also I don't think any other OFRAK analyzer is defined as an AbstractOfrakService
| offsets = await asyncio.get_running_loop().run_in_executor( | ||
| self.pool, _run_binwalk_on_file, temp_path | ||
| pool, _run_binwalk_on_file, temp_path | ||
| ) |
There was a problem hiding this comment.
Should this also be wrapped in a try like in the entropy components? Or should that try be removed?
One sentence summary of this PR (This should go in the CHANGELOG!)
Fix binwalk and entropy file descriptor leak
Link to Related Issue(s)
None.
Please describe the changes in your request.
I was running into errors like this:
OSError: [Errno 24] Too many open files: '/usr/local/lib/python3.9/site-packages/pygments/__init__.py'.It turns out that binwalk creates a
ProcessPoolExecutor, which opens a lot of file descriptors for pipes (depending on the CPU core count). When running tests that used the binwalk analyzer, it was creating a lot of FDs for each test case, which rapidly scaled and triggered theOSError: [Errno 24] Too many open filesA simple test I ran was this one:
Before my fix, this resulted in
FDs after binwalk: 116 (delta = 96). With the proposed fix, I saw:FDs after binwalk: 14 (delta = -6).I added a regression test for this.
Anyone you think should look at this, specifically?
@rbs-jacob ?