Added Windows support #67

Merged
MelvinFrederiks merged 7 commits into NetherlandsForensicInstitute:master from jessevz:master
Apr 16, 2026

@jessevz
Contributor

@jessevz jessevz commented Apr 12, 2025

Solves #61 for Windows.
Added Windows support by:

  • Moving the init worker out of the main function. Unlike Linux, Windows doesn't use fork, where all resources are copied to the child. Instead, Windows uses the spawn start method, which requires the initworker function to be importable from module-level scope. That means it has to be pickleable, which local functions are not.
  • Using utf-8 encoding explicitly. Windows uses latin1 by default, so this gives the same behavior as on Linux.
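
For reviewers less familiar with spawn, the two changes above can be sketched roughly as follows. This is a minimal illustration, not the code in this branch: `init_worker`, `count_long_lines`, `run_pool`, `_worker_config` and the `max_length` setting are all hypothetical names chosen for the example.

```python
import multiprocessing as mp

# Hypothetical per-process state, filled in by the initializer.
_worker_config = {}


def init_worker(config):
    """Defined at module level so that spawn can pickle it by reference."""
    _worker_config.update(config)


def count_long_lines(path):
    """Hypothetical task: count lines longer than the configured limit."""
    limit = _worker_config["max_length"]
    # Explicit encoding: without it, open() falls back to the locale's
    # preferred encoding by default, which differs between Windows and Linux.
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if len(line.rstrip("\n")) > limit)


def run_pool(paths, config):
    # Request spawn explicitly so Linux exercises the same path as Windows.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2, initializer=init_worker, initargs=(config,)) as pool:
        return pool.map(count_long_lines, paths)
```

With spawn, `run_pool(...)` should only be called from under an `if __name__ == "__main__":` guard, since each child re-imports the main module. Requesting the spawn context on Linux is one way to reproduce Windows-only pickling errors without a Windows machine.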

I tested this code by running the tests on Windows 11 from PowerShell, and also on WSL with Debian to check that it still works on Linux. What I did notice is that Windows performs far worse than Linux: it took Windows 4:18 minutes to pass all tests vs 4.40 seconds on Linux. I assume this has something to do with how Windows does multiprocessing.

@jessevz jessevz requested a review from zyronix April 12, 2025 13:40
@zyronix
Collaborator

zyronix commented May 20, 2025

I have played around with this. I am a bit worried by the code, as it is indeed so slow under Windows. Maybe you could find a solution for this?

@jessevz
Contributor Author

jessevz commented Apr 8, 2026

> I have played around with this. I am a bit worried by the code, as it is indeed so slow under Windows. Maybe you could find a solution for this?

@zyronix it now performs at a similar speed on Windows. Can you review this pull request again?

I did have to pin chardet to an old version, because some tests fail on a recent chardet version (regardless of the code changes in this branch): in some of the test cases, files are wrongly detected as Hebrew. We probably have to think about how to handle chardet guesses with a low confidence rating.
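
One possible way to handle low-confidence guesses, sketched here only as a starting point for discussion: fall back to a default encoding when the reported confidence is below a threshold. The function name `pick_encoding`, the 0.5 threshold and the utf-8 fallback are assumptions for illustration, not something this branch implements. The function takes a dict shaped like the result of `chardet.detect()` (keys `encoding` and `confidence`), so it can be tried without chardet installed.

```python
def pick_encoding(detection, min_confidence=0.5, fallback="utf-8"):
    """Return the detected encoding, or the fallback for weak or empty guesses.

    `detection` is a dict shaped like the result of chardet.detect().
    The threshold and the fallback are illustrative defaults only.
    """
    encoding = detection.get("encoding")
    # The confidence may be missing or None when nothing was detected.
    confidence = detection.get("confidence") or 0.0
    if encoding is None or confidence < min_confidence:
        return fallback
    return encoding
```

In practice this would be called as `pick_encoding(chardet.detect(raw_bytes))`. Whether a fixed threshold is good enough for the failing test cases would still need to be checked against a recent chardet version.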

@MelvinFrederiks MelvinFrederiks merged commit 5460f9c into NetherlandsForensicInstitute:master Apr 16, 2026
4 checks passed
3 participants