
fix: handle UnicodeDecodeError on usernames with special characters #2853

Merged
ppfeister merged 2 commits into sherlock-project:master from salmanrajz:fix/unicode-decode-error-special-chars
May 5, 2026

@salmanrajz
Contributor

Fixes #2730

Problem

Usernames containing non-ASCII characters (e.g. Émile) crash Sherlock with a UnicodeDecodeError. The exception is raised inside the requests library during redirect handling when a server returns a non-UTF-8 encoded Location header.

UnicodeDecodeError is not a subclass of requests.exceptions.RequestException, so it escapes all existing except blocks in get_response() and propagates up as an unhandled crash.
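The hierarchy described above can be checked with a small stdlib-only sketch. It assumes that requests.exceptions.RequestException derives from IOError (an alias of OSError in Python 3), so OSError stands in for it here and the snippet runs without requests installed. The classify() helper is hypothetical and exists only for illustration.

```python
# Unicode errors sit under ValueError, not under OSError.
assert issubclass(UnicodeDecodeError, UnicodeError)
assert issubclass(UnicodeEncodeError, UnicodeError)
assert issubclass(UnicodeError, ValueError)
assert not issubclass(UnicodeDecodeError, OSError)


def classify(exc):
    """Hypothetical helper: report which handler catches the exception."""
    try:
        raise exc
    except OSError:
        # Stand-in for `except requests.exceptions.RequestException`
        # (assumed to derive from IOError/OSError).
        return "request-style handler"
    except UnicodeError:
        # Without this branch the decode error would propagate uncaught.
        return "UnicodeError handler"


decode_error = UnicodeDecodeError(
    "utf-8", b"\xc9mile", 0, 1, "invalid continuation byte"
)
print(classify(ConnectionError("refused")))
print(classify(decode_error))
```

A handler written only for the request-style family never sees the decode error, which is why it escaped the existing except blocks.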

Fix

Added a catch for UnicodeError (parent of both UnicodeDecodeError and UnicodeEncodeError) in get_response(). Sites that trigger encoding errors are now gracefully reported as Encoding Error instead of crashing the entire scan.
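A minimal sketch of the shape of this fix follows. It is not the actual diff: get_response_sketch() is a simplified, hypothetical stand-in for get_response(), the three-value return tuple and the "Unknown Error" label are assumptions, and OSError again substitutes for requests.exceptions.RequestException so the example needs only the standard library. Only the "Encoding Error" label comes from the description above.

```python
from concurrent.futures import Future


def get_response_sketch(request_future):
    """Hypothetical, simplified stand-in for Sherlock's get_response()."""
    response = None
    error_context = "General Unknown Error"
    exception_text = None
    try:
        response = request_future.result()
        error_context = None
    except OSError as err:
        # Stand-in for the existing requests.exceptions.* handlers.
        error_context = "Unknown Error"
        exception_text = str(err)
    except UnicodeError as err:
        # New branch: covers UnicodeDecodeError and UnicodeEncodeError.
        error_context = "Encoding Error"
        exception_text = str(err)
    return response, error_context, exception_text


# Simulate a request whose redirect handling raised a decode error.
failed = Future()
failed.set_exception(
    UnicodeDecodeError("utf-8", b"\xc9mile", 0, 1, "invalid continuation byte")
)
print(get_response_sketch(failed))
```

Catching the parent class UnicodeError keeps the handler to a single branch while still covering both the decode and encode cases.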

Changes

  • sherlock_project/sherlock.py: Added except UnicodeError handler in get_response()
  • tests/test_unicode.py: Added regression tests for both UnicodeDecodeError and UnicodeEncodeError

Testing

$ python -m pytest tests/test_unicode.py -v
tests/test_unicode.py::test_get_response_handles_unicode_decode_error PASSED
tests/test_unicode.py::test_get_response_handles_unicode_encode_error PASSED
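For readers who want to see how such regression tests might be structured, here is a hedged sketch using unittest.mock. It does not reproduce the contents of tests/test_unicode.py: get_response_stub(), make_failing_future(), and the test names below are hypothetical, and the stub only mimics the new UnicodeError branch.

```python
from unittest.mock import Mock


def get_response_stub(request_future):
    """Hypothetical stand-in that mimics only the new UnicodeError branch."""
    try:
        return request_future.result(), None, None
    except UnicodeError as err:
        return None, "Encoding Error", str(err)


def make_failing_future(exc):
    """Build a mock future whose result() raises the given exception."""
    future = Mock()
    future.result.side_effect = exc
    return future


def test_decode_error_is_reported():
    exc = UnicodeDecodeError("utf-8", b"\xc9", 0, 1, "invalid continuation byte")
    response, context, text = get_response_stub(make_failing_future(exc))
    assert response is None
    assert context == "Encoding Error"
    assert "utf-8" in text


def test_encode_error_is_reported():
    exc = UnicodeEncodeError("latin-1", "\u4e2d", 0, 1, "ordinal not in range(256)")
    response, context, text = get_response_stub(make_failing_future(exc))
    assert response is None
    assert context == "Encoding Error"
    assert "latin-1" in text
```

Mocking the future keeps the tests offline and deterministic, since no real server has to return a malformed Location header.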

@salmanrajz salmanrajz requested a review from ppfeister as a code owner March 31, 2026 15:53
Fixes sherlock-project#2730. Usernames containing non-ASCII characters (e.g. 'Émile')
can trigger a UnicodeDecodeError inside the requests library during
redirect handling. This exception is not a subclass of
requests.exceptions.RequestException, so it escaped all existing
except blocks in get_response() and crashed the program.

Added a catch for UnicodeError (parent of both UnicodeDecodeError and
UnicodeEncodeError) so these sites are gracefully skipped instead of
crashing the entire scan.

Added regression tests in tests/test_unicode.py.
@salmanrajz salmanrajz force-pushed the fix/unicode-decode-error-special-chars branch from 7adf61b to 4656d95 Compare March 31, 2026 15:58
@salmanrajz
Contributor Author

CI Note: The tox-lint and docker-build-test checks pass. The tox-matrix failures are all caused by 3 pre-existing broken tests in test_ux.py (including test_remove_nsfw and test_nsfw_explicit_selection) that reference Pornhub, which appears to have been removed from the site list. These failures are unrelated to this PR.

Our new tests in test_unicode.py pass across all matrix combinations.

Pornhub was added to the remote false_positive_exclusions.txt, causing
test_remove_nsfw and test_nsfw_explicit_selection to fail since the
site gets filtered out before the test runs. Replaced with Xvideos and
Erome which are NSFW-flagged but not excluded.
@ppfeister
Member

Good fix on both counts, thank you
Now let's get this merged....

@ppfeister ppfeister merged commit 43a354b into sherlock-project:master May 5, 2026
20 checks passed

Development

Successfully merging this pull request may close these issues.

Crash: UnicodeDecodeError on usernames with special characters
