Rewrite decode.py to always pick a valid codec #19

Merged
sethmlarson merged 4 commits into python:main from StanFromIreland:decode
Apr 7, 2026

Conversation

@StanFromIreland
Member

I don't quite get the original idea behind the fuzzer: I assume it currently fails with a LookupError on an invalid codec most of the time and never reaches any actual decoding. Instead, I suggest we drop the dictionary and always pick a known codec. The codec rejection path is quite simple, so I don't think it is worth spending time fuzzing it.
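To illustrate the idea, here is a minimal sketch (not the code from this PR) of mapping the first input byte onto a codec that is known to exist, so that the remaining bytes always reach a real decoder. The `KNOWN_CODECS` list and the `pick_codec` and `fuzz_decode` names are hypothetical and chosen only for this example.

```python
# Hypothetical short list for illustration; the real list is not shown in this thread.
KNOWN_CODECS = ["utf-8", "latin-1", "utf-16", "cp1252"]


def pick_codec(data: bytes) -> str:
    """Map the first input byte onto a codec name that is known to exist."""
    if not data:
        return KNOWN_CODECS[0]
    return KNOWN_CODECS[data[0] % len(KNOWN_CODECS)]


def fuzz_decode(data: bytes) -> None:
    """Decode everything after the selector byte with the chosen codec."""
    codec = pick_codec(data)
    try:
        data[1:].decode(codec)
    except UnicodeError:
        # Malformed input for a valid codec is an expected outcome.
        pass


# An arbitrary codec name, by contrast, fails before any decoding happens.
try:
    b"abc".decode("no-such-codec")
except LookupError:
    rejected_early = True
else:
    rejected_early = False
```

With this shape, every input exercises a decoder instead of stopping at the codec lookup.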

@StanFromIreland StanFromIreland requested a review from a team March 26, 2026 20:27
Collaborator

@sethmlarson sethmlarson left a comment

Since we're using FuzzerInput[0] directly as an integer, we'd potentially start missing codecs if there are more than 256 of them. How many codecs are there today? Are we at risk of getting close to that number? If so, maybe we take two bytes for the index, add an assert that len(ALL_CODECS) < 0xFFFF, and call that good?
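A rough sketch of the two-byte variant suggested above, assuming a list named `ALL_CODECS` as in the comment. The 300 placeholder names and the two helper functions are hypothetical and exist only to show which indices each scheme can reach.

```python
# Placeholder names standing in for a codec list longer than 256 entries.
ALL_CODECS = [f"codec_{i}" for i in range(300)]

# Guard suggested in the review: a two-byte index must be able to cover the list.
assert len(ALL_CODECS) < 0xFFFF


def index_from_one_byte(data: bytes) -> int:
    """A single byte can only address indices 0..255."""
    return data[0] % len(ALL_CODECS)


def index_from_two_bytes(data: bytes) -> int:
    """Two bytes give 65536 values, enough to address every entry."""
    return int.from_bytes(data[:2], "big") % len(ALL_CODECS)


one_byte_reach = {index_from_one_byte(bytes([b])) for b in range(256)}
two_byte_reach = {
    index_from_two_bytes(bytes([hi, lo])) for hi in range(256) for lo in range(256)
}
```

With 300 entries, the one-byte scheme never selects the last 44 codecs, while the two-byte scheme reaches all of them.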

@StanFromIreland
Member Author

We currently have 120. IIRC the last codec added was the oem codec, and that would have been ~8 years ago. I also updated the PR to use pkgutil to iterate over encodings instead, which should be more complete.
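For reference, iterating over the `encodings` package with `pkgutil` could look roughly like the sketch below. This is an assumption about the general approach, not the merged code; `discover_codecs` is a hypothetical name, and the filtering through `codecs.lookup` is one possible way to skip helper modules and codecs unavailable on the current platform.

```python
import codecs
import encodings
import pkgutil


def discover_codecs() -> list[str]:
    """Collect module names under `encodings` that resolve to a real codec."""
    names = []
    for module in pkgutil.iter_modules(encodings.__path__):
        try:
            # Helper modules (e.g. `aliases`) and platform-specific codecs
            # that cannot be imported here raise LookupError.
            codecs.lookup(module.name)
        except LookupError:
            continue
        names.append(module.name)
    return sorted(names)


ALL_CODECS = discover_codecs()
```

The resulting count depends on the Python version and platform, so a hardcoded number would likely drift over time.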

@sethmlarson sethmlarson merged commit 148cf5d into python:main Apr 7, 2026
1 check passed
@StanFromIreland StanFromIreland deleted the decode branch April 7, 2026 20:43
@StanFromIreland
Member Author

Thanks for the review!

2 participants