fix(coding): write user-defined functions module as UTF-8#2819
Open
genisis0x wants to merge 3 commits into
Open
fix(coding): write user-defined functions module as UTF-8#2819genisis0x wants to merge 3 commits into
genisis0x wants to merge 3 commits into
Conversation
LocalCommandLineCodeExecutor._setup_functions wrote the generated functions module via Path.write_text without an explicit encoding, inheriting locale.getpreferredencoding() (cp1252 on most Windows installs). Any user-supplied function whose source contained a non-cp1252 glyph — non-Latin docstring, emoji, smart quotes in an Annotated description, accented identifiers — would crash mid-write with UnicodeEncodeError, leaving _setup_functions_complete=False and the next execute_code_blocks call to silently re-run the same broken path. Pass encoding='utf-8' to the write_text call. Inline comment explains why so a future refactor doesn't drop the kwarg. Tests: - test_setup_functions_writes_module_as_utf8 drives _setup_functions with a FunctionWithRequirements.from_str source that contains Café, em-dash, ☕, 例, and 🚀; reads the resulting module file as raw bytes and asserts the UTF-8 round-trip succeeds. - test_setup_functions_passes_utf8_encoding_to_write_text patches Path.write_text and asserts encoding='utf-8' is in the kwargs of the first call inside _setup_functions. This guards the contract on POSIX, where the platform default is already UTF-8 and the byte-level test would otherwise pass even if the kwarg were dropped. Both tests tolerate the downstream subprocess syntax-check call that _setup_functions runs after the write — it depends on a 'python' executable being on PATH, which is not guaranteed in every environment, and the write we care about has already happened by the time the subprocess runs.
This was referenced May 15, 2026
marklysze
approved these changes
May 17, 2026
Collaborator
marklysze
left a comment
There was a problem hiding this comment.
Correct fix — func_file.write_text(func_file_content, encoding="utf-8") is the minimal change needed to survive any non-cp1252 glyph in a user-supplied function's source. The parametrized test with a realistic payload (emoji, CJK, diacritics) is solid. LGTM.
marklysze
approved these changes
May 17, 2026
Collaborator
marklysze
left a comment
There was a problem hiding this comment.
Clean, targeted fix. Path.write_text defaulting to locale.getpreferredencoding() is a well-known Windows footgun — good catch. The test correctly mocks locale.getpreferredencoding to simulate a cp1252 environment and verifies the write succeeds with non-cp1252 content. Approved.
Codecov Report✅ All modified and coverable lines are covered by tests.
... and 67 files with indirect coverage changes 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
LocalCommandLineCodeExecutor._setup_functionswrites the generated functions module to disk via:Path.write_textdefaults tolocale.getpreferredencoding(), which iscp1252on most Windows installs. Any user-supplied function whose source contains a non-cp1252 glyph crashes mid-write withUnicodeEncodeError. Common triggers:Annotated[..., Field(description=\"…\")]parameterThe first call also crashes silently in the sense that
_setup_functions_completeis left atFalse, so the nextexecute_code_blocksinvocation re-runs the same broken write path.Same defect class as #1731 / #2814 / #2815, on the LocalCommandLineCodeExecutor surface.
Fix
Pass
encoding=\"utf-8\"explicitly to thewrite_textcall. Inline comment explains why so a future refactor doesn't drop the kwarg. No public API change; no new options.Test
test/coding/test_user_defined_functions.pygets two regression tests:test_setup_functions_writes_module_as_utf8drives_setup_functionswith aFunctionWithRequirements.from_strsource containingCafé, em-dash, ☕, 例, and 🚀; reads the resulting module as raw bytes and decodes as UTF-8 explicitly.test_setup_functions_passes_utf8_encoding_to_write_textpatchesPath.write_textand assertsencoding='utf-8'is in the kwargs of the first call inside_setup_functions. This guards the contract on POSIX, where the platform default is already UTF-8 and the byte-level test would otherwise pass even if the kwarg were dropped.Both tests tolerate the downstream subprocess syntax-check call
_setup_functionsruns after the write — it depends on apythonexecutable being on PATH, not guaranteed in every environment, and the write we care about has already happened by the time the subprocess fires..venv/bin/python -m pytest test/coding/test_user_defined_functions.py::test_setup_functions_writes_module_as_utf8 test/coding/test_user_defined_functions.py::test_setup_functions_passes_utf8_encoding_to_write_textreports 2 passed. The byte-level test confirms the round-trip; the patch-based test confirms the kwarg, and reverting the production change makes the second one fail on POSIX.