Description
FileHistory.store_string raises UnicodeEncodeError when the input string contains surrogate code points, either an unmerged pair (e.g., \ud83d\udc3a from the Windows clipboard) or a lone surrogate (e.g., \udead). This commonly occurs on Windows when users paste text containing emoji from certain applications.
Steps to Reproduce
from prompt_toolkit.history import FileHistory
history = FileHistory("/tmp/test_history")
# This string contains a lone surrogate that cannot be encoded to UTF-8.
# append_string() calls store_string(), which is where the encode fails.
history.append_string("\udead")  # UnicodeEncodeError: 'utf-8' codec can't encode character '\udead'
Actual Behavior
UnicodeEncodeError: 'utf-8' codec can't encode characters in position X-Y: surrogates not allowed
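The same error can be reproduced with the standard library alone, which suggests the failure comes from the strict UTF-8 encode rather than from anything specific to file handling. The sketch below is illustrative only; the helper name utf8_encode_error is hypothetical and not part of prompt_toolkit.

```python
from typing import Optional


def utf8_encode_error(text: str) -> Optional[str]:
    """Return the error message from a strict UTF-8 encode, or None on success.

    Hypothetical helper for illustration; not part of prompt_toolkit.
    """
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# A lone surrogate and an unmerged high/low pair both fail to encode.
lone_error = utf8_encode_error("\udead")
pair_error = utf8_encode_error("\ud83d\udc3a")

# The properly merged code point (U+1F43A) encodes without error.
merged_error = utf8_encode_error("\U0001f43a")

print(lone_error)
print(pair_error)
print(merged_error)
```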
Root Cause
In prompt_toolkit/history.py, the store_string method encodes directly without handling surrogates:
def store_string(self, string: str) -> None:
    with open(self.filename, "ab") as f:
        def write(t: str) -> None:
            f.write(t.encode("utf-8"))  # ← No error handling for surrogates
        ...
Suggested Fix
Add error handling when encoding:
def write(t: str) -> None:
    f.write(t.encode("utf-8", errors="replace"))  # or "surrogatepass"
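The two error handlers behave quite differently, so it may help to compare them before picking one. The sketch below uses only the standard library. The helpers encode_for_history and merge_surrogates are hypothetical names introduced for illustration, and merging pairs before encoding is a possible third option, not something the library is known to do.

```python
def encode_for_history(text: str, errors: str) -> bytes:
    """Hypothetical stand-in for the inner write() encode step."""
    return text.encode("utf-8", errors=errors)


def merge_surrogates(text: str) -> str:
    """Hypothetical pre-processing step (an assumption, not library code).

    Round-trips through UTF-16 so that a valid high/low pair becomes one
    code point, while lone surrogates are replaced with U+FFFD.
    """
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "replace")


wolf_pair = "\ud83d\udc3a"  # unmerged surrogate pair for U+1F43A

# "replace" substitutes "?" for each surrogate, so the emoji is lost.
replaced = encode_for_history(wolf_pair, "replace")

# "surrogatepass" writes bytes that a strict UTF-8 decoder rejects.
passed = encode_for_history(wolf_pair, "surrogatepass")
try:
    passed.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# Merging first preserves the emoji as valid UTF-8.
merged = merge_surrogates(wolf_pair)
merged_bytes = encode_for_history(merged, "strict")

# A lone surrogate cannot be merged; it no longer blocks encoding.
cleaned_lone = merge_surrogates("a\udeadb")
```

With "replace" the write succeeds but the pasted emoji is stored as question marks. With "surrogatepass" the file contains byte sequences that are not valid UTF-8, so whether the entry survives depends on how the history file is decoded when it is read back.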
Environment
- Python: 3.11+
- prompt_toolkit: latest
- OS: Windows (most affected due to UTF-16 clipboard handling)
Related