
Fix: handle surrogate pairs in FileHistory.store_string #2062

Open
JiajunBernoulli wants to merge 1 commit into prompt-toolkit:main from JiajunBernoulli:fix-surrogate-encoding


@JiajunBernoulli

Summary

Make FileHistory.store_string handle input strings that contain surrogate code points, instead of crashing, by passing errors="replace" when encoding to UTF-8.

Problem

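FileHistory.store_string crashes with UnicodeEncodeError when the input string contains surrogate code points, e.g. "\ud83d\udc3a": the two UTF-16 halves of the emoji U+1F43A, left as two separate characters instead of being combined into one. UTF-8 cannot encode surrogates, so the default (strict) encode call raises. This commonly occurs on Windows when users paste text containing emoji from certain applications via the clipboard.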
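To illustrate why the strict encoder fails, here is a standard-library-only snippet (an illustration, not part of the PR). It shows that the example string is two separate surrogate code points rather than one emoji, that UTF-8 encoding rejects them, and that they do form U+1F43A when passed back through UTF-16 with the "surrogatepass" error handler.

```python
# "\ud83d\udc3a" is the UTF-16 surrogate pair for U+1F43A (WOLF FACE),
# but as a Python str it is two separate, unpaired surrogate code points.
s = "\ud83d\udc3a"
print(len(s))  # 2

try:
    s.encode("utf-8")  # default "strict" error handling
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed

# Round-tripping through UTF-16 with "surrogatepass" joins the two halves
# into the single code point they were meant to represent.
joined = s.encode("utf-16", "surrogatepass").decode("utf-16")
print(joined == "\U0001F43A")  # True
```

The PR does not attempt this recombination; the snippet only demonstrates what kind of string triggers the crash.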

Solution

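Add errors="replace" to the UTF-8 encoding call. Instead of raising an exception, the encoder then writes each surrogate as "?". When encoding, Python's "replace" handler substitutes "?"; the Unicode replacement character (U+FFFD) is only produced when decoding invalid bytes.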
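A minimal standard-library sketch of the behavioral difference (the sample string is made up for illustration): strict encoding raises, while errors="replace" writes one "?" per surrogate. The last line contrasts this with decoding, which is where U+FFFD comes from.

```python
text = "paste: \ud83d\udc3a"  # hypothetical input with two lone surrogates

# Default (strict) error handling: UTF-8 cannot encode surrogates.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)  # strict: surrogates not allowed

# errors="replace" on encode: each unencodable code point becomes b"?".
encoded = text.encode("utf-8", errors="replace")
print(encoded)  # b'paste: ??'

# errors="replace" on decode: invalid bytes become U+FFFD instead.
print(b"\xff".decode("utf-8", errors="replace") == "\ufffd")  # True
```

A consequence of this choice is that the stored entry reads "paste: ??"; the original emoji cannot be recovered from the history file, but the prompt no longer crashes.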

Test

import os
import tempfile

from prompt_toolkit.history import FileHistory

history = FileHistory(os.path.join(tempfile.mkdtemp(), "test_history"))
# Before: UnicodeEncodeError
# After: works; the lone surrogate is written to the file as "?"
history.append_string("\udead")
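To exercise the change without installing prompt_toolkit, the following sketch models the patched write path on the on-disk format FileHistory appears to use (a "# <timestamp>" comment line followed by "+"-prefixed lines for each entry; treat those format details as an assumption). The functions store_string_sketch and load_entries_sketch are hypothetical names for illustration, not prompt_toolkit APIs, and the reader simply returns entries in file order.

```python
import datetime
import os
import tempfile


def store_string_sketch(filename: str, string: str) -> None:
    """Hypothetical stand-in for FileHistory.store_string with the fix.

    Appends one entry: a "# <timestamp>" comment line, then every line of
    the entry prefixed with "+". Lone surrogates are written as "?"
    instead of raising UnicodeEncodeError.
    """
    with open(filename, "ab") as f:

        def write(t: str) -> None:
            f.write(t.encode("utf-8", errors="replace"))

        write(f"\n# {datetime.datetime.now()}\n")
        for line in string.split("\n"):
            write(f"+{line}\n")


def load_entries_sketch(filename: str) -> list[str]:
    """Hypothetical reader: group consecutive "+" lines into entries."""
    entries: list[str] = []
    lines: list[str] = []
    with open(filename, "rb") as f:
        for raw in f:
            line = raw.decode("utf-8", errors="replace")
            if line.startswith("+"):
                lines.append(line[1:])
            elif lines:
                entries.append("".join(lines)[:-1])  # drop trailing newline
                lines = []
    if lines:
        entries.append("".join(lines)[:-1])
    return entries


path = os.path.join(tempfile.mkdtemp(), "history")
store_string_sketch(path, "\udead")  # no longer raises
store_string_sketch(path, "echo \ud83d\udc3a")
print(load_entries_sketch(path))  # ['?', 'echo ??']
```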

Fixes #2061

JiajunBernoulli force-pushed the fix-surrogate-encoding branch from 2483efa to 6058198 on April 6, 2026 14:09
FileHistory.store_string crashes with UnicodeEncodeError when the
input string contains surrogate code points (e.g., an emoji pasted
from the Windows clipboard as two unpaired UTF-16 halves). Add
errors='replace' so that each surrogate is written as '?' instead
of raising.

Fixes prompt-toolkit#2061
JiajunBernoulli force-pushed the fix-surrogate-encoding branch from 6058198 to 2e9062f on April 6, 2026 14:38

Development

Successfully merging this pull request may close these issues.

FileHistory.store_string crashes with UnicodeEncodeError when string contains surrogate pairs
