FileHistory.store_string crashes with UnicodeEncodeError when string contains surrogate pairs #2061

@JiajunBernoulli

Description

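FileHistory.store_string raises UnicodeEncodeError when the input string contains surrogate code points: either a lone surrogate (e.g., \udead) or a surrogate pair that arrives as two separate code points (e.g., \ud83d\udc3a from the Windows clipboard). This commonly occurs on Windows when users paste text containing emoji from certain applications.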

Steps to Reproduce

from prompt_toolkit.history import FileHistory

history = FileHistory("/tmp/test_history")
# This string contains a lone surrogate that cannot be encoded to UTF-8
history.append_string("\udead")  # UnicodeEncodeError: 'utf-8' codec can't encode character '\udead'

Actual Behavior

UnicodeEncodeError: 'utf-8' codec can't encode characters in position X-Y: surrogates not allowed

Root Cause

In prompt_toolkit/history.py, the store_string method encodes directly without handling surrogates:

def store_string(self, string: str) -> None:
    with open(self.filename, "ab") as f:
        def write(t: str) -> None:
            f.write(t.encode("utf-8"))  # ← No error handling for surrogates
        ...
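
The failure can be reproduced without prompt_toolkit, since it comes from the strict UTF-8 encode itself. The following is a minimal stdlib-only sketch; the helper name try_encode and the sample strings are illustrative and not part of the library:

```python
# Sample inputs: a lone surrogate, an unjoined surrogate pair, and a real emoji.
samples = {
    "lone": "\udead",
    "pair": "\ud83d\udc3a",      # two separate code points, not one character
    "emoji": "\U0001f43a",       # a single, valid code point
}


def try_encode(text: str) -> str:
    """Return 'ok', or the codec's error reason, for a strict UTF-8 encode."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc.reason
    return "ok"


results = {name: try_encode(value) for name, value in samples.items()}
print(results)
```

Both surrogate inputs fail while the real emoji encodes cleanly, which suggests the problem is the representation of the pasted text rather than emoji as such.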

Suggested Fix

Add error handling when encoding:

def write(t: str) -> None:
    f.write(t.encode("utf-8", errors="replace"))  # or "surrogatepass"
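
The two handlers behave differently, so the choice matters. The sketch below compares them on a sample history line using only the standard library; the sample text and variable names are assumptions for illustration:

```python
text = "+paste \udead here\n"

# "replace" substitutes "?" for each unencodable character: lossy, but valid UTF-8.
replaced = text.encode("utf-8", errors="replace")

# "surrogatepass" writes the surrogate as three bytes that strict UTF-8 rejects.
passed = text.encode("utf-8", errors="surrogatepass")

print(replaced)
print(passed)

# Output of "replace" decodes with the default strict handler.
replaced_roundtrip = replaced.decode("utf-8")

# Output of "surrogatepass" round-trips only with the same handler on decode.
passed_roundtrip = passed.decode("utf-8", errors="surrogatepass")

try:
    passed.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

print(replaced_roundtrip == "+paste ? here\n", passed_roundtrip == text, strict_decode_ok)
```

In short, "replace" loses the original character but keeps the history file valid UTF-8 for any reader, while "surrogatepass" preserves the data at the cost of a file that other tools may refuse to decode.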

Environment

  • Python: 3.11+
  • prompt_toolkit: latest
  • OS: Windows (most affected due to UTF-16 clipboard handling)
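
Because the surrogates typically originate from UTF-16 data, another possible approach is to rejoin unjoined pairs before the string is stored, so the emoji survives instead of being replaced. This is only a sketch under that assumption; rejoin_surrogates is a hypothetical helper, not an existing prompt_toolkit function:

```python
def rejoin_surrogates(text: str) -> str:
    """Hypothetical helper: merge unjoined UTF-16 surrogate pairs.

    Encoding with "surrogatepass" emits each surrogate as a raw UTF-16 code
    unit; decoding then combines valid high/low pairs into one code point.
    Surrogates that have no partner are replaced by the decoder.
    """
    raw = text.encode("utf-16-le", errors="surrogatepass")
    return raw.decode("utf-16-le", errors="replace")


fixed = rejoin_surrogates("wolf: \ud83d\udc3a")
print(fixed.encode("utf-8"))  # now encodes without raising
```

A string cleaned this way can be passed to a strict UTF-8 encode, so store_string itself would not need a different error handler.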
