Clarify default encoding behavior of open() with UTF-8 mode PEP 686 (Windows confusion) #146528

@botkero

Documentation

Hi,

The documentation for open() currently states:

In text mode, if encoding is not specified the encoding used is platform-dependent: locale.getencoding() is called to get the current locale encoding.
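One way to see which encoding open() actually picks is to compare the file object's encoding attribute with the locale encoding. The following is only a minimal sketch: it assumes Python 3.11+ for locale.getencoding() (with a fallback for older versions), and the file name probe.txt is a hypothetical placeholder.

```python
import locale
import os
import tempfile

# locale.getencoding() exists from Python 3.11 on; fall back on older versions.
get_locale_encoding = getattr(locale, "getencoding", None) or (
    lambda: locale.getpreferredencoding(False)
)

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "probe.txt")

with open(path, "w") as f:  # no encoding given: the default is used
    default_encoding = f.encoding

with open(path, "w", encoding="utf-8") as f:  # an explicit encoding always wins
    explicit_encoding = f.encoding

print("locale encoding:", get_locale_encoding())
print("open() default :", default_encoding)
print("open() explicit:", explicit_encoding)
```

If Python runs in UTF-8 mode (for example with -X utf8), the default reported by open() should be UTF-8 even though the locale encoding is something else, so the first two lines of output may differ.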

However, my observations on Windows suggest that the actual behaviour is unclear or at least confusing in practice.

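# No encoding argument, so open() falls back to the default text encoding.
with open("result_utf-8.txt", "w") as f:
    f.write("fff")  # ASCII-only content

with open("result_ansi.txt", "w") as f:
    f.write("ffföäü")  # content with non-ASCII characters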

Started with python -B myscript.py

I use Python 3.13.1 on Windows 11 (German language pack).


According to debugging, the encoding used internally appears to be cp1252 (matching locale.getencoding()).
However, when opening the files in Notepad:

  • result_utf-8.txt is detected as UTF-8
  • result_ansi.txt is detected as ANSI (cp1252)

This gives the impression that the effective encoding depends on file content, which is unexpected.

I would expect one of the following:

  • The encoding is consistently cp1252 (as per locale.getencoding()), or
  • The encoding is consistently UTF-8 (e.g. due to UTF-8 mode / PEP 686)

I'm not sure exactly what category this falls under, whether it's documentation or a bug in Python. Maybe I'm completely off track here, and actually, everything fits. Please clarify this for me.

Edit:

Hmm, after further investigation, it seems everything is correct after all. I should have checked the bytes. The result is ANSI (cp1252) in both cases. The first file contains only ASCII characters, which are encoded as the same bytes in cp1252 and UTF-8, so Notepad just reports it as UTF-8.
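For anyone who runs into the same confusion, checking the bytes can be sketched like this. It only uses str.encode() on the two strings from the script above; the variable names are made up for the illustration.

```python
ascii_text = "fff"
umlaut_text = "ffföäü"

# ASCII-only text produces identical bytes in cp1252 and UTF-8,
# so an editor cannot tell the two encodings apart from the content.
print(ascii_text.encode("cp1252") == ascii_text.encode("utf-8"))

# The umlauts take one byte each in cp1252 and two bytes each in UTF-8.
cp1252_bytes = umlaut_text.encode("cp1252")
utf8_bytes = umlaut_text.encode("utf-8")
print(cp1252_bytes)
print(utf8_bytes)

# The cp1252 bytes are not valid UTF-8, which is why an editor
# falls back to ANSI for the second file.
try:
    cp1252_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print(decodes_as_utf8)
```

To inspect the files actually written by the script, opening them with mode "rb" and printing the result of read() shows the same kind of byte strings.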
