Clarify default encoding behavior of open() with UTF-8 mode PEP 686 (Windows confusion) #146528
Description
Documentation
Hi,
The documentation for open() currently states:
In text mode, if encoding is not specified the encoding used is platform-dependent: locale.getencoding() is called to get the current locale encoding.
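A minimal sketch of how the quoted behaviour could be checked on a given machine. It prints the locale encoding and the encoding that open() actually picks when none is passed. The helper name `effective_open_encoding` is made up for illustration, and the fallback to `locale.getpreferredencoding(False)` is only there on the assumption that the script might run on a Python older than 3.11, where `locale.getencoding()` does not exist.

```python
import locale
import os
import sys
import tempfile


def effective_open_encoding() -> str:
    """Open a throwaway file in text mode without `encoding` and report what open() chose."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w") as f:
            return f.encoding
    finally:
        os.remove(path)


# locale.getencoding() was added in Python 3.11; fall back on older versions.
if hasattr(locale, "getencoding"):
    locale_encoding = locale.getencoding()
else:
    locale_encoding = locale.getpreferredencoding(False)

print("locale encoding:        ", locale_encoding)
print("UTF-8 mode enabled:     ", bool(sys.flags.utf8_mode))
print("open() default encoding:", effective_open_encoding())
```

When UTF-8 mode is off, the first and last lines should name the same encoding (cp1252 on the Windows setup described here).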
However, my observations on Windows suggest that the actual behaviour is unclear or at least confusing in practice.
    with open("result_utf-8.txt", "w") as f:
        f.write("fff")
    with open("result_ansi.txt", "w") as f:
        f.write("ffföäü")

Started with python -B myscript.py
I use Python 3.13.1 on Windows 11 (German language pack).
According to debugging, the encoding used internally appears to be cp1252 (matching locale.getencoding()).
However, when opening the files in Notepad:
- result_utf8.txt is detected as UTF-8
- result_ansi.txt is detected as ANSI (cp1252)
This gives the impression that the effective encoding depends on file content, which is unexpected.
I would expect one of the following:
- The encoding is consistently cp1252 (as per locale.getencoding()), or
- The encoding is consistently UTF-8 (e.g. due to UTF-8 mode / PEP 686)
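Either outcome can be forced regardless of platform by passing `encoding` explicitly. The sketch below writes the same text with both encodings and reads the raw bytes back. The helper `write_and_read_bytes` and the file name `result.txt` are hypothetical and not part of the original script.

```python
import tempfile
from pathlib import Path


def write_and_read_bytes(text: str, encoding: str) -> bytes:
    """Write `text` with an explicit encoding and return the bytes that land on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "result.txt"
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        return path.read_bytes()


# Same text, two explicit encodings: the output no longer depends on the locale.
print(write_and_read_bytes("ffföäü", "cp1252"))
print(write_and_read_bytes("ffföäü", "utf-8"))
```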
I'm not sure exactly what category this falls under, whether it's documentation or a bug in Python. Maybe I'm completely off track here, and actually, everything fits. Please clarify this for me.
Edit:
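Hmm, after further investigation, it seems everything is correct after all. I should have checked the bytes. The result is ANSI (cp1252) in both cases. Notepad just incorrectly displays UTF-8 in the first case, presumably because "fff" is pure ASCII and pure ASCII is byte-for-byte identical in cp1252 and UTF-8, so the file content alone cannot tell the two apart.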
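For anyone hitting the same confusion, here is a small sketch of the byte check. It uses only `str.encode` and `bytes.decode`. The helper name `is_valid_utf8` is made up for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ascii_bytes = "fff".encode("cp1252")
umlaut_bytes = "ffföäü".encode("cp1252")

# ASCII-only text produces the same bytes in cp1252 and UTF-8,
# so an editor cannot tell from the content which encoding was used.
print(ascii_bytes == "fff".encode("utf-8"))

# The umlauts become single bytes in cp1252 that are not valid UTF-8,
# which is why an editor falls back to ANSI for the second file.
print(umlaut_bytes)
print(is_valid_utf8(ascii_bytes), is_valid_utf8(umlaut_bytes))
```

Reading the files back with `open(path, "rb").read()` shows the same bytes as the encode() calls above when the default encoding is cp1252.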