| 1 | +# Windows Console Notes |
| 2 | + |
| 3 | +```{versionadded} 6.0 |
| 4 | +``` |
| 5 | + |
| 6 | +On Windows, Click emulates output streams so that Unicode can be written to the console through separate APIs, and it |
| 7 | +also decodes parameters differently. |
| 8 | + |
| 9 | +Here is a brief overview of how this works and what it means for you. |
| 10 | + |
| 11 | +## Unicode Arguments |
| 12 | + |
| 13 | +Internally, Click generally assumes that any argument can arrive as either a byte string or a Unicode string, and it |
| 14 | +converts the value to the expected type as late as possible. This has some advantages, as it allows us to accept the |
| 15 | +data in the most appropriate form for the operating system and Python version. |
| 16 | + |
| 17 | +This caused some problems on Windows where initially the wrong encoding was used and garbage ended up in your input |
| 18 | +data. We not only fixed the encoding part, but we also now extract unicode parameters from `sys.argv`. |
| 19 | + |
| 20 | +There is one limitation: if `sys.argv` was modified before a Click handler is invoked, we have to fall back to the |
| 21 | +regular byte input. In that case not all Unicode values are available, only the subset that the code page used for |
| 22 | +parameters can represent. |
| 23 | + |
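The encoding problems described above can be illustrated without Click. The following is a minimal sketch using only the standard library; the code pages `cp1252` and `cp437` and the sample strings are assumptions chosen for illustration, not values taken from Click.

```python
# Illustration only: the code pages and sample strings are assumptions.
text = "café"

# Bytes produced under one code page but decoded with another turn into garbage.
raw = text.encode("cp1252")
garbled = raw.decode("cp437")

# A code page can only represent a subset of Unicode; anything outside that
# subset is lost when an argument is forced through byte input.
lossy = "日本".encode("cp1252", errors="replace")

print(repr(garbled))
print(lossy)
```

The first case mirrors the "wrong encoding" problem, the second the fallback to byte input where only part of Unicode survives.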
| 24 | +## Unicode Output and Input |
| 25 | + |
| 26 | +Unicode output and input on Windows is implemented through the concept of a dispatching text stream. This means |
| 27 | +that when Click first needs a text output (or input) stream on Windows, it goes through a few checks to figure out if a |
| 28 | +Windows console is connected. If no Windows console is present, the text output stream is returned as is, |
| 29 | +and the encoding for that stream is set to `utf-8`, as on all platforms. |
| 30 | + |
| 31 | +However, if a console is connected, the stream is emulated instead and uses the cmd.exe Unicode APIs to output text. |
| 32 | +In this case the stream also uses `utf-16-le` as its internal encoding. There is some hackery involved, though: the |
| 33 | +underlying raw IO buffer still bypasses the Unicode APIs, so byte output through an indirection is still |
| 34 | +possible. |
| 35 | + |
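The dispatch decision described above can be sketched roughly as follows. This is a simplified illustration, not Click's implementation: `pick_text_encoding` and `FakeConsole` are hypothetical names, and a plain `isatty()` call stands in for the real console checks, which are more involved.

```python
import io
import sys


def pick_text_encoding(stream, platform=sys.platform):
    """Hypothetical helper: choose an encoding the way the text describes.

    NOT Click's implementation; isatty() is a stand-in for the real
    Windows console detection.
    """
    is_console = platform == "win32" and stream.isatty()
    return "utf-16-le" if is_console else "utf-8"


class FakeConsole(io.StringIO):
    """Stand-in for a stream attached to a console (assumption for the demo)."""

    def isatty(self):
        return True


print(pick_text_encoding(io.StringIO(), platform="win32"))  # redirected output
print(pick_text_encoding(FakeConsole(), platform="win32"))  # console attached
print("hi".encode("utf-16-le"))  # two bytes per character in this case
```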
| 36 | +- This unicode support is limited to `click.echo`, `click.prompt` as well as `click.get_text_stream`. |
| 37 | +- Depending on whether Unicode values or byte strings are passed, the control flow takes completely different paths |
| 38 | + internally, which can cause odd artifacts if data ends up partially buffered. Click attempts to protect against that |
| 39 | + by always flushing manually, but if you are mixing and matching different string types on `stdout` or `stderr` you |
| 40 | + will need to flush manually. |
| 41 | +- The raw output stream is set to binary mode, which is a global operation on Windows, so `print` calls will be |
| 42 | + affected. Prefer `click.echo` over `print`. |
| 43 | +- On Windows 7 and below, there is a limitation where at most 64k characters can be written in one call in binary mode. |
| 44 | + In this situation, `sys.stdout` and `sys.stderr` are replaced with wrappers that work around the limitation. |
| 45 | + |
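The flushing advice in the list above can be demonstrated with standard-library streams. This is a sketch under assumptions: `io.BytesIO` stands in for the raw output stream and `io.TextIOWrapper` for the text layer; neither is Click's actual stream object.

```python
import io

raw = io.BytesIO()                              # stands in for the raw byte stream
text = io.TextIOWrapper(raw, encoding="utf-8")  # text layer on top of it

# Without a flush, the text may still sit in the wrapper's buffer when the
# bytes are written directly, so the two can arrive out of order.
text.write("text ")
text.flush()  # push buffered text down before touching the raw stream
raw.write(b"bytes ")

text.write("more text")
text.flush()

print(raw.getvalue())
```

Flushing after each text write keeps the text and byte output in the order they were issued.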
| 46 | +Another important thing to note is that the Windows console's default fonts do not support many characters, which |
| 47 | +means that you are mostly limited to international letters, without emojis or special characters. |