Commit eaa42e3
fix(runner): use utf-8 encoding for subprocess I/O on Windows
Objective: On Korean Windows (and other non-UTF-8 Windows locales),
running opendataloader-pdf fails immediately with
UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2
because the JAR outputs UTF-8 but the Python wrapper reads it
using the system locale encoding (cp949).
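The failure can be reproduced without the JAR. The sketch below is illustrative only: the specific character (U+2019) is an assumption, chosen because its UTF-8 encoding starts with byte 0xe2; the actual character the JAR emitted is not stated in the commit. `decode_with` is a hypothetical helper, not part of the wrapper.

```python
# UTF-8 encoding of U+2019 (right single quotation mark): b'\xe2\x80\x99'.
# Assumed sample character; any character in this range starts with 0xe2.
utf8_bytes = "\u2019".encode("utf-8")


def decode_with(data: bytes, encoding: str) -> str:
    """Decode data with the given codec; describe the error if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        bad_byte = exc.object[exc.start]
        return f"UnicodeDecodeError: {exc.encoding!r} cannot decode byte {bad_byte:#x}"


# Decoding as UTF-8 succeeds; decoding as cp949 (the Korean Windows locale
# codec) fails because the byte sequence is not a valid cp949 sequence.
print(repr(decode_with(utf8_bytes, "utf-8")))
print(decode_with(utf8_bytes, "cp949"))
```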
Approach: Replace locale.getpreferredencoding(False) with a hard-coded
"utf-8" for both subprocess.run (quiet mode) and subprocess.Popen
(streaming mode) — the JAR always outputs UTF-8 regardless of OS locale,
so tying the decoder to the system locale was always wrong.
Also switch sys.stdout.write(line) to sys.stdout.buffer.write with
utf-8 encoding so the decoded text reaches the terminal correctly on
Windows where stdout may also default to cp949.
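A minimal sketch of the approach described above, not the patched source itself. The function names `run_quiet` and `run_streaming`, the `out` parameter, and the `CHILD` command (a Python child process standing in for the JAR) are all assumptions made for illustration.

```python
import subprocess
import sys

# Stand-in for the JAR (assumption): a child process that writes UTF-8 bytes.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write('caf\\u00e9 \\u2014 ok\\n'.encode('utf-8'))",
]


def run_quiet(cmd):
    """Quiet mode: capture all output, decoding it as UTF-8 rather than the locale codec."""
    result = subprocess.run(cmd, capture_output=True, encoding="utf-8", check=True)
    return result.stdout


def run_streaming(cmd, out=None):
    """Streaming mode: decode each line as UTF-8 and forward it as UTF-8 bytes."""
    out = out if out is not None else sys.stdout.buffer
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        encoding="utf-8",
    ) as proc:
        for line in proc.stdout:
            # Writing bytes to the underlying buffer bypasses the console's
            # text encoding, which may be cp949 on Korean Windows.
            out.write(line.encode("utf-8"))
        out.flush()
    return proc.returncode
```

Passing `encoding="utf-8"` puts the pipes in text mode with a fixed decoder, so the result no longer depends on `locale.getpreferredencoding(False)`.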
Evidence: Verified source after patch — encoding="utf-8" appears in
both call sites (2 occurrences), locale.getpreferredencoding removed
(0 occurrences), stdout.buffer.write added (1 occurrence).
Before: UnicodeDecodeError on first byte of UTF-8 multibyte sequence.
After: subprocess reads and writes UTF-8 correctly on cp949 Windows.
1 parent ad6e906 commit eaa42e3
1 file changed
Lines changed: 3 additions & 3 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 29 | | - |
| | 29 | + |
| 39 | | - |
| | 39 | + |
| 43 | | - |
| | 43 | + |