Commit 796cd68

fix(ls): use GetACP to detect UTF-8 encoding on Windows
On Windows, locale environment variables (LC_ALL, LC_COLLATE, LANG) are typically unset, causing get_locale_from_env() to default to UEncoding::Ascii. This makes non-ASCII filenames display as octal escape sequences or `?` characters in ls output.

Fix by querying the system ANSI code page via GetACP() when no locale variables are set. If the active code page is 65001 (UTF-8), use UEncoding::Utf8.

This aligns with GNU coreutils' gnulib approach which calls locale_charset() -> GetACP() on Windows.

Fixes: #11103
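
The fallback described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the uucore code: `Encoding`, `CP_UTF8`, `encoding_from_code_page`, and `fallback_encoding` are hypothetical names introduced here so that the code page mapping can be exercised on any platform, while the actual `GetACP()` call stays behind a Windows-only `cfg`.

```rust
/// Hypothetical stand-in for uucore's `UEncoding`; only the two variants
/// named in the commit message are modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    Ascii,
    Utf8,
}

/// Windows code page identifier for UTF-8.
pub const CP_UTF8: u32 = 65001;

/// Pure mapping from an ANSI code page number to an encoding.
/// Anything other than 65001 is treated as ASCII, as in the commit.
pub fn encoding_from_code_page(acp: u32) -> Encoding {
    if acp == CP_UTF8 {
        Encoding::Utf8
    } else {
        Encoding::Ascii
    }
}

/// Encoding to use when no locale variable is set (Windows).
#[cfg(windows)]
pub fn fallback_encoding() -> Encoding {
    unsafe extern "system" {
        fn GetACP() -> u32;
    }
    // SAFETY: GetACP takes no arguments and only returns the
    // active ANSI code page identifier.
    encoding_from_code_page(unsafe { GetACP() })
}

/// Encoding to use when no locale variable is set (other platforms):
/// the POSIX "C" locale default.
#[cfg(not(windows))]
pub fn fallback_encoding() -> Encoding {
    Encoding::Ascii
}
```

Keeping the code page comparison in a pure function is a choice made for this sketch only; it lets the mapping be unit-tested without calling into the Windows API.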
1 parent 5605eac commit 796cd68

1 file changed

Lines changed: 30 additions & 2 deletions

src/uucore/src/lib/features/i18n/mod.rs

@@ -28,6 +28,24 @@ pub enum UEncoding {
 // This ensures real locales like "en-US" won't match
 const DEFAULT_LOCALE: Locale = locale!("und");
 
+/// On Windows, detect the encoding from the system ANSI code page.
+/// Returns `UEncoding::Utf8` if the active code page is 65001 (UTF-8),
+/// otherwise `UEncoding::Ascii`.
+///
+/// This mirrors the gnulib approach where `locale_charset()` calls `GetACP()` on Windows.
+#[cfg(target_os = "windows")]
+fn get_windows_encoding() -> UEncoding {
+    unsafe extern "system" {
+        fn GetACP() -> u32;
+    }
+    let acp = unsafe { GetACP() };
+    if acp == 65001 {
+        UEncoding::Utf8
+    } else {
+        UEncoding::Ascii
+    }
+}
+
 /// Look at 3 environment variables in the following order
 ///
 /// 1. LC_ALL
@@ -70,8 +88,18 @@ pub fn get_locale_from_env(locale_name: &str) -> (Locale, UEncoding) {
             return (locale, encoding);
         }
     }
-    // Default POSIX locale representing LC_ALL=C
-    (DEFAULT_LOCALE, UEncoding::Ascii)
+    // No locale environment variables set.
+    // On Windows, check the system ANSI code page to determine encoding,
+    // matching GNU coreutils' approach (locale_charset -> GetACP).
+    #[cfg(target_os = "windows")]
+    {
+        (DEFAULT_LOCALE, get_windows_encoding())
+    }
+    #[cfg(not(target_os = "windows"))]
+    {
+        // Default POSIX locale representing LC_ALL=C
+        (DEFAULT_LOCALE, UEncoding::Ascii)
+    }
 }
 
 /// Get the collating locale from the environment
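
The new branch is only reached when none of the locale variables yields a value. A rough sketch of that precedence check follows, assuming the usual POSIX order (LC_ALL, then the category variable such as LC_COLLATE, then LANG) and assuming an empty value counts as unset. `first_locale_value` and `first_locale_value_from_env` are hypothetical helpers for illustration, not functions in uucore; the lookup is passed in as a closure so the logic can be tested without touching the process environment.

```rust
/// Return the first non-empty value among LC_ALL, the category variable
/// (for example "LC_COLLATE"), and LANG, or `None` when all are unset.
/// Treating empty values as unset is an assumption of this sketch.
pub fn first_locale_value<F>(category: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    ["LC_ALL", category, "LANG"]
        .iter()
        .copied()
        .find_map(|name| lookup(name).filter(|value| !value.is_empty()))
}

/// Convenience wrapper that reads the real process environment.
pub fn first_locale_value_from_env(category: &str) -> Option<String> {
    first_locale_value(category, |name| std::env::var(name).ok())
}
```

A `None` result corresponds to the case the diff handles: the caller would then use the platform default, which after this commit is the code page check on Windows and ASCII elsewhere.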
