fix(ls): force C locale so non-English month names don't break parsing#1390
fix(ls): force C locale so non-English month names don't break parsing#1390GabriWar wants to merge 1 commit intortk-ai:masterfrom
Conversation
LS_DATE_RE is English-only and case-sensitive. When the user's locale prints months in another language (e.g. LC_TIME=pt_BR.UTF-8 → "abr", "mar", "out"), every `ls -la` line fails the regex, parse_ls_line returns None, and compact_ls falls through to the "(empty)" sentinel — even for directories that clearly are not empty. Setting LC_ALL=C on the spawned `ls` process guarantees English month names regardless of the user's locale, which is both the smallest fix and consistent with the rest of the parser contract (it expects a specific, stable ls format). Repro before fix: LC_TIME=pt_BR.UTF-8 rtk -vvv ls /some/nonempty/dir # Chars: 9773 → 8 (100% reduction) # (empty)
📊 Automated PR Analysis
SummaryFixes a bug where Review Checklist
Analyzed automatically by wshm · This is an automated analysis, not a human review. |
There was a problem hiding this comment.
Pull request overview
Fixes rtk ls incorrectly returning (empty) under non-English locales by forcing the spawned ls process into a locale that matches the English-only LS_DATE_RE parsing contract.
Changes:
- Set a fixed locale on the
lssubprocess to ensure month abbreviations match the parser’s regex. - Add inline documentation explaining the locale/parsing interaction and failure mode.
| // Force C locale so month names match LS_DATE_RE, which is English-only. | ||
| // Without this, non-English locales (e.g. LC_TIME=pt_BR.UTF-8 → "abr"/"mar") | ||
| // cause every line to fail the regex, and compact_ls returns "(empty)" | ||
| // even for non-empty directories. | ||
| cmd.env("LC_ALL", "C"); |
There was a problem hiding this comment.
Using LC_ALL=C forces all locale categories for the spawned ls process. That can change filename rendering/escaping and collation (and may cause non-ASCII filenames to be shown differently), even though this bug only depends on month names from LC_TIME. Prefer setting LC_TIME=C (or LC_TIME=POSIX) so date parsing is stabilized without affecting character encoding/display behavior.
| // Force C locale so month names match LS_DATE_RE, which is English-only. | |
| // Without this, non-English locales (e.g. LC_TIME=pt_BR.UTF-8 → "abr"/"mar") | |
| // cause every line to fail the regex, and compact_ls returns "(empty)" | |
| // even for non-empty directories. | |
| cmd.env("LC_ALL", "C"); | |
| // Force only the time locale to C so month names match LS_DATE_RE, | |
| // which is English-only. Without this, non-English locales | |
| // (e.g. LC_TIME=pt_BR.UTF-8 → "abr"/"mar") cause every line to fail | |
| // the regex, and compact_ls returns "(empty)" even for non-empty | |
| // directories, while preserving other locale-sensitive output behavior. | |
| cmd.env("LC_TIME", "C"); |
Summary
rtk lsreturns(empty)for any non-empty directory when the user's locale prints non-English month abbreviations (e.g.LC_TIME=pt_BR.UTF-8→abr,mar,out).LS_DATE_REinsrc/cmds/system/ls.rsis English-only and case-sensitive:r"\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2}\s+(?:\d{4}|\d{2}:\d{2})\s+"When
ls -laprints a Portuguese/French/etc. month, the regex doesn't match,parse_ls_linereturnsNonefor every line, andcompact_lsfalls through to the(empty)sentinel at line 194 — silently hiding every file and directory.Repro
A 100% reduction on a directory with 140+ entries is a strong hint the filter is eating real data.
Fix
Set
LC_ALL=Con the spawnedlsprocess, forcing English month names regardless of the user's locale. Smallest possible change, localized to where the locale-dependent parsing happens.This keeps the existing regex/parser contract intact (it already assumes a stable English format) and doesn't touch any other command path.
After
Scope check:
grep -rn "Jan|Feb|Mar" src/confirmsls.rsis the only runtime path with locale-dependent date parsing.tree/find/ other proxies don't hit this.Test plan
LC_TIME=pt_BR.UTF-8before fix — reproduces(empty)rtk lslists the directory correctlycargo build --releasecleanlstests unaffected (they feed the parser directly and are locale-independent)