
Commit e8c8ec8

File.dirname: add a spec for Shift-JIS handling
While trying to speed up various `File.*` methods, I realized they were much slower and more complicated than they should be, for no apparent reason. However, after I asked Nobu, he explained that Shift-JIS encoded text can contain `0x5C` (ASCII backslash) as the second byte of a two-byte character sequence. Since `0x5C` is `File::ALT_SEPARATOR` on Windows, this can easily break naive path algorithms that search for directory separators byte by byte.
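
To make the failure mode concrete, here is a minimal sketch (not part of the commit) comparing a byte-wise scan with a character-aware one on the same path the new spec uses. The helper names `naive_separator_offsets` and `char_aware_separator_offsets` are hypothetical and exist only for illustration; they are not how `File.dirname` is implemented.

```ruby
# "ソ" is encoded as the two bytes 0x83 0x5C in Shift_JIS, so the path below
# is "dir/fileソname.txt" and contains no real backslash character.
path = "dir/file\x83\x5Cname.txt".dup.force_encoding(Encoding::Shift_JIS)

# Hypothetical naive scan: reports every byte offset holding 0x5C,
# without checking whether that byte is part of a multi-byte character.
def naive_separator_offsets(str)
  (0...str.bytesize).select { |i| str.getbyte(i) == 0x5C }
end

# Hypothetical character-aware scan: walks whole characters and only
# reports a single-byte character whose value is 0x5C.
def char_aware_separator_offsets(str)
  offsets = []
  byte_offset = 0
  str.each_char do |ch|
    offsets << byte_offset if ch.bytesize == 1 && ch.getbyte(0) == 0x5C
    byte_offset += ch.bytesize
  end
  offsets
end

naive = naive_separator_offsets(path)      # => [9]
aware = char_aware_separator_offsets(path) # => []
```

The naive scan reports a separator at byte offset 9, which is the trailing byte of "ソ". On a platform where backslash is an alternate separator, splitting there would cut the character in half. The character-aware scan finds no backslash at all.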
1 parent ddcd042 commit e8c8ec8

1 file changed

Lines changed: 31 additions & 0 deletions


core/file/dirname_spec.rb

@@ -78,6 +78,37 @@ def object.to_int; 2; end
     File.dirname("foo/../").should == "foo"
   end
 
+  it "rejects strings encoded with non ASCII-compatible or dummy encodings" do
+    valid, invalid = Encoding.list.partition { |e| e.ascii_compatible? && !e.dummy? }
+
+    valid.reject do |enc|
+      path = "/foo/bar".encode(enc)
+      expected = "/foo".encode(enc)
+
+      File.dirname(path) == expected
+    end.should == []
+
+    invalid.reject do |enc|
+      path = "/foo/bar".encode(enc)
+    rescue Encoding::ConverterNotFoundError
+      true
+    else
+      begin
+        File.dirname(path)
+        false
+      rescue Encoding::CompatibilityError
+        true
+      end
+    end.should == []
+  end
+
+  it "handles Shift-JS 0x5C (\\) as second byte of a multi-byte sequence" do
+    # dir/fileソname.txt
+    path = "dir/file\x83\x5cname.txt".force_encoding(Encoding::SHIFT_JIS)
+    path.valid_encoding?.should be_true
+    File.dirname(path).should == "dir"
+  end
+
   platform_is_not :windows do
     it "returns all the components of filename except the last one (edge cases on non-windows)" do
       File.dirname('/////').should == '/'
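
For readers who want to try the first spec's logic outside the spec harness, the following is a rough sketch that assumes CRuby behaviour. `dirname_outcome` is a hypothetical helper, not part of ruby/spec, and unlike the spec it only probes two encodings rather than all of `Encoding.list`.

```ruby
# Same partition the spec uses: encodings File.dirname is expected to accept
# versus those it is expected to reject.
usable, rejected = Encoding.list.partition { |e| e.ascii_compatible? && !e.dummy? }

# Hypothetical helper: reports what happens when "/foo/bar" is transcoded to
# `enc` and handed to File.dirname.
def dirname_outcome(enc)
  path = "/foo/bar".encode(enc)
  File.dirname(path)
  :accepted
rescue Encoding::ConverterNotFoundError
  # No transcoder exists for this encoding, so the path cannot even be built.
  :not_encodable
rescue Encoding::CompatibilityError
  # The path was built, but File.dirname refused it.
  :rejected
end

sjis_outcome    = dirname_outcome(Encoding::Shift_JIS)
utf16le_outcome = dirname_outcome(Encoding::UTF_16LE)
```

On CRuby, `sjis_outcome` should be `:accepted` because Shift_JIS is ASCII-compatible, and `utf16le_outcome` should be `:rejected` because UTF-16LE is not. Other Ruby implementations are expected to match, since that is what the spec asserts.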
