Commit 911b122

Fix encoding error when C parser reads external source files (ruby#1657)
When a C file references another source file via `/* in file.c */`, the parser read it with a bare `File.read`, which uses `Encoding.default_external`. On systems where this is US-ASCII (e.g. Debian CI), non-ASCII bytes in the source file cause `ArgumentError: invalid byte sequence in US-ASCII` in `String#scan`.

Use `RDoc::Encoding.read_file` instead, which reads in binary mode and properly handles encoding detection and transcoding.

This was triggered by Ruby commit [`a2531ba293`](ruby/ruby@a2531ba293), which added UTF-8 right arrows (→) to comments in `class.c`; that file is referenced from `object.c` via `/* in class.c */`.
1 parent 0e9daee commit 911b122
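
The failure described in the commit message can be reproduced outside RDoc. The following is a minimal sketch, not the parser's actual code: the file name `sample.c`, its contents, and the helper `scan_function_names` are hypothetical, and the US-ASCII default is simulated by passing `encoding: 'US-ASCII'` to `File.read` rather than by changing `Encoding.default_external`.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for the parser's String#scan over file content.
def scan_function_names(content)
  content.scan(/rb_\w+/)
end

dir = Dir.mktmpdir
path = File.join(dir, 'sample.c')

# A C source with a UTF-8 right arrow in a comment, written byte-for-byte.
File.binwrite(path, "/* a \u2192 b */\nVALUE rb_foo(VALUE obj) { return obj; }\n")

# Simulates File.read on a system whose default external encoding is
# US-ASCII: the bytes are kept as-is but the string is tagged US-ASCII.
ascii = File.read(path, encoding: 'US-ASCII')

error = begin
  scan_function_names(ascii)
  nil
rescue ArgumentError => e
  e
end

# Reading the raw bytes and tagging them as UTF-8 gives a valid string.
utf8 = File.binread(path).force_encoding(Encoding::UTF_8)
names = scan_function_names(utf8)

FileUtils.remove_entry(dir)
```

Here `ascii.valid_encoding?` is false, so any regexp match against it raises, while the binary read followed by an explicit encoding tag scans cleanly.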

2 files changed: 37 additions, 1 deletion

lib/rdoc/parser/c.rb

Lines changed: 1 addition & 1 deletion
@@ -1016,7 +1016,7 @@ def handle_method(type, var_name, meth_name, function, param_count,
       file_name = File.join @file_dir, source_file
 
       if File.exist? file_name then
-        file_content = File.read file_name
+        file_content = RDoc::Encoding.read_file file_name, @options.encoding
       else
         @options.warn "unknown source #{source_file} for #{meth_name} in #{@file_name}"
       end
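
To illustrate the replacement call in isolation, here is a small sketch that uses `RDoc::Encoding.read_file` on a temporary file. The file name and contents are hypothetical, and `Encoding::UTF_8` is an assumed stand-in for `@options.encoding`, which is only available inside the parser. The `nil` check reflects my understanding that `read_file` can return `nil` when the content cannot be converted.

```ruby
require 'fileutils'
require 'rdoc'
require 'tmpdir'

dir = Dir.mktmpdir
path = File.join(dir, 'sample.c')

# Hypothetical external source file containing a non-ASCII character.
File.binwrite(path, "/* a \u2192 b */\nVALUE rb_foo(VALUE obj) { return obj; }\n")

# Assumption: UTF-8 stands in for the @options.encoding used in the diff.
content = RDoc::Encoding.read_file(path, Encoding::UTF_8)

# Guard against a nil result before scanning.
names = content ? content.scan(/rb_\w+/) : []

FileUtils.remove_entry(dir)
```

Because the target encoding is passed explicitly, the result should not depend on `Encoding.default_external`, which is the property the fix relies on.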

test/rdoc/parser/c_test.rb

Lines changed: 36 additions & 0 deletions
@@ -2292,6 +2292,42 @@ def test_reparse_c_file_no_duplicates
     assert_include method_names, 'baz'
   end
 
+  def test_handle_method_source_file_with_non_ascii
+    # Regression test: when the C parser reads an external source file
+    # (via "/* in file.c */"), it must use RDoc::Encoding.read_file instead
+    # of File.read. On systems where Encoding.default_external is US-ASCII,
+    # bare File.read produces a US-ASCII string that raises ArgumentError
+    # on String#scan when the file contains non-ASCII bytes.
+    source_path = File.join(File.dirname(@fn), 'greet.c')
+    File.binwrite source_path, <<~C.encode('UTF-8')
+      /*
+       * Returns a greeting \u2014 "h\u00e9llo w\u00f6rld"
+       */
+      VALUE
+      rb_greet(VALUE obj) {
+          return rb_str_new2("hello");
+      }
+    C
+
+    parser = util_parser <<~C
+      void Init_Foo(void) {
+          VALUE cFoo = rb_define_class("Foo", rb_cObject);
+          rb_define_method(cFoo, "greet", rb_greet, 0); /* in greet.c */
+      }
+    C
+
+    parser.scan
+
+    foo = @top_level.find_module_named 'Foo'
+    assert foo, 'Foo class should be found'
+
+    greet = foo.method_list.first
+    assert greet, 'greet method should be found'
+    assert_equal 'greet', greet.name
+  ensure
+    File.delete source_path if source_path && File.exist?(source_path)
+  end
+
   def util_get_class(content, name = nil)
     @parser = util_parser content
     @parser.scan
