Commit 99a4050

kddnewton authored and matzbot committed
[ruby/prism] Restructure regexp encoding validation
Move all the logic from prism.c into regexp.c. Now regexp.c does two passes. The first pass scans the raw source to track escape types, non-ASCII literals, and multibyte validity for encoding validation. The second pass scans the unescaped content for named capture extraction (needed because escape sequences like line continuations alter group names).

Fixed a couple of things along the way. ascii_only was previously computed from unescaped content, but we can do that as we go to avoid scanning again. Unicode properties also now properly error for regexp with modifiers.

ruby/prism@0944c7fba2
1 parent 68bf517 commit 99a4050
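
The second pass described in the commit message can be illustrated with a minimal sketch. This is not prism's code: `scan_named_captures` and `capture_callback_t` are hypothetical names, and the sketch assumes the content has already been unescaped. It handles only the `(?<name>...)` form, and ignores character classes and the `(?'name'...)` syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type, invoked once per named capture group. */
typedef void (*capture_callback_t)(const char *name, size_t length, void *data);

/*
 * Scan already-unescaped regexp content for groups of the form (?<name>...).
 * Returns the number of named groups found. Illustrative only.
 */
static size_t
scan_named_captures(const char *content, size_t length, capture_callback_t callback, void *data) {
    assert(content != NULL || length == 0);
    size_t count = 0;

    for (size_t i = 0; i < length; i++) {
        /* A backslash escapes the next character, so skip over it. */
        if (content[i] == '\\') { i++; continue; }

        /* Need "(?<" plus at least one more character to inspect. */
        if (i + 3 < length && memcmp(content + i, "(?<", 3) == 0) {
            /* (?<= and (?<! are lookbehinds, not named groups. */
            char next = content[i + 3];
            if (next == '=' || next == '!') continue;

            /* The name runs up to the closing '>'. */
            size_t start = i + 3;
            size_t end = start;
            while (end < length && content[end] != '>') end++;

            if (end < length && end > start) {
                count++;
                if (callback != NULL) callback(content + start, end - start, data);
            }
            i = end;
        }
    }

    return count;
}
```

Scanning the unescaped form matters because, as the message notes, an escape such as a line continuation inside a group name would otherwise leave stray bytes in the extracted name.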

7 files changed

Lines changed: 1064 additions & 372 deletions

prism/config.yml

Lines changed: 2 additions & 0 deletions
@@ -248,7 +248,9 @@ errors:
  - PATTERN_TERM_PAREN
  - PIPEPIPEEQ_MULTI_ASSIGN
  - REGEXP_ENCODING_OPTION_MISMATCH
+ - REGEXP_ESCAPED_NON_ASCII_IN_UTF8
  - REGEXP_INCOMPAT_CHAR_ENCODING
+ - REGEXP_INVALID_CHAR_PROPERTY
  - REGEXP_INVALID_UNICODE_RANGE
  - REGEXP_NON_ESCAPED_MBC
  - REGEXP_PARSE_ERROR

prism/parser.h

Lines changed: 0 additions & 6 deletions
@@ -933,12 +933,6 @@ struct pm_parser {
      */
     bool semantic_token_seen;
 
-    /**
-     * True if the current regular expression being lexed contains only ASCII
-     * characters.
-     */
-    bool current_regular_expression_ascii_only;
-
     /**
      * By default, Ruby always warns about mismatched indentation. This can be
      * toggled with a magic comment.
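
The removed parser field tracked whether the regular expression was ASCII-only. According to the commit message, that is now determined while scanning the raw source. A rough sketch of that idea follows, under stated assumptions: `regexp_scan_t` and `regexp_scan_raw` are hypothetical names, not prism's API, and the escape classification is far coarser than real validation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what a single pass over the raw source records. */
typedef struct {
    bool ascii_only;          /* no byte >= 0x80 appeared in the raw source */
    bool has_hex_escape;      /* saw \x...                                   */
    bool has_unicode_escape;  /* saw \u...                                   */
    bool has_property_escape; /* saw \p... or \P...                          */
} regexp_scan_t;

/*
 * Walk the raw (still escaped) regexp source once, recording ASCII-only
 * status and which kinds of escapes appear. Illustrative only.
 */
static regexp_scan_t
regexp_scan_raw(const uint8_t *source, size_t length) {
    assert(source != NULL || length == 0);
    regexp_scan_t scan = { .ascii_only = true };

    for (size_t i = 0; i < length; i++) {
        uint8_t byte = source[i];

        /* A literal high byte means the source is not ASCII-only. */
        if (byte >= 0x80) { scan.ascii_only = false; continue; }

        /* Only a backslash followed by another byte starts an escape. */
        if (byte != '\\' || i + 1 >= length) continue;

        /* Consume the escaped byte so it is not re-read as a new escape. */
        uint8_t escaped = source[++i];
        if (escaped >= 0x80) scan.ascii_only = false;

        switch (escaped) {
            case 'x': scan.has_hex_escape = true; break;
            case 'u': scan.has_unicode_escape = true; break;
            case 'p':
            case 'P': scan.has_property_escape = true; break;
            default: break;
        }
    }

    return scan;
}
```

Because the flag is a by-product of a scan that has to happen anyway, no separate field on the parser and no extra walk over the unescaped content are required in this sketch.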

prism/prism.c

Lines changed: 80 additions & 290 deletions
Large diffs are not rendered by default.