
optimized syntax '+' cause 'random_recursive_mutation' error #42

@0x7Fancy

Description

Going further, I found a way to mitigate this.

Based on the issue above, we can create a simpler test case, test.json:

{
    "<entry>": [["I ", "<stmt1>", "like C++\n"]],
    "<stmt1>": [["<NODE>", "<stmt1>"], []],
    "<NODE>": [["very "]]
}

Translated to test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | NODE stmt1
     ;
NODE : 'very '
     ;

and the input file 40960_very.txt:

I very very ...(*40956)... very very like C++
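For anyone who wants to reproduce this, the input can be generated with a short script. This is a minimal sketch: it assumes that the number in the file name (40960) is the count of "very " tokens, which matches the elided line above (2 + 40956 + 2). The function name `build_input` is made up for illustration.

```python
from pathlib import Path


def build_input(repeats: int = 40960) -> str:
    """Build 'I ' + 'very ' * repeats + 'like C++' followed by a newline.

    Assumption: 40960 (taken from the file name) is the number of 'very ' tokens.
    """
    return "I " + "very " * repeats + "like C++\n"


if __name__ == "__main__":
    # Write the test input next to the grammar file.
    Path("40960_very.txt").write_text(build_input(), encoding="utf-8")
```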

Running it with antlr4-parse:
[Screenshot: Screen Shot 2024-01-08 at 17 56 39]

From antlr4's perspective, we can describe the same language with the + syntax instead of recursion and sidestep this prefix matching. The rewritten test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | (NODE)+
     ;
NODE : 'very '
     ;

Running antlr4-parse again:
[Screenshot: Screen Shot 2024-01-08 at 17 59 40]

So I wrote a patch that implements this idea; please refer to 0x7Fancy@6eae7d1.

I have only implemented the optimization for head recursion and tail recursion here, which is simple and easy to understand. Intermediate recursion, I think, can be rewritten as head or tail recursion in the JSON grammar.
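To make the head/tail rewrite concrete, here is a rough Python sketch of the idea. It is not the code from the patch referenced above; the helper names (`sym_to_g4`, `seq_to_g4`, `fold_recursion`) are hypothetical, and it assumes a JSON grammar shaped like test.json, where each rule maps to a list of alternatives. A rule `A -> x A | b` is emitted as `(x)* b`, a rule `A -> A x | b` as `b (x)*`, and an empty base alternative is emitted as an empty alternative plus `(x)+`, which is the form used in the rewritten test.g4. Anything else (intermediate or mixed recursion) is left unchanged.

```python
def sym_to_g4(sym):
    """Map a JSON grammar symbol to ANTLR4 syntax."""
    if sym.startswith("<") and sym.endswith(">"):
        return sym[1:-1]  # "<stmt1>" -> stmt1
    escaped = sym.replace("\\", "\\\\").replace("'", "\\'").replace("\n", "\\n")
    return "'" + escaped + "'"  # "very " -> 'very '


def seq_to_g4(seq):
    """Render one alternative (a list of symbols) as a g4 sequence."""
    return " ".join(sym_to_g4(s) for s in seq)


def fold_recursion(name, alts):
    """Return g4 alternatives for `name`, folding direct head or tail
    recursion into a repetition. Unsupported shapes are returned as-is."""
    # Tail recursion: name appears only as the last symbol.
    tail = [a[:-1] for a in alts if len(a) > 1 and a[-1] == name and name not in a[:-1]]
    # Head recursion: name appears only as the first symbol.
    head = [a[1:] for a in alts if len(a) > 1 and a[0] == name and name not in a[1:]]
    # Base cases: alternatives that do not mention name at all.
    base = [a for a in alts if name not in a]
    loops = tail or head
    supported = (len(tail) + len(head) + len(base) == len(alts)
                 and base and loops and not (tail and head))
    if not supported:
        return [seq_to_g4(a) for a in alts]
    group = "(" + " | ".join(seq_to_g4(s) for s in loops) + ")"
    out = []
    for b in base:
        if not b:
            out += ["", group + "+"]  # empty | (x)+
        elif tail:
            out.append(group + "* " + seq_to_g4(b))  # (x)* b
        else:
            out.append(seq_to_g4(b) + " " + group + "*")  # b (x)*
    return out


if __name__ == "__main__":
    grammar = {
        "<entry>": [["I ", "<stmt1>", "like C++\n"]],
        "<stmt1>": [["<NODE>", "<stmt1>"], []],
        "<NODE>": [["very "]],
    }
    for rule, alternatives in grammar.items():
        body = "\n     | ".join(fold_recursion(rule, alternatives))
        print(sym_to_g4(rule) + ": " + body + "\n     ;")
```

For test.json this prints stmt1 with an empty alternative followed by `(NODE)+`, the same shape as the hand-written grammar above. The sketch does not append EOF to the entry rule and does not distinguish lexer rules from parser rules; both would need handling in a real translator.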

Of course, this is only a mitigation. When mutation generates a sufficiently complex syntax tree, antlr4 may still get stuck while parsing it.

Originally posted by @0x7Fancy in #17 (comment)
