Commit 8c8157f
Add limited support for backtracking Regex single char loops to simplified code gen (dotnet#60385)
* Add limited support for backtracking Regex single char loops to simplified code gen
In .NET 5, we added a simpler compiled code generator for regexes that didn't entail backtracking (or that had only very constrained backtracking, such as a top-level alternation). In our corpus of ~90K regular expressions, that code generator is employed for ~40% of them. The primary purpose of adding it initially was performance, as it avoids much of the expense the original code generator incurred, especially for simple regexes. With the source generator, however, this code gen becomes much more valuable: the generated code is human-readable, makes it far easier to understand how the regex operates, is much more easily debugged, etc.
This change allows the simplified code gen to be used even if there are backtracking single-character loops in the regex, as long as those loops are in a top-level concatenation (or a simple grouping structure like a capture). This increases the percentage of expressions in our corpus that will use the simplified code gen to ~65%.
Once we have the simplified loop code gen, it's also much easier to vectorize the search for the next location to back off to, based on a literal that comes immediately after the loop (e.g. "abc.*def"). This change adds support to both RegexOptions.Compiled and the source generator for using LastIndexOf in that case.
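To make the back-off idea concrete, here is a minimal hand-written sketch of how a matcher for "abc.*def" could use LastIndexOf after a greedy single-character loop. It is not the code the generator emits; the names `LoopBackoffSketch` and `MatchAt` are hypothetical, and the sketch assumes the default (non-Singleline) meaning of `.`, i.e. any character except '\n'.

```csharp
using System;

static class LoopBackoffSketch
{
    // Hypothetical matcher for "abc.*def" that tries to match starting exactly at `start`.
    // Returns the length of the match, or -1 if there is no match at that position.
    public static int MatchAt(ReadOnlySpan<char> input, int start)
    {
        ReadOnlySpan<char> slice = input.Slice(start);

        // Match the literal prefix "abc".
        if (!slice.StartsWith("abc".AsSpan()))
        {
            return -1;
        }

        // Greedy ".*": consume everything up to the next '\n' (or the end of the input).
        int loopStart = 3;
        int newline = slice.Slice(loopStart).IndexOf('\n');
        int loopEnd = newline < 0 ? slice.Length : loopStart + newline;

        // Back off: instead of giving back one character at a time and retrying "def",
        // jump straight to the last occurrence of "def" inside the consumed range.
        int last = slice.Slice(loopStart, loopEnd - loopStart).LastIndexOf("def".AsSpan());
        if (last < 0)
        {
            return -1;
        }

        // Prefix length + characters kept by the loop + length of "def".
        return loopStart + last + 3;
    }
}
```

For the input "abcdefxdef", `LoopBackoffSketch.MatchAt(input, 0)` returns 10: the loop consumes "defxdef", and the single LastIndexOf call finds the final "def" without any character-by-character retries.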
The change also entailed adding/updating a few recursive functions. The plan has been to adopt the same model as in System.Linq.Expressions, Roslyn, and elsewhere, where we fork processing to continue on a secondary thread, rather than trying to enforce some max depth or rewrite the functions iteratively, so I've done that as part of this change as well.
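The fork-to-another-thread model can be sketched roughly as follows. This is an illustration of the general technique only, not the helper added by this commit: the `Node` type and `DeepRecursionSketch.CountNodes` are hypothetical, and the use of `RuntimeHelpers.TryEnsureSufficientExecutionStack` together with `Task.Run` is an assumption about one reasonable way to implement it.

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical node type, used only for illustration.
sealed class Node
{
    public Node? Child;
}

static class DeepRecursionSketch
{
    // Counts the nodes in a chain, recursing once per node.
    public static int CountNodes(Node? node)
    {
        if (node is null)
        {
            return 0;
        }

        // If the current thread is running low on stack space, continue the
        // recursion on another thread (which starts with a fresh stack) and
        // block until it produces the result.
        if (!RuntimeHelpers.TryEnsureSufficientExecutionStack())
        {
            return Task.Run(() => CountNodes(node)).GetAwaiter().GetResult();
        }

        return 1 + CountNodes(node.Child);
    }
}
```

The trade-off is that a very deep structure costs an occasional blocked thread instead of a StackOverflowException, and the recursive code keeps its natural shape rather than being rewritten around an explicit stack or an arbitrary depth limit.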
* Address PR feedback
* Clean up partial classes in SourceGenRegexAsync test helper
1 parent 565f3ee commit 8c8157f
13 files changed
Lines changed: 924 additions & 540 deletions
File tree
- src/libraries/System.Text.RegularExpressions
- gen
- src
- System
- Text/RegularExpressions
- Symbolic
- Threading
- tests
Lines changed: 271 additions & 135 deletions
Lines changed: 1 addition & 0 deletions
Lines changed: 2 additions & 2 deletions
Lines changed: 234 additions & 143 deletions
Lines changed: 281 additions & 184 deletions
Lines changed: 1 addition & 9 deletions
Lines changed: 3 additions & 4 deletions
Lines changed: 0 additions & 31 deletions
This file was deleted.
Lines changed: 10 additions & 16 deletions
Lines changed: 82 additions & 0 deletions