Commit 8c8157f

Add limited support for backtracking Regex single char loops to simplified code gen (dotnet#60385)
* Add limited support for backtracking Regex single char loops to simplified code gen

In .NET 5, we added simpler compiled code gen for regexes that didn't entail backtracking (or that had only very constrained backtracking, such as a top-level alternation). In our corpus of ~90K regular expressions, that code generator is employed for ~40% of them. The primary purpose of adding that code generator initially was performance, as it was able to avoid lots of the expense that the original code generator had, especially for simple regexes. However, with the source generator, it's much more valuable to use this code gen, as the generated code is human-readable, really helps to understand how the regex is operating, is much more easily debugged, etc.

This change allows the simplified code gen to be used even if there are backtracking single-character loops in the regex, as long as those loops are in a top-level concatenation (or a simple grouping structure like a capture). This increases the percentage of expressions in our corpus that will use the simplified code gen to ~65%.

Once we have the simplified loop code gen, it's also a lot easier to add in vectorization of searching for the next location to back off to based on a literal that comes immediately after the loop (e.g. "abc.*def"). This adds support into both RegexOptions.Compiled and the source generator to use LastIndexOf in that case.

The change also entailed adding/updating a few recursive functions. The plan has been to adopt the same model as in System.Linq.Expressions, Roslyn, and elsewhere, where we fork processing to continue on a secondary thread, rather than trying to enforce some max depth or rewrite as iterative, so I've done that as part of this change as well.

* Address PR feedback

* Clean up partial classes in SourceGenRegexAsync test helper
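The LastIndexOf back-off mentioned above can be pictured with a small hand-written matcher. This is only a sketch of the idea, not the code that RegexOptions.Compiled or the source generator actually emits: `LoopBackoffSketch` and `TryMatchAt` are hypothetical names, and the sketch assumes the pattern "abc.*def" with default options, so `.` matches any character except '\n'.

```csharp
using System;

public static class LoopBackoffSketch
{
    // Hypothetical hand-written matcher for "abc.*def", anchored at 'start'.
    // On success, 'end' is the index just past the match.
    public static bool TryMatchAt(ReadOnlySpan<char> input, int start, out int end)
    {
        end = -1;
        ReadOnlySpan<char> slice = input.Slice(start);

        // Literal prefix "abc".
        if (!slice.StartsWith("abc".AsSpan()))
        {
            return false;
        }

        // Greedy ".*": consume everything up to the first '\n' (or the end of input).
        const int loopStart = 3;
        int newline = slice.Slice(loopStart).IndexOf('\n');
        int loopEnd = newline < 0 ? slice.Length : loopStart + newline;

        // Back off: instead of giving back one character at a time and retrying "def",
        // jump straight to the last occurrence of "def" inside the consumed range.
        int idx = slice.Slice(loopStart, loopEnd - loopStart).LastIndexOf("def".AsSpan());
        if (idx < 0)
        {
            return false;
        }

        end = start + loopStart + idx + 3;
        return true;
    }

    public static void Main()
    {
        bool matched = TryMatchAt("abc1def2def", 0, out int end);
        Console.WriteLine($"{matched} {end}");
    }
}
```

Searching only the range the loop consumed is sufficient here because "def" contains no '\n', so any position the loop could back off to must have its literal entirely inside that range.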
1 parent 565f3ee commit 8c8157f

13 files changed

Lines changed: 924 additions & 540 deletions

src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs

Lines changed: 271 additions & 135 deletions
Large diffs are not rendered by default.

src/libraries/System.Text.RegularExpressions/gen/System.Text.RegularExpressions.Generator.csproj

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@
     <Compile Include="$(CommonPath)System\Text\ValueStringBuilder.cs" Link="Production\ValueStringBuilder.cs" />
     <Compile Include="$(CoreLibSharedDir)System\Collections\Generic\ValueListBuilder.cs" Link="Production\ValueListBuilder.cs" />
     <Compile Include="..\src\System\Collections\Generic\ValueListBuilder.Pop.cs" Link="Production\ValueListBuilder.Pop.cs" />
+    <Compile Include="..\src\System\Threading\StackHelper.cs" Link="Production\StackHelper.cs" />
     <Compile Include="..\src\System\Text\RegularExpressions\RegexBoyerMoore.cs" Link="Production\RegexBoyerMoore.cs" />
     <Compile Include="..\src\System\Text\RegularExpressions\RegexCharClass.cs" Link="Production\RegexCharClass.cs" />
     <Compile Include="..\src\System\Text\RegularExpressions\RegexCharClass.MappingTable.cs" Link="Production\RegexCharClass.MappingTable.cs" />

src/libraries/System.Text.RegularExpressions/src/System.Text.RegularExpressions.csproj

Lines changed: 2 additions & 2 deletions
@@ -7,6 +7,7 @@
   <ItemGroup>
     <Compile Include="System\Collections\HashtableExtensions.cs" />
     <Compile Include="System\Collections\Generic\ValueListBuilder.Pop.cs" />
+    <Compile Include="System\Threading\StackHelper.cs" />
     <Compile Include="System\Text\SegmentStringBuilder.cs" />
     <Compile Include="System\Text\RegularExpressions\Capture.cs" />
     <Compile Include="System\Text\RegularExpressions\CaptureCollection.cs" />
@@ -17,8 +18,8 @@
     <Compile Include="System\Text\RegularExpressions\MatchCollection.cs" />
     <Compile Include="System\Text\RegularExpressions\Regex.cs" />
     <Compile Include="System\Text\RegularExpressions\Regex.Cache.cs" />
-    <Compile Include="System\Text\RegularExpressions\Regex.Match.cs" />
     <Compile Include="System\Text\RegularExpressions\Regex.Debug.cs" />
+    <Compile Include="System\Text\RegularExpressions\Regex.Match.cs" />
     <Compile Include="System\Text\RegularExpressions\Regex.Replace.cs" />
     <Compile Include="System\Text\RegularExpressions\Regex.Split.cs" />
     <Compile Include="System\Text\RegularExpressions\Regex.Timeout.cs" />
@@ -53,7 +54,6 @@
     <Compile Include="System\Text\RegularExpressions\Symbolic\DfaMatchingState.cs" />
     <Compile Include="System\Text\RegularExpressions\Symbolic\MintermClassifier.cs" />
     <Compile Include="System\Text\RegularExpressions\Symbolic\RegexNodeToSymbolicConverter.cs" />
-    <Compile Include="System\Text\RegularExpressions\Symbolic\StackHelper.cs" />
     <Compile Include="System\Text\RegularExpressions\Symbolic\SymbolicMatch.cs" />
     <Compile Include="System\Text\RegularExpressions\Symbolic\SymbolicNFA.cs" />
     <Compile Include="System\Text\RegularExpressions\Symbolic\SymbolicRegexBuilder.cs" />

src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs

Lines changed: 234 additions & 143 deletions
Large diffs are not rendered by default.

src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs

Lines changed: 281 additions & 184 deletions
Large diffs are not rendered by default.

src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs

Lines changed: 1 addition & 9 deletions
@@ -1659,12 +1659,6 @@ private char ScanControl()
             throw MakeException(RegexParseError.UnrecognizedControlCharacter, SR.UnrecognizedControlCharacter);
         }
 
-        /// <summary>Returns true for options allowed only at the top level</summary>
-        private bool IsOnlyTopOption(RegexOptions options) =>
-            options == RegexOptions.RightToLeft ||
-            options == RegexOptions.CultureInvariant ||
-            options == RegexOptions.ECMAScript;
-
         /// <summary>Scans cimsx-cimsx option string, stops at the first unrecognized char.</summary>
         private void ScanOptions()
         {
@@ -1683,7 +1677,7 @@ private void ScanOptions()
                 else
                 {
                     RegexOptions options = OptionFromCode(ch);
-                    if (options == 0 || IsOnlyTopOption(options))
+                    if (options == 0)
                     {
                         return;
                     }
@@ -1804,15 +1798,13 @@ private static RegexOptions OptionFromCode(char ch)
             return ch switch
             {
                 'i' => RegexOptions.IgnoreCase,
-                'r' => RegexOptions.RightToLeft,
                 'm' => RegexOptions.Multiline,
                 'n' => RegexOptions.ExplicitCapture,
                 's' => RegexOptions.Singleline,
                 'x' => RegexOptions.IgnorePatternWhitespace,
 #if DEBUG
                 'd' => RegexOptions.Debug,
 #endif
-                'e' => RegexOptions.ECMAScript,
                 _ => 0,
             };
         }

src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/RegexNodeToSymbolicConverter.cs

Lines changed: 3 additions & 4 deletions
@@ -5,6 +5,7 @@
 using System.Diagnostics;
 using System.Globalization;
 using System.Runtime.CompilerServices;
+using System.Threading;
 
 namespace System.Text.RegularExpressions.Symbolic
 {
@@ -201,11 +202,9 @@ BDD MapCategoryCodeToCondition(int code) =>
         public SymbolicRegexNode<BDD> Convert(RegexNode node, bool topLevel)
         {
             // Guard against stack overflow due to deep recursion
-            if (!RuntimeHelpers.TryEnsureSufficientExecutionStack())
+            if (!StackHelper.TryEnsureSufficientExecutionStack())
             {
-                RegexNode localNode = node;
-                bool localTopLevel = topLevel;
-                return StackHelper.CallOnEmptyStack(() => Convert(localNode, localTopLevel));
+                return StackHelper.CallOnEmptyStack(Convert, node, topLevel);
             }
 
             switch (node.Type)

src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/StackHelper.cs

Lines changed: 0 additions & 31 deletions
This file was deleted.

src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexNode.cs

Lines changed: 10 additions & 16 deletions
@@ -5,6 +5,7 @@
 using System.Diagnostics;
 using System.Diagnostics.CodeAnalysis;
 using System.Runtime.CompilerServices;
+using System.Threading;
 
 namespace System.Text.RegularExpressions.Symbolic
 {
@@ -618,11 +619,10 @@ public SymbolicRegexNode<S> Restrict(S pred)
         /// </summary>
         public int GetFixedLength()
         {
-            // Guard against stack overflow due to deep recursion.
-            if (!RuntimeHelpers.TryEnsureSufficientExecutionStack())
+            if (!StackHelper.TryEnsureSufficientExecutionStack())
             {
-                SymbolicRegexNode<S> thisRef = this;
-                return StackHelper.CallOnEmptyStack(() => thisRef.GetFixedLength());
+                // If we can't recur further, assume no fixed length.
+                return -1;
             }
 
             switch (_kind)
@@ -690,11 +690,9 @@ public int GetFixedLength()
         internal SymbolicRegexNode<S> MkDerivative(S elem, uint context)
         {
             // Guard against stack overflow due to deep recursion
-            if (!RuntimeHelpers.TryEnsureSufficientExecutionStack())
+            if (!StackHelper.TryEnsureSufficientExecutionStack())
             {
-                S localElem = elem;
-                uint localContext = context;
-                return StackHelper.CallOnEmptyStack(() => MkDerivative(localElem, localContext));
+                return StackHelper.CallOnEmptyStack(MkDerivative, elem, context);
             }
 
             if (this == _builder._anyStar || this == _builder._nothing)
@@ -1100,10 +1098,9 @@ public override string ToString()
         internal void ToString(StringBuilder sb)
         {
             // Guard against stack overflow due to deep recursion
-            if (!RuntimeHelpers.TryEnsureSufficientExecutionStack())
+            if (!StackHelper.TryEnsureSufficientExecutionStack())
             {
-                StringBuilder localSb = sb;
-                StackHelper.CallOnEmptyStack(() => ToString(localSb));
+                StackHelper.CallOnEmptyStack(ToString, sb);
                 return;
             }
 
@@ -1665,12 +1662,9 @@ private S ComputeStartSet()
         internal SymbolicRegexNode<S> PruneAnchors(uint prevKind, bool contWithWL, bool contWithNWL)
         {
             // Guard against stack overflow due to deep recursion
-            if (!RuntimeHelpers.TryEnsureSufficientExecutionStack())
+            if (!StackHelper.TryEnsureSufficientExecutionStack())
             {
-                uint localPrevKind = prevKind;
-                bool localContWithWL = contWithWL;
-                bool localContWithNWL = contWithNWL;
-                return StackHelper.CallOnEmptyStack(() => PruneAnchors(localPrevKind, localContWithWL, localContWithNWL));
+                return StackHelper.CallOnEmptyStack(PruneAnchors, prevKind, contWithWL, contWithNWL);
             }
 
             if (!_info.StartsWithSomeAnchor)
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+// Licensed to the .NET Foundation under one or more agreements.
+// The .NET Foundation licenses this file to you under the MIT license.
+
+using System.Runtime.CompilerServices;
+using System.Threading.Tasks;
+
+namespace System.Threading
+{
+    /// <summary>Provides tools for avoiding stack overflows.</summary>
+    internal static class StackHelper
+    {
+        /// <summary>Tries to ensure there is sufficient stack to execute the average .NET function.</summary>
+        public static bool TryEnsureSufficientExecutionStack()
+        {
+#if REGEXGENERATOR
+            try
+            {
+                RuntimeHelpers.EnsureSufficientExecutionStack();
+                return true;
+            }
+            catch
+            {
+                return false;
+            }
+#else
+            return RuntimeHelpers.TryEnsureSufficientExecutionStack();
+#endif
+        }
+
+        // Queues the supplied delegate to the thread pool, then blocks waiting for it to complete.
+        // It does so in a way that prevents task inlining (which would defeat the purpose) but that
+        // also plays nicely with the thread pool's sync-over-async aggressive thread injection policies.
+
+        /// <summary>Calls the provided action on the stack of a different thread pool thread.</summary>
+        /// <typeparam name="TArg1">The type of the first argument to pass to the function.</typeparam>
+        /// <param name="action">The action to invoke.</param>
+        /// <param name="arg1">The first argument to pass to the action.</param>
+        public static void CallOnEmptyStack<TArg1>(Action<TArg1> action, TArg1 arg1) =>
+            Task.Run(() => action(arg1))
+                .ContinueWith(t => t.GetAwaiter().GetResult(), CancellationToken.None, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default)
+                .GetAwaiter().GetResult();
+
+        /// <summary>Calls the provided action on the stack of a different thread pool thread.</summary>
+        /// <typeparam name="TArg1">The type of the first argument to pass to the function.</typeparam>
+        /// <typeparam name="TArg2">The type of the second argument to pass to the function.</typeparam>
+        /// <typeparam name="TArg3">The type of the third argument to pass to the function.</typeparam>
+        /// <param name="action">The action to invoke.</param>
+        /// <param name="arg1">The first argument to pass to the action.</param>
+        /// <param name="arg2">The second argument to pass to the action.</param>
+        /// <param name="arg3">The third argument to pass to the action.</param>
+        public static void CallOnEmptyStack<TArg1, TArg2, TArg3>(Action<TArg1, TArg2, TArg3> action, TArg1 arg1, TArg2 arg2, TArg3 arg3) =>
+            Task.Run(() => action(arg1, arg2, arg3))
+                .ContinueWith(t => t.GetAwaiter().GetResult(), CancellationToken.None, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default)
+                .GetAwaiter().GetResult();
+
+        /// <summary>Calls the provided function on the stack of a different thread pool thread.</summary>
+        /// <typeparam name="TArg1">The type of the first argument to pass to the function.</typeparam>
+        /// <typeparam name="TArg2">The type of the second argument to pass to the function.</typeparam>
+        /// <typeparam name="TResult">The return type of the function.</typeparam>
+        /// <param name="func">The function to invoke.</param>
+        /// <param name="arg1">The first argument to pass to the function.</param>
+        /// <param name="arg2">The second argument to pass to the function.</param>
+        public static TResult CallOnEmptyStack<TArg1, TArg2, TResult>(Func<TArg1, TArg2, TResult> func, TArg1 arg1, TArg2 arg2) =>
+            Task.Run(() => func(arg1, arg2))
+                .ContinueWith(t => t.GetAwaiter().GetResult(), CancellationToken.None, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default)
+                .GetAwaiter().GetResult();
+
+        /// <summary>Calls the provided function on the stack of a different thread pool thread.</summary>
+        /// <typeparam name="TArg1">The type of the first argument to pass to the function.</typeparam>
+        /// <typeparam name="TArg2">The type of the second argument to pass to the function.</typeparam>
+        /// <typeparam name="TArg3">The type of the third argument to pass to the function.</typeparam>
+        /// <typeparam name="TResult">The return type of the function.</typeparam>
+        /// <param name="func">The function to invoke.</param>
+        /// <param name="arg1">The first argument to pass to the function.</param>
+        /// <param name="arg2">The second argument to pass to the function.</param>
+        /// <param name="arg3">The third argument to pass to the function.</param>
+        public static TResult CallOnEmptyStack<TArg1, TArg2, TArg3, TResult>(Func<TArg1, TArg2, TArg3, TResult> func, TArg1 arg1, TArg2 arg2, TArg3 arg3) =>
+            Task.Run(() => func(arg1, arg2, arg3))
+                .ContinueWith(t => t.GetAwaiter().GetResult(), CancellationToken.None, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default)
+                .GetAwaiter().GetResult();
+    }
+}
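
To illustrate how a recursive method might use the pattern in the StackHelper diff above, here is a minimal standalone sketch. StackHelper itself is internal to the library, so the sketch defines its own helper: the one-argument `CallOnEmptyStack<TArg, TResult>` overload, `ChainNode`, and `DeepRecursionSketch` are all hypothetical names introduced for illustration, and the code assumes a runtime where `RuntimeHelpers.TryEnsureSufficientExecutionStack` is available.

```csharp
#nullable disable
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical singly linked node type, used only for this illustration.
public sealed class ChainNode
{
    public ChainNode Next;
}

public static class DeepRecursionSketch
{
    // Hypothetical one-argument overload modeled on the CallOnEmptyStack helpers in the diff:
    // run the delegate on a thread pool thread and block until it completes. The synchronous
    // continuation keeps the blocked caller from inlining the task onto its own (nearly full) stack.
    private static TResult CallOnEmptyStack<TArg, TResult>(Func<TArg, TResult> func, TArg arg) =>
        Task.Run(() => func(arg))
            .ContinueWith(t => t.GetAwaiter().GetResult(), CancellationToken.None, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default)
            .GetAwaiter().GetResult();

    // Deliberately recursive length computation, one stack frame per node.
    public static int Length(ChainNode node)
    {
        if (node == null)
        {
            return 0;
        }

        // Guard against stack overflow: when the stack is running low,
        // continue the recursion on a fresh thread pool stack.
        if (!RuntimeHelpers.TryEnsureSufficientExecutionStack())
        {
            return CallOnEmptyStack(Length, node);
        }

        return 1 + Length(node.Next);
    }

    public static void Main()
    {
        // Build the chain iteratively so that only Length recurses deeply.
        ChainNode head = null;
        for (int i = 0; i < 100_000; i++)
        {
            head = new ChainNode { Next = head };
        }

        Console.WriteLine(Length(head));
    }
}
```

Passing the method group and its arguments, as the updated call sites in the diff do, means the closure is created inside the helper only when the fallback path is actually taken.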
