Commit 093daaf

feat: Add comprehensive documentation for [GeneratedRegex] source generator, including usage, patterns, and common pitfalls
1 parent 082a54e commit 093daaf

5 files changed

Lines changed: 479 additions & 0 deletions

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
1+
---
name: csharp_regex
description: Use the [GeneratedRegex] source generator for ALL regex in .NET 7+ C#. Produces AOT-safe, trimming-friendly, compile-time regex with zero startup cost and better throughput than RegexOptions.Compiled. Trigger whenever the user writes, reviews, or asks about regex in C# (including new Regex(...), Regex.IsMatch, pattern matching, string validation, or text parsing), even if they don't mention source generation. Never suggest new Regex(...) with RegexOptions.Compiled when [GeneratedRegex] is available.
---
5+
6+
# C# Source-Generated Regex Skill
7+
8+
Use `[GeneratedRegex]` on a `partial` method or property so the compiler emits a fully-optimised, AOT-safe `Regex` implementation at build time. This delivers the throughput of `RegexOptions.Compiled` with near-zero startup cost, and the output is ordinary C# that you can read and debug.
9+
10+
## Why this matters
11+
12+
`new Regex(pattern, RegexOptions.Compiled)` pays a heavy startup cost (reflection-emit, JIT compilation) and is not AOT-safe. `[GeneratedRegex]` compiles the pattern into readable C# during the build, giving you:
13+
14+
- **Throughput on par with `RegexOptions.Compiled`**, and better in some benchmarks
- **Near-zero startup cost**: no pattern parsing or `Reflection.Emit` at run time
- **AOT / trimming safe**: no `Reflection.Emit`
- **Debuggable**: step through the generated C# in any debugger
- **Automatic singleton caching**: no `static readonly` field needed
19+
20+
Diagnostic `SYSLIB1045` fires when the compiler detects a place that could use `[GeneratedRegex]` but doesn't.
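As a rough before/after sketch of the migration that `SYSLIB1045` nudges you towards (the class and member names here are hypothetical, chosen only for illustration):

```csharp
using System.Text.RegularExpressions;

// Hypothetical class name, for illustration only.
public static partial class LogPatterns
{
    // Before: built at run time with constant arguments. This is the kind
    // of call the SYSLIB1045 analyzer is meant to flag.
    // public static readonly Regex Digits = new Regex(@"\d+", RegexOptions.Compiled);

    // After: the source generator supplies the implementation at build time.
    [GeneratedRegex(@"\d+")]
    public static partial Regex Digits();
}
```

Call sites change from `LogPatterns.Digits.IsMatch(text)` to `LogPatterns.Digits().IsMatch(text)`; nothing else needs to move.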
21+
22+
## Quick reference
23+
24+
| Concept | See |
|---------|-----|
| Constraints, method vs property, minimal examples | [basics.md](references/basics.md) |
| `RegexOptions`, timeouts, culture | [options.md](references/options.md) |
| Where to declare regex, file organisation | [patterns.md](references/patterns.md) |
| Common mistakes and compiler errors | [pitfalls.md](references/pitfalls.md) |
30+
31+
## Minimal examples
32+
33+
**Method (≥ .NET 7):**
34+
```csharp
public static partial class Patterns
{
    [GeneratedRegex(@"\d+")]
    public static partial Regex Digits();
}
```
41+
42+
**Property (≥ .NET 9, C# 13):**
43+
```csharp
public static partial class Patterns
{
    [GeneratedRegex(@"\d+")]
    public static partial Regex Digits { get; }
}
```
50+
51+
## Essential rules at a glance
52+
53+
- The **containing type** (and every type it is nested in) must be `partial`.
- Method form: must be `partial`, parameterless, non-generic, non-abstract, and return `Regex`.
- Property form: must be `partial`, getter-only, and return `Regex` (C# 13 / .NET 9+).
- Both forms are conventionally `static`, although the generator does not require it.
- Do **not** add `RegexOptions.Compiled`; it is ignored by the generator.
- The generator handles caching; no `static readonly` field is needed.
58+
59+
## What to read next
60+
61+
- First time using `[GeneratedRegex]` → [basics.md](references/basics.md)
- Need `IgnoreCase`, timeouts, or culture → [options.md](references/options.md)
- Deciding where to put the declaration → [patterns.md](references/patterns.md)
- Getting a compiler error → [pitfalls.md](references/pitfalls.md)
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
1+
# GeneratedRegex Basics
2+
3+
Source: https://learn.microsoft.com/dotnet/standard/base-types/regular-expression-source-generators
4+
API ref: https://learn.microsoft.com/dotnet/api/system.text.regularexpressions.generatedregexattribute?view=net-10.0
5+
6+
## Requirements
7+
8+
| Requirement | Detail |
|-------------|--------|
| Runtime | .NET 7+ (method form); .NET 9+ (property form) |
| Language | C# 11+ (method form); C# 13+ (property form) |
| Namespace | `System.Text.RegularExpressions` |
| NuGet | Built into `System.Text.RegularExpressions.dll`; no extra package |
14+
15+
## How it works
16+
17+
Applying `[GeneratedRegex]` to a `partial` method or property tells the Roslyn source generator (`System.Text.RegularExpressions.Generator`) to emit a complete `Regex`-derived class at compile time. The generated class:
18+
19+
- Embeds all match logic as readable C# (no `Reflection.Emit`)
- Caches a singleton instance so every call to the method / property returns the same object
- Requires no explicit `static readonly` field, because the caching lives inside the generated implementation
22+
23+
## Attribute constructors
24+
25+
```csharp
// Pattern only
[GeneratedRegex(string pattern)]

// Pattern + options
[GeneratedRegex(string pattern, RegexOptions options)]

// Pattern + options + culture
[GeneratedRegex(string pattern, RegexOptions options, string cultureName)]

// Pattern + options + timeout (ms)
[GeneratedRegex(string pattern, RegexOptions options, int matchTimeoutMilliseconds)]

// All four
[GeneratedRegex(string pattern, RegexOptions options, int matchTimeoutMilliseconds, string cultureName)]
```
41+
42+
## Method form (≥ .NET 7)
43+
44+
```csharp
using System.Text.RegularExpressions;

public static partial class Validators
{
    // basic
    [GeneratedRegex(@"^\d{4}-\d{2}-\d{2}$")]
    public static partial Regex IsoDateFormat();

    // with options
    [GeneratedRegex(@"^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$",
        RegexOptions.IgnoreCase, "en-US")]
    public static partial Regex Email();
}
```
59+
60+
Usage:
61+
```csharp
bool isValid = Validators.IsoDateFormat().IsMatch("2024-01-15"); // true
```
64+
65+
Note: each call to `IsoDateFormat()` returns the **same cached** `Regex` instance — the call is cheap.
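A small sketch of how the caching could be checked, assuming a hypothetical `CachedPatterns` class; `ReferenceEquals` compares object identity, so it only returns `true` if both calls hand back the very same instance:

```csharp
using System.Text.RegularExpressions;

// Hypothetical class name, for illustration only.
public static partial class CachedPatterns
{
    [GeneratedRegex(@"^\d{4}-\d{2}-\d{2}$")]
    public static partial Regex IsoDateFormat();

    // True when two separate calls return the same object.
    public static bool ReturnsSameInstance() =>
        ReferenceEquals(IsoDateFormat(), IsoDateFormat());
}
```

Because of this, there is no need to copy the result into a local or a field just to avoid repeated construction.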
66+
67+
## Property form (≥ .NET 9, C# 13)
68+
69+
```csharp
public static partial class Validators
{
    [GeneratedRegex(@"^[A-Z]{2}\d{6}$")]
    public static partial Regex PassportNumber { get; }
}
```
76+
77+
Usage:
78+
```csharp
bool ok = Validators.PassportNumber.IsMatch(input);
```
81+
82+
Prefer the property form if you use C# 13+ — it reads at the call site like a constant rather than a factory call.
83+
84+
## Declaring inside a non-static class
85+
86+
When the regex is relevant only inside one class, declare it there directly:
87+
88+
```csharp
public partial class OrderValidator
{
    [GeneratedRegex(@"^\d{5}(-\d{4})?$")]
    private static partial Regex ZipCode();

    public bool IsValidZip(string zip) => ZipCode().IsMatch(zip);
}
```
97+
98+
The class must still be `partial`.
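The same requirement extends to nesting: the generator adds its half of the declaration inside every enclosing type, so each of them needs the `partial` modifier. A minimal sketch, with hypothetical type names:

```csharp
using System.Text.RegularExpressions;

// Hypothetical types: both the outer and the nested class are partial.
public partial class OrderService
{
    private partial class Rules
    {
        [GeneratedRegex(@"^\d{5}(-\d{4})?$")]
        internal static partial Regex ZipCode();
    }

    public bool IsValidZip(string zip) => Rules.ZipCode().IsMatch(zip);
}
```

Dropping `partial` from `OrderService` here would be expected to produce a compile error, even though the attribute sits on a member of `Rules`.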
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
# RegexOptions, Timeouts, and Culture
2+
3+
Source: https://learn.microsoft.com/dotnet/api/system.text.regularexpressions.regexoptions?view=net-10.0
4+
Source: https://learn.microsoft.com/dotnet/standard/base-types/regular-expression-source-generators
5+
6+
## RegexOptions reference
7+
8+
| Option | Notes |
|--------|-------|
| `IgnoreCase` | Case-insensitive match. Combine with `cultureName` for locale-aware casing. |
| `Multiline` | `^` and `$` match line boundaries rather than string start/end. |
| `Singleline` | `.` matches `\n` as well. |
| `IgnorePatternWhitespace` | Unescaped whitespace is ignored; allows inline comments with `#`. |
| `ExplicitCapture` | Only named groups capture; reduces allocations when you don't need unnamed groups. |
| `NonBacktracking` | Linear-time matching (no catastrophic backtracking). The source generator cannot emit code for it and falls back to a regular run-time `Regex`; prefer `new Regex(...)`. |
| `Compiled` | **Ignored by the source generator.** Do not include it. |
| `RightToLeft` | Match right-to-left. Supported by the source generator. |
| `CultureInvariant` | If combined with `IgnoreCase`, use invariant culture for case comparisons. |
| `ECMAScript` | ECMAScript-compatible behaviour. Cannot combine with most other options. |
20+
21+
Combine with bitwise OR:
22+
```csharp
[GeneratedRegex(@"hello\s+world",
    RegexOptions.IgnoreCase | RegexOptions.Multiline)]
public static partial Regex HelloWorld();
```
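`Multiline` only changes anything when the pattern contains `^` or `$`. The sketch below, using a hypothetical `LogLines` class and an assumed log format, shows a combination where both flags matter:

```csharp
using System.Text.RegularExpressions;

// Hypothetical class name and log format, for illustration only.
public static partial class LogLines
{
    // Multiline lets ^ and $ match at every line boundary;
    // IgnoreCase lets "error" match as well as "ERROR".
    [GeneratedRegex(@"^error:.*$", RegexOptions.IgnoreCase | RegexOptions.Multiline)]
    public static partial Regex ErrorLine();

    // Counts the lines that start with "error:" in any casing.
    public static int CountErrors(string log) => ErrorLine().Matches(log).Count;
}
```

With these assumptions, `LogLines.CountErrors("INFO: ok\nERROR: disk\nerror: net")` returns 2, one match per error line.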
27+
28+
## Culture for case-insensitive matching
29+
30+
When `IgnoreCase` is set, supply a BCP-47 culture name to pin the casing table to a specific locale. The source generator bakes the casing table at **compile time** (unlike `new Regex(...)`, which resolves it at runtime).
31+
32+
```csharp
// Turkish: 'I' casing differs from en-US
[GeneratedRegex(@"^[a-z]+$", RegexOptions.IgnoreCase, "tr-TR")]
public static partial Regex LowercaseTurkish();

// Invariant culture (portable, deterministic)
[GeneratedRegex(@"^[a-z]+$", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant)]
public static partial Regex LowercaseInvariant();
```
41+
42+
When no culture is specified and `IgnoreCase` is used, the generator uses the **invariant culture** at compile time. This differs from runtime `new Regex(...)` which uses the current culture. Always specify a culture when locale-sensitive matching matters.
43+
44+
## Timeouts
45+
46+
Pass a timeout (milliseconds) to guard against ReDoS on untrusted input. Use `Timeout.Infinite` (-1) only when the input is fully trusted and bounded.
47+
48+
```csharp
// 500 ms timeout: good for parsing untrusted user input
[GeneratedRegex(@"(\w+\s*)+", RegexOptions.None, matchTimeoutMilliseconds: 500)]
public static partial Regex WordSequence();

// All four parameters
[GeneratedRegex(@"^[a-z]+$",
    RegexOptions.IgnoreCase,
    matchTimeoutMilliseconds: 1000,
    cultureName: "en-US")]
public static partial Regex AlphaOnly();
```
60+
61+
> **Security:** Always set a timeout when the pattern or the input comes from, or is influenced by, user data. Patterns with nested quantifiers or overlapping alternations can backtrack polynomially or even exponentially, so without a timeout a single match can run away.
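When a timeout elapses, the match methods throw `RegexMatchTimeoutException`, so callers need a policy for that case. One possible shape, sketched with a hypothetical `SafeMatching` wrapper that reports a timeout as `null` rather than letting the exception escape:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper, for illustration only.
public static partial class SafeMatching
{
    [GeneratedRegex(@"^(\w+\s*)+$", RegexOptions.None, matchTimeoutMilliseconds: 500)]
    private static partial Regex WordSequence();

    // Returns true/false for a completed match, or null if the
    // 500 ms budget was exhausted before the engine could decide.
    public static bool? TryIsWordSequence(string input)
    {
        try
        {
            return WordSequence().IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }
}
```

Whether a timeout should count as "invalid input", be logged, or be retried is an application decision; the nullable return simply keeps the three outcomes distinct.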
62+
63+
## NonBacktracking and source generation
64+
65+
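`RegexOptions.NonBacktracking` provides linear-time matching via a finite-automaton engine. The source generator **cannot emit code** for this option: the project still builds, but the generator reports `SYSLIB1044` and the generated member falls back to a cached `Regex` built by the regular constructor at run time, so none of the build-time benefits apply. If you need linear-time guarantees, it is clearer to use `new Regex(pattern, RegexOptions.NonBacktracking)` directly and cache the instance in a `static readonly` field yourself.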
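A minimal sketch of the hand-cached alternative, assuming a hypothetical `LinearTimePatterns` class. Note that it is deliberately not `partial` and uses no attribute:

```csharp
using System.Text.RegularExpressions;

// Hypothetical class name, for illustration only.
public static class LinearTimePatterns
{
    // Built once on first use of the class and reused afterwards.
    // NonBacktracking requires .NET 7 or later.
    private static readonly Regex WordSequence =
        new(@"^(\w+\s*)+$", RegexOptions.NonBacktracking);

    public static bool IsWordSequence(string input) => WordSequence.IsMatch(input);
}
```

The nested quantifier in this pattern is the kind that can backtrack badly under the default engine, which is the situation where trading generated code for the automaton engine may be worth considering.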
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
1+
# Organisation Patterns
2+
3+
Source: https://learn.microsoft.com/dotnet/standard/base-types/regular-expression-source-generators
4+
5+
## Where to declare generated regex
6+
7+
The right placement depends on scope and reuse:
8+
9+
### Dedicated static class (recommended for shared patterns)
10+
11+
Centralise patterns that are used across multiple files or feature areas:
12+
13+
```csharp
// File: Patterns.cs (or RegexPatterns.cs)
using System.Text.RegularExpressions;

public static partial class Patterns
{
    [GeneratedRegex(@"^\d{4}-\d{2}-\d{2}$")]
    public static partial Regex IsoDate();

    [GeneratedRegex(@"^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$",
        RegexOptions.IgnoreCase, "en-US")]
    public static partial Regex Email();

    [GeneratedRegex(@"^\+?[1-9]\d{6,14}$")]
    public static partial Regex PhoneE164();
}
```
30+
31+
Usage anywhere in the assembly:
32+
```csharp
if (!Patterns.Email().IsMatch(input)) { ... }
```
35+
36+
### Declaring on the consuming class (private scope)
37+
38+
When only one class needs the pattern, declare it there to keep the scope narrow:
39+
40+
```csharp
public partial class InvoiceParser
{
    [GeneratedRegex(@"INV-\d{6}", RegexOptions.IgnoreCase)]
    private static partial Regex InvoiceNumber();

    // Match.Value is "" (never null) when nothing matches, so check
    // Success to make the nullable return type meaningful.
    public string? ExtractInvoiceNumber(string text)
    {
        Match match = InvoiceNumber().Match(text);
        return match.Success ? match.Value : null;
    }
}
```
50+
51+
The class must be `partial` — if it isn't already, make it so.
52+
53+
### Feature-area file splitting
54+
55+
For large modules with many patterns, split a single `partial` class across multiple files by feature area:
56+
57+
```
Patterns/
  Patterns.cs          ← partial class declaration (no members)
  Patterns.Dates.cs    ← date-related patterns
  Patterns.Email.cs    ← email/address patterns
  Patterns.Finance.cs  ← currency, IBAN, card numbers
```
64+
65+
Each file:
66+
```csharp
// Patterns.Dates.cs
public static partial class Patterns
{
    [GeneratedRegex(@"^\d{4}-\d{2}-\d{2}$")]
    public static partial Regex IsoDate();
}
```
74+
75+
### Co-location with a validator
76+
77+
When a pattern is the heart of a validation class, keep it co-located:
78+
79+
```csharp
public static partial class PostalCodeValidator
{
    [GeneratedRegex(@"^\d{5}(-\d{4})?$")]
    private static partial Regex UsZip();

    [GeneratedRegex(@"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$",
        RegexOptions.IgnoreCase)]
    private static partial Regex UkPostcode();

    public static bool Validate(string code, string country) =>
        country switch
        {
            "US" => UsZip().IsMatch(code),
            "UK" => UkPostcode().IsMatch(code),
            _ => throw new ArgumentOutOfRangeException(nameof(country))
        };
}
```
98+
99+
## Naming conventions
100+
101+
| Pattern | Recommended name | Rationale |
|---------|------------------|-----------|
| Method form | Noun or noun phrase: `Email()`, `IsoDate()`, `PhoneNumber()` | The `()` implies a call; keep the name a descriptor of what it matches |
| Property form | Same noun: `Email`, `IsoDate` | No parens, reads like a static constant |
105+
106+
Avoid prefixes like `Get`, `Create`, or `Match` — the attribute already communicates the nature of the member.
107+
108+
## Viewing the generated code
109+
110+
In Visual Studio: right-click the method or property declaration → **Go To Definition**.
111+
Or expand **Dependencies → Analyzers → System.Text.RegularExpressions.Generator → RegexGenerator.g.cs** in Solution Explorer.
112+
113+
The generated code is fully readable C# and can be stepped through in the debugger.
