Add a built-in RE2 implementation based on re2js by jonbodner-buf · Pull Request #290 · bufbuild/cel-es

jonbodner-buf · 2026-04-21T17:27:06Z

The CEL spec says that its regular expressions follow RE2 syntax, but the existing ES implementation defaults to the regex engine built into ES runtimes. That engine uses backtracking, which has pathological cases susceptible to ReDoS attacks.

This PR adds a stripped-down version of RE2JS (https://github.com/le0pard/re2js) as a package, and integrates it into CEL as the default regex engine.
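
To illustrate the concern, here is a minimal sketch of the kind of pattern that can trigger catastrophic backtracking in a built-in ES RegExp. The pattern `^(a+)+$`, the helper `buildInput`, and the input size are illustrative choices and are not taken from this PR. The input is kept small so the snippet finishes quickly.

```typescript
// A classic nested-quantifier pattern. A backtracking engine may try an
// exponential number of ways to split the run of "a"s before giving up.
const pathological = /^(a+)+$/;

// Hypothetical helper: builds "aaa...a!" where the trailing "!" forces the match to fail.
function buildInput(n: number): string {
  return "a".repeat(n) + "!";
}

// Kept small on purpose: each extra "a" can roughly double the work
// a backtracking engine does before it reports failure.
const input = buildInput(16);
const matched = pathological.test(input);
console.log(matched); // false
```

An automata-based engine such as RE2 is designed to answer the same question in time linear in the input length, which is the motivation for the change.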

srikrsna

Left some comments regarding the integration. I'll review the RE2 implementation in a second pass.

srikrsna · 2026-04-21T17:37:01Z

+    /\\c[A-Z]/, // control character eg: /\cM\cJ/
+    /\\u[0-9a-fA-F]{4}/, // UTF-16 code-unit
+    /\\0(?!\d)/, // NUL
+    /\[\\b.*\]/, // Backspace eg: [\b]


Why do we need this if it is re2?

srikrsna · 2026-04-21T17:37:32Z

+    // can probably delete this since the RE2 engine will already reject them, but keep for now
+    for (const invalidPattern of invalidPatterns) {
+        if (invalidPattern.test(pattern)) {
+            throw new Error(
+                `Error evaluating pattern ${pattern}, invalid RE2 syntax`,
+            );
+        }


I think we should delete them

srikrsna · 2026-04-21T17:41:48Z

-  }
-  const re = new RegExp(pattern, flags);
-  return re.test(this);
+    const re: RE2JS = RE2JS.compile(pattern, flagVal);


My understanding is that flags are part of the syntax in RE2? Can we add support for them in the library instead of trying to identify them here? Or am I missing something?

cel-go also passes them directly to regex engine without any preprocessing: https://github.com/google/cel-go/blob/646cdc1728643aec9499e3a00236ef1007a5d3fa/common/types/string.go#L156

yeah, not needed. Removing the code.

srikrsna · 2026-04-21T17:43:44Z

+    "@bufbuild/re2": "0.4.0"
  },
  "devDependencies": {
+    "@unicode/unicode-16.0.0": "^1.6.16",


Is this a peer dependency that is required?

no, removed.

srikrsna · 2026-04-21T17:43:55Z

    "peggy": "^5.0.6",
    "peggy-ts": "github:hudlow/peggy-ts#v0.0.9",
-    "expect-type": "^1.3.0"
+    "unicode-property-value-aliases": "^3.9.0"


Is this a peer dependency that is required?

no, removed.

srikrsna · 2026-04-21T17:45:19Z

+    "@unicode/unicode-16.0.0": "^1.6.16",
+    "unicode-property-value-aliases": "^3.9.0"


I see that these are added as dev dependencies in the cel package as well, will users end up needing them?

no, they are used for tests and to build a unicode lookup table.

srikrsna · 2026-04-21T17:45:58Z

+  },
+  "dependencies": {
+    "@bufbuild/re2": "^0.1.0"


I don't think this is needed here.

I think it's needed? The cel package now depends on the re2 package to run.

But this is the root workspace package.json not for the CEL package.

oh! yes, this should go.

srikrsna · 2026-04-29T04:20:00Z


 // biome-ignore format: table
 export default [
-  // !


We should revert the formatting changes here. It was deliberately formatted like this for readability

yeah, this wasn't intentional. I'll revert it.

srikrsna · 2026-04-29T04:21:13Z

  SemanticAdorner,
  toDebugString,
-} from "@bufbuild/cel-spec/testdata/to-debug-string.js";
+} from "../../cel-spec/dist/cjs/testdata/to-debug-string.js";


Suggested change

} from "../../cel-spec/dist/cjs/testdata/to-debug-string.js";

} from "@bufbuild/cel-spec/testdata/to-debug-string.js";

srikrsna · 2026-04-29T04:25:16Z

+
+## Credits
+This code is a fork of the [RE2JS](https://re2js.leopard.in.ua) project. It has been converted to TypeScript and has a feature set tailored for
+CEL and Protovalidate-es.


I don't think it does anything special for protovalidate, does it?

Suggested change

CEL and Protovalidate-es.

CEL.

Not yet, but when native rules are added to protovalidate-es, we will want to use the re2 engine for string pattern rules. Agree that we should remove that for now.

timostamm · 2026-04-29T10:25:47Z

-    pattern = pattern.substring(flagMatches[0].length);
-  }
-  const re = new RegExp(pattern, flags);
+  const re: RE2JS = RE2JS.compile(pattern);


Seconding Krishna's comment - return RE2JS.compile(pattern).test(this); is idiomatic style.

timostamm · 2026-04-29T10:28:20Z

-  celMethod(olc.ENDS_WITH,    STRING, [STRING],       BOOL, String.prototype.endsWith),
-  celMethod(olc.STARTS_WITH,  STRING, [STRING],       BOOL, String.prototype.startsWith),
-  celMethod(olc.MATCHES,      STRING, [STRING],       BOOL,  matches),
+    // !


The previous formatting looks more readable to me, and there's a formatter ignore comment on line 86 to keep it. Any reason to change the formatting? If yes, let's drop the formatter ignore.

format is back to the way it was (this was an unintentional change)

timostamm · 2026-04-29T10:35:44Z

  SemanticAdorner,
  toDebugString,
-} from "@bufbuild/cel-spec/testdata/to-debug-string.js";
+} from "../../cel-spec/dist/cjs/testdata/to-debug-string.js";


This can break in subtle and mysterious ways. Need to keep the previous import.

I'm unclear. The current code is:

} from "../../cel-spec/dist/cjs/testdata/to-debug-string.js";

Do I change it to:

} from "@bufbuild/cel-spec/testdata/to-debug-string.js";

Or do I leave it as-is?

timostamm · 2026-04-29T12:22:46Z

+  "include": ["src/**/*.test.ts"],
+  "exclude": ["./src/__tests__", "./src/__fixtures__", "./src/__utils__"]


This is broken - the exclude means we aren't catching any compiler errors in tests, fixtures, or utils.

I've moved the tests out of __tests__ and the utils out of __utils__ into src. The __fixtures__ directory is now being scanned as well.

timostamm · 2026-04-29T12:35:40Z

+dist/*/testing.js
+dist/*/testing.d.ts


Those two lines aren't needed here. The picture will change with a fixed tsconfig.json that includes tests and fixtures.

le0pard · 2026-05-05T18:24:39Z

+
+describe("bug-hunt verification", () => {
+  // Phase 1c: DFA.match ANCHOR_START with pos>0
+  test("executeEngine with ANCHOR_START and pos>0 finds substring match", () => {


The test expects ANCHOR_START to mean "anchor the match to the current pos index (3)". However, in RE2 semantics, ANCHOR_START strictly means "the match must start at the absolute beginning of the input string (index 0)". So the test is incorrect.

P.S. Update the vendored re2js, because the latest version will not pass this test.
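
As an analogy only, the built-in ES RegExp treats `^` the same way. Without the multiline flag it asserts the absolute start of the input, even when a sticky search begins at a later index. The sketch below uses the native RegExp rather than the RE2 engine, and the input string is made up for illustration.

```typescript
// Sticky ("y") forces the match attempt to begin exactly at lastIndex.
const anchored = /^foo/y;
anchored.lastIndex = 3;
// "^" still means index 0 of the whole input, so an attempt at index 3 fails.
const anchoredAtPos = anchored.test("barfoo");

// Without the anchor, the same sticky search at index 3 succeeds.
const unanchored = /foo/y;
unanchored.lastIndex = 3;
const unanchoredAtPos = unanchored.test("barfoo");

console.log(anchoredAtPos, unanchoredAtPos); // false true
```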

le0pard · 2026-05-05T18:25:34Z

+
+  // Phase 1b: equalsIgnoreCase EOF handling
+  test("equalsIgnoreCase(-1, X) returns true per current implementation", () => {
+    assert.strictEqual(equalsIgnoreCase(-1, 0x41), true);


-1 represents EOF. Case-insensitively comparing EOF to a valid character like A (0x41) should absolutely be false. So this test case is incorrect.

P.S. Update the vendored re2js, because the latest version will not pass this test.
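
A minimal sketch of the behavior described above, assuming EOF is encoded as -1. `equalsIgnoreCaseSketch` is a hypothetical stand-in and not the re2js function. It approximates case folding with `toLowerCase` and `toUpperCase` instead of the Unicode simple-folding tables that an RE2 engine would use, and treating EOF as equal only to EOF is an assumption of this sketch.

```typescript
const EOF = -1; // assumed sentinel for end of input

function equalsIgnoreCaseSketch(a: number, b: number): boolean {
  // EOF is not a character, so in this sketch it only ever equals EOF itself.
  if (a === EOF || b === EOF) {
    return a === b;
  }
  if (a === b) {
    return true;
  }
  // Approximate folding: compare both the lower-cased and upper-cased forms.
  const sa = String.fromCodePoint(a);
  const sb = String.fromCodePoint(b);
  return (
    sa.toLowerCase() === sb.toLowerCase() ||
    sa.toUpperCase() === sb.toUpperCase()
  );
}

console.log(equalsIgnoreCaseSketch(EOF, 0x41)); // false
console.log(equalsIgnoreCaseSketch(0x61, 0x41)); // true
```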

le0pard · 2026-05-05T18:50:23Z

@@ -0,0 +1,49 @@
+{
+  "name": "@bufbuild/re2",
+  "version": "0.4.0",


re2js is already at 2.3.1. Versions before v2 will be slow, because they have only the Pike VM (NFA) and no DFA engines - https://github.com/le0pard/re2js#re2js-vs-re2-node-c-bindings

le0pard · 2026-05-06T06:42:33Z

+  "// returns stride-1 ranges. Surrogates are included — String.fromCodePoint",
+  "// returns the lone surrogate char and platform regex matches \\p{Cs} on it.",
+  "const sweepPlatform = (pattern: string): Uint32Array => {",
+  '  const re = new RegExp(pattern, "u")',


Nice optimization idea, but a note on older JS engines: the pattern passed into this function contains strings like \p{General_Category=L} or \p{Script=Latin}. Passing this to the native RegExp constructor with the "u" flag requires ES2018 support for Unicode property escapes. Older browsers, older versions of React Native, and legacy Node.js environments will immediately throw a SyntaxError here and crash the script (which may not be a problem if you don't need to support Node.js versions older than 10).
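
One way a caller could guard against this is a feature check at startup. The sketch below is an assumption about how such a check might look. `supportsUnicodePropertyEscapes` is a hypothetical name and is not part of this PR.

```typescript
// Returns true when the host RegExp engine accepts Unicode property escapes (ES2018).
function supportsUnicodePropertyEscapes(): boolean {
  try {
    // Built from a string so that older engines fail here at run time,
    // inside the try block, rather than when the file is parsed.
    new RegExp("\\p{Script=Latin}", "u");
    return true;
  } catch {
    // Engines without ES2018 support throw a SyntaxError from the constructor.
    return false;
  }
}

const supported = supportsUnicodePropertyEscapes();
console.log(supported);
```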

le0pard · 2026-05-06T06:44:28Z

+  '  const re = new RegExp(pattern, "u")',
+  "  const ranges: number[] = []",
+  "  let start = -1",
+  "  for (let cp = 0; cp <= 0x10ffff; cp++) {",


This for loop executes 1,114,112 times, allocating a string and running a regex test for every single Unicode code point. Because JavaScript is single-threaded, this blocks the main thread until the loop finishes, causing a CPU spike on first use. That is why the original re2js pays a bundle-size penalty by precompiling all the arrays: its compilation speed is consistently fast.
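
For readers following along, the sweep being discussed can be sketched as a standalone function. This is a hypothetical reconstruction from the quoted lines, not the PR's code. `sweepSketch`, its `maxCp` parameter, and the flat inclusive [lo, hi] output layout are assumptions made for illustration.

```typescript
// Tests every code point in [0, maxCp] against a native RegExp and records
// the matching runs as inclusive [lo, hi] pairs in one flat array.
function sweepSketch(pattern: string, maxCp: number = 0x10ffff): Uint32Array {
  const re = new RegExp(pattern, "u");
  const ranges: number[] = [];
  let start = -1;
  for (let cp = 0; cp <= maxCp; cp++) {
    if (re.test(String.fromCodePoint(cp))) {
      if (start < 0) start = cp; // open a new range
    } else if (start >= 0) {
      ranges.push(start, cp - 1); // close the current range
      start = -1;
    }
  }
  if (start >= 0) ranges.push(start, maxCp); // close a range that runs to the end
  return Uint32Array.from(ranges);
}

// Limited to ASCII here so the loop runs 128 times instead of 1,114,112.
const asciiRanges = sweepSketch("[a-cx-z]", 0x7f);
console.log(Array.from(asciiRanges)); // [ 97, 99, 120, 122 ]
```

With the default `maxCp`, each call performs one string allocation and one regex test per code point, which is the first-use cost described in the comment above.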

le0pard · 2026-05-06T06:48:25Z

+  "",
+  "  const base = sweepPlatform(pattern)",
+  '  const delta = kind === "category" ? _DELTA_CATEGORIES.get(name) : _DELTA_SCRIPTS.get(name)',
+  "  const merged = delta ? mergeRanges(base, delta) : base",


Just for information: the delta mechanism assumes the host is running at least Unicode 15.0. If the host environment runs an old version of V8/Node with Unicode 12.0 or 13.0, applying the 15.0 -> 16.0 delta on top of the host's 12.0 base will produce an inaccurate, broken Unicode table (which may not be a problem if you don't need to support Node.js versions older than 18).
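
To make the delta step concrete, here is a minimal sketch of merging a host-derived base table with a delta table. It assumes both are sorted flat arrays of inclusive [lo, hi] pairs and that the delta only adds code points. `mergeRangesSketch` and the sample data are hypothetical and are not the PR's `mergeRanges`.

```typescript
// Union of two flat [lo, hi, lo, hi, ...] range lists.
function mergeRangesSketch(base: Uint32Array, delta: Uint32Array): Uint32Array {
  const pairs: Array<[number, number]> = [];
  for (let i = 0; i + 1 < base.length; i += 2) pairs.push([base[i], base[i + 1]]);
  for (let i = 0; i + 1 < delta.length; i += 2) pairs.push([delta[i], delta[i + 1]]);
  pairs.sort((x, y) => x[0] - y[0]);

  const out: number[] = [];
  for (const [lo, hi] of pairs) {
    const last = out.length - 1;
    if (last >= 0 && lo <= out[last] + 1) {
      // Overlapping or adjacent: extend the previous range.
      out[last] = Math.max(out[last], hi);
    } else {
      out.push(lo, hi);
    }
  }
  return Uint32Array.from(out);
}

// Made-up data: a "host" base plus a delta of newer additions.
const hostBase = Uint32Array.of(0x41, 0x5a, 0x61, 0x7a);
const newerDelta = Uint32Array.of(0x5b, 0x5d, 0x100, 0x10f);
const mergedTable = Array.from(mergeRangesSketch(hostBase, newerDelta));
console.log(mergedTable); // [ 65, 93, 97, 122, 256, 271 ]
```

A union like this can only add what the delta lists. If the base comes from a host older than the delta expects, any code points assigned between the two versions are missing from both inputs and therefore from the result.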

jonbodner-buf added 8 commits April 17, 2026 15:31

add re2js to codebase. change logic.ts to call it. fix some tests wit…

b788755

…h broken imports.

fix imports

1872bae

move re2 to its own package. add LICENCE and initial README.md

9a4a332

update version in package-lock.json

cbcc87b

fix formatting. start fixing lint errors.

e768b44

additional lint updates

0eac393

finish lint changes

100f4db

add test to validate unicode range tables. fix delta ranges in Unicod…

6554cea

…eTables.ts

jonbodner-buf requested a review from srikrsna-buf April 21, 2026 17:27

srikrsna reviewed Apr 21, 2026

jonbodner-buf added 5 commits April 21, 2026 13:55

fix formatting. remove unused methods, constants, and files.

dd34db0

revert import change in checker.test.ts

cccca46

don't put license headers on the re2 files

1fc17e8

updates for PR

68ce0a3

remove unneeded dependency from root package.json.

af39200

srikrsna reviewed Apr 29, 2026

timostamm reviewed Apr 29, 2026

timostamm and others added 3 commits April 29, 2026 16:42

Fix exports (#292)

2684e25

Merge branch 'main' into jonbodner/add_re2

506d9a9

address code review issues

5530de2

jonbodner-buf force-pushed the jonbodner/add_re2 branch from c008b4c to 5530de2 Compare April 29, 2026 16:58

jonbodner-buf added 2 commits May 4, 2026 14:32

move tests and utils out of __tests__ and __utils__ into src in the r…

30a5ad6

…e2 package. Fix all issues found.

packaging fixes from code review

37780bf

le0pard reviewed May 5, 2026

le0pard reviewed May 6, 2026
