APMSP-3555 DI: add timeout to regexp evaluation in EL matches operator#5908
APMSP-3555 DI: add timeout to regexp evaluation in EL matches operator#5908p-datadog wants to merge 14 commits into
Conversation
The matches operator in the Dynamic Instrumentation expression language compiles a string into a Regexp and applies it. Pathological patterns (catastrophic backtracking) can otherwise run for many seconds on short inputs and stall the thread executing the probe. This change caps the regexp evaluation time at 500ms per matches call: - On Ruby 3.2+, Regexp.new is called with the timeout: keyword, which interrupts the matcher at every backtrack step. The engine raises Regexp::TimeoutError when the budget is exhausted. - On Ruby < 3.2, the matcher is wrapped in Timeout.timeout(0.5). The Onigmo engine on those versions polls for pending interrupts during matching, so Timeout.timeout's Thread#raise does bound the runtime, though with ~100ms granularity (verified locally on Ruby 2.6 through 3.1). The new spec at spec/datadog/di/el/evaluator_spec.rb parameterizes the timeout value over two cases (0.2s and 1.0s) and asserts that the observed wall-clock elapsed time tracks the configured value. Running both cases together demonstrates that the timeout is actually controlling behavior rather than the regexp finishing naturally or some unrelated bound being hit. Verified locally on Ruby 2.5 (DI-skipped), 2.6, 2.7, 3.0, 3.1, 3.2, 3.3, and 3.4.
|
Thank you for updating Change log entry section 👏 Visited at: 2026-06-23 21:03:13 UTC |
Typing analysisNote: Ignored files are excluded from the next sections.
|
BenchmarksBenchmark execution time: 2026-06-29 21:15:46 Comparing candidate commit 21d3bc0 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 48 metrics, 1 unstable metrics.
|
The previous spec used the bare literals 0.05, 0.6, and 1.5 for the lower-bound clock-skew margin and the per-case upper bounds. Replace them with two named locals: - clock_skew_margin_seconds = 0.05 - overshoot_budget_seconds = 0.5 The upper bound for each case is now derived as timeout_seconds + overshoot_budget_seconds instead of being passed as a positional argument to the shared example, so both bounds in both cases come from the same named constants applied uniformly to each configured timeout. Also named the haystack_length local (was '* 30' inline) and added a short comment on the pattern explaining why it triggers catastrophic backtracking on every supported Ruby.
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 21d3bc0 | Docs | Datadog PR Page | Give us feedback! |
|
@codex review |
There was a problem hiding this comment.
Pull request overview
This PR adds a runtime cap to Dynamic Instrumentation’s Expression Language matches operator to prevent pathological regular expressions (catastrophic backtracking) from stalling probe execution threads. It implements a 500ms-per-call timeout using Ruby 3.2+’s in-engine regexp timeout, with a Timeout.timeout fallback on older Rubies, and adds coverage to ensure the cap is effective.
Changes:
- Add
MATCHES_TIMEOUT_SECONDS = 0.5and enforce it inEvaluator#matches(Regexp engine timeout on Ruby 3.2+,Timeout.timeoutfallback otherwise). - Add a new spec that verifies normal matching behavior and that a pathological pattern is interrupted near the configured timeout.
- Update the RBS signature to include the new timeout constant.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
lib/datadog/di/el/evaluator.rb |
Enforces a per-call timeout for EL matches depending on Ruby version. |
spec/datadog/di/el/evaluator_spec.rb |
Adds regression tests for both correct matching and timeout behavior on pathological regexps. |
sig/datadog/di/el/evaluator.rbs |
Declares the new MATCHES_TIMEOUT_SECONDS constant in the Evaluator signature. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f234821e14
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Three review-driven changes to Evaluator#matches:
1. Switch from `haystack =~ re` to `re.match?(haystack)` so the
thread-local match data ($~, $1, ...) is not mutated as a side
effect of DI expression evaluation. The customer thread runs through
DI evaluation; previously, any subsequent customer code reading $~
on that thread would see DI-controlled match data.
2. Rewrite the method-body comment to be honest about the Ruby < 3.2
path: Onigmo's interrupt-poll mechanism reliably bounds many
pathological patterns but not all (see Ruby bug #18144). The old
wording ("~100ms granularity") framed this as a precision issue
when in practice some patterns are not bounded at all.
3. Add a YARD docstring to #matches covering @param, @return, and the
two @raise variants (Regexp::TimeoutError on Ruby 3.2+,
Timeout::Error on older Rubies).
No behavior change for well-formed patterns. Tests continue to pass:
- spec/datadog/di/el/evaluator_spec.rb: 4 examples, 0 failures
- spec/datadog/di/el/integration_spec.rb: 143 examples, 0 failures
The `matches` operator recompiled its Regexp on every probe firing
(`Regexp.new(needle, timeout:)` / `Regexp.compile(needle)` ran per call),
on the application thread between enter_probe/leave_probe. When the needle
is a string literal -- the common case, e.g. `matches(ref('var'),
"hello[a-z]")` -- the pattern is known at expression-compile time and need
not be recompiled per call.
The Compiler now precompiles literal needles once during compilation
(Compiler#precompile_regexp), stores them on the Compiler (`#regexps`),
threads them through Expression into the Evaluator instance, and emits
`matches_compiled(<haystack>, <index>)` instead of `matches(...)`. The
per-match timeout (MATCHES_TIMEOUT_SECONDS) is baked into the compiled
Regexp on Ruby 3.2+ exactly as before; on Ruby < 3.2 the Timeout.timeout
wrapping stays at match time. Both the runtime and precompiled paths share
Evaluator.compile_regexp (build) and the private #apply_match (apply with
timeout), so the version-split logic lives in one place.
Dynamically-computed needles (non-literal second argument) keep compiling
at evaluation time via #matches. Invalid literal patterns fall back to the
runtime path so they still raise at evaluation time as before, rather than
moving the RegexpError to expression-compile time.
Tests: extended evaluator_spec to cover #matches_compiled correctness and
timeout, the Compiler precompilation/dynamic/invalid-fallback branches, and
parametrized the timeout examples over both paths. integration_spec now
passes compiler.regexps to Expression so its literal `matches` case
exercises the precompiled path end to end.
Verified functionally on Ruby 3.2.3 via a standalone harness exercising the
real EL classes: compile -> Expression -> evaluate for literal needles,
matches/matches_compiled correctness (incl. nil haystack -> false), and
timeout firing (Regexp::TimeoutError, ~0.2s) on both paths. rspec, Steep,
and the Ruby < 3.2 Timeout.timeout path were not run locally (bundle install
fails with Bundler::PermissionError writing the gem dir); the di:di_with_ext
CI matrix (2.6-4.0) validates those.
The matches-precompilation change accumulated precompiled Regexps in a Compiler instance variable (@regexps) populated as a side effect of the recursive compile walk, then pulled back out by the caller via an attr_reader and threaded into Expression. That made a previously stateless AST->String transformer stateful, smuggled a second result out through an accessor instead of a return value, and let regexps accumulate across calls if a Compiler instance were reused (no guard enforced one-compile-per-instance). Make Compiler#compile return [code, regexps] and thread the regexp accumulator as an explicit local parameter through compile_partial and precompile_regexp. The Compiler holds no per-compilation state and is safe to reuse. Callers destructure the pair; Expression and Evaluator are unchanged. Verified: spec/datadog/di/el/{evaluator,integration}_spec.rb (154 examples, 0 failures) and the touched probe_notification_builder evaluate_template example pass; steep check clean on compiler.rb, expression.rb, probe_builder.rb. (The probe_notification_builder exception_message failures are the uncompiled libdatadog_api C extension, unrelated to this change.)
Address review feedback on the matches-timeout change:
- Type `matches_compiled`/`apply_match` haystack as String in the RBS,
matching the sibling `matches` signature, instead of untyped. Verified
with `steep check`.
- Remove `matches` from TWO_ARG_METHODS: it is now intercepted by the
dedicated `when 'matches'` branch (which precedes the generic two-arg
branch), so the membership was dead. Add a comment noting the special
casing.
- Fix the stale `compiled:` documentation in the string.yml `matches`
integration case: a literal needle now compiles to
`matches_compiled(ref('var'), 0)`, not `matches(...)`.
spec/datadog/di/el (integration + evaluator) passes, 154 examples.
Replace post-construction setter injection (@evaluator.regexps =) with constructor injection. The Evaluator now takes regexps in initialize and exposes it read-only; the public attr_writer and the lazy `@regexps ||= []` default are gone, so regexps is set once and cannot be mutated afterward. Steep types Class#new as () -> untyped and cannot see the inherited initializer through the dynamically created Evaluator subclass in Expression, so the cls.new(regexps) call carries a documented steep:ignore. The RBS for initialize is also corrected: it previously claimed a Context parameter the implementation never had. spec/datadog/di/el, probe_notification_builder, probe_builder pass (182 examples); steep check clean on the three el files.
|
@codex review |
|
Codex Review: Didn't find any major issues. 🚀 Reviewed commit: ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Replace bare RUBY_VERSION string comparisons in the EL matches timeout code with Datadog::RubyVersion.is?, the codebase's canonical version check. Datadog::RubyVersion.is? compares via Gem::Version semantics, avoiding the lexical-comparison hazard of bare RUBY_VERSION (e.g. "3.2.10" < "3.2.3" as strings) and matching the existing usage in di/base.rb and di/component.rb. Adds require_relative '../../ruby_version' to evaluator.rb because the load-time `require 'timeout'` guard now calls the helper before the rest of the library is loaded; ruby_version.rb only depends on rubygems, so it is safe to require on the DI path. Verified: steep check clean (full-context run including the inline-checked ruby_version.rb); spec/datadog/di/el 154 examples, 0 failures.
The Ruby < 3.2 branch wraps re.match? in Timeout.timeout, whose Thread#raise can interrupt the wrapped block at any point. That is safe only because re.match? on a string is side-effect-free pure computation; widening the block to include mutation, IO, or resource cleanup would risk partial state when the interrupt lands. Document the precondition at the call site so a future maintainer does not widen the block unsafely. Comment-only change; spec/datadog/di/el 154 examples, 0 failures.
Use the global-namespace :: prefix for core/stdlib classes (String, Array, Regexp, Integer, Float) in the RBS signatures added or modified by the matches-timeout change, so Steep resolves them to the core classes rather than any same-named type in the current module's namespace. Scoped to the lines this branch introduced; pre-existing bare names elsewhere in these files are left unchanged. Verified: steep check clean for the EL files (full-context run including the inline-checked ruby_version.rb); spec/datadog/di/el 154 examples, 0 failures.
Address review comment: choose the compile_regexp and apply_match method definitions once at load time instead of checking Datadog::RubyVersion.is? on every invocation, since the running Ruby version is fixed for the process and the branch result never changes. - lib/datadog/di/el/evaluator.rb: self.compile_regexp now defined per-version inside a single class-body if/else - lib/datadog/di/el/evaluator.rb: apply_match same treatment; private :apply_match retained after the if/else No behavior change: identical per-version logic, unchanged method signatures (RBS unaffected). Verified: ruby -c clean, bundle exec steep check clean, evaluator_spec 11 examples 0 failures (Ruby 3.2.3).
| # @return [Integer, nil] index into +regexps+, or nil if invalid. | ||
| def precompile_regexp(needle, regexps) | ||
| index = regexps.length | ||
| regexps << Evaluator.compile_regexp(needle) |
There was a problem hiding this comment.
Is regexps a cache, to avoid recompiling regular expressions during #compile?
My guess is that it is a compilation cache.
Knowing that, I can now understand what "accumulator for regexps precompiled" means, but when I first read the "accumulators" comment, I thought it was related to an aggregator result field.
If so, I think the descriptions around regexps are not clear that it's a supporting structure to make code faster (unless I'm reading the comments wrong).
| # Needle is computed at evaluation time (or is an invalid | ||
| # literal that must raise at evaluation time, as before): | ||
| # compile per call. | ||
| "matches(#{compile_partial(first, regexps)}, (#{compile_partial(second, regexps)}))" |
There was a problem hiding this comment.
When does this else happen, from a DI product point of view? Bad configuration?
| # Compiles +ast+ into eval'able Ruby source plus the Regexps | ||
| # precompiled from any literal `matches` needles. The regexps are |
There was a problem hiding this comment.
Reading Regexps precompiled from any literal `matches` needles., I don't understand what "needle" means here.
I'd think it's either related to a scan/match position in regex matching, but that wouldn't make sense in this paragraph; or related to "matches needles" or "literal needles", but I couldn't figure out what "needle" could mean in this case.
I thought that maybe I missed a DI-specific concept of needle, but it seems like it's only scoped to regular expression matching in DI.
In other words: I'm not sure what "needle" means throughout the PR (not only in this paragraph).
What does this PR do?
Caps the regexp evaluation time of the
matchesoperator in the Dynamic Instrumentation expression language (lib/datadog/di/el/evaluator.rb) at 500ms per call.Regexp.new(needle, timeout: 0.5). The regexp engine itself enforces the cap at every backtrack step and raisesRegexp::TimeoutErrorwhen the budget is exhausted.Regexp.compile(needle)is preserved and the=~is wrapped inTimeout.timeout(0.5). Onigmo on those versions polls for pending interrupts during matching, soTimeout.timeout'sThread#raisedoes bound the runtime (~100ms granularity).The timeout is a class-level constant
Datadog::DI::EL::Evaluator::MATCHES_TIMEOUT_SECONDS, kept private (@api private) like the rest of the evaluator.Motivation:
matchescompiles a string into a Regexp and applies it. Pathological patterns (catastrophic backtracking) can otherwise run for many seconds on short haystacks and stall the thread executing the probe before the probe's post-execution circuit breaker (check_and_disable_if_exceeded,lib/datadog/di/instrumenter.rb:598) gets a chance to fire.Capping at the regexp engine bounds the operator's wall-clock cost independently of pattern shape or haystack content.
Change log entry
Yes. DI: cap regexp evaluation time in the expression language
matchesoperator at 500ms.Additional Notes:
The fix is a bit ugly but a better fix requires rewriting the entire compilation and evaluation logic which is not in scope for this PR.
Regexp.new(..., timeout:)is documented at https://docs.ruby-lang.org/en/3.2/Regexp.html#class-Regexp-label-Timeout. It interrupts the matcher itself, unlikeTimeout.timeout, which uses a separate sleeper thread and delivers the interrupt viaThread#raise. For thematchesoperator both are safe because the matcher is a side-effect-free pure computation, and the EL is restricted to side-effect-free operators by design.How to test the change?
New spec at
spec/datadog/di/el/evaluator_spec.rbcovers:^([a-z]+)*\1$against('a' * 30) + '1'— over 5 seconds baseline on every supported Ruby) is interrupted after approximately the configured timeout.For case (2) the timeout is parameterized over two values (0.2s and 1.0s) via
stub_constonMATCHES_TIMEOUT_SECONDS. Each case asserts that the elapsed wall-clock time tracks the configured timeout. Running both cases together demonstrates that the timeout is actually controlling behavior rather than the regexp finishing naturally or some unrelated bound being hit.The spec was verified locally on Ruby 2.5 (skipped — DI requires 2.6+), 2.6, 2.7, 3.0, 3.1, 3.2, 3.3, and 3.4. The existing
spec/datadog/di/el/integration_spec.rb(143 examples, including thematchescases inintegration_cases/string.yml) continues to pass.