feat(analyzer): Enhance type-aware pattern matching and generic type handling#99

Open
misonijnik wants to merge 27 commits into main from misonijnik/match-generic-types

Conversation

@misonijnik
Member

No description provided.

misonijnik force-pushed the misonijnik/match-generic-types branch from e0d76b7 to 62580aa on April 21, 2026 15:16
Saloed force-pushed the misonijnik/match-generic-types branch from 62580aa to c3a715f on April 29, 2026 12:26
@Saloed
Contributor

Saloed commented Apr 29, 2026

  1. Looks like TypeNamePattern.WildcardType should work as TypeNamePattern.AnyType (see unifyTypeName)
  2. Let's use condition on the Result position, instead of SerializedSignatureMatcher
  3. In ClassPattern it is better to use null to represent no-type-args
  4. Use PositionTypeResolver in JIRBasicAtomEvaluator instead of typedMethod
  5. SerializedTypeNameMatcher must be specialized during rule resolution (see TaintConfiguration), so matchErasedName method must be removed

misonijnik force-pushed the misonijnik/match-generic-types branch 2 times, most recently from 517e0b0 to e1dc401 on April 30, 2026 22:29
misonijnik added 25 commits May 1, 2026 00:29
Replace source pre-resolution approach with simpler plumbing fix:
preserve generic type args through the existing pipeline and match
against JIRTypedMethod at runtime. Covers method-level signatures
and call-site receiver generics via LocalVariableTypeTable.
- Add private matchType(JIRType) extension on SerializedTypeNameMatcher that
  performs structural generic comparison (ClassPattern with typeArgs vs JIRClassType)
  and array matching (Array vs JIRArrayType), falling back to erased name matching
  for backward compatibility
- Update resolveIsType() to extract typeArgs from ClassPattern normalizedTypeIs
  and pass them to TypeMatchesPattern for deferred evaluation at instruction level
- Add imports: JIRArrayType, JIRClassType, JIRType from org.opentaint.ir.api.jvm
…ceiver matching

- Fix resolveIsType() short-circuit: when typeArgs are present, skip
  mkTrue() early return and defer to TypeMatchesPattern deferred evaluation
- Add typedMethod: JIRTypedMethod? parameter to JIRBasicAtomEvaluator
  and JIRMarkAwareConditionRewriter (optional, backward-compatible)
- Extend typeMatchesPattern() to check generic type args via
  JIRTypedMethod.typeOf(LocalVariableNode) resolved from LocalVariableTypeTable
- Add matchType() and matchErasedName() helpers for SerializedTypeNameMatcher
  recursive matching against JIRType (class, array, wildcards)
- Pass returnType from SemgrepPatternAction.MethodSignature to automata
  MethodSignature in ActionListToAutomata.constructSignatureFormula()
- Unify returnType in MethodFormulaSimplifier.unify()
- Preserve returnType in notEvaluatedSignature()
- Move return type processing before early return in evaluateFormulaSignature()
- Fix normalizeAnyName() in ClassNameUtils to preserve typeArgs
- Store cp in TaintConfiguration for typed method resolution
- Add resolveTypedMethod() for eager generic type checking on Argument/Result
- Use erased class name (jIRClass.name) instead of typeName in matchType()
  to avoid matching against generic-param-decorated names
- Fix TypeAwarePatternTest to pass EXPECT_STATE_VAR for generic test
Adds six pattern-inside return-type / param-type matching sample rules
that exercise generic, array, nested-generic, wildcard and raw-type
forms introduced on this branch. Tests also pin current engine behavior
where the method-decl return-type specificity is effectively ignored
(A1/A3/A6 show ResponseEntity<X> rules match other parameterized and
raw forms of ResponseEntity).
The three samples previously pinned over-matching behavior as Positive
with explanatory comments. That made TypeAwarePatternTest report
"10/10 passing" while silently hiding the gap that this branch's
type-matching feature does not discriminate by concrete type argument
at method-decl return position.

Rename the pins:

- RuleWithGenericByteArrayReturnType: PositiveResponseEntityStringPinsOverMatch
  → NegativeResponseEntityString (rule asks for ResponseEntity<byte[]>,
  sample method returns ResponseEntity<String>)
- RuleWithNestedGenericReturnType: PositiveFlatGenericPinsOverMatch
  → NegativeFlatGeneric (rule asks for ResponseEntity<List<String>>,
  sample method returns ResponseEntity<String>)
- RuleWithRawResponseEntity: PositiveParameterizedString and
  PositiveParameterizedByteArray → NegativeParameterizedString and
  NegativeParameterizedByteArray (rule uses raw ResponseEntity, sample
  methods return parameterized forms)

With honest labels, TypeAwarePatternTest reports 7 passing / 3
failing, the three failures surfacing the actual engine behavior on
this branch. The @Disabled B11 in EngineGapsTest remains as a second
witness to the same gap on origin/main.
Before this change TypeAwarePatternTest's A1, A3, and A6 failed by
over-matching:

- A1 (rule `ResponseEntity<byte[]>`) matched `ResponseEntity<String>`
  methods — concrete type-arg specificity on the return type was
  ignored.
- A3 (rule `ResponseEntity<List<String>>`) matched
  `ResponseEntity<String>` methods — nested generic specificity was
  ignored.
- A6 (rule raw `ResponseEntity`) matched parameterized
  `ResponseEntity<String>` and `ResponseEntity<byte[]>` methods — raw
  vs parameterized collapsed to erased name.
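
The over-matching described above follows from erasure: all three sample shapes share one erased return type. A minimal Java sketch using plain `java.lang.reflect` (not the engine's `JIRTypedMethod` API) makes that visible. `Entity<T>` and `Samples` are hypothetical stand-ins for `ResponseEntity<T>` and the test samples.

```java
import java.lang.reflect.Type;
import java.util.List;

final class ErasedVsGenericReturn {
    /** Hypothetical stand-in for ResponseEntity<T>; this is not the Spring class. */
    static class Entity<T> {}

    /** Hypothetical sample methods mirroring the shapes exercised by A1, A3 and A6. */
    static class Samples {
        Entity<String> text() { return null; }
        Entity<byte[]> bytes() { return null; }
        Entity<List<String>> lines() { return null; }
        @SuppressWarnings("rawtypes")
        Entity raw() { return null; }
    }

    /** Erased return type: every type argument is gone. */
    static Class<?> erased(String method) {
        try {
            return Samples.class.getDeclaredMethod(method).getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Generic return type: type arguments are preserved when the method declares them. */
    static Type generic(String method) {
        try {
            return Samples.class.getDeclaredMethod(method).getGenericReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        for (String m : new String[] {"text", "bytes", "lines", "raw"}) {
            System.out.println(m + ": erased=" + erased(m).getName()
                    + ", generic=" + generic(m).getTypeName());
        }
    }
}
```

A matcher that only compares the erased name cannot tell these four methods apart, which is the behavior A1, A3 and A6 pinned.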

The fix has four pieces:

1. SerializedSignatureMatcher.matchFunctionSignature now resolves the
   method via cp.typeOf(...).declaredMethods (JIRTypedMethod) so the
   structural `matchType(JIRType)` sees real generic type arguments.
   When typed resolution fails, it falls back to the pre-existing
   erased-name match on TypeName.typeName.

2. matchType's "no typeArgs on matcher" branch now requires the class
   type to be raw-like (no concrete type arguments) — otherwise a
   pattern like `ResponseEntity` would silently match
   `ResponseEntity<String>`. Raw-like means either no type arguments
   at all or only declared type variables / unbound wildcards (so a
   raw method whose resolved type surfaces its class's own type
   variable — `ResponseEntity<T>` — still matches the raw rule).

3. typeMatcher preserves the arity of a class-pattern's typeArgs list
   even when an inner metavariable / AnyType resolves to a null
   constraint. Previously `mapNotNull` silently dropped such entries,
   collapsing `ResponseEntity<$T>` into the raw form. Empty slots now
   become an `anyClassPattern()` matcher.

4. The wildcard `?` in pattern type arguments is now a first-class
   TypeName.WildcardTypeName instead of a dropped null, so
   `ResponseEntity<?>` keeps its arity through parsing and rewriting
   and gets converted to TypeNamePattern.AnyType by the pattern
   converter.
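
The four pieces can be sketched together as a small structural matcher over `java.lang.reflect.Type`. This is an illustration under assumptions, not the PR's Kotlin code: `Pat`, `matches`, `isRawLike` and `Samples` are hypothetical names, a `null` name models an "any" slot (metavariable or `?`), and a `null` type-argument list models a pattern written without type arguments.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.lang.reflect.WildcardType;
import java.util.Arrays;
import java.util.List;

final class StructuralTypeMatch {
    /** Hypothetical pattern node: name == null is an "any" slot; typeArgs == null means none were written. */
    static final class Pat {
        final String name;
        final List<Pat> typeArgs;
        private Pat(String name, List<Pat> typeArgs) { this.name = name; this.typeArgs = typeArgs; }
        static Pat any() { return new Pat(null, null); }
        static Pat raw(String name) { return new Pat(name, null); }
        static Pat of(String name, Pat... args) { return new Pat(name, Arrays.asList(args)); }
    }

    static boolean matches(Pat p, Type t) {
        if (p.name == null) return true;                       // "any" slot keeps its position in the arity
        if (t instanceof Class) {                              // code type without type arguments
            return p.typeArgs == null && ((Class<?>) t).getTypeName().equals(p.name);
        }
        if (!(t instanceof ParameterizedType)) return false;
        ParameterizedType pt = (ParameterizedType) t;
        if (!pt.getRawType().getTypeName().equals(p.name)) return false;
        Type[] actual = pt.getActualTypeArguments();
        if (p.typeArgs == null) return isRawLike(actual);      // raw pattern rejects concrete arguments
        if (p.typeArgs.size() != actual.length) return false;  // arity must agree
        for (int i = 0; i < actual.length; i++) {
            if (!matches(p.typeArgs.get(i), actual[i])) return false;
        }
        return true;
    }

    /** Raw-like: only type variables or unbound wildcards, no concrete type argument. */
    static boolean isRawLike(Type[] args) {
        for (Type a : args) {
            if (a instanceof TypeVariable) continue;
            if (a instanceof WildcardType) {
                WildcardType w = (WildcardType) a;
                Type[] upper = w.getUpperBounds();
                if (w.getLowerBounds().length == 0 && upper.length == 1 && upper[0] == Object.class) continue;
            }
            return false;
        }
        return true;
    }

    /** Hypothetical sample methods used as match targets. */
    static class Samples {
        List<String> strings() { return null; }
        List<Integer> ints() { return null; }
        List<List<String>> nested() { return null; }
        <T> List<T> generic() { return null; }
        @SuppressWarnings("rawtypes")
        List raw() { return null; }
    }

    static Type returnType(String method) {
        try {
            return Samples.class.getDeclaredMethod(method).getGenericReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

In this sketch a raw code type never matches a pattern that carries type arguments; whether the engine makes the same choice depends on how its typed view surfaces raw types.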

TypeAwarePatternTest now reports 10/10 passing (was 7/10 after the
honest-label conversion). Full :opentaint-java-querylang:test suite
stays green (no regressions elsewhere).
Add four new cases to TypeAwarePatternTest to surface remaining gaps in
handling generic function definitions:

- A8 Map<$K, String> mixed metavar + concrete (PASSES).
- A10 List<List<String>> deep nesting (PASSES for both flat and inner
  mismatch Negatives).
- A12 List<String> concrete parameter-position discrimination (PASSES).
- A13 ResponseEntity<java.lang.String> fully-qualified type argument
  (PASSES; FQN resolves to the same class as simple name).
Add seven new cases to TypeAwarePatternTest to further saturate coverage
of generic function-definition matching:

- A15 List<String>[] array of parameterized type (PASSES; array +
  inner-type arg discrimination works).
- A17 String vs Integer concrete return discrimination (PASSES).
- A19 List<Map<String, Integer>> nested generic in parameter position
  (PASSES; structural recursion works on parameter side as well as
  return side).
- A20 Class<$T> reflection-style parameter with metavar type arg
  (PASSES).
- A21 Collection<String> vs List<String> (PASSES after flipping the
  List<String> sample from Positive to Negative — observed: the engine
  uses exact-type matching at method-decl return position, not subtype
  widening; List<String> does NOT match a Collection<String> pattern).
- A22 Map<String, List<Integer>> nested mixed containers (PASSES; all
  four structural-mismatch negatives stay silent).
- A23 String[][] two-dim array return — dimension + element-type
  discrimination (PASSES).

Additional engine gap exposed:
- Subtype widening (A21) — the engine does exact-type matching; a
  Collection<T> pattern does NOT match a List<T>-returning method, in
  contrast to semgrep's widening semantics. Documented as a Negative.
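
The difference between the two semantics in A21 can be shown on erased classes alone. The sketch below is an illustration with hypothetical helper names (`exactMatch`, `wideningMatch`); it is not the engine's matcher and ignores type arguments.

```java
import java.util.Collection;
import java.util.List;

final class ExactVsWidening {
    /** Exact matching: the declared type must be the pattern type itself. */
    static boolean exactMatch(Class<?> pattern, Class<?> declared) {
        return pattern == declared;
    }

    /** Widening: any subtype of the pattern type is accepted as well. */
    static boolean wideningMatch(Class<?> pattern, Class<?> declared) {
        return pattern.isAssignableFrom(declared);
    }

    public static void main(String[] args) {
        System.out.println("exact:    " + exactMatch(Collection.class, List.class));
        System.out.println("widening: " + wideningMatch(Collection.class, List.class));
    }
}
```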
Two parallel implementations of SerializedTypeNameMatcher.matchType
had grown up across modules:

- TaintConfiguration (opentaint-jvm-sast-dataflow): strict matcher
  with isRawLike() check for raw-vs-parameterized discrimination,
  uses PatternManager-cached name matching.
- JIRBasicAtomEvaluator (opentaint-jvm-dataflow): nested-only matcher
  with stateless Regex name matching.

Moved the recursion structure to configuration-rules-jvm (next to
SerializedTypeNameMatcher itself) as a single matchType(JIRType,
erasedMatch) extension. Each caller plugs in its own erased-name
matcher, so the PatternManager cache is preserved in the hot path
while the evaluator keeps its stateless Regex path.

Other cleanups in the same pass:

- Collapsed 4 repeated `if (typedType != null) matchType else match`
  branches in matchFunctionSignature into a matchTypedOrErased helper.
- Extracted resolveTypedPositionType to flatten the nested
  if/when/continue block in resolveIsType.
- Added kdoc on JIRMarkAwareConditionRewriter.typedMethod explaining
  that null silently disables generic-type-argument matching.
- Narrowed and documented the bare catch in resolveGenericType.
- Dropped WHAT-style comments left by the original type-matching
  implementation.

Net: -119 / +51 lines across the three consumer files, +61 lines in
one shared file. Behavior unchanged — TypeAwarePatternTest results
are identical pre- and post-refactor.
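
The shape of that refactor, one shared recursion with a caller-supplied erased-name matcher, might look roughly like the Java sketch below. All names (`TypeMatcher`, `matchType`, `CACHED`, `STATELESS`) are hypothetical, the cache is a plain `ConcurrentHashMap` standing in for PatternManager, and a matcher without type arguments falls back to the erased name only, which is a simplification.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;
import java.util.regex.Pattern;

final class PluggableErasedMatch {
    /** Hypothetical matcher node: a name regex plus zero or more type-argument matchers. */
    static final class TypeMatcher {
        final String nameRegex;
        final TypeMatcher[] typeArgs;
        TypeMatcher(String nameRegex, TypeMatcher... typeArgs) {
            this.nameRegex = nameRegex;
            this.typeArgs = typeArgs;
        }
    }

    /** Shared recursion; the caller decides how an erased name is compared against a regex. */
    static boolean matchType(TypeMatcher m, Type t, BiPredicate<String, String> erasedMatch) {
        Type erased = t instanceof ParameterizedType ? ((ParameterizedType) t).getRawType() : t;
        if (!(erased instanceof Class)) return false;   // type variables and wildcards are out of scope here
        if (!erasedMatch.test(m.nameRegex, ((Class<?>) erased).getName())) return false;
        if (m.typeArgs.length == 0) return true;        // simplification: erased-name match only
        if (!(t instanceof ParameterizedType)) return false;
        Type[] actual = ((ParameterizedType) t).getActualTypeArguments();
        if (actual.length != m.typeArgs.length) return false;
        for (int i = 0; i < actual.length; i++) {
            if (!matchType(m.typeArgs[i], actual[i], erasedMatch)) return false;
        }
        return true;
    }

    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    /** Hot-path flavor: compiled patterns are cached per regex. */
    static final BiPredicate<String, String> CACHED =
            (regex, name) -> CACHE.computeIfAbsent(regex, Pattern::compile).matcher(name).matches();

    /** Stateless flavor: the regex is compiled on every call. */
    static final BiPredicate<String, String> STATELESS = (regex, name) -> Pattern.matches(regex, name);

    /** Hypothetical sample type used as a match target. */
    static Map<String, List<Integer>> sample;

    static Type sampleType() {
        try {
            return PluggableErasedMatch.class.getDeclaredField("sample").getGenericType();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a matcher for Map<String, List<elementRegex>>. */
    static TypeMatcher mapOfStringToListOf(String elementRegex) {
        return new TypeMatcher("java\\.util\\.Map",
                new TypeMatcher("java\\.lang\\.String"),
                new TypeMatcher("java\\.util\\.List", new TypeMatcher(elementRegex)));
    }
}
```

Both strategies should give the same answer for the same matcher and type; only the cost of name matching differs.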
A pattern like `ResponseEntity<?>` previously matched concrete
parameterizations such as `ResponseEntity<String>` because wildcards
were lowered to an unconstrained "any class" matcher at the
type-argument slot. Introduce a dedicated wildcard representation —
`TypeNamePattern.WildcardType` in the query language and
`SerializedTypeNameMatcher.Wildcard` in the serialized matcher — so a
wildcard slot in the pattern matches only a `JIRUnboundWildcard` at the
same slot in code.
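
For reference, this is how an unbound wildcard at a type-argument slot can be told apart from a concrete or bounded argument using plain Java reflection. It is a sketch with hypothetical names (`isUnboundWildcard`, `firstTypeArg`), not the engine's `JIRUnboundWildcard` check.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.WildcardType;
import java.util.List;

final class UnboundWildcardSlot {
    // Hypothetical sample declarations; only their declared generic types matter.
    static List<?> unbound;
    static List<String> concrete;
    static List<? extends Number> bounded;

    /** True only for a bare "?": no lower bound and Object as the sole upper bound. */
    static boolean isUnboundWildcard(Type t) {
        if (!(t instanceof WildcardType)) return false;
        WildcardType w = (WildcardType) t;
        Type[] upper = w.getUpperBounds();
        return w.getLowerBounds().length == 0 && upper.length == 1 && upper[0] == Object.class;
    }

    /** First type argument of the named field's declared generic type. */
    static Type firstTypeArg(String field) {
        try {
            ParameterizedType pt =
                    (ParameterizedType) UnboundWildcardSlot.class.getDeclaredField(field).getGenericType();
            return pt.getActualTypeArguments()[0];
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        for (String f : new String[] {"unbound", "concrete", "bounded"}) {
            System.out.println(f + ": " + isUnboundWildcard(firstTypeArg(f)));
        }
    }
}
```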
Pass-through rules like `java.util.List#get` returning `java.lang.Object`
were no longer matching since typed method resolution via
`cp.typeOf(method.enclosingClass)` surfaces declared return/parameter
types as `JIRTypeVariable` (e.g. `E`) rather than the erased class.
`erasedName()` fell through to `typeName`, producing the type-variable
symbol `"E"` instead of `"java.lang.Object"`, so every string-based
matcher missed. Map type variables and unbound wildcards to their
declared erasure via `jIRClass.name`, and extend the same lookup to
array element types.
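
The symptom is easy to reproduce with plain Java reflection: the generic return type of `List#get` is the type variable `E`, while its erasure is `java.lang.Object`. The `erasedName` helper below is a hypothetical sketch that erases through the leftmost declared bound; the PR describes a lookup via `jIRClass.name`, which this does not replicate.

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.lang.reflect.WildcardType;
import java.util.List;

final class TypeVariableErasure {
    /** Name of the erasure of a reflected type (hypothetical helper). */
    static String erasedName(Type t) {
        if (t instanceof Class) return ((Class<?>) t).getTypeName();
        if (t instanceof ParameterizedType) return erasedName(((ParameterizedType) t).getRawType());
        // A type variable erases to its leftmost bound (Object when none is declared).
        if (t instanceof TypeVariable) return erasedName(((TypeVariable<?>) t).getBounds()[0]);
        // A wildcard is treated the same way through its first upper bound.
        if (t instanceof WildcardType) return erasedName(((WildcardType) t).getUpperBounds()[0]);
        // Arrays erase element-wise.
        if (t instanceof GenericArrayType) {
            return erasedName(((GenericArrayType) t).getGenericComponentType()) + "[]";
        }
        throw new IllegalArgumentException("Unsupported type: " + t);
    }

    static Type genericReturnType(Class<?> owner, String method, Class<?>... params) {
        try {
            return owner.getMethod(method, params).getGenericReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Type declared = genericReturnType(List.class, "get", int.class);
        System.out.println("declared: " + declared.getTypeName());
        System.out.println("erased:   " + erasedName(declared));
    }
}
```

A string-based matcher for `java.lang.Object` can only succeed against the erased name, never against the type-variable symbol.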
misonijnik added 2 commits May 1, 2026 00:29
…dition-on-Result

- Encode return-type constraints as IsType conditions on PositionBase.Result
  rather than via SerializedSignatureMatcher.Partial.return; drop the
  now-unused return field on Partial.
- Make ClassPattern.typeArgs nullable (null = no type-args / raw match).
- Specialize SerializedTypeNameMatcher into TypeArgMatcher during rule
  resolution: name matchers are pre-compiled into ConditionNameMatcher, so the
  runtime evaluator dispatches on a small structural shape instead of running
  matchErasedName on a serialized matcher.
- Replace JIRBasicAtomEvaluator's typedMethod+ASM-debug-info path with a
  PositionResolver<JIRType?> for resolving the typed view at a position.
- Treat WildcardType as AnyType: collapse it at action translation, then drop
  the now-dead SerializedTypeNameMatcher.Wildcard / TypeArgMatcher.Wildcard
  variants. Java's <?> is the supertype of any concrete parameterization, so
  ResponseEntity<?> accepts any ResponseEntity<X>; A5 sample updated to flip
  the parameterized form from Negative to Positive.
- resolveIsType now forces a typed-view check for ClassPattern/Array (instead
  of returning mkTrue early on erased-name match) so a raw pattern correctly
  rejects parameterized forms when the typed view is available.
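
The Java rule behind the WildcardType-as-AnyType decision can be checked with a few lines of plain Java: a parameter typed `List<?>` accepts every parameterization of `List`. The class and method names below are hypothetical and `List` stands in for `ResponseEntity`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WildcardAcceptsAny {
    /** Compiles for every List<X> argument because List<?> is a supertype of each parameterization. */
    static int sizeOfAny(List<?> list) {
        return list.size();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>(Arrays.asList("a", "b"));
        List<byte[]> blobs = new ArrayList<>();
        System.out.println(sizeOfAny(strings));
        System.out.println(sizeOfAny(blobs));
    }
}
```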
misonijnik force-pushed the misonijnik/match-generic-types branch from e1dc401 to 8baa50c on April 30, 2026 22:30
seqradev requested a review from Saloed on May 1, 2026 05:10