Skip to content

text.contains underspecified: Text or pattern? #5751

@ftsfranklin

Description

@ftsfranklin

What's up?

The text substring search functions (contains, starts_with, ends_with) take a pattern rather than plain text, which may be surprising.

The documentation doesn't specify if it's a substring or a pattern.

Since it doesn't escape special characters, it is treated like a pattern, so wildcards pass through:

  • Input: filter (text.contains '%ne_dle' haystack)
  • Output: AND haystack LIKE CONCAT('%', '%ne_dle', '%')
  • The current behavior uses CONCAT to surround the input with %, even if you give a literal. (Some SQL flavors don't allow three args to CONCAT, such as Oracle before 23

What's the intended behavior? It should be documented.
, and Redshift.)

  • If it's intended to be a pattern:
    • For literals, it SHOULD just do LIKE '%ne_dle%', right? This is a minor optimization.
  • If it's meant to be plain text:
    • Escape: LIKE '%ne\_dle%' ESCAPE '\'
      • Transact-SQL: Special characters are %_[].
      • The escape character can be any symbol character which isn't in the string, or else it would need to double existing escape characters in the string.
    • Use INSTR(haystack, 'ne_dle') > 0.
      • Microsoft's Transact: CHARINDEX instead of INSTR.
      • For starts_with, compare to = 1.
      • For ends_with, this may be complicated.
        • INSTR(haystack, 'ne_dle', -1) will search backwards, but the right value is dependent on the lengths of both strings: = LENGTH(haystack) - LENGTH('ne_dle') + 1
        • MySQL doesn't have three-arg INSTR. Instead, it has LOCATE, which can't search from the end of the string, so it will return the first position.
        • SUBSTR(haystack, LENGTH(haystack) - LENGTH('ne_dle') + 1) = 'ne_dle', but may be inefficient by copying.
        • REVERSE turns ends_with into starts_with, but may be inefficient.

If it's supposed to be plain text, it may also be useful to give LIKE as an alternative, such as a text.like function (#1123). I don't know what the project's principle is for handling differences in LIKE syntax.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions