Support JSON-style escaping in SCIM filter string literals (#913) #927

Open
vnekhoroshev-work wants to merge 3 commits into Captain-P-Goldfish:main from vnekhoroshev-work:filter_support_escaped_symbols

Conversation

@vnekhoroshev-work

Summary

Fixes #913: SCIM filter string literals did not support escaping, so filters could not express attribute values that contain " or \, and some URL-encoded filters failed with lexer errors (e.g. unquoted :) instead of matching stored data.

Per RFC 7644 §3.4.2.2, compValue string syntax should align with JSON string rules. The previous grammar treated the content between quotes as a minimal non-greedy run (STRING: .+?), which does not model escapes and breaks on real-world names (e.g. Azure-style display names).
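For clients, the practical consequence is that a raw attribute value has to be escaped before it is placed between the quotes of a filter. The following is a minimal sketch of such a step, assuming the JSON-style escapes listed in this PR; `ScimFilterEscaper`, `escape` and `eqFilter` are hypothetical names used for illustration and are not part of the SDK.

```java
final class ScimFilterEscaper
{

  private ScimFilterEscaper()
  {
    // Utility class (hypothetical, for illustration only)
  }

  /** Escapes a raw value so that it can be placed between double quotes in a filter. */
  static String escape(String raw)
  {
    StringBuilder out = new StringBuilder(raw.length() + 8);
    for ( int i = 0 ; i < raw.length() ; i++ )
    {
      char c = raw.charAt(i);
      switch (c)
      {
        case '"':
          out.append("\\\""); // double quote becomes backslash + quote
          break;
        case '\\':
          out.append("\\\\"); // backslash becomes two backslashes
          break;
        case '\b':
          out.append("\\b");
          break;
        case '\f':
          out.append("\\f");
          break;
        case '\n':
          out.append("\\n");
          break;
        case '\r':
          out.append("\\r");
          break;
        case '\t':
          out.append("\\t");
          break;
        default:
          if (c < 0x20)
          {
            // Remaining ASCII control characters become four-digit hex escapes.
            out.append(String.format("\\u%04x", (int)c));
          }
          else
          {
            out.append(c);
          }
          break;
      }
    }
    return out.toString();
  }

  /** Builds an "attribute eq value" filter with the value escaped and quoted. */
  static String eqFilter(String attribute, String rawValue)
  {
    return attribute + " eq \"" + escape(rawValue) + "\"";
  }
}
```

With this sketch, `ScimFilterEscaper.eqFilter("displayName", "This is \"test\" user")` yields the filter text `displayName eq "This is \"test\" user"`, which is the escaped form the new grammar is meant to accept.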

What changed

  • ScimFilter.g4: Replaced the old quoted-string rule with explicit tokens: a string literal is " followed by zero or more ESC (backslash escapes) or SAFE_CODE_POINT (any character except ", \, or ASCII control characters), then closing ". Escapes support \", \\, \/, \b, \f, \n, \r, \t, and \u hex unicode sequences—aligned with typical JSON-style escaping used in filters.
  • CompareValue: After parsing, the compare value is unescaped with StringEscapeUtils.unescapeJava(...) and the outer quotes are stripped, so the value used for matching equals the stored string (e.g. This is "test" user).
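The token shape described above can be approximated outside of ANTLR with a regular expression. This is only a sketch that mirrors the description in this PR, not the actual `ScimFilter.g4` rule; the class and method names are hypothetical, and it assumes that "ASCII control characters" means U+0000 through U+001F, as in JSON.

```java
import java.util.regex.Pattern;

final class FilterStringLiteralSketch
{

  // ESC: a backslash followed by one of " \ / b f n r t, or by u and four hex digits.
  private static final String ESC = "\\\\(?:[\"\\\\/bfnrt]|u[0-9a-fA-F]{4})";

  // SAFE_CODE_POINT: any character except the double quote, the backslash and
  // the control characters U+0000 to U+001F (assumption, see lead-in).
  private static final String SAFE_CODE_POINT = "[^\"\\\\\\x00-\\x1F]";

  // A string literal: opening quote, zero or more ESC or SAFE_CODE_POINT, closing quote.
  private static final Pattern STRING_LITERAL = Pattern.compile("\"(?:" + ESC + "|" + SAFE_CODE_POINT
                                                                + ")*\"");

  private FilterStringLiteralSketch()
  {
    // Utility class (hypothetical, for illustration only)
  }

  /** Returns true if the whole token has the shape of a quoted string literal. */
  static boolean isStringLiteral(String token)
  {
    return STRING_LITERAL.matcher(token).matches();
  }

  /** Strips the outer quotes of a valid literal; unescaping would be the next step. */
  static String stripQuotes(String token)
  {
    if (!isStringLiteral(token))
    {
      throw new IllegalArgumentException("Not a string literal: " + token);
    }
    return token.substring(1, token.length() - 1);
  }
}
```

Under these assumptions, a token such as `"This is \"test\" user"` is accepted, while `"a\qb"` (unknown escape) and `"a"b"` (unescaped inner quote) are rejected.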

Tests

  • FilterVisitorTest: Parameterized parsing of eq filters whose string operands use escaped quotes (including a long special-character style value).
  • FilterResourceResolverTest: Dynamic EQ cases for top-level strings, complex / multi-complex attributes, and string arrays when stored values contain embedded double quotes and the filter uses escaped form.
  • ResourceEndpointTest: End-to-end GET (query string) and POST (SearchRequest) list flows with filters using escaped quotes and sw (starts with), verifying correct handler invocation and result counts.
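For the GET flow, the escaped filter additionally has to be URL-encoded before it goes into the query string. The sketch below shows one way a client might do that with the JDK alone; `FilterQuerySketch` and the example base URL are hypothetical, and the `Charset` overloads of `URLEncoder.encode` and `URLDecoder.decode` assume Java 10 or newer. Note that `URLEncoder` applies form encoding, so spaces become `+`.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FilterQuerySketch
{

  private FilterQuerySketch()
  {
    // Utility class (hypothetical, for illustration only)
  }

  /** Appends the already escaped filter to the list endpoint as a URL-encoded query parameter. */
  static String listUrl(String baseUrl, String escapedFilter)
  {
    return baseUrl + "?filter=" + URLEncoder.encode(escapedFilter, StandardCharsets.UTF_8);
  }

  /** Decodes the parameter value again, as a server would before handing it to the filter parser. */
  static String decodeFilter(String encodedFilter)
  {
    return URLDecoder.decode(encodedFilter, StandardCharsets.UTF_8);
  }
}
```

For the filter text `displayName sw "This is \"te"`, `listUrl("https://example.com/scim/v2/Users", ...)` produces `https://example.com/scim/v2/Users?filter=displayName+sw+%22This+is+%5C%22te%22`, and decoding the parameter value restores the original filter text, including the backslash escapes that the grammar then resolves.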

@Captain-P-Goldfish Captain-P-Goldfish self-requested a review April 21, 2026 17:52
@Captain-P-Goldfish
Owner

Captain-P-Goldfish commented Apr 23, 2026

If possible, I would like to avoid an additional dependency when only a single method of the library is used. I suggest removing the library, since we do not need every use case of its unescape function, and using the implementation below instead. It will also do the trick.

package de.captaingoldfish.scim.sdk.common.constants;

/**
 * Utility class for unescaping Java/JSON-style escape sequences within strings.
 * <p>
 * This implementation is intentionally lightweight and independent of external libraries. It is designed for
 * use cases such as SCIM filter parsing, where quoted string values may contain escaped characters that must
 * be converted back into their literal form before comparison.
 * </p>
 * <p>
 * The following escape sequences are supported:
 * </p>
 * <ul>
 * <li>{@code \b} -> backspace</li>
 * <li>{@code \t} -> tab</li>
 * <li>{@code \n} -> newline</li>
 * <li>{@code \f} -> form feed</li>
 * <li>{@code \r} -> carriage return</li>
 * <li>{@code \"} -> double quote</li>
 * <li>{@code \'} -> single quote</li>
 * <li>{@code \\} -> backslash</li>
 * <li>{@code \/} -> forward slash</li>
 * <li>{@code \\uXXXX} -> unicode escape with exactly four hexadecimal digits</li>
 * </ul>
 * <p>
 * Unknown escape sequences (for example {@code \x}) are preserved as-is to avoid unintended data loss. A
 * trailing backslash is also preserved literally.
 * </p>
 * <p>
 * This class does not aim to be a full drop-in replacement for {@code StringEscapeUtils.unescapeJava(...)}.
 * Instead, it deliberately supports the escape sequences required by the filter grammar and a few closely
 * related variants that are commonly expected by developers.
 * </p>
 */
public final class JavaStringUnescaper
{

  private JavaStringUnescaper()
  {
    // Utility class
  }

  /**
   * Unescapes supported Java/JSON-style escape sequences in the given input string.
   * <p>
   * If the input is {@code null}, this method returns {@code null}.
   * </p>
   * <p>
   * Supported examples:
   * </p>
   *
   * <pre>
   * {@code
   * unescapeJava("hello\\nworld")   -> "hello\nworld"
   * unescapeJava("\\\"test\\\"")    -> "\"test\""
   * unescapeJava("foo\\/bar")       -> "foo/bar"
   * unescapeJava("\\u0041")         -> "A"
   * }
   * </pre>
   *
   * @param input the input string that may contain escape sequences
   * @return the unescaped string, or {@code null} if the input is {@code null}
   * @throws IllegalArgumentException if an incomplete or invalid unicode escape sequence is encountered
   */
  public static String unescapeJava(String input)
  {
    // Preserve null semantics so callers do not need an additional null check.
    if (input == null)
    {
      return null;
    }

    // Pre-size the builder to roughly the input length to reduce resizing overhead.
    StringBuilder result = new StringBuilder(input.length());

    // Walk through the input one character at a time.
    for ( int i = 0 ; i < input.length() ; i++ )
    {
      char current = input.charAt(i);

      // Fast path for ordinary characters: append directly.
      if (current != '\\')
      {
        result.append(current);
        continue;
      }

      // A trailing backslash cannot form a valid escape sequence.
      // We preserve it literally instead of throwing an exception.
      if (i + 1 >= input.length())
      {
        result.append('\\');
        break;
      }

      // Consume the next character to determine the escape sequence.
      char next = input.charAt(++i);

      switch (next)
      {
        case 'b':
          result.append('\b'); // backspace
          break;

        case 't':
          result.append('\t'); // horizontal tab
          break;

        case 'n':
          result.append('\n'); // newline
          break;

        case 'f':
          result.append('\f'); // form feed
          break;

        case 'r':
          result.append('\r'); // carriage return
          break;

        case '"':
          result.append('\"'); // escaped double quote
          break;

        case '\'':
          result.append('\''); // escaped single quote
          break;

        case '\\':
          result.append('\\'); // escaped backslash
          break;

        case '/':
          result.append('/'); // escaped forward slash (JSON-style)
          break;

        case 'u':
          // Unicode escape sequence: \\uXXXX
          // Exactly four hexadecimal digits must follow.
          if (i + 4 >= input.length())
          {
            throw new IllegalArgumentException("Incomplete unicode escape sequence at index " + (i - 1));
          }

          String hex = input.substring(i + 1, i + 5);

          // Integer.parseInt(hex, 16) alone would also accept a leading sign such as "+041" or "-001",
          // so require four plain hexadecimal digits explicitly.
          if (!hex.matches("[0-9a-fA-F]{4}"))
          {
            throw new IllegalArgumentException("Invalid unicode escape sequence: \\u" + hex);
          }

          result.append((char)Integer.parseInt(hex, 16));

          // Skip the four hex digits because they were already consumed.
          i += 4;
          break;

        default:
          // Preserve unknown escape sequences literally.
          // Example: "\x" remains "\x".
          result.append('\\').append(next);
          break;
      }
    }

    return result.toString();
  }
}

package de.captaingoldfish.scim.sdk.common.constants;

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;


/**
 * Test suite for {@link JavaStringUnescaper}.
 * <p>
 * These tests validate the supported escape sequences and the explicitly defined edge-case behavior of the
 * custom unescape implementation.
 * </p>
 * <p>
 * The goal is to verify the behavior required by the filter grammar and matching logic, not to replicate
 * every detail of Apache Commons Text.
 * </p>
 */
class JavaStringUnescaperTest
{

  /**
   * Verifies that {@code null} input is returned unchanged.
   * <p>
   * This ensures callers do not need to add explicit null checks.
   * </p>
   */
  @Test
  @DisplayName("Should return null when input is null")
  void shouldReturnNull()
  {
    Assertions.assertNull(JavaStringUnescaper.unescapeJava(null));
  }

  /**
   * Ensures that strings without escape sequences remain unchanged.
   */
  @Test
  @DisplayName("Should return unchanged string when no escape sequences are present")
  void shouldReturnUnchangedString()
  {
    String input = "hello world";

    String result = JavaStringUnescaper.unescapeJava(input);

    Assertions.assertEquals("hello world", result);
  }

  /**
   * Validates common control character escapes.
   */
  @Test
  @DisplayName("Should correctly unescape common control sequences")
  void shouldUnescapeControlSequences()
  {
    Assertions.assertEquals("hello\nworld", JavaStringUnescaper.unescapeJava("hello\\nworld"));
    Assertions.assertEquals("a\tb", JavaStringUnescaper.unescapeJava("a\\tb"));
    Assertions.assertEquals("line1\rline2", JavaStringUnescaper.unescapeJava("line1\\rline2"));
  }

  /**
   * Validates less frequently used control characters.
   */
  @Test
  @DisplayName("Should correctly unescape backspace and form feed")
  void shouldUnescapeBackspaceAndFormFeed()
  {
    Assertions.assertEquals("a\bb", JavaStringUnescaper.unescapeJava("a\\bb"));
    Assertions.assertEquals("a\fb", JavaStringUnescaper.unescapeJava("a\\fb"));
  }

  /**
   * Ensures correct handling of quotes and backslashes.
   */
  @Test
  @DisplayName("Should correctly unescape quotes and backslash")
  void shouldUnescapeQuotesAndBackslash()
  {
    Assertions.assertEquals("\"test\"", JavaStringUnescaper.unescapeJava("\\\"test\\\""));
    Assertions.assertEquals("'", JavaStringUnescaper.unescapeJava("\\'"));
    Assertions.assertEquals("\\", JavaStringUnescaper.unescapeJava("\\\\"));
  }

  /**
   * Ensures JSON-style escaped forward slashes are supported.
   */
  @Test
  @DisplayName("Should correctly unescape forward slash")
  void shouldUnescapeForwardSlash()
  {
    Assertions.assertEquals("foo/bar", JavaStringUnescaper.unescapeJava("foo\\/bar"));
    Assertions.assertEquals("/", JavaStringUnescaper.unescapeJava("\\/"));
  }

  /**
   * Verifies correct decoding of unicode escape sequences.
   */
  @Test
  @DisplayName("Should correctly unescape unicode sequences")
  void shouldUnescapeUnicode()
  {
    Assertions.assertEquals("A", JavaStringUnescaper.unescapeJava("\\u0041"));
    Assertions.assertEquals("ö", JavaStringUnescaper.unescapeJava("\\u00F6"));
    Assertions.assertEquals("!", JavaStringUnescaper.unescapeJava("\\u0021"));
  }

  /**
   * Ensures incomplete unicode escapes fail fast.
   */
  @Test
  @DisplayName("Should throw exception on incomplete unicode escape sequence")
  void shouldThrowOnIncompleteUnicode()
  {
    IllegalArgumentException exception = Assertions.assertThrows(IllegalArgumentException.class,
                                                                 () -> JavaStringUnescaper.unescapeJava("\\u12"));

    Assertions.assertTrue(exception.getMessage().contains("Incomplete unicode"));
  }

  /**
   * Ensures invalid unicode escapes fail fast.
   */
  @Test
  @DisplayName("Should throw exception on invalid unicode escape sequence")
  void shouldThrowOnInvalidUnicode()
  {
    IllegalArgumentException exception = Assertions.assertThrows(IllegalArgumentException.class,
                                                                 () -> JavaStringUnescaper.unescapeJava("\\uZZZZ"));

    Assertions.assertTrue(exception.getMessage().contains("Invalid unicode"));
  }

  /**
   * Verifies that unknown escape sequences are preserved.
   */
  @Test
  @DisplayName("Should preserve unknown escape sequences as-is")
  void shouldPreserveUnknownEscapes()
  {
    Assertions.assertEquals("\\q", JavaStringUnescaper.unescapeJava("\\q"));
    Assertions.assertEquals("\\x", JavaStringUnescaper.unescapeJava("\\x"));
  }

  /**
   * Ensures trailing backslashes are preserved.
   */
  @Test
  @DisplayName("Should preserve trailing backslash")
  void shouldPreserveTrailingBackslash()
  {
    Assertions.assertEquals("test\\", JavaStringUnescaper.unescapeJava("test\\"));
  }

  /**
   * Validates mixed escape usage in a realistic input.
   */
  @Test
  @DisplayName("Should handle mixed content with multiple escape types")
  void shouldHandleMixedContent()
  {
    String input = "Hello\\nWorld\\t\\u0021 \\\"test\\\" foo\\/bar";

    String result = JavaStringUnescaper.unescapeJava(input);

    Assertions.assertEquals("Hello\nWorld\t! \"test\" foo/bar", result);
  }

  /**
   * Verifies SCIM filter-style usage: escaped quotes are correctly converted before matching.
   */
  @Test
  @DisplayName("Should unescape filter-style quoted content for matching")
  void shouldUnescapeFilterStyleQuotedContentForMatching()
  {
    String input = "This is \\\"test\\\" user";

    String result = JavaStringUnescaper.unescapeJava(input);

    Assertions.assertEquals("This is \"test\" user", result);
  }
}

@vnekhoroshev-work
Author

Thanks for helping!

I replaced the commons-text library with the JavaStringUnescaper class.

Development

Successfully merging this pull request may close these issues.

SCIM SDK does not support filter values escaping for special symbols

2 participants