Commit 75968a9

fix: support non-Latin text in InMemoryMemoryService search
Fixes #5501. `_extract_words_lower` used the regex `[A-Za-z]+`, which matches only ASCII letters and silently discarded Japanese, Chinese, Korean, Cyrillic, and other non-Latin text. Change it to `\w+` with `re.UNICODE` so that all Unicode word characters match.
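To illustrate the behavior described in the message, here is a minimal sketch that compares the two patterns on a few sample strings. The helper names `extract_old` and `extract_new` and the sample strings are hypothetical and exist only for this comparison; the two regexes are taken from the commit. Note that in Python 3, `str` patterns are Unicode-aware by default, so passing `re.UNICODE` should not change the result compared with `\w+` alone.

```python
import re

OLD_PATTERN = r'[A-Za-z]+'  # pattern before this commit: ASCII letters only
NEW_PATTERN = r'\w+'        # pattern after this commit: Unicode word characters


def extract_old(text: str) -> set[str]:
  """Hypothetical stand-in for the helper before the change."""
  return {word.lower() for word in re.findall(OLD_PATTERN, text)}


def extract_new(text: str) -> set[str]:
  """Hypothetical stand-in for the helper after the change."""
  return {word.lower() for word in re.findall(NEW_PATTERN, text, re.UNICODE)}


# Illustrative inputs, not taken from the issue report.
samples = ['Hello World', 'こんにちは 世界', 'Привет мир']
for sample in samples:
  print(repr(sample), extract_old(sample), extract_new(sample))
```

With the old pattern the Japanese and Russian samples produce an empty set, so a query in those languages has no words to match on; with the new pattern each whitespace-separated run becomes a lowercase token.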
1 parent c87ee1e commit 75968a9

1 file changed

Lines changed: 1 addition & 1 deletion

src/google/adk/memory/in_memory_memory_service.py

@@ -39,7 +39,7 @@ def _user_key(app_name: str, user_id: str) -> str:
 
 
 def _extract_words_lower(text: str) -> set[str]:
   """Extracts words from a string and converts them to lowercase."""
-  return set([word.lower() for word in re.findall(r'[A-Za-z]+', text)])
+  return set([word.lower() for word in re.findall(r'\w+', text, re.UNICODE)])
 
 
 class InMemoryMemoryService(BaseMemoryService):
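Two side effects of the new pattern may be worth checking when reviewing this change. `\w` also matches digits and underscores, so tokens such as `404` or `error_code` are now kept whole, and text written without spaces (common in Japanese and Chinese) comes out as a single token. The sketch below assumes that search compares word sets for overlap; `shares_a_word` is a hypothetical matcher written for illustration and is not the service's actual search code.

```python
import re


def extract_words_lower(text: str) -> set[str]:
  """Mirrors the patched helper from the diff above."""
  return {word.lower() for word in re.findall(r'\w+', text, re.UNICODE)}


def shares_a_word(query: str, stored_text: str) -> bool:
  """Hypothetical matcher: true if query and stored text share any token."""
  query_words = extract_words_lower(query)
  stored_words = extract_words_lower(stored_text)
  return not query_words.isdisjoint(stored_words)


# Digits and underscores are word characters, so these stay as whole tokens.
print(extract_words_lower('error_code 404'))

# Space-separated text yields a token equal to the query.
print(shares_a_word('世界', 'こんにちは 世界'))

# Unsegmented text is a single long token, so the same query does not match.
print(shares_a_word('世界', 'こんにちは世界'))
```

Under this assumed overlap check, the last call returns `False` because `\w+` does no word segmentation; matching inside unsegmented text would need a tokenizer or substring comparison, which is outside the scope of this commit.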
