[search] support uppercase OR/AND by drew-u410 · Pull Request #500 · sourcebot-dev/sourcebot

drew-u410 · 2025-09-09T12:08:25Z

Support uppercase AND and OR in search syntax.
Add a few tests around how this works.

Summary by CodeRabbit

New Features
- Search now accepts uppercase boolean operators (AND, OR) in addition to lowercase, including within parentheses and with prefixes (e.g., (file:yarn.lock OR file:package.json)).
Documentation
- Updated syntax reference to state boolean operators are case-insensitive.
- Added examples demonstrating uppercase OR (e.g., foo OR bar) and combined prefix usage.

coderabbitai · 2025-09-09T12:08:33Z

Walkthrough

Added query normalization to convert uppercase boolean operators (OR, AND) to lowercase in transformZoektQuery. Introduced unit tests validating normalization across varied cases and parentheses. Updated documentation to state boolean operators are case-insensitive and added examples, including prefixed expressions with uppercase OR.

Changes

Cohort / File(s)	Summary
Docs: Search syntax `docs/docs/features/search/syntax-reference.mdx`	Clarifies boolean operators are case-insensitive; adds examples for `OR` and a combined `(file:yarn.lock OR file:package.json)` case.
Search API: Query normalization `packages/web/src/features/search/searchApi.ts`	In transformZoektQuery, adds normalization of standalone `OR`/`AND` to `or`/`and` before existing parsing/transforms; no API/signature changes.
Tests: Query normalization `packages/web/src/features/search/searchApi.test.ts`	New Vitest suite for normalizeQueryOperators behavior: standalone operators, parentheses, mixed/complex queries, non-operator substrings, idempotence on lowercase, multiple occurrences.

Sequence Diagram(s)

sequenceDiagram
  participant UI as UI
  participant SearchAPI as transformZoektQuery
  participant Normalizer as OperatorNormalizer
  participant Transformer as QueryTransformer
  participant Backend as Search Backend

  UI->>SearchAPI: queryString
  activate SearchAPI
  SearchAPI->>Normalizer: normalize(queryString)
  activate Normalizer
  Note over Normalizer: Replace standalone OR/AND<br/>with or/and
  Normalizer-->>SearchAPI: normalizedQuery
  deactivate Normalizer

  SearchAPI->>Transformer: transform(normalizedQuery)
  activate Transformer
  Note over Transformer: Existing mapping (rev:, context:, etc.)
  Transformer-->>SearchAPI: transformedQuery
  deactivate Transformer

  SearchAPI->>Backend: transformedQuery
  Backend-->>SearchAPI: results
  SearchAPI-->>UI: results
  deactivate SearchAPI

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title clearly summarizes the central change of the pull request by indicating that the search feature now supports uppercase OR and AND operators, matching the main modifications in the changeset.
Description Check	✅ Passed	The description succinctly outlines the key adjustments made by the pull request, specifically adding support for uppercase boolean operators and related tests, which directly correspond to the implemented changes.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

Poem

I nibble on queries, hop through AND and OR,
Flip caps to lowercase, then bound across the floor.
In docs I leave footprints: examples, crisp and neat—
Tests like clover patches make the change complete.
With whiskers twitching, I search for more to explore. 🐇🔎

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

drew-u410 · 2025-09-09T12:09:59Z

@coderabbitai review

coderabbitai · 2025-09-09T12:10:07Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/docs/features/search/syntax-reference.mdx (1)
5-5: Fix grammar: “enabled” → “enables”.

User-facing docs typo.
-Sourcebot uses a powerful regex-based query language that enabled precise code search within large codebases.
+Sourcebot uses a powerful regex-based query language that enables precise code search within large codebases.

🧹 Nitpick comments (5)

packages/web/src/features/search/searchApi.ts (2)
39-50: Confirm intended scope: mixed-case operators (e.g., 'Or', 'AnD') are not normalized.

If the docs promise “case-insensitive operators,” extend normalization to any case; otherwise, clarify docs to “lowercase and uppercase only.”

Example tweak inside the helper if full case-insensitivity is desired:
// replace detection with:
const lowerTwo = input.slice(i, i + 2).toLowerCase();
const lowerThree = input.slice(i, i + 3).toLowerCase();
if (lowerTwo === "or" && isBoundary(prev) && isBoundary(next2)) { out += "or"; i += 2; continue; }
if (lowerThree === "and" && isBoundary(prev) && isBoundary(next3)) { out += "and"; i += 3; continue; }
51-51: Tokenization is fragile with split(" "); consider resilient whitespace handling.

split(" ") drops tabs/multiple spaces into empty tokens. If you keep this approach, prefer split(/\s+/). If you later parse quotes/parentheses, a proper tokenizer would be safer.

Apply:
-    const prevQueryParts = normalizedQuery.split(" ");
+    const prevQueryParts = normalizedQuery.trim().split(/\s+/);
docs/docs/features/search/syntax-reference.mdx (1)
18-18: Clarify AND semantics.

Mention that AND is implicit; uppercase AND is accepted but unnecessary.
-Multiple expressions can be or'd together with `or` (or `OR`), negated with `-`, or grouped with `()`. Both lowercase and uppercase boolean operators are supported.
+Multiple expressions can be or'd together with `or` (or `OR`), negated with `-`, or grouped with `()`. `and` (and `AND`) is supported but implicit between terms.
packages/web/src/features/search/searchApi.test.ts (2)
3-11: Avoid duplicating production logic in tests; import the helper instead.

Export the normalization helper from searchApi and import it here to prevent drift.

Apply:
-import { describe, it, expect } from 'vitest';
+import { describe, it, expect } from 'vitest';
+import { normalizeBoolOperators } from './searchApi';
@@
-// Simple function to test the query transformation logic
-// We'll extract just the normalization part to test it separately
-const normalizeQueryOperators = (query: string): string => {
-    return query
-        // Replace standalone uppercase OR with lowercase or
-        .replace(/\bOR\b/g, 'or')
-        // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
-        .replace(/\bAND\b/g, 'and');
-};
+// Using the actual implementation under test
+const normalizeQueryOperators = normalizeBoolOperators;
48-56: Decide on mixed-case operators and test accordingly.

If “case-insensitive” is intended, add tests for Or/AnD; otherwise, add a test asserting they are left unchanged and reflect that in docs.

Example test to add:
+        it('should (not) convert mixed-case operators depending on policy', () => {
+            // If supporting case-insensitive:
+            // expect(normalizeQueryOperators('A Or B AnD C')).toBe('A or B and C');
+            // Else:
+            expect(normalizeQueryOperators('A Or B AnD C')).toBe('A Or B AnD C');
+        });

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ccd0706 and 8cc896d.

📒 Files selected for processing (3)

docs/docs/features/search/syntax-reference.mdx (1 hunks)
packages/web/src/features/search/searchApi.test.ts (1 hunks)
packages/web/src/features/search/searchApi.ts (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

coderabbitai · 2025-09-09T12:15:54Z

+        it('should not affect lowercase operators', () => {
+            expect(normalizeQueryOperators('file:yarn.lock or file:package.json'))
+                .toBe('file:yarn.lock or file:package.json');
+        });
+
+        it('should not affect OR/AND when part of other words', () => {
+            expect(normalizeQueryOperators('ORDER BY something'))
+                .toBe('ORDER BY something');
+
+            expect(normalizeQueryOperators('ANDROID app'))
+                .toBe('ANDROID app');
+        });


⚠️ Potential issue

Add tests to ensure quoted literals are not altered.

Covers the critical case the regex would break.

it('should not affect OR/AND when part of other words', () => { expect(normalizeQueryOperators('ORDER BY something')) .toBe('ORDER BY something'); expect(normalizeQueryOperators('ANDROID app')) .toBe('ANDROID app'); }); + it('should not alter quoted phrases containing OR/AND', () => { + expect(normalizeQueryOperators('"A OR B"')).toBe('"A OR B"'); + expect(normalizeQueryOperators('content:"\\bOR\\b"')).toBe('content:"\\bOR\\b"'); + });

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

it('should not affect lowercase operators', () => {

expect(normalizeQueryOperators('file:yarn.lock or file:package.json'))

.toBe('file:yarn.lock or file:package.json');

});

it('should not affect OR/AND when part of other words', () => {

expect(normalizeQueryOperators('ORDER BY something'))

.toBe('ORDER BY something');

expect(normalizeQueryOperators('ANDROID app'))

.toBe('ANDROID app');

});

it('should not affect lowercase operators', () => {

expect(normalizeQueryOperators('file:yarn.lock or file:package.json'))

.toBe('file:yarn.lock or file:package.json');

});

it('should not affect OR/AND when part of other words', () => {

expect(normalizeQueryOperators('ORDER BY something'))

.toBe('ORDER BY something');

expect(normalizeQueryOperators('ANDROID app'))

.toBe('ANDROID app');

});

it('should not alter quoted phrases containing OR/AND', () => {

expect(normalizeQueryOperators('"A OR B"')).toBe('"A OR B"');

expect(normalizeQueryOperators('content:"\\bOR\\b"')).toBe('content:"\\bOR\\b"');

});

🤖 Prompt for AI Agents

In packages/web/src/features/search/searchApi.test.ts around lines 35 to 46, add tests that assert normalizeQueryOperators does not modify quoted literals: create one or more it blocks passing strings with quoted phrases containing operator-like tokens (e.g. '"OR"', '"AND"', '"or"' or '"OR in a phrase"') and expect the output to equal the original input; ensure both single and double quoted cases are covered and that operators inside quotes remain unchanged.

coderabbitai · 2025-09-09T12:15:54Z

+    // First, normalize boolean operators to lowercase (Zoekt requirement)
+    // Zoekt only recognizes lowercase 'or' and 'and' operators, but users often expect
+    // uppercase OR/AND to work. This transformation allows both cases to work.
+    // Examples:
+    //   - "(file:yarn.lock OR file:package.json)" → "(file:yarn.lock or file:package.json)"
+    //   - "foo AND bar" → "foo and bar" (though AND is implicit in Zoekt)
+    let normalizedQuery = query
+        // Replace standalone uppercase OR with lowercase or
+        .replace(/\bOR\b/g, 'or')
+        // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
+        .replace(/\bAND\b/g, 'and');
+


⚠️ Potential issue

Avoid rewriting OR/AND inside quoted phrases; implement token-aware normalization.

The regex replaces OR/AND anywhere, including within quoted literals (e.g., "A OR B" or content:"\bOR\b"), which can change query semantics, especially under case-sensitive searches. Limit normalization to tokens outside quotes and only when surrounded by whitespace/parentheses.

Apply this diff to delegate to a safer helper:

- let normalizedQuery = query - // Replace standalone uppercase OR with lowercase or - .replace(/\bOR\b/g, 'or') - // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt) - .replace(/\bAND\b/g, 'and'); + const normalizedQuery = normalizeBoolOperators(query);

Add this helper (outside the shown range, e.g., above transformZoektQuery):

export const normalizeBoolOperators = (input: string): string => { let out = ""; let i = 0; let inQuotes = false; const isBoundary = (ch: string | undefined) => !ch || /\s|\(|\)/.test(ch); while (i < input.length) { const ch = input[i]; if (ch === '"') { inQuotes = !inQuotes; out += ch; i++; continue; } if (!inQuotes) { const two = input.slice(i, i + 2); const three = input.slice(i, i + 3); const prev = i > 0 ? input[i - 1] : undefined; const next2 = input[i + 2]; const next3 = input[i + 3]; if (two === "OR" && isBoundary(prev) && isBoundary(next2)) { out += "or"; i += 2; continue; } if (three === "AND" && isBoundary(prev) && isBoundary(next3)) { out += "and"; i += 3; continue; } } out += ch; i++; } return out; };

This is a good catch, good bot. @drew-u410 we will need to handle this case - a solution like the bot suggested looks good (as long as we pair it with some good UTs, thx for starting those 🤠).

Longer term, I'd like to move to using a parser generator like Lezer where we can formally define the grammar of Sourcebot's search query language, have it generate a syntax tree for us, and then translate that syntax tree into a zoekt compatible query. Lezer has the added benefit that it's what Codemirror uses, so we could get additional benefits like error squiggles & better syntax highlighting support for our search box.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

brendan-kellam

thx for the contribution! Some thinking still to be done w.r.t., handling operators enclosed in quotes, but overall supportive of this change

brendan-kellam · 2025-09-11T22:03:10Z

+        // Replace standalone uppercase OR with lowercase or
+        .replace(/\bOR\b/g, 'or')
+        // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
+        .replace(/\bAND\b/g, 'and');


and isn't actually a boolean operator that is natively supported in zoekt and by extension Sourcebot (I see that Sourcegraph supports it).

I do see the value in having a explicit and boolean operator though, but maybe something we want to roll into a separate change.

brendan-kellam · 2025-09-11T22:14:44Z

+    // First, normalize boolean operators to lowercase (Zoekt requirement)
+    // Zoekt only recognizes lowercase 'or' and 'and' operators, but users often expect
+    // uppercase OR/AND to work. This transformation allows both cases to work.
+    // Examples:
+    //   - "(file:yarn.lock OR file:package.json)" → "(file:yarn.lock or file:package.json)"
+    //   - "foo AND bar" → "foo and bar" (though AND is implicit in Zoekt)
+    let normalizedQuery = query
+        // Replace standalone uppercase OR with lowercase or
+        .replace(/\bOR\b/g, 'or')
+        // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
+        .replace(/\bAND\b/g, 'and');
+


This is a good catch, good bot. @drew-u410 we will need to handle this case - a solution like the bot suggested looks good (as long as we pair it with some good UTs, thx for starting those 🤠).

Longer term, I'd like to move to using a parser generator like Lezer where we can formally define the grammar of Sourcebot's search query language, have it generate a syntax tree for us, and then translate that syntax tree into a zoekt compatible query. Lezer has the added benefit that it's what Codemirror uses, so we could get additional benefits like error squiggles & better syntax highlighting support for our search box.

drew-u410 · 2025-09-12T11:03:51Z

Cool, will close for now- just wanted to start the discussion. If you want to drive it, feel free!

[search] support uppercase OR/AND

8cc896d

drew-u410 force-pushed the drew/uppercase_operators branch from 3793ff9 to 8cc896d Compare September 9, 2025 12:13

coderabbitai bot reviewed Sep 9, 2025

View reviewed changes

brendan-kellam reviewed Sep 11, 2025

View reviewed changes

drew-u410 closed this Sep 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[search] support uppercase OR/AND#500

[search] support uppercase OR/AND#500
drew-u410 wants to merge 1 commit intosourcebot-dev:mainfrom
drew-u410:drew/uppercase_operators

drew-u410 commented Sep 9, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 9, 2025 •

edited

Loading

Uh oh!

drew-u410 commented Sep 9, 2025

Uh oh!

coderabbitai bot commented Sep 9, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Sep 9, 2025

Uh oh!

coderabbitai bot Sep 9, 2025 •

edited

Loading

Uh oh!

brendan-kellam Sep 11, 2025

Uh oh!

coderabbitai bot Sep 11, 2025

Uh oh!

brendan-kellam left a comment

Uh oh!

brendan-kellam Sep 11, 2025

Uh oh!

brendan-kellam Sep 11, 2025

Uh oh!

drew-u410 commented Sep 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

drew-u410 commented Sep 9, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks (3 passed)

Poem

Uh oh!

drew-u410 commented Sep 9, 2025

Uh oh!

coderabbitai bot commented Sep 9, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

brendan-kellam Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

brendan-kellam left a comment

Choose a reason for hiding this comment

Uh oh!

brendan-kellam Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

brendan-kellam Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

drew-u410 commented Sep 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

drew-u410 commented Sep 9, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 9, 2025 •

edited

Loading

coderabbitai bot Sep 9, 2025 •

edited

Loading