Skip to content

[search] support uppercase OR/AND#500

Closed
drew-u410 wants to merge 1 commit intosourcebot-dev:mainfrom
drew-u410:drew/uppercase_operators
Closed

[search] support uppercase OR/AND#500
drew-u410 wants to merge 1 commit intosourcebot-dev:mainfrom
drew-u410:drew/uppercase_operators

Conversation

@drew-u410
Copy link
Copy Markdown
Contributor

@drew-u410 drew-u410 commented Sep 9, 2025

  • Support uppercase AND and OR in search syntax.
  • Add a few tests around how this works.

Summary by CodeRabbit

  • New Features
    • Search now accepts uppercase boolean operators (AND, OR) in addition to lowercase, including within parentheses and with prefixes (e.g., (file:yarn.lock OR file:package.json)).
  • Documentation
    • Updated syntax reference to state boolean operators are case-insensitive.
    • Added examples demonstrating uppercase OR (e.g., foo OR bar) and combined prefix usage.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Sep 9, 2025

Walkthrough

Added query normalization to convert uppercase boolean operators (OR, AND) to lowercase in transformZoektQuery. Introduced unit tests validating normalization across varied cases and parentheses. Updated documentation to state boolean operators are case-insensitive and added examples, including prefixed expressions with uppercase OR.

Changes

Cohort / File(s) Summary
Docs: Search syntax
docs/docs/features/search/syntax-reference.mdx
Clarifies boolean operators are case-insensitive; adds examples for OR and a combined (file:yarn.lock OR file:package.json) case.
Search API: Query normalization
packages/web/src/features/search/searchApi.ts
In transformZoektQuery, adds normalization of standalone OR/AND to or/and before existing parsing/transforms; no API/signature changes.
Tests: Query normalization
packages/web/src/features/search/searchApi.test.ts
New Vitest suite for normalizeQueryOperators behavior: standalone operators, parentheses, mixed/complex queries, non-operator substrings, idempotence on lowercase, multiple occurrences.

Sequence Diagram(s)

sequenceDiagram
  participant UI as UI
  participant SearchAPI as transformZoektQuery
  participant Normalizer as OperatorNormalizer
  participant Transformer as QueryTransformer
  participant Backend as Search Backend

  UI->>SearchAPI: queryString
  activate SearchAPI
  SearchAPI->>Normalizer: normalize(queryString)
  activate Normalizer
  Note over Normalizer: Replace standalone OR/AND<br/>with or/and
  Normalizer-->>SearchAPI: normalizedQuery
  deactivate Normalizer

  SearchAPI->>Transformer: transform(normalizedQuery)
  activate Transformer
  Note over Transformer: Existing mapping (rev:, context:, etc.)
  Transformer-->>SearchAPI: transformedQuery
  deactivate Transformer

  SearchAPI->>Backend: transformedQuery
  Backend-->>SearchAPI: results
  SearchAPI-->>UI: results
  deactivate SearchAPI
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly summarizes the central change of the pull request by indicating that the search feature now supports uppercase OR and AND operators, matching the main modifications in the changeset.
Description Check ✅ Passed The description succinctly outlines the key adjustments made by the pull request, specifically adding support for uppercase boolean operators and related tests, which directly correspond to the implemented changes.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

Poem

I nibble on queries, hop through AND and OR,
Flip caps to lowercase, then bound across the floor.
In docs I leave footprints: examples, crisp and neat—
Tests like clover patches make the change complete.
With whiskers twitching, I search for more to explore. 🐇🔎

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@drew-u410
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Sep 9, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@drew-u410 drew-u410 force-pushed the drew/uppercase_operators branch from 3793ff9 to 8cc896d Compare September 9, 2025 12:13
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/docs/features/search/syntax-reference.mdx (1)

5-5: Fix grammar: “enabled” → “enables”.

User-facing docs typo.

-Sourcebot uses a powerful regex-based query language that enabled precise code search within large codebases.
+Sourcebot uses a powerful regex-based query language that enables precise code search within large codebases.
🧹 Nitpick comments (5)
packages/web/src/features/search/searchApi.ts (2)

39-50: Confirm intended scope: mixed-case operators (e.g., 'Or', 'AnD') are not normalized.

If the docs promise “case-insensitive operators,” extend normalization to any case; otherwise, clarify docs to “lowercase and uppercase only.”

Example tweak inside the helper if full case-insensitivity is desired:

// replace detection with:
const lowerTwo = input.slice(i, i + 2).toLowerCase();
const lowerThree = input.slice(i, i + 3).toLowerCase();
if (lowerTwo === "or" && isBoundary(prev) && isBoundary(next2)) { out += "or"; i += 2; continue; }
if (lowerThree === "and" && isBoundary(prev) && isBoundary(next3)) { out += "and"; i += 3; continue; }

51-51: Tokenization is fragile with split(" "); consider resilient whitespace handling.

split(" ") drops tabs/multiple spaces into empty tokens. If you keep this approach, prefer split(/\s+/). If you later parse quotes/parentheses, a proper tokenizer would be safer.

Apply:

-    const prevQueryParts = normalizedQuery.split(" ");
+    const prevQueryParts = normalizedQuery.trim().split(/\s+/);
docs/docs/features/search/syntax-reference.mdx (1)

18-18: Clarify AND semantics.

Mention that AND is implicit; uppercase AND is accepted but unnecessary.

-Multiple expressions can be or'd together with `or` (or `OR`), negated with `-`, or grouped with `()`. Both lowercase and uppercase boolean operators are supported.
+Multiple expressions can be or'd together with `or` (or `OR`), negated with `-`, or grouped with `()`. `and` (and `AND`) is supported but implicit between terms.
packages/web/src/features/search/searchApi.test.ts (2)

3-11: Avoid duplicating production logic in tests; import the helper instead.

Export the normalization helper from searchApi and import it here to prevent drift.

Apply:

-import { describe, it, expect } from 'vitest';
+import { describe, it, expect } from 'vitest';
+import { normalizeBoolOperators } from './searchApi';
@@
-// Simple function to test the query transformation logic
-// We'll extract just the normalization part to test it separately
-const normalizeQueryOperators = (query: string): string => {
-    return query
-        // Replace standalone uppercase OR with lowercase or
-        .replace(/\bOR\b/g, 'or')
-        // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
-        .replace(/\bAND\b/g, 'and');
-};
+// Using the actual implementation under test
+const normalizeQueryOperators = normalizeBoolOperators;

48-56: Decide on mixed-case operators and test accordingly.

If “case-insensitive” is intended, add tests for Or/AnD; otherwise, add a test asserting they are left unchanged and reflect that in docs.

Example test to add:

+        it('should (not) convert mixed-case operators depending on policy', () => {
+            // If supporting case-insensitive:
+            // expect(normalizeQueryOperators('A Or B AnD C')).toBe('A or B and C');
+            // Else:
+            expect(normalizeQueryOperators('A Or B AnD C')).toBe('A Or B AnD C');
+        });
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ccd0706 and 8cc896d.

📒 Files selected for processing (3)
  • docs/docs/features/search/syntax-reference.mdx (1 hunks)
  • packages/web/src/features/search/searchApi.test.ts (1 hunks)
  • packages/web/src/features/search/searchApi.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build

Comment on lines +35 to +46
it('should not affect lowercase operators', () => {
expect(normalizeQueryOperators('file:yarn.lock or file:package.json'))
.toBe('file:yarn.lock or file:package.json');
});

it('should not affect OR/AND when part of other words', () => {
expect(normalizeQueryOperators('ORDER BY something'))
.toBe('ORDER BY something');

expect(normalizeQueryOperators('ANDROID app'))
.toBe('ANDROID app');
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add tests to ensure quoted literals are not altered.

Covers the critical case the regex would break.

         it('should not affect OR/AND when part of other words', () => {
             expect(normalizeQueryOperators('ORDER BY something'))
                 .toBe('ORDER BY something');
             
             expect(normalizeQueryOperators('ANDROID app'))
                 .toBe('ANDROID app');
         });
 
+        it('should not alter quoted phrases containing OR/AND', () => {
+            expect(normalizeQueryOperators('"A OR B"')).toBe('"A OR B"');
+            expect(normalizeQueryOperators('content:"\\bOR\\b"')).toBe('content:"\\bOR\\b"');
+        });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it('should not affect lowercase operators', () => {
expect(normalizeQueryOperators('file:yarn.lock or file:package.json'))
.toBe('file:yarn.lock or file:package.json');
});
it('should not affect OR/AND when part of other words', () => {
expect(normalizeQueryOperators('ORDER BY something'))
.toBe('ORDER BY something');
expect(normalizeQueryOperators('ANDROID app'))
.toBe('ANDROID app');
});
it('should not affect lowercase operators', () => {
expect(normalizeQueryOperators('file:yarn.lock or file:package.json'))
.toBe('file:yarn.lock or file:package.json');
});
it('should not affect OR/AND when part of other words', () => {
expect(normalizeQueryOperators('ORDER BY something'))
.toBe('ORDER BY something');
expect(normalizeQueryOperators('ANDROID app'))
.toBe('ANDROID app');
});
it('should not alter quoted phrases containing OR/AND', () => {
expect(normalizeQueryOperators('"A OR B"')).toBe('"A OR B"');
expect(normalizeQueryOperators('content:"\\bOR\\b"')).toBe('content:"\\bOR\\b"');
});
🤖 Prompt for AI Agents
In packages/web/src/features/search/searchApi.test.ts around lines 35 to 46, add
tests that assert normalizeQueryOperators does not modify quoted literals:
create one or more it blocks passing strings with quoted phrases containing
operator-like tokens (e.g. '"OR"', '"AND"', '"or"' or '"OR in a phrase"') and
expect the output to equal the original input; ensure both single and double
quoted cases are covered and that operators inside quotes remain unchanged.

Comment on lines +39 to +50
// First, normalize boolean operators to lowercase (Zoekt requirement)
// Zoekt only recognizes lowercase 'or' and 'and' operators, but users often expect
// uppercase OR/AND to work. This transformation allows both cases to work.
// Examples:
// - "(file:yarn.lock OR file:package.json)" → "(file:yarn.lock or file:package.json)"
// - "foo AND bar" → "foo and bar" (though AND is implicit in Zoekt)
let normalizedQuery = query
// Replace standalone uppercase OR with lowercase or
.replace(/\bOR\b/g, 'or')
// Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
.replace(/\bAND\b/g, 'and');

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Avoid rewriting OR/AND inside quoted phrases; implement token-aware normalization.

The regex replaces OR/AND anywhere, including within quoted literals (e.g., "A OR B" or content:"\bOR\b"), which can change query semantics, especially under case-sensitive searches. Limit normalization to tokens outside quotes and only when surrounded by whitespace/parentheses.

Apply this diff to delegate to a safer helper:

-    let normalizedQuery = query
-        // Replace standalone uppercase OR with lowercase or
-        .replace(/\bOR\b/g, 'or')
-        // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
-        .replace(/\bAND\b/g, 'and');
+    const normalizedQuery = normalizeBoolOperators(query);

Add this helper (outside the shown range, e.g., above transformZoektQuery):

export const normalizeBoolOperators = (input: string): string => {
  let out = "";
  let i = 0;
  let inQuotes = false;

  const isBoundary = (ch: string | undefined) => !ch || /\s|\(|\)/.test(ch);

  while (i < input.length) {
    const ch = input[i];
    if (ch === '"') {
      inQuotes = !inQuotes;
      out += ch;
      i++;
      continue;
    }
    if (!inQuotes) {
      const two = input.slice(i, i + 2);
      const three = input.slice(i, i + 3);
      const prev = i > 0 ? input[i - 1] : undefined;
      const next2 = input[i + 2];
      const next3 = input[i + 3];

      if (two === "OR" && isBoundary(prev) && isBoundary(next2)) {
        out += "or";
        i += 2;
        continue;
      }
      if (three === "AND" && isBoundary(prev) && isBoundary(next3)) {
        out += "and";
        i += 3;
        continue;
      }
    }
    out += ch;
    i++;
  }
  return out;
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good catch, good bot. @drew-u410 we will need to handle this case - a solution like the bot suggested looks good (as long as we pair it with some good UTs, thx for starting those 🤠).

Longer term, I'd like to move to using a parser generator like Lezer where we can formally define the grammar of Sourcebot's search query language, have it generate a syntax tree for us, and then translate that syntax tree into a zoekt compatible query. Lezer has the added benefit that it's what Codemirror uses, so we could get additional benefits like error squiggles & better syntax highlighting support for our search box.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Copy Markdown
Contributor

@brendan-kellam brendan-kellam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx for the contribution! Some thinking still to be done w.r.t., handling operators enclosed in quotes, but overall supportive of this change

// Replace standalone uppercase OR with lowercase or
.replace(/\bOR\b/g, 'or')
// Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
.replace(/\bAND\b/g, 'and');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and isn't actually a boolean operator that is natively supported in zoekt and by extension Sourcebot (I see that Sourcegraph supports it).

I do see the value in having a explicit and boolean operator though, but maybe something we want to roll into a separate change.

Comment on lines +39 to +50
// First, normalize boolean operators to lowercase (Zoekt requirement)
// Zoekt only recognizes lowercase 'or' and 'and' operators, but users often expect
// uppercase OR/AND to work. This transformation allows both cases to work.
// Examples:
// - "(file:yarn.lock OR file:package.json)" → "(file:yarn.lock or file:package.json)"
// - "foo AND bar" → "foo and bar" (though AND is implicit in Zoekt)
let normalizedQuery = query
// Replace standalone uppercase OR with lowercase or
.replace(/\bOR\b/g, 'or')
// Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
.replace(/\bAND\b/g, 'and');

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good catch, good bot. @drew-u410 we will need to handle this case - a solution like the bot suggested looks good (as long as we pair it with some good UTs, thx for starting those 🤠).

Longer term, I'd like to move to using a parser generator like Lezer where we can formally define the grammar of Sourcebot's search query language, have it generate a syntax tree for us, and then translate that syntax tree into a zoekt compatible query. Lezer has the added benefit that it's what Codemirror uses, so we could get additional benefits like error squiggles & better syntax highlighting support for our search box.

@drew-u410
Copy link
Copy Markdown
Contributor Author

Cool, will close for now- just wanted to start the discussion. If you want to drive it, feel free!

@drew-u410 drew-u410 closed this Sep 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants