4 changes: 3 additions & 1 deletion docs/docs/features/search/syntax-reference.mdx
@@ -15,13 +15,15 @@ Queries consist of space-separated regular expressions. Wrapping expressions in
| `foo bar` | Match files with regex `/foo/` **and** `/bar/` |
| `"foo bar"` | Match files with regex `/foo bar/` |

Multiple expressions can be or'd together with `or`, negated with `-`, or grouped with `()`.
Multiple expressions can be or'd together with `or` (or `OR`), negated with `-`, or grouped with `()`. Both lowercase and uppercase boolean operators are supported.

| Example | Explanation |
| :--- | :--- |
| `foo or bar` | Match files with regex `/foo/` **or** `/bar/` |
| `foo OR bar` | Same as above - uppercase `OR` is also supported |
| `foo -bar` | Match files with regex `/foo/` but **not** `/bar/` |
| `foo (bar or baz)` | Match files with regex `/foo/` **and** either `/bar/` **or** `/baz/` |
| `(file:yarn.lock OR file:package.json)` | Match files named `yarn.lock` **or** `package.json` |

Expressions can be prefixed with certain keywords to modify search behavior. Some keywords can be negated using the `-` prefix.

58 changes: 58 additions & 0 deletions packages/web/src/features/search/searchApi.test.ts
@@ -0,0 +1,58 @@
import { describe, it, expect } from 'vitest';

// Simple function to test the query transformation logic
// We'll extract just the normalization part to test it separately
const normalizeQueryOperators = (query: string): string => {
    return query
        // Replace standalone uppercase OR with lowercase or
        .replace(/\bOR\b/g, 'or')
        // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
        .replace(/\bAND\b/g, 'and');
Contributor

`and` isn't actually a boolean operator that is natively supported in zoekt, and by extension Sourcebot (I see that Sourcegraph supports it).

I do see the value in having an explicit `and` boolean operator, but that is maybe something we want to roll into a separate change.

};

describe('Query transformation', () => {
    describe('normalizeQueryOperators', () => {
        it('should convert uppercase OR to lowercase or', () => {
            expect(normalizeQueryOperators('file:yarn.lock OR file:package.json'))
                .toBe('file:yarn.lock or file:package.json');
        });

        it('should convert uppercase AND to lowercase and', () => {
            expect(normalizeQueryOperators('foo AND bar'))
                .toBe('foo and bar');
        });

        it('should handle parenthesized expressions', () => {
            expect(normalizeQueryOperators('(file:yarn.lock OR file:package.json)'))
                .toBe('(file:yarn.lock or file:package.json)');
        });

        it('should handle complex queries with multiple operators', () => {
            expect(normalizeQueryOperators('(file:*.json OR file:*.lock) AND content:react'))
                .toBe('(file:*.json or file:*.lock) and content:react');
        });

        it('should not affect lowercase operators', () => {
            expect(normalizeQueryOperators('file:yarn.lock or file:package.json'))
                .toBe('file:yarn.lock or file:package.json');
        });

        it('should not affect OR/AND when part of other words', () => {
            expect(normalizeQueryOperators('ORDER BY something'))
                .toBe('ORDER BY something');

            expect(normalizeQueryOperators('ANDROID app'))
                .toBe('ANDROID app');
        });
Comment on lines +35 to +46
Contributor

⚠️ Potential issue

Add tests to ensure quoted literals are not altered.

Covers the critical case the regex would break.

         it('should not affect OR/AND when part of other words', () => {
             expect(normalizeQueryOperators('ORDER BY something'))
                 .toBe('ORDER BY something');
             
             expect(normalizeQueryOperators('ANDROID app'))
                 .toBe('ANDROID app');
         });
 
+        it('should not alter quoted phrases containing OR/AND', () => {
+            expect(normalizeQueryOperators('"A OR B"')).toBe('"A OR B"');
+            expect(normalizeQueryOperators('content:"\\bOR\\b"')).toBe('content:"\\bOR\\b"');
+        });
🤖 Prompt for AI Agents
In packages/web/src/features/search/searchApi.test.ts around lines 35 to 46, add
tests that assert normalizeQueryOperators does not modify quoted literals:
create one or more it blocks passing strings with quoted phrases containing
operator-like tokens (e.g. '"OR"', '"AND"', '"or"' or '"OR in a phrase"') and
expect the output to equal the original input; ensure both single and double
quoted cases are covered and that operators inside quotes remain unchanged.


        it('should handle mixed case queries', () => {
            expect(normalizeQueryOperators('file:src OR file:test and lang:typescript'))
                .toBe('file:src or file:test and lang:typescript');
        });

        it('should handle multiple ORs and ANDs', () => {
            expect(normalizeQueryOperators('A OR B OR C AND D AND E'))
                .toBe('A or B or C and D and E');
        });
    });
});
14 changes: 13 additions & 1 deletion packages/web/src/features/search/searchApi.ts
@@ -36,7 +36,19 @@ enum zoektPrefixes {
}

const transformZoektQuery = async (query: string, orgId: number): Promise<string | ServiceError> => {
    const prevQueryParts = query.split(" ");
    // First, normalize boolean operators to lowercase (Zoekt requirement)
    // Zoekt only recognizes lowercase 'or' and 'and' operators, but users often expect
    // uppercase OR/AND to work. This transformation allows both cases to work.
    // Examples:
    // - "(file:yarn.lock OR file:package.json)" → "(file:yarn.lock or file:package.json)"
    // - "foo AND bar" → "foo and bar" (though AND is implicit in Zoekt)
    let normalizedQuery = query
        // Replace standalone uppercase OR with lowercase or
        .replace(/\bOR\b/g, 'or')
        // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
        .replace(/\bAND\b/g, 'and');

Comment on lines +39 to +50
Contributor

@coderabbitai coderabbitai bot Sep 9, 2025

⚠️ Potential issue

Avoid rewriting OR/AND inside quoted phrases; implement token-aware normalization.

The regex replaces OR/AND anywhere, including within quoted literals (e.g., "A OR B" or content:"\bOR\b"), which can change query semantics, especially under case-sensitive searches. Limit normalization to tokens outside quotes and only when surrounded by whitespace/parentheses.
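To make the concern concrete, here is a minimal sketch that reproduces the regex from the diff under a hypothetical name, `naiveNormalize`, and runs it on a few inputs. It suggests that the plain quoted phrase is the case that actually changes; `content:"\bOR\b"` appears to be left alone, because the `b` of the escape is a word character directly before `O`, so `\b` finds no boundary there.

```typescript
// The regex-based normalization from the diff, reproduced for illustration only.
// `naiveNormalize` is a hypothetical name, not an identifier from the PR.
const naiveNormalize = (query: string): string =>
    query.replace(/\bOR\b/g, 'or').replace(/\bAND\b/g, 'and');

// Operator outside quotes: rewritten as intended.
const outside = naiveNormalize('foo OR bar'); // 'foo or bar'

// Operator-like token inside a quoted phrase: also rewritten,
// which changes the literal the user asked for.
const inside = naiveNormalize('"A OR B"'); // '"A or B"'

// Part of a larger word: untouched, thanks to the \b anchors.
const word = naiveNormalize('ORDER BY something'); // 'ORDER BY something'

// Regex escape inside quotes: untouched, since "bOR" contains no word boundary.
const escaped = naiveNormalize('content:"\\bOR\\b"'); // 'content:"\bOR\b"'
```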

Apply this diff to delegate to a safer helper:

-    let normalizedQuery = query
-        // Replace standalone uppercase OR with lowercase or
-        .replace(/\bOR\b/g, 'or')
-        // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
-        .replace(/\bAND\b/g, 'and');
+    const normalizedQuery = normalizeBoolOperators(query);

Add this helper (outside the shown range, e.g., above transformZoektQuery):

export const normalizeBoolOperators = (input: string): string => {
  let out = "";
  let i = 0;
  let inQuotes = false;

  const isBoundary = (ch: string | undefined) => !ch || /\s|\(|\)/.test(ch);

  while (i < input.length) {
    const ch = input[i];
    if (ch === '"') {
      inQuotes = !inQuotes;
      out += ch;
      i++;
      continue;
    }
    if (!inQuotes) {
      const two = input.slice(i, i + 2);
      const three = input.slice(i, i + 3);
      const prev = i > 0 ? input[i - 1] : undefined;
      const next2 = input[i + 2];
      const next3 = input[i + 3];

      if (two === "OR" && isBoundary(prev) && isBoundary(next2)) {
        out += "or";
        i += 2;
        continue;
      }
      if (three === "AND" && isBoundary(prev) && isBoundary(next3)) {
        out += "and";
        i += 3;
        continue;
      }
    }
    out += ch;
    i++;
  }
  return out;
};
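
A more compact variant of the same idea, offered as a sketch rather than a drop-in replacement: split the query into quoted and unquoted segments, then lowercase `OR`/`AND` only in the unquoted ones. The name `normalizeOutsideQuotes` is hypothetical, and treating a backslash as escaping the next character inside a quoted phrase is an assumption about the query syntax, not something confirmed in this thread.

```typescript
// Hypothetical helper: lowercases standalone OR/AND outside double-quoted phrases.
// Assumption: inside quotes, a backslash escapes the next character (e.g. \").
const normalizeOutsideQuotes = (query: string): string =>
    query
        // The capture group keeps quoted phrases in the result, at odd indices.
        .split(/("(?:[^"\\]|\\.)*")/)
        .map((segment, index) =>
            index % 2 === 1
                ? segment // quoted phrase: leave untouched
                : segment.replace(
                    // An operator must be delimited by whitespace, parentheses,
                    // or the edge of the unquoted segment.
                    /(^|[\s()])(OR|AND)(?=$|[\s()])/g,
                    (_match: string, before: string, op: string) => before + op.toLowerCase(),
                ))
        .join('');

const grouped = normalizeOutsideQuotes('(file:yarn.lock OR file:package.json)');
const quoted = normalizeOutsideQuotes('"A OR B"');
const mixed = normalizeOutsideQuotes('content:"say \\"hi\\" OR bye" OR foo');
```

Two differences from the character loop above are worth keeping in mind: an unterminated quote is treated as unquoted text here, and an operator written directly against a quote (`"a"OR"b"`) counts as standalone because it sits at the edge of an unquoted segment.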

Contributor

This is a good catch, good bot. @drew-u410 we will need to handle this case - a solution like the bot suggested looks good (as long as we pair it with some good UTs, thx for starting those 🤠).

Longer term, I'd like to move to using a parser generator like Lezer where we can formally define the grammar of Sourcebot's search query language, have it generate a syntax tree for us, and then translate that syntax tree into a zoekt-compatible query. Lezer has the added benefit that it's what CodeMirror uses, so we could get additional benefits like error squiggles & better syntax highlighting support for our search box.
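
As a rough illustration of that pipeline (query text, then syntax tree, then zoekt query), here is a hand-rolled sketch. It is not Lezer and does not use Lezer's API; the names `QueryNode`, `tokenize`, `parse`, and `toZoekt` are all hypothetical. It assumes well-formed input, treats `or`/`OR` as the disjunction operator, and treats only uppercase `AND` as an explicit conjunction that is dropped on output because conjunction is implicit; lowercase `and` stays an ordinary search term.

```typescript
// Hypothetical node types for a tiny query syntax tree.
type QueryNode =
    | { kind: 'term'; text: string }
    | { kind: 'and'; children: QueryNode[] }
    | { kind: 'or'; children: QueryNode[] }
    | { kind: 'group'; negated: boolean; child: QueryNode };

// Tokens are parentheses (optionally negated with '-') or runs of non-space text
// in which a quoted phrase stays attached to its prefix, e.g. content:"a OR b".
const tokenize = (query: string): string[] =>
    query.match(/-?\(|\)|(?:[^\s()"]|"(?:[^"\\]|\\.)*")+/g) ?? [];

const isOr = (token: string | undefined): boolean => token === 'or' || token === 'OR';
const isAnd = (token: string | undefined): boolean => token === 'AND';

// Recursive descent: `or` binds more loosely than implicit conjunction.
const parse = (tokens: string[]): QueryNode => {
    let pos = 0;

    function parseOr(): QueryNode {
        const children = [parseAnd()];
        while (isOr(tokens[pos])) {
            pos++;
            children.push(parseAnd());
        }
        return children.length === 1 ? children[0] : { kind: 'or', children };
    }

    function parseAnd(): QueryNode {
        const children: QueryNode[] = [];
        while (pos < tokens.length && tokens[pos] !== ')' && !isOr(tokens[pos])) {
            if (isAnd(tokens[pos])) {
                pos++; // explicit AND carries no extra meaning; skip it
                continue;
            }
            children.push(parseAtom());
        }
        return children.length === 1 ? children[0] : { kind: 'and', children };
    }

    function parseAtom(): QueryNode {
        const token = tokens[pos++];
        if (token === '(' || token === '-(') {
            const child = parseOr();
            pos++; // consume the closing ')'
            return { kind: 'group', negated: token === '-(', child };
        }
        return { kind: 'term', text: token };
    }

    return parseOr();
};

// Translate the tree back into a query string with lowercase `or`.
const toZoekt = (node: QueryNode): string => {
    switch (node.kind) {
        case 'term': return node.text;
        case 'and': return node.children.map(toZoekt).join(' ');
        case 'or': return node.children.map(toZoekt).join(' or ');
        case 'group': return `${node.negated ? '-' : ''}(${toZoekt(node.child)})`;
    }
};

const translated = toZoekt(parse(tokenize('(file:*.json OR file:*.lock) AND content:"A OR B"')));
// '(file:*.json or file:*.lock) content:"A OR B"'
```

Because quoted phrases are kept inside a single token, operator-like words in them never reach the parser, which is the property the regex approach lacks. A generated parser would additionally give error recovery and positions for diagnostics, which this sketch does not attempt.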

    const prevQueryParts = normalizedQuery.split(" ");
    const newQueryParts = [];

    for (const part of prevQueryParts) {