[search] support uppercase OR/AND#500
Conversation
WalkthroughAdded query normalization to convert uppercase boolean operators (OR, AND) to lowercase in transformZoektQuery. Introduced unit tests validating normalization across varied cases and parentheses. Updated documentation to state boolean operators are case-insensitive and added examples, including prefixed expressions with uppercase OR. Changes
Sequence Diagram(s)sequenceDiagram
participant UI as UI
participant SearchAPI as transformZoektQuery
participant Normalizer as OperatorNormalizer
participant Transformer as QueryTransformer
participant Backend as Search Backend
UI->>SearchAPI: queryString
activate SearchAPI
SearchAPI->>Normalizer: normalize(queryString)
activate Normalizer
Note over Normalizer: Replace standalone OR/AND<br/>with or/and
Normalizer-->>SearchAPI: normalizedQuery
deactivate Normalizer
SearchAPI->>Transformer: transform(normalizedQuery)
activate Transformer
Note over Transformer: Existing mapping (rev:, context:, etc.)
Transformer-->>SearchAPI: transformedQuery
deactivate Transformer
SearchAPI->>Backend: transformedQuery
Backend-->>SearchAPI: results
SearchAPI-->>UI: results
deactivate SearchAPI
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Pre-merge checks (3 passed)✅ Passed checks (3 passed)
Poem
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
3793ff9 to
8cc896d
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/docs/features/search/syntax-reference.mdx (1)
5-5: Fix grammar: “enabled” → “enables”.User-facing docs typo.
-Sourcebot uses a powerful regex-based query language that enabled precise code search within large codebases. +Sourcebot uses a powerful regex-based query language that enables precise code search within large codebases.
🧹 Nitpick comments (5)
packages/web/src/features/search/searchApi.ts (2)
39-50: Confirm intended scope: mixed-case operators (e.g., 'Or', 'AnD') are not normalized.If the docs promise “case-insensitive operators,” extend normalization to any case; otherwise, clarify docs to “lowercase and uppercase only.”
Example tweak inside the helper if full case-insensitivity is desired:
// replace detection with: const lowerTwo = input.slice(i, i + 2).toLowerCase(); const lowerThree = input.slice(i, i + 3).toLowerCase(); if (lowerTwo === "or" && isBoundary(prev) && isBoundary(next2)) { out += "or"; i += 2; continue; } if (lowerThree === "and" && isBoundary(prev) && isBoundary(next3)) { out += "and"; i += 3; continue; }
51-51: Tokenization is fragile with split(" "); consider resilient whitespace handling.split(" ") drops tabs/multiple spaces into empty tokens. If you keep this approach, prefer split(/\s+/). If you later parse quotes/parentheses, a proper tokenizer would be safer.
Apply:
- const prevQueryParts = normalizedQuery.split(" "); + const prevQueryParts = normalizedQuery.trim().split(/\s+/);docs/docs/features/search/syntax-reference.mdx (1)
18-18: Clarify AND semantics.Mention that AND is implicit; uppercase AND is accepted but unnecessary.
-Multiple expressions can be or'd together with `or` (or `OR`), negated with `-`, or grouped with `()`. Both lowercase and uppercase boolean operators are supported. +Multiple expressions can be or'd together with `or` (or `OR`), negated with `-`, or grouped with `()`. `and` (and `AND`) is supported but implicit between terms.packages/web/src/features/search/searchApi.test.ts (2)
3-11: Avoid duplicating production logic in tests; import the helper instead.Export the normalization helper from searchApi and import it here to prevent drift.
Apply:
-import { describe, it, expect } from 'vitest'; +import { describe, it, expect } from 'vitest'; +import { normalizeBoolOperators } from './searchApi'; @@ -// Simple function to test the query transformation logic -// We'll extract just the normalization part to test it separately -const normalizeQueryOperators = (query: string): string => { - return query - // Replace standalone uppercase OR with lowercase or - .replace(/\bOR\b/g, 'or') - // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt) - .replace(/\bAND\b/g, 'and'); -}; +// Using the actual implementation under test +const normalizeQueryOperators = normalizeBoolOperators;
48-56: Decide on mixed-case operators and test accordingly.If “case-insensitive” is intended, add tests for
Or/AnD; otherwise, add a test asserting they are left unchanged and reflect that in docs.Example test to add:
+ it('should (not) convert mixed-case operators depending on policy', () => { + // If supporting case-insensitive: + // expect(normalizeQueryOperators('A Or B AnD C')).toBe('A or B and C'); + // Else: + expect(normalizeQueryOperators('A Or B AnD C')).toBe('A Or B AnD C'); + });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/docs/features/search/syntax-reference.mdx(1 hunks)packages/web/src/features/search/searchApi.test.ts(1 hunks)packages/web/src/features/search/searchApi.ts(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
| it('should not affect lowercase operators', () => { | ||
| expect(normalizeQueryOperators('file:yarn.lock or file:package.json')) | ||
| .toBe('file:yarn.lock or file:package.json'); | ||
| }); | ||
|
|
||
| it('should not affect OR/AND when part of other words', () => { | ||
| expect(normalizeQueryOperators('ORDER BY something')) | ||
| .toBe('ORDER BY something'); | ||
|
|
||
| expect(normalizeQueryOperators('ANDROID app')) | ||
| .toBe('ANDROID app'); | ||
| }); |
There was a problem hiding this comment.
Add tests to ensure quoted literals are not altered.
Covers the critical case the regex would break.
it('should not affect OR/AND when part of other words', () => {
expect(normalizeQueryOperators('ORDER BY something'))
.toBe('ORDER BY something');
expect(normalizeQueryOperators('ANDROID app'))
.toBe('ANDROID app');
});
+ it('should not alter quoted phrases containing OR/AND', () => {
+ expect(normalizeQueryOperators('"A OR B"')).toBe('"A OR B"');
+ expect(normalizeQueryOperators('content:"\\bOR\\b"')).toBe('content:"\\bOR\\b"');
+ });📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| it('should not affect lowercase operators', () => { | |
| expect(normalizeQueryOperators('file:yarn.lock or file:package.json')) | |
| .toBe('file:yarn.lock or file:package.json'); | |
| }); | |
| it('should not affect OR/AND when part of other words', () => { | |
| expect(normalizeQueryOperators('ORDER BY something')) | |
| .toBe('ORDER BY something'); | |
| expect(normalizeQueryOperators('ANDROID app')) | |
| .toBe('ANDROID app'); | |
| }); | |
| it('should not affect lowercase operators', () => { | |
| expect(normalizeQueryOperators('file:yarn.lock or file:package.json')) | |
| .toBe('file:yarn.lock or file:package.json'); | |
| }); | |
| it('should not affect OR/AND when part of other words', () => { | |
| expect(normalizeQueryOperators('ORDER BY something')) | |
| .toBe('ORDER BY something'); | |
| expect(normalizeQueryOperators('ANDROID app')) | |
| .toBe('ANDROID app'); | |
| }); | |
| it('should not alter quoted phrases containing OR/AND', () => { | |
| expect(normalizeQueryOperators('"A OR B"')).toBe('"A OR B"'); | |
| expect(normalizeQueryOperators('content:"\\bOR\\b"')).toBe('content:"\\bOR\\b"'); | |
| }); |
🤖 Prompt for AI Agents
In packages/web/src/features/search/searchApi.test.ts around lines 35 to 46, add
tests that assert normalizeQueryOperators does not modify quoted literals:
create one or more it blocks passing strings with quoted phrases containing
operator-like tokens (e.g. '"OR"', '"AND"', '"or"' or '"OR in a phrase"') and
expect the output to equal the original input; ensure both single and double
quoted cases are covered and that operators inside quotes remain unchanged.
| // First, normalize boolean operators to lowercase (Zoekt requirement) | ||
| // Zoekt only recognizes lowercase 'or' and 'and' operators, but users often expect | ||
| // uppercase OR/AND to work. This transformation allows both cases to work. | ||
| // Examples: | ||
| // - "(file:yarn.lock OR file:package.json)" → "(file:yarn.lock or file:package.json)" | ||
| // - "foo AND bar" → "foo and bar" (though AND is implicit in Zoekt) | ||
| let normalizedQuery = query | ||
| // Replace standalone uppercase OR with lowercase or | ||
| .replace(/\bOR\b/g, 'or') | ||
| // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt) | ||
| .replace(/\bAND\b/g, 'and'); | ||
|
|
There was a problem hiding this comment.
Avoid rewriting OR/AND inside quoted phrases; implement token-aware normalization.
The regex replaces OR/AND anywhere, including within quoted literals (e.g., "A OR B" or content:"\bOR\b"), which can change query semantics, especially under case-sensitive searches. Limit normalization to tokens outside quotes and only when surrounded by whitespace/parentheses.
Apply this diff to delegate to a safer helper:
- let normalizedQuery = query
- // Replace standalone uppercase OR with lowercase or
- .replace(/\bOR\b/g, 'or')
- // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt)
- .replace(/\bAND\b/g, 'and');
+ const normalizedQuery = normalizeBoolOperators(query);Add this helper (outside the shown range, e.g., above transformZoektQuery):
export const normalizeBoolOperators = (input: string): string => {
let out = "";
let i = 0;
let inQuotes = false;
const isBoundary = (ch: string | undefined) => !ch || /\s|\(|\)/.test(ch);
while (i < input.length) {
const ch = input[i];
if (ch === '"') {
inQuotes = !inQuotes;
out += ch;
i++;
continue;
}
if (!inQuotes) {
const two = input.slice(i, i + 2);
const three = input.slice(i, i + 3);
const prev = i > 0 ? input[i - 1] : undefined;
const next2 = input[i + 2];
const next3 = input[i + 3];
if (two === "OR" && isBoundary(prev) && isBoundary(next2)) {
out += "or";
i += 2;
continue;
}
if (three === "AND" && isBoundary(prev) && isBoundary(next3)) {
out += "and";
i += 3;
continue;
}
}
out += ch;
i++;
}
return out;
};There was a problem hiding this comment.
This is a good catch, good bot. @drew-u410 we will need to handle this case - a solution like the bot suggested looks good (as long as we pair it with some good UTs, thx for starting those 🤠).
Longer term, I'd like to move to using a parser generator like Lezer where we can formally define the grammar of Sourcebot's search query language, have it generate a syntax tree for us, and then translate that syntax tree into a zoekt compatible query. Lezer has the added benefit that it's what Codemirror uses, so we could get additional benefits like error squiggles & better syntax highlighting support for our search box.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
brendan-kellam
left a comment
There was a problem hiding this comment.
thx for the contribution! Some thinking still to be done w.r.t., handling operators enclosed in quotes, but overall supportive of this change
| // Replace standalone uppercase OR with lowercase or | ||
| .replace(/\bOR\b/g, 'or') | ||
| // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt) | ||
| .replace(/\bAND\b/g, 'and'); |
There was a problem hiding this comment.
and isn't actually a boolean operator that is natively supported in zoekt and by extension Sourcebot (I see that Sourcegraph supports it).
I do see the value in having a explicit and boolean operator though, but maybe something we want to roll into a separate change.
| // First, normalize boolean operators to lowercase (Zoekt requirement) | ||
| // Zoekt only recognizes lowercase 'or' and 'and' operators, but users often expect | ||
| // uppercase OR/AND to work. This transformation allows both cases to work. | ||
| // Examples: | ||
| // - "(file:yarn.lock OR file:package.json)" → "(file:yarn.lock or file:package.json)" | ||
| // - "foo AND bar" → "foo and bar" (though AND is implicit in Zoekt) | ||
| let normalizedQuery = query | ||
| // Replace standalone uppercase OR with lowercase or | ||
| .replace(/\bOR\b/g, 'or') | ||
| // Replace standalone uppercase AND with lowercase and (though AND is implicit in Zoekt) | ||
| .replace(/\bAND\b/g, 'and'); | ||
|
|
There was a problem hiding this comment.
This is a good catch, good bot. @drew-u410 we will need to handle this case - a solution like the bot suggested looks good (as long as we pair it with some good UTs, thx for starting those 🤠).
Longer term, I'd like to move to using a parser generator like Lezer where we can formally define the grammar of Sourcebot's search query language, have it generate a syntax tree for us, and then translate that syntax tree into a zoekt compatible query. Lezer has the added benefit that it's what Codemirror uses, so we could get additional benefits like error squiggles & better syntax highlighting support for our search box.
|
Cool, will close for now- just wanted to start the discussion. If you want to drive it, feel free! |
ANDandORin search syntax.Summary by CodeRabbit