Commit 8720ee1

Merge branch 'main' into cursor/SOU-762-search-assistant-repo-filter-4b36

2 parents 56f9330 + 2fa86ff commit 8720ee1

21 files changed: +351 −29 lines changed

CHANGELOG.md

Lines changed: 11 additions & 0 deletions

```diff
@@ -10,6 +10,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 
 - Fixed AI Search Assist incorrectly using the `repo:` filter when searching for content within files. [#1045](https://github.com/sourcebot-dev/sourcebot/pull/1045)
 
+## [4.16.3] - 2026-03-27
+
+### Added
+- Added support for `.gitattributes` `linguist-language` overrides in the file viewer ([#1048](https://github.com/sourcebot-dev/sourcebot/pull/1048))
+- Added Basic language syntax highlighting in the file viewer ([#1054](https://github.com/sourcebot-dev/sourcebot/pull/1054))
+
+### Fixed
+- Fixed Ask GitHub landing page chat box placement to be centered on the page instead of at the bottom. [#1046](https://github.com/sourcebot-dev/sourcebot/pull/1046)
+- Fixed issue where local git connections (`file://`) would fail when matching a file instead of a directory. [#1049](https://github.com/sourcebot-dev/sourcebot/pull/1049)
+- Fixed regex queries containing parentheses (e.g. `(test|render)<`) being incorrectly split into multiple search terms instead of treated as a single regex pattern. [#1050](https://github.com/sourcebot-dev/sourcebot/pull/1050)
+
 ## [4.16.2] - 2026-03-25
 
 ### Fixed
```

docs/api-reference/sourcebot-public.openapi.json

Lines changed: 1 addition & 1 deletion

```diff
@@ -2,7 +2,7 @@
   "openapi": "3.0.3",
   "info": {
     "title": "Sourcebot Public API",
-    "version": "v4.16.2",
+    "version": "v4.16.3",
     "description": "OpenAPI description for the public Sourcebot REST endpoints used for search, repository listing, and file browsing."
   },
   "tags": [
```

docs/docs/configuration/idp.mdx

Lines changed: 48 additions & 1 deletion

````diff
@@ -418,12 +418,16 @@ A Keycloak connection can be used for [authentication](/docs/configuration/auth)
 </Steps>
 </Accordion>
 
-### Microsoft Entra ID
+### Microsoft Entra ID (Azure AD)
 
 [Auth.js Microsoft Entra ID Provider Docs](https://authjs.dev/getting-started/providers/microsoft-entra-id)
 
 A Microsoft Entra ID connection can be used for [authentication](/docs/configuration/auth).
 
+<Info>
+Microsoft renamed Azure Active Directory (Azure AD) to Microsoft Entra ID in 2023. If you have an existing Azure AD setup, these instructions will work for you. The underlying authentication infrastructure is the same.
+</Info>
+
 <Accordion title="instructions">
 <Steps>
 <Step title="Register an OAuth Application">
@@ -570,4 +574,47 @@ A JumpCloud connection can be used for [authentication](/docs/configuration/auth
 </Steps>
 </Accordion>
 
+### Google Cloud IAP
+
+[Google Cloud IAP Documentation](https://cloud.google.com/iap/docs)
+
+Google Cloud Identity-Aware Proxy (IAP) can be used for [authentication](/docs/configuration/auth). IAP provides a layer of security for applications deployed on Google Cloud, allowing you to control access based on user identity and context.
+
+<Info>
+GCP IAP works differently from other identity providers. Instead of redirecting users to an OAuth flow, IAP intercepts requests at the infrastructure level and adds a signed JWT header that Sourcebot validates. This means users are automatically authenticated when accessing Sourcebot through an IAP-protected endpoint.
+</Info>
+
+<Accordion title="instructions">
+<Steps>
+<Step title="Enable IAP for your application">
+Your Sourcebot deployment must be behind Google Cloud IAP. Follow [this guide](https://cloud.google.com/iap/docs/enabling-on-premises-howto) by Google to enable IAP for your application.
+
+After enabling IAP, note the **Signed Header JWT Audience**. You can find this in the Google Cloud Console under **Security → Identity-Aware Proxy → (your application) → Edit OAuth Client → Application settings**.
+
+The audience will be in the format: `/projects/<project-number>/global/backendServices/<service-id>` or `/projects/<project-number>/apps/<project-id>`.
+</Step>
+<Step title="Define environment variables">
+Set the IAP audience as an environment variable. This can be named whatever you like (ex. `GCP_IAP_AUDIENCE`).
+</Step>
+<Step title="Define the identity provider config">
+Create a `identityProvider` object in the [config file](/docs/configuration/config-file) with the following fields:
+
+```json wrap icon="code"
+{
+  "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
+  "identityProviders": [
+    {
+      "provider": "gcp-iap",
+      "purpose": "sso",
+      "audience": {
+        "env": "GCP_IAP_AUDIENCE"
+      }
+    }
+  ]
+}
+```
+</Step>
+</Steps>
+</Accordion>
+
````
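The Info block in the diff above says IAP injects a signed JWT header that Sourcebot validates. As a rough illustration of that model (a hypothetical sketch, not Sourcebot's actual implementation), the audience check on IAP's documented `x-goog-iap-jwt-assertion` header could look like this. Note a real implementation must also verify the token's signature against Google's published public keys; this sketch only decodes the payload:

```typescript
// Hypothetical sketch: check the audience claim on the JWT that Google Cloud
// IAP injects into each request. Signature verification is intentionally
// omitted here and is REQUIRED in production.

function base64UrlDecode(segment: string): string {
    // JWTs use base64url; convert to standard base64 before decoding.
    const b64 = segment.replace(/-/g, "+").replace(/_/g, "/");
    return Buffer.from(b64, "base64").toString("utf8");
}

function checkIapAudience(
    headers: Record<string, string | undefined>,
    expectedAudience: string, // e.g. "/projects/<project-number>/apps/<project-id>"
): boolean {
    const token = headers["x-goog-iap-jwt-assertion"];
    if (!token) return false;

    const parts = token.split(".");
    if (parts.length !== 3) return false; // not a JWT

    try {
        const payload = JSON.parse(base64UrlDecode(parts[1]));
        return payload.aud === expectedAudience;
    } catch {
        return false; // malformed payload
    }
}
```

This is why the config step above only needs the audience string: the rest of the identity (email, subject) arrives inside the same signed header.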

packages/backend/src/repoCompileUtils.test.ts

Lines changed: 28 additions & 0 deletions

```diff
@@ -14,17 +14,29 @@ vi.mock('glob', () => ({
     glob: vi.fn(),
 }));
 
+// Mock fs/promises so tests don't touch the real filesystem.
+// By default, stat resolves as a directory; individual tests can override this.
+vi.mock('fs/promises', () => ({
+    default: {
+        stat: vi.fn().mockResolvedValue({ isDirectory: () => true }),
+    },
+}));
+
 import { isPathAValidGitRepoRoot, getOriginUrl, isUrlAValidGitRepo } from './git.js';
 import { glob } from 'glob';
+import fs from 'fs/promises';
 
 const mockedGlob = vi.mocked(glob);
 const mockedIsPathAValidGitRepoRoot = vi.mocked(isPathAValidGitRepoRoot);
 const mockedGetOriginUrl = vi.mocked(getOriginUrl);
 const mockedIsUrlAValidGitRepo = vi.mocked(isUrlAValidGitRepo);
+const mockedFsStat = vi.mocked(fs.stat);
 
 describe('compileGenericGitHostConfig_file', () => {
     beforeEach(() => {
         vi.clearAllMocks();
+        // Default: all paths exist and are directories. Override per-test as needed.
+        mockedFsStat.mockResolvedValue({ isDirectory: () => true } as any);
     });
 
     afterEach(() => {
@@ -47,6 +59,22 @@ describe('compileGenericGitHostConfig_file', () => {
         expect(result.warnings[0]).toContain('/path/to/nonexistent/repo');
     });
 
+    test('should return warning when path is a file, not a directory', async () => {
+        mockedGlob.mockResolvedValue(['/path/to/a-file.txt']);
+        mockedFsStat.mockResolvedValue({ isDirectory: () => false } as any);
+
+        const config = {
+            type: 'git' as const,
+            url: 'file:///path/to/a-file.txt',
+        };
+
+        const result = await compileGenericGitHostConfig_file(config, 1);
+
+        expect(result.repoData).toHaveLength(0);
+        expect(result.warnings.length).toBeGreaterThanOrEqual(1);
+        expect(result.warnings.some(w => w.includes('not a directory'))).toBe(true);
+    });
+
     test('should return warning when path is not a valid git repo', async () => {
         mockedGlob.mockResolvedValue(['/path/to/not-a-repo']);
         mockedIsPathAValidGitRepoRoot.mockResolvedValue(false);
```

packages/backend/src/repoCompileUtils.ts

Lines changed: 9 additions & 0 deletions

```diff
@@ -14,6 +14,7 @@ import { createLogger } from '@sourcebot/shared';
 import { BitbucketConnectionConfig, GerritConnectionConfig, GiteaConnectionConfig, GitlabConnectionConfig, GenericGitHostConnectionConfig, AzureDevOpsConnectionConfig } from '@sourcebot/schemas/v3/connection.type';
 import { ProjectVisibility } from "azure-devops-node-api/interfaces/CoreInterfaces.js";
 import path from 'path';
+import fs from 'fs/promises';
 import { glob } from 'glob';
 import { getLocalDefaultBranch, getOriginUrl, isPathAValidGitRepoRoot, isUrlAValidGitRepo } from './git.js';
 import assert from 'assert';
@@ -611,6 +612,14 @@ export const compileGenericGitHostConfig_file = async (
     logger.info(`Found ${repoPaths.length} path(s) matching pattern '${configUrl.pathname}'`);
 
     await Promise.all(repoPaths.map((repoPath) => gitOperationLimit(async () => {
+        const stat = await fs.stat(repoPath).catch(() => null);
+        if (!stat || !stat.isDirectory()) {
+            const warning = `Skipping ${repoPath} - path is not a directory.`;
+            logger.warn(warning);
+            warnings.push(warning);
+            return;
+        }
+
         const isGitRepo = await isPathAValidGitRepoRoot({
             path: repoPath,
         });
```

packages/queryLanguage/src/parser.terms.ts

Lines changed: 2 additions & 1 deletion

```diff
@@ -23,4 +23,5 @@ export const
   RepoSetExpr = 16,
   ParenExpr = 17,
   QuotedTerm = 18,
-  Term = 19
+  Term = 19,
+  Dialect_regex = 0
```

packages/queryLanguage/src/parser.ts

Lines changed: 1 addition & 0 deletions

```diff
@@ -13,6 +13,7 @@ export const parser = LRParser.deserialize({
   tokenData: "/U~R_XY!QYZ!Qpq!Qrs!`#T#U$S#V#W%i#Y#Z'R#`#a(_#b#c(|#c#d)X#d#e)p#f#g+]#g#h,w#j#k-`#m#n.s~!VRm~XY!QYZ!Qpq!Q~!cWOY!`Zr!`rs!{s#O!`#O#P#Q#P;'S!`;'S;=`#|<%lO!`~#QOw~~#TRO;'S!`;'S;=`#^;=`O!`~#aXOY!`Zr!`rs!{s#O!`#O#P#Q#P;'S!`;'S;=`#|;=`<%l!`<%lO!`~$PP;=`<%l!`~$VQ#b#c$]#f#g$h~$`P#m#n$c~$hO!R~~$kP#V#W$n~$qP#[#]$t~$wP#]#^$z~$}P#j#k%Q~%TP#X#Y%W~%ZP#W#X%^~%aP![!]%d~%iOq~~%lQ![!]%r#c#d%w~%wOx~~%zP#b#c%}~&QP#h#i&T~&WP#X#Y&Z~&^Q#b#c&d#l#m&p~&gP#h#i&j~&mP![!]%r~&sP#h#i&v~&yP![!]&|~'ROy~~'UR![!]'_#]#^'d#c#d'v~'dOz~~'gP#`#a'j~'mP#X#Y'p~'sP![!]'_~'yP#f#g'|~(PP#_#`(S~(VP![!](Y~(_O{~~(bP#T#U(e~(hP#b#c(k~(nP#Z#[(q~(tP![!](w~(|O!T~~)PP#c#d)S~)XOs~~)[P#b#c)_~)bP#`#a)e~)hP#m#n)k~)pOt~~)sQ#f#g)y#i#j*n~)|P#]#^*P~*SP#j#k*V~*YP#T#U*]~*`P#h#i*c~*fP#X#Y*i~*nO!Q~~*qP#U#V*t~*wP#`#a*z~*}P#]#^+Q~+TP#V#W+W~+]O!P~~+`Q![!]+f#X#Y+k~+kO!S~~+nQ#d#e+t#j#k,l~+wP#c#d+z~+}Q![!]+f#g#h,T~,WP#X#Y,Z~,^P#h#i,a~,dP![!],g~,lO!V~~,oP![!],r~,wOu~~,zP#m#n,}~-QP#a#b-T~-WP![!]-Z~-`O!U~~-cP#]#^-f~-iP#g#h-l~-oP#]#^-r~-uP#U#V-x~-{P#]#^.O~.RP#`#a.U~.XP#]#^.[~._P#h#i.b~.eP#m#n.h~.kP![!].n~.sO}~~.vP#X#Y.y~.|P#g#h/P~/UOr~",
   tokenizers: [negateToken, parenToken, wordToken, closeParenToken, orToken, 0],
   topRules: {"Program":[0,1]},
+  dialects: {regex: 0},
   tokenPrec: 200,
   termNames: {"0":"⚠","1":"@top","2":"OrExpr","3":"AndExpr","4":"NegateExpr","5":"PrefixExpr","6":"ArchivedExpr","7":"RevisionExpr","8":"ContentExpr","9":"ContextExpr","10":"FileExpr","11":"ForkExpr","12":"VisibilityExpr","13":"RepoExpr","14":"LangExpr","15":"SymExpr","16":"RepoSetExpr","17":"ParenExpr","18":"QuotedTerm","19":"Term","20":"expr+","21":"(or andExpr)+","22":"␄","23":"negate","24":"openParen","25":"word","26":"closeParen","27":"or","28":"%mainskip","29":"space","30":"query","31":"andExpr","32":"expr","33":"archivedKw","34":"\"yes\"","35":"\"no\"","36":"\"only\"","37":"revisionKw","38":"value","39":"quotedString","40":"contentKw","41":"contextKw","42":"fileKw","43":"forkKw","44":"forkValue","45":"visibilityKw","46":"visibilityValue","47":"\"public\"","48":"\"private\"","49":"\"any\"","50":"repoKw","51":"langKw","52":"symKw","53":"reposetKw"}
 })
```

packages/queryLanguage/src/query.grammar

Lines changed: 2 additions & 0 deletions

```diff
@@ -4,6 +4,8 @@
 @external tokens closeParenToken from "./tokens" { closeParen }
 @external tokens orToken from "./tokens" { or }
 
+@dialects { regex }
+
 @top Program { query }
 
 @precedence {
```

packages/queryLanguage/src/tokens.ts

Lines changed: 41 additions & 14 deletions

```diff
@@ -1,5 +1,5 @@
 import { ExternalTokenizer, InputStream, Stack } from "@lezer/lr";
-import { negate, openParen, closeParen, word, or, ParenExpr } from "./parser.terms";
+import { negate, openParen, closeParen, word, or, Dialect_regex } from "./parser.terms";
 
 // Character codes
 const SPACE = 32;
@@ -243,9 +243,14 @@ function isInsideParenExpr(input: InputStream, stack: Stack): boolean {
  * This allows words like "(pr" or "func(arg)" to be parsed as single terms
  * while "(foo bar)" is parsed as a ParenExpr.
  */
-export const parenToken = new ExternalTokenizer((input) => {
+export const parenToken = new ExternalTokenizer((input, stack) => {
     if (input.next !== OPEN_PAREN) return;
-
+
+    // In regex mode, parens are just word characters — don't emit openParen
+    if (stack.dialectEnabled(Dialect_regex)) {
+        return;
+    }
+
     if (hasBalancedParensAt(input, 0)) {
         // Found balanced parens - emit openParen (just the '(')
         input.advance();
@@ -263,6 +268,11 @@
 export const closeParenToken = new ExternalTokenizer((input, stack) => {
     if (input.next !== CLOSE_PAREN) return;
 
+    // In regex mode, parens are just word characters — don't emit closeParen
+    if (stack.dialectEnabled(Dialect_regex)) {
+        return;
+    }
+
     // Check if we should emit closeParen (when inside a ParenExpr)
     if (isInsideParenExpr(input, stack)) {
         input.advance();
@@ -312,7 +322,20 @@ export const wordToken = new ExternalTokenizer((input, stack) => {
     if (startsWithPrefix(input)) {
         return;
     }
-
+
+    // In regex mode: consume all non-whitespace characters as a single word.
+    // Parens and | are valid regex metacharacters, not query syntax in this mode.
+    if (stack.dialectEnabled(Dialect_regex)) {
+        const startPos = input.pos;
+        while (input.next !== EOF && !isWhitespace(input.next)) {
+            input.advance();
+        }
+        if (input.pos > startPos) {
+            input.acceptToken(word);
+        }
+        return;
+    }
+
     // If starts with '(' and has balanced parens, determine whether this is a
     // regex alternation value (e.g. file:(test|spec)) or a ParenExpr grouping.
     // We're in a value context when the immediately preceding non-whitespace char
@@ -419,24 +442,28 @@ export const orToken = new ExternalTokenizer((input) => {
  * External tokenizer for negation.
  * Only tokenizes `-` as negate when followed by a prefix keyword or balanced `(`.
  */
-export const negateToken = new ExternalTokenizer((input) => {
+export const negateToken = new ExternalTokenizer((input, stack) => {
     if (input.next !== DASH) return;
-
+
     // Look ahead using peek to see what follows the dash (skipping whitespace)
     let offset = 1;
     while (isWhitespace(input.peek(offset))) {
        offset++;
     }
-
+
     const chAfterDash = input.peek(offset);
-
-    // Check if followed by opening paren that starts a balanced ParenExpr
-    if (chAfterDash === OPEN_PAREN && hasBalancedParensAt(input, offset)) {
-        input.advance();
-        input.acceptToken(negate);
-        return;
+
+    // In normal mode: also check for balanced paren (negated group e.g. -(foo bar))
+    // In regex mode: skip this — parens are not query grouping operators, so emitting
+    // negate before a '(' would leave the parser without a matching ParenExpr to parse.
+    if (!stack.dialectEnabled(Dialect_regex)) {
+        if (chAfterDash === OPEN_PAREN && hasBalancedParensAt(input, offset)) {
+            input.advance();
+            input.acceptToken(negate);
+            return;
+        }
     }
-
+
     // Check if followed by a prefix keyword (by checking for keyword followed by colon)
     let foundColon = false;
     let peekOffset = offset;
```
Lines changed: 15 additions & 0 deletions

```diff
@@ -0,0 +1,15 @@
+import { parser as _parser } from "../src/parser";
+import { fileTests } from "@lezer/generator/dist/test";
+import { describe, it } from "vitest";
+import { fileURLToPath } from "url";
+import * as fs from "fs";
+import * as path from "path";
+
+const regexParser = _parser.configure({ dialect: "regex" });
+const caseDir = path.dirname(fileURLToPath(import.meta.url));
+
+describe("regex", () => {
+    for (const { name, run } of fileTests(fs.readFileSync(path.join(caseDir, "regex.txt"), "utf8"), "regex.txt")) {
+        it(name, () => run(regexParser));
+    }
+});
```

0 commit comments