Thank you for your interest in contributing to CodeRAG! This guide covers everything you need to know: from setting up your development environment to submitting a pull request.
- Getting Started
- Development Setup
- Project Structure
- Coding Conventions
- Error Handling
- Testing
- Branch and Commit Conventions
- Pull Request Process
- Adding New Providers
- Reporting Issues
- License
| Tool | Version | Required | Notes |
|---|---|---|---|
| Node.js | >= 20 | Yes | LTS recommended |
| pnpm | >= 9 | Yes | corepack enable && corepack prepare pnpm@latest --activate |
| Ollama | latest | No | For local embedding and NL enrichment |
| Git | >= 2.30 | Yes | For version control and file watcher features |
If you are an external contributor, start by forking the repository:
- Fork the CodeRAG repository on Azure DevOps (or GitHub mirror if available)
- Clone your fork locally:
git clone https://dev.azure.com/<your-org>/CodeRAG/_git/CodeRAG cd CodeRAG
- Add the upstream remote so you can keep your fork in sync:
git remote add upstream https://dev.azure.com/momc-pl/CodeRAG/_git/CodeRAG
- Create a feature branch for your changes:
git checkout -b feature/AB#XXXX-short-description
- After implementing and testing your changes, push to your fork and submit a Pull Request against the upstream
mainbranch.
The fastest way to get started is to use the bootstrap script:
git clone https://dev.azure.com/momc-pl/CodeRAG/_git/CodeRAG
cd CodeRAG
./scripts/bootstrap.shThis will check prerequisites, install dependencies, build all packages, and run the test suite.
If you prefer to set things up manually:
# 1. Clone the repository
git clone https://dev.azure.com/momc-pl/CodeRAG/_git/CodeRAG
cd CodeRAG
# 2. Install dependencies
pnpm install
# 3. Build all packages
pnpm build
# 4. Run tests
pnpm test
# 5. (Optional) Set up Ollama for local AI features
ollama pull nomic-embed-text
ollama pull qwen2.5-coderCodeRAG uses pnpm workspaces. Always use pnpm (not npm or yarn) for package operations:
# Install all workspace dependencies
pnpm install
# Build all packages (respects dependency order)
pnpm build
# Run all tests across all packages
pnpm test
# Run tests for a specific package
pnpm --filter @code-rag/core test
# Run with coverage
pnpm test -- --coverage
# Lint all packages
pnpm lint
# Clean all build artifacts
pnpm cleanThe project uses a shared tsconfig.base.json with strict settings. Each package extends it with its own tsconfig.json. Key compiler options:
target: ES2022withmodule: NodeNext(ESM)strict: true(enables all strict checks)noUncheckedIndexedAccess: true(array/object index returnsT | undefined)noUnusedLocals: trueandnoUnusedParameters: trueisolatedModules: true
All imports must include the .js extension for NodeNext module resolution, even when the source file is .ts:
// Correct
import { parseConfig } from './config-loader.js';
// Incorrect -- will fail at runtime
import { parseConfig } from './config-loader';CodeRAG is a pnpm workspace monorepo with 7 packages:
coderag/
packages/
core/ # Core library: ingestion, embedding, retrieval
cli/ # CLI tool (coderag init/index/search/serve/status)
mcp-server/ # MCP server (stdio + SSE transport)
api-server/ # REST API server for cloud/team features
viewer/ # Web-based visualization dashboard
vscode-extension/ # VS Code extension
benchmarks/ # Benchmark suite
scripts/ # Development and CI scripts
docs/ # Documentation (architecture, guides, API reference)
.coderag.yaml # Project config (dogfooding)
CLAUDE.md # AI agent context
pnpm-workspace.yaml # Workspace definition
tsconfig.base.json # Shared TypeScript config
Each package has its own package.json, tsconfig.json, and builds independently. Cross-package dependencies use the workspace protocol (workspace:*).
- No
any-- useunknownand narrow with type guards - No
ascasts without a justifying comment explaining why - ESM modules only -- use
import/export, neverrequire() - All imports must include the
.jsextension for NodeNext resolution
// Good -- explicit type narrowing
function getLength(value: unknown): number {
if (typeof value === 'string') {
return value.length;
}
return 0;
}
// Bad -- using `any` bypasses type safety
function getLength(value: any): number {
return value.length;
}| Element | Convention | Example |
|---|---|---|
| Functions / variables | camelCase | parseChunks, topK, chunkSize |
| Types / classes | PascalCase | SearchResult, DependencyGraph, EmbeddingProvider |
| Constants | UPPER_SNAKE_CASE | BATCH_SIZE, DEFAULT_CONFIG, MAX_CHUNK_SIZE |
| File names | kebab-case | tree-sitter-parser.ts, hybrid-search.ts |
| Test files | kebab-case + .test.ts |
hybrid-search.test.ts |
- Prefer pure functions over classes with mutable state
- Use
readonlyon interface properties and function parameters where possible - Minimize mutable state -- favor
map/filter/reduceover loops with mutation - Use
ReadonlyArray,ReadonlyMap,ReadonlySetfor immutable collections
// Good -- pure function, no side effects
function filterByLanguage(
chunks: readonly Chunk[],
language: string,
): Chunk[] {
return chunks.filter((chunk) => chunk.metadata.language === language);
}
// Avoid -- mutating external state
function filterByLanguage(chunks: Chunk[], language: string): void {
for (let i = chunks.length - 1; i >= 0; i--) {
if (chunks[i]?.metadata.language !== language) {
chunks.splice(i, 1);
}
}
}All external dependencies sit behind interfaces. This enables easy testing with mocks, swapping implementations without changing consumers, and clear dependency boundaries:
// Define the interface in packages/core/src/types/provider.ts
export interface EmbeddingProvider {
embed(texts: string[]): Promise<Result<number[][], EmbedError>>;
readonly dimensions: number;
}
// Implement it
export class OllamaEmbeddingProvider implements EmbeddingProvider {
readonly dimensions: number;
// ...
}Key interfaces: EmbeddingProvider, VectorStore, BacklogProvider, LLMProvider, Parser, Chunker, ReRanker.
All configuration flows through .coderag.yaml. No hardcoded values for anything that could vary between environments. The config schema is validated with Zod at startup.
CodeRAG uses the Result pattern from neverthrow instead of throwing exceptions. Every public function that can fail must return Result<T, E>.
import { ok, err, type Result } from 'neverthrow';
// Define a typed error class
export class ParseError extends Error {
constructor(message: string) {
super(message);
this.name = 'ParseError';
}
}
// Return Result instead of throwing
function parseConfig(raw: string): Result<Config, ParseError> {
try {
const parsed: unknown = JSON.parse(raw);
const validated = configSchema.safeParse(parsed);
if (!validated.success) {
return err(new ParseError(`Invalid config: ${validated.error.message}`));
}
return ok(validated.data);
} catch (error) {
const message = error instanceof Error ? error.message : 'Unknown error';
return err(new ParseError(`Invalid config: ${message}`));
}
}
// Consume with isOk/isErr
const result = parseConfig(input);
if (result.isErr()) {
console.error(result.error.message);
return;
}
const config = result.value; // Typed as Config- No uncaught throws in library code -- always return
Result<T, E> - Each module defines its own error class (e.g.,
ParseError,EmbedError,StoreError) - Exceptions are reserved for truly unexpected situations (programmer errors)
- Use
.isOk()/.isErr()to consume results, or.map()/.andThen()for chaining
- Vitest is the test runner
- Tests are co-located with source files as
*.test.ts - Coverage target: 80%+ on
@code-rag/core
Use the describe/it pattern:
import { describe, it, expect, vi } from 'vitest';
import { parseConfig } from './config-loader.js';
describe('parseConfig', () => {
it('should parse a valid YAML config', async () => {
const result = await parseConfig('/path/to/config');
expect(result.isOk()).toBe(true);
});
it('should return an error for missing config', async () => {
const result = await parseConfig('/nonexistent');
expect(result.isErr()).toBe(true);
expect(result._unsafeUnwrapErr().message).toContain('not found');
});
});# Run all tests across all packages
pnpm test
# Run tests for a specific package
pnpm --filter @code-rag/core test
# Run with coverage report
pnpm test -- --coverage
# Run a specific test file
pnpm test -- packages/core/src/embedding/ollama-embedding-provider.test.ts
# Watch mode during development
pnpm --filter @code-rag/core test -- --watchUse vi.fn() and vi.spyOn() for mocks. When testing against provider interfaces, create mock implementations rather than mocking internal details:
import { ok } from 'neverthrow';
import type { EmbeddingProvider } from '../types/provider.js';
function createMockEmbeddingProvider(dimensions = 768): EmbeddingProvider {
return {
dimensions,
embed: vi.fn().mockResolvedValue(ok([new Array(dimensions).fill(0)])),
};
}All branches follow this pattern:
feature/AB#XXXX-short-description (feature branches)
bugfix/AB#XXXX-fix-description (bugfix branches)
Where AB#XXXX is the Azure DevOps work item ID.
# Create a feature branch
git checkout -b feature/AB#42-add-auth-middleware
# Create a bugfix branch
git checkout -b bugfix/AB#99-fix-search-timeoutCommit messages start with the Azure DevOps work item ID:
AB#42 Add authentication middleware for API server
Rules:
AB#XXXXprefix links the commit to Azure DevOps work items automatically- No conventional commits prefix (
feat:,fix:, etc.) --AB#is the prefix - Multiple stories in one commit:
AB#42 AB#43 Description - Keep the description concise and meaningful
Examples:
AB#85 Fix tree-sitter WASM ABI incompatibility
AB#55 AB#52 Add SSE transport and coderag_explain tool
AB#103 Fix HybridSearch returning empty metadata for vector-only results
- Feature branches are squash-merged to main for clean history
- No direct commits to main
- The PR title should include the AB# reference
- Run the full build and test suite:
pnpm build && pnpm test && pnpm lint
- Ensure no
anytypes or unjustifiedascasts - Verify tests cover the new functionality (80%+ on core)
- Update documentation if behavior changes
-
Push your branch:
git push origin feature/AB#XXXX-description
-
Open a Pull Request against the
mainbranch -
Fill out the PR with:
- Summary of changes (1-3 bullet points)
- Test plan (how to verify the changes)
- AB# link to the Azure DevOps work item
- Keep PRs focused -- one feature or fix per PR
- Include tests for new functionality
- Update documentation if behavior changes
- Ensure all CI checks pass before requesting review
- Address review feedback promptly
- Rebase on
mainif your branch falls behind
CodeRAG's provider pattern makes it straightforward to add new implementations. See docs/extending.md for detailed guides on adding:
- Embedding providers (implement
EmbeddingProviderinterface) - Vector stores (implement
VectorStoreinterface) - Backlog providers (implement
BacklogProviderinterface) - Language parsers (register Tree-sitter grammars in
LanguageRegistry)
- Search existing issues before creating a new one
- For bugs: include reproduction steps, environment details (OS, Node.js version), and relevant error output
- For features: describe the use case and expected behavior
- Include the CodeRAG version (
coderag --versionor checkpackage.json)
By contributing to CodeRAG, you agree that your contributions will be licensed under the MIT License.