Contributing to CodeRAG

Thank you for your interest in contributing to CodeRAG! This guide covers everything you need to know: from setting up your development environment to submitting a pull request.

Getting Started
Development Setup
Project Structure
Coding Conventions
Error Handling
Testing
Branch and Commit Conventions
Pull Request Process
Adding New Providers
Reporting Issues
License

Getting Started

Prerequisites

Tool	Version	Required	Notes
Node.js	>= 20	Yes	LTS recommended
pnpm	>= 9	Yes	`corepack enable && corepack prepare pnpm@latest --activate`
Ollama	latest	No	For local embedding and NL enrichment
Git	>= 2.30	Yes	For version control and file watcher features

Forking the Repository

If you are an external contributor, start by forking the repository:

Fork the CodeRAG repository on Azure DevOps (or GitHub mirror if available)

Clone your fork locally:

git clone https://dev.azure.com/<your-org>/CodeRAG/_git/CodeRAG
cd CodeRAG

Add the upstream remote so you can keep your fork in sync:

git remote add upstream https://dev.azure.com/momc-pl/CodeRAG/_git/CodeRAG

Create a feature branch for your changes:

git checkout -b feature/AB#XXXX-short-description

After implementing and testing your changes, push to your fork and submit a Pull Request against the upstream main branch.

Quick Start

The fastest way to get started is to use the bootstrap script:

git clone https://dev.azure.com/momc-pl/CodeRAG/_git/CodeRAG
cd CodeRAG
./scripts/bootstrap.sh

This will check prerequisites, install dependencies, build all packages, and run the test suite.

Manual Setup

If you prefer to set things up manually:

# 1. Clone the repository
git clone https://dev.azure.com/momc-pl/CodeRAG/_git/CodeRAG
cd CodeRAG

# 2. Install dependencies
pnpm install

# 3. Build all packages
pnpm build

# 4. Run tests
pnpm test

# 5. (Optional) Set up Ollama for local AI features
ollama pull nomic-embed-text
ollama pull qwen2.5-coder

Development Setup

Package Manager

CodeRAG uses pnpm workspaces. Always use pnpm (not npm or yarn) for package operations:

# Install all workspace dependencies
pnpm install

# Build all packages (respects dependency order)
pnpm build

# Run all tests across all packages
pnpm test

# Run tests for a specific package
pnpm --filter @code-rag/core test

# Run with coverage
pnpm test -- --coverage

# Lint all packages
pnpm lint

# Clean all build artifacts
pnpm clean

TypeScript Configuration

The project uses a shared tsconfig.base.json with strict settings. Each package extends it with its own tsconfig.json. Key compiler options:

target: ES2022 with module: NodeNext (ESM)
strict: true (enables all strict checks)
noUncheckedIndexedAccess: true (array/object index returns T | undefined)
noUnusedLocals: true and noUnusedParameters: true
isolatedModules: true

All imports must include the .js extension for NodeNext module resolution, even when the source file is .ts:

// Correct
import { parseConfig } from './config-loader.js';

// Incorrect -- will fail at runtime
import { parseConfig } from './config-loader';

Project Structure

CodeRAG is a pnpm workspace monorepo with 7 packages:

coderag/
  packages/
    core/               # Core library: ingestion, embedding, retrieval
    cli/                # CLI tool (coderag init/index/search/serve/status)
    mcp-server/         # MCP server (stdio + SSE transport)
    api-server/         # REST API server for cloud/team features
    viewer/             # Web-based visualization dashboard
    vscode-extension/   # VS Code extension
    benchmarks/         # Benchmark suite
  scripts/              # Development and CI scripts
  docs/                 # Documentation (architecture, guides, API reference)
  .coderag.yaml         # Project config (dogfooding)
  CLAUDE.md             # AI agent context
  pnpm-workspace.yaml   # Workspace definition
  tsconfig.base.json    # Shared TypeScript config

Each package has its own package.json, tsconfig.json, and builds independently. Cross-package dependencies use the workspace protocol (workspace:*).

Coding Conventions

TypeScript Strict Mode

No any -- use unknown and narrow with type guards
No as casts without a justifying comment explaining why
ESM modules only -- use import/export, never require()
All imports must include the .js extension for NodeNext resolution

// Good -- explicit type narrowing
function getLength(value: unknown): number {
  if (typeof value === 'string') {
    return value.length;
  }
  return 0;
}

// Bad -- using `any` bypasses type safety
function getLength(value: any): number {
  return value.length;
}

Naming Conventions

Element	Convention	Example
Functions / variables	camelCase	`parseChunks`, `topK`, `chunkSize`
Types / classes	PascalCase	`SearchResult`, `DependencyGraph`, `EmbeddingProvider`
Constants	UPPER_SNAKE_CASE	`BATCH_SIZE`, `DEFAULT_CONFIG`, `MAX_CHUNK_SIZE`
File names	kebab-case	`tree-sitter-parser.ts`, `hybrid-search.ts`
Test files	kebab-case + `.test.ts`	`hybrid-search.test.ts`

Functional Style

Prefer pure functions over classes with mutable state
Use readonly on interface properties and function parameters where possible
Minimize mutable state -- favor map/filter/reduce over loops with mutation
Use ReadonlyArray, ReadonlyMap, ReadonlySet for immutable collections

// Good -- pure function, no side effects
function filterByLanguage(
  chunks: readonly Chunk[],
  language: string,
): Chunk[] {
  return chunks.filter((chunk) => chunk.metadata.language === language);
}

// Avoid -- mutating external state
function filterByLanguage(chunks: Chunk[], language: string): void {
  for (let i = chunks.length - 1; i >= 0; i--) {
    if (chunks[i]?.metadata.language !== language) {
      chunks.splice(i, 1);
    }
  }
}

Provider Pattern

All external dependencies sit behind interfaces. This enables easy testing with mocks, swapping implementations without changing consumers, and clear dependency boundaries:

// Define the interface in packages/core/src/types/provider.ts
export interface EmbeddingProvider {
  embed(texts: string[]): Promise<Result<number[][], EmbedError>>;
  readonly dimensions: number;
}

// Implement it
export class OllamaEmbeddingProvider implements EmbeddingProvider {
  readonly dimensions: number;
  // ...
}

Key interfaces: EmbeddingProvider, VectorStore, BacklogProvider, LLMProvider, Parser, Chunker, ReRanker.

Configuration

All configuration flows through .coderag.yaml. No hardcoded values for anything that could vary between environments. The config schema is validated with Zod at startup.

Error Handling

CodeRAG uses the Result pattern from neverthrow instead of throwing exceptions. Every public function that can fail must return Result<T, E>.

Basic Pattern

import { ok, err, type Result } from 'neverthrow';

// Define a typed error class
export class ParseError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ParseError';
  }
}

// Return Result instead of throwing
function parseConfig(raw: string): Result<Config, ParseError> {
  try {
    const parsed: unknown = JSON.parse(raw);
    const validated = configSchema.safeParse(parsed);
    if (!validated.success) {
      return err(new ParseError(`Invalid config: ${validated.error.message}`));
    }
    return ok(validated.data);
  } catch (error) {
    const message = error instanceof Error ? error.message : 'Unknown error';
    return err(new ParseError(`Invalid config: ${message}`));
  }
}

// Consume with isOk/isErr
const result = parseConfig(input);
if (result.isErr()) {
  console.error(result.error.message);
  return;
}
const config = result.value; // Typed as Config

Rules

No uncaught throws in library code -- always return Result<T, E>
Each module defines its own error class (e.g., ParseError, EmbedError, StoreError)
Exceptions are reserved for truly unexpected situations (programmer errors)
Use .isOk() / .isErr() to consume results, or .map() / .andThen() for chaining

Testing

Framework and Location

Vitest is the test runner
Tests are co-located with source files as *.test.ts
Coverage target: 80%+ on @code-rag/core

Test Structure

Use the describe/it pattern:

import { describe, it, expect, vi } from 'vitest';
import { parseConfig } from './config-loader.js';

describe('parseConfig', () => {
  it('should parse a valid YAML config', async () => {
    const result = await parseConfig('/path/to/config');
    expect(result.isOk()).toBe(true);
  });

  it('should return an error for missing config', async () => {
    const result = await parseConfig('/nonexistent');
    expect(result.isErr()).toBe(true);
    expect(result._unsafeUnwrapErr().message).toContain('not found');
  });
});

Running Tests

# Run all tests across all packages
pnpm test

# Run tests for a specific package
pnpm --filter @code-rag/core test

# Run with coverage report
pnpm test -- --coverage

# Run a specific test file
pnpm test -- packages/core/src/embedding/ollama-embedding-provider.test.ts

# Watch mode during development
pnpm --filter @code-rag/core test -- --watch

Mocking

Use vi.fn() and vi.spyOn() for mocks. When testing against provider interfaces, create mock implementations rather than mocking internal details:

import { ok } from 'neverthrow';
import type { EmbeddingProvider } from '../types/provider.js';

function createMockEmbeddingProvider(dimensions = 768): EmbeddingProvider {
  return {
    dimensions,
    embed: vi.fn().mockResolvedValue(ok([new Array(dimensions).fill(0)])),
  };
}

Branch and Commit Conventions

Branch Naming

All branches follow this pattern:

feature/AB#XXXX-short-description    (feature branches)
bugfix/AB#XXXX-fix-description       (bugfix branches)

Where AB#XXXX is the Azure DevOps work item ID.

# Create a feature branch
git checkout -b feature/AB#42-add-auth-middleware

# Create a bugfix branch
git checkout -b bugfix/AB#99-fix-search-timeout

Commit Message Format

Commit messages start with the Azure DevOps work item ID:

AB#42 Add authentication middleware for API server

Rules:

AB#XXXX prefix links the commit to Azure DevOps work items automatically
No conventional commits prefix (feat:, fix:, etc.) -- AB# is the prefix
Multiple stories in one commit: AB#42 AB#43 Description
Keep the description concise and meaningful

Examples:

AB#85 Fix tree-sitter WASM ABI incompatibility
AB#55 AB#52 Add SSE transport and coderag_explain tool
AB#103 Fix HybridSearch returning empty metadata for vector-only results

Squash Merge

Feature branches are squash-merged to main for clean history
No direct commits to main
The PR title should include the AB# reference

Pull Request Process

Before Submitting

Run the full build and test suite:
```
pnpm build && pnpm test && pnpm lint
```
Ensure no any types or unjustified as casts
Verify tests cover the new functionality (80%+ on core)
Update documentation if behavior changes

Creating a PR

Push your branch:

git push origin feature/AB#XXXX-description

Open a Pull Request against the main branch
Fill out the PR with:
- Summary of changes (1-3 bullet points)
- Test plan (how to verify the changes)
- AB# link to the Azure DevOps work item

Review Expectations

Keep PRs focused -- one feature or fix per PR
Include tests for new functionality
Update documentation if behavior changes
Ensure all CI checks pass before requesting review
Address review feedback promptly
Rebase on main if your branch falls behind

Adding New Providers

CodeRAG's provider pattern makes it straightforward to add new implementations. See docs/extending.md for detailed guides on adding:

Embedding providers (implement EmbeddingProvider interface)
Vector stores (implement VectorStore interface)
Backlog providers (implement BacklogProvider interface)
Language parsers (register Tree-sitter grammars in LanguageRegistry)

Reporting Issues

Search existing issues before creating a new one
For bugs: include reproduction steps, environment details (OS, Node.js version), and relevant error output
For features: describe the use case and expected behavior
Include the CodeRAG version (coderag --version or check package.json)

License

By contributing to CodeRAG, you agree that your contributions will be licensed under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Contributing to CodeRAG

Table of Contents

Getting Started

Prerequisites

Forking the Repository

Quick Start

Manual Setup

Development Setup

Package Manager

TypeScript Configuration

Project Structure

Coding Conventions

TypeScript Strict Mode

Naming Conventions

Functional Style

Provider Pattern

Configuration

Error Handling

Basic Pattern

Rules

Testing

Framework and Location

Test Structure

Running Tests

Mocking

Branch and Commit Conventions

Branch Naming

Commit Message Format

Squash Merge

Pull Request Process

Before Submitting

Creating a PR

Review Expectations

Adding New Providers

Reporting Issues

License

FilesExpand file tree

CONTRIBUTING.md

Latest commit

History

CONTRIBUTING.md

File metadata and controls

Contributing to CodeRAG

Table of Contents

Getting Started

Prerequisites

Forking the Repository

Quick Start

Manual Setup

Development Setup

Package Manager

TypeScript Configuration

Project Structure

Coding Conventions

TypeScript Strict Mode

Naming Conventions

Functional Style

Provider Pattern

Configuration

Error Handling

Basic Pattern

Rules

Testing

Framework and Location

Test Structure

Running Tests

Mocking

Branch and Commit Conventions

Branch Naming

Commit Message Format

Squash Merge

Pull Request Process

Before Submitting

Creating a PR

Review Expectations

Adding New Providers

Reporting Issues

License