Contributing to Docmole

Thanks for your interest in contributing to Docmole! This guide will help you understand the codebase and contribute effectively.

Quick Start

# Clone and install
git clone https://github.com/Vigtu/docmole.git
cd docmole
bun install

# Run tests
bun test

# Type check
bun run typecheck

# Lint and format
bun run lint

Requirements: Bun >= 1.0.0

Development Commands

Command                           Purpose
bun install                       Install dependencies
bun run dev                       Run with hot reload
bun test                          Run all tests
bun test tests/embedded.test.ts   Run a single test file
bun run typecheck                 Type check (tsc --noEmit)
bun run lint                      Lint + format with Biome

Architecture Overview

Docmole is an MCP server with three operation modes:

┌─────────────────────────────────────────────────────────────┐
│                      docmole CLI                             │
│                    (src/index.ts)                            │
└──────────────────────────┬──────────────────────────────────┘
                           │
           ┌───────────────┼───────────────┐
           │               │               │
     ┌─────▼─────┐   ┌─────▼─────┐   ┌─────▼─────┐
     │ Mintlify  │   │ Embedded  │   │   Agno    │
     │  Backend  │   │  Backend  │   │  Backend  │
     │ (API)     │   │ (LanceDB) │   │ (Python)  │
     └───────────┘   └───────────┘   └───────────┘
           │               │               │
           └───────────────┼───────────────┘
                           │
                    ┌──────▼──────┐
                    │ MCP Server  │
                    │(src/server) │
                    └─────────────┘

Key Modules

Path                      Purpose
src/backends/             Backend implementations
src/backends/embedded/    Pure TypeScript RAG (LanceDB + OpenAI)
src/cli/                  CLI commands (setup, serve, list, etc.)
src/config/               YAML config management (~/.docmole/)
src/discovery/            Page discovery (sitemap.xml, mint.json)
src/security/             URL validation, input sanitization
src/server.ts             MCP server exposing ask and clear_history

Backend Interface

All backends implement this interface:

interface Backend {
  readonly name: string;
  readonly projectId: string;
  ask(question: string): Promise<AskResult>;
  clearHistory(): void;
  isAvailable(): Promise<boolean>;
}

Adding a New Backend

  1. Add backend type to src/config/schema.ts:

    export const BACKEND_TYPES = ["mintlify", "embedded", "agno", "your-backend"] as const;

  2. Create backend file at src/backends/your-backend.ts:

    import type { Backend, BackendFactory, AskResult } from "./types";
    
    class YourBackend implements Backend {
      readonly name = "your-backend";
      constructor(readonly projectId: string) {}
    
      async ask(question: string): Promise<AskResult> {
        // Implementation
        return { answer: "..." };
      }
    
      clearHistory(): void {}
    
      async isAvailable(): Promise<boolean> {
        return true;
      }
    }
    
    export const backendFactory: BackendFactory = {
      create: async (projectId, config) => new YourBackend(projectId),
    };

  3. The backend is loaded automatically by src/backends/registry.ts; no manual registration is needed.

Testing Guidelines

File Organization

tests/
├── config.test.ts        # src/config/*
├── mintlify-api.test.ts  # src/backends/mintlify.ts
├── embedded.test.ts      # src/backends/embedded/*
├── security.test.ts      # src/security/*
└── <module>.test.ts      # One file per major module

Good Tests

// Integration with real systems
test("persists data across instances", async () => {
  const kb1 = await createKnowledge(path, embedder);
  await kb1.addDocument({ name: "doc", content: "test" });
  await kb1.close();

  const kb2 = await createKnowledge(path, embedder);
  expect(await kb2.countDocuments()).toBe(1);
});

// Business logic that can fail
test("deduplicates by source URL (max 2 chunks)", async () => {
  // Tests actual deduplication logic
});

// Error handling
test("handles errors gracefully", async () => {
  const mock = { retrieve: async () => { throw new Error("fail"); } };
  // `tool` is assumed to have been built around `mock`, so the failure surfaces here
  const result = await tool.execute({ query: "test" });
  expect(result.success).toBe(false);
});

Bad Tests (Avoid)

// Testing hardcoded values
test("default model is gpt-4o-mini", () => {
  expect(config.model).toBe("gpt-4o-mini"); // Changing a default is not a bug
});

// Testing library behavior
test("LanceDB returns array", () => {
  expect(Array.isArray(results)).toBe(true); // Exercises LanceDB, not YOUR code
});

Integration Tests

Skip integration tests when their dependencies are unavailable:

import { describe, setDefaultTimeout } from "bun:test";

const hasOpenAIKey = !!process.env.OPENAI_API_KEY;

describe.skipIf(!hasOpenAIKey)("OpenAI Integration", () => {
  setDefaultTimeout(60_000); // API calls are slow
  // ...
});

Mocking

// Type-safe mocks
type MockKnowledge = Pick<EmbeddedKnowledge, "search">;

const mock: MockKnowledge = {
  search: async () => [{ name: "doc", content: "test", metadata: {} }],
};

const retriever = new Retriever(mock as unknown as EmbeddedKnowledge);

Code Style

  • Formatter: Biome (double quotes, semicolons)
  • TypeScript: Strict mode enabled
  • Imports: Auto-organized by Biome

Run bun run lint to auto-fix formatting issues.

Project Structure Conventions

Pattern         Convention
Backend file    src/backends/{type}.ts exports backendFactory
CLI command     src/cli/{name}.ts
Config schema   src/config/schema.ts is the single source of truth (SSOT)
Test file       tests/{module}.test.ts

Key Technical Decisions

TypeScript as Source of Truth

Config schemas are defined in TypeScript (src/config/schema.ts), not JSON/YAML. This ensures type safety and IDE autocomplete.
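
As a rough illustration of why this helps, the sketch below derives a union type and a runtime guard from the same BACKEND_TYPES tuple shown in "Adding a New Backend". The names BackendType and isBackendType are hypothetical and may not match what src/config/schema.ts actually exports.

```typescript
// Mirrors the BACKEND_TYPES tuple shown in "Adding a New Backend".
const BACKEND_TYPES = ["mintlify", "embedded", "agno"] as const;

// Hypothetical alias: the union "mintlify" | "embedded" | "agno",
// derived from the tuple so the type and the list cannot drift apart.
type BackendType = (typeof BACKEND_TYPES)[number];

// Hypothetical runtime guard for untyped values, e.g. a field read from YAML.
function isBackendType(value: unknown): value is BackendType {
  return (
    typeof value === "string" &&
    (BACKEND_TYPES as readonly string[]).includes(value)
  );
}
```

With this pattern, adding a string to the tuple updates both the compile-time type and the runtime check in one place.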

Security by Default

All external inputs are validated:

  • URLs: SSRF protection (file:// URLs and private IP addresses are rejected)
  • Project IDs: Path traversal prevention (only alphanumeric characters, dashes, and underscores are allowed)

See src/security/ for implementations.
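
To make the two rules concrete, here is a minimal sketch of what such checks could look like. The helpers isValidProjectId and isSafeUrl are hypothetical and are not the functions in src/security/; the sketch only inspects IPv4 literals and does not resolve DNS or handle IPv6, which real SSRF protection would also need to consider.

```typescript
// Hypothetical helper: accept only letters, digits, "-" and "_",
// so values like "../etc" can never escape the data directory.
function isValidProjectId(id: string): boolean {
  return /^[A-Za-z0-9_-]+$/.test(id);
}

// Hypothetical helper: allow only http(s) URLs whose host is not
// localhost or a private/loopback/link-local IPv4 literal.
function isSafeUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  const host = url.hostname;
  if (host === "localhost") return false;

  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return true; // a hostname, not an IPv4 literal

  const a = Number(m[1]);
  const b = Number(m[2]);
  const isPrivate =
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168);
  return !isPrivate;
}
```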

Graceful Degradation

The backend registry returns a result object describing the failure instead of throwing:

const result = await loadBackend("embedded");
if (!result.success) {
  console.error(result.error.message);
  console.error(result.error.suggestion);
}
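
The result shape used above can be modeled as a discriminated union. The sketch below is an assumption based only on the fields visible in that snippet (success, error.message, error.suggestion); the names LoadResult, LoadError, and tryLoad are hypothetical, and the helper is synchronous for brevity even though the real loadBackend is awaited.

```typescript
// Hypothetical error shape, inferred from the fields used in the snippet above.
interface LoadError {
  message: string;
  suggestion: string;
}

// Hypothetical result type: callers must check `success` before using `value`.
type LoadResult<T> =
  | { success: true; value: T }
  | { success: false; error: LoadError };

// Hypothetical helper: run a loader and convert a thrown error into a value.
function tryLoad<T>(load: () => T, suggestion: string): LoadResult<T> {
  try {
    return { success: true, value: load() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { success: false, error: { message, suggestion } };
  }
}
```

Because the failure is an ordinary value, the caller decides whether to log it, fall back to another backend, or exit.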

Understanding the Mintlify Backend

The Mintlify backend (src/backends/mintlify.ts) talks to Mintlify's AI Assistant API, whose protocol was reverse-engineered.

Key details (see docs/reverse-engineering-mintlify-api.md):

  • Endpoint: POST https://leaves.mintlify.com/api/assistant/{project-id}/message
  • Response: SSE stream with prefixes (0: = text, a: = tool results)
  • Parsing: only 0: chunks are parsed; skipping tool results avoids context window bloat (97% reduction)
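
The prefix filtering described above might look roughly like the following. This is a sketch under assumptions: it treats each stream line as a prefix followed by a JSON payload and assumes the payload of a 0: line is a JSON-encoded string. The function name extractText is hypothetical; see docs/reverse-engineering-mintlify-api.md for the actual format.

```typescript
// Hypothetical helper: collect only the text chunks from a buffered stream body.
// Assumes lines look like `0:"some text"` (text) or `a:{...}` (tool results).
function extractText(stream: string): string {
  let text = "";
  for (const line of stream.split("\n")) {
    if (!line.startsWith("0:")) continue; // skip a: tool results and anything else
    try {
      const chunk: unknown = JSON.parse(line.slice(2));
      if (typeof chunk === "string") text += chunk;
    } catch {
      // Ignore a malformed chunk rather than failing the whole answer.
    }
  }
  return text;
}
```

Dropping the a: lines before they reach the model is what keeps the large tool-result payloads out of the context window.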

Pull Request Process

  1. Create a branch:

    git checkout -b feature/your-feature

  2. Make changes and add tests if applicable

  3. Run checks:

    bun test
    bun run typecheck
    bun run lint

  4. Commit with conventional commits:

    git commit -m "feat: add support for X"

    Prefixes: feat:, fix:, docs:, chore:, test:, refactor:

  5. Push and create PR:

    git push origin feature/your-feature

Environment Variables

Variable                   Purpose
OPENAI_API_KEY             Required for embedded backend tests
DOCMOLE_DATA_DIR           Override data directory (default: ~/.docmole)
DOCMOLE_ALLOW_LOCALHOST    Allow localhost URLs (dev mode)
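
For orientation, the sketch below shows one way these variables could be resolved into settings. It is not Docmole's actual config code: resolveSettings and Settings are hypothetical names, and treating "1" or "true" as the enabling values for DOCMOLE_ALLOW_LOCALHOST is an assumption.

```typescript
// Hypothetical settings shape for this sketch.
interface Settings {
  dataDir: string;
  allowLocalhost: boolean;
  hasOpenAIKey: boolean;
}

type Env = Record<string, string | undefined>;

// Hypothetical resolver; in practice it could be called as
// resolveSettings(process.env, os.homedir()).
function resolveSettings(env: Env, homeDir: string): Settings {
  const allow = env["DOCMOLE_ALLOW_LOCALHOST"];
  return {
    // Fall back to the documented default, ~/.docmole.
    dataDir: env["DOCMOLE_DATA_DIR"] || `${homeDir}/.docmole`,
    // Assumption: "1" or "true" turns the flag on.
    allowLocalhost: allow === "1" || allow === "true",
    // Embedded backend tests are skipped when no key is present.
    hasOpenAIKey: Boolean(env["OPENAI_API_KEY"]),
  };
}
```

Taking the environment and home directory as parameters keeps the function easy to test without touching real process state.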

Documentation

Document                                    When to Read
AGENT.md                                    Architecture overview
docs/architecture-plan.md                   Design decisions, roadmap
docs/reverse-engineering-mintlify-api.md    Mintlify backend details
docs/enterprise-requirements.md             Enterprise features
docs/universal-docs-support.md              Generic docs site support

Questions?

Open an issue for discussion before starting work on major changes.

License

By contributing, you agree that your contributions will be licensed under the MIT License.