Skip to content

Latest commit

 

History

History
578 lines (452 loc) · 13.8 KB

File metadata and controls

578 lines (452 loc) · 13.8 KB

Contributing to EdgeVec

Thank you for your interest in contributing to EdgeVec!

EdgeVec follows a "Nvidia Grade" development protocol where correctness and verification are paramount. This document outlines the strict quality standards all contributions must meet.


Table of Contents

  1. Core Principles
  2. Forbidden Patterns
  3. Required Practices
  4. Testing Standards
  5. Code Review Process
  6. Development Workflow
  7. Release Process

Core Principles

1. Test-First Development (TDD)

Every feature begins with tests, not implementation.

// ✅ CORRECT: Write the test first
#[test]
fn test_insert_returns_vector_id() {
    let mut index = VectorIndex::new(128);
    let vector = vec![0.0; 128];
    let id = index.insert(&vector).expect("insert failed");
    assert!(id.is_valid());
}

// Then implement the function to pass the test
impl VectorIndex {
    pub fn insert(&mut self, vector: &[f32]) -> Result<VectorId, Error> {
        // Implementation
    }
}

2. Correctness Over Convenience

Code must be provably correct before it can be fast.

  • All public APIs must have comprehensive tests
  • All algorithms must have property-based tests
  • All unsafe code must have documented safety proofs
  • All performance claims must have benchmark evidence

3. Hostile Default

Assume your code will break in production. Prove it won't.

  • Design for failure modes
  • Handle all error paths explicitly
  • No silent failures
  • No assumptions about input validity

Forbidden Patterns

❌ FORBIDDEN: unwrap() in Library Code

Never use unwrap(), expect(), or any panic-inducing operation in library code.

// ❌ FORBIDDEN
pub fn search(&self, query: &[f32]) -> Vec<Result> {
    let k = query.len();
    self.index.get(k).unwrap()  // PANIC if k is out of bounds!
}

// ✅ CORRECT
pub fn search(&self, query: &[f32]) -> Result<Vec<SearchResult>, Error> {
    let k = query.len();
    self.index.get(k)
        .ok_or(Error::InvalidQueryDimension { 
            expected: self.dim, 
            actual: k 
        })
}

Exceptions: unwrap() is allowed ONLY in:

  • Test code (#[cfg(test)])
  • Example code (examples/)
  • Build scripts (build.rs)

❌ FORBIDDEN: Panic in Public APIs

// ❌ FORBIDDEN
pub fn insert(&mut self, vector: &[f32]) {
    assert_eq!(vector.len(), self.dim, "Dimension mismatch!");  // PANIC!
}

// ✅ CORRECT
pub fn insert(&mut self, vector: &[f32]) -> Result<VectorId, Error> {
    if vector.len() != self.dim {
        return Err(Error::DimensionMismatch {
            expected: self.dim,
            actual: vector.len(),
        });
    }
    // ...
}

❌ FORBIDDEN: unsafe Without Justification

Every unsafe block must have:

  1. A comment explaining why it's needed
  2. A proof of safety
  3. A reference to the review that approved it
// ❌ FORBIDDEN
unsafe {
    *ptr = value;  // No explanation!
}

// ✅ CORRECT
// SAFETY: This unsafe block is required because:
// 1. We need to avoid bounds checks in the search hot path
// 2. The index `i` is guaranteed valid by invariant [I1] (len <= capacity)
// 3. This was reviewed in PR #42 by HOSTILE_REVIEWER
//
// Proof of safety:
// - Line 127 checks `i < self.len` before this block
// - `self.data` is always valid for `self.len` elements (invariant I2)
// - No aliasing: `self.data` is uniquely owned by this struct
unsafe {
    *self.data.get_unchecked_mut(i) = value;
}

❌ FORBIDDEN: Magic Numbers

// ❌ FORBIDDEN
if neighbors.len() > 32 {
    // Why 32? Is it configurable? Where did it come from?
}

// ✅ CORRECT
/// Maximum number of neighbors per HNSW node.
/// 
/// Source: Section 5.2 of the HNSW paper (Malkov & Yashunin, 2018)
/// recommends M=16 for most use cases. We use 2*M=32 as the capacity.
pub const MAX_NEIGHBORS: usize = 32;

if neighbors.len() > MAX_NEIGHBORS {
    // Clear intent, documented source
}

❌ FORBIDDEN: TODO Without Issue Link

// ❌ FORBIDDEN
// TODO: optimize this later

// ✅ CORRECT
// TODO(issue #123): Optimize with SIMD after performance profiling

❌ FORBIDDEN: Commented-Out Code

// ❌ FORBIDDEN
pub fn search(&self, query: &[f32]) -> Result<Vec<Result>, Error> {
    // let old_approach = self.linear_search(query);
    self.hnsw_search(query)
}

// ✅ CORRECT: Delete it. Git remembers.
pub fn search(&self, query: &[f32]) -> Result<Vec<SearchResult>, Error> {
    self.hnsw_search(query)
}

Required Practices

✅ REQUIRED: Explicit Error Handling

Use Result<T, E> for all fallible operations.

// Define clear error types
#[derive(Debug, Clone, PartialEq, Eq, thiserror::Error)]
pub enum Error {
    #[error("dimension mismatch: expected {expected}, got {actual}")]
    DimensionMismatch { expected: usize, actual: usize },
    
    #[error("index is empty")]
    EmptyIndex,
    
    #[error("k={k} exceeds vector count={count}")]
    KTooLarge { k: usize, count: usize },
}

// Use Result everywhere
pub fn insert(&mut self, vector: &[f32]) -> Result<VectorId, Error>;
pub fn search(&self, query: &[f32], k: usize) -> Result<Vec<SearchResult>, Error>;

✅ REQUIRED: Documentation Comments

Every public item must have a doc comment.

/// A hierarchical navigable small world (HNSW) index for approximate nearest neighbor search.
///
/// # Examples
///
/// ```rust
/// use edgevec::HnswIndex;
///
/// let mut index = HnswIndex::new(128); // 128-dimensional vectors
/// index.insert(&vec![0.0; 128])?;
/// let results = index.search(&vec![1.0; 128], 10)?;
/// ```
///
/// # Performance
///
/// - Search: O(log n) average case
/// - Insert: O(log n) average case
/// - Memory: ~82 bytes per vector (including index overhead)
///
/// # Thread Safety
///
/// This type is `Send + Sync` and can be safely shared across threads.
pub struct HnswIndex {
    // ...
}

✅ REQUIRED: Struct Invariants

Document invariants for all data structures.

/// Vector storage with arena allocation.
///
/// # Invariants
///
/// - [I1] `len <= capacity` (always)
/// - [I2] `data[0..len]` is always valid
/// - [I3] `tombstones.len() == capacity` (bitmap size matches capacity)
/// - [I4] If `tombstones[i] == true`, then `data[i]` is logically deleted
///
/// # Safety
///
/// Violating any invariant is undefined behavior. All public methods maintain invariants.
pub struct VectorStorage {
    data: Vec<Vector>,
    len: usize,
    capacity: usize,
    tombstones: BitVec,
}

✅ REQUIRED: Test Coverage

100% coverage for all public APIs.

#[cfg(test)]
mod tests {
    use super::*;

    /// Unit test: Empty index returns error
    #[test]
    fn test_search_empty_index() {
        let index = HnswIndex::new(128);
        let query = vec![0.0; 128];
        let result = index.search(&query, 10);
        assert!(matches!(result, Err(Error::EmptyIndex)));
    }

    /// Boundary test: Single vector
    #[test]
    fn test_search_single_vector() {
        let mut index = HnswIndex::new(128);
        index.insert(&vec![1.0; 128]).unwrap();
        let results = index.search(&vec![1.0; 128], 10).unwrap();
        assert_eq!(results.len(), 1);
    }

    /// Scale test: Large index
    #[test]
    fn test_search_100k_vectors() {
        let mut index = HnswIndex::new(128);
        for _ in 0..100_000 {
            index.insert(&random_vector(128)).unwrap();
        }
        let results = index.search(&random_vector(128), 10).unwrap();
        assert_eq!(results.len(), 10);
    }

    /// Property-based test: Idempotence
    #[test]
    fn test_search_is_deterministic() {
        proptest!(|(seed: u64)| {
            let mut index = HnswIndex::with_seed(128, seed);
            for _ in 0..1000 {
                index.insert(&random_vector(128)).unwrap();
            }
            let query = random_vector(128);
            let results1 = index.search(&query, 10).unwrap();
            let results2 = index.search(&query, 10).unwrap();
            prop_assert_eq!(results1, results2);
        });
    }
}

Testing Standards

"Nvidia Grade" Verification

EdgeVec uses a multi-layer test pyramid:

         /\
        /E2E\         ← 5 End-to-End tests (browser automation)
       /------\
      /Integr.\      ← 6 Integration tests (persistence + search)
     /----------\
    /  Recall   \    ← 1 Recall test (95% threshold on SIFT dataset)
   /--------------\
  / Property-Based\  ← 8 Property tests (invariants, serialization)
 /------------------\
/   Fuzz + Miri     \ ← 4 Fuzz targets + Miri checks (memory safety)
/--------------------\
|   Unit Tests (200+)|  ← Comprehensive unit coverage
----------------------

Required Test Types

  1. Unit Tests: Every function has at least one test
  2. Integration Tests: Cross-component workflows
  3. Property-Based Tests: Invariants hold for all inputs
  4. Fuzz Tests: No crashes on adversarial inputs
  5. Recall Tests: Search returns correct results (>95% recall)
  6. Benchmark Tests: Performance budgets are met

Test Naming Convention

#[test]
fn test_<function>_<scenario>_<expected_behavior>() {
    // test_insert_duplicate_vector_returns_error
    // test_search_empty_index_returns_error
    // test_serialize_roundtrip_preserves_data
}

Code Review Process

Self-Review Checklist

Before submitting a PR, verify:

  • cargo fmt passes (formatting)
  • cargo clippy -- -D warnings passes (no warnings)
  • cargo test passes (all tests)
  • cargo doc --no-deps passes (documentation)
  • No unwrap() in library code
  • No TODO without issue reference
  • All public APIs documented
  • Test coverage for new code
  • Acceptance criteria met (from task plan)

Hostile Review

All PRs undergo HOSTILE_REVIEWER scrutiny:

  1. Correctness: Does the code do what it claims?
  2. Safety: Are all invariants maintained?
  3. Performance: Are performance budgets met?
  4. Testing: Is verification comprehensive?
  5. Documentation: Is intent clear?

Standard: Maximum scrutiny. Approval requires zero critical issues.


Development Workflow

1. Check the Plan

No code without an approved task.

# Verify your task exists in the weekly plan
cat docs/planning/weeks/week_XX/OVERVIEW.md

# Find your task ID (e.g., W3.2)
# Read acceptance criteria

2. Write Tests First

# Create test file first
touch tests/hnsw_test.rs

# Write failing tests that define the interface
cargo test  # Should fail

3. Implement

# Now write the code to pass the tests
vim src/hnsw.rs
cargo test  # Should pass

4. Verify Quality

# Format
cargo fmt

# Lint
cargo clippy -- -D warnings

# Test
cargo test

# Document
cargo doc --no-deps --open

# Benchmark (if applicable)
cargo bench

5. Submit for Review

# Create PR with:
# - Reference to task ID (e.g., "Implements W3.2")
# - Test results
# - Benchmark results (if applicable)
# - Self-review checklist completed

Examples

Good Contribution Example

/// Inserts a vector into the index.
///
/// # Arguments
///
/// * `vector` - A slice of f32 values representing the vector.
///
/// # Returns
///
/// Returns `Ok(VectorId)` if successful, or `Err(Error)` if:
/// - The vector dimension doesn't match the index dimension
/// - The index is at capacity
///
/// # Examples
///
/// ```rust
/// let mut index = HnswIndex::new(128);
/// let vector = vec![0.0; 128];
/// let id = index.insert(&vector)?;
/// ```
pub fn insert(&mut self, vector: &[f32]) -> Result<VectorId, Error> {
    // Validate dimension
    if vector.len() != self.dim {
        return Err(Error::DimensionMismatch {
            expected: self.dim,
            actual: vector.len(),
        });
    }

    // Check capacity
    if self.len >= self.capacity {
        return Err(Error::AtCapacity {
            capacity: self.capacity,
        });
    }

    // Insert logic...
    let id = VectorId::new(self.len);
    self.storage.push(vector);
    self.len += 1;

    Ok(id)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_insert_valid_vector_succeeds() {
        let mut index = HnswIndex::new(128);
        let vector = vec![0.0; 128];
        let result = index.insert(&vector);
        assert!(result.is_ok());
    }

    #[test]
    fn test_insert_wrong_dimension_fails() {
        let mut index = HnswIndex::new(128);
        let vector = vec![0.0; 64]; // Wrong dimension
        let result = index.insert(&vector);
        assert!(matches!(result, Err(Error::DimensionMismatch { .. })));
    }
}

Questions?

If you have questions about these standards:

  1. Read the architecture docs: docs/architecture/
  2. Check existing code for patterns
  3. Ask in issues or discussions

Remember: Strict standards produce reliable software. Thank you for helping build EdgeVec to "Nvidia Grade" quality!


Release Process

For Maintainers

EdgeVec follows a rigorous release process to prevent post-release issues. Before any release:

  1. Run Pre-Release Validation:

    ./scripts/pre-release-check.sh
  2. Follow the Release Checklist: See docs/RELEASE_CHECKLIST.md for the complete release protocol.

  3. Emergency Procedures: If issues are discovered post-release, see docs/ROLLBACK_PROCEDURES.md for incident response.

Release Documentation

Document Purpose
RELEASE_CHECKLIST.md Step-by-step release protocol
ROLLBACK_PROCEDURES.md Incident response for failed releases
CHANGELOG.md Version history and changes

Document Version: 1.1 Last Updated: 2025-12-15 Status: ACTIVE