Support progress callbacks and incremental indexing in search_autonomous #1

@rupurt

Context

Paddles wraps sift::Sift::search_autonomous() to gather workspace context during recursive planning turns. This call blocks for 30-60+ seconds during workspace indexing and multi-step graph search. Today, paddles can only show elapsed-time heartbeats because sift exposes no progress reporting mechanism, and sift re-indexes the full workspace on every search even when only a few files have changed.

Two changes would significantly improve the end-user experience in paddles and in any other sift consumer.

Request 1: Progress callbacks for search_autonomous

Problem

When sift is indexing or searching, the calling application has no way to know what phase the operation is in or how far along it is. Paddles currently shows a generic "Searching — 4s" heartbeat that tells the user nothing useful.

Proposed API

Add a search_autonomous_with_progress method that accepts a std::sync::mpsc::Sender<SearchProgress> for progress updates:

```rust
pub type ProgressSender = std::sync::mpsc::Sender<SearchProgress>;

impl Sift {
    /// Existing API (unchanged).
    pub fn search_autonomous(
        &self,
        request: AutonomousSearchRequest,
    ) -> Result<AutonomousSearchResponse> {
        todo!()
    }

    /// Same as `search_autonomous`, but reports phase updates on `progress`.
    pub fn search_autonomous_with_progress(
        &self,
        request: AutonomousSearchRequest,
        progress: ProgressSender,
    ) -> Result<AutonomousSearchResponse> {
        todo!()
    }
}
```
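One way to keep search_autonomous unchanged is to have it delegate to the new method with a sender whose receiver has already been dropped. This is a minimal sketch, not sift's implementation: `Request`, `Response`, and the placeholder planner loop are hypothetical stand-ins, and error handling is omitted. The key point is `let _ = progress.send(...)`: `Sender::send` returns an error once the receiver is gone, and a consumer that stops listening should never abort the search.

```rust
use std::sync::mpsc::{channel, Sender};

// Hypothetical stand-ins for sift's real request/response types.
#[derive(Debug)]
pub struct Request {
    pub query: String,
}

#[derive(Debug)]
pub struct Response {
    pub query: String,
    pub total_steps: usize,
}

// Trimmed to two variants for brevity.
#[derive(Clone, Debug, PartialEq)]
pub enum SearchProgress {
    Planning { step_index: usize, step_limit: usize },
    Completed { total_steps: usize },
}

pub struct Sift;

impl Sift {
    /// Existing entry point: same behavior as before, progress is discarded.
    pub fn search_autonomous(&self, request: Request) -> Response {
        let (tx, rx) = channel();
        drop(rx); // no listener, so every send below returns Err (and is ignored)
        self.search_autonomous_with_progress(request, tx)
    }

    /// New entry point: same search, plus progress events on `progress`.
    pub fn search_autonomous_with_progress(
        &self,
        request: Request,
        progress: Sender<SearchProgress>,
    ) -> Response {
        let step_limit = 3; // placeholder for the real planner loop
        for step_index in 0..step_limit {
            // A dropped receiver must never abort the search, so ignore send errors.
            let _ = progress.send(SearchProgress::Planning { step_index, step_limit });
        }
        let _ = progress.send(SearchProgress::Completed { total_steps: step_limit });
        Response { query: request.query, total_steps: step_limit }
    }
}
```

An `Option<ProgressSender>` parameter would also work; the dropped-receiver approach keeps a single code path without changing the proposed signature.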

Progress phases

```rust
#[derive(Clone, Debug)]
pub enum SearchProgress {
    Indexing { files_indexed: usize, total_files: Option<usize> },
    Embedding { files_embedded: usize, total_files: Option<usize> },
    Planning { step_index: usize, step_limit: usize, action: Option<String> },
    Retrieving { step_index: usize, query: Option<String> },
    Completed { total_steps: usize, retained_artifacts: usize },
}
```
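On the consumer side, a std channel lets paddles run the blocking search on a worker thread and drain events on its own thread without any async runtime. A minimal sketch, assuming a hypothetical `fake_search` in place of the real search_autonomous_with_progress call (the enum is trimmed to two variants): the `for event in rx` loop ends on its own once the worker returns and drops its `Sender`.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

#[derive(Clone, Debug)]
pub enum SearchProgress {
    Indexing { files_indexed: usize, total_files: Option<usize> },
    Completed { total_steps: usize, retained_artifacts: usize },
}

// Hypothetical stand-in for Sift::search_autonomous_with_progress.
fn fake_search(progress: Sender<SearchProgress>) -> usize {
    for files_indexed in 1..=3 {
        let _ = progress.send(SearchProgress::Indexing { files_indexed, total_files: Some(3) });
    }
    let _ = progress.send(SearchProgress::Completed { total_steps: 2, retained_artifacts: 5 });
    5 // pretend result: number of retained artifacts
}

/// Runs the search on a worker thread and collects every event seen by the UI side.
pub fn run_with_progress() -> (usize, Vec<SearchProgress>) {
    let (tx, rx) = channel();
    // `tx` moves into the worker and is dropped when fake_search returns.
    let worker = thread::spawn(move || fake_search(tx));
    let mut seen = Vec::new();
    for event in rx {
        // A real TUI would redraw its status line here.
        seen.push(event);
    }
    let result = worker.join().expect("search thread panicked");
    (result, seen)
}
```

If the UI also needs to tick an elapsed-time clock between events, `Receiver::recv_timeout` is the std alternative to the blocking `for` loop.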

Constraints

  • MUST use std::sync::mpsc::Sender (not a tokio channel) to keep sift runtime-agnostic
  • MUST keep search_autonomous working without changes (additive API)
  • MUST NOT add tokio as a dependency to the sift crate
  • SHOULD emit progress events at most once every 2 seconds to limit overhead
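The rate limit could be met on the sift side with a small throttle wrapped around the sender. A sketch under assumptions: `ThrottledSender` and `send_at` are hypothetical names, and the throttle always forwards `Completed` so a consumer is never left showing a stale phase. The current time is passed in as an `Instant` so the logic can be tested without sleeping.

```rust
use std::sync::mpsc::Sender;
use std::time::{Duration, Instant};

#[derive(Clone, Debug)]
pub enum SearchProgress {
    Indexing { files_indexed: usize, total_files: Option<usize> },
    Completed { total_steps: usize, retained_artifacts: usize },
}

/// Hypothetical wrapper that forwards at most one event per `interval`,
/// except that `Completed` is always forwarded.
pub struct ThrottledSender {
    inner: Sender<SearchProgress>,
    interval: Duration,
    last_sent: Option<Instant>,
}

impl ThrottledSender {
    pub fn new(inner: Sender<SearchProgress>, interval: Duration) -> Self {
        Self { inner, interval, last_sent: None }
    }

    /// Returns true if the event was forwarded, false if it was throttled.
    pub fn send_at(&mut self, event: SearchProgress, now: Instant) -> bool {
        let is_final = matches!(event, SearchProgress::Completed { .. });
        let due = match self.last_sent {
            None => true,
            Some(prev) => now.duration_since(prev) >= self.interval,
        };
        if !(is_final || due) {
            return false; // too soon after the previous event
        }
        self.last_sent = Some(now);
        // A disconnected receiver is not an error for the search itself.
        let _ = self.inner.send(event);
        true
    }
}
```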

What this enables in paddles

With phase data, paddles can replace the current generic "Searching — 12s" heartbeat with contextual progress in its TUI:

  • "Indexing 42/128 files..."
  • "Planning step 3/5: refining query..."
  • "Retrieving results..."
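Turning events into status lines like these is a pure mapping, which keeps the TUI side simple. A sketch of what paddles might do; `status_line` is a hypothetical helper, the wording is illustrative, and `step_index` is assumed to be 0-based.

```rust
#[derive(Clone, Debug)]
pub enum SearchProgress {
    Indexing { files_indexed: usize, total_files: Option<usize> },
    Embedding { files_embedded: usize, total_files: Option<usize> },
    Planning { step_index: usize, step_limit: usize, action: Option<String> },
    Retrieving { step_index: usize, query: Option<String> },
    Completed { total_steps: usize, retained_artifacts: usize },
}

/// Hypothetical helper: render one progress event as a one-line TUI status.
pub fn status_line(event: &SearchProgress) -> String {
    // "done/total" when the total is known, otherwise just the count.
    fn count(done: usize, total: Option<usize>) -> String {
        match total {
            Some(t) => format!("{done}/{t}"),
            None => done.to_string(),
        }
    }
    match event {
        SearchProgress::Indexing { files_indexed, total_files } => {
            format!("Indexing {} files...", count(*files_indexed, *total_files))
        }
        SearchProgress::Embedding { files_embedded, total_files } => {
            format!("Embedding {} files...", count(*files_embedded, *total_files))
        }
        // step_index is assumed 0-based, so it is shown 1-based.
        SearchProgress::Planning { step_index, step_limit, action } => match action {
            Some(a) => format!("Planning step {}/{}: {}...", step_index + 1, step_limit, a),
            None => format!("Planning step {}/{}...", step_index + 1, step_limit),
        },
        SearchProgress::Retrieving { query: Some(q), .. } => {
            format!("Retrieving results for \"{q}\"...")
        }
        SearchProgress::Retrieving { .. } => "Retrieving results...".to_string(),
        SearchProgress::Completed { total_steps, retained_artifacts } => {
            format!("Done: {retained_artifacts} artifacts in {total_steps} steps")
        }
    }
}
```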

Request 2: Incremental / partial indexing

Problem

search_autonomous appears to re-index the full workspace on every call, even when only a few files have changed since the last search. On large workspaces this adds 10-30 seconds of indexing time before the actual search begins.

Proposed behavior

  • Persist the index state between calls (already partially done via the sift builder cache)
  • On subsequent searches, only re-index files whose mtime or content hash has changed since the last index
  • Expose the indexing decision in the progress callback (e.g., "Indexing 3/3 changed files" vs "Indexing 128/128 files")
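The change-detection rule above can be expressed as a diff between two fingerprint maps: one persisted from the previous index and one taken now. A minimal std-only sketch; `Fingerprint`, `ReindexPlan`, and `plan_reindex` are hypothetical names, and `content_hash` is a plain `u64` because the choice of hash function is left open (std's `DefaultHasher` is not guaranteed stable across Rust releases, so it is a poor fit for a persisted index). Deleted files are reported separately so their stale entries can be evicted.

```rust
use std::collections::{BTreeSet, HashMap};
use std::path::PathBuf;
use std::time::SystemTime;

/// Hypothetical per-file fingerprint persisted alongside the index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Fingerprint {
    pub mtime: SystemTime,
    pub content_hash: u64,
}

#[derive(Debug, Default, PartialEq)]
pub struct ReindexPlan {
    /// New files, or files whose mtime or content hash differs.
    pub changed: BTreeSet<PathBuf>,
    /// Files present in the previous index but gone from the workspace.
    pub removed: BTreeSet<PathBuf>,
}

pub fn plan_reindex(
    previous: &HashMap<PathBuf, Fingerprint>,
    current: &HashMap<PathBuf, Fingerprint>,
) -> ReindexPlan {
    let mut plan = ReindexPlan::default();
    for (path, now) in current {
        // A missing entry, or any difference in mtime or hash, means re-index.
        if previous.get(path) != Some(now) {
            plan.changed.insert(path.clone());
        }
    }
    for path in previous.keys() {
        if !current.contains_key(path) {
            plan.removed.insert(path.clone());
        }
    }
    plan
}
```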

This would make repeated searches within the same workspace session near-instant for the indexing phase.
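Because search_autonomous takes &self, persisting index state across calls implies interior mutability inside Sift. A sketch under assumptions: `IndexState` and `index_incrementally` are hypothetical, each file's mtime/hash is collapsed into one `u64`, and the `Indexing` variant gains a hypothetical `changed_only` flag so a consumer can render "Indexing 3/3 changed files" versus "Indexing 128/128 files".

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;
use std::sync::Mutex;

#[derive(Clone, Debug, PartialEq)]
pub enum SearchProgress {
    // `changed_only` is a hypothetical addition to the proposed variant.
    Indexing { files_indexed: usize, total_files: Option<usize>, changed_only: bool },
}

/// Hypothetical index state kept inside `Sift` between calls.
#[derive(Default)]
pub struct IndexState {
    // path -> fingerprint (mtime/hash collapsed into one u64 for brevity)
    fingerprints: Mutex<HashMap<String, u64>>,
}

impl IndexState {
    /// Re-indexes only paths whose fingerprint changed; returns how many were indexed.
    pub fn index_incrementally(
        &self,
        snapshot: &HashMap<String, u64>,
        progress: &Sender<SearchProgress>,
    ) -> usize {
        let mut known = self.fingerprints.lock().expect("index state poisoned");
        let changed_only = !known.is_empty(); // the first call is always a full index
        let mut todo: Vec<&String> = snapshot
            .iter()
            .filter(|(path, fp)| known.get(*path) != Some(*fp))
            .map(|(path, _)| path)
            .collect();
        todo.sort(); // deterministic order for progress output
        let total = todo.len();
        for (i, path) in todo.into_iter().enumerate() {
            // The real per-file indexing work would happen here.
            known.insert(path.clone(), snapshot[path]);
            let _ = progress.send(SearchProgress::Indexing {
                files_indexed: i + 1,
                total_files: Some(total),
                changed_only,
            });
        }
        known.retain(|path, _| snapshot.contains_key(path)); // evict deleted files
        total
    }
}
```

Holding the lock for the whole pass serializes concurrent searches on the same Sift instance; that seems acceptable for an interactive session, but it is a trade-off worth weighing.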
