Skip to content

Migrate the workspace indexer to ArkFile and make it a query#1266

Merged
lionel- merged 14 commits into
mainfrom
oak/salsa-25-rewire-indexer
Jun 19, 2026
Merged

Migrate the workspace indexer to ArkFile and make it a query#1266
lionel- merged 14 commits into
mainfrom
oak/salsa-25-rewire-indexer

Conversation

@lionel-

@lionel- lionel- commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Branched from #1264
Progress towards #1212

With this PR the Ark indexer that currently powers legacy workspace symbols, completions, and diagnostics is migrated to sources from the Oak DB, instead of walking the disk and watching.

Benefits:

  • Single source of truth
  • Less memory consumption
  • Reactivity to changes for workspace files that are not opened in the editor

This last property is quite nice when changing git branches or editing files with an agent. The changes used to be ignored until the file was opened, but now the IDE reacts instantly.

Screen.Recording.2026-06-11.at.21.05.27.mov
  • Building a file index is now a Salsa query over oak_db::File. Making it a query per file is especially nice during initial workspace scans where Salsa cancellations don't invalidate finished work per file, they remain cached across edits without having to finish building the entire index. This replaces the old global mutable index.

  • New workspace_files() helper that collects, as the name says, all workspace files. Unlike all_files(), library roots are excluded. This is what indexer::find() and map() use to walk over files.

  • Dropped the combined indexer + diagnostics queue. Now diagnostics query the index, so the ordering we used to enforce by hand (finish indexing, then run diagnostics) just falls out of the pull model. I've kept diagnostics batching and deduping as these are still useful.

  • Now that the indexer is a query, it's fully lazy. That's a good property in principle but can lead to user-visible delays when requesting workspace symbols in an empty workspace on startup. If a file is opened, diagnostics will query the index and warm it up, but if none are opened, nothing drives that.

    To fix this issue, we now warm up the index manually after the initial workspace scan has settled. This mirrors how Rust-Analyser warms up its own general-purpose queries after the language server reaches "quiescence" (after the initial scan, after a workspace refresh caused by changing Cargo.toml, etc). There is no such mechanism in ty, it's fully on-demand. I opted to go the r-a way as it's closer to what we had before.

  • Diagnostics triggering is now simplified by looking at salsa::plumbing::current_revision() from the main loop. Rust-Analyser uses it too, ty doesn't (they have a different pull model for diagnostics that I didn't investigate much yet, our push model is closer to R-A's).

  • The indexer was the only consumer of the rename events. We now watch files purely through DidChangeWatchedFiles, which is consistent which is also what ty and r-a do.

    The one event we might want to subscribe later on is WillRename. This would be useful to e.g. update source() paths when a file is renamed.

Comment thread crates/oak_db/src/db.rs
Comment on lines +135 to 145
#[salsa::tracked(returns(ref))]
pub fn workspace_files(db: &dyn Db) -> Vec<File> {
let mut files: Vec<File> = Vec::new();

for &root in db.live_roots() {
match root {
LiveRoot::Workspace(r) => collect_root_files(db, &mut files, r),
LiveRoot::Library(_) => {},
LiveRoot::Orphan(orphan) => files.extend(orphan.files(db).iter().copied()),
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, orphans?

If I open ~/Desktop/foo.R and ~/Desktop/bar.R then I'm not sure I want symbols in foo.R to be seen by bar.R.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And that does seem like what happens now

Screen.Recording.2026-06-12.at.11.51.38.AM.mov

@lionel- lionel- Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes diagnostics and the workspace indexer are legacy Ark handlers, they don't use Oak semantics.

The plan is to first expose users to stricter semantics with goto-def, find-refs, and rename, and then tighten diagnostics.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

Comment on lines +498 to +500
if salsa::plumbing::current_revision(&self.world.db) != old_revision {
diagnostics_refresh_all(self.world.clone());
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stinks that we have to use an internal method for this

I will note that rust-analyzer doesn't use it like this for diagnostics. It uses it for some gc related things. Diagnostics are updated based on

            let project_or_mem_docs_changed =
                became_quiescent || state_changed || memdocs_added_or_removed;
            if project_or_mem_docs_changed
                && !self.config.text_document_diagnostic()
                && self.config.publish_diagnostics(None)
            {
                self.update_diagnostics();
            }

so they do track state changes manually for this purpose

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too worried about it, and indeed we can work around if it's removed.

@@ -1067,116 +989,7 @@ fn refresh_diagnostics(task: RefreshDiagnosticsTask) -> RefreshDiagnosticsResult

/// Run `f`, swallowing a salsa cancellation as `None`. Any other panic propagates.
fn catch_cancellation<T>(f: impl FnOnce() -> T) -> Option<T> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log::trace!("Salsa query {cancelled}"); was removed. we dont want to know when it cancels? i know it happens naturally, but is it like super noisy or something?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's noisy

Comment thread crates/ark/src/lsp/symbols.rs Outdated
Comment thread crates/ark/src/lsp/symbols.rs Outdated
Comment on lines +105 to +107
fn file_index(db: &dyn ArkDb, file: File) -> FileIndex {
let tree = file.tree_sitter(db);
let contents = file.contents(db).as_str();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this kind of expensive to force up front on every workspace file?

Like, we have to read and store the contents and parse tree of EVERY workspace file up front in the initial round of indexing, that sounds very expensive memory wise

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's bounded by our LRU eviction settings

@lionel- lionel- Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth pointing out that opening a file would (likely) warm up the whole workspace anyway for diagnostics. So it's really about the empty workspace case, so that the user can use cmd + t to navigate to a symbol without delay.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's really about the empty workspace case, so that the user can use cmd + t to navigate to a symbol without delay.

I do this with rust fairly often too, so makes sense

@lionel- lionel- force-pushed the oak/salsa-24-rewire-handlers branch from ac1ede7 to 05ab753 Compare June 16, 2026 16:06
@lionel- lionel- force-pushed the oak/salsa-25-rewire-indexer branch from 0ec831d to 4cb3904 Compare June 16, 2026 16:42
@lionel- lionel- force-pushed the oak/salsa-24-rewire-handlers branch from 4b1e9b3 to 112e465 Compare June 19, 2026 14:01
Base automatically changed from oak/salsa-24-rewire-handlers to main June 19, 2026 14:01
@lionel- lionel- force-pushed the oak/salsa-25-rewire-indexer branch from 8a00f75 to 26f55c8 Compare June 19, 2026 14:01
@lionel- lionel- merged commit c9d1c7d into main Jun 19, 2026
17 checks passed
@lionel- lionel- deleted the oak/salsa-25-rewire-indexer branch June 19, 2026 14:02
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 19, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants