Migrate the workspace indexer to ArkFile and make it a query#1266
Conversation
| #[salsa::tracked(returns(ref))] | ||
| pub fn workspace_files(db: &dyn Db) -> Vec<File> { | ||
| let mut files: Vec<File> = Vec::new(); | ||
|
|
||
| for &root in db.live_roots() { | ||
| match root { | ||
| LiveRoot::Workspace(r) => collect_root_files(db, &mut files, r), | ||
| LiveRoot::Library(_) => {}, | ||
| LiveRoot::Orphan(orphan) => files.extend(orphan.files(db).iter().copied()), | ||
| } | ||
| } |
There was a problem hiding this comment.
Hmm, orphans?
If I open ~/Desktop/foo.R and ~/Desktop/bar.R then I'm not sure I want symbols in foo.R to be seen by bar.R.
There was a problem hiding this comment.
And that does seem like what happens now
Screen.Recording.2026-06-12.at.11.51.38.AM.mov
There was a problem hiding this comment.
Yes diagnostics and the workspace indexer are legacy Ark handlers, they don't use Oak semantics.
The plan is to first expose users to stricter semantics with goto-def, find-refs, and rename, and then tighten diagnostics.
| if salsa::plumbing::current_revision(&self.world.db) != old_revision { | ||
| diagnostics_refresh_all(self.world.clone()); | ||
| } |
There was a problem hiding this comment.
Stinks that we have to use an internal method for this
I will note that rust-analyzer doesn't use it like this for diagnostics. It uses it for some gc related things. Diagnostics are updated based on
let project_or_mem_docs_changed =
became_quiescent || state_changed || memdocs_added_or_removed;
if project_or_mem_docs_changed
&& !self.config.text_document_diagnostic()
&& self.config.publish_diagnostics(None)
{
self.update_diagnostics();
}so they do track state changes manually for this purpose
There was a problem hiding this comment.
I'm not too worried about it, and indeed we can work around if it's removed.
| @@ -1067,116 +989,7 @@ fn refresh_diagnostics(task: RefreshDiagnosticsTask) -> RefreshDiagnosticsResult | |||
|
|
|||
| /// Run `f`, swallowing a salsa cancellation as `None`. Any other panic propagates. | |||
| fn catch_cancellation<T>(f: impl FnOnce() -> T) -> Option<T> { | |||
There was a problem hiding this comment.
log::trace!("Salsa query {cancelled}"); was removed. we dont want to know when it cancels? i know it happens naturally, but is it like super noisy or something?
| fn file_index(db: &dyn ArkDb, file: File) -> FileIndex { | ||
| let tree = file.tree_sitter(db); | ||
| let contents = file.contents(db).as_str(); |
There was a problem hiding this comment.
Is this kind of expensive to force up front on every workspace file?
Like, we have to read and store the contents and parse tree of EVERY workspace file up front in the initial round of indexing, that sounds very expensive memory wise
There was a problem hiding this comment.
It's bounded by our LRU eviction settings
There was a problem hiding this comment.
Also worth pointing out that opening a file would (likely) warm up the whole workspace anyway for diagnostics. So it's really about the empty workspace case, so that the user can use cmd + t to navigate to a symbol without delay.
There was a problem hiding this comment.
So it's really about the empty workspace case, so that the user can use cmd + t to navigate to a symbol without delay.
I do this with rust fairly often too, so makes sense
ac1ede7 to
05ab753
Compare
0ec831d to
4cb3904
Compare
4b1e9b3 to
112e465
Compare
8a00f75 to
26f55c8
Compare
Branched from #1264
Progress towards #1212
With this PR the Ark indexer that currently powers legacy workspace symbols, completions, and diagnostics is migrated to sources from the Oak DB, instead of walking the disk and watching.
Benefits:
This last property is quite nice when changing git branches or editing files with an agent. The changes used to be ignored until the file was opened, but now the IDE reacts instantly.
Screen.Recording.2026-06-11.at.21.05.27.mov
Building a file index is now a Salsa query over
oak_db::File. Making it a query per file is especially nice during initial workspace scans where Salsa cancellations don't invalidate finished work per file, they remain cached across edits without having to finish building the entire index. This replaces the old global mutable index.New
workspace_files()helper that collects, as the name says, all workspace files. Unlikeall_files(), library roots are excluded. This is whatindexer::find()andmap()use to walk over files.Dropped the combined indexer + diagnostics queue. Now diagnostics query the index, so the ordering we used to enforce by hand (finish indexing, then run diagnostics) just falls out of the pull model. I've kept diagnostics batching and deduping as these are still useful.
Now that the indexer is a query, it's fully lazy. That's a good property in principle but can lead to user-visible delays when requesting workspace symbols in an empty workspace on startup. If a file is opened, diagnostics will query the index and warm it up, but if none are opened, nothing drives that.
To fix this issue, we now warm up the index manually after the initial workspace scan has settled. This mirrors how Rust-Analyser warms up its own general-purpose queries after the language server reaches "quiescence" (after the initial scan, after a workspace refresh caused by changing Cargo.toml, etc). There is no such mechanism in ty, it's fully on-demand. I opted to go the r-a way as it's closer to what we had before.
Diagnostics triggering is now simplified by looking at
salsa::plumbing::current_revision()from the main loop. Rust-Analyser uses it too, ty doesn't (they have a different pull model for diagnostics that I didn't investigate much yet, our push model is closer to R-A's).The indexer was the only consumer of the rename events. We now watch files purely through
DidChangeWatchedFiles, which is consistent which is also what ty and r-a do.The one event we might want to subscribe later on is
WillRename. This would be useful to e.g. updatesource()paths when a file is renamed.