Skip to content

Make File contents lazy#1271

Open
lionel- wants to merge 3 commits into
oak/salsa-28-rtask-unwindfrom
oak/salsa-29-optional-contents
Open

Make File contents lazy#1271
lionel- wants to merge 3 commits into
oak/salsa-28-rtask-unwindfrom
oak/salsa-29-optional-contents

Conversation

@lionel-

@lionel- lionel- commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Branched from #1269
Progress towards #1212

This PR implements ty's lazy model for storing sources. This should help keep memory under control, especially when we integrate library packages with the oak_sources cache.

  • File no longer stores the source string as a Salsa input, instead it includes a revision field, an optional override for when you do want to store a source (e.g. virtual files and open editors), and a lazy source_text() tracked query that reads from disk if needed. Failures return empty sources with an error logged, so the model degrades to empty contents. The query stores at most 128 sources in the Salsa cache, following our storage retention for parse trees.

  • oak_scan reads the file's mtime instead of the contents and store that as revision. The watcher's Changed/Created events bump the revision.

This makes the Salsa model a bit impure. If we miss a FS event, or it's lagging a bit, and re-read from disk at that point, we'll get invalid contents. It becomes possible for different queries to be inconsistent. Should be self-healing as long as we do get the events eventually. Since ty does this, I'm not too worried about this approach. The alternative is that we'll have to store all the sources of the entire dependency tree, which is likely a bit much especially if there are many Ark sessions live in the workspace (e.g. multiple notebooks, shiny apps, etc).

@DavisVaughan

Copy link
Copy Markdown
Contributor

I have not fully read this yet but my brain is also thinking about #1266 (comment) and the fact that we are going to be doing a full workspace index (which means reading contents from disk for every file) on startup. So....is it even lazy anymore?

Is it mostly for non workspace Files?

@DavisVaughan

Copy link
Copy Markdown
Contributor

Ok I also see it is not just about being lazy from a performance POV

It is probably mostly about gaining that lru = 128 bound on the number of File contents we hold in memory at a time!

@DavisVaughan

Copy link
Copy Markdown
Contributor

Should file_index() over in the indexer also get lru = 128 since it runs on all workspace files?

@DavisVaughan DavisVaughan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Comment thread crates/oak_db/src/file.rs Outdated
Comment on lines 57 to 59
pub(crate) fn contents<'db>(&self, db: &'db dyn ArkDb) -> &'db str {
self.file.contents(db).as_str()
self.file.source_text(db).as_str()
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make this source_text() for consistency?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I'll do it in the OpenFile PR

Comment on lines +150 to +153
// Clear the editor override so the disk contents becomes the source of
// truth again. This immediately invalidates queries that depend on
// `source_text`, because the latter depends on the override.
file.set_source_text_override(self).to(None);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 i was explicitly looking for this, nice

@lionel- lionel- force-pushed the oak/salsa-28-rtask-unwind branch from a05a82d to 1705d42 Compare June 16, 2026 23:04
@lionel- lionel- force-pushed the oak/salsa-29-optional-contents branch from cc27487 to 2e72224 Compare June 16, 2026 23:12
@lionel-

lionel- commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Should file_index() over in the indexer also get lru = 128 since it runs on all workspace files?

I don't think so, this needs to stay warm to timely answer workspace queries from the user.

In general I think of LRU mostly as a safeguard for data about installed packages, once they're hooked up to oak_sources. We can revisit if we see issues in large workspaces though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants