Skip to content

database: treat database as read-only after creation and use shared locks#315

Open
fischeti wants to merge 6 commits into
masterfrom
shared-locks
Open

database: treat database as read-only after creation and use shared locks#315
fischeti wants to merge 6 commits into
masterfrom
shared-locks

Conversation

@fischeti

@fischeti fischeti commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

In #307 we introduced database locks and separaring database from checkout in order to share the database as a cache cross projects. This is also relevant in the context of CI, since the git databases are only fetched once and can then be reused. However, the exclusive locks on the databases essentially prevent any kind of parallelization and every invocation of bender becomes serialized across CI jobs.

This PR aims to relax the locks to use exclusive locks only when writing to the database, and use shared locks when reading from it.

The flow of fetching and checking out a git repository was slightly adapted to reduce the amount of write operations on a shared database:

  • git init and git fetch [<rev>] are write operations and still acquire an exclusive lock
  • The creation of a temporary tag tmp-<hash> in the database and the later checkout with git clone --branch tmp-<hash> has been replaced with git clone --shared --no-checkout && git checkout <rev>
  • The database is configured to disable git gc since revisions might not be tracked anymore by refs i.e. the previous tmp-<hash> tags. This is maybe overly catious.

Furthermore, file locking is done more fine grained. Instead of locking the whole git_database call exclusively, only the part that requires writes (git init and git fetch) acquire an exclusive lock (if database is not ready yet).

@fischeti fischeti force-pushed the shared-locks branch 2 times, most recently from a6ac0ee to 9a16348 Compare June 16, 2026 08:31
@fischeti fischeti marked this pull request as ready for review June 16, 2026 08:46
@fischeti

fischeti commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

Actually, I am not sure anymore if the locks are needed. I have the impression that git should be able to handle concurrency itself of a shared git database i.e. objects are immutable and do not need to be locked, while other things like refs seem to have fine-grained lock files:

https://stackoverflow.com/questions/19962024/locking-strategy-of-git-to-achieve-concurrency
https://stackoverflow.com/questions/750765/concurrency-in-a-git-repo-on-a-network-shared-folder#answer-751026

Edit: I guess the git init cannot be locked properly by git and could cause problems and apparently git also errors out when failing to acquire locks, so it could make sense after all🤓

@micprog

micprog commented Jun 16, 2026

Copy link
Copy Markdown
Member

Actually, I am not sure anymore if the locks are needed. I have the impression that git should be able to handle concurrency itself of a shared git database i.e. objects are immutable and do not need to be locked, while other things like refs seem to have fine-grained lock files:

https://stackoverflow.com/questions/19962024/locking-strategy-of-git-to-achieve-concurrency https://stackoverflow.com/questions/750765/concurrency-in-a-git-repo-on-a-network-shared-folder#answer-751026

I think we should preserve locks, and I like your solution to speed things up (still need to review the details). The main issue I'm attempting to solve is with multiple repos fetching the same bare ref. Concurrent fetches racing on the same ref would produce fatal: cannot lock ref ...: File exists from one of them. Git handles it cleanly but bender has no retry layer, so we'd just propagate that as a CI failure. Rare, but the failure mode is "this CI job randomly fails once a week". Keeping locks fixes this.

@fischeti

Copy link
Copy Markdown
Contributor Author

Yes, you are right🤓 I just realized that git locks are not blocking but just error out

micprog and others added 3 commits June 16, 2026 21:31
Tim's commit e3e9241 disables auto-gc only when bender creates a brand
new bare database. Databases populated by bender 0.31.x or earlier still
run with git's default gc.auto=6700, so an auto-gc could fire from a
fetch and prune objects referenced by a concurrent `git clone --shared`
checkout via alternates.

Apply `git config gc.auto 0` in the Fetch branch too: idempotent, runs
under the same exclusive lock, and ensures every database the new
bender ever fetches into is safe -- not only ones it initialized itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shared db lock in `checkout_git` is keyed by the dependency URL and
shared with other projects, so it intentionally does not serialize two
bender invocations that resolve to the same working-tree path (e.g. two
`bender checkout` runs against the same project root). Shared lockers
don't block each other, so both invocations could interleave
`remove_dir_all`, `git clone --shared`, and `git checkout` on the same
directory and corrupt it.

Add a per-checkout-path exclusive lock as the first I/O in
`checkout_git`. The lock file lives as `<path>.lock` -- a sibling of
the checkout dir, so it stays with the project and is independent of
where the bare database is stored. Unrelated deps and the same dep
across different projects still resolve to distinct paths and run in
parallel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tim's e3e9241 dropped the bender-tmp- filter from the
RevisionNotOnUpstream check on the assumption that the synthetic tags
were no longer created. They still are -- `dependency_versions`
creates them during update operations to make commits reachable for
`git rev-list --all`, and databases populated by bender 0.31.x also
contain leftover tmp tags from the old checkout flow. Without the
filter the warning was silently suppressed whenever such a tag pointed
at the revision being checked out, even when no real upstream ref
actually tracked the commit.

Restore the explicit filter. Marked as a TODO -- the longer-term fix
is to either stop creating these tags or sweep them after use; until
then the filter keeps the warning correct on every database shape
bender can encounter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@micprog micprog left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some additional commits on top to add additional protections and improve reporting. Offline we discussed cleaning up some of the sloppy comments, other than that LGTM 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants