Skip to content

cli: detect sql-server via creds file before probing DBs#11115

Open
zook-bot wants to merge 1 commit into
dolthub:mainfrom
zookanalytics:zook/cli-skip-db-probe-for-server-detect
Open

cli: detect sql-server via creds file before probing DBs#11115
zook-bot wants to merge 1 commit into
dolthub:mainfrom
zookanalytics:zook/cli-skip-db-probe-for-server-detect

Conversation

@zook-bot
Copy link
Copy Markdown

Summary

buildLateBinder in go/cmd/dolt/dolt.go previously gated FindAndLoadLocalCreds on an access-mode probe that loaded the target env's DoltDB and, if that didn't resolve, iterated mrEnv and loaded every other database in the data dir. Each load attempts the journal manifest's exclusive file lock with a 100ms timeout (go/store/nbs/file_manifest.go:46) plus the usual manifest/journal I/O, so when a sql-server holds the locks the probe costs ~100ms × N databases just to infer what a single file read would have told us.

Since FindAndLoadLocalCreds is the only way the CLI auto-connects to a local server, hoist it to be the first signal. If sql-server.info exists up-tree, connect; otherwise fall through to local mode. The access-mode probe is removed — it never influenced the local-mode fallback (no creds file → local regardless) and was only gating the creds lookup.

Behavior change

A stale sql-server.info left behind by a crashed server will now surface a connection error instead of silently falling back to local mode. Deleting the stale file restores prior behavior. The trade-off felt fair given the perf win — happy to revisit if reviewers disagree.

Benchmark

100 databases, sql-server running, dolt remote -v from the top-level data dir:

brew dolt 2.0.7 with fix speedup
dolt remote -v (no --use-db, top-level data dir) 10.6 s 0.18 s ~60×
dolt --use-db db_1 remote -v 0.19 s 0.08 s ~2.5×
dolt --use-db db_1 branch -a 0.23 s 0.10 s ~2.3×
dolt --use-db db_1 status 0.22 s 0.08 s ~2.7×

In the worst case both binaries still print "not a valid dolt repository" (CheckEnvIsValid is a separate gate). The new one just fails fast.

Test plan

  • Unit: go test ./cmd/dolt/commands/... — failures present with fix are identical to those on main (pre-existing flakes; verified with git stash)
  • bats remote-cmd.bats — 8/8
  • bats remotes.bats — 70/70
  • bats remotes-push-pull.bats — 33/33
  • bats sql-server-remotesrv.bats — 25/25
  • Manual: local mode (no server) still picks up local engine; remote mode connects and returns the right rows

Follow-up

A companion PR will address the related cost in MultiEnvForDirectory (an eager LoadDoltDB on the rootEnv that fires before this code runs). It accounts for the remaining ~1s when invoking from inside a single DB while a server holds its lock.

buildLateBinder previously gated FindAndLoadLocalCreds on an access-mode
probe that loaded the target env's DoltDB and, if that lookup didn't
resolve, iterated mrEnv and loaded every other database in the data dir.
Each load attempts the journal manifest's exclusive file lock with a
100 ms timeout (file_manifest.go) plus the usual manifest/journal I/O,
so when a sql-server holds the locks the probe costs roughly
100 ms × N databases just to learn something a single file read would
have told us.

Since FindAndLoadLocalCreds is the only way the CLI auto-connects to a
local server anyway, hoist it to be the first signal. If a
sql-server.info exists up-tree, we connect; otherwise we fall through to
local mode. The access-mode probe is removed: it never influenced the
local-mode fallback (no creds file -> local regardless) and was only
gating the creds lookup.

Behavior change: a stale sql-server.info left behind by a crashed server
will now surface a connection error instead of silently falling back to
local mode. Deleting the stale file restores prior behavior.

Benchmark (100 databases, sql-server running, dolt remote -v from the
top-level data dir):

  before: 10.6 s   loads every DB, then fails with "not a valid dolt repo"
  after:  0.18 s   reads sql-server.info, then fails with the same error

The dolt commands branch -a / status see a similar ~2-3x speedup when
invoked with --use-db against a server.
@reltuk
Copy link
Copy Markdown
Contributor

reltuk commented May 29, 2026

Hi @zook-bot. Thank's for your interest in Dolt and for the contribution. The semantics and cleanliness around database loading and locking aren't great currently and they reflect quite a few warts accrued on the path-dependent walk they took from purely-CLI-based-tool to SQL-server to CLI-operations-invoked over SQL, etc.

I have taken a slightly different approach to try to fix the pain points brought up here and in #11116. The two relevant PRs have landed in main:

Would you be able to checkout main and build and see how you find the new behavior?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants