Skip to content

fix : read only user should be able to open db on ReadOnly mode #2268

Open
Naelpuissant wants to merge 5 commits intodgraph-io:mainfrom
Naelpuissant:fix-read-only
Open

fix : read only user should be able to open db on ReadOnly mode #2268
Naelpuissant wants to merge 5 commits intodgraph-io:mainfrom
Naelpuissant:fix-read-only

Conversation

@Naelpuissant
Copy link
Copy Markdown

@Naelpuissant Naelpuissant commented Mar 23, 2026

Description

Should fix #2234
Help appreciated 😄
Step to reproduce :

  • Open a db and fill it with some data
  • run chmod -w ./db/*
  • Open it again in WithReadOnly mode

Checklist

  • Code compiles correctly and linting passes locally
  • Tests added for new functionality, or regression tests for bug fixes added as applicable

@Naelpuissant Naelpuissant requested a review from a team as a code owner March 23, 2026 14:09
Copy link
Copy Markdown
Collaborator

@matthewmcneely matthewmcneely left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical: goroutine leak on ReadOnly close

db.go:381-384 is unchanged by the PR:

if !db.opt.InMemory {
    db.closers.valueGC = z.NewCloser(1)
    go db.vlog.waitOnGC(db.closers.valueGC)
}

So when ReadOnly=true && InMemory=false, the GC closer and goroutine still get created, but with the new db.go:541 guard the close path now skips SignalAndWait(). The goroutine is therefore leaked on every db.Close().

Worse, value.go:1080 does vlog.garbageCh <- struct{}{} after the close signal — and the PR makes garbageCh nil in ReadOnly via the init early‑return. Even if the closer were signaled, the goroutine would deadlock forever on a nil‑channel send.

The same hazard exists in the failure path: db.go:484-486 cleanup() calls valueGC.Signal(), which would unblock waitOnGC and then hang on the nil garbageCh.

The commit message claims to "avoid unused GC initialization", but the un‑skipped goroutine spawn contradicts that intent. The cleaner fix is to gate the goroutine spawn itself:

if !db.opt.InMemory && !db.opt.ReadOnly {
    db.closers.valueGC = z.NewCloser(1)
    go db.vlog.waitOnGC(db.closers.valueGC)
}

…and then the new guard in db.close() becomes redundant (the existing cleanup() nil‑check at db.go:484 is already safe). This makes the three code sites (init, spawn, close) consistent: ReadOnly = no GC anywhere.

Other issues

  • RunValueLogGC on a ReadOnly DB (db.go:1234) doesn't reject early. With this PR, calling it bottoms out in value.go:1085:

    select {
    case vlog.garbageCh <- struct{}{}:
        ...
    default:
        return ErrRejected
    }

    Sending to a nil channel in a select with default falls to default, so it accidentally returns ErrRejected rather than panicking. It's correct by luck, not design. Worth either adding if db.opt.ReadOnly { return ErrRejected } (or a dedicated error) at the top of RunValueLogGC, or documenting the invariant.

  • updateDiscardStats (value.go:1101) only short‑circuits on InMemory, then dereferences vlog.discardStats — which is now nil under ReadOnly. In practice it isn't reached because compactions don't run in ReadOnly mode (gated at db.go:343), but a defensive || vlog.opt.ReadOnly here would match the symmetric guards added elsewhere and remove a footgun.

  • Pre‑existing, related: db.go:706 db.Sync() calls db.mt.SyncWAL() and db.mt is nil in ReadOnly per db.go:329-333. Not in this PR's scope, but un‑skipping TestReadOnly increases the chance someone exercises this. Worth a follow‑up similar to #2264 (the InMemory NPE‑on‑sync fix).

Test coverage

TestReadOnly exercises the happy path (concurrent RO opens, rejected RW‑while‑RO open, attempted write) but does not cover:

  • RunValueLogGC on a ReadOnly DB (would have caught the goroutine/nil‑channel hazard above)
  • Sync() on a ReadOnly DB
  • Goroutine leak verification (no goleak check, so the leak is invisible in CI)

Adding at least the RunValueLogGC case would have flagged the inconsistency in the GC initialization gating.

Recommended changes before merge

  1. Gate the valueGC closer/goroutine creation at db.go:381 on !ReadOnly as well, and revert the db.go:541 change as redundant.
  2. Add an early ErrRejected (or a dedicated ErrReadOnly) in RunValueLogGC for ReadOnly DBs.
  3. Optional: extend TestReadOnly to call db.RunValueLogGC(0.5) and db.Sync() and assert they don't panic.

@Naelpuissant
Copy link
Copy Markdown
Author

Naelpuissant commented Apr 30, 2026

@matthewmcneely Thanks for your reply, I'm working on it !

the new guard in db.close() becomes redundant (the existing cleanup() nil‑check at db.go:484 is already safe).

Since cleanup is only called on Open, how the close guard would be redundant ?

@Naelpuissant Naelpuissant marked this pull request as draft April 30, 2026 11:59
@matthewmcneely
Copy link
Copy Markdown
Collaborator

@matthewmcneely Thanks for your reply, I'm working on it !

the new guard in db.close() becomes redundant (the existing cleanup() nil‑check at db.go:484 is already safe).

Since cleanup is only called on Open, how the close guard would be redundant ?

Whoops, I mistakenly thought cleanup() and db.close() were the same thing. If you gate the spawn with:

- if !db.opt.InMemory {
+ if !db.opt.InMemory && !db.opt.ReadOnly {
      db.closers.valueGC = z.NewCloser(1)
      go db.vlog.waitOnGC(db.closers.valueGC)
  }

and then replace your !ReadOnly line 541 with a nil‑check of valeuGC:

- if !db.opt.InMemory && !db.opt.ReadOnly {
-     // Stop value GC first.
-     db.closers.valueGC.SignalAndWait()
+ if db.closers.valueGC != nil {
+     // Stop value GC first.
+     db.closers.valueGC.SignalAndWait()
  }

Then I think that seems to be cleanest.

…e ReadOnly guards on :

- Sync
- RunValueLogGC
- Flatten
- BanNamespace
- GetSequence
@Naelpuissant Naelpuissant marked this pull request as ready for review May 1, 2026 09:07
@Naelpuissant
Copy link
Copy Markdown
Author

@matthewmcneely Ok I see, it should be fine now.
I've also added more guards on Flatten, GetSequence and BanNamespace since those methods might mutate the DB state.

Copy link
Copy Markdown
Collaborator

@matthewmcneely matthewmcneely left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description's repro is chmod -w ./db/* (strip write from the files), but the test instead does chmod 0o500 on the directory. That blocks file create/delete but leaves the existing data files writable, so the new O_RDONLY flag at [value.go:580-582] isn't actually exercised — O_RDWR would still succeed on those files. The test does cover the empty-vlog-deletion guard and the init() early-return (both need the dir to be unwritable), but a follow-up chmod 0o400 on each *.vlog would directly exercise the O_RDONLY change.

Comment thread db_test.go
_, err = txn3.Get([]byte("key1"))
require.NoError(t, err)
require.NoError(t, kv3.Close())
require.NoError(t, txn3.Commit())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swap 1830 with 1829, at present commit is called after close, which I guess works because txn3 performed no writes.

Comment thread db_test.go
require.NoError(t, kv2.Close())

// Test os permission read-only open
// We don't directly chmod the directory to still be able to aquire the lock
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "acquire"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Support opening database with read-only file permissions

2 participants