fix : read only user should be able to open db on ReadOnly mode #2268
fix : read only user should be able to open db on ReadOnly mode #2268Naelpuissant wants to merge 5 commits intodgraph-io:mainfrom
Conversation
… open with O_RDONLY with ReadOnly option (dgraph-io#2234)
matthewmcneely
left a comment
There was a problem hiding this comment.
Critical: goroutine leak on ReadOnly close
db.go:381-384 is unchanged by the PR:
if !db.opt.InMemory {
db.closers.valueGC = z.NewCloser(1)
go db.vlog.waitOnGC(db.closers.valueGC)
}So when ReadOnly=true && InMemory=false, the GC closer and goroutine still get created, but with the new db.go:541 guard the close path now skips SignalAndWait(). The goroutine is therefore leaked on every db.Close().
Worse, value.go:1080 does vlog.garbageCh <- struct{}{} after the close signal — and the PR makes garbageCh nil in ReadOnly via the init early‑return. Even if the closer were signaled, the goroutine would deadlock forever on a nil‑channel send.
The same hazard exists in the failure path: db.go:484-486 cleanup() calls valueGC.Signal(), which would unblock waitOnGC and then hang on the nil garbageCh.
The commit message claims to "avoid unused GC initialization", but the un‑skipped goroutine spawn contradicts that intent. The cleaner fix is to gate the goroutine spawn itself:
if !db.opt.InMemory && !db.opt.ReadOnly {
db.closers.valueGC = z.NewCloser(1)
go db.vlog.waitOnGC(db.closers.valueGC)
}…and then the new guard in db.close() becomes redundant (the existing cleanup() nil‑check at db.go:484 is already safe). This makes the three code sites (init, spawn, close) consistent: ReadOnly = no GC anywhere.
Other issues
-
RunValueLogGCon a ReadOnly DB (db.go:1234) doesn't reject early. With this PR, calling it bottoms out invalue.go:1085:select { case vlog.garbageCh <- struct{}{}: ... default: return ErrRejected }
Sending to a nil channel in a
selectwithdefaultfalls todefault, so it accidentally returnsErrRejectedrather than panicking. It's correct by luck, not design. Worth either addingif db.opt.ReadOnly { return ErrRejected }(or a dedicated error) at the top ofRunValueLogGC, or documenting the invariant. -
updateDiscardStats(value.go:1101) only short‑circuits onInMemory, then dereferencesvlog.discardStats— which is now nil under ReadOnly. In practice it isn't reached because compactions don't run in ReadOnly mode (gated atdb.go:343), but a defensive|| vlog.opt.ReadOnlyhere would match the symmetric guards added elsewhere and remove a footgun. -
Pre‑existing, related:
db.go:706db.Sync()callsdb.mt.SyncWAL()anddb.mtis nil in ReadOnly perdb.go:329-333. Not in this PR's scope, but un‑skippingTestReadOnlyincreases the chance someone exercises this. Worth a follow‑up similar to #2264 (the InMemory NPE‑on‑sync fix).
Test coverage
TestReadOnly exercises the happy path (concurrent RO opens, rejected RW‑while‑RO open, attempted write) but does not cover:
RunValueLogGCon a ReadOnly DB (would have caught the goroutine/nil‑channel hazard above)Sync()on a ReadOnly DB- Goroutine leak verification (no
goleakcheck, so the leak is invisible in CI)
Adding at least the RunValueLogGC case would have flagged the inconsistency in the GC initialization gating.
Recommended changes before merge
- Gate the
valueGCcloser/goroutine creation atdb.go:381on!ReadOnlyas well, and revert thedb.go:541change as redundant. - Add an early
ErrRejected(or a dedicatedErrReadOnly) inRunValueLogGCfor ReadOnly DBs. - Optional: extend
TestReadOnlyto calldb.RunValueLogGC(0.5)anddb.Sync()and assert they don't panic.
|
@matthewmcneely Thanks for your reply, I'm working on it !
Since cleanup is only called on Open, how the close guard would be redundant ? |
19584f3 to
4986387
Compare
Whoops, I mistakenly thought cleanup() and db.close() were the same thing. If you gate the spawn with: and then replace your Then I think that seems to be cleanest. |
…e ReadOnly guards on : - Sync - RunValueLogGC - Flatten - BanNamespace - GetSequence
4986387 to
8100d3c
Compare
|
@matthewmcneely Ok I see, it should be fine now. |
matthewmcneely
left a comment
There was a problem hiding this comment.
The PR description's repro is chmod -w ./db/* (strip write from the files), but the test instead does chmod 0o500 on the directory. That blocks file create/delete but leaves the existing data files writable, so the new O_RDONLY flag at [value.go:580-582] isn't actually exercised — O_RDWR would still succeed on those files. The test does cover the empty-vlog-deletion guard and the init() early-return (both need the dir to be unwritable), but a follow-up chmod 0o400 on each *.vlog would directly exercise the O_RDONLY change.
| _, err = txn3.Get([]byte("key1")) | ||
| require.NoError(t, err) | ||
| require.NoError(t, kv3.Close()) | ||
| require.NoError(t, txn3.Commit()) |
There was a problem hiding this comment.
Swap 1830 with 1829, at present commit is called after close, which I guess works because txn3 performed no writes.
| require.NoError(t, kv2.Close()) | ||
|
|
||
| // Test os permission read-only open | ||
| // We don't directly chmod the directory to still be able to aquire the lock |
Description
Should fix #2234
Help appreciated 😄
Step to reproduce :
chmod -w ./db/*WithReadOnlymodeChecklist