[dbnode] Add index snapshotting #2556
Conversation
ryanhall07 left a comment
can you elaborate more on "why we need this" in the PR description?
}

// Read index snapshot files
indexSnapshotFiles, err := s.indexSnapshotFilesFn(
are we going to be reading snapshot files twice now (like the commit log)?
Yea, unfortunately :/. I think I can piggyback off of Nate's work to cache read info file results between bootstrap runs to reduce this work after it lands: https://github.com/m3db/m3/pull/2598/files
I think this is also true for calls to ReadIndexInfoFiles, ya? Either way, probably worth doing in a follow-up PR so as to not expand the size of this one.
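For reference, roughly the shape of what that caching could look like, as a minimal sketch; the result type, keying, and function signatures here are placeholders and not the actual m3 APIs:

import "sync"

// infoFileResult is a stand-in for whatever the info-file read returns (placeholder type).
type infoFileResult struct {
	FilePath string
}

// cachedInfoFileReader memoizes info-file reads so that multiple bootstrapper passes
// over the same namespace don't hit the filesystem twice.
type cachedInfoFileReader struct {
	mu     sync.Mutex
	readFn func(nsID string) []infoFileResult
	cache  map[string][]infoFileResult
}

func newCachedInfoFileReader(readFn func(string) []infoFileResult) *cachedInfoFileReader {
	return &cachedInfoFileReader{readFn: readFn, cache: make(map[string][]infoFileResult)}
}

func (c *cachedInfoFileReader) read(nsID string) []infoFileResult {
	c.mu.Lock()
	defer c.mu.Unlock()
	if res, ok := c.cache[nsID]; ok {
		return res // already read during a previous bootstrap run
	}
	res := c.readFn(nsID)
	c.cache[nsID] = res
	return res
}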
// NB(bodu): We don't actually bootstrap snapshot data in the fs bootstrapper
// (we do this in the commitlog bootstrapper) but we do want to subtract the shard time
// ranges fulfilled by index snapshots.
r, err = s.bootstrapFromIndexSnapshots(
I don't follow why you need to read index snapshots in both bootstrappers.
QueryStats() stats.QueryStats
}

// BlockStateSnapshot represents a snapshot of a 's block state at

	snapshot BootstrappedBlockStateSnapshot
}

// NewBlockStateSnapshot constructs a new NewBlockStateSnapshot.
constructs a new BlockStateSnapshot
Can we add a method "SegmentState" to the interface "IndexSegmentFileSetWriter"? Would rather explicitly use the default value returned by the segment file set writer than use it by default when we can't upcast.
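Roughly what I have in mind, as a sketch; the method and type names here are approximations rather than the actual interface:

// Sketch: expose segment state explicitly on the writer interface instead of
// type-asserting (upcasting) to a concrete writer type and falling back to a default.
type SegmentState int

const (
	SegmentStateUnknown SegmentState = iota
	SegmentStateMutable
	SegmentStateFlushed
)

type IndexSegmentFileSetWriter interface {
	// ...existing methods elided...

	// SegmentState returns the state of the segment file set being written.
	SegmentState() SegmentState
}

// Caller side: no upcast needed, the default comes from the writer itself.
func segmentStateFor(w IndexSegmentFileSetWriter) SegmentState {
	return w.SegmentState()
}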
Codecov Report

@@           Coverage Diff            @@
##           master   #2556    +/-   ##
========================================
- Coverage    72.1%   72.1%    -0.1%
========================================
  Files        1099    1099
  Lines       99581   99901     +320
========================================
+ Hits        71863   72079     +216
- Misses      22807   22885      +78
- Partials     4911    4937      +26
Note: With new transparent indexing, LoadBlock should not transparently index when this lands. Also, it would be good to document the snapshot metadata files and how they all relate to each other.
// Since we are doing this after a successful flush, there is a small window where we can crash and
// run into the above case. However, we'd rather deal w/ a one time extra mem than long bootstrap times
// if we write an empty snapshot to disk and the node crashes before a successful warm flush.
func (m *flushManager) writeEmptyIndexSnapshots(
@notbdu as discussed, is it possible at all to avoid writing empty index snapshots now that we are tracking everything with snapshot ID?
So the issue of potentially bootstrapping a "stale" snapshot if the node crashes between a successful warm flush and the next snapshot did not go away. In this case, since we've already warm flushed, the loaded snapshot data would never get evicted.
Although... after revisiting this logic, it makes more sense to me to just check during index bootstrapping whether the index block is sealed and we've successfully warm flushed, and if so ignore any warm snapshots for that block.
What do you think of this approach:
for blockStart, blockResults := range bootstrapResults {
	block, err := i.ensureBlockPresentWithRLock(blockStart.ToTime())
	if err != nil { // should never happen
		multiErr = multiErr.Add(i.unableToAllocBlockInvariantError(err))
		continue
	}

	// NB(bodu): For warm snapshots, we need to make sure that we haven't already successfully warm
	// flushed that block start. We can run into this case when the node crashes between a successful warm
	// flush and the next index snapshot.
	if _, ok := blockResults.GetBlock(idxpersist.SnapshotWarmIndexVolumeType); ok {
		if block.IsSealed() && i.hasIndexWarmFlushedToDisk(infoFiles, blockStart.ToTime()) {
			// If we have warm snapshots and the block has been warm flushed already,
			// we just discard the warm snapshot data.
			blockResults.DeleteBlock(idxpersist.SnapshotWarmIndexVolumeType)
		}
	}
}
	Close IndexCloser
}

// PreparedIndexSnapshotPersist is an object that wraps holds a persist function and a closer.
nit: comment is a duplicate of the one above. Probably worth pointing out the difference in one of the two
// We may have less we need to read
shardTimeRanges = shardTimeRanges.Copy()
shardTimeRanges.Subtract(r.fulfilled)
log.Println("subtract ->", r.fulfilled)
Looks like this might be debug logging you can remove?
	return bootstrap.NamespaceResults{}, err
}
fsOpts := s.opts.CommitLogOptions().FilesystemOptions()
indexInfoFiles := s.readIndexInfoFilesFn(fsOpts.FilePathPrefix(), ns.Metadata.ID(),
nit: I think you already have variables for fsOpts and filePathPrefix created outside of this for loop.
fsOpts := s.opts.CommitLogOptions().FilesystemOptions()
indexInfoFiles := s.readIndexInfoFilesFn(fsOpts.FilePathPrefix(), ns.Metadata.ID(),
	fsOpts.InfoReaderBufferSize(), persist.FileSetSnapshotType)
if err != nil {
Can remove this if block
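Combining both nits, something along these lines as a sketch (the surrounding loop and receiver fields are approximated from the diff above):

// Sketch: hoist fsOpts/filePathPrefix out of the per-namespace loop and drop the
// leftover `if err != nil` block, since this call doesn't return an error.
fsOpts := s.opts.CommitLogOptions().FilesystemOptions()
filePathPrefix := fsOpts.FilePathPrefix()
for _, ns := range namespaces {
	indexInfoFiles := s.readIndexInfoFilesFn(filePathPrefix, ns.Metadata.ID(),
		fsOpts.InfoReaderBufferSize(), persist.FileSetSnapshotType)
	// ... use indexInfoFiles for this namespace ...
}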
	shardTimeRanges := ns.DataRunOptions.ShardTimeRanges
	dataResult = shardTimeRanges.ToUnfulfilledDataResult()
}
var indexResult result.IndexBootstrapResult
Now each bootstrap.NamespaceResult will have the same IndexResult from line 203. Is that ok?
Ah, good catch. The index results should be per ns.
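i.e., construct the index result inside the per-namespace loop, roughly like this (a sketch; constructor and field names approximated, not the exact code in the PR):

for _, ns := range namespaces {
	// One IndexBootstrapResult per namespace instead of sharing a single one across all namespaces.
	indexResult := result.NewIndexBootstrapResult()
	// ... populate indexResult from this namespace's index snapshot files ...
	results.Results.Set(ns.Metadata.ID(), bootstrap.NamespaceResult{
		Metadata:    ns.Metadata,
		Shards:      ns.Shards,
		IndexResult: indexResult,
	})
}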
const (
	// Volume index defaults to -1 or unset.
	defaultVolumeIndex = -1
I think we default to 0 for data files. Should we do the same here or is it somewhat apples and oranges?
Sorry, the naming here is confusing. I changed it to the following:
// Volume index of -1 means unset.
volumeIndexUnset = -1
The default is 0 for index data too.
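For what it's worth, the way I read the intended behavior (an illustrative sketch, not the actual code in the PR):

const (
	// Volume index of -1 means unset.
	volumeIndexUnset = -1
)

// resolveVolumeIndex falls back to 0 when the volume index was never set,
// matching the default used for data filesets (assumed semantics).
func resolveVolumeIndex(volumeIndex int) int {
	if volumeIndex == volumeIndexUnset {
		return 0
	}
	return volumeIndex
}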
…string as key in index results.
…e on snapshots case in addition to the split across snapshots/commitlogs case. Add comments.
What this PR does / why we need it:
Adds index snapshotting. This will reduce bootstrap time since we won't need to build the entire index segment during bootstrap.
Also implements initial work for eagerly offloading terminal or frozen index segments from memory.
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: