WIP [dbnode] Refactor posting list operations to be lazily evaluated and zero copy#2791
WIP [dbnode] Refactor posting list operations to be lazily evaluated and zero copy#2791robskillington wants to merge 119 commits intomasterfrom
Conversation
| var ( | ||
| union = make([]postings.List, 0, len(s.searchers)) | ||
| ) |
There was a problem hiding this comment.
nit: don't need var block here
arnikola
left a comment
There was a problem hiding this comment.
I had a cursory glance over but didn't go super deep on it, figuring a lot of it was very similar to Pilosa's implementation; if this is a bad assumption can go back with more discerning review
| "github.com/uber-go/tally" | ||
| ) | ||
|
|
||
| // LatencyBuckets are a set of latency buckets useful for measuring things. |
| func (i *multiBitmap) IsEmpty() bool { | ||
| iter := i.Iterator() | ||
| hasAny := iter.Next() | ||
| _ = iter.Err() |
There was a problem hiding this comment.
Hm, should this return false if iter.Err() is non-nil? Otherwise no need to call Err
| type containerIterator interface { | ||
| NextContainer() bool | ||
| ContainerKey() uint64 | ||
| ContainerUnion(ctx containerOpContext, target *bitmapContainer) | ||
| ContainerIntersect(ctx containerOpContext, target *bitmapContainer) | ||
| ContainerNegate(ctx containerOpContext, target *bitmapContainer) | ||
| } |
There was a problem hiding this comment.
nit: do these need to be exported? If so, may need comments (and may be worth exporting containerIterator?)
| i.bitmap.Reset(true) | ||
|
|
||
| intersects := i.filter(i.multiContainerIter.containerIters, multiContainerOpIntersect) | ||
| negates := i.filter(i.multiContainerIter.containerIters, multiContainerOpNegate) |
There was a problem hiding this comment.
Doesn't this overwrite the i.filtered in intersects?
There was a problem hiding this comment.
ah yes, TY!
| func (i *multiBitmapContainersIterator) NextContainer() bool { | ||
| if len(i.iters) != 0 { | ||
| // Exhausted. | ||
| return true |
There was a problem hiding this comment.
Should this be
if len(i.iters) == 0 {
return false
}
|
|
||
| // Tell any other callers that we're done loading | ||
| if entry.loadingCh != nil { | ||
| close(entry.loadingCh) |
| } | ||
| // Only valid if all intersecting actually matched, | ||
| // if zero intersecting then postings does not contain ID. | ||
| return len(i.intersect) > 0 |
There was a problem hiding this comment.
Nit; can move this to 181, before checking intersectNegate lists
| case elem.multiBitmap != nil: | ||
| it = elem.multiBitmap.containerIterator() | ||
|
|
||
| case elem.bitmap != nil: |
There was a problem hiding this comment.
nit: needs to fail if both set?
|
|
||
| require.False(t, first.Equal(second)) | ||
| } | ||
| // func TestRoaringPostingsListEqualWithOtherNonRoaring(t *testing.T) { |
| parameters := gopter.DefaultTestParameters() | ||
| seed := time.Now().UnixNano() | ||
| parameters.MinSuccessfulTests = 100 | ||
| // seed := time.Now().UnixNano() |
There was a problem hiding this comment.
nit; can probably resume using rng seed
| // simpleReader, err := simpleSeg.Reader() | ||
| // require.NoError(t, err) | ||
| // simpleExec := executor.NewExecutor([]index.Reader{simpleReader}) |
| require.NoError(t, err) | ||
| matchedDocs, err := collectDocs(dOrg) | ||
| require.NoError(t, err) | ||
| require.NoError(t, err, fmt.Sprintf("query: %v\n", q.String())) |
There was a problem hiding this comment.
nit; require.NoError(t, err, q.String()) may be easier?
| if okA && okB && countA != countB { | ||
| return false | ||
| } | ||
|
|
There was a problem hiding this comment.
If using countfast here, we should also make sure that
if !okA && !okB {
return EqualIterator(a.Iterator(), b.Iterator())
}
return false
| intersects []postings.List, | ||
| negates []postings.List, | ||
| ) (postings.List, error) { | ||
| intersect := make([]readOnlyIterable, 0, len(intersects)) |
There was a problem hiding this comment.
nit: prealloc a slice of iterables := make([]readOnlyIterable, 0, len(intersects) + len(negates) and have both intersect and negate be taken from that, e.g. intersect, negate := iterables[:len(intersects)], iterables[len(intersects):]
| Contains(id postings.ID) bool | ||
| ContainerIterator() containerIterator | ||
| } | ||
|
|
||
| type containerIterator interface { | ||
| NextContainer() bool | ||
| ContainerKey() uint64 | ||
| ContainerUnion(ctx containerOpContext, target *bitmapContainer) | ||
| ContainerIntersect(ctx containerOpContext, target *bitmapContainer) | ||
| ContainerNegate(ctx containerOpContext, target *bitmapContainer) | ||
| Err() error | ||
| Close() |
| func (i *multiBitmap) IsEmpty() bool { | ||
| iter := i.Iterator() | ||
| hasAny := iter.Next() | ||
| _ = iter.Err() |
There was a problem hiding this comment.
do these need to check err?
…indexed snapshot during a GC run
c0e6836 to
3699014
Compare
…-index-query-allocs-rebase
…ding to rebuild all cached searches again
…terministic query key, also returned optimized form
What this PR does / why we need it:
This reduces a significant amount of all index query allocations by implementing the conjunction search as a lazy iterator over existing postings lists. It also caches regexp compilations with an LRU cache and allows setting different sizes for the regexp cache.
This change completely re-implements roaring bitmaps for query execution, this means pilosa roaring bitmaps are only used and index segment build time and all queries execute new read only roaring bitmaps which are completely mmap backed without any bitmap container/other allocations involved.
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: