Skip to content

WIP [dbnode] Refactor posting list operations to be lazily evaluated and zero copy#2791

Open
robskillington wants to merge 119 commits intomasterfrom
r/reduce-index-query-allocs
Open

WIP [dbnode] Refactor posting list operations to be lazily evaluated and zero copy#2791
robskillington wants to merge 119 commits intomasterfrom
r/reduce-index-query-allocs

Conversation

@robskillington
Copy link
Copy Markdown
Collaborator

@robskillington robskillington commented Oct 22, 2020

What this PR does / why we need it:

This reduces a significant amount of all index query allocations by implementing the conjunction search as a lazy iterator over existing postings lists. It also caches regexp compilations with an LRU cache and allows setting different sizes for the regexp cache.

This change completely re-implements roaring bitmaps for query execution, this means pilosa roaring bitmaps are only used and index segment build time and all queries execute new read only roaring bitmaps which are completely mmap backed without any bitmap container/other allocations involved.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

Copy link
Copy Markdown
Collaborator

@arnikola arnikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM w nits

Comment thread src/m3ninx/search/searcher/regexp.go Outdated
Comment thread src/m3ninx/search/searcher/term.go Outdated
Comment thread src/m3ninx/postings/types.go
Comment thread src/m3ninx/search/searcher/all.go Outdated
Comment thread src/m3ninx/search/searcher/lazy_postings_list.go Outdated
Comment thread src/m3ninx/search/searcher/lazy_postings_list.go Outdated
Comment on lines +49 to +51
var (
union = make([]postings.List, 0, len(s.searchers))
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't need var block here

Comment thread src/m3ninx/search/searcher/conjunction.go Outdated
Comment thread src/m3ninx/search/searcher/disjunction.go Outdated
Comment thread src/cmd/services/m3dbnode/config/cache.go
@arnikola arnikola self-assigned this Oct 26, 2020
Copy link
Copy Markdown
Collaborator

@arnikola arnikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a cursory glance over but didn't go super deep on it, figuring a lot of it was very similar to Pilosa's implementation; if this is a bad assumption can go back with more discerning review

"github.com/uber-go/tally"
)

// LatencyBuckets are a set of latency buckets useful for measuring things.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit measuring latencies

Comment thread src/cmd/services/m3dbnode/config/cache.go
Comment thread src/cmd/services/m3coordinator/ingest/metrics.go
Comment thread src/cmd/services/m3coordinator/ingest/metrics.go
Comment thread src/cmd/services/m3coordinator/ingest/metrics_test.go
func (i *multiBitmap) IsEmpty() bool {
iter := i.Iterator()
hasAny := iter.Next()
_ = iter.Err()
Copy link
Copy Markdown
Collaborator

@arnikola arnikola Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, should this return false if iter.Err() is non-nil? Otherwise no need to call Err

Comment on lines +253 to +259
type containerIterator interface {
NextContainer() bool
ContainerKey() uint64
ContainerUnion(ctx containerOpContext, target *bitmapContainer)
ContainerIntersect(ctx containerOpContext, target *bitmapContainer)
ContainerNegate(ctx containerOpContext, target *bitmapContainer)
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do these need to be exported? If so, may need comments (and may be worth exporting containerIterator?)

i.bitmap.Reset(true)

intersects := i.filter(i.multiContainerIter.containerIters, multiContainerOpIntersect)
negates := i.filter(i.multiContainerIter.containerIters, multiContainerOpNegate)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this overwrite the i.filtered in intersects?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, TY!

func (i *multiBitmapContainersIterator) NextContainer() bool {
if len(i.iters) != 0 {
// Exhausted.
return true
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be

if len(i.iters) == 0 {
  return false
}

Comment thread src/x/cache/lru_cache.go

// Tell any other callers that we're done loading
if entry.loadingCh != nil {
close(entry.loadingCh)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice pattern 👍

}
// Only valid if all intersecting actually matched,
// if zero intersecting then postings does not contain ID.
return len(i.intersect) > 0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit; can move this to 181, before checking intersectNegate lists

case elem.multiBitmap != nil:
it = elem.multiBitmap.containerIterator()

case elem.bitmap != nil:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: needs to fail if both set?

@robskillington robskillington changed the title [dbnode] Reduce conjunction search and regexp compilation index query allocations [dbnode] Refactor posting list operations to become zero copy and lazily evaluated Nov 1, 2020
@robskillington robskillington changed the title [dbnode] Refactor posting list operations to become zero copy and lazily evaluated [dbnode] Refactor posting list operations to be lazily evaluated and zero copy Nov 1, 2020

require.False(t, first.Equal(second))
}
// func TestRoaringPostingsListEqualWithOtherNonRoaring(t *testing.T) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; delete?

parameters := gopter.DefaultTestParameters()
seed := time.Now().UnixNano()
parameters.MinSuccessfulTests = 100
// seed := time.Now().UnixNano()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; can probably resume using rng seed

Comment on lines +50 to +52
// simpleReader, err := simpleSeg.Reader()
// require.NoError(t, err)
// simpleExec := executor.NewExecutor([]index.Reader{simpleReader})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; delete

require.NoError(t, err)
matchedDocs, err := collectDocs(dOrg)
require.NoError(t, err)
require.NoError(t, err, fmt.Sprintf("query: %v\n", q.String()))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; require.NoError(t, err, q.String()) may be easier?

if okA && okB && countA != countB {
return false
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using countfast here, we should also make sure that

if !okA && !okB { 
	return EqualIterator(a.Iterator(), b.Iterator())
}

return false

intersects []postings.List,
negates []postings.List,
) (postings.List, error) {
intersect := make([]readOnlyIterable, 0, len(intersects))
Copy link
Copy Markdown
Collaborator

@arnikola arnikola Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prealloc a slice of iterables := make([]readOnlyIterable, 0, len(intersects) + len(negates) and have both intersect and negate be taken from that, e.g. intersect, negate := iterables[:len(intersects)], iterables[len(intersects):]

Comment on lines +152 to +163
Contains(id postings.ID) bool
ContainerIterator() containerIterator
}

type containerIterator interface {
NextContainer() bool
ContainerKey() uint64
ContainerUnion(ctx containerOpContext, target *bitmapContainer)
ContainerIntersect(ctx containerOpContext, target *bitmapContainer)
ContainerNegate(ctx containerOpContext, target *bitmapContainer)
Err() error
Close()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; comments here

func (i *multiBitmap) IsEmpty() bool {
iter := i.Iterator()
hasAny := iter.Next()
_ = iter.Err()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these need to check err?

@robskillington robskillington force-pushed the r/reduce-index-query-allocs branch from c0e6836 to 3699014 Compare February 8, 2021 06:06
…terministic query key, also returned optimized form
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants