Skip to content

Commit 2eea945

Browse files
Optimize bitmap usage in FilterPipelineStage (open-telemetry#2329)
# Change Summary Last Wednesday's SIG meeting discussion alluded to me that our OTAP query engine probably had some had some performance to be gained in its filtering implementation. I ran the profiler on the benchmarks and, lo and behold, we were spending a lot of time creating `RoaringBitmap`s and checking if IDs were contained therein. Note - when filtering we were using these `RoaringBitmap`s in two places: a) when the filtering be an attribute value in a query like `where attributes["x"] = "y"`, we filter the attribute record batch where `key == "x" and str == "y"`, then create a bitmap of the parent_id column, then we scan the parent record batch, determining which values from the `id` column are in this bitmap, and keep only those rows. In these situations, we may also and/or/not the bitmaps together if the query was something like `attributes["x1"] == "y1" and attributes["x2"] == "y2"`. b) when we're finished with the entire filter execution, we will have determined which rows to keep for the parent batch, and then we need to recursively filter the child record batches. We would do this by creating a bitmap of the ID column on the parent batch, then scanning the child `parent_id` column, checking which rows were in this bitmap and keeping only those rows. This PR replaces the usage of `RoaringBitmap` with a simpler implementation based on an implementation called `IdBitmap` which contains pages, each containing a flat bitmap of based on an array of u64s. This is much faster because, unlike `RoaringBitmap`, we never have to do search for the container during insert/contains. While the `RoaringBitmap` is designed for potentially sparse data, our data is dense where in the vast majority of cases the ID columns will be 0...max with few or no gaps, so a more traditional bitmap makes sense here. This PR also reuses the heap allocations of the bitmap between batches. Note that because we potentially create many of these bitmaps during a filter execution (see note above about logically combining attribute filters), we implement reuse via a pool. Benchmark results: | Benchmark | Batch Size | Before | After | Change | |---|---|---|---|---| | `simple_field_filter` | 32 | 7.58 µs | 7.61 µs | ~0% | | `simple_field_filter` | 1024 | 22.32 µs | 14.39 µs | **-35.5%** | | `simple_field_filter` | 8192 | 140.86 µs | 65.67 µs | **-53.4%** | | `simple_attr_filter` | 32 | 10.17 µs | 9.93 µs | **-2.4%** | | `simple_attr_filter` | 1024 | 41.43 µs | 22.75 µs | **-45.1%** | | `simple_attr_filter` | 8192 | 308.44 µs | 120.45 µs | **-60.9%** | | `attr_or_attr_filter` | 32 | 12.92 µs | 12.57 µs | **-2.7%** | | `attr_or_attr_filter` | 1024 | 56.77 µs | 31.04 µs | **-45.3%** | | `attr_or_attr_filter` | 8192 | 427.42 µs | 167.81 µs | **-60.7%** | | `attr_and_prop_filter` | 32 | 10.84 µs | 10.92 µs | ~0% | | `attr_and_prop_filter` | 1024 | 35.98 µs | 21.54 µs | **-40.1%** | | `attr_and_prop_filter` | 8192 | 263.80 µs | 100.78 µs | **-61.8%** | | `attr_and_attr_filter` | 32 | 4.32 µs | 4.38 µs | ~0% | | `attr_and_attr_filter` | 1024 | 13.40 µs | 9.41 µs | **-29.8%** | | `attr_and_attr_filter` | 8192 | 92.65 µs | 49.49 µs | **-46.6%** | | `attr_and_or_together_filter` | 32 | 16.31 µs | 16.37 µs | ~0% | | `attr_and_or_together_filter` | 1024 | 52.78 µs | 34.01 µs | **-35.6%** | | `attr_and_or_together_filter` | 8192 | 387.77 µs | 164.41 µs | **-57.6%** | | `or_short_circuit` | 32 | 3.38 µs | 3.25 µs | **-3.8%** | | `or_short_circuit` | 1024 | 24.19 µs | 8.49 µs | **-64.9%** | | `or_short_circuit` | 8192 | 127.50 µs | 48.67 µs | **-61.8%** | (Note that filter benchmarks `attr_and_or_together_filter` & `and_attrs_short_circuit` are omitted from the table because they filter out all the rows before any bitmap operations, so there is no performance change). <!-- Replace with a brief summary of the change in this PR --> ## What issue does this PR close? <!-- We highly recommend correlation of every PR to an issue --> * Closes open-telemetry#2330 ## How are these changes tested? Unit tests ## Are there any user-facing changes? No ## Future work/followups We should probably look at other places we use `RoaringBitmap` for this same purpose to see if we can get some performance benefit from using this new IdBitmap type. For example, I think this same type of procedure is also done in filter processor.
1 parent f35de84 commit 2eea945

3 files changed

Lines changed: 1374 additions & 224 deletions

File tree

0 commit comments

Comments
 (0)