Rollback cherrypicking 207 branch UT fix. This PR will be merged after Sachin merges the rollback commits. #50

Open: debasatwa29 wants to merge 14 commits into pinterest:druid_main_version_2 from debasatwa29:rollback_cherrypicking_207_ut_fix

Conversation

debasatwa29 commented Jul 21, 2022

Fixes # Rollback cherrypicking 207 branch UT fix.

Rollback cherrypicking 207 branch UT fix. This PR will be merged after Sachin merges the rollback commits.



This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Jian Wang and others added 14 commits July 19, 2022 08:54
Summary:
Add BloomFilterStreamFanOutHashBasedNumberedShardSpec
One pager:
https://docs.google.com/document/d/173EgL8wRLGrF2o8_xMtfIPPM9HE1GDRrC28s6YsMjXc/edit?usp=sharing

Reviewers: O1139 Druid, yyang

Reviewed By: O1139 Druid, yyang

Subscribers: jenkins, yyang, mleonard, shawncao, realtime-analytics

Differential Revision: https://phabricator.pinadmin.com/D719672
commit: faf5d00
Summary:
Allow mixed shard spec type for real time ingestion
Background: for real-time ingestion into an existing data source, changing the shard spec type causes exceptions to be thrown. This diff adds an optional config to allow the change. If the config is not set, behavior is unchanged.

Reviewers: O1139 Druid, itallam

Reviewed By: O1139 Druid, itallam

Subscribers: itallam, jenkins, shawncao, realtime-analytics

Differential Revision: https://phabricator.pinadmin.com/D726029

commit: 64dc058
Summary: Fix real time shard spec compatibility issue

Reviewers: O1139 Druid, yyang, itallam

Reviewed By: O1139 Druid, yyang, itallam

Subscribers: jenkins, shawncao, realtime-analytics

Differential Revision: https://phabricator.pinadmin.com/D726744

commit: 763e96f
Summary: Add in memory bitmap support when rollup is false

Reviewers: O1139 Druid, jgu, yyang, itallam

Reviewed By: O1139 Druid, jgu, yyang, itallam

Subscribers: jenkins, shawncao, realtime-analytics

JIRA Issue(s): RTA-2719

Differential Revision: https://phabricator.pinadmin.com/D729042

commit: 0a54b60
…o monitor latency to insert rows to bloom filter and add an option to config monitor only certain data source in a broker stage

Summary: Add a metric to monitor pending persist submissions, add a metric to monitor the latency of inserting rows into the bloom filter, and add an option to configure monitoring of only certain data sources in a broker stage

Reviewers: O1139 Druid, itallam

Reviewed By: O1139 Druid, itallam

Subscribers: shawncao, realtime-analytics

Differential Revision: https://phabricator.pinadmin.com/D732268
commit: 6106c47
Summary:
Currently the Druid services export <hostname>:<port> to ZK by default, which works well when running
on Teletraan, as the hostnames are valid EC2 URLs. But when running on K8s, the hostnames become unresolvable
Pod names. This can be fixed by supporting the export of <IP>:<port> as the service address.
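
A minimal sketch of the idea, in Python rather than Druid's Java and not taken from this diff. The function name `service_address` and the `use_ip` switch are hypothetical stand-ins for whatever option the commit actually adds; the sketch only shows the choice between announcing a hostname and announcing an IP.

```python
import socket
from typing import Optional


def service_address(port: int, use_ip: bool = False,
                    hostname: Optional[str] = None) -> str:
    """Build the "<host>:<port>" string a service would announce.

    `use_ip` is a hypothetical switch, not the option name used in the diff.
    """
    host = hostname if hostname is not None else socket.gethostname()
    if use_ip:
        # Resolve on the announcing side, so that consumers of the
        # announcement never have to resolve a Pod name themselves.
        host = socket.gethostbyname(host)
    return f"{host}:{port}"
```

With `use_ip=False` the hostname is passed through untouched, which matches the existing default described above.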

Test Plan:
Tested manually.
{F28879950}

Reviewers: O1139 Druid, ericnguyen

Reviewed By: O1139 Druid, ericnguyen

Differential Revision: https://phabricator.pinadmin.com/D750919

commit: 643d4a0
Signed-off-by: ssagare <ssagare@pinterest.com>
Summary: Set useInMemoryBitmapInQuery default to true. Now there's only one knob `enableInMemoryBitmap` in ingestion spec that controls whether to use in memory bitmap

Reviewers: O1139 Druid, itallam

Reviewed By: O1139 Druid, itallam

Subscribers: jenkins, shawncao, #realtime-analytics

Differential Revision: https://phabricator.pinadmin.com/D754879

Signed-off-by: ssagare <ssagare@pinterest.com>
Summary: stream namespaced fan out shard spec. Add namespace support to stream fan out shard spec.

Test Plan:
Made the corresponding change in the DRUIDHADOOP repo.
In the Flink producer schema, use the partition dimension together with fanOutSize to calculate the Kafka partition.
Ingested some data.
Queried by timeline to get row info, then used the partition dimension value from the timeline query to query by filter. Both results are the same. Summed the metric ct and verified it is the same for both as well.

Reviewers: O1139 Druid, jwang

Reviewed By: O1139 Druid, jwang

Subscribers: jwang, jenkins, mleonard, #realtime-analytics

Differential Revision: https://phabricator.pinadmin.com/D755523

commit: 648473d
Signed-off-by: ssagare <ssagare@pinterest.com>
…; fix an issue on loading bloom filters in broker

Summary:

Pull upstream fix apache#10664 to remove the confusing error message in the log: "Not all bytes were read from the S3ObjectInputStream"
Add a query context returnEmptyResults for debugging the effect of pruning
Fix a read-only byte buffer exception that made the broker unable to load bloom filters

Reviewers: O1139 Druid, yyang

Reviewed By: O1139 Druid, yyang

Subscribers: jenkins, shawncao, realtime-analytics

Differential Revision: https://phabricator.pinadmin.com/D755650

commit: 54b73af
Signed-off-by: ssagare <ssagare@pinterest.com>
…ema definition to support both real time and batch segments

Summary:
Add generic bloom filter index creation support in ingestion spec schema definition to support both real time and batch segments

Added a flag `createBloomFilterIndex`, which defaults to false and can optionally be set to true for any String dimension in the ingestion schema. When it is set, a bloom filter index is created for that dimension, and broker hosts can later use it to prune segments. The segment pruning logic in the broker process now considers filters on dimensions that have bloom filter indexes, in addition to filters on the partition dimensions used by shard specs such as HashBasedShardSpec, SingleDimensionShardSpec and BloomFilterNamedShardSpec. The bloom filter index can be enabled regardless of which shard spec is in use and regardless of whether a segment is created by batch or real-time ingestion.

"dimensionsSpec": {
  "dimensions": [
    {"name": "partner_id", "type": "string", "createBloomFilterIndex": true},
    "eventtype",
    "app",
    {"name": "root_pin_id", "type": "string", "createBloomFilterIndex": true},
    "pin_id",
    "contenttype",
    "pinformat"
  ]
}
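
To illustrate how a per-dimension bloom filter can drive segment pruning, here is a rough sketch in Python. It is not the broker code from this diff: the `BloomFilter` class, the `prune_segments` function, and the in-memory layout (segment id mapped to per-dimension filters) are all assumptions made for the example. The property it relies on is the standard one: a bloom filter can return false positives but never false negatives, so a segment whose filter rejects a value can be skipped safely.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter backed by a Python int used as a bit set."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def prune_segments(segments, dimension, value):
    """Return ids of segments that may hold rows where dimension == value.

    `segments` maps a segment id to a dict of {dimension name: BloomFilter}.
    A segment with no filter for the dimension is always kept, because
    nothing can be ruled out without an index.
    """
    kept = []
    for segment_id, filters in segments.items():
        bloom = filters.get(dimension)
        if bloom is None or bloom.might_contain(value):
            kept.append(segment_id)
    return kept
```

Because segments without an index are kept, enabling the flag on only some dimensions or only some segments does not change query results in this sketch, only how many segments are scanned.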

Test Plan: Unit test and integration test

Reviewers: O1139 Druid, jgu, itallam

Reviewed By: O1139 Druid, jgu, itallam

Subscribers: jenkins, mleonard, shawncao, #realtime-analytics

Differential Revision: https://phabricator.pinadmin.com/D747062

commit: f1f73d1
Signed-off-by: ssagare <ssagare@pinterest.com>
…ema definition to support both real time and batch segments

Summary:
Add generic bloom filter index creation support in ingestion spec schema definition to support both real time and batch segments

Added a flag `createBloomFilterIndex`, which defaults to false and can optionally be set to true for any String dimension in the ingestion schema. When it is set, a bloom filter index is created for that dimension, and broker hosts can later use it to prune segments. The segment pruning logic in the broker process now considers filters on dimensions that have bloom filter indexes, in addition to filters on the partition dimensions used by shard specs such as HashBasedShardSpec, SingleDimensionShardSpec and BloomFilterNamedShardSpec. The bloom filter index can be enabled regardless of which shard spec is in use and regardless of whether a segment is created by batch or real-time ingestion.

"dimensionsSpec": {
  "dimensions": [
    {"name": "partner_id", "type": "string", "createBloomFilterIndex": true},
    "eventtype",
    "app",
    {"name": "root_pin_id", "type": "string", "createBloomFilterIndex": true},
    "pin_id",
    "contenttype",
    "pinformat"
  ]
}

Test Plan: Unit test and integration test

Reviewers: O1139 Druid, jgu, itallam

Reviewed By: O1139 Druid, jgu, itallam

Subscribers: jenkins, mleonard, shawncao, #realtime-analytics

Differential Revision: https://phabricator.pinadmin.com/D747062

commit: f1f73d1
Signed-off-by: ssagare <ssagare@pinterest.com>
Signed-off-by: ssagare <ssagare@pinterest.com>
Signed-off-by: ssagare <ssagare@pinterest.com>
3 participants