Commit 3ab5d8b

[opt](variant) Reduce sparse variant parse memory (#63970)
## Proposed changes

This PR addresses Variant import memory for sparse dynamic keys.

- Parse plain dynamic non-doc Variant JSON into doc-value KV during storage parse instead of eagerly expanding every path into parse-time subcolumns.
- Keep the eager subcolumn parse path for cases that still depend on parse-time path/type metadata: nested group, deprecated flatten nested, predefined typed paths, and parent inverted index columns.
- For legacy multi-subcolumn `ColumnVariant` blocks that already reach the segment writer, append them into a doc-value intermediate when the writer buffer is still root-only. This avoids copying thousands of sparse subcolumns into the writer append buffer.
- Stream large doc-value materialization sets one path at a time when selected materialized doc paths exceed 64, instead of holding all materialized sparse subcolumns in memory.
- Gate serialized Variant doc-value block payloads by BE exec version, so WAL blocks written with `be_exec_version=9` replay correctly.
- Add focused Release-gated BE UT/perf coverage. The perf tests stay skipped by default unless `DORIS_RUN_VARIANT_WRITE_PERF_TEST=1` is set.

## Problem

The downloaded WAL first block is `be_exec_version=9`. It deserializes as a root-only Variant column:

```text
source_rows=9234 source_subcolumns=1 source_sparse_entries=0 source_doc_value_entries=0
```

So this real WAL does not reproduce the 1000-subcolumn writer-buffer shape directly. It did expose the old-version doc-value serialization compatibility issue, which is fixed by the version gate.

Full first-block Release memory result:

```text
cir20431_wal_variant_memory rows=9234 source_rows=9234 source_subcolumns=1 source_sparse_entries=0 source_doc_value_entries=0 source_allocated=67452928 legacy_append_allocated=67649536 doc_value_append_allocated=67518464 doc_vs_legacy=0.998062
```

## Release Writer Perf

Release BE UT build (`BUILD_TYPE_UT=Release`, `-O3 -DNDEBUG`). Best of 3 measured runs after 1 warmup. JSON generation and JSON-to-Variant parse are excluded; the measured window covers conversion, append, finish, data/index writes, and file close.

```text
variant_write_perf case=sparse_keys rows=8192 paths_per_row=32 max_subcolumns=2 legacy_us=156634 kv_us=28341 kv_vs_legacy=0.180938 legacy_input_allocated=148013056 kv_input_allocated=10747904 legacy_append_buffer_allocated=148013056 optimized_append_buffer_bytes=6346316 kv_append_buffer_bytes=6346316 legacy_footer_columns=4 kv_footer_columns=4 legacy_materialized=2 kv_materialized=2 legacy_sparse=1 kv_sparse=1 legacy_doc_value=0 kv_doc_value=0 legacy_file_size=245659 kv_file_size=254370
variant_write_perf case=dense_keys rows=8192 paths_per_row=32 max_subcolumns=32 legacy_us=25043 kv_us=17450 kv_vs_legacy=0.696802 legacy_input_allocated=4980736 kv_input_allocated=10747904 legacy_append_buffer_allocated=4980736 optimized_append_buffer_bytes=5816320 kv_append_buffer_bytes=5816320 legacy_footer_columns=34 kv_footer_columns=34 legacy_materialized=32 kv_materialized=32 legacy_sparse=1 kv_sparse=1 legacy_doc_value=0 kv_doc_value=0 legacy_file_size=12963 kv_file_size=12963
```

Interpretation:

- Sparse-key shape: writer append buffer drops from `148013056` bytes to `6346316` bytes, and write time is `0.181x` legacy.
- Dense-key shape: all 32 paths are materialized, so memory is roughly comparable (`4980736` bytes vs `5816320` bytes) and write time is `0.697x` legacy in this Release run.
- Both cases report `legacy_doc_value=0 kv_doc_value=0`, with identical footer column counts in each case. The non-doc path does not persist both doc-value and materialized subcolumns; doc-value is used as an intermediate before writer-side materialization/sparse writing.

## Testing

Current head: `e27bdc09407033191d7d9770dda1ed60d2bb55ef`

- `clang-format` on modified C++ files before commit.
- `git diff --check`
- `env BUILD_TYPE_UT=Release DORIS_RUN_VARIANT_WRITE_PERF_TEST=1 DORIS_CIR20431_WAL_FILE=/tmp/cir20431_wal/walbak/1_1778071896877_16715020621810688_group_commit_b94fdc3cd7568b18_30cfcd56d2cfeb8f DORIS_CLANG_HOME=/mnt/disk1/claude-max/ldb_toolchain20 PATH=/mnt/disk1/claude-max/ldb_toolchain20/bin:$PATH ./run-be-ut.sh --run --filter='VariantColumnWriterReaderTest.test_legacy_subcolumn_append_as_doc_value_buffer:VariantColumnWriterReaderTest.test_storage_parse_kv_write_materialized_and_sparse:VariantColumnWriterReaderTest.test_cir20431_wal_doc_value_buffer_memory:VariantColumnWriterReaderTest.test_storage_parse_kv_write_perf'`
- `env BUILD_TYPE_UT=Release DORIS_CIR20431_WAL_FILE=/tmp/cir20431_wal/walbak/1_1778071896877_16715020621810688_group_commit_b94fdc3cd7568b18_30cfcd56d2cfeb8f DORIS_CIR20431_WAL_ROWS=9234 DORIS_CLANG_HOME=/mnt/disk1/claude-max/ldb_toolchain20 PATH=/mnt/disk1/claude-max/ldb_toolchain20/bin:$PATH ./run-be-ut.sh --run --filter='VariantColumnWriterReaderTest.test_cir20431_wal_doc_value_buffer_memory'`

The Release test script used its default parallelism (`PARALLEL -- 39`); no manual `-j` was passed.
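
The ratios quoted in the interpretation can be re-derived from the raw counters. The following is only a sanity-check sketch: `ratio_vs_legacy` and the constant names are hypothetical helpers that do not appear in the PR, and the numbers are copied from the `sparse_keys` perf line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): ratio of an optimized counter to its
// legacy baseline. Values below 1.0 mean the optimized path used less.
double ratio_vs_legacy(std::uint64_t optimized, std::uint64_t legacy) {
    assert(legacy != 0);
    return static_cast<double>(optimized) / static_cast<double>(legacy);
}

// Counters copied from the sparse_keys perf line.
constexpr std::uint64_t kSparseLegacyUs = 156634;
constexpr std::uint64_t kSparseKvUs = 28341;
constexpr std::uint64_t kSparseLegacyAppendBytes = 148013056;
constexpr std::uint64_t kSparseKvAppendBytes = 6346316;
```

Calling `ratio_vs_legacy(kSparseKvUs, kSparseLegacyUs)` gives roughly 0.181, in line with the reported `kv_vs_legacy`, and the append-buffer ratio comes out near 0.043, which is about a 23x reduction.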
1 parent dc7b50c commit 3ab5d8b

8 files changed

Lines changed: 1716 additions & 105 deletions

be/src/common/config.cpp

Lines changed: 6 additions & 0 deletions
@@ -1166,12 +1166,18 @@ DEFINE_Int32(blocking_pipeline_executor_size, "0");
 DEFINE_mInt32(variant_max_json_key_length, "255");
 DEFINE_mBool(variant_throw_exeception_on_invalid_json, "false");
 DEFINE_mBool(variant_enable_duplicate_json_path_check, "false");
+// Controls storage-layer parse target for plain non-doc VARIANT columns:
+// 0 = auto, 1 = force parse-time subcolumns, 2 = force doc-value KV staging.
+// NestedGroup, deprecated flatten-nested, and persistent doc mode keep their required paths.
+DEFINE_mInt32(variant_storage_parse_mode, "0");
 DEFINE_mBool(enable_vertical_compact_variant_subcolumns, "true");
 DEFINE_mBool(enable_variant_doc_sparse_write_subcolumns, "true");
 DEFINE_mBool(variant_nested_group_discard_scalar_on_conflict, "false");
 
 DEFINE_Validator(variant_max_json_key_length,
                  [](const int config) -> bool { return config > 0 && config <= 65535; });
+DEFINE_Validator(variant_storage_parse_mode,
+                 [](const int config) -> bool { return config >= 0 && config <= 2; });
 
 // block file cache
 DEFINE_Bool(enable_file_cache, "false");
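
The three accepted values of `variant_storage_parse_mode` could be modeled as a small enum, as in the sketch below. The enum and function names are hypothetical and not part of the commit; only the `0..2` range check mirrors the validator lambda in the diff.

```cpp
#include <cassert>

// Hypothetical names mirroring the comment on variant_storage_parse_mode.
enum class StorageParseMode : int {
    Auto = 0,                  // let the BE choose (currently doc-value KV staging)
    ForceSubcolumns = 1,       // old parse-time subcolumn path
    ForceDocValueStaging = 2,  // doc-value KV staging
};

// Same range check as the DEFINE_Validator lambda in the diff.
constexpr bool is_valid_storage_parse_mode(int value) {
    return value >= 0 && value <= 2;
}

// Convert a validated integer into the enum; asserts on out-of-range input.
inline StorageParseMode to_storage_parse_mode(int value) {
    assert(is_valid_storage_parse_mode(value));
    return static_cast<StorageParseMode>(value);
}
```

The commit itself keeps the config as a plain `mInt32` and switches on the raw integer; the enum here only makes the three modes easier to read.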

be/src/common/config.h

Lines changed: 3 additions & 0 deletions
@@ -1428,6 +1428,9 @@ DECLARE_mInt32(variant_max_json_key_length);
 DECLARE_mBool(variant_throw_exeception_on_invalid_json);
 // Enable duplicate path check when parsing json into variant subcolumns/jsonb.
 DECLARE_mBool(variant_enable_duplicate_json_path_check);
+// Controls storage-layer parse target for plain non-doc VARIANT columns:
+// 0 = auto, 1 = force parse-time subcolumns, 2 = force doc-value KV staging.
+DECLARE_mInt32(variant_storage_parse_mode);
 // Enable vertical compact subcolumns of variant column
 DECLARE_mBool(enable_vertical_compact_variant_subcolumns);
 DECLARE_mBool(enable_variant_doc_sparse_write_subcolumns);

be/src/exec/common/variant_util.cpp

Lines changed: 52 additions & 8 deletions
@@ -1988,7 +1988,6 @@ void parse_json_to_variant_impl(IColumn& column, const char* src, size_t length,
         }
         break;
     case ParseConfig::ParseTo::OnlyDocValueColumn: {
-        CHECK(column_variant.enable_doc_mode()) << "OnlyDocValueColumn requires doc mode enabled";
         std::vector<size_t> doc_item_indexes;
         doc_item_indexes.reserve(paths.size());
         phmap::flat_hash_set<StringRef, StringRefHash> seen_paths;
@@ -1998,6 +1997,14 @@ void parse_json_to_variant_impl(IColumn& column, const char* src, size_t length,
             FieldInfo field_info;
             get_field_info(values[i], &field_info);
             if (paths[i].empty()) {
+                // Plain non-doc VARIANT can use doc-value KV as writer-side staging. An
+                // invalid root entry from JSON object/array is neither a scalar root value nor
+                // a doc KV path, so leave this row's doc offset empty. Doc-mode and valid scalar
+                // roots still populate the root subcolumn below.
+                if (!column_variant.enable_doc_mode() &&
+                    field_info.scalar_type_id == PrimitiveType::INVALID_TYPE) {
+                    continue;
+                }
                 auto* subcolumn = column_variant.get_subcolumn(paths[i]);
                 DCHECK(subcolumn != nullptr);
                 flush_defaults(subcolumn);
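
The root-entry branch added in the hunk above reduces to a two-input decision. The sketch below models it with two booleans in place of `ColumnVariant` and `FieldInfo`; `RootEntryAction` and `classify_root_entry` are hypothetical names used only for illustration, and the block assumes a C++17 compiler.

```cpp
// Hypothetical model of the root-entry branch; the real code inspects
// column_variant.enable_doc_mode() and field_info.scalar_type_id instead.
enum class RootEntryAction { SkipEntry, PopulateRootSubcolumn };

constexpr RootEntryAction classify_root_entry(bool doc_mode_enabled,
                                              bool root_type_is_invalid) {
    // Non-doc staging: an object/array root has no scalar type, so it is skipped.
    if (!doc_mode_enabled && root_type_is_invalid) {
        return RootEntryAction::SkipEntry;
    }
    // Doc mode, or a valid scalar root, still populates the root subcolumn.
    return RootEntryAction::PopulateRootSubcolumn;
}

// Compile-time checks of the three cases named in the diff comment.
static_assert(classify_root_entry(false, true) == RootEntryAction::SkipEntry);
static_assert(classify_root_entry(true, true) == RootEntryAction::PopulateRootSubcolumn);
static_assert(classify_root_entry(false, false) == RootEntryAction::PopulateRootSubcolumn);
```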
@@ -2217,6 +2224,49 @@ Status parse_and_materialize_variant_columns(Block& block, const std::vector<uin
             { return _parse_and_materialize_variant_columns(block, variant_pos, configs); });
 }
 
+namespace {
+
+ParseConfig::ParseTo select_storage_variant_parse_target(const TabletColumn& column,
+                                                         const ParseConfig& config) {
+    // NestedGroup consumes the parse-time subcolumn tree to build nested storage structures, so it
+    // must not go through doc-value staging.
+    if (column.variant_enable_nested_group()) {
+        return ParseConfig::ParseTo::OnlySubcolumns;
+    }
+
+    // Persistent doc mode owns doc-value bucket columns in VariantDocWriter. Keep it separate from
+    // the plain non-doc staging optimization, even when typed paths or parent indexes exist.
+    if (column.variant_enable_doc_mode()) {
+        return ParseConfig::ParseTo::OnlyDocValueColumn;
+    }
+
+    // Deprecated flatten-nested still consumes parse-time subcolumns. Predefined typed paths and
+    // parent inverted indexes are handled later by regular doc-value staging: typed paths are
+    // forced into the materialized set unless typed-to-sparse is enabled, and materialized dynamic
+    // subcolumns inherit parent indexes while sparse payloads stay unindexed.
+    if (config.deprecated_enable_flatten_nested) {
+        return ParseConfig::ParseTo::OnlySubcolumns;
+    }
+
+    // Plain dynamic non-doc VARIANT can avoid eagerly creating thousands of parse-time subcolumns.
+    // The segment writer will pick the materialized/sparse split from this doc-value KV staging.
+    // Keep a BE switch so tests and rollouts can compare the old parse-time path with staging under
+    // the same writer and schema.
+    switch (config::variant_storage_parse_mode) {
+    case 0:
+    case 2:
+        return ParseConfig::ParseTo::OnlyDocValueColumn;
+    case 1:
+        return ParseConfig::ParseTo::OnlySubcolumns;
+    default:
+        CHECK(false) << "invalid variant_storage_parse_mode: "
+                     << config::variant_storage_parse_mode;
+        return ParseConfig::ParseTo::OnlyDocValueColumn;
+    }
+}
+
+} // namespace
+
 Status parse_and_materialize_variant_columns(Block& block, const TabletSchema& tablet_schema,
                                              const std::vector<uint32_t>& column_pos) {
     std::vector<uint32_t> variant_column_pos;
@@ -2247,13 +2297,7 @@ Status parse_and_materialize_variant_columns(Block& block, const TabletSchema& t
             return Status::InternalError("column is not variant type, column name: {}",
                                          column.name());
         }
-        // if doc mode is not enabled, no need to parse to doc value column
-        if (!column.variant_enable_doc_mode()) {
-            configs[i].parse_to = ParseConfig::ParseTo::OnlySubcolumns;
-            continue;
-        }
-
-        configs[i].parse_to = ParseConfig::ParseTo::OnlyDocValueColumn;
+        configs[i].parse_to = select_storage_variant_parse_target(column, configs[i]);
     }
 
     RETURN_IF_ERROR(parse_and_materialize_variant_columns(block, variant_column_pos, configs));
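
The precedence encoded in `select_storage_variant_parse_target` (nested group first, then persistent doc mode, then deprecated flatten-nested, then the BE switch) can be checked in isolation with a standalone model. This is a sketch under stated assumptions: `ColumnFlags`, `ParseTo`, and `select_parse_target` are hypothetical stand-ins for `TabletColumn`, `ParseConfig::ParseTo`, and the real function, and the mode is passed as an argument rather than read from `config::variant_storage_parse_mode`.

```cpp
#include <cassert>

enum class ParseTo { OnlySubcolumns, OnlyDocValueColumn };

// Hypothetical stand-in for the TabletColumn properties the function reads.
struct ColumnFlags {
    bool nested_group = false;
    bool doc_mode = false;
};

// parse_mode is assumed to be already validated to 0..2, as the validator
// in config.cpp enforces; any value other than 1 selects doc-value staging.
constexpr ParseTo select_parse_target(ColumnFlags column, bool flatten_nested, int parse_mode) {
    if (column.nested_group) return ParseTo::OnlySubcolumns;  // needs the subcolumn tree
    if (column.doc_mode) return ParseTo::OnlyDocValueColumn;  // persistent doc mode
    if (flatten_nested) return ParseTo::OnlySubcolumns;       // deprecated path
    // Plain non-doc VARIANT: 0 (auto) and 2 stage through doc-value KV, 1 forces subcolumns.
    return parse_mode == 1 ? ParseTo::OnlySubcolumns : ParseTo::OnlyDocValueColumn;
}
```

With this model, a column that has both nested group and doc mode enabled resolves to `OnlySubcolumns`, and the switch value only matters once all three earlier checks have fallen through.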

be/src/storage/segment/segment_writer.cpp

Lines changed: 2 additions & 2 deletions
@@ -1244,11 +1244,11 @@ Status SegmentWriter::_generate_short_key_index(std::vector<IOlapColumnDataAcces
     return Status::OK();
 }
 
-inline bool SegmentWriter::_is_mow() {
+bool SegmentWriter::_is_mow() {
     return _tablet_schema->keys_type() == UNIQUE_KEYS && _opts.enable_unique_key_merge_on_write;
 }
 
-inline bool SegmentWriter::_is_mow_with_cluster_key() {
+bool SegmentWriter::_is_mow_with_cluster_key() {
     return _is_mow() && !_tablet_schema->cluster_key_uids().empty();
 }
