Pipe: support multiple path patterns under tree model by VGalaxies · Pull Request #16435 · apache/iotdb

VGalaxies · 2025-09-18T05:59:56Z

As title.

Copilot

Pull Request Overview

Adds foundational support for multiple path patterns in the IoTDB pipe source configuration under the tree model. Currently, the system only supports single path patterns, but this PR begins the architectural changes needed to handle multiple patterns.

- Refactors TreePattern.parsePipePatternFromSourceParameters() to return a list of patterns instead of a single pattern
- Introduces a wrapper class DataRegionSourceWithPattern to associate sources with their specific patterns
- Updates pattern matching logic to handle multiple patterns while maintaining temporary single-pattern restrictions

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| TreePattern.java | Modified parser to return List and support comma-separated paths |
| IoTDBNonDataRegionSource.java | Added validation to restrict to single pattern until full support is implemented |
| DataRegionSourceWithPattern.java | New wrapper class for associating sources with patterns |
| PipeRealtimeDataRegionSource.java | Changed internal pattern storage from single to list |
| CachedSchemaPatternMatcher.java | Updated matching logic to work with pattern lists and wrapper class |
| PipeDataRegionAssigner.java | Modified to use wrapper class for pattern-specific assignments |
| PipeHistoricalDataRegionTsFileAndDeletionSource.java | Updated to use pattern lists with TODO markers |
| IoTDBDataRegionSource.java | Updated validation to iterate over pattern list |
| DataRegionListeningFilter.java | Modified filtering logic to check any matching pattern |
| CachedSchemaPatternMatcherTest.java | Updated test to use new wrapper class API |

Copilot · 2025-09-18T07:07:14Z

+      for (final String singlePath : path.split(",")) {
+        if (!singlePath.trim().isEmpty()) {
+          patterns.add(new IoTDBTreePattern(isTreeModelDataAllowedToBeCaptured, singlePath.trim()));
+        }
+      }


The splitting logic on comma assumes no escaping mechanism for commas within paths. Consider using a more robust parsing approach or documenting that commas cannot be part of path names.
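One way to address the escaping concern is a splitter that ignores commas inside quoted segments. The sketch below is illustrative only: it assumes backticks delimit a quoted node name (so a comma inside backticks belongs to the path), and `PathPatternSplitter` and its methods are hypothetical names, not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class PathPatternSplitter {

  /**
   * Splits on commas that are not inside a backtick-quoted segment and drops blank entries.
   * Assumption: backticks quote node names; this is not taken from the PR itself.
   */
  public static List<String> split(final String raw) {
    final List<String> result = new ArrayList<>();
    final StringBuilder current = new StringBuilder();
    boolean inBackticks = false;
    for (int i = 0; i < raw.length(); i++) {
      final char c = raw.charAt(i);
      if (c == '`') {
        // Every backtick toggles the quoted state; the backtick itself is kept.
        inBackticks = !inBackticks;
        current.append(c);
      } else if (c == ',' && !inBackticks) {
        // An unquoted comma ends the current pattern.
        addIfNotBlank(result, current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    addIfNotBlank(result, current.toString());
    return result;
  }

  private static void addIfNotBlank(final List<String> out, final String candidate) {
    final String trimmed = candidate.trim();
    if (!trimmed.isEmpty()) {
      out.add(trimmed);
    }
  }

  public static void main(final String[] args) {
    // Prints [root.db1.d1.*, root.db2.`a,b`.s1]
    System.out.println(split("root.db1.d1.*, root.db2.`a,b`.s1,,"));
  }
}
```

Like the original loop, this trims each entry and skips empty ones; the only behavioral difference is that quoted commas no longer split a path.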

Copilot · 2025-09-18T07:07:15Z

+  public DataRegionSourceWithPattern(final PipeRealtimeDataRegionSource source) {
+    this.source = source;
+    // TODO: handle multiple patterns
+    this.treePattern = source.getTreePatterns().get(0);


Potential IndexOutOfBoundsException if source.getTreePatterns() returns an empty list. Add bounds checking before accessing index 0.

Suggested change

-    this.treePattern = source.getTreePatterns().get(0);
+    if (source.getTreePatterns() != null && !source.getTreePatterns().isEmpty()) {
+      this.treePattern = source.getTreePatterns().get(0);
+    } else {
+      this.treePattern = null;
+    }

Copilot · 2025-09-18T07:07:15Z

            creationTime,
            pipeTaskMeta,
-            treePattern,
+            treePatterns.get(0), // TODO


Potential IndexOutOfBoundsException if treePatterns is empty. Add bounds checking before accessing index 0.

Suggested change

-            treePatterns.get(0), // TODO
+            treePatterns.isEmpty() ? null : treePatterns.get(0), // Safe: pass null if empty

Copilot · 2025-09-18T07:07:16Z

            creationTime,
            pipeTaskMeta,
-            treePattern,
+            treePatterns.get(0), // TODO


Potential IndexOutOfBoundsException if treePatterns is empty. Add bounds checking before accessing index 0.

Copilot · 2025-09-18T07:07:16Z

    for (final PipeRealtimeDataRegionSource source : sources) {
-      if (!matchedSources.contains(source)) {
-        unmatchedSources.add(source);
+      // TODO: improve performance
+      if (!matchedSources.stream()
+          .map(DataRegionSourceWithPattern::getSource)
+          .collect(Collectors.toSet())
+          .contains(source)) {


Converting stream to Set for contains check is inefficient. Consider using anyMatch() or maintaining a Set for faster lookups.
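The cost here comes from rebuilding the set on every loop iteration. A minimal sketch of the "maintain a Set" option follows; `SourceWithPattern` is a simplified, hypothetical stand-in for `DataRegionSourceWithPattern`, and `collectUnmatched` is an illustrative name rather than a method from the PR.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnmatchedSourceCollector {

  /** Hypothetical stand-in for DataRegionSourceWithPattern: pairs a source with one pattern. */
  public static final class SourceWithPattern<S> {
    private final S source;
    private final String pattern;

    public SourceWithPattern(final S source, final String pattern) {
      this.source = source;
      this.pattern = pattern;
    }

    public S getSource() {
      return source;
    }

    public String getPattern() {
      return pattern;
    }
  }

  /** Returns the sources that do not appear in any matched (source, pattern) pair. */
  public static <S> List<S> collectUnmatched(
      final List<S> sources, final Collection<SourceWithPattern<S>> matched) {
    // Build the lookup set once, outside the loop, instead of once per source.
    final Set<S> matchedSources = new HashSet<>();
    for (final SourceWithPattern<S> pair : matched) {
      matchedSources.add(pair.getSource());
    }
    final List<S> unmatched = new ArrayList<>();
    for (final S source : sources) {
      if (!matchedSources.contains(source)) {
        unmatched.add(source);
      }
    }
    return unmatched;
  }
}
```

The lookup still relies on the sources' `equals`/`hashCode`, which is the same behavior as the `Collectors.toSet()` version in the diff; only the number of times the set is built changes.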

Copilot

Pull Request Overview

Copilot reviewed 35 out of 35 changed files in this pull request and generated 4 comments.

Copilot · 2025-09-25T08:16:07Z

-        + treePattern
-        + "', tablePattern='"
+        + treePatterns
+        + "', tablePatterns='"


The field name in the string should be 'tablePattern' to match the actual field name, not 'tablePatterns'.

Suggested change

-        + "', tablePatterns='"
+        + "', tablePattern='"

Copilot · 2025-09-25T08:16:08Z

+
+  private static List<TreePattern> parseMultiplePatterns(
+      final String pattern, final Function<String, TreePattern> patternSupplier) {
+    if (pattern.isEmpty()) {


This condition checks if the pattern string is empty, but an empty string should likely be handled differently than creating a pattern with an empty string. Consider checking for null or whitespace-only strings instead.

Suggested change

-    if (pattern.isEmpty()) {
+    if (pattern == null || pattern.trim().isEmpty()) {

Copilot · 2025-09-25T08:16:08Z

        } else {
-          isDbNameCoveredByPattern = treePattern.coversDb(databaseName);
+          isDbNameCoveredByPattern =
+              treePatterns.stream().allMatch(treePattern -> treePattern.coversDb(databaseName));


Using allMatch() here means the database is only considered covered if ALL patterns cover it. This seems incorrect - the database should be covered if ANY pattern covers it. Consider using anyMatch() instead.

Suggested change

-              treePatterns.stream().allMatch(treePattern -> treePattern.coversDb(databaseName));
+              treePatterns.stream().anyMatch(treePattern -> treePattern.coversDb(databaseName));
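To make the difference between the two quantifiers concrete, here is a small sketch. The `coversDb` method below is a deliberately simplified, hypothetical stand-in for `TreePattern.coversDb` (it only understands `prefix.**` patterns), and the thread does not confirm which semantics the real code ultimately needs.

```java
import java.util.Arrays;
import java.util.List;

public class DbCoverageCheck {

  /**
   * Simplified, hypothetical stand-in for TreePattern.coversDb: a pattern of the form
   * "prefix.**" covers a database when the database equals the prefix or lies under it.
   */
  public static boolean coversDb(final String pattern, final String databaseName) {
    if (!pattern.endsWith(".**")) {
      return false;
    }
    final String prefix = pattern.substring(0, pattern.length() - 3);
    return databaseName.equals(prefix) || databaseName.startsWith(prefix + ".");
  }

  /** True if at least one pattern covers the database. */
  public static boolean coveredByAny(final List<String> patterns, final String databaseName) {
    return patterns.stream().anyMatch(pattern -> coversDb(pattern, databaseName));
  }

  /** True only if every pattern covers the database. */
  public static boolean coveredByAll(final List<String> patterns, final String databaseName) {
    return patterns.stream().allMatch(pattern -> coversDb(pattern, databaseName));
  }

  public static void main(final String[] args) {
    final List<String> patterns = Arrays.asList("root.db1.**", "root.db2.**");
    System.out.println(coveredByAny(patterns, "root.db1")); // true
    System.out.println(coveredByAll(patterns, "root.db1")); // false
  }
}
```

One edge case to keep in mind when switching: on an empty list, `allMatch` returns true while `anyMatch` returns false, so the empty-pattern case may need explicit handling.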

Copilot · 2025-09-25T08:16:09Z

        } else {
-          isDbNameCoveredByPattern = treePattern.coversDb(databaseName);
+          isDbNameCoveredByPattern =
+              treePatterns.stream().allMatch(treePattern -> treePattern.coversDb(databaseName));


Same issue as in PipeRealtimeDataRegionSource - using allMatch() here means the database is only considered covered if ALL patterns cover it. This should likely be anyMatch() to match if any pattern covers the database.

Suggested change

-              treePatterns.stream().allMatch(treePattern -> treePattern.coversDb(databaseName));
+              treePatterns.stream().anyMatch(treePattern -> treePattern.coversDb(databaseName));

Copilot

Pull Request Overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.

Copilot · 2025-09-25T08:24:22Z

+  private static List<TreePattern> parseMultiplePatterns(
+      final String pattern, final Function<String, TreePattern> patternSupplier) {
+    if (pattern.isEmpty()) {
+      return Collections.singletonList(patternSupplier.apply(pattern));


This condition will cause empty patterns to be processed as valid patterns. An empty pattern should likely be filtered out or handled differently to avoid creating meaningless pattern matchers.

Suggested change

-      return Collections.singletonList(patternSupplier.apply(pattern));
+      // Return an empty list if the pattern is empty, to avoid creating meaningless pattern matchers.
+      return Collections.emptyList();

Copilot · 2025-09-25T08:24:23Z

+        Objects.isNull(treePatterns)
+            || ((treePatterns.stream()
+                    .allMatch(treePattern -> treePattern == null || treePattern.isRoot()))


The nested null checks and stream operations make this logic complex. Consider extracting this into a private helper method like areAllPatternsRootOrNull() for better readability.
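A sketch of the suggested extraction is below. It covers only the part of the condition visible in the hunk (the expression is cut off after `allMatch`), and the `TreePattern` interface here is a hypothetical one-method stand-in for the real class.

```java
import java.util.List;
import java.util.Objects;

public class RootPatternCheck {

  /** Hypothetical minimal stand-in for the real TreePattern type. */
  public interface TreePattern {
    boolean isRoot();
  }

  /**
   * True when there is no pattern list at all, or when every entry is either null or a root
   * pattern. Mirrors only the visible portion of the condition in the diff.
   */
  public static boolean areAllPatternsRootOrNull(final List<? extends TreePattern> treePatterns) {
    return Objects.isNull(treePatterns)
        || treePatterns.stream()
            .allMatch(treePattern -> treePattern == null || treePattern.isRoot());
  }
}
```

The call site then reads as a single named predicate, and any further conditions that follow in the original expression can be combined with it using `&&` or `||` as before.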

Copilot · 2025-09-25T08:24:23Z

    }
-    return initEventParser().convertToTablet();
+    // TODO: handle multiple patterns
+    return initEventParsers().get(0).convertToTablet();


Using get(0) without checking if the list is empty will cause an IndexOutOfBoundsException. Add a null/empty check or handle the case where no patterns exist.

Suggested change

-    return initEventParsers().get(0).convertToTablet();
+    List<TabletInsertionEventParser> parsers = initEventParsers();
+    if (parsers == null || parsers.isEmpty()) {
+      // Handle the case where no parser exists; return null or throw an exception as appropriate
+      return null;
+    }
+    return parsers.get(0).convertToTablet();

Caideyipi · 2025-09-30T07:53:29Z

        } else {
-          isDbNameCoveredByPattern = treePattern.coversDb(databaseName);
+          isDbNameCoveredByPattern =
+              treePatterns.stream().allMatch(treePattern -> treePattern.coversDb(databaseName));


Caideyipi · 2025-09-30T07:53:57Z

    return parseConfigPlan(
            ((PipeConfigRegionWritePlanEvent) event).getConfigPhysicalPlan(),
-            treePattern,
+            // TODO: handle multiple patterns


should complete it

jt2594838 · 2025-10-11T06:48:09Z

  public Iterable<TabletInsertionEvent> processRowByRow(
      final BiConsumer<Row, RowCollector> consumer) {
-    return initEventParser().processRowByRow(consumer);
+    return initEventParsers().stream()
+        .map(tabletInsertionEventParser -> tabletInsertionEventParser.processRowByRow(consumer))
+        .flatMap(Collection::stream)
+        .collect(Collectors.toList());
  }

  @Override
  public Iterable<TabletInsertionEvent> processTablet(
      final BiConsumer<Tablet, RowCollector> consumer) {
-    return initEventParser().processTablet(consumer);
+    return initEventParsers().stream()
+        .map(tabletInsertionEventParser -> tabletInsertionEventParser.processTablet(consumer))
+        .flatMap(Collection::stream)
+        .collect(Collectors.toList());
  }


Will an insertion be sent twice if I define patterns like:
root.db1.d1.s1
root.db1.d1.*

jt2594838 · 2025-10-11T06:55:44Z

+        final Iterator<TsFileInsertionEventParser> parserIterator = initEventParsers().iterator();
+        return new Iterator<TabletInsertionEvent>() {
+          private TsFileInsertionEventParser currentParser = null;
+          private Iterator<TabletInsertionEvent> currentEventIterator = Collections.emptyIterator();
+
+          private void closeCurrentParser() {
+            if (Objects.nonNull(currentParser)) {
+              currentParser.close();
+              currentParser = null;
+            }
+          }
+
+          @Override
+          public boolean hasNext() {
+            while (!currentEventIterator.hasNext() && parserIterator.hasNext()) {
+              closeCurrentParser();
+              currentParser = parserIterator.next();
+              currentEventIterator = currentParser.toTabletInsertionEvents().iterator();
+            }
+
+            if (!currentEventIterator.hasNext()) {
+              closeCurrentParser();
+            }
+
+            return currentEventIterator.hasNext();
+          }
+
+          @Override
+          public TabletInsertionEvent next() {
+            if (!hasNext()) {
+              throw new NoSuchElementException();
+            }
+            return currentEventIterator.next();
+          }
+        };
+      };


I would suggest handling all patterns inside a single parser, instead of creating one parser per pattern, because:

- overlapping patterns, like root.db1.** and root.db1.d1.**, may produce redundant results;
- each pattern may trigger a separate traversal of the TsFile, which could be inefficient.
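The two concerns can be illustrated with a toy model. In the sketch below, a "file" is just a list of series paths, and `matches` is a simplified, hypothetical matcher (it supports only exact paths, a trailing `.*`, and a trailing `.**`); none of these names come from the PR. One method emulates a parser per pattern, the other applies all patterns in a single pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiPatternFilter {

  /** Simplified, hypothetical matcher: exact path, "prefix.*" (one level), or "prefix.**". */
  public static boolean matches(final String pattern, final String path) {
    if (pattern.endsWith(".**")) {
      final String prefix = pattern.substring(0, pattern.length() - 3);
      return path.startsWith(prefix + ".");
    }
    if (pattern.endsWith(".*")) {
      final String prefix = pattern.substring(0, pattern.length() - 2);
      // Exactly one more level: no further '.' after the prefix separator.
      return path.startsWith(prefix + ".") && path.indexOf('.', prefix.length() + 1) < 0;
    }
    return pattern.equals(path);
  }

  /** One traversal per pattern: overlapping patterns emit the same path more than once. */
  public static List<String> filterPerPattern(
      final List<String> patterns, final List<String> paths) {
    final List<String> out = new ArrayList<>();
    for (final String pattern : patterns) {
      for (final String path : paths) {
        if (matches(pattern, path)) {
          out.add(path);
        }
      }
    }
    return out;
  }

  /** Single traversal: each path is emitted at most once if any pattern matches it. */
  public static List<String> filterSinglePass(
      final List<String> patterns, final List<String> paths) {
    final List<String> out = new ArrayList<>();
    for (final String path : paths) {
      if (patterns.stream().anyMatch(pattern -> matches(pattern, path))) {
        out.add(path);
      }
    }
    return out;
  }

  public static void main(final String[] args) {
    final List<String> patterns = Arrays.asList("root.db1.d1.s1", "root.db1.d1.*");
    final List<String> paths =
        Arrays.asList("root.db1.d1.s1", "root.db1.d1.s2", "root.db1.d2.s1");
    // Prints [root.db1.d1.s1, root.db1.d1.s1, root.db1.d1.s2]
    System.out.println(filterPerPattern(patterns, paths));
    // Prints [root.db1.d1.s1, root.db1.d1.s2]
    System.out.println(filterSinglePass(patterns, paths));
  }
}
```

With the overlapping patterns from the earlier question, the per-pattern version emits root.db1.d1.s1 twice and walks the path list once per pattern, while the single-pass version emits it once and walks the list once.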

setup

8c05c68

VGalaxies requested a review from Copilot September 18, 2025 07:05

Copilot AI reviewed Sep 18, 2025

VGalaxies added 15 commits September 19, 2025 15:38

minor improve

c3f320f

fix & add realtime data CI

0d76068

Merge branch 'master' into multi-path

d707f1c

fix IT

7f4e08d

refact

e958053

reset

20ab68c

more IT

a0760e2

fixup

5ed2902

try fix

1b44f6b

try fix

172a86b

fix UT

4edc88c

fix IT

f889c18

fixup! fix IT

15b2229

more IT

859a4d3

Merge branch 'master' into multi-path

636b379

VGalaxies requested review from Caideyipi and Copilot September 25, 2025 08:14

VGalaxies marked this pull request as ready for review September 25, 2025 08:15

Copilot AI reviewed Sep 25, 2025

minor improve

09a79c3

VGalaxies requested a review from Copilot September 25, 2025 08:23

VGalaxies added 2 commits September 25, 2025 16:23

Revert "minor improve"

c51d447

This reverts commit 09a79c3.

minor improve

683da4f

Copilot AI reviewed Sep 25, 2025

Merge branch 'master' into multi-path

ba408dd

Caideyipi reviewed Sep 30, 2025

Merge branch 'master' into multi-path

e7a35b7

jt2594838 requested changes Oct 11, 2025

VGalaxies added 3 commits October 13, 2025 13:41

Merge branch 'multi-path' of github.com:VGalaxies/iotdb into multi-path

3b1a19f

support path like root.db1.a,b.**

82e47e7

Merge branch 'master' into multi-path

41ba1f7

VGalaxies mentioned this pull request Oct 13, 2025

Pipe: support multiple path patterns under tree model #16575

Merged

VGalaxies closed this Oct 13, 2025

VGalaxies deleted the multi-path branch October 21, 2025 02:41

VGalaxies restored the multi-path branch October 21, 2025 02:41
