Skip to content

chore: improve performance of upstream's trim command#3170

Merged
SirYwell merged 3 commits intoIntellectualSites:mainfrom
Timongcraft:chore/cleanup-trim-cmd
Apr 14, 2025
Merged

chore: improve performance of upstream's trim command#3170
SirYwell merged 3 commits intoIntellectualSites:mainfrom
Timongcraft:chore/cleanup-trim-cmd

Conversation

@Timongcraft
Copy link
Copy Markdown
Contributor

Cleanup for #3168

Submitter Checklist

  • Make sure you are opening from a topic branch (/feature/fix/docs/ branch (right side)) and not your main branch.
  • Ensure that the pull request title represents the desired changelog entry.
  • I read and followed the contribution guidelines.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 8, 2025

Please take a moment and address the merge conflicts of your pull request. Thanks!

@Timongcraft Timongcraft force-pushed the chore/cleanup-trim-cmd branch from abe16a1 to fa82f64 Compare April 8, 2025 20:51
@Timongcraft Timongcraft marked this pull request as ready for review April 8, 2025 20:59
@Timongcraft Timongcraft requested a review from a team as a code owner April 8, 2025 21:00
@SirYwell
Copy link
Copy Markdown
Member

SirYwell commented Apr 9, 2025

This is more a performance improvement than a cleanup. I gave the current implementation a try on a 2048 x 320 x 2048 selection using //trim emerald_ore (tbh, that's probably an extreme case) and it is depressingly slow. I gave up after 30 minutes (!) of waiting. In comparison, a simple //count emerald_ore takes 11 seconds on the same selection.

From what it looks like, the BlockVectors are a rather minor problem. More problematic:

  1. The logic starts with minY and maxY. This is already bad because it means we use loaded chunks extremely inefficiently
  2. The mask directly accesses the world instead of going through an EditSession, further skipping valuable caching mechanisms.
  3. The whole command only uses one thread.

I tried fixing 2. but I still gave up after 30 minutes of waiting. On a 512 x 320 x 512 selection, trimming "only" takes 20-30 seconds, showcasing how bad 1. scales when we need to check many blocks.

If you want, I suggest the following (given that //count has good performance):
Instead of trying to be smart about finding the bounds, we just process the whole region using a custom filter that just stores min/max. I gave that a quick try

Subject: [PATCH] use filter
---
Index: worldedit-core/src/main/java/com/sk89q/worldedit/command/SelectionCommands.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/worldedit-core/src/main/java/com/sk89q/worldedit/command/SelectionCommands.java b/worldedit-core/src/main/java/com/sk89q/worldedit/command/SelectionCommands.java
--- a/worldedit-core/src/main/java/com/sk89q/worldedit/command/SelectionCommands.java	(revision aee75c1249511be052d3acfabdd2f83c11b58e49)
+++ b/worldedit-core/src/main/java/com/sk89q/worldedit/command/SelectionCommands.java	(date 1744187417556)
@@ -21,7 +21,10 @@
 
 import com.fastasyncworldedit.core.configuration.Caption;
 import com.fastasyncworldedit.core.extent.clipboard.URIClipboardHolder;
+import com.fastasyncworldedit.core.extent.filter.MaskFilter;
+import com.fastasyncworldedit.core.extent.filter.block.FilterBlock;
 import com.fastasyncworldedit.core.function.mask.IdMask;
+import com.fastasyncworldedit.core.queue.Filter;
 import com.fastasyncworldedit.core.regions.selector.FuzzyRegionSelector;
 import com.fastasyncworldedit.core.regions.selector.PolyhedralRegionSelector;
 import com.fastasyncworldedit.core.util.MaskTraverser;
@@ -95,6 +98,7 @@
 import java.net.URI;
 import java.util.List;
 import java.util.Optional;
+import java.util.concurrent.atomic.AtomicIntegerArray;
 import java.util.stream.Stream;
 
 import static com.sk89q.worldedit.command.util.Logging.LogMode.POSITION;
@@ -518,7 +522,7 @@
     )
     @Logging(REGION)
     @CommandPermissions("worldedit.selection.trim")
-    public void trim(Actor actor, World world, LocalSession session,
+    public void trim(Actor actor, World world, EditSession editSession, LocalSession session,
                      @Arg(desc = "Mask of blocks to keep within the selection", def = "#existing")
                          Mask mask) throws WorldEditException {
         // Avoid checking blocks outside the original region but within the cuboid region
@@ -526,6 +530,7 @@
         if (!(originalRegion instanceof CuboidRegion)) {
             mask = new MaskIntersection(new RegionMask(originalRegion), mask);
         }
+        new MaskTraverser(mask).setNewExtent(editSession);
 
         // Result region will be cuboid
         CuboidRegion region = originalRegion.getBoundingBox();
@@ -536,106 +541,39 @@
         int minY = 0;
         boolean found = false;
 
-        outer: for (int y = min.y(); y <= max.y(); y++) {
-            for (int x = min.x(); x <= max.x(); x++) {
-                for (int z = min.z(); z <= max.z(); z++) {
-                    BlockVector3 vec = BlockVector3.at(x, y, z);
-
-                    if (mask.test(vec)) {
-                        found = true;
-                        minY = y;
-
-                        break outer;
-                    }
-                }
-            }
-        }
-
-        // If anything was found in the first pass, then the remaining variables are guaranteed to be set
-        if (!found) {
-            throw new StopExecutionException(Caption.of("worldedit.trim.no-blocks"));
-        }
-
-        int maxY = minY;
-
-        outer: for (int y = max.y(); y > minY; y--) {
-            for (int x = min.x(); x <= max.x(); x++) {
-                for (int z = min.z(); z <= max.z(); z++) {
-                    BlockVector3 vec = BlockVector3.at(x, y, z);
-
-                    if (mask.test(vec)) {
-                        maxY = y;
-                        break outer;
-                    }
-                }
-            }
-        }
-
-        int minX = 0;
-
-        outer: for (int x = min.x(); x <= max.x(); x++) {
-            for (int z = min.z(); z <= max.z(); z++) {
-                for (int y = minY; y <= maxY; y++) {
-                    BlockVector3 vec = BlockVector3.at(x, y, z);
-
-                    if (mask.test(vec)) {
-                        minX = x;
-                        break outer;
-                    }
-                }
-            }
-        }
-
-        int maxX = minX;
-
-        outer: for (int x = max.x(); x > minX; x--) {
-            for (int z = min.z(); z <= max.z(); z++) {
-                for (int y = minY; y <= maxY; y++) {
-                    BlockVector3 vec = BlockVector3.at(x, y, z);
-
-                    if (mask.test(vec)) {
-                        maxX = x;
-                        break outer;
-                    }
-                }
-            }
-        }
-
-        int minZ = 0;
-
-        outer: for (int z = min.z(); z <= max.z(); z++) {
-            for (int x = minX; x <= maxX; x++) {
-                for (int y = minY; y <= maxY; y++) {
-                    BlockVector3 vec = BlockVector3.at(x, y, z);
+        final AtomicIntegerArray values = editSession.apply(
+                region, new MaskFilter<>(
+                        new Filter() {
+                            final AtomicIntegerArray values = new AtomicIntegerArray(new int[] {
+                                    min.x(),
+                                    max.x(),
+                                    min.y(),
+                                    max.y(),
+                                    min.z(),
+                                    max.z(),
+                            });
 
-                    if (mask.test(vec)) {
-                        minZ = z;
-                        break outer;
-                    }
-                }
-            }
-        }
-
-        int maxZ = minZ;
-
-        outer: for (int z = max.z(); z > minZ; z--) {
-            for (int x = minX; x <= maxX; x++) {
-                for (int y = minY; y <= maxY; y++) {
-                    BlockVector3 vec = BlockVector3.at(x, y, z);
+                            @Override
+                            public void applyBlock(final FilterBlock block) {
+                                values.accumulateAndGet(0, block.x(), Math::min);
+                                values.accumulateAndGet(1, block.x(), Math::max);
+                                values.accumulateAndGet(2, block.y(), Math::min);
+                                values.accumulateAndGet(3, block.y(), Math::max);
+                                values.accumulateAndGet(4, block.z(), Math::min);
+                                values.accumulateAndGet(5, block.z(), Math::max);
+                            }
 
-                    if (mask.test(vec)) {
-                        maxZ = z;
-                        break outer;
-                    }
-                }
-            }
-        }
+                        }, mask
+                ), true
+        ).getParent().values;
 
         final CuboidRegionSelector selector;
         if (session.getRegionSelector(world) instanceof ExtendingCuboidRegionSelector) {
-            selector = new ExtendingCuboidRegionSelector(world, BlockVector3.at(minX, minY, minZ), BlockVector3.at(maxX, maxY, maxZ));
+            selector = new ExtendingCuboidRegionSelector(world, BlockVector3.at(values.get(0), values.get(2), values.get(4)),
+                    BlockVector3.at(values.get(1), values.get(3), values.get(5)));
         } else {
-            selector = new CuboidRegionSelector(world, BlockVector3.at(minX, minY, minZ), BlockVector3.at(maxX, maxY, maxZ));
+            selector = new CuboidRegionSelector(world, BlockVector3.at(values.get(0), values.get(2), values.get(4)),
+                    BlockVector3.at(values.get(1), values.get(3), values.get(5)));
         }
         session.setRegionSelector(world, selector);

This now takes 10 seconds for the large selection when using //trim emerald_ore but 60 seconds for just //trim (as we have high overhead on accumulateAndGet then). That overhead could be reduced by properly forking and joining the filter and just use plain fields instead of atomics. This solution also currently doesn't properly handle the case when nothing was trimmed, but I think the idea is clear.

I leave it up to you whether you want to play around with this more, but from my very basic measurements, the mutable vectors don't provide as much benefit as expected.

Co-authored-by: Hannes Greule <SirYwell@users.noreply.github.com>
@Timongcraft
Copy link
Copy Markdown
Contributor Author

Timongcraft commented Apr 9, 2025

I've added your code suggestion without atomics, but as I'm not really into FAWE I think it would be better to "hand this PR over" if you want to make additional changes.

For my testing I added the papermc repo to get paperweight, since it didn't work with the enginehub repo

@SirYwell SirYwell changed the title chore: cleanup upstream's trim command chore: improve performance of upstream's trim command Apr 10, 2025
@SirYwell
Copy link
Copy Markdown
Member

Thanks @Timongcraft

@dordsor21 dordsor21 requested a review from a team April 11, 2025 16:24
@SirYwell SirYwell merged commit d1865b6 into IntellectualSites:main Apr 14, 2025
10 checks passed
@Timongcraft Timongcraft deleted the chore/cleanup-trim-cmd branch April 15, 2025 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants