Native allocator: dynamic settings, query/datafusion pools, plugin nodeStats by gaurav-amz · Pull Request #21732 · opensearch-project/OpenSearch

gaurav-amz · 2026-05-19T08:45:23Z

Description

Builds on #21703 to add the query, datafusion with allocator, dynamic-tuning, stats, and pool-wiring pieces.

Architecture: three native-memory trackers, three knobs

Each tracker owns the bytes it can actually see, with a process-level cap above all of them.

            ┌──────────────────────────────────────────────────────────┐
            │  Process-level cap (admission control)                   │
            │                                                          │
            │  node.native_memory.limit                                │
            │    Defaults to 79% × (RAM − JVM heap), cgroup-aware.     │
            │                                                          │
            │  ┌────────────────────────┐  ┌────────────────────────┐  │
            │  │ Arrow tracker          │  │ DataFusion tracker     │  │
            │  │ (ArrowNativeAllocator. │  │ (Rust MemoryPool)      │  │
            │  │  root)                 │  │                        │  │
            │  │                        │  │ Tracks:                │  │
            │  │ Tracks:                │  │   HashAggregate        │  │
            │  │   POOL_FLIGHT (5% NM)  │  │   Sort buffers         │  │
            │  │   POOL_INGEST (8% NM)  │  │   TopK heap            │  │
            │  │   POOL_QUERY  (5% NM)  │  │   Spill staging        │  │
            │  │                        │  │                        │  │
            │  │ Bounded by:            │  │ Bounded by:            │  │
            │  │  native.allocator      │  │  datafusion.memory_    │  │
            │  │  .root.limit           │  │  pool_limit_bytes      │  │
            │  │  (20% × NM by default) │  │  (75% × NM by default) │  │
            │  └────────────────────────┘  └────────────────────────┘  │
            │                                                          │
            └──────────────────────────────────────────────────────────┘

Three layers, three responsibilities. Each is necessary; none can be replaced by another.

Arrow tracker (ArrowNativeAllocator) accounts for every Arrow BufferAllocator allocation in the JVM. Partitioned into FLIGHT / INGEST / QUERY pools so one plugin can't starve another. All Arrow buffers descend from the same RootAllocator, preserving Arrow's same-root invariant for cross-plugin zero-copy handoff.
DataFusion tracker (Rust MemoryPool) accounts for DataFusion's own working memory and triggers spill / fail-fast when a query exceeds budget. Lives entirely on the Rust side; updates flow in via FFM through NativeBridge.setMemoryPoolLimit.
Admission control caps the OS process. node.native_memory.limit is the operator-declared off-heap budget AC throttles against; framework-derived defaults (root, pools, DataFusion pool) all scale from this single number.

POOL_DATAFUSION is intentionally not present. DataFusion's working memory is allocated by Rust operators directly and reported only to DataFusion's own MemoryPool — it never flows through Arrow's BufferAllocator API. Adding a Java-side pool that pretended to track it would have required either a per-allocation FFM round-trip (performance disaster) or a config-only mirror (a setting that returns HTTP 200 and silently does nothing). The Rust-side datafusion.memory_pool_limit_bytes is the honest knob for that layer.

Worked example: 64 GB / 16 GB-heap node

With operator-declared -Xmx16g on a 64 GB host (bare metal or cgroup-limited container), defaults compose to:

64 GB RAM
  │ ├── JVM Heap (16 GB / 25%)
  │ │   ├── Indexing pressure, Writer buffer
  │ │   ├── Transport Netty
  │ │   └── etc.
  │ │
  │ └── Off-Heap (48 GB / 75%)
  │     │
  │     ├── node.native_memory.limit (37.92 GB / 79% of off-heap)
  │     │   │
  │     │   ├── Java Arrow Root (7.58 GB / 20% of NM)
  │     │   │   ├── POOL_FLIGHT  (1.90 GB / 5% of NM)
  │     │   │   │   └── server, client — flight transport batches
  │     │   │   ├── POOL_INGEST  (3.03 GB / 8% of NM)
  │     │   │   │   └── ArrowBufferPool / parquet VSR allocators
  │     │   │   └── POOL_QUERY   (1.90 GB / 5% of NM)
  │     │   │       └── analytics-search-service → per-query children
  │     │   │
  │     │   └── DataFusion Rust pool (28.44 GB / 75% of NM)
  │     │       └── datafusion.memory_pool_limit_bytes
  │     │           ├── hash tables, sort buffers, aggs, intermediate batches
  │     │           └── triggers spill on exhaustion
  │     │
  │     └── Unmanaged (~10.08 GB / 21% of off-heap)
  │         ├── Lucene mmap
  │         ├── OS Page Cache
  │         └── Sidecars / non-Arrow native consumers
  │
  │ Independent (disk):
  │   datafusion.spill_memory_limit_bytes (32 GB / 50% of physical RAM)

As a flat table for quick reference:

RAM                                              64 GB
JVM heap (operator-configured -Xmx)              16 GB
Off-heap (RAM − heap)                            48 GB
node.native_memory.limit  (79% of off-heap)      37.92 GB    ← AC throttle threshold
  native.allocator.root.limit  (20% of NM)        7.58 GB    ← Arrow framework cap
    pool.flight.max  (5% of NM)                   1.90 GB    ← Flight transport pool
    pool.ingest.max  (8% of NM)                   3.03 GB    ← Parquet VSR pool
    pool.query.max   (5% of NM)                   1.90 GB    ← analytics-engine pool
  datafusion.memory_pool_limit  (75% of NM)      28.44 GB    ← Rust runtime pool (sibling)
Unmanaged (Lucene mmap, OS page cache, etc.)     10.08 GB

Independent budget (disk staging, not memory):

datafusion.spill_memory_limit_bytes (50% of physical RAM)    32 GB

How a query flows through these layers

Take a concrete example: a user issues a PPL query that goes through analytics-engine and dispatches to DataFusion.

  1. Coordinator receives the request.
     → Admission control checks node.native_memory.limit budget.
     → If OK, proceeds; otherwise rejects with 429.

  2. AnalyticsSearchService creates a per-fragment Arrow allocator:
        allocator = getPoolAllocator(POOL_QUERY).newChildAllocator("frag-N", 0, Long.MAX_VALUE)
     → Future BufferAllocator.buffer(...) calls on this allocator
       increment POOL_QUERY's counter and the root counter via Arrow's
       parent-cap chain at allocateBytes.
     → Bounded by native.allocator.pool.query.max.

  3. AnalyticsSearchService dispatches to DataFusion via NativeBridge.
     → NativeBridge.executeQueryAsync(...) marshals plan bytes via FFM.

  4. DataFusion runs the query in Rust:
     → HashAggregate builds a hash table.
     → reservation.try_grow(50MB) → DataFusion MemoryPool counter += 50MB.
     → Bounded by datafusion.memory_pool_limit_bytes.
     → If exceeded, HashAggregate spills to disk
       (which itself is bounded by datafusion.spill_memory_limit_bytes).

  5. DataFusion produces a result batch in Rust.
     → Java imports it via Arrow C Data Interface.
     → The import allocates Java ArrowBufs under the per-fragment allocator
       from step 2. POOL_QUERY counter += result_size.

  6. Result returns to coordinator.
     → Per-fragment allocator closes; POOL_QUERY counter decrements.
     → DataFusion query completes; MemoryPool counter decrements.

Each layer accounts for what it owns. No double-counting (the result bytes are counted in DataFusion's pool only while they exist as Rust buffers; after Java imports them, ownership transfers and the Java side counts them; when Java closes the import, the bytes are freed). Each operator-tunable knob bounds a real, observable thing.

Operator surface

After this PR, an operator inspecting a node running the query above sees:

{
  "native_allocator": {
    "root": {"allocated": "150MB", "limit": "7.58GB"},
    "pools": {
      "flight":  {"allocated": "20MB",  "limit": "1.90GB"},
      "ingest":  {"allocated": "30MB",  "limit": "3.03GB"},
      "query":   {"allocated": "100MB", "limit": "1.90GB"}
    }
  },
  "datafusion": {
    "memory_pool": {"usage": "2.4GB", "limit": "28.44GB"},
    "spill":       {"usage": "0",     "limit": "32GB"}
  }
}

Three numbers, three sources, three knobs. The operator looks at each in isolation when tuning that layer:

"DataFusion queries are OOMing" → bump datafusion.memory_pool_limit_bytes.
"Flight RPC is starving Parquet ingest" → bump parquet.native.pool.flight.max or lower parquet.native.pool.ingest.max.
"Whole node is using too much memory" → lower node.native_memory.limit.

All pool min/max settings are Setting.Property.Dynamic. Pool limit changes propagate to consumer-side child allocators automatically via Arrow's parent-cap check at allocateBytes — child allocators are created with Long.MAX_VALUE and inherit the live parent cap on every allocation, so dynamic resizes reach in-flight workloads without restart and without an explicit notification SPI. The grouped validator rejects cross-setting violations (sum of pool mins > root, per-pool min > max) at PUT time with HTTP 400 rather than at the next allocation.

Behavior change to call out

Admission control is now active by default. node.native_memory.limit defaults to 79% × (RAM − JVM heap) instead of 0 (unconfigured). On upgrade, AC will start tracking native memory utilization and the framework's pool caps will derive sensible values from the operator's declared off-heap budget without any explicit configuration.

Operators who want pre-existing opt-out behavior (AC unconfigured, all framework caps unbounded) can set:

node.native_memory.limit: 0b

This restores the prior "explicit-opt-in" semantics — useful for nodes where Lucene mmap or non-Arrow native consumers dominate and the operator does not want admission control throttling search/index traffic.

Changes after initial review

Drop NativeAllocatorListener SPI. The SPI was emulating Arrow's native parent-cap check at allocateBytes. With consumer-side child allocators created at Long.MAX_VALUE, dynamic pool resizes propagate automatically through Arrow's Accountant.allocate parent walk — no listener machinery needed. Resolves @bowenlan-amzn (Native allocator: dynamic settings, query/datafusion pools, plugin nodeStats #21732 (comment), Native allocator: dynamic settings, query/datafusion pools, plugin nodeStats #21732 (comment)) and @Bukhtawar (Native allocator: dynamic settings, query/datafusion pools, plugin nodeStats #21732 (comment)).
Fix Guice duplicate-binding regression. Removed the redundant ArrowBasePlugin#createGuiceModules override; Node.java already auto-binds every component returned from createComponents. This was the root cause of the gradle-check Jenkins CI failures across multiple SHAs.
@ExperimentalApi on Plugin#nodeStats() to match the annotation already on PluginNodeStats. Resolves @bowenlan-amzn (Native allocator: dynamic settings, query/datafusion pools, plugin nodeStats #21732 (comment)).
FQN → import in ParquetIndexingEngine (3 constructor parameters). Resolves @bowenlan-amzn (Native allocator: dynamic settings, query/datafusion pools, plugin nodeStats #21732 (comment)).
Default tuning: node.native_memory.limit derives from RAM−heap (79%); root.limit is 20% of NM (Arrow gets a small fraction; DataFusion gets the larger 75% as a sibling); pool maxes anchor to NM at 5%/8%/5%; datafusion.memory_pool_limit_bytes is 75% of NM (replaces the prior JVM-heap-derived default flagged by @bharath-techie at Native allocator: dynamic settings, query/datafusion pools, plugin nodeStats #21732 (comment)); datafusion.spill_memory_limit_bytes is 50% of physical RAM (independent disk-staging budget).
3 boundary integration tests in NativeAllocatorBoundaryIT (plugins/arrow-flight-rpc/src/internalClusterTest) boot a real cluster with tight memory settings and verify cap enforcement end-to-end through actual Arrow allocations: per-pool max rejection, root-level rejection across pools, and dynamic pool-resize propagation to in-flight Long.MAX_VALUE children.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

github-actions · 2026-05-19T08:47:18Z

PR Reviewer Guide 🔍

(Review updated until commit `dcddc3c`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review Possible Race Condition The `rebalancerEnabled` flag is set in `setRebalanceInterval` and read in `getOrCreatePool` without synchronization. If `setRebalanceInterval` is called concurrently with `getOrCreatePool`, a pool could be created with the wrong initial limit. For example, if `rebalancerEnabled` is read as `false` while `setRebalanceInterval(60)` is executing, the pool starts at `max` when it should start at `min`. This can cause pools to over-allocate before the first rebalance tick. rebalancerEnabled = intervalSeconds > 0; if (rebalancerEnabled) { rebalanceTask = rebalancer.scheduleAtFixedRate(this::rebalance, intervalSeconds, intervalSeconds, TimeUnit.SECONDS); } } @Override public PoolHandle getOrCreatePool(String poolName, long limit) { return getOrCreatePool(poolName, limit, limit); } /** * Creates or returns a pool with min/max limits. * * @param poolName logical pool name * @param min guaranteed minimum bytes (always available) * @param max maximum bytes the pool can burst to * @return the pool handle / public PoolHandle getOrCreatePool(String poolName, long min, long max) { poolMins.putIfAbsent(poolName, min); poolMaxes.putIfAbsent(poolName, max); return pools.computeIfAbsent(poolName, name -> { // Pick an initial limit that's safe for both rebalancer-on and rebalancer-off // deployments. When rebalancing is enabled, start at min (the original PR's // "guarantee + burst" semantics): the next rebalance tick will distribute // headroom up to each pool's max. When rebalancing is disabled (the default), // pools with min=0 would otherwise reject every allocation until a tick that // never comes — start at max so consumers can allocate immediately. long initial = rebalancerEnabled ? min : max; BufferAllocator child = root.newChildAllocator(name, 0, initial); Overflow Risk* In `rebalance`, `bonusPerPool` is computed as `headroom / poolCount`. If `poolCount` is zero (all pools removed between the check and the loop), the division is safe but the subsequent loop is a no-op. However, if `headroom` is `Long.MAX_VALUE` and `poolCount` is 1, `bonusPerPool` equals `Long.MAX_VALUE`, and `min + bonusPerPool` at line 256 can overflow. The code does not guard against this scenario, which can occur if the root limit is `Long.MAX_VALUE` and a single pool has `min = 1`. int poolCount = pools.size(); long bonusPerPool = poolCount > 0 ? headroom / poolCount : 0; for (Map.Entry<String, ArrowPoolHandle> entry : pools.entrySet()) { String name = entry.getKey(); BufferAllocator alloc = entry.getValue().allocator; long min = poolMins.getOrDefault(name, 0L); long max = poolMaxes.getOrDefault(name, Long.MAX_VALUE); long currentAllocation = alloc.getAllocatedMemory(); long effectiveLimit = min + bonusPerPool; Possible Issue `updateSpillMemoryLimit` checks `isSpillLimitDynamic()` and logs a warning if false, but the cluster-settings update is accepted unconditionally. If the native library does not support runtime updates, the new value is stored in cluster state but never applied to the running service. On the next node restart, the service reads the new value from settings and applies it, but until then the node runs with the old limit. This can cause confusion when operators PUT a new limit, see HTTP 200, but observe no change in behavior until restart. void updateSpillMemoryLimit(long newLimitBytes) { DataFusionService service = dataFusionService; if (service == null) { logger.debug("DataFusion service not yet initialized; ignoring spill limit update to {}B", newLimitBytes); return; } if (!service.isSpillLimitDynamic()) { logger.warn( "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not " + "support runtime spill resize, so the new value will only take effect after a node restart", newLimitBytes ); return; } try { service.setSpillMemoryLimit(newLimitBytes); logger.info("Updated DataFusion spill memory limit to {}B", newLimitBytes); } catch (IllegalStateException e) { logger.warn("Ignoring spill memory limit update to {}B; service is not running", newLimitBytes); } catch (UnsupportedOperationException e) { // isSpillLimitDynamic() guard above should make this unreachable, but defend // against a race between probe and call. logger.warn("Ignoring spill memory limit update to {}B; native runtime does not support live updates", newLimitBytes); }

github-actions · 2026-05-19T08:48:01Z

PR Code Suggestions ✨

Latest suggestions up to dcddc3c

Explore these optional code suggestions:

Category	Suggestion	Impact
General	Skip unnecessary reads when registry is null When `registry == null`, the loop continues reading `name` and `payload` for all remaining entries but discards them. This wastes I/O and CPU. After detecting `registry ==` `null`, break out of the loop immediately to avoid unnecessary reads. server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java [297-334] private static Map<String, PluginNodeStats> readPluginStats(StreamInput in) throws IOException { int size = in.readVInt(); if (size == 0) { return Collections.emptyMap(); } NamedWriteableRegistry registry = in.namedWriteableRegistry(); + if (registry == null) { + for (int i = 0; i < size; i++) { + in.readString(); + in.readBytesReference(); + } + return Collections.emptyMap(); + } Map<String, PluginNodeStats> result = new HashMap<>(size); for (int i = 0; i < size; i++) { String name = in.readString(); BytesReference payload = in.readBytesReference(); - if (registry == null) { - continue; - } try ( StreamInput rawIn = payload.streamInput(); NamedWriteableAwareStreamInput payloadIn = new NamedWriteableAwareStreamInput(rawIn, registry) ) { payloadIn.setVersion(in.getVersion()); PluginNodeStats stats = payloadIn.readNamedWriteable(PluginNodeStats.class); result.put(name, stats); } catch (IOException \| IllegalArgumentException e) { - // Receiver doesn't have the plugin's NamedWriteable registered (typical - // during rolling upgrades or in a non-uniform plugin install). Drop the - // entry; the rest of NodeStats remains decodable. } } return result; } Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that when `registry == null`, the loop continues reading and discarding all entries. Breaking early after detecting `registry == null` avoids unnecessary I/O. However, the improved code must still consume the remaining entries from the stream to maintain wire-format alignment—the suggestion's loop does this correctly. The optimization is valid and improves efficiency in the defensive `registry == null` path.	Low
	Distribute headroom remainder to avoid truncation The division `headroom / poolCount` can lose precision when headroom is not evenly divisible by poolCount. Consider using a remainder distribution strategy to allocate leftover bytes to pools, ensuring the full headroom is utilized rather than truncated. plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [247-248] long bonusPerPool = poolCount > 0 ? headroom / poolCount : 0; +long remainder = poolCount > 0 ? headroom % poolCount : 0; +// Distribute remainder to first `remainder` pools Suggestion importance[1-10]: 4 __ Why: The suggestion correctly identifies precision loss from integer division, but the impact is minor—leftover bytes are typically small relative to pool sizes. The added complexity of remainder distribution may not justify the marginal improvement in headroom utilization.	Low
	Validate duplicate plugin registration The method modifies a `LinkedHashSet` after construction, which could break plugin ordering guarantees if `mockPlugins` contains duplicates or conflicts with `nodePlugins()`. Consider validating that `mockPlugins` doesn't contain duplicates of plugins already in `nodePlugins()` to maintain deterministic ordering. test/framework/src/main/java/org/opensearch/test/InternalTestCluster.java [522-524] Set<Class<? extends Plugin>> plugins = new LinkedHashSet<>(nodeConfigurationSource.nodePlugins()); -plugins.addAll(mockPlugins); +for (Class<? extends Plugin> mockPlugin : mockPlugins) { + if (!plugins.add(mockPlugin)) { + logger.warn("Mock plugin {} already present in nodePlugins(), skipping duplicate", mockPlugin.getName()); + } +} return plugins; Suggestion importance[1-10]: 4 __ Why: The suggestion addresses a potential issue with duplicate plugins in `mockPlugins`, but the concern is somewhat theoretical. The `LinkedHashSet` naturally handles duplicates by not adding them twice, maintaining the first occurrence. Adding explicit validation with logging could help with debugging but doesn't fix a critical bug. The impact is moderate as it mainly improves observability.	Low

Previous suggestions

Suggestions up to commit dcddc3c

Category	Suggestion	Impact
General	Log when min exceeds max When `newMin` exceeds `max`, the method silently caps at `max` without logging or warning. Operators may expect the min to be honored and not realize it's being capped, leading to confusion during troubleshooting. plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [160-172] public void setPoolMin(String poolName, long newMin) { ArrowPoolHandle handle = pools.get(poolName); if (handle == null) { throw new IllegalStateException("Pool '" + poolName + "' does not exist"); } poolMins.put(poolName, newMin); long max = poolMaxes.getOrDefault(poolName, Long.MAX_VALUE); long current = handle.allocator.getLimit(); long target = Math.min(newMin, max); + if (newMin > max) { + logger.warn("Pool '{}' min ({}) exceeds max ({}); capping at max", poolName, newMin, max); + } if (target > current) { handle.allocator.setLimit(target); } } Suggestion importance[1-10]: 6 __ Why: Adding a warning when `newMin` exceeds `max` improves operator visibility into silent capping behavior. The suggestion is correct and helpful for troubleshooting, though the scenario is already caught by the grouped validator at cluster-settings update time, so the log would only fire if the validator is bypassed or misconfigured. Moderate impact.	Low
	Rate-limit spill-limit warning logs The method logs a warning every time a spill-limit update is attempted when the native library doesn't support dynamic resize. In a cluster with frequent settings updates, this can spam logs. Consider rate-limiting or logging only on the first occurrence. sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [420-444] +private volatile boolean spillLimitWarningLogged = false; + void updateSpillMemoryLimit(long newLimitBytes) { DataFusionService service = dataFusionService; if (service == null) { logger.debug("DataFusion service not yet initialized; ignoring spill limit update to {}B", newLimitBytes); return; } if (!service.isSpillLimitDynamic()) { - logger.warn( - "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not " - + "support runtime spill resize, so the new value will only take effect after a node restart", - newLimitBytes - ); + if (!spillLimitWarningLogged) { + logger.warn( + "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not " + + "support runtime spill resize, so the new value will only take effect after a node restart", + newLimitBytes + ); + spillLimitWarningLogged = true; + } return; } ... } Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies that repeated warnings can spam logs when the native library doesn't support dynamic spill resize. Rate-limiting via a boolean flag is a simple, effective mitigation. However, the scenario is expected to be rare (only during rolling upgrades or when an operator repeatedly updates the setting), so the impact is moderate.	Low
	Log when registry is null When `registry == null`, the method silently skips all entries without logging. This defensive path is rarely hit in production, but when it does occur, operators have no visibility into why plugin stats are missing. Add a debug log to aid troubleshooting. server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java [310-334] private static Map<String, PluginNodeStats> readPluginStats(StreamInput in) throws IOException { int size = in.readVInt(); if (size == 0) { return Collections.emptyMap(); } NamedWriteableRegistry registry = in.namedWriteableRegistry(); + if (registry == null) { + logger.debug("NamedWriteableRegistry not attached to stream; dropping {} plugin stats entries", size); + for (int i = 0; i < size; i++) { + in.readString(); + in.readBytesReference(); + } + return Collections.emptyMap(); + } Map<String, PluginNodeStats> result = new HashMap<>(size); for (int i = 0; i < size; i++) { String name = in.readString(); BytesReference payload = in.readBytesReference(); - if (registry == null) { - continue; - } try ( StreamInput rawIn = payload.streamInput(); NamedWriteableAwareStreamInput payloadIn = new NamedWriteableAwareStreamInput(rawIn, registry) ) { payloadIn.setVersion(in.getVersion()); PluginNodeStats stats = payloadIn.readNamedWriteable(PluginNodeStats.class); result.put(name, stats); } catch (IOException \| IllegalArgumentException e) { - // Receiver doesn't have the plugin's NamedWriteable registered (typical - // during rolling upgrades or in a non-uniform plugin install). Drop the - // entry; the rest of NodeStats remains decodable. + // Receiver doesn't have the plugin's NamedWriteable registered } } return result; } Suggestion importance[1-10]: 5 __ Why: Adding a debug log when `registry == null` improves troubleshooting visibility for a defensive path that should rarely be hit in production. The suggestion is correct and helpful, though the scenario is uncommon (production paths always attach a registry), so the impact is moderate. The improved code also correctly consumes the skipped entries to avoid stream corruption.	Low
	Avoid headroom truncation in rebalance The division `headroom / poolCount` can lose precision when distributing small headroom across multiple pools. Consider using a remainder-aware distribution to ensure all available headroom is allocated rather than truncated. plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [247] long bonusPerPool = poolCount > 0 ? headroom / poolCount : 0; +long remainder = poolCount > 0 ? headroom % poolCount : 0; +// Distribute remainder to first N pools to avoid losing bytes to truncation Suggestion importance[1-10]: 4 __ Why: The suggestion correctly identifies that integer division can lose precision when distributing headroom. However, the impact is minor—typically a few bytes lost to truncation—and the added complexity of remainder distribution may not be worth it for the rebalancer's coarse-grained allocation model. The suggestion is valid but offers marginal improvement.	Low
	Preserve plugin order when adding mockPlugins The `mockPlugins` collection is added to a `LinkedHashSet` using `addAll()`, which may not preserve insertion order if `mockPlugins` itself is unordered. This could reintroduce non-determinism that the `LinkedHashSet` was meant to prevent. Consider converting `mockPlugins` to a `LinkedHashSet` during initialization or ensuring it maintains order. test/framework/src/main/java/org/opensearch/test/InternalTestCluster.java [522-523] Set<Class<? extends Plugin>> plugins = new LinkedHashSet<>(nodeConfigurationSource.nodePlugins()); -plugins.addAll(mockPlugins); +// Preserve order when adding mockPlugins +mockPlugins.forEach(plugins::add); Suggestion importance[1-10]: 3 __ Why: The suggestion raises a valid concern about potential order preservation, but `addAll()` on a `LinkedHashSet` does preserve the iteration order of the source collection. The `improved_code` using `forEach` is functionally equivalent and doesn't provide meaningful improvement. The concern is somewhat theoretical unless `mockPlugins` is proven to be unordered.	Low

Suggestions up to commit 1bd0be2

Category	Suggestion	Impact
Possible issue	Validate sum of pool maxes Add validation to ensure the sum of pool maxes does not exceed the root limit. The current validator only checks that sum of mins ≤ root, but operators can configure pool maxes that collectively exceed the root cap, leading to runtime allocation failures when pools attempt to burst simultaneously. plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowBasePlugin.java [322-334] static void validateUpdate(Settings settings) { long rootLimit = ROOT_LIMIT_SETTING.get(settings); long flightMin = FLIGHT_MIN_SETTING.get(settings); long flightMax = FLIGHT_MAX_SETTING.get(settings); long ingestMin = INGEST_MIN_SETTING.get(settings); long ingestMax = INGEST_MAX_SETTING.get(settings); long queryMin = QUERY_MIN_SETTING.get(settings); long queryMax = QUERY_MAX_SETTING.get(settings); validateMinMax(NativeAllocatorPoolConfig.POOL_FLIGHT, flightMin, flightMax); validateMinMax(NativeAllocatorPoolConfig.POOL_INGEST, ingestMin, ingestMax); validateMinMax(NativeAllocatorPoolConfig.POOL_QUERY, queryMin, queryMax); validateMinSum(rootLimit, flightMin, ingestMin, queryMin); + validateMaxSum(rootLimit, flightMax, ingestMax, queryMax); } Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a gap in validation: the current code only checks that sum of mins ≤ root, but does not validate that sum of maxes ≤ root. This could lead to runtime allocation failures when multiple pools attempt to burst simultaneously. However, the PR's design intentionally allows pool maxes to sum above root (see lines 99-102 comments about "18% of native_memory.limit fits within root.limit (20%) by default, leaving 2 pp headroom"), so this is more of an enhancement than a critical bug.	Medium
General	Validate pool min against root Verify that `newMin` does not exceed the root limit before updating. A dynamic PUT that sets a pool min above the root cap will be accepted by the validator (which only checks sum of mins), but the live limit raise will fail silently or cause unexpected behavior when the allocator attempts to exceed the root. plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [160-172] public void setPoolMin(String poolName, long newMin) { ArrowPoolHandle handle = pools.get(poolName); if (handle == null) { throw new IllegalStateException("Pool '" + poolName + "' does not exist"); + } + if (newMin > root.getLimit()) { + throw new IllegalArgumentException("Pool min (" + newMin + ") exceeds root limit (" + root.getLimit() + ")"); } poolMins.put(poolName, newMin); long max = poolMaxes.getOrDefault(poolName, Long.MAX_VALUE); long current = handle.allocator.getLimit(); long target = Math.min(newMin, max); if (target > current) { handle.allocator.setLimit(target); } } Suggestion importance[1-10]: 6 __ Why: The suggestion adds a defensive check that `newMin` does not exceed the root limit. While the grouped validator in `validateUpdate` already checks sum of mins ≤ root, this per-pool check would catch the edge case where a single pool's min exceeds root. However, the existing validator should already prevent this scenario at the cluster-settings level, making this a defensive enhancement rather than a critical fix.	Low
	Log dropped plugin stats entries Log the dropped plugin-stats entries at debug level so operators can diagnose missing stats during rolling upgrades. The current silent drop makes it difficult to distinguish between "plugin not installed" and "deserialization failed for another reason." server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java [296-334] private static Map<String, PluginNodeStats> readPluginStats(StreamInput in) throws IOException { int size = in.readVInt(); if (size == 0) { return Collections.emptyMap(); } NamedWriteableRegistry registry = in.namedWriteableRegistry(); Map<String, PluginNodeStats> result = new HashMap<>(size); for (int i = 0; i < size; i++) { String name = in.readString(); BytesReference payload = in.readBytesReference(); if (registry == null) { continue; } try ( StreamInput rawIn = payload.streamInput(); NamedWriteableAwareStreamInput payloadIn = new NamedWriteableAwareStreamInput(rawIn, registry) ) { payloadIn.setVersion(in.getVersion()); PluginNodeStats stats = payloadIn.readNamedWriteable(PluginNodeStats.class); result.put(name, stats); } catch (IOException \| IllegalArgumentException e) { - // Receiver doesn't have the plugin's NamedWriteable registered (typical - // during rolling upgrades or in a non-uniform plugin install). Drop the - // entry; the rest of NodeStats remains decodable. + logger.debug("Dropped plugin stats entry '{}': {}", name, e.getMessage()); } } return result; } Suggestion importance[1-10]: 6 __ Why: The suggestion adds debug logging for dropped plugin-stats entries, which would help operators diagnose missing stats during rolling upgrades. This is a reasonable observability improvement, though the current silent drop is intentional (per the comment at line 329-331) to avoid log spam during normal rolling upgrades. The improvement is useful but not critical.	Low
	Validate spill limit is non-negative Validate that `newLimitBytes` is non-negative before attempting the native call. The setting parser rejects negative values at parse time, but defensive validation here prevents potential native-side crashes if the value is corrupted in transit or the parser is bypassed. sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [420-444] void updateSpillMemoryLimit(long newLimitBytes) { + if (newLimitBytes < 0) { + logger.warn("Ignoring invalid spill memory limit update: {}B (must be >= 0)", newLimitBytes); + return; + } DataFusionService service = dataFusionService; if (service == null) { logger.debug("DataFusion service not yet initialized; ignoring spill limit update to {}B", newLimitBytes); return; } - if (!service.isSpillLimitDynamic()) { - logger.warn( - "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not " - + "support runtime spill resize, so the new value will only take effect after a node restart", - newLimitBytes - ); - return; - } - try { - service.setSpillMemoryLimit(newLimitBytes); - logger.info("Updated DataFusion spill memory limit to {}B", newLimitBytes); - } catch (IllegalStateException e) { - logger.warn("Ignoring spill memory limit update to {}B; service is not running", newLimitBytes); - } catch (UnsupportedOperationException e) { - logger.warn("Ignoring spill memory limit update to {}B; native runtime does not support live updates", newLimitBytes); - } + ... } Suggestion importance[1-10]: 5 __ Why: The suggestion adds defensive validation that `newLimitBytes` is non-negative before the native call. The setting parser already rejects negative values (line 159), so this is redundant defensive code. It's a minor improvement for robustness but not addressing a real gap in the current implementation.	Low
	Preserve plugin order when adding mockPlugins The `mockPlugins` collection is added to a `LinkedHashSet` using `addAll()`, which may not preserve insertion order if `mockPlugins` itself is unordered. This could reintroduce non-determinism that the `LinkedHashSet` was meant to prevent. Consider converting `mockPlugins` to a `LinkedHashSet` during initialization or ensuring it maintains order. test/framework/src/main/java/org/opensearch/test/InternalTestCluster.java [522-523] Set<Class<? extends Plugin>> plugins = new LinkedHashSet<>(nodeConfigurationSource.nodePlugins()); -plugins.addAll(mockPlugins); +// Preserve order when adding mockPlugins +mockPlugins.forEach(plugins::add); Suggestion importance[1-10]: 3 __ Why: The suggestion raises a valid concern about potential order preservation, but the impact is limited. The `improved_code` using `forEach(plugins::add)` is functionally equivalent to `addAll()` for a `LinkedHashSet`. The real issue would only occur if `mockPlugins` contains duplicates or if its iteration order matters, which is not clearly demonstrated in the PR context.	Low

Suggestions up to commit b033e5c

Category	Suggestion	Impact
General	Log skipped plugin stats entries The catch block silently drops unknown plugin stats entries without logging. During a rolling upgrade or plugin version mismatch, operators have no visibility into which stats were skipped, making it hard to diagnose incomplete metrics. server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java [320-333] try ( StreamInput rawIn = payload.streamInput(); NamedWriteableAwareStreamInput payloadIn = new NamedWriteableAwareStreamInput(rawIn, registry) ) { payloadIn.setVersion(in.getVersion()); PluginNodeStats stats = payloadIn.readNamedWriteable(PluginNodeStats.class); result.put(name, stats); } catch (IOException \| IllegalArgumentException e) { // Receiver doesn't have the plugin's NamedWriteable registered (typical // during rolling upgrades or in a non-uniform plugin install). Drop the // entry; the rest of NodeStats remains decodable. + logger.debug("Skipping unknown plugin stats entry '{}' during deserialization: {}", name, e.getMessage()); } Suggestion importance[1-10]: 7 __ Why: Good observability improvement. During rolling upgrades or plugin mismatches, operators would benefit from knowing which stats entries were skipped. The debug-level logging is appropriate since this is expected behavior in mixed-version clusters, not an error condition.	Medium
	Enforce runtime spill limit capability check The method doesn't check `isSpillLimitDynamic()` before calling `setSpillLimit()`. According to the javadoc, this should throw `UnsupportedOperationException` when the feature is unavailable. Add a guard to enforce this contract. sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionService.java [157-159] public void setSpillMemoryLimit(long newLimitBytes) { + if (!isSpillLimitDynamic()) { + throw new UnsupportedOperationException("Spill limit cannot be changed at runtime"); + } NativeBridge.setSpillLimit(getNativeRuntime().get(), newLimitBytes); } Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies that `setSpillMemoryLimit()` should validate `isSpillLimitDynamic()` before proceeding, as documented in the javadoc. This prevents runtime errors and enforces the contract, making it a valuable improvement for robustness.	Medium
	Include pool context in overflow error The overflow check uses `Math.addExact` but the error message does not indicate which pool caused the overflow. When an operator hits this with multiple pools configured, they cannot identify the problematic configuration without inspecting all pool mins. plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowBasePlugin.java [377-381] try { sum = Math.addExact(sum, min); } catch (ArithmeticException overflow) { - throw new IllegalArgumentException("Sum of pool minimums overflows.", overflow); + throw new IllegalArgumentException( + "Sum of pool minimums overflows at pool index " + i + " (min=" + min + ", sum so far=" + sum + ")", + overflow + ); } Suggestion importance[1-10]: 6 __ Why: Valid improvement to error diagnostics. The current error message doesn't indicate which pool caused the overflow, making troubleshooting harder. However, the `mins` array doesn't have pool names at this point in the code (it's a varargs of longs), so the suggestion would need adjustment to reference pool names from the calling context.	Low
	Clarify spill-limit warning for operators The method returns early when `isSpillLimitDynamic()` is false, logging a warning but accepting the cluster-state update. If the native library never gains the symbol, operators will see repeated warnings on every cluster-settings PUT without actionable feedback that the setting is effectively node-scope-only until restart. sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [420-432] void updateSpillMemoryLimit(long newLimitBytes) { DataFusionService service = dataFusionService; if (service == null) { logger.debug("DataFusion service not yet initialized; ignoring spill limit update to {}B", newLimitBytes); return; } if (!service.isSpillLimitDynamic()) { logger.warn( "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not " - + "support runtime spill resize, so the new value will only take effect after a node restart", + + "support runtime spill resize (missing df_set_spill_limit symbol), so the new value will only take effect after a node restart. " + + "To avoid this warning, configure the setting in opensearch.yml instead of via _cluster/settings.", newLimitBytes ); return; } Suggestion importance[1-10]: 5 __ Why: The suggestion improves operator guidance by explicitly mentioning the missing symbol and recommending `opensearch.yml` configuration. This is helpful for reducing confusion, though the existing warning already conveys the core issue (runtime resize not supported, restart required).	Low
	Document headroom division truncation behavior The division `headroom / poolCount` can lose precision when headroom is not evenly divisible by poolCount. Consider using `Math.floorDiv` or documenting the intentional truncation behavior, as the remainder is effectively discarded rather than distributed. plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [246] +// Integer division intentionally truncates remainder; undistributed headroom +// stays in the root allocator's unallocated space. long bonusPerPool = poolCount > 0 ? headroom / poolCount : 0; Suggestion importance[1-10]: 4 __ Why: The suggestion correctly identifies that integer division truncates the remainder, but this is standard Java behavior and the code is already clear. Adding a comment would be helpful for maintainability, though the impact is minor since the behavior is mathematically correct and the remainder staying in the root is acceptable.	Low
	Preserve plugin order for mockPlugins The `mockPlugins` collection is added to `plugins` using `addAll()`, which may not preserve insertion order if `mockPlugins` is a `HashSet`. This could reintroduce non-deterministic plugin ordering. Ensure `mockPlugins` is also order-preserving or convert it before adding. test/framework/src/main/java/org/opensearch/test/InternalTestCluster.java [522-523] Set<Class<? extends Plugin>> plugins = new LinkedHashSet<>(nodeConfigurationSource.nodePlugins()); -plugins.addAll(mockPlugins); +plugins.addAll(new LinkedHashSet<>(mockPlugins)); Suggestion importance[1-10]: 4 __ Why: The suggestion correctly identifies that `mockPlugins` could be a `HashSet`, which might reintroduce non-deterministic ordering. However, the impact is limited since `mockPlugins` is typically controlled by the test framework and may already be order-preserving. The fix is defensive but not critical.	Low

Suggestions up to commit a9ff751

Category	Suggestion	Impact
General	Add diagnostic context to overflow error The overflow check uses `Math.addExact` which throws `ArithmeticException` on overflow, but the error message doesn't indicate which pool caused the overflow or what the current sum was. For large deployments with many pools, include diagnostic context (current sum, failing min value) to help operators identify the misconfiguration. plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowBasePlugin.java [377-381] try { sum = Math.addExact(sum, min); } catch (ArithmeticException overflow) { - throw new IllegalArgumentException("Sum of pool minimums overflows.", overflow); + throw new IllegalArgumentException( + "Sum of pool minimums overflows. Current sum: " + sum + ", attempted to add: " + min, + overflow + ); } Suggestion importance[1-10]: 6 __ Why: Adding the current sum and failing min value to the error message would significantly improve debuggability for operators trying to diagnose pool configuration issues. This is a practical enhancement that helps identify which pool configuration caused the overflow.	Low
	Clarify warning for missing dynamic spill support The method returns early after logging a warning when the native library doesn't support dynamic spill resize, but the cluster-state update has already been accepted. This could confuse operators who see HTTP 200 but no immediate effect. Consider documenting this behavior in the setting's Javadoc or emitting a more prominent warning at startup if the symbol is missing. sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [426-432] if (!service.isSpillLimitDynamic()) { logger.warn( "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not " - + "support runtime spill resize, so the new value will only take effect after a node restart", + + "support runtime spill resize (df_set_spill_limit symbol absent), so the new value will only take effect after a node restart. " + + "Upgrade to a newer native library for live updates.", newLimitBytes ); return; } Suggestion importance[1-10]: 5 __ Why: The suggestion improves the warning message by explicitly mentioning the missing `df_set_spill_limit` symbol and suggesting an upgrade path. This enhances operator experience by making the limitation clearer, though the existing warning already conveys the essential information.	Low
	Document integer division truncation behavior The division `headroom / poolCount` can lose precision when headroom is not evenly divisible by poolCount. Consider using a more precise distribution algorithm or documenting the intentional truncation behavior, as small pools may systematically receive less than their fair share over many rebalance cycles. plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [246] +// Integer division intentionally truncates remainder; undistributed bytes stay in root headroom long bonusPerPool = poolCount > 0 ? headroom / poolCount : 0; Suggestion importance[1-10]: 4 __ Why: The suggestion correctly identifies that integer division truncates the remainder, but the impact is minimal since the undistributed bytes remain in the root headroom and will be distributed in the next rebalance cycle. The comment would improve code clarity but doesn't address a functional issue.	Low
	Return immutable empty list The `nodeStats()` method returns a mutable list that could be modified by callers, potentially causing unexpected behavior. Return an immutable empty list using `List.of()` instead of `Collections.emptyList()` to prevent accidental modifications and align with modern Java best practices. server/src/main/java/org/opensearch/plugins/Plugin.java [266-268] public List<PluginNodeStats> nodeStats() { - return Collections.emptyList(); + return List.of(); } Suggestion importance[1-10]: 4 __ Why: While `List.of()` is more modern and returns an immutable list, `Collections.emptyList()` is also immutable and widely used. The suggestion is valid but offers minimal practical benefit since both return immutable empty lists. The impact is low as this is a default implementation that plugins can override.	Low
	Preserve deterministic plugin ordering The method modifies a `LinkedHashSet` after construction, which could lead to unexpected plugin ordering if `mockPlugins` contains duplicates or conflicts with `nodePlugins()`. Consider using `Stream.concat()` to preserve insertion order deterministically and avoid potential ordering issues. test/framework/src/main/java/org/opensearch/test/InternalTestCluster.java [522-524] -Set<Class<? extends Plugin>> plugins = new LinkedHashSet<>(nodeConfigurationSource.nodePlugins()); -plugins.addAll(mockPlugins); -return plugins; +return Stream.concat( + nodeConfigurationSource.nodePlugins().stream(), + mockPlugins.stream() +).distinct().collect(Collectors.toCollection(LinkedHashSet::new)); Suggestion importance[1-10]: 3 __ Why: The suggestion adds `.distinct()` which could alter behavior if duplicate plugins are intentionally allowed. The current code using `LinkedHashSet` already handles duplicates and preserves order. The `Stream.concat()` approach is more complex without clear benefits, and the concern about "unexpected plugin ordering" is already addressed by the PR's switch to `LinkedHashSet`.	Low

Suggestions up to commit a9ff751

Category	Suggestion	Impact
General	Validate individual pool mins early The validator checks `rootLimit == Long.MAX_VALUE` to skip validation when the root is unbounded. However, if any individual pool min is `Long.MAX_VALUE`, the `Math.addExact` call will throw `ArithmeticException` even when the root is bounded. Consider adding an early check for individual mins being `Long.MAX_VALUE` to provide a clearer error message before attempting the sum. plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowBasePlugin.java [374-384] private static void validateMinSum(long rootLimit, long... mins) { if (rootLimit == Long.MAX_VALUE) { return; + } + for (long min : mins) { + if (min == Long.MAX_VALUE) { + throw new IllegalArgumentException("Pool minimum cannot be Long.MAX_VALUE when root limit is bounded"); + } } long sum = 0; for (long min : mins) { try { sum = Math.addExact(sum, min); } catch (ArithmeticException overflow) { throw new IllegalArgumentException("Sum of pool minimums overflows.", overflow); } } if (sum > rootLimit) { throw new IllegalArgumentException( "Sum of pool minimums (" + sum + ") exceeds root limit (" + rootLimit + ")" ); } } Suggestion importance[1-10]: 6 __ Why: Adding an early check for individual pool mins being `Long.MAX_VALUE` provides a clearer error message before attempting the sum, improving error handling and user experience. This is a moderate improvement in code quality and maintainability.	Low
	Log when min update deferred The method updates `poolMins` unconditionally but only raises the live limit when `target > current`. If `newMin` is below the current limit, the recorded min diverges from the live limit until the next rebalance tick. Consider logging a debug message when the min is updated but the live limit is not raised, so operators can observe that the change is deferred to the rebalancer. plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [160-172] public void setPoolMin(String poolName, long newMin) { ArrowPoolHandle handle = pools.get(poolName); if (handle == null) { throw new IllegalStateException("Pool '" + poolName + "' does not exist"); } poolMins.put(poolName, newMin); long max = poolMaxes.getOrDefault(poolName, Long.MAX_VALUE); long current = handle.allocator.getLimit(); long target = Math.min(newMin, max); if (target > current) { handle.allocator.setLimit(target); + } else { + logger.debug("Pool '{}' min updated to {} but live limit {} not raised; rebalancer will apply on next tick", poolName, newMin, current); } } Suggestion importance[1-10]: 5 __ Why: Adding a debug log when the min is updated but the live limit is not raised improves observability for operators, helping them understand that the change is deferred to the rebalancer. This is a useful enhancement for debugging and monitoring, though not critical.	Low
	Optimize null registry path When `registry == null`, the method continues the loop but never populates `result`, so it will return an empty map after reading all entries. The loop should break early after detecting `registry == null` to avoid unnecessary reads of the remaining entries. Alternatively, log a debug message once when the registry is absent so operators can diagnose why plugin stats are missing. server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java [296-334] private static Map<String, PluginNodeStats> readPluginStats(StreamInput in) throws IOException { int size = in.readVInt(); if (size == 0) { return Collections.emptyMap(); } NamedWriteableRegistry registry = in.namedWriteableRegistry(); + if (registry == null) { + for (int i = 0; i < size; i++) { + in.readString(); + in.readBytesReference(); + } + return Collections.emptyMap(); + } Map<String, PluginNodeStats> result = new HashMap<>(size); for (int i = 0; i < size; i++) { String name = in.readString(); BytesReference payload = in.readBytesReference(); - if (registry == null) { - continue; - } try ( StreamInput rawIn = payload.streamInput(); NamedWriteableAwareStreamInput payloadIn = new NamedWriteableAwareStreamInput(rawIn, registry) ) { payloadIn.setVersion(in.getVersion()); PluginNodeStats stats = payloadIn.readNamedWriteable(PluginNodeStats.class); result.put(name, stats); } catch (IOException \| IllegalArgumentException e) { - // Receiver doesn't have the plugin's NamedWriteable registered (typical - // during rolling upgrades or in a non-uniform plugin install). Drop the - // entry; the rest of NodeStats remains decodable. + // Drop unknown entries } } return result; } Suggestion importance[1-10]: 5 __ Why: Breaking early when `registry == null` avoids unnecessary reads and improves efficiency. This is a minor optimization that enhances code performance and readability, though the impact is limited to edge cases.	Low
	Eliminate redundant exception catch The method catches `UnsupportedOperationException` after checking `isSpillLimitDynamic()`, which should make this branch unreachable. The comment acknowledges this is defensive against a race. However, if the race is truly possible, the `isSpillLimitDynamic()` check should be moved inside the try block or synchronized to avoid the time-of-check-time-of-use gap. Otherwise, remove the redundant catch block. sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [420-444] void updateSpillMemoryLimit(long newLimitBytes) { DataFusionService service = dataFusionService; if (service == null) { logger.debug("DataFusion service not yet initialized; ignoring spill limit update to {}B", newLimitBytes); return; } - if (!service.isSpillLimitDynamic()) { - logger.warn( - "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not " - + "support runtime spill resize, so the new value will only take effect after a node restart", - newLimitBytes - ); - return; - } try { + if (!service.isSpillLimitDynamic()) { + logger.warn( + "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not " + + "support runtime spill resize, so the new value will only take effect after a node restart", + newLimitBytes + ); + return; + } service.setSpillMemoryLimit(newLimitBytes); logger.info("Updated DataFusion spill memory limit to {}B", newLimitBytes); } catch (IllegalStateException e) { logger.warn("Ignoring spill memory limit update to {}B; service is not running", newLimitBytes); - } catch (UnsupportedOperationException e) { - logger.warn("Ignoring spill memory limit update to {}B; native runtime does not support live updates", newLimitBytes); } } Suggestion importance[1-10]: 4 __ Why: The suggestion to move the `isSpillLimitDynamic()` check inside the try block or remove the redundant catch block is valid for code clarity. However, the current defensive approach is acceptable, and the change is minor in impact.	Low
	Preserve plugin initialization order The `mockPlugins` collection is added using `addAll()` which may not preserve insertion order if `mockPlugins` is not order-preserving. Since plugin initialization order is now critical (as noted in the comment), ensure `mockPlugins` is also order-preserving or document the ordering contract. test/framework/src/main/java/org/opensearch/test/InternalTestCluster.java [522-523] Set<Class<? extends Plugin>> plugins = new LinkedHashSet<>(nodeConfigurationSource.nodePlugins()); -plugins.addAll(mockPlugins); +// Preserve order when adding mockPlugins to maintain deterministic plugin initialization +mockPlugins.forEach(plugins::add); Suggestion importance[1-10]: 3 __ Why: While the concern about order preservation is valid, `addAll()` on a `LinkedHashSet` already preserves insertion order from the source collection. The suggested change using `forEach` is functionally equivalent but less idiomatic. The suggestion addresses a non-issue since `addAll()` maintains order when the target is a `LinkedHashSet`.	Low

github-actions · 2026-05-19T08:52:58Z

❌ Gradle check result for 9188043: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-05-19T09:12:22Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit d646dd4.

Path	Line	Severity	Description
plugins/arrow-flight-rpc/build.gradle	25	high	New dependency added: `compileOnly project(':libs:opensearch-arrow-spi')`. Per mandatory flagging rule, all dependency additions must be flagged regardless of apparent legitimacy. This is an internal monorepo project reference rather than an external registry artifact.
sandbox/plugins/analytics-backend-datafusion/build.gradle	51	high	Multiple new dependencies added: `compileOnly project(':libs:opensearch-arrow-spi')`, `compileOnly project(':plugins:arrow-base')`, and `testImplementation project(':plugins:arrow-base')`. Per mandatory flagging rule, all dependency additions must be flagged. These are internal monorepo project references.
sandbox/plugins/analytics-engine/build.gradle	66	high	New dependency added (`compileOnly project(':libs:opensearch-arrow-spi')`) and dependency scope changed from `testRuntimeOnly` to `testImplementation` for `project(':plugins:arrow-base')`. Per mandatory flagging rule, all dependency changes must be flagged.
plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java	86	low	`ensureForTesting()` is a `public static` method in production code that forcibly closes any existing static `INSTANCE` singleton and replaces it unconditionally. Documented as test-only but accessible to any caller at runtime, including plugin code; misuse could disrupt the node-wide allocator lifecycle outside of tests.

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 3 | Medium: 0 | Low: 1

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-05-19T11:59:58Z

Persistent review updated to latest commit 498b502

github-actions · 2026-05-19T12:02:06Z

❌ Gradle check result for 498b502: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Bukhtawar

We might need to reject the request, throw OpenSearchRejectedException rather than oom-ing. Also lets try to see how we can wire up circuit-breaker for this. Maybe circuit-breaker stats is something we leverage for tracking memory used

bowenlan-amzn · 2026-05-19T16:56:35Z

This PR seems to be the good place to get some alignment on the memory assignment across different components in the system.
Let's take 64G RAM as example.

64 GB RAM
├── JVM Heap (16 GB / 25%)
│   ├── Indexing pressure, Writer buffer
│   ├── Transport Netty
│   └── etc. ...
|
└── Off-Heap (48 GB / 75%)
    │
    ├── node.native_memory.limit (40 G)
    │   │
    │   ├── Java Arrow Root (12G / 30%)
    │   │   ├── POOL_FLIGHT  (2G / 5%)
    │   │   │   └── flight/server, flight/client — batch ser/deser
    │   │   ├── POOL_INGEST  (4G / 10%)
    │   │   │   └── ArrowBufferPool children (per-VSR, pool_limit/divisor)
    │   │   └── POOL_QUERY   (2G / 5%)
    │   │       └── analytics-search-service → per-query children (256MB each)
    │   │
    │   └── Rust Memory Pool (28G / 70%)
    │       └── DataFusion Rust pool (datafusion.memory_pool_limit_bytes)
    │           ├── hash tables, sort buffers, aggs, intermediate batches
    │           └── spill staging (datafusion.spill_memory_limit_bytes)
    │
    └── Rest (no knobs, at least 8G)
        ├── Lucene mmap (~2 GB)
        └── Page Cache (whatever's left)

I think the Java Arrow memory are mostly used for intermediate data transfer, while Rust memory are used for query execution. That's one reason to provide more to Rust side.

@Bukhtawar @alchemist51 @mch2 @expani

github-actions · 2026-05-19T17:32:31Z

Persistent review updated to latest commit d646dd4

github-actions · 2026-05-19T17:41:19Z

❌ Gradle check result for d646dd4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-05-19T20:12:27Z

Persistent review updated to latest commit 53c0e36

github-actions · 2026-05-19T20:14:46Z

Persistent review updated to latest commit e1f3c5b

github-actions · 2026-05-19T20:23:17Z

❌ Gradle check result for e1f3c5b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-05-19T20:27:54Z

Persistent review updated to latest commit 041dc0e

github-actions · 2026-05-19T20:32:36Z

❌ Gradle check result for 041dc0e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

bowenlan-amzn

Overral looks good! Thanks for refactoring the allocator related changes in arrow-base.
Left some comments that should be easy to accommodate. Approving now.

github-actions · 2026-05-20T08:52:03Z

Persistent review updated to latest commit 40688ad

github-actions · 2026-05-20T09:02:08Z

❌ Gradle check result for 40688ad: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-05-21T19:36:57Z

Persistent review updated to latest commit a9ff751

gaurav-amz · 2026-05-21T20:00:36Z

Closing and reopening to trigger the build again.

github-actions · 2026-05-21T20:02:24Z

Persistent review updated to latest commit a9ff751

github-actions · 2026-05-21T20:33:57Z

Persistent review updated to latest commit a9ff751

github-actions · 2026-05-21T21:29:12Z

✅ Gradle check result for a9ff751: SUCCESS

VSRRotationBenchmark references ArrowNativeAllocator (the new unified allocator API), but the benchmarks subproject only declared a transitive api dep on parquet-data-format. parquet-data-format declares arrow-base as compileOnly (not exported), so benchmarks couldn't see it at compile. CI failure: VSRRotationBenchmark.java:82: error: package org.opensearch.arrow.allocator does not exist private org.opensearch.arrow.allocator.ArrowNativeAllocator nativeAllocator; Mirror parquet-data-format's pattern: declare arrow-base as compileOnly. Runtime continues to work via the plugin's classloader. Signed-off-by: Gaurav Singh <snghsvn@amazon.com>

Signed-off-by: Gaurav Singh <snghsvn@amazon.com>

github-actions · 2026-05-22T04:17:03Z

Persistent review updated to latest commit b033e5c

github-actions · 2026-05-22T04:19:09Z

Persistent review updated to latest commit 1bd0be2

github-actions · 2026-05-22T05:20:10Z

✅ Gradle check result for 1bd0be2: SUCCESS

github-actions · 2026-05-22T05:24:13Z

Persistent review updated to latest commit dcddc3c

github-actions · 2026-05-22T05:56:39Z

❌ Gradle check result for dcddc3c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-05-22T06:25:46Z

Persistent review updated to latest commit dcddc3c

github-actions · 2026-05-22T07:21:52Z

✅ Gradle check result for dcddc3c: SUCCESS

gaurav-amz · 2026-05-22T08:09:38Z

Follow up PR changes.

Native allocator stats under native memory.
Refactoring of native allocator stats.

…deStats (opensearch-project#21732) * Make pool min/max settings dynamic and grouped-validate cross-setting invariants on every cluster-state update (sum of pool mins <= root, per-pool min <= max). FLIGHT_MIN/INGEST_MIN defaults are 0L (the prior Long.MAX_VALUE defaults caused the new validator to reject any non-MAX root). setPoolMin now updates the live BufferAllocator.setLimit so the Dynamic property has observable effect even when the rebalancer is off. * Derive ROOT_LIMIT_SETTING default from the AC node-native-memory budget (limit minus buffer-percent) so the framework respects the same budget AC throttles on, with a Long.MAX_VALUE fallback when AC is unconfigured. * Add QUERY pool alongside FLIGHT and INGEST. Pools init at min when the rebalancer is enabled; otherwise at max so non-rebalanced nodes can still allocate. * Wire arrow-flight-rpc to the FLIGHT pool, parquet-data-format to the INGEST pool, and analytics-engine to the QUERY pool via the framework's allocator service. Hard-fail if the framework plugin is missing — silently skipping the wire-up is the silent-misconfiguration class of bug we want to prevent. * Cleanup ad-hoc allocator fallbacks in parquet-data-format / analytics-engine so all Arrow consumers go through the unified pool hierarchy. * Rebalancer now distributes headroom across all pools (not only those with current allocation > 0). Avoids the dead-pool corner case where a pool with min=0 starts at limit=0, can never make a first allocation, and never receives a bonus. Signed-off-by: Gaurav Singh <snghsvn@amazon.com>

gaurav-amz requested a review from a team as a code owner May 19, 2026 08:45

gaurav-amz force-pushed the unified-allocator-followup branch from 9188043 to d419b2a Compare May 19, 2026 09:10

gaurav-amz force-pushed the unified-allocator-followup branch from d419b2a to 10f3617 Compare May 19, 2026 10:00

Bukhtawar reviewed May 19, 2026

View reviewed changes

Comment thread libs/arrow-spi/src/main/java/org/opensearch/arrow/spi/NativeAllocatorListener.java Outdated

Bukhtawar reviewed May 19, 2026

View reviewed changes

bowenlan-amzn reviewed May 19, 2026

View reviewed changes

Comment thread ...ns/arrow-flight-rpc/src/main/java/org/opensearch/arrow/flight/transport/FlightTransport.java Outdated

alchemist51 added the skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. label May 19, 2026

alchemist51 closed this May 19, 2026

alchemist51 reopened this May 19, 2026

gaurav-amz force-pushed the unified-allocator-followup branch from d646dd4 to 53c0e36 Compare May 19, 2026 20:10

gaurav-amz force-pushed the unified-allocator-followup branch from e1f3c5b to 041dc0e Compare May 19, 2026 20:26

bowenlan-amzn approved these changes May 19, 2026

View reviewed changes

bharath-techie reviewed May 20, 2026

View reviewed changes

Comment thread ...nalytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java Outdated

bharath-techie reviewed May 20, 2026

View reviewed changes

Comment thread ...nalytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java Outdated

gaurav-amz closed this May 21, 2026

gaurav-amz reopened this May 21, 2026

gaurav-amz closed this May 21, 2026

gaurav-amz reopened this May 21, 2026

gaurav-amz added 2 commits May 22, 2026 09:45

Merge branch 'main' into unified-allocator-followup

1bd0be2

Signed-off-by: Gaurav Singh <snghsvn@amazon.com>

Merge branch 'main' into unified-allocator-followup

dcddc3c

gaurav-amz closed this May 22, 2026

gaurav-amz reopened this May 22, 2026

Bukhtawar merged commit 609ac5c into opensearch-project:main May 22, 2026
43 of 45 checks passed

Uh oh!

Conversation

gaurav-amz commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Architecture: three native-memory trackers, three knobs

Worked example: 64 GB / 16 GB-heap node

How a query flows through these layers

Operator surface

Behavior change to call out

Changes after initial review

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit dcddc3c)

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Analyzer ❗

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

Uh oh!

Bukhtawar left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bowenlan-amzn commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

bowenlan-amzn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

gaurav-amz commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 22, 2026

Uh oh!

github-actions Bot commented May 22, 2026

gaurav-amz commented May 19, 2026 •

edited

Loading

github-actions Bot commented May 19, 2026 •

edited

Loading

(Review updated until commit `dcddc3c`)

github-actions Bot commented May 19, 2026 •

edited

Loading

github-actions Bot commented May 19, 2026 •

edited

Loading