Skip to content

Replace O(n^2) array_merge accumulators with append#73

Open
dylancarruthers wants to merge 1 commit into
sol1:mainfrom
dylancarruthers:72-large-imports-silently-truncate
Open

Replace O(n^2) array_merge accumulators with append#73
dylancarruthers wants to merge 1 commit into
sol1:mainfrom
dylancarruthers:72-large-imports-silently-truncate

Conversation

@dylancarruthers

Copy link
Copy Markdown

Problem

get() and transform() in library/Netbox/Netbox.php accumulate results using array_merge($accum, [$item]) (or array_merge($accum, $page)) inside loops that run once per row. Because array_merge allocates a fresh array and copies the entire accumulator on every call, the total work is O(n²) in the number of imported rows.

For small imports this is invisible. For large ones it becomes catastrophic:

  • Site: ~26,000 devices behind a single import source (NetBox 3.7, ~27 pages of 1000).
  • Symptoms: web "Trigger this import" times out via php-fpm. CLI sync runs burn 30+ minutes of CPU and silently truncate the result, leaving Director with a partial rowset marked succeeded = y. Downstream sync rules then delete or under-modify hosts that were simply lost from the import.
  • Memory peak hits multiple GB before truncation.

Diagnosis

Direct curl pagination of the same NetBox query returns all 26,074 rows cleanly in ~30s, confirming the truncation is module-side, not NetBox-side. Profiling showed the time is dominated by array_merge copying the growing accumulator on every iteration.

There are three independent occurrences of the pattern, each in an outer per-row loop:

Location Pattern
get() $results = array_merge($results, $response->results);
transform() flatten loop $fnew = array_merge($fnew, [(object)$out]);
transform() munge loop $mnew = array_merge($mnew, [(object)$row]);
transform() tags loop $tnew = array_merge($tnew, [(object)$row]);

The tags loop runs unconditionally on every import, so the bug fires for any import source, not just ones using flatten/munge.

Fix

Replace the four outer accumulators with native append ($arr[] = $x) and the page-merge with array_push($accum, ...$page). Behavior is identical, the arrays are numerically indexed and array_merge did not provide any semantic guarantee these calls were relying on, but the work becomes O(n).

// before
$results = array_merge($results, $response->results);
// after
array_push($results, ...$response->results);

// before
$fnew = array_merge($fnew, [(object)$out]);
// after
$fnew[] = (object)$out;

(Same shape for $mnew and $tnew.)

The remaining array_merge calls (inside per-row inner loops over tags and munge keys) operate on small lists and are left alone.

Measured impact

Same import source, same NetBox, same filter (~26,074 matching devices), unchanged module config:

Before After
Rows returned 24,850 (truncated, silently) 26,074 (complete)
Wall time 30+ min CPU, then killed ~130s
Peak memory trended past 4 GB ~2.7 GB
succeeded flag y (misleading) y (accurate)

Risk

array_push($a, ...$b) requires PHP 5.6+ (variadic spread). The module already uses type hints and class properties that require newer PHP, so this is well within the supported range.

No semantic change: the order, types, and indices of the resulting array are identical to the previous code.

array_merge reallocates the accumulator on every call. Replacing the
in-loop accumulator pattern with native append makes pagination and
transform linear in the row count.

- get():        array_merge($results, $page) -> array_push(..., ...$page)
- splitRows():  array_merge($output, [$item]) -> $output[] = $item  (x2)
- transform():  same shape for $fnew, $mnew, and one helper output

Inner per-row array_merge calls on small lists (tags, munge keys) are
unchanged.
@sol1-matt

Copy link
Copy Markdown
Member

By my count, the current main branch has 26 usages of array_merge which doesn't match up with this PR. I'll need to look into this.

@sol1-matt

Copy link
Copy Markdown
Member

added new branch on my side from the current main, 72-large-imports-silently-truncate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants