Skip to content

Add ignore param to createDocuments for silent duplicate handling#850

Open
premtsd-code wants to merge 5 commits intomainfrom
csv-import-upsert
Open

Add ignore param to createDocuments for silent duplicate handling#850
premtsd-code wants to merge 5 commits intomainfrom
csv-import-upsert

Conversation

@premtsd-code
Copy link
Copy Markdown
Contributor

@premtsd-code premtsd-code commented Apr 7, 2026

Summary

  • Adds bool $ignore = false parameter to createDocuments across all adapters
  • When ignore is true, duplicate documents are silently skipped instead of throwing
  • MariaDB/MySQL: INSERT IGNORE INTO
  • SQLite: INSERT OR IGNORE INTO
  • Postgres: ON CONFLICT DO NOTHING
  • MongoDB: ordered: false with duplicate error suppression
  • Also applies ignore to the permissions INSERT statement
  • Threads ignore param from Database.phpAdapter → SQL/Mongo implementations

Test plan

  • Run existing createDocuments tests to verify no regression
  • Test createDocuments with ignore: true inserting documents with duplicate IDs — should not throw
  • Verify on MariaDB, Postgres, SQLite adapters
  • Verify permissions table also uses ignore when flag is set

Summary by CodeRabbit

  • New Features

    • Bulk document creation gains an optional "ignore" mode that skips duplicates instead of failing, available across adapters and mirroring.
  • Performance

    • Upsert operations now pre-batch existence checks, reducing per-item lookups and improving large-batch throughput.
  • Behavior

    • Create/insert flows and mirror operations propagate the new ignore flag so duplicate-insert attempts are handled gracefully.

When ignore is true, duplicate documents are silently skipped instead
of throwing. Uses INSERT IGNORE INTO for MariaDB/MySQL, INSERT OR
IGNORE INTO for SQLite, ON CONFLICT DO NOTHING for Postgres, and
ordered:false for MongoDB.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 7, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f9251c10-f1d4-491a-bef3-6a253f2650c7

📥 Commits

Reviewing files that changed from the base of the PR and between eb564c3 and 59aab83.

📒 Files selected for processing (1)
  • src/Database/Database.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Database/Database.php

📝 Walkthrough

Walkthrough

Adds an optional bool $ignore = false parameter across document-creation flows (Adapter interface, Pool, concrete adapters, Mirror, and Database). When ignore is true, adapters alter insert semantics to suppress duplicate-document errors and Database prefetches existing IDs to skip relationship creation for already-existing documents.

Changes

Cohort / File(s) Summary
Abstract Interface
src/Database/Adapter.php
Abstract createDocuments signature updated to (..., bool $ignore = false) and docblock updated to document ignore behavior.
Delegation / Pool
src/Database/Adapter/Pool.php
createDocuments signature extended to accept $ignore and forwards all args via delegate(__FUNCTION__, \func_get_args()).
Core Database
src/Database/Database.php
createDocuments signature extended with $ignore; when $ignore is true, prefetches existing IDs to skip relationship creation for pre-existing documents and forwards $ignore to adapter calls. upsertDocumentsWithIncrease now batched-prefetches existing docs into an existingDocs lookup instead of per-item getDocument() calls.
Mirror
src/Database/Mirror.php
createDocuments signature extended to accept $ignore and forwards it to source and destination create calls.
Mongo Adapter
src/Database/Adapter/Mongo.php
createDocuments accepts $ignore; sets ordered: false for insertMany when ignoring; processes exceptions and returns inserted documents instead of throwing when processed exception is DuplicateException and $ignore is true.
SQL Adapter
src/Database/Adapter/SQL.php
createDocuments accepts $ignore; inserts and permissions INSERTs composed via new helper extension points (getInsertKeyword, getInsertSuffix, getInsertPermissionsSuffix) to support ignore semantics (e.g., INSERT IGNORE).
Postgres Adapter
src/Database/Adapter/Postgres.php
Added helpers getInsertKeyword, getInsertSuffix, getInsertPermissionsSuffix to centralize ON CONFLICT / insert composition for ignore behavior.
SQLite Adapter
src/Database/Adapter/SQLite.php
Added getInsertKeyword(bool $ignore) returning SQLite-specific insert keyword (INSERT OR IGNORE INTO when $ignore is true).

Sequence Diagram(s)

sequenceDiagram
  rect rgba(200,200,255,0.5)
  participant Client
  end
  rect rgba(200,255,200,0.5)
  participant Database
  end
  rect rgba(255,230,200,0.5)
  participant Pool as AdapterPool
  end
  rect rgba(255,200,200,0.5)
  participant Adapter
  end
  rect rgba(240,240,240,0.5)
  participant DB
  end

  Client->>Database: createDocuments(collection, docs, ignore)
  Database->>Database: prefetch existing IDs (if ignore)
  Database->>AdapterPool: createDocuments(collection, chunk, ignore)
  AdapterPool->>Adapter: delegate('createDocuments', args...)
  Adapter->>DB: INSERT (options/suffix based on ignore)
  alt ignore = false
    DB-->>Adapter: Duplicate key error
    Adapter-->>AdapterPool: throws DuplicateException
    AdapterPool-->>Database: exception
    Database-->>Client: exception
  else ignore = true
    DB-->>Adapter: Partial success / duplicate errors suppressed
    Adapter-->>AdapterPool: returns inserted docs
    AdapterPool-->>Database: returns inserted docs
    Database-->>Client: returns inserted count
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Allow modifying createdAt and updatedAt #642 — Modifies Database::createDocuments; overlaps on create flow changes and prefetch/edge-case handling.
  • Feat mongodb #721 — Related adapter-level createDocuments logic for Mongo; likely intersects on signature and Mongo behavior.
  • Fix ghost perms #813 — Overlapping changes to adapter duplicate-key handling and Pool delegation; directly related to ignore semantics.

Suggested reviewers

  • abnegate
  • fogelito

Poem

🐇 I hopped through rows and queries, whiskers all a-twitch,
I found a small "ignore" and gave duplicates a ditch.
Inserts tiptoe softly where twins once caused a fight,
Relationships skip the old, letting new ones spring to light.
Thump-thump — I celebrate, then vanish into night.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding an ignore parameter to createDocuments for handling duplicates silently instead of throwing errors.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch csv-import-upsert

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Apr 7, 2026

Greptile Summary

This PR adds an optional $ignore parameter to createDocuments across all adapters so that duplicate documents are silently skipped rather than throwing. It also introduces a batch-prefetch optimization in the upsert path to reduce per-document lookups.

  • SQL (MariaDB/MySQL): uses INSERT IGNORE INTO, which suppresses all constraint violations — not only duplicate-key errors (previously flagged)
  • SQLite: uses INSERT OR IGNORE INTO, correctly scoped to this intent
  • Postgres: adds ON CONFLICT DO NOTHING via override methods; _tenant is unquoted in the conflict target while "_uid" is quoted (previously flagged)
  • Mongo: passes ordered: false and catches DuplicateException; the catch block returns the pre-insert input array instead of actual insert results, leading to wrong return data and inflated $modified counts (previously flagged)
  • Database.php: threads $ignore through the call stack and adds a $preExistingIds prefetch to skip relationship creation for duplicates; the prefetch does not scope by tenant in tenantPerDocument mode, which can incorrectly suppress relationship creation for brand-new documents whose IDs happen to match a different tenant's existing records
  • Mirror.php: correctly propagates $ignore to both source and destination

Confidence Score: 3/5

Not safe to merge yet — multiple P1 issues across adapters and Database.php remain unresolved from previous review threads, plus a new tenant-isolation bug.

Several P1 defects remain open: Mongo returns wrong data on duplicate-ignore, MariaDB/MySQL INSERT IGNORE is too broad, Postgres conflict target has inconsistent quoting, and the newly introduced $preExistingIds fetch can produce false positives in tenantPerDocument mode — silently dropping relationship creation for new documents. The upsert batch-fetch regression for large ID sets (> maxQueryValues) also remains. Score reflects that the feature's core intent is sound but the implementation still has data-correctness gaps.

src/Database/Adapter/Mongo.php, src/Database/Adapter/SQL.php, src/Database/Adapter/Postgres.php, src/Database/Database.php

Important Files Changed

Filename Overview
src/Database/Adapter.php Adds bool $ignore = false to the abstract createDocuments signature with a PHPDoc note; a clean, backward-compatible interface change.
src/Database/Adapter/SQL.php Adds getInsertKeyword/getInsertSuffix/getInsertPermissionsSuffix hooks; MariaDB/MySQL defaults to INSERT IGNORE INTO, which silences all errors (truncation, FK violations, etc.) — not just duplicates.
src/Database/Adapter/Mongo.php Sets ordered: false for ignore mode; the exception-catch path returns the original pre-insert $documents array (un-transformed) instead of actual insert results, producing structurally different output on the error path.
src/Database/Adapter/Postgres.php Overrides insert hooks with ON CONFLICT DO NOTHING; _tenant is unquoted in the conflict target while "_uid" is double-quoted, risking a syntax error on case-sensitive identifiers.
src/Database/Adapter/SQLite.php Cleanly overrides getInsertKeyword to emit INSERT OR IGNORE INTO; correct and isolated.
src/Database/Adapter/Pool.php Signature updated to include bool $ignore = false; delegates via func_get_args() — correct.
src/Database/Database.php Threads $ignore through the stack and adds a $preExistingIds prefetch; the prefetch does not scope by tenant in tenantPerDocument mode, which can suppress relationship creation for legitimately new documents.
src/Database/Mirror.php Adds $ignore parameter and propagates it correctly to both source and destination createDocuments calls.

Reviews (4): Last reviewed commit: "Skip relationship writes for duplicates ..." | Re-trigger Greptile

Comment on lines +2617 to +2619
protected function getInsertKeyword(bool $ignore): string
{
return $ignore ? 'INSERT IGNORE INTO' : 'INSERT INTO';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 INSERT IGNORE INTO silences more than just duplicates

MySQL/MariaDB's INSERT IGNORE INTO suppresses all errors — not only duplicate-key violations. This includes data-truncation warnings (values silently truncated to fit the column), invalid foreign-key references being coerced to NULL, and other constraint violations. The PR intent is "silent duplicate handling", but INSERT IGNORE means a corrupt or structurally invalid row could be silently inserted with wrong data instead of raising an error.

For strict duplicate-only suppression on MariaDB/MySQL, consider catching the specific PDO error code 1062 in the exception handler (similar to how the Mongo adapter does it), or using INSERT INTO … ON DUPLICATE KEY UPDATE <col> = <col> as a no-op.

Comment on lines 1497 to +1503
} catch (MongoException $e) {
throw $this->processException($e);
$processed = $this->processException($e);

if ($ignore && $processed instanceof DuplicateException) {
return $documents;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Exception catch returns the original input array, not the insert results

With ordered: false, MongoDB attempts all inserts and raises a BulkWriteException after the entire batch — some documents may have been successfully inserted and others skipped as duplicates. When the exception is caught here, $documents still holds the pre-try cloned input array because the assignment $documents = $this->client->insertMany(...) never completed. This means:

  1. All original documents are returned — including the genuine duplicates that were never written to the database.
  2. The normal success path transforms documents via replaceChars + client->toArray() before returning them; the exception path skips this transformation, so the two paths return structurally different data.

This also compounds the inflated $modified count issue in Database.php.

return '';
}

$conflictTarget = $this->sharedTables ? '("_uid", _tenant)' : '("_uid")';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inconsistent column quoting in conflict target

_uid is double-quoted but _tenant is not:

$conflictTarget = $this->sharedTables ? '("_uid", _tenant)' : '("_uid")';

Both columns should be quoted consistently for safety:

Suggested change
$conflictTarget = $this->sharedTables ? '("_uid", _tenant)' : '("_uid")';
$conflictTarget = $this->sharedTables ? '("_uid", "_tenant")' : '("_uid")';

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Database/Database.php (1)

5697-5707: ⚠️ Potential issue | 🟠 Major

ignore=true still performs relationship writes for duplicates that are later skipped.

At Line 5697, createDocumentRelationships() runs before insertion outcome is known. With Line 5706 using ignore mode, duplicate docs can be skipped by adapter, but related writes may already be committed. That breaks “silently skipped” semantics and can mutate unrelated data.

⚠️ Suggested fix (pre-filter duplicates before relationship mutation)
+        $preExistingIds = [];
+        $seenInputIds = [];
+        if ($ignore && $this->resolveRelationships) {
+            $inputIds = \array_values(\array_unique(\array_filter(
+                \array_map(fn (Document $doc) => $doc->getId(), $documents)
+            )));
+
+            foreach (\array_chunk($inputIds, $this->maxQueryValues) as $idChunk) {
+                $existing = $this->authorization->skip(fn () => $this->silent(fn () => $this->find(
+                    $collection->getId(),
+                    [
+                        Query::equal('$id', $idChunk),
+                        Query::select(['$id']),
+                        Query::limit(PHP_INT_MAX),
+                    ]
+                )));
+                foreach ($existing as $doc) {
+                    $preExistingIds[$doc->getId()] = true;
+                }
+            }
+        }
+
         foreach ($documents as $document) {
             $createdAt = $document->getCreatedAt();
             $updatedAt = $document->getUpdatedAt();
@@
+            $docId = $document->getId();
+            $isDuplicateInIgnoreMode = $ignore
+                && ($docId !== '')
+                && (isset($preExistingIds[$docId]) || isset($seenInputIds[$docId]));
+            if ($docId !== '') {
+                $seenInputIds[$docId] = true;
+            }
+
-            if ($this->resolveRelationships) {
+            if ($this->resolveRelationships && !$isDuplicateInIgnoreMode) {
                 $document = $this->silent(fn () => $this->createDocumentRelationships($collection, $document));
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5697 - 5707, The code calls
createDocumentRelationships($collection, $document) (inside the
resolveRelationships branch) before the adapter->createDocuments($collection,
$chunk, $ignore) call, so when $ignore is true duplicate documents that the
adapter later skips can still have relationships created; to fix, defer or
pre-filter relationship writes: determine which documents will actually be
inserted (either by pre-checking duplicates via the adapter or by running
adapter->createDocuments inside the transaction first and then, for the returned
list of successfully created documents, call createDocumentRelationships for
those documents), ensuring relationship mutations occur only for documents
confirmed inserted (update the logic around resolveRelationships,
withTransaction, and adapter->createDocuments to call
createDocumentRelationships after insertion or filter the $chunk before
relationship creation).
🧹 Nitpick comments (1)
src/Database/Adapter/Mongo.php (1)

1495-1512: Verify that returning original documents (not inserted documents) is acceptable for callers.

When a DuplicateException occurs and $ignore is true, the returned $documents is the original input array (from line 1475), not the MongoDB response. This means:

  1. Returned documents won't have MongoDB-generated _id values populated
  2. With ordered: false, MongoDB may have successfully inserted some documents, but the caller cannot determine which ones succeeded vs. were skipped as duplicates

If callers need to know which documents were actually inserted or need the generated _id values, this silent handling won't provide that information.

Consider documenting this behavior in the method's docblock to set caller expectations:

     /**
      * Create Documents in batches
      *
      * `@param` Document $collection
      * `@param` array<Document> $documents
      * `@param` bool $ignore If true, silently ignore duplicate documents instead of throwing
+     *                     Note: When ignore is true and duplicates are encountered, returns
+     *                     the original input documents, not the actual inserted documents.
+     *                     MongoDB-generated _id values will not be populated in returned documents.
      *
      * `@return` array<Document>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/Mongo.php` around lines 1495 - 1512, The current catch
in insertMany (use of insertMany, processException, DuplicateException, $ignore,
and $documents) returns the original input array when a DuplicateException is
caught, so callers will not get Mongo-generated _id values or know which records
succeeded with ordered:false; either change the behavior to return actual
inserted documents by extracting inserted IDs/results from the exception/bulk
write result (or by re-querying inserted documents) before returning, or
explicitly document this behavior in the method docblock: state that when
$ignore is true and a DuplicateException occurs the method returns the original
input records (no _id population and no success/failed distinction), and
recommend callers use a different flow if they require inserted IDs or
per-record success info.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 2592-2595: The permissions INSERT for the _perms table is being
executed for all input docs even when $ignore is true, which can add permissions
for documents that were skipped as duplicates; modify the logic around
getInsertKeyword($ignore)/getSQLTable(....'_perms') and the generation of
$permissions so that when $ignore===true you only generate/insert permission
tuples for docs that were actually inserted (or at minimum filter out _uid
values known to pre-exist). Concretely: detect pre-existing _uid entries before
building $permissions (or after a RETURNING of inserted ids), filter the input
set used to build $permissions to exclude those pre-existing IDs when $ignore is
true, and then only run the INSERT into the _perms table (using
getInsertPermissionsSuffix($ignore) as before) with that filtered set to prevent
ACL drift.

---

Outside diff comments:
In `@src/Database/Database.php`:
- Around line 5697-5707: The code calls createDocumentRelationships($collection,
$document) (inside the resolveRelationships branch) before the
adapter->createDocuments($collection, $chunk, $ignore) call, so when $ignore is
true duplicate documents that the adapter later skips can still have
relationships created; to fix, defer or pre-filter relationship writes:
determine which documents will actually be inserted (either by pre-checking
duplicates via the adapter or by running adapter->createDocuments inside the
transaction first and then, for the returned list of successfully created
documents, call createDocumentRelationships for those documents), ensuring
relationship mutations occur only for documents confirmed inserted (update the
logic around resolveRelationships, withTransaction, and adapter->createDocuments
to call createDocumentRelationships after insertion or filter the $chunk before
relationship creation).

---

Nitpick comments:
In `@src/Database/Adapter/Mongo.php`:
- Around line 1495-1512: The current catch in insertMany (use of insertMany,
processException, DuplicateException, $ignore, and $documents) returns the
original input array when a DuplicateException is caught, so callers will not
get Mongo-generated _id values or know which records succeeded with
ordered:false; either change the behavior to return actual inserted documents by
extracting inserted IDs/results from the exception/bulk write result (or by
re-querying inserted documents) before returning, or explicitly document this
behavior in the method docblock: state that when $ignore is true and a
DuplicateException occurs the method returns the original input records (no _id
population and no success/failed distinction), and recommend callers use a
different flow if they require inserted IDs or per-record success info.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ff5730b9-5638-4b21-96b2-8f0420d026c5

📥 Commits

Reviewing files that changed from the base of the PR and between 72ee161 and b62e93e.

📒 Files selected for processing (7)
  • src/Database/Adapter.php
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/Pool.php
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
  • src/Database/Adapter/SQLite.php
  • src/Database/Database.php

Comment on lines +2592 to 2595
{$this->getInsertKeyword($ignore)} {$this->getSQLTable($name . '_perms')} (_type, _permission, _document {$tenantColumn})
VALUES {$permissions}
{$this->getInsertPermissionsSuffix($ignore)}
";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Ignoring duplicate documents currently risks mutating permissions on existing documents

When $ignore is true, Line 2592 applies ignore semantics to _perms inserts, but permission tuples are still generated for all input docs. If a doc insert is skipped as duplicate, this can still add new permission rows to the already-existing document (ACL drift/escalation).

Please gate permissions generation/insertion to documents that were actually inserted in this call (or at minimum skip known pre-existing _uids when ignore=true).

Suggested fix direction (skip permissions for pre-existing IDs when ignore=true)
@@
-            $permissions = [];
+            $permissions = [];
+            $existingUids = [];
+
+            if ($ignore) {
+                $uidPlaceholders = [];
+                $uidBinds = [];
+                foreach ($documents as $i => $document) {
+                    $ph = ":existing_uid_{$i}";
+                    $uidPlaceholders[] = $ph;
+                    $uidBinds[$ph] = $document->getId();
+                }
+
+                if (!empty($uidPlaceholders)) {
+                    $sqlExisting = "
+                        SELECT _uid
+                        FROM {$this->getSQLTable($name)}
+                        WHERE _uid IN (" . \implode(', ', $uidPlaceholders) . ")
+                        {$this->getTenantQuery($collection)}
+                    ";
+                    $stmtExisting = $this->getPDO()->prepare($sqlExisting);
+                    foreach ($uidBinds as $k => $v) {
+                        $stmtExisting->bindValue($k, $v, $this->getPDOType($v));
+                    }
+                    if ($this->sharedTables) {
+                        $stmtExisting->bindValue(':_tenant', $this->tenant);
+                    }
+                    $stmtExisting->execute();
+                    $existingUids = \array_flip(\array_column($stmtExisting->fetchAll(), '_uid'));
+                    $stmtExisting->closeCursor();
+                }
+            }
@@
-                foreach (Database::PERMISSIONS as $type) {
+                if ($ignore && isset($existingUids[$document->getId()])) {
+                    continue;
+                }
+                foreach (Database::PERMISSIONS as $type) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQL.php` around lines 2592 - 2595, The permissions
INSERT for the _perms table is being executed for all input docs even when
$ignore is true, which can add permissions for documents that were skipped as
duplicates; modify the logic around
getInsertKeyword($ignore)/getSQLTable(....'_perms') and the generation of
$permissions so that when $ignore===true you only generate/insert permission
tuples for docs that were actually inserted (or at minimum filter out _uid
values known to pre-exist). Concretely: detect pre-existing _uid entries before
building $permissions (or after a RETURNING of inserted ids), filter the input
set used to build $permissions to exclude those pre-existing IDs when $ignore is
true, and then only run the INSERT into the _perms table (using
getInsertPermissionsSuffix($ignore) as before) with that filtered set to prevent
ACL drift.

… of per-row getDocument

Replaces N individual getDocument() SELECT queries with a single
find(Query::equal('$id', $ids)) call. For tenant-per-document mode,
groups by tenant and fetches each group separately.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Database/Database.php (1)

5697-5727: ⚠️ Potential issue | 🔴 Critical

ignore=true still writes and reports skipped duplicates as created.

This path resolves relationships before the insert, so a duplicate that gets ignored can already have created/updated related rows. Then the post-insert loop still increments modified and fires $onNext for every returned document, but the SQL and Mongo adapters currently return the original input array when duplicates are ignored. Please pre-filter duplicates before createDocumentRelationships() and only count/callback the actually inserted subset.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5697 - 5727, The code resolves
relationships for every input document before insert and then uses the original
input array when $ignore=true, which causes skipped duplicates to still have
relationships created and still be counted/callbacked; change the flow so that
when $ignore is true you first determine/filter out duplicates (e.g., via a new
or existing adapter method like exists/findExisting/batchExists) and only call
createDocumentRelationships() for the non-duplicate subset, then call
createDocuments() and when iterating the returned $batch use the adapter’s
returned created documents (or compare returned IDs to inputs) to drive the
onNext callbacks and increment $modified so only actually inserted documents are
counted; update any places referencing createDocumentRelationships,
populateDocumentsRelationships, createDocuments, $ignore,
$inBatchRelationshipPopulation, $resolveRelationships, $onNext and $modified
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Database.php`:
- Around line 7122-7159: The unconditional pre-read now populates $old and
breaks the auth-disabled path; revert to honoring the original skipPreRead logic
used in Database::upsertDocumentsWithIncrease(): guard the batch-fetch of
existing documents (the block that builds $ids, $existingDocs and populates
$old) with the same condition used previously (i.e. skip the pre-read when
$skipPreRead is true or when !$this->authorization->getStatus() && $attribute
!== ''), so that when authorization is disabled incoming explicit $permissions
are ignored and existing permissions are preserved (creates default to []).
- Around line 7122-7150: The tenant-per-document branch currently calls find()
with Query::equal('$id', $tenantIds) for the whole tenant group which can exceed
validation limits; change the loop that builds $idsByTenant inside the Database
class so that before calling $this->authorization->skip(...
$this->withTenant(... $this->silent(... $this->find(... Query::equal('$id',
...)))) you chunk $tenantIds into batches of at most $this->maxQueryValues (or
the existing maxQueryValues property/method used elsewhere), and invoke the
find() call once per chunk, merging results into $existingDocs keyed by getId();
keep the same call chain (authorization->skip, withTenant, silent, find) and the
Query::limit(count(chunk)) for each chunk.

---

Outside diff comments:
In `@src/Database/Database.php`:
- Around line 5697-5727: The code resolves relationships for every input
document before insert and then uses the original input array when $ignore=true,
which causes skipped duplicates to still have relationships created and still be
counted/callbacked; change the flow so that when $ignore is true you first
determine/filter out duplicates (e.g., via a new or existing adapter method like
exists/findExisting/batchExists) and only call createDocumentRelationships() for
the non-duplicate subset, then call createDocuments() and when iterating the
returned $batch use the adapter’s returned created documents (or compare
returned IDs to inputs) to drive the onNext callbacks and increment $modified so
only actually inserted documents are counted; update any places referencing
createDocumentRelationships, populateDocumentsRelationships, createDocuments,
$ignore, $inBatchRelationshipPopulation, $resolveRelationships, $onNext and
$modified accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4a7ebb0a-06d4-4f9d-aa6e-bcb8ce39a615

📥 Commits

Reviewing files that changed from the base of the PR and between b62e93e and 36b048f.

📒 Files selected for processing (1)
  • src/Database/Database.php

Comment on lines +7122 to 7159
// Batch-fetch existing documents in one query instead of N individual getDocument() calls
$ids = \array_filter(\array_map(fn ($doc) => $doc->getId(), $documents));
$existingDocs = [];

if (!empty($ids)) {
$uniqueIds = \array_values(\array_unique($ids));

if ($this->getSharedTables() && $this->getTenantPerDocument()) {
$old = $this->authorization->skip(fn () => $this->withTenant($document->getTenant(), fn () => $this->silent(fn () => $this->getDocument(
$collection->getId(),
$document->getId(),
))));
// Group IDs by tenant and fetch each group separately
$idsByTenant = [];
foreach ($documents as $doc) {
$tenant = $doc->getTenant();
$idsByTenant[$tenant][] = $doc->getId();
}
foreach ($idsByTenant as $tenant => $tenantIds) {
$tenantIds = \array_values(\array_unique($tenantIds));
$fetched = $this->authorization->skip(fn () => $this->withTenant($tenant, fn () => $this->silent(fn () => $this->find(
$collection->getId(),
[Query::equal('$id', $tenantIds), Query::limit(\count($tenantIds))],
))));
foreach ($fetched as $doc) {
$existingDocs[$doc->getId()] = $doc;
}
}
} else {
$old = $this->authorization->skip(fn () => $this->silent(fn () => $this->getDocument(
$fetched = $this->authorization->skip(fn () => $this->silent(fn () => $this->find(
$collection->getId(),
$document->getId(),
[Query::equal('$id', $uniqueIds), Query::limit(\count($uniqueIds))],
)));
foreach ($fetched as $doc) {
$existingDocs[$doc->getId()] = $doc;
}
}
}

foreach ($documents as $key => $document) {
$old = $existingDocs[$document->getId()] ?? new Document();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep the auth-disabled skipPreRead behavior here.

This unconditional pre-read changes the !$this->authorization->getStatus() && $attribute !== '' path. Once $old is populated again, the permission diff logic below can start honoring incoming $permissions instead of preserving existing permissions / defaulting creates to []. Please keep the old skipPreRead guard around this lookup, or explicitly preserve the old permission normalization before skipPermissionsUpdate is computed. Based on learnings In src/Database/Database.php, Database::upsertDocumentsWithIncrease() sets $skipPreRead = !empty($attribute) && !$this->authorization->getStatus(), so the pre-read-skip path only runs when authorization is disabled. In this path, explicit $permissions on incoming documents are intentionally ignored; updates preserve existing permissions, and creates default to empty permissions ([]), since no auth layer enforces them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 7122 - 7159, The unconditional
pre-read now populates $old and breaks the auth-disabled path; revert to
honoring the original skipPreRead logic used in
Database::upsertDocumentsWithIncrease(): guard the batch-fetch of existing
documents (the block that builds $ids, $existingDocs and populates $old) with
the same condition used previously (i.e. skip the pre-read when $skipPreRead is
true or when !$this->authorization->getStatus() && $attribute !== ''), so that
when authorization is disabled incoming explicit $permissions are ignored and
existing permissions are preserved (creates default to []).

…eries by maxQueryValues

When ignore=true, pre-fetch existing document IDs so
createDocumentRelationships() is skipped for duplicates that
will be silently ignored by the adapter.

Also chunk the batch find() in upsertDocumentsWithIncrease
by maxQueryValues to avoid exceeding query parameter limits.
Comment on lines +5661 to +5678
if ($ignore && $this->resolveRelationships) {
$inputIds = \array_values(\array_unique(\array_filter(
\array_map(fn (Document $doc) => $doc->getId(), $documents)
)));

foreach (\array_chunk($inputIds, $this->maxQueryValues) as $idChunk) {
$existing = $this->authorization->skip(fn () => $this->silent(fn () => $this->find(
$collection->getId(),
[
Query::equal('$id', $idChunk),
Query::select(['$id']),
Query::limit(\count($idChunk)),
]
)));
foreach ($existing as $doc) {
$preExistingIds[$doc->getId()] = true;
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 $preExistingIds fetch is not tenant-scoped in tenantPerDocument mode

In sharedTables && tenantPerDocument mode, the same document $id can legitimately exist for different tenants. This find() call is not wrapped in withTenant(), so it returns documents owned by any tenant that happen to share the same ID. If tenant A already has a document with $id = "foo", and the current batch is inserting a brand-new "foo" for tenant B, "foo" will land in $preExistingIds and createDocumentRelationships will be silently skipped for tenant B's new document — leaving it without relationships.

The upsert optimization added in the same PR at lines 7152–7170 handles this correctly by grouping IDs per tenant before fetching. The same pattern should be applied here:

if ($ignore && $this->resolveRelationships) {
    if ($this->adapter->getSharedTables() && $this->adapter->getTenantPerDocument()) {
        $idsByTenant = [];
        foreach ($documents as $doc) {
            $idsByTenant[$doc->getTenant()][] = $doc->getId();
        }
        foreach ($idsByTenant as $tenant => $tenantIds) {
            $tenantIds = \array_values(\array_unique($tenantIds));
            foreach (\array_chunk($tenantIds, $this->maxQueryValues) as $idChunk) {
                $existing = $this->authorization->skip(
                    fn () => $this->withTenant($tenant,
                        fn () => $this->silent(
                            fn () => $this->find($collection->getId(), [
                                Query::equal('$id', $idChunk),
                                Query::select(['$id']),
                                Query::limit(\count($idChunk)),
                            ])
                        )
                    )
                );
                foreach ($existing as $doc) {
                    // Key by tenant+id to avoid cross-tenant false positives
                    $preExistingIds[$tenant . '/' . $doc->getId()] = true;
                }
            }
        }
    } else {
        // existing non-tenantPerDocument loop …
    }
}

The check at line 5720 would then need to key on $tenant . '/' . $document->getId() for the tenantPerDocument branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant