feat(nodes): improve Milvus vector DB node — address all TODOs#562

Merged
stepmikhaylov merged 5 commits into rocketride-org:develop from charliegillet:feature/milvus-todo-improvements
Apr 6, 2026

Conversation

@charliegillet
Contributor

@charliegillet charliegillet commented Mar 31, 2026

Summary

  • Resolve all 7 TODO/FIXME comments in the Milvus vector DB node (nodes/src/nodes/milvus/milvus.py)
  • Implement bulk insert batching, configurable timeouts, connection error handling, and batched metadata updates
  • Align with patterns already used in Qdrant and Pinecone node implementations

Type

Feature Enhancement

Why this feature fits this codebase

The Milvus node contained 7 TODO comments left by the original authors flagging known performance and reliability gaps. These are not speculative improvements — they are changes the maintainers explicitly marked as needed:

  • TODO lines 101 & 483 — Hardcoded timeout=20 with no config option. Now reads timeout from node config (default 60s), matching Qdrant's pattern (timeout=60).
  • TODO lines 449 & 464 — One-at-a-time upsert() calls in addChunks(). Now batches chunks (configurable bulkInsertBatchSize, default 50) before upserting, following the same flush-batch pattern used in Pinecone and Qdrant nodes.
  • TODO lines 514-515 & 546-547 — Per-vector upsert loop in markDeleted()/markActive() flagged as a performance bottleneck. Extracted into _batchUpsertResults() helper that collects results into batches.
  • TODO line 253 — Unclear COSINE score range semantics. Documented that Milvus returns [0, 2] and the rescaling to [0, 1] is correct.
  • Connection initialization now wrapped in try/except to surface meaningful error messages on connection failure.
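The flush-batch pattern described in these bullets can be sketched roughly as below; `batch_upsert`, the `client` parameter, and the module-level constant are illustrative names for this sketch, not the node's actual API:

```python
from typing import Dict, List

DEFAULT_BULK_INSERT_BATCH_SIZE = 50  # default batch size mentioned in this PR


def batch_upsert(client, collection: str, chunks: List[Dict],
                 batch_size: int = DEFAULT_BULK_INSERT_BATCH_SIZE) -> None:
    """Accumulate chunks and upsert them in batches instead of one at a time."""
    batch: List[Dict] = []
    for chunk in chunks:
        batch.append(chunk)
        if len(batch) >= batch_size:
            client.upsert(collection_name=collection, data=batch)
            batch = []
    if batch:  # flush whatever remains after the loop
        client.upsert(collection_name=collection, data=batch)
```

With 120 chunks and a batch size of 50, this issues three upsert calls (50, 50, 20) instead of 120.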

What changed

File: nodes/src/nodes/milvus/milvus.py
Change: Configurable timeout + batch size via config; connection error handling; bulk insert in addChunks(); _batchUpsertResults() helper for markDeleted/markActive; timeout on remove(); COSINE score documentation; typo fixes

Validation

  • ruff format --check passes
  • ruff check passes
  • All changes confined to a single file with no new dependencies
  • Batch size and timeout are backward-compatible (sensible defaults, opt-in config)

How this can be extended

  • The bulkInsertBatchSize config could be tuned per deployment (cloud vs local Milvus)
  • _batchUpsertResults() could be reused for future bulk metadata update operations
  • Connection retry logic (exponential backoff) could be added on top of the error handling

Closes: N/A — new contribution, no pre-existing issue

#Hack-with-bay-2

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced error handling and diagnostics for Milvus node connections to improve reliability.
  • Chores

    • Optimized bulk insert and data update operations through efficient batch processing.
    • Improved timeout management and control across Milvus client operations.
    • Refined data deletion functionality with better error reporting.

charliegillet and others added 2 commits March 30, 2026 17:11
feat(vscode): improve stop button feedback in Pipeline Observability screen

Handle TASK_STATE.STOPPING in the control button to show "Stopping..."
with a disabled state and distinct orange styling, preventing duplicate
clicks and giving immediate visual feedback during pipeline shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(nodes): improve Milvus vector DB node — address all TODOs

- Add configurable timeout (default 60s) replacing hardcoded timeout=20,
  read from node config via 'timeout' key (TODO line 101, 483)
- Add connection error handling with meaningful failure messages
  instead of raw pymilvus exceptions propagating
- Implement bulk insert with configurable batch size (default 50) for
  addChunks(), replacing one-at-a-time upserts (TODO lines 449, 464)
- Add _batchUpsertResults() helper to batch-update markDeleted/markActive
  operations, eliminating the per-vector upsert loop bottleneck
  (TODO lines 514-515, 546-547)
- Add timeout parameter to remove() delete call (TODO line 483)
- Document Milvus COSINE distance score range [0,2] rescaling to [0,1]
  for codebase consistency (TODO line 253)
- Fix typos in docstrings and comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

Optimizes Milvus integration by introducing batched upsert operations, configurable timeouts, and improved error handling. Changes focus on bulk insert performance, connection robustness, and standardized field queries.

Changes

Cohort / File(s) / Summary

  • Configuration & Connection Handling (nodes/src/nodes/milvus/milvus.py): Added module-level defaults for DEFAULT_BULK_INSERT_BATCH_SIZE and DEFAULT_TIMEOUT; initialized self.timeout and self.bulkInsertBatchSize from configuration; updated client connection to use timeout, standardize non-local URIs to HTTPS, and wrap connection in try/except error handling.
  • Batching & Performance (nodes/src/nodes/milvus/milvus.py): Refactored addChunks from per-chunk upserts to batched upserts using self.bulkInsertBatchSize; introduced _batchUpsertResults helper method for batch operations with metadata updates; updated markDeleted and markActive to replace per-result loops with single batched calls.
  • Query & Error Handling (nodes/src/nodes/milvus/milvus.py): Enhanced remove to pass timeout parameter and log/re-raise deletion errors; updated markDeleted and markActive with explicit output_fields; refined render query to request specific output fields; improved COSINE score handling comments.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • jmaionchi

Poem

🐰 Hop hop, batch away!
Timeouts tamed with timeout's say,
Milvus dances, swift and fleet,
Bulk upserts make queries neat! 🚀

🚥 Pre-merge checks: ✅ 3 passed

  • Description Check: Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. The title accurately describes the main change: improving the Milvus vector DB node by addressing TODO items, which directly aligns with the changeset's focus on resolving seven TODOs and implementing multiple enhancements.
  • Docstring Coverage: Passed. Docstring coverage is 90.00%, which is sufficient; the required threshold is 80.00%.


@github-actions github-actions Bot added module:nodes Python pipeline nodes module:vscode VS Code extension labels Mar 31, 2026
fix: remove unrelated PageStatus "Stopping..." changes from Milvus PR

The PageStatus changes belong in a separate PR (rocketride-org#549) and were
accidentally included here.
@github-actions github-actions Bot removed the module:vscode VS Code extension label Mar 31, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/milvus/milvus.py`:
- Around line 111-123: The connection logic in the constructor always falls
through to the branch that prepends "https://" because self.host was already
stripped of any scheme; remove the dead startswith('http')/startswith('https')
check in the block that builds the MilvusClient URI and instead consistently
construct the URI (e.g., uri=f'https://{self.host}' when profile != 'local' and
uri=f'http://{self.host}:{self.port}' for local), and when re-raising the
failure convert raise Exception(f'Failed to connect to Milvus at {self.host}:
{e}') to use exception chaining (raise Exception(... ) from e) so the original
traceback is preserved; update the code that instantiates MilvusClient (the
self.client assignment) and the except handler accordingly.
- Around line 528-549: The three Milvus query calls that feed
_batchUpsertResults (used by markDeleted/markActive) and the query in
renderChunks are missing output_fields and thus return only the primary key; fix
each query call to include output_fields so full records are returned: for the
queries whose results are later upserted (the ones passing into
_batchUpsertResults/used by markDeleted and markActive) add
output_fields=['id','vector','content','meta'] (or at minimum
['vector','content','meta']) so _batchUpsertResults upserts complete documents,
and for the query in renderChunks add output_fields=['meta','content']
(including any field that contains chunkId if it's stored outside meta) to avoid
KeyError when accessing point['content'] and point['chunkId'] in renderChunks.
Ensure these are added to the Milvus client.query(...) calls that feed
_batchUpsertResults and renderChunks.
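The URI construction and exception chaining asked for in the first comment can be sketched as follows; `connect` and `make_client` are hypothetical stand-ins for the node's constructor logic, not the actual code:

```python
def connect(host: str, port: int, make_client, profile: str = 'local'):
    """Build the Milvus URI and chain the original exception on connect failure."""
    # host is assumed to already be stripped of any scheme at init, so the
    # startswith('http')/startswith('https') branch flagged above is dead code.
    uri = f'http://{host}:{port}' if profile == 'local' else f'https://{host}'
    try:
        return make_client(uri)
    except Exception as e:
        # 'raise ... from e' (PEP 3134, Ruff B904) preserves the original traceback
        raise Exception(f'Failed to connect to Milvus at {host}: {e}') from e
```

The `from e` clause is what keeps the underlying pymilvus error visible in the traceback instead of swallowing it behind the wrapper message.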

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 51b827e6-948b-4944-bafa-6c6df22a4427

📥 Commits

Reviewing files that changed from the base of the PR and between b6b2dc1 and 474dc7a.

📒 Files selected for processing (1)
  • nodes/src/nodes/milvus/milvus.py

Collaborator

@stepmikhaylov stepmikhaylov left a comment


@charliegillet

Thank you for addressing these TODOs. CodeRabbit detected a critical issue that seems very relevant. This is a significant improvement for our Milvus node, and it would be helpful if you could address it so that we can merge these changes.

- Remove dead protocol check (host already stripped of scheme at init)
- Add exception chaining with 'from e' for connection errors (B904)
- Add output_fields to markDeleted/markActive queries to prevent data
  loss during upsert (was only returning primary key)
- Add output_fields to renderChunks query to prevent KeyError on
  content/chunkId access

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
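The output_fields fix in this commit can be illustrated with a small stand-in client; `FakeMilvusClient` and the field names are assumptions drawn from the review thread, not the node's actual schema (real calls would go through pymilvus' MilvusClient.query):

```python
from typing import Dict, List


class FakeMilvusClient:
    """Stand-in that mimics Milvus returning only the requested output fields."""

    def __init__(self, rows: List[Dict]):
        self.rows = rows

    def query(self, collection_name: str, filter: str,
              output_fields: List[str]) -> List[Dict]:
        # Like Milvus, return only the fields the caller asked for
        return [{k: r[k] for k in output_fields if k in r} for r in self.rows]


client = FakeMilvusClient(
    [{'id': 1, 'vector': [0.1], 'content': 'hi', 'meta': {'chunkId': 0}}]
)
# Without output_fields only the primary key comes back, so a later upsert
# would overwrite full records with key-only stubs; requesting all fields
# returns complete documents.
full = client.query(
    collection_name='docs',
    filter="meta['isDeleted'] == false",
    output_fields=['id', 'vector', 'content', 'meta'],
)
```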
@github-actions github-actions Bot added the module:vscode VS Code extension label Apr 3, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
nodes/src/nodes/milvus/milvus.py (3)

453-497: ⚠️ Potential issue | 🟡 Minor

Partial failure can leave data in inconsistent state.

Old chunks are deleted at line 459 before new chunks are inserted. If an exception occurs during any batch upsert (lines 470-475), some batches succeed while others don't, leaving the collection in an inconsistent state with partial data for those objectIds.

Consider wrapping the entire operation in a transactional pattern or implementing rollback logic. Alternatively, insert new chunks first with a temporary flag, then delete old chunks, then update the flag—though this adds complexity.

This is an existing concern amplified by batching. For now, documenting this limitation may be sufficient.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/milvus/milvus.py` around lines 453 - 497, The current flow
deletes old chunks via self.client.delete (filter_condition) before inserting
new data with flush_batch()/self.client.upsert, which can leave the collection
partially updated if upserts fail; to fix, implement an atomic/rollback-safe
pattern: either 1) read and persist a backup of the deleted entities (fetch
current chunks for the objectIds before calling self.client.delete) so you can
restore them on upsert failure, or 2) change ordering to upsert new chunks with
a temporary marker/meta flag (use the same batch/upsert logic in flush_batch),
then only after all upserts succeed delete old chunks and finally remove the
temporary flag, or 3) upsert into a temporary collection and swap/rename on
success; modify the code around the delete call and the flush_batch/upsert logic
(functions/methods: the block that builds filter_condition, self.client.delete,
flush_batch, and where tmp_struct is created) to implement one of these
strategies and ensure errors during upsert trigger restore/cleanup logic.

626-637: ⚠️ Potential issue | 🟠 Major

Line 637 will raise KeyError: chunkId is inside the meta object.

The query at line 626 returns output_fields=['meta', 'content'], so results contain meta as a nested object. Throughout the file (lines 229, 233, 237, 412, 624), chunkId is consistently accessed as meta['chunkId'] in filter expressions, confirming it is a field within the meta JSON object, not at the top level. Line 637 must be changed from point['chunkId'] to point['meta']['chunkId'].

Required fix
             for point in results:
                 content = point['content']
-                chunk = point['chunkId']
+                chunk = point['meta']['chunkId']
                 index = chunk - offset
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/milvus/milvus.py` around lines 626 - 637, The for-loop
processing Milvus query results uses point['chunkId'] but the query requested
output_fields=['meta','content'], so chunkId lives under meta; update the loop
in the block handling results (after self.client.query(...) and where variables
text, lastIndex are set and the loop "for point in results" runs) to read the
chunk id as point['meta']['chunkId'] (and optionally guard against missing meta
to avoid KeyError) instead of point['chunkId'] so indexing succeeds when
building the text array for renderChunkSize.

264-270: ⚠️ Potential issue | 🔴 Critical

Fix the incorrect COSINE distance documentation.

The inline comment incorrectly states Milvus COSINE returns [0, 2] with 0 meaning identical. According to Milvus documentation, COSINE actually returns [-1, 1] where 1 is identical and -1 is opposite. The formula (distance + 1) / 2 is correct and properly maps [-1, 1] to [0, 1], but the comment describing this transformation must be corrected to reflect the actual distance range and semantics.
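The mapping itself is a one-liner; `rescale_cosine` is an illustrative name for the score calculation the comment documents:

```python
def rescale_cosine(distance: float) -> float:
    """Map Milvus COSINE similarity in [-1, 1] (1 = identical) to [0, 1]."""
    return (distance + 1) / 2
```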

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/milvus/milvus.py` around lines 264 - 270, The inline comment
near the COSINE handling (the block checking self.similarity == 'COSINE' and the
score = (point.get('distance') + 1) / 2 calculation) is incorrect; update it to
state that Milvus COSINE returns values in the range [-1, 1] where 1 is
identical and -1 is opposite, and note that the formula maps [-1, 1] to [0, 1]
with 1 meaning most similar to stay consistent with the codebase scoring
convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/milvus/milvus.py`:
- Around line 98-100: Validate and coerce the configured timeout and
bulkInsertBatchSize to positive integers when assigning self.timeout and
self.bulkInsertBatchSize: convert values from config to int, ensure they are >
0, and if not either clamp to DEFAULT_TIMEOUT / DEFAULT_BULK_INSERT_BATCH_SIZE
or raise a clear ValueError (and/or log a warning) so the batching logic that
checks len(batch) >= self.bulkInsertBatchSize behaves correctly; update the
initialization that sets self.timeout and self.bulkInsertBatchSize to perform
this validation and fallback logic.
- Around line 525-546: The _batchUpsertResults method should take isDeleted as a
keyword-only parameter and add error handling around each upsert call: change
the signature to make isDeleted keyword-only (e.g., def
_batchUpsertResults(self, results: List[dict], *, isDeleted: bool) -> None) so
callers must use isDeleted=..., and wrap each
self.client.upsert(collection_name=self.collection, data=batch) inside a
try/except that logs the exception (use the same logger pattern as flush_batch
in addChunks) and either retries or re-raises after logging to avoid silent
partial updates; ensure you update references to _batchUpsertResults call sites
accordingly.

---

Outside diff comments:
In `@nodes/src/nodes/milvus/milvus.py`:
- Around line 453-497: The current flow deletes old chunks via
self.client.delete (filter_condition) before inserting new data with
flush_batch()/self.client.upsert, which can leave the collection partially
updated if upserts fail; to fix, implement an atomic/rollback-safe pattern:
either 1) read and persist a backup of the deleted entities (fetch current
chunks for the objectIds before calling self.client.delete) so you can restore
them on upsert failure, or 2) change ordering to upsert new chunks with a
temporary marker/meta flag (use the same batch/upsert logic in flush_batch),
then only after all upserts succeed delete old chunks and finally remove the
temporary flag, or 3) upsert into a temporary collection and swap/rename on
success; modify the code around the delete call and the flush_batch/upsert logic
(functions/methods: the block that builds filter_condition, self.client.delete,
flush_batch, and where tmp_struct is created) to implement one of these
strategies and ensure errors during upsert trigger restore/cleanup logic.
- Around line 626-637: The for-loop processing Milvus query results uses
point['chunkId'] but the query requested output_fields=['meta','content'], so
chunkId lives under meta; update the loop in the block handling results (after
self.client.query(...) and where variables text, lastIndex are set and the loop
"for point in results" runs) to read the chunk id as point['meta']['chunkId']
(and optionally guard against missing meta to avoid KeyError) instead of
point['chunkId'] so indexing succeeds when building the text array for
renderChunkSize.
- Around line 264-270: The inline comment near the COSINE handling (the block
checking self.similarity == 'COSINE' and the score = (point.get('distance') + 1)
/ 2 calculation) is incorrect; update it to state that Milvus COSINE returns
values in the range [-1, 1] where 1 is identical and -1 is opposite, and note
that the formula maps [-1, 1] to [0, 1] with 1 meaning most similar to stay
consistent with the codebase scoring convention.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: abacd5cf-3989-42a5-9cb4-9397ba9a37b3

📥 Commits

Reviewing files that changed from the base of the PR and between 474dc7a and 44b34aa.

📒 Files selected for processing (3)
  • apps/vscode/src/providers/views/PageStatus/PageStatus.tsx
  • apps/vscode/src/providers/views/PageStatus/styles.css
  • nodes/src/nodes/milvus/milvus.py

fix(nodes): address remaining review feedback on Milvus PR rocketride-org#562

- Remove unrelated PageStatus changes that were re-introduced
- Validate timeout and bulkInsertBatchSize to ensure positive values
- Make isDeleted a keyword-only argument in _batchUpsertResults

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot removed the module:vscode VS Code extension label Apr 3, 2026
@charliegillet
Contributor Author

Addressed all review feedback:

  • Fixed critical dead protocol check (host already stripped of scheme at init)
  • Added exception chaining with from e per PEP 3134
  • Added output_fields to markDeleted/markActive/renderChunks queries to prevent data loss
  • Validated timeout and bulkInsertBatchSize to ensure positive values
  • Made isDeleted keyword-only in _batchUpsertResults
  • Removed unrelated PageStatus changes that were accidentally re-introduced

Thanks for the review!
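The timeout/batch-size validation mentioned above can be sketched like this; `positive_int` and the defaults are illustrative names for this sketch, assuming the defaults stated in the PR (60s timeout, batch size 50):

```python
DEFAULT_TIMEOUT = 60
DEFAULT_BULK_INSERT_BATCH_SIZE = 50


def positive_int(value, default: int) -> int:
    """Coerce a config value to a positive int, falling back to the default."""
    try:
        value = int(value)
    except (TypeError, ValueError):
        return default  # missing or non-numeric config value
    return value if value > 0 else default  # reject zero/negative values
```

Clamping to the default (rather than raising) keeps the batching check `len(batch) >= self.bulkInsertBatchSize` well-behaved even with a misconfigured node.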

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
nodes/src/nodes/milvus/milvus.py (1)

525-546: ⚠️ Potential issue | 🟡 Minor

Add error handling around upsert calls for consistency with flush_batch.

The flush_batch function in addChunks wraps upsert calls in try/except with logging before re-raise. This method lacks equivalent handling—if an upsert fails mid-batch, there's no diagnostic logging before the exception propagates.

Proposed fix
     def _batchUpsertResults(self, results: List[dict], *, isDeleted: bool) -> None:
         """
         Batch-update the isDeleted metadata field on a list of query results.

         Collects results into batches of bulkInsertBatchSize and upserts them
         together, avoiding the performance bottleneck of one-at-a-time upserts.
         """
         batch: List[dict] = []

         for result in results:
             meta = result.get('meta', {})
             meta['isDeleted'] = isDeleted
             result['meta'] = meta
             batch.append(result)

             if len(batch) >= self.bulkInsertBatchSize:
-                self.client.upsert(collection_name=self.collection, data=batch)
+                try:
+                    self.client.upsert(collection_name=self.collection, data=batch)
+                except Exception as e:
+                    engLib.debug(f'Error during batch upsert ({len(batch)} results): {e}')
+                    raise
                 batch = []

         # Flush remaining
         if batch:
-            self.client.upsert(collection_name=self.collection, data=batch)
+            try:
+                self.client.upsert(collection_name=self.collection, data=batch)
+            except Exception as e:
+                engLib.debug(f'Error during batch upsert ({len(batch)} results): {e}')
+                raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/milvus/milvus.py` around lines 525 - 546, _batchUpsertResults
currently calls self.client.upsert directly and lacks the try/except logging
that flush_batch (used in addChunks) provides; wrap each upsert call (both
inside the batch-full branch and the final flush) in a try/except that catches
Exception, logs a descriptive error including the collection name
(self.collection), batch size/contents or len(batch), and the exception (use the
same logger used by flush_batch/addChunks), then re-raise the exception so
behavior remains the same but with diagnostic logging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/milvus/milvus.py`:
- Around line 466-476: The nested function flush_batch is missing a return type
annotation (Ruff ANN202); update its definition in milvus.py to add an explicit
return type of None for flush_batch (i.e., annotate the function signature for
the nested def flush_batch() to -> None) and ensure the rest of the body
unchanged.
- Around line 98-100: The code sets self.timeout and self.bulkInsertBatchSize
from config using DEFAULT_TIMEOUT and DEFAULT_BULK_INSERT_BATCH_SIZE but those
keys are not present in the services.json schema, so add JSON schema entries for
"timeout" and "bulkInsertBatchSize" to services.json (with appropriate types,
minimum 1, and default values matching DEFAULT_TIMEOUT and
DEFAULT_BULK_INSERT_BATCH_SIZE) so users can configure them via the UI;
alternatively, if you intend to keep them non-configurable, add a clarifying
comment next to the assignments in milvus.py referencing DEFAULT_TIMEOUT and
DEFAULT_BULK_INSERT_BATCH_SIZE to document that these values are intentionally
not exposed.

---

Duplicate comments:
In `@nodes/src/nodes/milvus/milvus.py`:
- Around line 525-546: _batchUpsertResults currently calls self.client.upsert
directly and lacks the try/except logging that flush_batch (used in addChunks)
provides; wrap each upsert call (both inside the batch-full branch and the final
flush) in a try/except that catches Exception, logs a descriptive error
including the collection name (self.collection), batch size/contents or
len(batch), and the exception (use the same logger used by
flush_batch/addChunks), then re-raise the exception so behavior remains the same
but with diagnostic logging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fc9cd903-fc58-4eec-a226-c3a0148c28d4

📥 Commits

Reviewing files that changed from the base of the PR and between 44b34aa and cafbd63.

📒 Files selected for processing (1)
  • nodes/src/nodes/milvus/milvus.py

@stepmikhaylov stepmikhaylov enabled auto-merge (squash) April 6, 2026 13:36
@stepmikhaylov stepmikhaylov merged commit a368e58 into rocketride-org:develop Apr 6, 2026
13 checks passed
shashidharbabu pushed a commit that referenced this pull request Apr 7, 2026
* feat(vscode): improve stop button feedback in Pipeline Observability screen

Handle TASK_STATE.STOPPING in the control button to show "Stopping..."
with a disabled state and distinct orange styling, preventing duplicate
clicks and giving immediate visual feedback during pipeline shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(nodes): improve Milvus vector DB node — address all TODOs

- Add configurable timeout (default 60s) replacing hardcoded timeout=20,
  read from node config via 'timeout' key (TODO line 101, 483)
- Add connection error handling with meaningful failure messages
  instead of raw pymilvus exceptions propagating
- Implement bulk insert with configurable batch size (default 50) for
  addChunks(), replacing one-at-a-time upserts (TODO lines 449, 464)
- Add _batchUpsertResults() helper to batch-update markDeleted/markActive
  operations, eliminating the per-vector upsert loop bottleneck
  (TODO lines 514-515, 546-547)
- Add timeout parameter to remove() delete call (TODO line 483)
- Document Milvus COSINE distance score range [0,2] rescaling to [0,1]
  for codebase consistency (TODO line 253)
- Fix typos in docstrings and comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove unrelated PageStatus "Stopping..." changes from Milvus PR

The PageStatus changes belong in a separate PR (#549) and were
accidentally included here.

* fix(nodes): address CodeRabbit feedback on Milvus PR #562

- Remove dead protocol check (host already stripped of scheme at init)
- Add exception chaining with 'from e' for connection errors (B904)
- Add output_fields to markDeleted/markActive queries to prevent data
  loss during upsert (was only returning primary key)
- Add output_fields to renderChunks query to prevent KeyError on
  content/chunkId access

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(nodes): address remaining review feedback on Milvus PR #562

- Remove unrelated PageStatus changes that were re-introduced
- Validate timeout and bulkInsertBatchSize to ensure positive values
- Make isDeleted a keyword-only argument in _batchUpsertResults

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Labels

module:nodes Python pipeline nodes
