Skip to content

Fix/validate nan query vectors#1156

Open
Goodnight77 wants to merge 2 commits intoqdrant:devfrom
Goodnight77:fix/validate-nan-query-vectors
Open

Fix/validate nan query vectors#1156
Goodnight77 wants to merge 2 commits intoqdrant:devfrom
Goodnight77:fix/validate-nan-query-vectors

Conversation

@Goodnight77
Copy link
Copy Markdown

description

validate query vectors for nan values in grpc client before sending to the server
before : grpc client would return points with nan scores when given vector that contains nan, while local and http clients properly rejected it
those nan values in query vectors would propagate through distance calculations, this can causes confusion for users who receives invalid results without clear error message

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you installed pre-commit with pip3 install pre-commit and set up hooks with pre-commit install?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

Copilot AI review requested due to automatic review settings February 19, 2026 22:43
@netlify
Copy link
Copy Markdown

netlify Bot commented Feb 19, 2026

Deploy Preview for poetic-froyo-8baba7 ready!

Name Link
🔨 Latest commit e226991
🔍 Latest deploy log https://app.netlify.com/projects/poetic-froyo-8baba7/deploys/69f1a67385d9b800086b9273
😎 Deploy Preview https://deploy-preview-1156--poetic-froyo-8baba7.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents NaN values in query vectors from being sent via the gRPC client, aligning gRPC behavior with the local and HTTP clients to avoid returning results with nan scores.

Changes:

  • Add NaN validation to REST→gRPC vector conversions and raise an error before issuing gRPC requests.
  • Add/extend tests to ensure gRPC queries (and conversion helpers) reject NaN vectors.
  • Minor formatting/whitespace cleanups in tests, docs tooling, proto comments, and docs assets.

Reviewed changes

Copilot reviewed 6 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
qdrant_client/conversions/conversion.py Adds NaN validation for dense/sparse vectors during REST→gRPC conversion.
tests/congruence_tests/test_query.py Updates congruence test to assert gRPC also rejects NaN queries.
tests/conversions/test_validate_conversions.py Adds unit tests ensuring conversion helpers raise on NaN vectors.
tests/test_migrate.py Minor formatting fix (trailing comma / compact init).
tests/congruence_tests/test_sparse_updates.py Minor formatting simplification.
tests/congruence_tests/test_multivector_updates.py Minor formatting simplification.
tools/generate_docs.sh Minor formatting/line normalization.
qdrant_client/proto/points_service.proto Whitespace cleanup in comments.
docs/images/api-icon.svg Line normalization (no functional change).
docs/.gitignore Line normalization (no functional change).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread qdrant_client/conversions/conversion.py
Comment thread qdrant_client/conversions/conversion.py Outdated
Comment on lines +3607 to +3620
UnexpectedResponse: If the vector contains NaN values
"""
if any(math.isnan(val) for val in vector):
from qdrant_client.http.exceptions import UnexpectedResponse
from httpx import Headers
import json

raise UnexpectedResponse(
status_code=400,
reason_phrase="Bad Request",
content=json.dumps({"error": "Query vector must not contain NaN values"}).encode(),
headers=Headers({}),
)

Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_validate_no_nan raises qdrant_client.http.exceptions.UnexpectedResponse by manually constructing an HTTP-like response (status_code/reason_phrase/headers/content) even though this is client-side validation in the gRPC conversion layer. This is semantically misleading and couples gRPC-only usage to HTTP error types; consider raising a dedicated client-side validation exception (or ValueError) and letting higher-level client code translate it, or otherwise centralize this error construction so it matches the rest of the client's error semantics.

Suggested change
UnexpectedResponse: If the vector contains NaN values
"""
if any(math.isnan(val) for val in vector):
from qdrant_client.http.exceptions import UnexpectedResponse
from httpx import Headers
import json
raise UnexpectedResponse(
status_code=400,
reason_phrase="Bad Request",
content=json.dumps({"error": "Query vector must not contain NaN values"}).encode(),
headers=Headers({}),
)
ValueError: If the vector contains NaN values
"""
if any(math.isnan(val) for val in vector):
raise ValueError("Query vector must not contain NaN values")

Copilot uses AI. Check for mistakes.
Comment thread qdrant_client/conversions/conversion.py Outdated
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 032756aa-ed89-4112-9eda-c37c6dbec969

📥 Commits

Reviewing files that changed from the base of the PR and between 98f2b54 and e226991.

📒 Files selected for processing (3)
  • qdrant_client/conversions/conversion.py
  • tests/congruence_tests/test_query.py
  • tests/conversions/test_validate_conversions.py
✅ Files skipped from review due to trivial changes (1)
  • tests/conversions/test_validate_conversions.py

📝 Walkthrough

Walkthrough

This pull request introduces NaN validation for vector conversions in the REST-to-gRPC conversion layer. A new static helper method _validate_no_nan is added to the RestToGrpc class that checks each element of a float list for NaN values using math.isnan. The convert_dense_vector and convert_sparse_vector methods are updated to call this validation before constructing their respective gRPC vector structs, raising an UnexpectedResponse with HTTP 400 status if NaN is detected. Tests are updated to verify this validation behavior for both dense and sparse vectors, and to assert that queries containing NaN values are properly rejected.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix/validate nan query vectors' directly and concisely describes the main change: adding NaN validation for query vectors in the gRPC client.
Description check ✅ Passed The description clearly explains the problem (gRPC client not rejecting NaN vectors like other clients), the solution (validate vectors before sending), and the rationale (prevent invalid results from NaN propagation).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 6/8 reviews remaining, refill in 11 minutes and 21 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/congruence_tests/test_query.py (1)

1550-1551: Consider adding sparse and multi-dense NaN coverage to this congruence test.

The congruence test test_query_with_nan currently exercises only the dense vector NaN path for gRPC. Sparse and multi-dense NaN rejections are only covered at the unit level (test_validate_conversions.py). Adding end-to-end congruence assertions for those paths here would provide stronger integration confidence.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/congruence_tests/test_query.py` around lines 1550 - 1551, The
congruence test only covers the dense-vector NaN path; extend
test_query_with_nan to also exercise sparse and multi-dense NaN rejection by
invoking grpc_client.query_points with queries that include NaN in a sparse
vector and in a multi-dense payload and asserting they raise UnexpectedResponse
(same as the existing dense case using COLLECTION_NAME and query), i.e., add
additional pytest.raises(UnexpectedResponse) calls around
grpc_client.query_points for the sparse and multi-dense query variants so the
end-to-end gRPC behavior matches the unit-level tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@qdrant_client/conversions/conversion.py`:
- Around line 3601-3619: _in _validate_no_nan replace the NaN-only check with a
finite check by using math.isfinite (i.e., raise when not math.isfinite(val)) so
±Inf are rejected too, and move the lazy imports (json, Headers from httpx, and
UnexpectedResponse from qdrant_client.http.exceptions) to the module top-level
instead of importing them inside the exception branch; keep the same raised
UnexpectedResponse construction and message but use the module-level names.

---

Nitpick comments:
In `@tests/congruence_tests/test_query.py`:
- Around line 1550-1551: The congruence test only covers the dense-vector NaN
path; extend test_query_with_nan to also exercise sparse and multi-dense NaN
rejection by invoking grpc_client.query_points with queries that include NaN in
a sparse vector and in a multi-dense payload and asserting they raise
UnexpectedResponse (same as the existing dense case using COLLECTION_NAME and
query), i.e., add additional pytest.raises(UnexpectedResponse) calls around
grpc_client.query_points for the sparse and multi-dense query variants so the
end-to-end gRPC behavior matches the unit-level tests.

Comment on lines +3601 to +3619
@staticmethod
def _validate_no_nan(vector: list[float]) -> None:
"""validate that vector does not contain NaN values
Args:
vector: List of float values to validate
Raises:
UnexpectedResponse: If the vector contains NaN values
"""
if any(math.isnan(val) for val in vector):
from qdrant_client.http.exceptions import UnexpectedResponse
from httpx import Headers
import json

raise UnexpectedResponse(
status_code=400,
reason_phrase="Bad Request",
content=json.dumps({"error": "Query vector must not contain NaN values"}).encode(),
headers=Headers({}),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check top-level imports in conversion.py
echo "=== First 50 lines of conversion.py (checking imports) ==="
head -50 qdrant_client/conversions/conversion.py

echo ""
echo "=== Check if json is already imported at module level ==="
rg -n "^import json" qdrant_client/conversions/conversion.py

echo ""
echo "=== Check if Headers is imported at module level ==="
rg -n "from httpx import|import httpx" qdrant_client/conversions/conversion.py

echo ""
echo "=== Check if UnexpectedResponse is imported at module level ==="
rg -n "from qdrant_client.http.exceptions import" qdrant_client/conversions/conversion.py

Repository: qdrant/qdrant-client

Length of output: 2204


🏁 Script executed:

#!/bin/bash
# Check how other parts of the project validate vectors or handle NaN/Inf
echo "=== Check for any existing Inf validation in the codebase ==="
rg "isfinite|isinf|math.isnan" qdrant_client/ --type=py -C2

echo ""
echo "=== Check HTTP client exception usage ==="
rg "UnexpectedResponse" qdrant_client/ --type=py | head -20

Repository: qdrant/qdrant-client

Length of output: 2435


🏁 Script executed:

#!/bin/bash
# Check if Headers is used elsewhere in the conversion.py file
echo "=== Check all uses of Headers in conversion.py ==="
rg "Headers" qdrant_client/conversions/conversion.py -n

echo ""
echo "=== Check all uses of json in conversion.py ==="
rg "json\." qdrant_client/conversions/conversion.py -n | head -10

Repository: qdrant/qdrant-client

Length of output: 353


Broaden validation to reject ±Inf and move imports to module level.

Two valid observations on _validate_no_nan:

  1. Infinity values (float('inf'), float('-inf')) are equally problematic in distance computations and will propagate silently like NaN. The codebase already validates this pattern elsewhere (see local_collection.py:isfinite(), distances.py:isinf(), hybrid/formula.py:isfinite()). Replace math.isnan(val) with not math.isfinite(val) for consistency.

  2. Lazy imports for json, Headers, and UnexpectedResponse inside the error branch reduce readability and add unnecessary coupling. These should be at module level with the other imports, following standard Python practices.

Suggested refactor

At the top of the file, add:

import json
from httpx import Headers
from qdrant_client.http.exceptions import UnexpectedResponse

Then simplify the method:

     `@staticmethod`
     def _validate_no_nan(vector: list[float]) -> None:
-        """validate that vector does not contain NaN values
+        """Validate that vector does not contain NaN or Inf values.
         Args:
             vector: List of float values to validate
         Raises:
-            UnexpectedResponse: If the vector contains NaN values
+            UnexpectedResponse: If the vector contains NaN or Inf values
         """
-        if any(math.isnan(val) for val in vector):
-            from qdrant_client.http.exceptions import UnexpectedResponse
-            from httpx import Headers
-            import json
-
-            raise UnexpectedResponse(
-                status_code=400,
-                reason_phrase="Bad Request",
-                content=json.dumps({"error": "Query vector must not contain NaN values"}).encode(),
-                headers=Headers({}),
-            )
+        if any(not math.isfinite(val) for val in vector):
+            raise UnexpectedResponse(
+                status_code=400,
+                reason_phrase="Bad Request",
+                content=json.dumps(
+                    {"error": "Query vector must not contain NaN or Inf values"}
+                ).encode(),
+                headers=Headers({}),
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@qdrant_client/conversions/conversion.py` around lines 3601 - 3619, _in
_validate_no_nan replace the NaN-only check with a finite check by using
math.isfinite (i.e., raise when not math.isfinite(val)) so ±Inf are rejected
too, and move the lazy imports (json, Headers from httpx, and UnexpectedResponse
from qdrant_client.http.exceptions) to the module top-level instead of importing
them inside the exception branch; keep the same raised UnexpectedResponse
construction and message but use the module-level names.

@Goodnight77 Goodnight77 force-pushed the fix/validate-nan-query-vectors branch from 3aea359 to 7f42dc8 Compare February 19, 2026 22:50
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/congruence_tests/test_multivector_updates.py (1)

56-63: Inconsistent subscript formatting between local_old_point and remote_old_point.

Line 60 now uses compact )[0][0], but remote_old_point (lines 61–63) still uses the split )[0][\n 0\n] style. The same asymmetry exists for remote_new_point (lines 82–84).

♻️ Proposed consistency fix
-    remote_old_point = remote_client.scroll(COLLECTION_NAME, scroll_filter=id_filter, limit=1)[0][
-        0
-    ]
+    remote_old_point = remote_client.scroll(COLLECTION_NAME, scroll_filter=id_filter, limit=1)[0][0]

Apply the same change to remote_new_point at lines 82–84.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/congruence_tests/test_multivector_updates.py` around lines 56 - 63, The
subscripting style is inconsistent: change the split multi-line subscripts for
remote_old_point and remote_new_point to the compact form used for
local_old_point (i.e., replace the current ")[0][\n    0\n]" style with
")[0][0]") so both local_* and remote_* assignments use the same compact
indexing; update the occurrences for remote_old_point and remote_new_point to
match local_old_point's ")[0][0]" pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/congruence_tests/test_multivector_updates.py`:
- Around line 56-63: The subscripting style is inconsistent: change the split
multi-line subscripts for remote_old_point and remote_new_point to the compact
form used for local_old_point (i.e., replace the current ")[0][\n    0\n]" style
with ")[0][0]") so both local_* and remote_* assignments use the same compact
indexing; update the occurrences for remote_old_point and remote_new_point to
match local_old_point's ")[0][0]" pattern.

@Goodnight77 Goodnight77 changed the base branch from master to dev April 29, 2026 06:32
Goodnight77 and others added 2 commits April 29, 2026 07:33
@Goodnight77 Goodnight77 force-pushed the fix/validate-nan-query-vectors branch from 98f2b54 to e226991 Compare April 29, 2026 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants