embed_multi_with_metadata() compares item IDs against row IDs instead of content hashes, breaking deduplication #1397

### Bug

`Collection.embed_multi_with_metadata()` has a deduplication check that queries existing rows by `content_hash` but then compares the wrong key. It stores the returned database row `id`s in `existing_ids`, then filters the incoming batch by checking whether each incoming item's user-provided ID is in `existing_ids`.

Since user-provided IDs and database row IDs are semantically different, duplicate content submitted under a new ID always bypasses dedup.
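A minimal, self-contained reproduction of the comparison described above, using an in-memory SQLite table. This is a sketch, not the library's code: the table is a hypothetical simplified schema (`id`, `collection_id`, `content_hash` only), `Item` and `buggy_filter` are hypothetical names, and MD5 is assumed as the content hash purely for illustration.

```python
import hashlib
import sqlite3
from typing import List, NamedTuple


class Item(NamedTuple):
    id: str        # user-provided ID
    content: str


def content_hash(text: str) -> bytes:
    # Assumption: MD5 digest of the UTF-8 content, for illustration only.
    return hashlib.md5(text.encode("utf-8")).digest()


def buggy_filter(db: sqlite3.Connection, collection_id: int, items: List[Item]) -> List[Item]:
    if not items:
        return []
    hashes = [content_hash(item.content) for item in items]
    placeholders = ",".join("?" for _ in hashes)
    rows = db.execute(
        "SELECT id FROM embeddings "
        f"WHERE collection_id = ? AND content_hash IN ({placeholders})",
        [collection_id, *hashes],
    ).fetchall()
    existing_ids = {row[0] for row in rows}  # database row IDs
    # Bug: a user-provided ID is compared against database row IDs.
    return [item for item in items if item.id not in existing_ids]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE embeddings (id INTEGER PRIMARY KEY, collection_id INTEGER, content_hash BLOB)"
)
# "hello world" is already stored (as row 1) in collection 1.
db.execute(
    "INSERT INTO embeddings (collection_id, content_hash) VALUES (?, ?)",
    (1, content_hash("hello world")),
)

# The same content arrives again under a new user-provided ID.
to_embed = buggy_filter(db, 1, [Item("doc-2", "hello world")])
print([item.id for item in to_embed])  # ['doc-2']: the duplicate passes the check
```

The existing row's ID is `1` and the incoming ID is `"doc-2"`, so the membership test can never match even though the content hash does.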

### Root cause

In `llm/embeddings.py`, the dedup logic does:

```python
# Queries by content_hash, gets back row IDs
existing = list(db.execute(
    "SELECT id FROM embeddings WHERE collection_id = ? AND content_hash IN (?)",
    ...
))
existing_ids = {row[0] for row in existing}

# Filters by incoming item ID -- wrong comparison
for item in items:
    if item.id not in existing_ids:  # item.id is user-provided, existing_ids are DB row IDs
        to_embed.append(item)
```

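Fix: select `content_hash` instead of `id` from the matching rows, and skip any incoming item whose computed content hash is already in that set. In other words, compare incoming content hashes against existing `content_hash` values, not incoming IDs against returned row IDs.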
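A sketch of the proposed fix, under the same assumptions as a standalone example: a hypothetical simplified `embeddings` table, MD5 as the content hash, and hypothetical helper names (`Item`, `filter_new_items`). It is meant to show the hash-to-hash comparison, not to reproduce the actual patch to `llm/embeddings.py`.

```python
import hashlib
import sqlite3
from typing import List, NamedTuple


class Item(NamedTuple):
    id: str        # user-provided ID
    content: str


def content_hash(text: str) -> bytes:
    # Assumption: MD5 digest of the UTF-8 content, for illustration only.
    return hashlib.md5(text.encode("utf-8")).digest()


def filter_new_items(db: sqlite3.Connection, collection_id: int, items: List[Item]) -> List[Item]:
    """Return only items whose content is not already stored in the collection."""
    if not items:
        return []
    hashes = [content_hash(item.content) for item in items]
    placeholders = ",".join("?" for _ in hashes)
    # Select the hashes themselves, not the row IDs.
    rows = db.execute(
        "SELECT content_hash FROM embeddings "
        f"WHERE collection_id = ? AND content_hash IN ({placeholders})",
        [collection_id, *hashes],
    ).fetchall()
    existing_hashes = {row[0] for row in rows}
    # Compare like with like: incoming hash vs. stored hash.
    return [item for item, h in zip(items, hashes) if h not in existing_hashes]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE embeddings (id INTEGER PRIMARY KEY, collection_id INTEGER, content_hash BLOB)"
)
db.execute(
    "INSERT INTO embeddings (collection_id, content_hash) VALUES (?, ?)",
    (1, content_hash("hello world")),
)

incoming = [Item("doc-2", "hello world"), Item("doc-3", "new text")]
to_embed = filter_new_items(db, 1, incoming)
print([item.id for item in to_embed])  # ['doc-3']
```

This sketch only checks against content already in the database; two identical items inside the same incoming batch would both still pass. Whether a skipped item should still get a row under its new ID is a separate design decision the sketch does not address.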

### Impact

Redundant embeddings accumulate, increasing storage and embedding API costs. Duplicate vectors also slow down similarity search and can crowd distinct results out of the top matches.
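To gauge how much duplication has already accumulated, one option is to group a collection's rows by `content_hash`. The sketch below assumes the same hypothetical simplified schema as above (only `id`, `collection_id`, `content_hash`) and a hypothetical helper name, `duplicate_hashes`; it has not been checked against the real database layout.

```python
import hashlib
import sqlite3


def duplicate_hashes(db: sqlite3.Connection, collection_id: int):
    """Return (content_hash, count) pairs that occur more than once in a collection."""
    return db.execute(
        "SELECT content_hash, COUNT(*) FROM embeddings "
        "WHERE collection_id = ? GROUP BY content_hash HAVING COUNT(*) > 1",
        (collection_id,),
    ).fetchall()


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE embeddings (id INTEGER PRIMARY KEY, collection_id INTEGER, content_hash BLOB)"
)
h = hashlib.md5(b"hello world").digest()
db.executemany(
    "INSERT INTO embeddings (collection_id, content_hash) VALUES (?, ?)",
    [(1, h), (1, h), (1, hashlib.md5(b"other").digest())],
)
dupes = duplicate_hashes(db, 1)
print(len(dupes), dupes[0][1])  # 1 2
```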

### Note

This is related to, but distinct from, #224, which describes a different deduplication issue involving the `--store` flag. This issue concerns the underlying ID-versus-hash comparison logic.

(Found during a multi-LLM code review using [sqry](https://github.com/nicholasgasior/sqry) AST analysis + Codex + Gemini cross-validation.)
