Commit 0e81311

Document WithApproximate LINQ operator for SQL Server vector search (#5346)
Document dotnet/efcore#38144
1 parent 641f237 commit 0e81311

2 files changed

Lines changed: 84 additions & 52 deletions

entity-framework/core/providers/sql-server/vector-search.md

Lines changed: 78 additions & 49 deletions
@@ -86,78 +86,101 @@ This function computes the distance between the query vector and every row in th
 > [!NOTE]
 > The built-in support in EF 10 replaces the previous [EFCore.SqlServer.VectorSearch](https://github.com/efcore/EFCore.SqlServer.VectorSearch) extension, which allowed performing vector search before the `vector` data type was introduced. As part of upgrading to EF 10, remove the extension from your projects.

-## Approximate search with VECTOR_SEARCH()
+## Searching with VECTOR_SEARCH()

 > [!WARNING]
 > `VECTOR_SEARCH()` and vector indexes are currently experimental features in SQL Server and are subject to change. The APIs in EF Core for these features are also subject to change.

-For large datasets, computing exact distances for every row can be prohibitively slow. SQL Server 2025 introduces support for *approximate* search through a [vector index](/sql/t-sql/statements/create-vector-index-transact-sql), which provides much better performance at the expense of returning items that are approximately similar - rather than exactly similar - to the query.
+SQL Server's `VECTOR_SEARCH()` table-valued function retrieves rows based on vector similarity. Unlike `VECTOR_DISTANCE()`, which computes the distance between two specific vectors, `VECTOR_SEARCH()` searches an entire table for the most similar vectors to a given query vector.

-### Vector indexes
-
-To use `VECTOR_SEARCH()`, you must create a vector index on your vector column. Use the `HasVectorIndex()` method in your model configuration:
+Use the `VectorSearch()` extension method on your `DbSet`, and chain `OrderBy()`, `Take()`, and `WithApproximate()` to perform an approximate nearest neighbor (ANN) search that uses a [vector index](/sql/t-sql/statements/create-vector-index-transact-sql):

 ```csharp
-protected override void OnModelCreating(ModelBuilder modelBuilder)
+var results = await context.Blogs
+    .VectorSearch(b => b.Embedding, embedding, "cosine")
+    .OrderBy(r => r.Distance)
+    .Take(5)
+    .WithApproximate()
+    .ToListAsync();
+
+foreach (var result in results)
 {
-    modelBuilder.Entity<Blog>()
-        .HasVectorIndex(b => b.Embedding, "cosine");
+    Console.WriteLine($"Blog {result.Value.Id} with distance {result.Distance}");
 }
 ```

-This will generate the following SQL migration:
+This translates to the following SQL:

 ```sql
-CREATE VECTOR INDEX [IX_Blogs_Embedding]
-ON [Blogs] ([Embedding])
-WITH (METRIC = COSINE)
+SELECT TOP(@__p_1) WITH APPROXIMATE [b].[Id], [b].[Name], [v].[Distance]
+FROM VECTOR_SEARCH(
+    TABLE = [Blogs] AS [b],
+    COLUMN = [Embedding],
+    SIMILAR_TO = @__embedding_0,
+    METRIC = 'cosine'
+) AS [v]
+ORDER BY [v].[Distance]
 ```

-The following distance metrics are supported for vector indexes:
+`VectorSearch()` returns `VectorSearchResult<TEntity>`, which allows you to access both the entity and the computed distance:

-Metric | Description
------------ | -----------
-`cosine` | Cosine similarity (angular distance)
-`euclidean` | Euclidean distance (L2 norm)
-`dot` | Dot product (negative inner product)
+```csharp
+var searchResults = await context.Blogs
+    .VectorSearch(b => b.Embedding, embedding, "cosine")
+    .Where(r => r.Distance < 0.05)
+    .OrderBy(r => r.Distance)
+    .Select(r => new { Blog = r.Value, Distance = r.Distance })
+    .Take(3)
+    .WithApproximate()
+    .ToListAsync();
+```

-Choose the metric that best matches your embedding model and use case. Cosine similarity is commonly used for text embeddings, while euclidean distance is often used for image embeddings.
+This allows you to filter on the similarity score, present it to users, etc.
+
+### WithApproximate()

-### Searching with VECTOR_SEARCH()
+`WithApproximate()` instructs SQL Server to use the vector index for approximate nearest neighbor (ANN) search, which provides significantly better performance for large datasets. It causes `WITH APPROXIMATE` to be added to the SQL `TOP` clause. `WithApproximate()` must be called after `Take()`, which specifies the number of results to return.

-Once you have a vector index, use the `VectorSearch()` extension method on your `DbSet`:
+Without `WithApproximate()`, the query performs an exact k-nearest neighbor (kNN) search that scans all rows, without using the vector index:

 ```csharp
+// Exact kNN search (no vector index used)
 var blogs = await context.Blogs
-    .VectorSearch(b => b.Embedding, embedding, "cosine", topN: 5)
+    .VectorSearch(b => b.Embedding, embedding, "cosine")
+    .OrderBy(r => r.Distance)
+    .Take(5)
     .ToListAsync();
+```
+
+### Vector indexes
+
+To use approximate search with `WithApproximate()`, you must create a vector index on your vector column. Use the `HasVectorIndex()` method in your model configuration:

-foreach (var (blog, score) in blogs)
+```csharp
+protected override void OnModelCreating(ModelBuilder modelBuilder)
 {
-    Console.WriteLine($"Blog {blog.Id} with score {score}");
+    modelBuilder.Entity<Blog>()
+        .HasVectorIndex(b => b.Embedding, "cosine");
 }
 ```

-This translates to the following SQL:
+This will generate the following SQL migration:

 ```sql
-SELECT [v].[Id], [v].[Name], [v].[Distance]
-FROM VECTOR_SEARCH([Blogs], 'Embedding', @__embedding, 'metric = cosine', @__topN)
+CREATE VECTOR INDEX [IX_Blogs_Embedding]
+ON [Blogs] ([Embedding])
+WITH (METRIC = COSINE)
 ```

-The `topN` parameter specifies the maximum number of results to return.
-
-`VectorSearch()` returns `VectorSearchResult<TEntity>`, which allows you to access both the entity and the computed distance:
+The following distance metrics are supported for vector indexes:

-```csharp
-var searchResults = await context.Blogs
-    .VectorSearch(b => b.Embedding, embedding, "cosine", topN: 5)
-    .Where(r => r.Distance < 0.05)
-    .Select(r => new { Blog = r.Value, Distance = r.Distance })
-    .ToListAsync();
-```
+Metric | Description
+----------- | -----------
+`cosine` | Cosine similarity (angular distance)
+`euclidean` | Euclidean distance (L2 norm)
+`dot` | Dot product (negative inner product)

-This allows you to filter on the similarity score, present it to users, etc.
+Choose the metric that best matches your embedding model and use case. Cosine similarity is commonly used for text embeddings, while euclidean distance is often used for image embeddings.

 ## Hybrid search

@@ -175,7 +198,10 @@ var results = await context.Articles
     .FreeTextTable<Article, int>(textualQuery, topN: k)
     // Perform vector (semantic) search, joining the results of both searches together
     .LeftJoin(
-        context.Articles.VectorSearch(b => b.Embedding, queryEmbedding, "cosine", topN: k),
+        context.Articles.VectorSearch(b => b.Embedding, queryEmbedding, "cosine")
+            .OrderBy(r => r.Distance)
+            .Take(k)
+            .WithApproximate(),
         fts => fts.Key,
         vs => vs.Value.Id,
         (fts, vs) => new
@@ -209,14 +235,17 @@ This query:
 The query produces the following SQL:

 ```sql
-SELECT TOP(@p3) [a0].[Id], [a0].[Content], [a0].[Title]
-FROM FREETEXTTABLE([Articles], *, @p, @p1) AS [f]
-LEFT JOIN VECTOR_SEARCH(
-    TABLE = [Articles] AS [a0],
-    COLUMN = [Embedding],
-    SIMILAR_TO = @p2,
-    METRIC = 'cosine',
-    TOP_N = @p3
-) AS [v] ON [f].[KEY] = [a0].[Id]
-ORDER BY 1.0E0 / CAST(10 + [f].[RANK] AS float) + ISNULL(1.0E0 / (10.0E0 + [v].[Distance]), 0.0E0) DESC
+SELECT TOP(@__p_4) [a0].[Id], [a0].[Content], [a0].[Title]
+FROM FREETEXTTABLE([Articles], *, @__textualQuery_0, @__k_1) AS [f]
+LEFT JOIN (
+    SELECT TOP(@__k_1) WITH APPROXIMATE [a].[Id], [a].[Content], [a].[Title], [v].[Distance]
+    FROM VECTOR_SEARCH(
+        TABLE = [Articles] AS [a],
+        COLUMN = [Embedding],
+        SIMILAR_TO = @__queryEmbedding_2,
+        METRIC = 'cosine'
+    ) AS [v]
+    ORDER BY [v].[Distance]
+) AS [t] ON [f].[KEY] = [t].[Id]
+ORDER BY 1.0E0 / CAST(@__k_1 + [f].[RANK] AS float) + ISNULL(1.0E0 / (CAST(@__k_1 AS float) + [t].[Distance]), 0.0E0) DESC
 ```
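
Editor's note: the new `ORDER BY` expression above adds a full-text term, `1 / (k + RANK)`, to a vector term, `1 / (k + Distance)`, which falls back to zero when the `LEFT JOIN` finds no vector match. The following is a minimal in-memory sketch of that arithmetic only. The `HybridScore` class, the sample rows, and the value of `k` are hypothetical and are not part of EF Core or of this commit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the joined FREETEXTTABLE / VECTOR_SEARCH results.
// A null Distance models an article with no match on the vector side of the LEFT JOIN.
var k = 10;
var rows = new List<(int Id, int Rank, double? Distance)>
{
    (1, 120, 0.10),
    (2, 300, null),
    (3, 40, 0.60),
};

// Highest combined score first, as in the SQL "ORDER BY ... DESC".
var ordered = rows
    .OrderByDescending(r => HybridScore.Combine(k, r.Rank, r.Distance))
    .ToList();

foreach (var r in ordered)
{
    Console.WriteLine($"Article {r.Id}: {HybridScore.Combine(k, r.Rank, r.Distance):F4}");
}

static class HybridScore
{
    // Mirrors the SQL expression: 1 / (k + rank) + ISNULL(1 / (k + distance), 0).
    public static double Combine(int k, int rank, double? distance)
    {
        var fullTextPart = 1.0 / (k + rank);
        var vectorPart = distance.HasValue ? 1.0 / (k + distance.Value) : 0.0;
        return fullTextPart + vectorPart;
    }
}
```

Because both terms share the same `k`, a larger `k` flattens the difference between high and low ranks. The sketch makes no claim about how SQL Server scales `RANK` or `Distance` in practice.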

entity-framework/core/what-is-new/ef-core-11.0/whatsnew.md

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -234,15 +234,18 @@ protected override void OnModelCreating(ModelBuilder modelBuilder)
234234
}
235235
```
236236

237-
Once you have a vector index, you can use the `VectorSearch()` extension method on your `DbSet` to perform an approximate search:
237+
Once you have a vector index, you can use the `VectorSearch()` extension method on your `DbSet`, and chain `Take()` and `WithApproximate()` to perform an approximate search:
238238

239239
```csharp
240240
var blogs = await context.Blogs
241-
.VectorSearch(b => b.Embedding, embedding, "cosine", topN: 5)
241+
.VectorSearch(b => b.Embedding, embedding, "cosine")
242+
.OrderBy(r => r.Distance)
243+
.Take(5)
244+
.WithApproximate()
242245
.ToListAsync();
243246
```
244247

245-
This translates to the SQL Server [`VECTOR_SEARCH()`](/sql/t-sql/functions/vector-search-transact-sql) table-valued function, which performs an approximate search over the vector index. The `topN` parameter specifies the number of results to return.
248+
This translates to the SQL Server [`VECTOR_SEARCH()`](/sql/t-sql/functions/vector-search-transact-sql) table-valued function. `Take()` specifies the number of results to return, and `WithApproximate()` instructs SQL Server to use the vector index for approximate nearest neighbor (ANN) search, adding `WITH APPROXIMATE` to the SQL `TOP` clause. Without `WithApproximate()`, an exact k-nearest neighbor (kNN) search is performed instead.
246249

247250
`VectorSearch()` returns `VectorSearchResult<TEntity>`, allowing you to access the distance alongside the entity.
248251
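
Editor's note: the changed paragraph above contrasts approximate (ANN) search with exact k-nearest neighbor search. As a rough illustration of what "exact" means, the sketch below computes a distance to every item, sorts, and keeps the closest `k`. It runs entirely in memory and does not use EF Core or SQL Server. The `ExactSearch` class and the sample data are hypothetical, and cosine distance is assumed to be 1 minus cosine similarity.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data; in the documented scenario these would be table rows with a vector column.
var items = new (string Name, float[] Embedding)[]
{
    ("a", new[] { 1f, 0f }),
    ("b", new[] { 0f, 1f }),
    ("c", new[] { 0.9f, 0.1f }),
};
var query = new[] { 1f, 0f };

// Exact kNN: measure the distance to every item, sort ascending, keep the k closest.
var nearest = items
    .Select(i => (i.Name, Distance: ExactSearch.CosineDistance(i.Embedding, query)))
    .OrderBy(r => r.Distance)
    .Take(2)
    .ToList();

foreach (var (name, distance) in nearest)
{
    Console.WriteLine($"{name}: {distance:F4}");
}

static class ExactSearch
{
    // Cosine distance as 1 - cosine similarity; assumes non-zero vectors of equal length.
    public static double CosineDistance(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

The cost of this approach grows with the number of items, which is the trade-off an approximate index is meant to avoid at the price of possibly missing some true nearest neighbors.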
