
Commit c127bad

Merge pull request #166 from dotnet-presentations/copilot/fix-c4819c2b-ea66-46bb-b088-26adc348171e
Update Part 3 code snippets and explanations to match Part 2 template implementation
2 parents 337c229 + ada5e3f commit c127bad

1 file changed

Lines changed: 96 additions & 99 deletions

Part 3 - Template Exploration/README.md

Original file line number | Diff line number | Diff line change
@@ -8,7 +8,7 @@ In this workshop, you'll explore the code structure of the AI Web Chat template.
88

99
## Services in .NET Aspire AppHost Program.cs
1010

11-
Let's start by examining the `Program.cs` file in the `GenAiLab.AppHost` project:
11+
Let's start by examining the [`AppHost.cs`](../Part%202%20-%20Project%20Creation/GenAiLab/GenAiLab.AppHost/AppHost.cs) file in the `GenAiLab.AppHost` project:
1212

1313
```csharp
1414
var builder = DistributedApplication.CreateBuilder(args);
@@ -40,7 +40,7 @@ Key components in the AppHost:
4040

4141
## Application configuration in Web Program.cs
4242

43-
Now let's look at the `Program.cs` file in the `GenAiLab.Web` project:
43+
Now let's look at the [`Program.cs`](../Part%202%20-%20Project%20Creation/GenAiLab/GenAiLab.Web/Program.cs) file in the `GenAiLab.Web` project:
4444

4545
```csharp
4646
using Microsoft.Extensions.AI;
@@ -99,8 +99,8 @@ app.Run();
9999
Key components in the Web Program.cs:
100100

101101
1. **Service Registration**: Setting up Razor components, service defaults, etc.
102-
1. **GitHub Models Setup**:
103-
- Adding GitHub Models as the AI provider
102+
1. **Azure OpenAI Setup**:
103+
- Adding Azure OpenAI client with connection string reference
104104
- Configuring a chat client with the "gpt-4o-mini" model
105105
- Setting up an embedding generator with "text-embedding-3-small" model
106106
1. **Qdrant Client**: Connecting to the Qdrant vector database
@@ -116,27 +116,41 @@ The `IChatClient` interface is a key part of Microsoft Extensions for AI. Let's
116116

117117
```csharp
118118
// Configuration in Program.cs
119-
var openai = builder.AddGitHubModels();
119+
var openai = builder.AddAzureOpenAIClient("openai");
120120
openai.AddChatClient("gpt-4o-mini")
121121
.UseFunctionInvocation()
122122
.UseOpenTelemetry(configure: c =>
123123
c.EnableSensitiveData = builder.Environment.IsDevelopment());
124124
```
125125

126-
The `IChatClient` is used in the `Chat.razor` component to handle user messages and generate AI responses:
126+
The `IChatClient` is used in the [`Chat.razor`](../Part%202%20-%20Project%20Creation/GenAiLab/GenAiLab.Web/Components/Pages/Chat/Chat.razor#L58-L84) component to handle user messages and generate AI responses:
127127

128128
```csharp
129129
@code {
130-
[Inject]
131-
private IChatClient ChatClient { get; set; } = default!;
130+
@inject IChatClient ChatClient
132131

133-
private async Task HandleUserMessageAsync(string userMessage)
132+
private async Task AddUserMessageAsync(ChatMessage userMessage)
134133
{
135-
// ...
136-
var response = await ChatClient.GetResponseAsync(
137-
SystemPrompt,
138-
chatHistory.Select(m => new ChatMessage(m.Role, m.Content)).ToArray());
139-
// ...
134+
// Add the user message to the conversation
135+
messages.Add(userMessage);
136+
137+
// Stream and display a new response from the IChatClient
138+
var responseText = new TextContent("");
139+
currentResponseMessage = new ChatMessage(ChatRole.Assistant, [responseText]);
140+
currentResponseCancellation = new();
141+
await foreach (var update in ChatClient.GetStreamingResponseAsync(
142+
messages.Skip(statefulMessageCount), chatOptions, currentResponseCancellation.Token))
143+
{
144+
messages.AddMessages(update, filter: c => c is not TextContent);
145+
responseText.Text += update.Text;
146+
chatOptions.ConversationId = update.ConversationId;
147+
ChatMessageItem.NotifyChanged(currentResponseMessage);
148+
}
149+
150+
// Store the final response in the conversation
151+
messages.Add(currentResponseMessage!);
152+
statefulMessageCount = chatOptions.ConversationId is not null ? messages.Count : 0;
153+
currentResponseMessage = null;
140154
}
141155
}
142156
```
@@ -168,7 +182,7 @@ These collections store:
168182

169183
### DataIngestor Service with Vector Collections
170184

171-
Let's examine how the `DataIngestor.cs` uses vector collections directly:
185+
Let's examine how the [`DataIngestor.cs`](../Part%202%20-%20Project%20Creation/GenAiLab/GenAiLab.Web/Services/Ingestion/DataIngestor.cs#L18-L57) uses vector collections directly:
172186

173187
```csharp
174188
public class DataIngestor(
@@ -209,7 +223,7 @@ public class DataIngestor(
209223
{
210224
var documentId = document.DocumentId;
211225
var chunksToDelete = await chunksCollection.GetAsync(record => record.DocumentId == documentId, int.MaxValue).ToListAsync();
212-
if (chunksToDelete.Any())
226+
if (chunksToDelete.Count != 0)
213227
{
214228
await chunksCollection.DeleteAsync(chunksToDelete.Select(r => r.Key));
215229
}
@@ -233,130 +247,113 @@ The template uses several vector collection methods:
233247
- `DeleteAsync()`: Remove documents and their associated chunks
234248
- `EnsureCollectionExistsAsync()`: Create collections if they don't exist
235249

236-
### SemanticSearchRecord for Vector Storage
250+
### IngestedChunk for Vector Storage
237251

238-
The `SemanticSearchRecord.cs` file shows how data is structured for vector storage:
252+
The [`IngestedChunk.cs`](../Part%202%20-%20Project%20Creation/GenAiLab/GenAiLab.Web/Services/IngestedChunk.cs) file shows how data is structured for vector storage:
239253

240254
```csharp
241255
namespace GenAiLab.Web.Services;
242256

243-
public class SemanticSearchRecord
257+
public class IngestedChunk
244258
{
245-
[VectorStoreRecordKey]
259+
private const int VectorDimensions = 1536; // 1536 is the default vector size for the OpenAI text-embedding-3-small model
260+
private const string VectorDistanceFunction = DistanceFunction.CosineSimilarity;
261+
262+
[VectorStoreKey]
246263
public required Guid Key { get; set; }
247264

248-
[VectorStoreRecordData(IsFilterable = true)]
249-
public required string FileName { get; set; }
265+
[VectorStoreData(IsIndexed = true)]
266+
public required string DocumentId { get; set; }
250267

251-
[VectorStoreRecordData]
268+
[VectorStoreData]
252269
public int PageNumber { get; set; }
253270

254-
[VectorStoreRecordData]
271+
[VectorStoreData]
255272
public required string Text { get; set; }
256273

257-
[VectorStoreRecordVector(1536, DistanceFunction.CosineSimilarity)] // 1536 is the default vector size for the OpenAI text-embedding-3-small model
258-
public ReadOnlyMemory<float> Vector { get; set; }
274+
[VectorStoreVector(VectorDimensions, DistanceFunction = VectorDistanceFunction)]
275+
public string? Vector => Text;
259276
}
260277
```
261278

262279
This class represents the data stored in the vector database with specific attributes for vector storage:
263280

264-
- `Key`: The unique identifier for the record, marked with `[VectorStoreRecordKey]`
265-
- `FileName`: The source document's name, marked as filterable with `[VectorStoreRecordData(IsFilterable = true)]`
281+
- `Key`: The unique identifier for the record, marked with `[VectorStoreKey]`
282+
- `DocumentId`: The source document's identifier, marked as indexed with `[VectorStoreData(IsIndexed = true)]`
266283
- `PageNumber`: The page number in the source document
267284
- `Text`: A chunk of text from the document
268-
- `Vector`: The embedding vector configured for the OpenAI text-embedding-3-small model's 1536 dimensions using cosine similarity
285+
- `Vector`: The embedding vector, configured for the 1536 dimensions of the OpenAI text-embedding-3-small model and compared using cosine similarity. The property returns `Text`, which the vector store embeds automatically when the record is stored.
269286

270-
The `SemanticSearch.cs` file shows how these records are queried:
287+
The [`SemanticSearch.cs`](../Part%202%20-%20Project%20Creation/GenAiLab/GenAiLab.Web/Services/SemanticSearch.cs) file shows how these records are queried:
271288

272289
```csharp
273290
public class SemanticSearch(
274-
IEmbeddingGenerator<string, Embedding<float>> embedder,
275-
IVectorStore vectorStore,
276-
ILogger<SemanticSearch> logger)
291+
VectorStoreCollection<Guid, IngestedChunk> vectorCollection)
277292
{
278-
private const string CollectionName = "data-genailab-ingested";
279-
280-
public async Task<SearchResults> Search(string query)
293+
public async Task<IReadOnlyList<IngestedChunk>> SearchAsync(string text, string? documentIdFilter, int maxResults)
281294
{
282-
try
283-
{
284-
// Generate an embedding vector for the query
285-
var queryEmbedding = await embedder.GenerateEmbeddingVectorAsync(query);
286-
287-
// Search the vector database for similar document chunks
288-
var collection = vectorStore.GetCollection<Guid, SemanticSearchRecord>(CollectionName);
289-
var searchResults = await collection.VectorizedSearchAsync(
290-
queryEmbedding,
291-
new VectorSearchOptions<SemanticSearchRecord> { Top = 5 }
292-
);
293-
294-
// Process and return results
295-
var results = new List<DocumentResult>();
296-
await foreach (var match in searchResults.Results)
297-
{
298-
results.Add(new DocumentResult
299-
{
300-
FileName = match.Record.FileName,
301-
Text = match.Record.Text,
302-
Score = match.Score
303-
});
304-
}
305-
306-
return new SearchResults(results);
307-
}
308-
catch (Exception ex)
295+
var nearest = vectorCollection.SearchAsync(text, maxResults, new VectorSearchOptions<IngestedChunk>
309296
{
310-
logger.LogError(ex, "Error performing semantic search");
311-
return new SearchResults(new List<DocumentResult>());
312-
}
297+
Filter = documentIdFilter is { Length: > 0 } ? record => record.DocumentId == documentIdFilter : null,
298+
});
299+
300+
return await nearest.Select(result => result.Record).ToListAsync();
313301
}
314302
}
315303
```
316304

305+
Key features of semantic search:
306+
307+
1. **Automatic Embedding**: The `text` argument is automatically converted to an embedding vector before the search runs
308+
2. **Vector Similarity**: Finds the most similar chunks using the embedding vector
309+
3. **Optional Filtering**: Can filter results by document ID if specified
310+
4. **Direct Results**: Returns the actual `IngestedChunk` records with their text content
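
To make the similarity and filtering steps above concrete, here is a minimal, self-contained sketch of filtered nearest-neighbor search over in-memory records. It does not use the template's `VectorStoreCollection`: the `Chunk` record, the `InMemorySearch` class, and the precomputed embeddings are hypothetical stand-ins, and a real vector database such as Qdrant would use an index rather than the linear scan shown here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for IngestedChunk that carries a precomputed embedding.
public record Chunk(string DocumentId, string Text, float[] Embedding);

public static class InMemorySearch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Note: returns NaN if either vector is all zeros.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Linear-scan search: filter (optionally) by document ID, rank by similarity,
    // and return at most maxResults records.
    public static IReadOnlyList<Chunk> Search(
        IEnumerable<Chunk> chunks, float[] queryEmbedding, string? documentIdFilter, int maxResults)
    {
        var candidates = documentIdFilter is { Length: > 0 }
            ? chunks.Where(c => c.DocumentId == documentIdFilter)
            : chunks;

        return candidates
            .OrderByDescending(c => CosineSimilarity(c.Embedding, queryEmbedding))
            .Take(maxResults)
            .ToList();
    }
}
```

Passing `null` or an empty string as `documentIdFilter` searches every chunk, which mirrors the conditional `Filter` expression in `SemanticSearch`.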
311+
317312
## Document Ingestion and Embeddings with Vector Collections
318313

319-
Let's examine how embeddings are generated during document ingestion using the new vector collection approach. The `PDFDirectorySource` creates chunks and the `DataIngestor` processes them:
314+
Document ingestion is handled by the `DataIngestor` service working with `IIngestionSource` implementations. The `PDFDirectorySource` processes PDF files and creates chunks that are stored directly in vector collections.
315+
316+
### How Ingestion Works
317+
318+
When the application starts, it processes documents from the specified source:
319+
320+
```csharp
321+
await DataIngestor.IngestDataAsync(
322+
app.Services,
323+
new PDFDirectorySource(Path.Combine(builder.Environment.WebRootPath, "Data")));
324+
```
325+
326+
The ingestion process:
327+
328+
1. **Checks for Changes**: Compares current documents with previously ingested documents
329+
2. **Removes Deleted Documents**: If a document was removed, deletes its chunks and metadata
330+
3. **Processes New/Modified Documents**: For each changed document:
331+
- Removes old chunks if the document was previously ingested
332+
- Creates a new `IngestedDocument` metadata record
333+
- Splits the document into chunks
334+
- Creates `IngestedChunk` records with text content
335+
- Stores chunks in the vector collection (embeddings are generated automatically)
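
The change-detection step at the start of this process can be sketched as follows. This is an illustrative sketch only, not the template's `IIngestionSource` code: the `DocumentInfo` record, the `ChangeDetection` class, and the idea of comparing a version string per document are assumptions made for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical metadata record: an ID plus a version marker (for example, a timestamp string).
public record DocumentInfo(string DocumentId, string Version);

public static class ChangeDetection
{
    // Compares the documents currently in the source with those ingested earlier.
    // Returns the documents to (re)ingest and the documents whose chunks should be removed.
    public static (List<DocumentInfo> NewOrModified, List<DocumentInfo> Deleted) Diff(
        IReadOnlyCollection<DocumentInfo> current,
        IReadOnlyCollection<DocumentInfo> previouslyIngested)
    {
        var previousById = previouslyIngested.ToDictionary(d => d.DocumentId);
        var currentIds = current.Select(d => d.DocumentId).ToHashSet();

        // New: not seen before. Modified: seen before, but with a different version.
        var newOrModified = current
            .Where(d => !previousById.TryGetValue(d.DocumentId, out var old) || old.Version != d.Version)
            .ToList();

        // Deleted: ingested earlier but no longer present in the source.
        var deleted = previouslyIngested
            .Where(d => !currentIds.Contains(d.DocumentId))
            .ToList();

        return (newOrModified, deleted);
    }
}
```

Unchanged documents appear in neither list, so they are not re-chunked or re-embedded on startup.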
336+
337+
### Automatic Vector Generation
338+
339+
A key feature is that embeddings are generated automatically:
320340

321341
```csharp
322-
public async Task<IEnumerable<IngestedChunk>> CreateChunksForDocumentAsync(IngestedDocument document)
342+
public class IngestedChunk
323343
{
324-
// Get the document content and split into chunks
325-
var chunks = SplitDocumentIntoChunks(document.Content);
326-
var ingestedChunks = new List<IngestedChunk>();
327-
328-
foreach (var (chunk, pageNumber) in chunks)
329-
{
330-
// Skip empty chunks
331-
if (string.IsNullOrWhiteSpace(chunk)) continue;
332-
333-
// Create the ingested chunk record
334-
var ingestedChunk = new IngestedChunk
335-
{
336-
Key = Guid.NewGuid(),
337-
DocumentId = document.DocumentId,
338-
Text = chunk,
339-
PageNumber = pageNumber,
340-
// Vector will be generated automatically by the vector collection
341-
};
342-
343-
ingestedChunks.Add(ingestedChunk);
344-
}
344+
// ... other properties ...
345345
346-
return ingestedChunks;
346+
[VectorStoreVector(VectorDimensions, DistanceFunction = VectorDistanceFunction)]
347+
public string? Vector => Text;
347348
}
348349
```
349350

350-
Key steps in the new vector-based workflow:
351-
352-
1. Documents are retrieved from a source (like PDFs in the wwwroot/Data directory)
353-
1. Each document is split into smaller chunks for better search precision
354-
1. For each chunk, an `IngestedChunk` record is created with the text content
355-
1. The embedding vectors are generated automatically when the chunks are stored in the vector collection
356-
1. Both document metadata and chunks are stored directly in vector collections
357-
1. During search, query text is converted to an embedding, and vector similarity finds relevant chunks
351+
When an `IngestedChunk` is stored via `chunksCollection.UpsertAsync()`, the vector collection automatically:
352+
1. Takes the `Text` property value (returned by the `Vector` property)
353+
2. Generates an embedding using the configured embedding generator
354+
3. Stores both the text and its embedding vector
358355

359-
This approach eliminates the need for a separate database to track ingestion state, as the vector collections handle both storage and retrieval of document chunks and their metadata.
356+
This approach eliminates the need for manual embedding generation and ensures that every document chunk is embedded by the same configured generator.
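
The embed-on-upsert behavior described above can be pictured with a small in-memory stand-in. This is a conceptual sketch, not the vector store library's implementation: `TextChunk`, `EmbeddingCollection`, and the synchronous `Func<string, float[]>` embedder are all hypothetical names introduced for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical record: callers supply only a key and text, never a vector.
public record TextChunk(Guid Key, string Text);

public class EmbeddingCollection
{
    private readonly Func<string, float[]> _embed;
    private readonly Dictionary<Guid, (TextChunk Chunk, float[] Embedding)> _records = new();

    // The embedding function is configured once, when the collection is created.
    public EmbeddingCollection(Func<string, float[]> embed) => _embed = embed;

    public int Count => _records.Count;

    // On upsert, the collection computes the embedding from the text
    // and stores the record and its vector together.
    public void Upsert(TextChunk chunk)
        => _records[chunk.Key] = (chunk, _embed(chunk.Text));

    public float[] GetEmbedding(Guid key) => _records[key].Embedding;
}
```

Because the embedding function is fixed at construction time, callers cannot accidentally store a vector produced by a different model, and upserting the same key again replaces both the text and its vector.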
360357

361358
## What You've Learned
362359
