CSHARP-5840: Proper XUnit shared test fixture for Atlas tests#2007
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the Atlas Search integration test suite to use a collection-scoped xUnit fixture that provisions its own ephemeral database, seed data, and required Atlas Search/vector indexes, eliminating reliance on pre-populated sample databases.
Changes:
- Add
AtlasSearchFixture+AtlasSearchCollectionto centralize client/event capture and lazily seed collections/indexes for Atlas Search tests. - Update Atlas Search, Vector Search, Auto-embed Vector Search, and index-management tests to consume seeded collections via the fixture and adjust assertions for the reduced deterministic corpus.
- Add a one-shot
AtlasSeedExtractortool and a conveniencedocker-compose.atlas-local.ymlfor running Atlas Local with Voyage AI auto-embedding.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/Tools/AtlasSeedExtractor/README.md | Documents the seed-extraction tool used to generate embedded C# seed literals. |
| tests/Tools/AtlasSeedExtractor/Program.cs | Implements deterministic extraction + C# emission of curated sample documents/embeddings. |
| tests/Tools/AtlasSeedExtractor/AtlasSeedExtractor.csproj | Adds the new tool project targeting net10.0. |
| tests/MongoDB.Driver.Tests/Search/VectorSearchTests.cs | Switches tests to fixture-provided client/collections and updates ranking assertions for lean seed. |
| tests/MongoDB.Driver.Tests/Search/AutoEmbedVectorSearchTests.cs | Enables real auto-embed tests behind env var gating and uses fixture-owned corpus/index. |
| tests/MongoDB.Driver.Tests/Search/AtlasSearchTestsUtils.cs | Removes the old ad-hoc Atlas Search client helper. |
| tests/MongoDB.Driver.Tests/Search/AtlasSearchTests.cs | Migrates to fixture-backed seeded collections/indexes and recalibrates corpus-dependent assertions. |
| tests/MongoDB.Driver.Tests/Search/AtlasSearchIndexManagmentTests.cs | Migrates index-helper tests to fixture-owned client and updates auto-embed gating/expectations. |
| tests/MongoDB.Driver.Tests/Search/AtlasSearchFixture.cs | Introduces the shared fixture that seeds collections and waits for indexes to become READY. |
| tests/MongoDB.Driver.Tests/Search/AtlasSearchCollection.cs | Adds the xUnit collection definition binding the fixture to test classes. |
| docker-compose.atlas-local.yml | Adds an Atlas Local compose file configured for Voyage AI auto-embedding. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
0bbc0a1 to
fc12213
Compare
| @@ -0,0 +1,18 @@ | |||
| <Project Sdk="Microsoft.NET.Sdk"> | |||
There was a problem hiding this comment.
I do agree that this tool looks useful, but do we really need it in the Driver's repo? Are we expecting to use it frequently?
Should we have a separate repo for tools like this? Somewhere under https://github.com/mongodb-labs .
I'm not insisting, just asking.
There was a problem hiding this comment.
I'll just remove it.
Replaces the suite's dependence on pre-configured Atlas sample databases and search indexes. The new AtlasSearchFixture (collection-scoped) owns a single client, an EventCapturer, and a fixture-managed atlas_search_<guid> database. Lazy seeders populate historical_documents, movies, embedded_movies, airbnb_listings, test_classes, binary_vector_items, auto_embed_movies, two synonym source collections, and the existing return-scope directors collection, and create the ten search/vector indexes required by the tests (waiting for READY). Dispose drops the whole database. The seed data is baked in as C# literals in AtlasSearchFixtureSeedData.cs (generated by tests/Tools/AtlasSeedExtractor from Atlas Local sample data once) so the test suite has zero runtime dependency on sample databases — only on an Atlas-search-capable cluster being reachable at ATLAS_SEARCH_URI. Embeddings are preserved as real Voyage-AI float32 vectors via BinaryVectorFloat32. AtlasSearchTests, VectorSearchTests, AutoEmbedVectorSearchTests, and AtlasSearchIndexManagementTests join via [Collection(AtlasSearchCollection.Name)] and pull their client / collections through the fixture. AtlasSearchTests loses its static __indexesCreated / __databaseUniquifier / __returnScopeTestCount machinery and the inline return-scope seed. AtlasSearchTestsUtils is obsoleted. AutoEmbedVectorSearchTests now run for real (the previous CSHARP-5840 skip is gone): the auto-embed collection and index live in the fixture so the Voyage AI embedding work happens once per test session instead of three times, the corpus is reduced to 4 movies (2 pass the runtime+year filter, 2 don't), and the ctor's unbounded wait loop is replaced with a 15-minute deadline that surfaces a clear error mentioning VOYAGE_API_KEY. Gated on a new AUTO_EMBEDDING_TESTS_ENABLED env var so contributors without a Voyage AI key can still run the rest of the Atlas Search suite. When enabled, the Atlas Local container must have VOYAGE_API_KEY provisioned via the Atlas UI's Model API Keys page (the key must authenticate against ai.mongodb.com, not api.voyageai.com). docker-compose.atlas-local.yml is included as a convenience launcher. Count- and ranking-dependent assertions are recalibrated for the lean corpus: phrase-match counts drop from 108 to 1, geo-result counts from 25 to membership checks, and vector-search top-K assertions switch from order-sensitive ShouldBeEquivalentTo to order-insensitive BeEquivalentTo. The Pinocchio→Oz swap in VectorSearch top-5 is documented in the test where it lives. The Rerank test guards on AtlasSearchFixture.IsRerankSupported, which probes the deployment on first use. Validated against Atlas Local: 77/77 AtlasSearch* tests pass, 36/36 AtlasSearchIndexManagement tests pass, 3/3 AutoEmbedVectorSearchTests pass (with a valid VOYAGE_API_KEY), 74 tests skip cleanly when env vars are unset, and no atlas_search_* databases remain after the suite exits. [review-iter 1] Address review findings [review-iter 2] Address review findings
The AtlasSeedExtractor project was a one-shot generator that produced the baked-in literals in AtlasSearchFixtureSeedData.cs. That seed data is now static and checked in, so the tool is no longer needed: delete the project and drop the generator references from the seed-data file header. Also collapse the eight near-identical double-checked-locking blocks in AtlasSearchFixture into a single EnsureInitialized(isInitialized, initialize) helper, and mark the per-collection init guards volatile so the lock-free fast-path read is correctly published. Behavior is unchanged: the shared _initLock still serializes seeding, and a throwing seeder still leaves its guard unset so the next access retries (unlike Lazy<T>, which would cache and re-throw the failure).
Replaces the suite's dependence on pre-configured Atlas sample databases and search indexes. The new AtlasSearchFixture (collection-scoped) owns a single client, an EventCapturer, and a fixture-managed atlas_search_ database. Lazy seeders populate historical_documents, movies, embedded_movies, airbnb_listings, test_classes, binary_vector_items, auto_embed_movies, two synonym source collections, and the existing return-scope directors collection, and create the ten search/vector indexes required by the tests (waiting for READY). Dispose drops the whole database.
The seed data is baked in as C# literals in AtlasSearchFixtureSeedData.cs (generated by tests/Tools/AtlasSeedExtractor from Atlas Local sample data once) so the test suite has zero runtime dependency on sample databases — only on an Atlas-search-capable cluster being reachable at ATLAS_SEARCH_URI. Embeddings are preserved as real Voyage-AI float32 vectors via BinaryVectorFloat32.
AtlasSearchTests, VectorSearchTests, AutoEmbedVectorSearchTests, and AtlasSearchIndexManagementTests join via [Collection(AtlasSearchCollection.Name)] and pull their client / collections through the fixture. AtlasSearchTests loses its static __indexesCreated / __databaseUniquifier / __returnScopeTestCount machinery and the inline return-scope seed. AtlasSearchTestsUtils is obsoleted.
AutoEmbedVectorSearchTests now run for real (the previous CSHARP-5840 skip is gone): the auto-embed collection and index live in the fixture so the Voyage AI embedding work happens once per test session instead of three times, the corpus is reduced to 4 movies (2 pass the runtime+year filter, 2 don't), and the ctor's unbounded wait loop is replaced with a 15-minute deadline that surfaces a clear error mentioning VOYAGE_API_KEY. Gated on a new AUTO_EMBEDDING_TESTS_ENABLED env var so contributors without a Voyage AI key can still run the rest of the Atlas Search suite. When enabled, the Atlas Local container must have VOYAGE_API_KEY provisioned via the Atlas UI's Model API Keys page (the key must authenticate against ai.mongodb.com, not api.voyageai.com). docker-compose.atlas-local.yml is included as a convenience launcher.
Count- and ranking-dependent assertions are recalibrated for the lean corpus: phrase-match counts drop from 108 to 1, geo-result counts from 25 to membership checks, and vector-search top-K assertions switch from order-sensitive ShouldBeEquivalentTo to order-insensitive BeEquivalentTo. The Pinocchio→Oz swap in VectorSearch top-5 is documented in the test where it lives. The Rerank test guards on AtlasSearchFixture.IsRerankSupported, which probes the deployment on first use.
Validated against Atlas Local: 77/77 AtlasSearch* tests pass, 36/36 AtlasSearchIndexManagement tests pass, 3/3 AutoEmbedVectorSearchTests pass (with a valid VOYAGE_API_KEY), 74 tests skip cleanly when env vars are unset, and no atlas_search_* databases remain after the suite exits.