feat: Add select-algorithm samples for DocumentDB vector index selection (5 languages) #74

Merged: diberry merged 11 commits into Azure-Samples:main from diberry:article2/select-algorithm on May 20, 2026


@diberry diberry commented Apr 29, 2026

Summary

Adds select-algorithm samples demonstrating how to choose the optimal vector index algorithm (HNSW, IVF, DiskANN) for Azure Cosmos DB for MongoDB (vCore) / DocumentDB, in 5 languages: TypeScript, Python, Go, Java, and .NET.

What's included

| Language | Path |
| --- | --- |
| TypeScript | ai/select-algorithm-typescript/ |
| Python | ai/select-algorithm-python/ |
| Go | ai/select-algorithm-go/ |
| Java | ai/select-algorithm-java/ |
| .NET | ai/select-algorithm-dotnet/ |

Each sample includes:

  • select-algorithm — Creates vector indexes using documented default parameters
  • compare-all — Compares search quality/recall across all three algorithms
  • Utility helpers — Connection, data loading, index management
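
The PR text does not say how compare-all scores search quality. As a rough illustration only, recall at k is one common way to compare an approximate index against an exact baseline. The function name and the idea of passing ranked ID lists are assumptions, not taken from the samples.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbours that the approximate search also returned.

    Both arguments are ranked lists of document IDs (hypothetical inputs);
    only the first k entries of each are considered.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = set(approx_ids[:k]) & truth
    return len(found) / len(truth)
```

A value of 1.0 means the approximate index returned every one of the exact top-k results. Lower values show how many were missed.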

CI

  • validate-samples.yml — Builds all 5 language samples on PR and push

Key parameters (from docs)

| Algorithm | Parameter | Value |
| --- | --- | --- |
| HNSW | m | 16 |
| HNSW | efConstruction | 64 |
| DiskANN | maxDegree | 32 |
| DiskANN | lBuild | 50 |
| IVF | numLists | 1 |
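
As a minimal sketch of how the parameter values listed above might be turned into an index definition, the helper below builds a `createIndexes` command document. The parameter values come from this PR description. The `kind` strings, the `cosmosSearch` key type, the `cosmosSearchOptions` field, and the `COS` similarity name are assumptions based on the public vector search documentation, and the helper and index names are hypothetical, not the samples' own code.

```python
# Parameter values are from the PR description; option and "kind" names are
# assumptions drawn from the public vector search documentation.
INDEX_OPTIONS = {
    "hnsw": {"kind": "vector-hnsw", "m": 16, "efConstruction": 64},
    "diskann": {"kind": "vector-diskann", "maxDegree": 32, "lBuild": 50},
    "ivf": {"kind": "vector-ivf", "numLists": 1},
}


def build_create_index_command(collection, algorithm, vector_field, dimensions, similarity="COS"):
    """Return a createIndexes command document for one vector index (hypothetical helper)."""
    # Copy so the shared defaults are never mutated; unknown algorithms raise KeyError.
    options = dict(INDEX_OPTIONS[algorithm])
    options.update({"similarity": similarity, "dimensions": dimensions})
    return {
        "createIndexes": collection,
        "indexes": [
            {
                "name": f"vector_{algorithm}",  # illustrative index name
                "key": {vector_field: "cosmosSearch"},
                "cosmosSearchOptions": options,
            }
        ],
    }
```

With pymongo, a document like this would typically be passed to `db.command(...)`. The collection name, field name, and dimension count depend on your own data.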

Documentation references

@diberry diberry force-pushed the article2/select-algorithm branch from 45387bd to 5114591 Compare April 29, 2026 19:20
@diberry diberry changed the title from "feat: Article 2 - Select Algorithm samples (5 languages)" to "feat: Article 2/3 - Select Algorithm samples (5 languages)" May 6, 2026
@diberry diberry force-pushed the article2/select-algorithm branch 2 times, most recently from ed818fa to d0e7e60 Compare May 11, 2026 21:36
@diberry diberry changed the title from "feat: Article 2/3 - Select Algorithm samples (5 languages)" to "feat: Add select-algorithm samples for DocumentDB vector index selection (5 languages)" May 11, 2026
@diberry diberry force-pushed the article2/select-algorithm branch 2 times, most recently from 2ec45b0 to ce6ff95 Compare May 11, 2026 22:21
Add DocumentDB vector index algorithm selection samples demonstrating
HNSW, IVF, and DiskANN index types across TypeScript, Python, Go,
Java, and .NET. Each sample creates indexes with documented defaults,
performs vector searches, and compares results.

CI updated to validate all new samples in the existing workflow matrix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@diberry diberry force-pushed the article2/select-algorithm branch from ce6ff95 to 5e0eec4 Compare May 11, 2026 22:41
diberry and others added 9 commits May 15, 2026 12:34
- Rename MONGO_CLUSTER_NAME to DOCUMENTDB_CLUSTER_NAME in all 5 language samples
- Add DOCUMENTDB_CLUSTER_NAME dual-output in Bicep (preserves backward compat)
- Replace Data Explorer cleanup guidance with VS Code extension
- Strengthen algorithm guidance: DiskANN recommended for enterprise (16K dims, disk-based)
- Remove python-dotenv from pip install (repo rule Azure-Samples#10)
- Fix Python filename refs (select_algorithm.py -> compare_all.py)
- Revert out-of-scope vector-search-* changes to origin/main

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use 'Azure Databases extension' consistently in all 5 language quickstarts
to match the actual marketplace listing (ms-azuretools.vscode-cosmosdb).
The section intro previously said 'Azure DocumentDB extension' while the
link tab already used the correct 'Azure Databases extension' name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace Azure Databases extension (ms-azuretools.vscode-cosmosdb) with
DocumentDB for VS Code (ms-azuretools.vscode-documentdb) per Khelan's
PM feedback to align with recommended developer experience.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tructions

- Add rule 14: always use ms-azuretools.vscode-documentdb
- Remove ms-azuretools.vscode-cosmosdb exception from rule 1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Create ai/includes/choosing-algorithm.md with enhanced content
- Add quick-reference decision table (IVF/DiskANN/HNSW by scenario)
- Elevate DiskANN-as-default recommendation with IMPORTANT callout
- Add operational benefits: easier backups, faster recovery
- Add dimension future-proofing context (models evolving past 8K)
- Replace duplicated sections in all 5 quickstarts with include ref
- Addresses Khelan Modi feedback points #3 and #4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Checklist covers branding/naming, tooling references, index selection
guidance, and DiskANN-as-default requirements. Derived from Khelan Modi
(DocumentDB PM) feedback on PR Azure-Samples#74.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add bounded retry logic (5 attempts, 2s backoff) for index readiness in all 5 languages
- Fix Go: validate LOAD_SIZE_BATCH/EMBEDDING_DIMENSIONS > 0, track comparison failures
- Fix TypeScript: exit non-zero on total failure, remove 'all' as valid algo/similarity value
- Fix Python quickstart: correct download URL path (ai/data/ not data/)
- Standardize data file path guidance across all quickstarts
- Remove ALGORITHM=all / SIMILARITY=all from all docs (use unset for all combos)
- Fix quickstart entrypoints to match actual code (TS, Java, Go, .NET)
- Replace .NET appsettings real values with placeholders, document Section__Key overrides
- Align copilot-instructions: DiskANN 32/50 for select-algorithm, document naming exception

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
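
The bounded retry for index readiness described in this commit (5 attempts, 2s backoff) could be sketched as follows. This is not the samples' code: the function name and the `check` callback are hypothetical, and the sketch assumes a fixed 2-second pause between attempts, which the commit message does not confirm.

```python
import time


def wait_until_ready(check, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call check() up to `attempts` times; return True as soon as it succeeds.

    `check` is a hypothetical zero-argument callable that returns True once
    the index can be queried. `sleep` is injectable so tests need not wait.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            # Pause only between attempts, never after the final one.
            sleep(delay_seconds)
    return False
```

Returning False rather than raising lets the caller decide whether an index that never became ready counts as a failure for that run.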
…v patterns

- Add exit-code-on-all-fail to .NET, Java, Python (matching Go/TS)
- Replace all 5 quickstart output blocks with actual output/*.txt content
- Fix file tree layouts to match actual project structure
- Fix version refs: .NET 8 (not 9), Java 17 (not 21)
- Remove dotenv/.env-file patterns (Java dotenv, TS --env-file)
- Fix devcontainer extensions: vscode-cosmosdb -> vscode-documentdb
- Fix Python CosmosDB branding -> DocumentDB
- Standardize TS retry to 6 attempts, remove fixed waits
- Make TS scalar indexes optional (skip in compare-all)
- Clarify compare-all always runs 9 combos (ignores ALGORITHM/SIMILARITY)
- Add Diff column explanation to all quickstarts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
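
The nine-combination run and the exit-code-on-all-fail behavior mentioned in this commit might look roughly like the sketch below. The three algorithm names come from the PR. The similarity names (`COS`, `L2`, `IP`), the `run_one` callback, and both function names are assumptions for illustration.

```python
from itertools import product

ALGORITHMS = ("ivf", "hnsw", "diskann")
SIMILARITIES = ("COS", "L2", "IP")  # assumed metric names, not confirmed by the PR


def run_all(run_one):
    """Run every algorithm/similarity pair and record which ones succeeded.

    `run_one(algorithm, similarity)` is a hypothetical callback that raises
    on failure. One failing pair does not stop the remaining pairs.
    """
    results = {}
    for algorithm, similarity in product(ALGORITHMS, SIMILARITIES):
        try:
            run_one(algorithm, similarity)
            results[(algorithm, similarity)] = True
        except Exception:
            results[(algorithm, similarity)] = False
    return results


def exit_code(results):
    """Return 0 if at least one combination succeeded, 1 if every one failed."""
    return 0 if any(results.values()) else 1
```

A script could end with `sys.exit(exit_code(run_all(run_one)))` so that CI flags a run where nothing succeeded while still tolerating partial failures.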
- Python: updated scores to match pymongo float precision (0.6183/0.5057/0.8735/0.9942)
- .NET: added Summary line and Done footer to output
- Java: fixed output order (table before cleanup), DISKANN casing, added Summary
- devcontainer: removed stale vscode-cosmosdb extension
- appsettings.json: reverted to placeholder values

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…count phrasing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@diberry diberry merged commit 86822cb into Azure-Samples:main May 20, 2026
14 checks passed