diff --git a/docs/docs/en/src/SUMMARY.md b/docs/docs/en/src/SUMMARY.md index 312086da1..a29657543 100644 --- a/docs/docs/en/src/SUMMARY.md +++ b/docs/docs/en/src/SUMMARY.md @@ -56,6 +56,7 @@ # Resources +- [Migrating to VSAG 1.0](resources/migration_to_1_0.md) - [Release Notes](resources/release_notes.md) - [Roadmap](resources/roadmap_2025.md) - [Community](resources/community.md) diff --git a/docs/docs/en/src/resources/migration_to_1_0.md b/docs/docs/en/src/resources/migration_to_1_0.md new file mode 100644 index 000000000..d03935f53 --- /dev/null +++ b/docs/docs/en/src/resources/migration_to_1_0.md @@ -0,0 +1,328 @@ +# Migrating to VSAG 1.0 + +This page collects everything users coming from **VSAG 0.18.x** (and +earlier) need to know to upgrade smoothly to **VSAG 1.0**. Read it before +you recompile or redeploy. + +> Tracked in +> [issue #2069](https://github.com/antgroup/vsag/issues/2069) / +> [PR #2070](https://github.com/antgroup/vsag/pull/2070). Corrections +> and "what we hit during the upgrade" feedback are welcome — please +> open an issue. + +> 1.0 follows [Semantic Versioning](https://semver.org/). The compatibility +> rules going forward will be spelled out in a dedicated *API Stability* +> page, planned as a follow-up PR tracked in +> [#2069](https://github.com/antgroup/vsag/issues/2069); this page focuses +> on the one-time 0.18 → 1.0 migration. 
+ +## At a glance + +| Area | Status in 1.0 | Action | +|------|---------------|--------| +| `hnsw` index | Deprecated, still works | Move new deployments to [HGraph](../indexes/hgraph.md) | +| `diskann` index | Deprecated, still works | Move new deployments to [IVF](../indexes/ivf.md) or the [hybrid memory-disk index](../advanced/hybrid_index.md) | +| `Index::KnnSearch(query, k, SearchParam&)` | Deprecated overload | Switch to `Index::SearchWithRequest(SearchRequest)` | +| `SearchParam::allocator` | Deprecated field | Use `SearchRequest::search_allocator_` | +| `Index::CalDistanceById` (batch) | Kept (typo'd name) | Continue to use; a correctly-spelled `CalcDistancesById` is planned (see [#2068](https://github.com/antgroup/vsag/issues/2068)) | +| Serialized indexes from 0.18.x | Readable by 1.0 | Re-serialize after upgrade to pick up any layout improvements | +| Public C ABI | Stable | No action | + +The rest of this page expands each row with concrete code samples. + +## Deprecated indexes + +### `hnsw` → `hgraph` + +`hnsw` is the original graph-based index inherited from hnswlib. In 1.0 it +is retained for backward compatibility but **deprecated**; new deployments +should use [HGraph](../indexes/hgraph.md), which is a superset: + +- Same hierarchical-graph topology, with the same `max_degree` / + `ef_construction` / `ef_search` knobs. +- A unified `index_param` build schema with richer quantization options + (`fp32`, `fp16`, `bf16`, `sq8`, `sq8_uniform`, `sq4_uniform`, `pq`, + `pqfs`, `rabitq`). +- Optional re-ranking (`use_reorder` + `precise_quantization_type`), + duplicate handling, `Remove()`, and ELP-based runtime tuning. 
+ +Build-time mapping: + +```diff +- auto index = vsag::Factory::CreateIndex("hnsw", R"({ +- "dim": 768, +- "dtype": "float32", +- "metric_type": "ip", +- "hnsw": { +- "max_degree": 32, +- "ef_construction": 400 +- } +- })").value(); ++ auto index = vsag::Factory::CreateIndex("hgraph", R"({ ++ "dim": 768, ++ "dtype": "float32", ++ "metric_type": "ip", ++ "index_param": { ++ "base_quantization_type": "fp32", ++ "max_degree": 32, ++ "ef_construction": 400 ++ } ++ })").value(); +``` + +Search-time mapping: + +```diff +- auto result = index->KnnSearch(query, k, R"({"hnsw": {"ef_search": 100}})").value(); ++ auto result = index->KnnSearch(query, k, R"({"hgraph": {"ef_search": 100}})").value(); +``` + +Two things to remember: + +1. The build sub-object key changes from `"hnsw"` to `"index_param"`, and + `base_quantization_type` becomes a required field. +2. The search sub-object key also changes from `"hnsw"` to `"hgraph"`. + +### `diskann` → `ivf` or hybrid memory-disk + +`diskann` provided memory-disk hybrid retrieval with PQ-in-memory and +full vectors on disk. In 1.0 it is **deprecated**; choose one of: + +- [IVF](../indexes/ivf.md) — for partition-based search at scale; the + natural in-memory replacement when your dataset fits in RAM. +- [Hybrid memory-disk index](../advanced/hybrid_index.md) — when you + genuinely need part of the index on NVMe (large corpora under tight + memory budgets). + +Pick IVF first; only fall back to the disk-resident hybrid configuration +if you have measured that memory is the binding constraint. + +### `hnsw` and `diskann` examples are no longer the primary references + +The on-website pages [Creating an Index](../guide/create_index.md), +[Index Parameters](index_parameters.md), and +[Serialization](../advanced/serialization.md) will be updated to use +`hgraph` as the default example in follow-up PRs tracked in +[#2069](https://github.com/antgroup/vsag/issues/2069). 
The legacy +examples remain in `examples/cpp/101_index_hnsw.cpp` and +`examples/cpp/102_index_diskann.cpp` for reference. + +## Deprecated search API: `SearchParam` → `SearchRequest` + +VSAG accumulated several `Index::KnnSearch` overloads over time. The 1.0 +public API converges on a single entry point that carries **all** search +options through one struct: + +```cpp +[[nodiscard]] tl::expected<DatasetPtr, Error> +SearchWithRequest(const SearchRequest& request) const; +``` + +`SearchRequest` (declared in [`include/vsag/search_request.h`](https://github.com/antgroup/vsag/blob/main/include/vsag/search_request.h)) +supports KNN and range search, attribute filtering, callback filtering, +bitset filtering, iterator search, per-search allocators, and "expected +labels" reasoning, all from one struct. The older +`Index::KnnSearch(query, k, SearchParam&)` overload is **deprecated** and +will be removed in a future major release. + +### Field mapping + +| `SearchParam` (old) | `SearchRequest` (new) | Notes | +|---------------------|-----------------------|-------| +| `parameters` (`const std::string&`) | `params_str_` (`std::string`) | The JSON parameter string (e.g. `{"hgraph": {"ef_search": 200}}`). | +| `filter` | `filter_` + `enable_filter_ = true` | The callback `Filter` object. Must explicitly enable. | +| `allocator` | `search_allocator_` | Per-search arena allocator. See [Per-Search Allocator](../advanced/search_allocator.md). | +| `iter_ctx` | `p_iter_ctx_` + `enable_iterator_search_ = true` | Note the `**` shape: `SearchRequest` takes `IteratorContext**`. | +| `is_iter_filter` | folded into `enable_iterator_search_` | Iterator search is now opt-in via a single boolean. | +| `is_last_search` | `is_last_search_` | Same semantics. | + +`SearchRequest` additionally exposes capabilities that `SearchParam` never +had: + +- `mode_` (`SearchMode::KNN_SEARCH` / `SearchMode::RANGE_SEARCH`), + `topk_`, `radius_`, `limited_size_`: one struct for both KNN and + range search. 
+- `enable_attribute_filter_` + `attribute_filter_str_`: SQL-like + attribute filtering; see [Attribute Filter](../advanced/attribute_filter.md). +- `enable_bitset_filter_` + `bitset_filter_`: bitset-based filtering. +- `expected_labels_`: for recall-debugging / reasoning analysis. + +### Code migration + +Before: + +```cpp +vsag::SearchParam param( + /*iter_filter_flag=*/false, + R"({"hgraph": {"ef_search": 200}})", + /*filter=*/my_filter, + /*allocator=*/my_arena); +auto result = index->KnnSearch(query, /*k=*/10, param).value(); +``` + +After: + +```cpp +vsag::SearchRequest req; +req.query_ = query; +req.mode_ = vsag::SearchMode::KNN_SEARCH; +req.topk_ = 10; +req.params_str_ = R"({"hgraph": {"ef_search": 200}})"; +req.enable_filter_ = static_cast<bool>(my_filter); +req.filter_ = my_filter; +req.search_allocator_ = my_arena; +auto result = index->SearchWithRequest(req).value(); +``` + +Range search collapses into the same call by switching `mode_`: + +```cpp +req.mode_ = vsag::SearchMode::RANGE_SEARCH; +req.radius_ = 0.42F; +req.limited_size_ = 1000; // -1 means no cap +auto result = index->SearchWithRequest(req).value(); +``` + +> **Tip.** `SearchRequest` is a plain struct with default values, so +> wrapping it in a small helper / builder is straightforward and tends +> to read more clearly than the multi-argument `SearchParam` +> constructor. + +## `CalDistanceById` typo and the `CalcDistancesById` path + +VSAG exposes two flavors of distance-by-ID APIs on `Index`: + +- **Single** ID, correctly spelled: `CalcDistanceById(...)`. +- **Batch** IDs, *misspelled* historically: `CalDistanceById(...)` + (missing the `c` in `Calc`). + +The naming inconsistency is documented in +[Calculate Distance by ID](../advanced/calc_distance_by_id.md) and tracked +in [#2068](https://github.com/antgroup/vsag/issues/2068). + +**What 1.0 does:** + +- Both names continue to work; the batch method is **not** renamed in + 1.0. 
+- The batch method will be renamed to `CalcDistancesById` in a future + release, with the old name kept as a deprecated alias for at least + one minor cycle. + +**What you should do today:** + +- Keep using `CalDistanceById` for batch calls. +- Centralize the call behind a thin wrapper in your codebase. When the + rename ships, you only need to update the wrapper: + + ```cpp + // wrappers/vsag_calc_distance.h + inline auto CalcDistances(const vsag::IndexPtr& index, + const float* query, + const int64_t* ids, + int64_t count, + bool precise = true) { + // Today: forwards to the typo'd name. + return index->CalDistanceById(query, ids, count, precise); + } + ``` + +## Serialization compatibility + +VSAG 1.0 can **read** indexes serialized by 0.18.x via any of the three +serialization interfaces (`BinarySet` / `ReaderSet`, file streams, custom +`WriteFuncType`); the on-disk layout and metadata format are compatible +on the forward path. + +Recommendations: + +- After upgrading, **re-serialize once** so newly-produced artefacts use + any layout improvements that ship with 1.0. +- The reverse direction (1.0 → 0.18.x) is **not** supported. Pin a single + reader version per production cluster during the upgrade window. +- `Deserialize` still requires an empty target index whose build + configuration (`dim`, `dtype`, `metric_type`, …) matches the original; + see [Serialization](../advanced/serialization.md). +- DiskANN's on-disk shards remain managed independently; if you are + migrating away from `diskann`, treat the disk files as throwaway data + and rebuild on the new index type. + +Going forward, the compatibility contract between minor versions will be +codified in a dedicated *API Stability* page, planned as a follow-up PR +tracked in [#2069](https://github.com/antgroup/vsag/issues/2069). 
+ +## Default-value and behavioral changes + +Things to double-check after pulling 1.0: + +- **MKL is off by default.** `VSAG_ENABLE_INTEL_MKL` (CMake: + `ENABLE_INTEL_MKL`) defaults to `OFF`. On Intel CPUs where MKL was + expected, set `VSAG_ENABLE_INTEL_MKL=ON` at build time. The + [reference performance](performance.md) numbers are gathered with MKL + off. +- **HGraph defaults.** `max_degree` defaults to `64`, `ef_construction` + to `400`, `graph_type` to `"nsw"`. The build sub-object key is + `index_param`; `base_quantization_type` is required. +- **`support_remove` / `support_duplicate` are opt-in.** If you relied + on `Remove()` or on duplicate detection from an experimental branch, + enable them explicitly under `index_param`. +- **`store_raw_vector`** is opt-in and only needed when you require the + raw vector after build (e.g. for `cosine` re-ranking when the base + representation is quantized). + +If a behavioral change surfaces that is not covered here, please file an +issue and link this page. + +## Build-system and packaging notes + +- **Toolchain pins remain unchanged.** `clang-format` / `clang-tidy` + must be **version 15 exactly**; GCC ≥ 9.4, Clang ≥ 13.0, CMake ≥ 3.18. +- **ABI variants are unchanged.** Choose the redistributable tarball + matching your downstream toolchain: + - `make dist-pre-cxx11-abi` — GCC `_GLIBCXX_USE_CXX11_ABI=0`. + - `make dist-cxx11-abi` — GCC `_GLIBCXX_USE_CXX11_ABI=1`. + - `make dist-libcxx` — Clang's libc++. +- **Python wheels.** `pip install pyvsag` continues to work; build from + source via `make pyvsag PY_VERSION=3.10` or `make pyvsag-all`. +- **Node.js / TypeScript.** `npm install vsag`. + +## Upgrade checklist + +A short, ordered list to drive an upgrade from 0.18.x to 1.0: + +1. **Read this page** end-to-end and skim the + [release notes](release_notes.md). +2. **Inventory deprecated usage** in your codebase: + - `vsag::Factory::CreateIndex("hnsw", ...)` and `("diskann", ...)`. 
+ - `Index::KnnSearch(query, k, SearchParam&)` and any code that + constructs `vsag::SearchParam` directly. + - Direct calls to `CalDistanceById` (the batch overload); add a + wrapper now to soften the future rename. +3. **Plan replacements** using the tables in this page; aim for HGraph + and `SearchRequest` first. +4. **Test in staging.** Build an HGraph (and/or IVF) index with the same + `dim` / `metric_type` as your existing one; compare recall and + latency via [`eval_performance`](eval.md). +5. **Validate serialization round-trip.** Load 0.18.x artefacts with the + 1.0 binary, then re-serialize and reload. +6. **Roll out gradually.** Keep one cluster on 0.18.x as a fall-back + until the new cluster has been stable for at least one release of + 1.0.x. +7. **Update CI/CD pinning.** `pip install pyvsag==1.0.*`, + `npm install vsag@^1.0.0`, and pin the C++ tarball to the matching + ABI variant. + +When the upgrade is complete, please consider filing an issue or +contributing a short "what we hit" note so this page can keep improving. + +## See also + +- [Release Notes](release_notes.md) +- *API Stability* (planned, see [#2069](https://github.com/antgroup/vsag/issues/2069)) +- [HGraph](../indexes/hgraph.md) +- [IVF](../indexes/ivf.md) +- [Per-Search Allocator](../advanced/search_allocator.md) +- [Serialization](../advanced/serialization.md) +- Serialization-format compatibility statement. +- Default-value and behavioral changes. +- Build-system / packaging notes. +- Step-by-step upgrade checklist. 
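The tip in the `SearchParam` → `SearchRequest` section of the migration guide above suggests wrapping `SearchRequest` in a small helper or builder. A minimal sketch of that idea follows. It deliberately uses a stand-in `sketch::SearchRequest` that mirrors only some of the field names quoted in the guide (`mode_`, `topk_`, `radius_`, `limited_size_`, `params_str_`); the builder class, its method names, and the default values are hypothetical and are not part of VSAG. Against the real library you would presumably swap the stand-in for `vsag::SearchRequest` and add setters for `query_`, `filter_`, and `search_allocator_`.

```cpp
#include <cstdint>
#include <string>
#include <utility>

namespace sketch {

// Stand-in enum; the guide names the same two enumerators on vsag::SearchMode.
enum class SearchMode { KNN_SEARCH, RANGE_SEARCH };

// Stand-in struct for illustration only. The real definition lives in
// include/vsag/search_request.h and may differ; defaults here are placeholders.
struct SearchRequest {
    SearchMode mode_ = SearchMode::KNN_SEARCH;
    int64_t topk_ = 0;
    float radius_ = 0.0F;
    int64_t limited_size_ = -1;  // the guide states that -1 means no cap
    std::string params_str_;
};

// Hypothetical builder: each setter returns *this so calls can be chained.
class SearchRequestBuilder {
public:
    // Switch to KNN mode and record how many neighbours to return.
    SearchRequestBuilder& Knn(int64_t topk) {
        req_.mode_ = SearchMode::KNN_SEARCH;
        req_.topk_ = topk;
        return *this;
    }

    // Switch to range mode; limited_size caps the number of results.
    SearchRequestBuilder& Range(float radius, int64_t limited_size = -1) {
        req_.mode_ = SearchMode::RANGE_SEARCH;
        req_.radius_ = radius;
        req_.limited_size_ = limited_size;
        return *this;
    }

    // Store the JSON search-parameter string verbatim.
    SearchRequestBuilder& Params(std::string json) {
        req_.params_str_ = std::move(json);
        return *this;
    }

    // Return a copy of the request assembled so far.
    SearchRequest Build() const { return req_; }

private:
    SearchRequest req_;
};

}  // namespace sketch
```

With this shape, `sketch::SearchRequestBuilder().Range(0.42F, 1000).Params(R"({"hgraph": {"ef_search": 200}})").Build()` yields a request in range mode capped at 1000 results, mirroring the range-search snippet in the guide. Keeping the mode and its related fields in one setter makes it harder to switch `mode_` while forgetting `radius_` or `topk_`.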
diff --git a/docs/docs/en/src/resources/release_notes.md b/docs/docs/en/src/resources/release_notes.md index 6d2dd3042..fa9584c8d 100644 --- a/docs/docs/en/src/resources/release_notes.md +++ b/docs/docs/en/src/resources/release_notes.md @@ -1,52 +1,228 @@ # Release Notes -VSAG's official release history and change notes are maintained on GitHub Releases: - -- [Releases on GitHub](https://github.com/antgroup/vsag/releases) - -Each release includes: - -- **Features** — new functionality -- **Improvements** -- **Bug Fixes** -- **Breaking Changes** (when applicable) -- **Contributor credits** - -## Versioning +This page is the canonical changelog for the VSAG 1.x line. For +pre-1.0 history (the 0.15 / 0.16 / 0.18 lines), see +[Releases on GitHub](https://github.com/antgroup/vsag/releases). VSAG follows [Semantic Versioning 2.0](https://semver.org/): - `MAJOR.MINOR.PATCH` -- `MAJOR` generally comes with incompatible API or serialization changes. +- `MAJOR` carries incompatible API or serialization changes. - `MINOR` adds functionality while remaining backward compatible. - `PATCH` contains only bug fixes and performance improvements. -## Getting a Specific Version +The compatibility contract that 1.x will hold to is described in a +dedicated *API Stability* page (planned as a follow-up PR, tracked in +[#2069](https://github.com/antgroup/vsag/issues/2069)). If you are +upgrading from 0.18, start with the +[Migration to VSAG 1.0](migration_to_1_0.md) guide; it covers every +breaking change in one place. + +--- + +## VSAG 1.0.0 — *target: 2026, exact date TBD* + +VSAG 1.0 is the first stable major release. It locks in the public +C++/Python/Node.js API surface, the on-disk serialization format, and +the supported index families, so the rest of the 1.x line can ship new +features without breaking your code. + +### Highlights + +- **Two production-ready index families** — `hgraph` for graph-based + search, `ivf` for inverted-index search. 
Both cover in-memory and + memory-plus-disk hybrid retrieval. Legacy `hnsw` and `diskann` indexes + are deprecated; see [Migration to VSAG 1.0](migration_to_1_0.md). +- **Comprehensive quantization** — RabitQ (BQ) for extreme compression, + PQ for flexible compression ratios, SQ4 / SQ8 for standard + quantization with minor recall loss. All quantizers can be combined + with HGraph or IVF. +- **First-class non-FP32 inputs** — INT8, BF16, FP16 and sparse vectors + are accepted as primary input types, not just emulated on top of FP32. +- **Multi-platform SIMD** — x86_64 (SSE / AVX / AVX2 / AVX-512 / AMX) and + ARM (NEON / SVE) backends, plus optional Intel MKL and OpenBLAS for + matrix kernels. +- **Per-tenant resource isolation** — per-index allocators and + injectable thread pools make it practical to host multiple tenants in + the same process. +- **New unified search API** — `Index::SearchWithRequest(SearchRequest)` + replaces the deprecated `KnnSearch(query, k, SearchParam&)` overload, + with explicit per-search allocator and reasoning support. +- **Stable public headers** — every header under `include/vsag/` is now + guaranteed self-contained; the 1.x line will not silently change + public ABI surface within a minor version. + +### Indexes + +- **HGraph** — recommended graph index for most workloads. + - Reverse-edge support, optional duplicate-distance threshold, and + `hops_limit` search parameter for tightly bounded latency budgets. + - `Remove` is now supported on graph indexes (mark-remove plus + `ShrinkAndRepair` reclamation with timeout). + - Built-in `Train` API plus ODescent builder for offline graph + construction; see [Build and Train](../advanced/build_and_train.md). + - Reasoning instrumentation: pass a `QueryContext` to collect + per-search diagnostics (visited nodes, hop count, distance + computations) without changing the result format. +- **IVF** — recommended inverted index for batched / large-K queries. 
+ Supports the same set of quantizers as HGraph and integrates with the + per-search allocator. +- **SINDI** — sparse inverted index with built-in term-ID remapping for + sparse vocabularies, vector update support, and analyzer hooks. +- **Pyramid** — hierarchical inverted index with deduplication support, + static optimization, `topk_factor` parameter on the base search + parameter class, and a `PyramidAnalyzer` for index statistics. +- **BruteForce** — exact baseline with parallel range search. +- **WARP** — multi-vector brute-force backend, migrated to the new + MultiVectors API. + +### Quantization + +- **RabitQ (BQ)** with extend-bit and split-base reorder support, plus + dedicated SIMD kernels. +- **PQ / SQ4 / SQ8** as standard memory/recall trade-offs. +- **Scalar quantizer** hardened against NaN encoding. +- **Quantization Transform** advanced page documenting the full + pipeline; see [Quantization Transform](../advanced/quantization_transform.md). + +### Data types and dataset support + +- **FP32 / INT8 / BF16 / FP16** vector inputs as first-class formats. +- **Sparse vectors** end-to-end (SINDI + sparse HDF5 dataset helpers in + `pyvsag`). +- **MultiVector datasets** as a first-class type; eval tooling and + WARP both consume the new MultiVectors API directly. +- **`extra_info`** payload stored alongside vectors; see the user + guide on `extra_info` for HGraph. + +### Search API + +- New `SearchRequest` / `Index::SearchWithRequest` pair as the primary + search entry point. Carries the query dataset, k, optional filter, + reasoning hook, and a per-search allocator in a single struct so the + hot path no longer mixes positional and out-parameter arguments. +- `SearchParam` and the old `KnnSearch(query, k, SearchParam&)` overload + remain available but are marked `[[deprecated]]`. The full mapping is + in [Migration to VSAG 1.0](migration_to_1_0.md). 
+- `CalDistanceById` (batch) is being renamed to `CalcDistancesById` with + consistent return semantics; the legacy name remains as a wrapper. See + [Calculate Distance by ID](../advanced/calc_distance_by_id.md) and + issue [#2068](https://github.com/antgroup/vsag/issues/2068). +- Range search variant (`SearchWithRequest` with radius semantics) is + available across HGraph, IVF, and BruteForce. + +### Platforms and packaging + +- **x86_64 SIMD:** SSE, AVX, AVX2, AVX-512, plus AMX backends (SQ8U INT8 + IP and BF16 GEMM for KMeans). +- **ARM SIMD:** NEON and SVE. +- **macOS (Darwin)** is a supported build platform. +- **Intel MKL** is now opt-in (`VSAG_ENABLE_INTEL_MKL=OFF` / + CMake `ENABLE_INTEL_MKL=OFF` by default). +- **OpenBLAS** can be linked from the system instead of the bundled + copy (`VSAG_ENABLE_SYSTEM_OPENBLAS=ON`). +- Third-party downloads support custom mirror URLs for environments + without direct GitHub access. + +### Resource isolation and observability + +- **Per-index allocators** — pass a custom `Allocator` through + `IndexCommonParam` and every container under that index honors it. +- **Injectable thread pools** — supply your own thread pool for both + build and search. +- **Per-search allocator** — see + [Per-Search Allocator](../advanced/search_allocator.md). +- **Search statistics** — `io_cnt`, `io_time_ms`, and other counters + exposed through `SearchRequest` reasoning. +- **Memory and introspection** — see + [Memory](../advanced/memory.md) and + [Index Introspection](../advanced/introspection.md). +- **Index lifecycle** — [Index Lifecycle Management](../advanced/index_lifecycle.md) + documents how to + add, remove, mark-remove, and rebuild safely under load. + +### Tooling and ecosystem + +- **`pyvsag`** Python bindings extended to cover the full index + surface, including sparse HDF5 helpers and pyramid export. +- **Node.js / TypeScript bindings** — `vsag` npm package with + quickstart examples in `examples/typescript/`. 
+- **`eval_performance`** tool supports multi-vector datasets and a + configurable search query count. +- **HTTP monitor server** built on `cpp-httplib` for exposing live + index metrics. + +### Breaking changes (vs. 0.18) + +The full list with code-diff examples lives in +[Migration to VSAG 1.0](migration_to_1_0.md). Headline items: + +1. `hnsw` and `diskann` index types are deprecated. Replace `hnsw` with + `hgraph`, and `diskann` with `ivf` or the hybrid memory-disk index. +2. `SearchParam` and `Index::KnnSearch(query, k, SearchParam&)` are + deprecated in favor of `SearchRequest` / + `Index::SearchWithRequest(SearchRequest)`. +3. `CalDistanceById` (batch) returns `-1` for invalid IDs and is being + renamed to `CalcDistancesById`. The old name continues to work for + one minor cycle. +4. `VSAG_ENABLE_INTEL_MKL` defaults to `OFF`. Set it explicitly if you + were relying on MKL. +5. Several HGraph defaults changed (`max_degree=64`, + `ef_construction=400`, `graph_type="nsw"`); `support_remove`, + `support_duplicate`, and `store_raw_vector` default to `OFF`. + +Serialization: 0.18 snapshots are **not** guaranteed to deserialize on +1.0; rebuild on the new release. See *Migration*. + +### Known issues + +- *To be filled in during the 1.0 RC cycle.* + +### Acknowledgments + +VSAG 1.0 is the result of contributions from the Ant Group VSAG team +and the wider open-source community. Full per-release contributor +credits remain on the +[GitHub Releases page](https://github.com/antgroup/vsag/releases). + +--- + +## Getting a specific version ### C++ / source ```bash -git checkout vX.Y.Z +git checkout v1.0.0 make release ``` ### Python ```bash -pip install pyvsag==X.Y.Z +pip install pyvsag==1.0.0 ``` ### Node.js / TypeScript ```bash -npm install vsag@X.Y.Z +npm install vsag@1.0.0 ``` -## Upgrade Guidance +## Upgrade guidance -- Read the **Breaking Changes** section of the corresponding release before upgrading across major - versions. 
-- When the serialization format changes, validate deserialization compatibility in a staging - environment first. +- Read [Migration to VSAG 1.0](migration_to_1_0.md) before upgrading + from any 0.x release. +- Read the **Breaking Changes** section of each future major release + before crossing major versions. +- When the serialization format changes, validate deserialization + compatibility in a staging environment first. - Roll out gradually in production and use the - [performance evaluation tool](eval.md) to compare recall and latency. + [performance evaluation tool](eval.md) to compare recall and latency + against your existing baseline. + +## See also + +- [Migration to VSAG 1.0](migration_to_1_0.md) +- [Roadmap](roadmap_2025.md) +- [Best Practices](best_practices.md) +- [Performance](performance.md) diff --git a/docs/docs/zh/src/SUMMARY.md b/docs/docs/zh/src/SUMMARY.md index 61f9f95db..394b060fe 100644 --- a/docs/docs/zh/src/SUMMARY.md +++ b/docs/docs/zh/src/SUMMARY.md @@ -56,6 +56,7 @@ # 资源 +- [升级到 VSAG 1.0](resources/migration_to_1_0.md) - [版本日志](resources/release_notes.md) - [路线图](resources/roadmap_2025.md) - [开源社区](resources/community.md) diff --git a/docs/docs/zh/src/resources/migration_to_1_0.md b/docs/docs/zh/src/resources/migration_to_1_0.md new file mode 100644 index 000000000..8f65f63f8 --- /dev/null +++ b/docs/docs/zh/src/resources/migration_to_1_0.md @@ -0,0 +1,298 @@ +# 升级到 VSAG 1.0 + +本页汇总了从 **VSAG 0.18.x**(及更早版本)平滑升级到 **VSAG 1.0** 所需了解的 +全部内容。请在重新编译或重新部署前先阅读本页。 + +> 进度跟踪在 +> [issue #2069](https://github.com/antgroup/vsag/issues/2069) / +> [PR #2070](https://github.com/antgroup/vsag/pull/2070)。如有勘误或 +> "升级途中踩到的坑"反馈,欢迎提 issue。 + +> 1.0 之后版本之间的兼容性规则将由独立的 *API 稳定性* 页面说明,作为后续 +> PR 跟踪于 [#2069](https://github.com/antgroup/vsag/issues/2069); +> 本页只覆盖 0.18 → 1.0 的一次性迁移。 + +## 一图速览 + +| 主题 | 1.0 中的状态 | 操作 | +|------|-------------|------| +| `hnsw` 索引 | 已弃用,仍可用 | 新增部署改用 [HGraph](../indexes/hgraph.md) | +| `diskann` 索引 | 已弃用,仍可用 | 新增部署改用 
[IVF](../indexes/ivf.md) 或 [内存-磁盘混合索引](../advanced/hybrid_index.md) | +| `Index::KnnSearch(query, k, SearchParam&)` | 已弃用的重载 | 改用 `Index::SearchWithRequest(SearchRequest)` | +| `SearchParam::allocator` | 已弃用字段 | 改用 `SearchRequest::search_allocator_` | +| `Index::CalDistanceById`(批量) | 保留(拼写错误的名字) | 继续使用;正确拼写的 `CalcDistancesById` 在规划中(见 [#2068](https://github.com/antgroup/vsag/issues/2068)) | +| 0.18.x 序列化产物 | 1.0 可读 | 升级后建议重新序列化以采用新的布局优化 | +| 公共 C ABI | 稳定 | 无需操作 | + +下文逐行展开,给出可直接套用的代码片段。 + +## 已弃用的索引 + +### `hnsw` → `hgraph` + +`hnsw` 是从 hnswlib 继承下来的图索引。1.0 中为了向后兼容仍然保留,但 +**已标记为弃用**;新增部署请使用 [HGraph](../indexes/hgraph.md),它是 +`hnsw` 的超集: + +- 同样的分层图拓扑,同样的 `max_degree` / `ef_construction` / + `ef_search` 调参。 +- 统一的 `index_param` 构建参数 schema,量化选项更丰富(`fp32`、 + `fp16`、`bf16`、`sq8`、`sq8_uniform`、`sq4_uniform`、`pq`、`pqfs`、 + `rabitq`)。 +- 可选的重排(`use_reorder` + `precise_quantization_type`)、去重、 + `Remove()`、以及基于 ELP 的运行期自动调参。 + +构建期映射: + +```diff +- auto index = vsag::Factory::CreateIndex("hnsw", R"({ +- "dim": 768, +- "dtype": "float32", +- "metric_type": "ip", +- "hnsw": { +- "max_degree": 32, +- "ef_construction": 400 +- } +- })").value(); ++ auto index = vsag::Factory::CreateIndex("hgraph", R"({ ++ "dim": 768, ++ "dtype": "float32", ++ "metric_type": "ip", ++ "index_param": { ++ "base_quantization_type": "fp32", ++ "max_degree": 32, ++ "ef_construction": 400 ++ } ++ })").value(); +``` + +搜索期映射: + +```diff +- auto result = index->KnnSearch(query, k, R"({"hnsw": {"ef_search": 100}})").value(); ++ auto result = index->KnnSearch(query, k, R"({"hgraph": {"ef_search": 100}})").value(); +``` + +两个易错点: + +1. 构建子对象的 key 从 `"hnsw"` 变成 `"index_param"`,并且 + `base_quantization_type` 是必填字段。 +2. 
搜索子对象的 key 也从 `"hnsw"` 变成 `"hgraph"`。 + +### `diskann` → `ivf` 或 内存-磁盘混合索引 + +`diskann` 提供了内存放 PQ、磁盘放原始向量的混合检索能力。1.0 中 +**已弃用**;请按以下顺序选择替代项: + +- [IVF](../indexes/ivf.md) —— 适合大规模分区检索;当数据可以完全放入 + 内存时,是 `diskann` 的自然替代。 +- [内存-磁盘混合索引](../advanced/hybrid_index.md) —— 当确实需要把 + 部分索引下沉到 NVMe(语料巨大、内存预算紧张)时再使用。 + +优先尝试 IVF;只有在确实测量到内存是瓶颈时,再退回磁盘混合配置。 + +### `hnsw` / `diskann` 不再作为首选示例 + +网站文档中 [创建索引](../guide/create_index.md)、 +[索引参数](index_parameters.md)、[序列化](../advanced/serialization.md) +等页面将在后续 PR 中改为默认使用 `hgraph` 示例,整体进度跟踪于 +[#2069](https://github.com/antgroup/vsag/issues/2069)。原有的示例代码 +`examples/cpp/101_index_hnsw.cpp`、`examples/cpp/102_index_diskann.cpp` +仍保留以供参考。 + +## 已弃用的检索 API:`SearchParam` → `SearchRequest` + +历史上 VSAG 累积了多个 `Index::KnnSearch` 重载。1.0 的公开 API 收敛到 +单一入口,把**所有**检索选项都通过一个 struct 传递: + +```cpp +[[nodiscard]] tl::expected +SearchWithRequest(const SearchRequest& request) const; +``` + +`SearchRequest`(声明在 [`include/vsag/search_request.h`](https://github.com/antgroup/vsag/blob/main/include/vsag/search_request.h)) +同时支持 KNN 与范围检索、属性过滤、回调过滤、bitset 过滤、迭代器检索、 +每次检索独立的 allocator,以及 expected-labels 召回归因 —— 全部由一个 +struct 承载。旧的 `Index::KnnSearch(query, k, SearchParam&)` 重载 +**已弃用**,将在未来某个 major 版本中移除。 + +### 字段映射 + +| `SearchParam`(旧) | `SearchRequest`(新) | 说明 | +|---------------------|-----------------------|------| +| `parameters` (`const std::string&`) | `params_str_` (`std::string`) | JSON 参数字符串(如 `{"hgraph": {"ef_search": 200}}`)。 | +| `filter` | `filter_` + `enable_filter_ = true` | 回调式 `Filter` 对象,需要显式开启。 | +| `allocator` | `search_allocator_` | 每次检索使用的 arena allocator,见 [搜索路径 Allocator](../advanced/search_allocator.md)。 | +| `iter_ctx` | `p_iter_ctx_` + `enable_iterator_search_ = true` | 注意指针层级 —— `SearchRequest` 接收 `IteratorContext**`。 | +| `is_iter_filter` | 由 `enable_iterator_search_` 承担 | 迭代器检索改为一个布尔开关。 | +| `is_last_search` | `is_last_search_` | 语义不变。 | + +`SearchRequest` 还额外暴露了 `SearchParam` 不具备的能力: + +- `mode_`(`SearchMode::KNN_SEARCH` / 
`SearchMode::RANGE_SEARCH`)、 + `topk_`、`radius_`、`limited_size_` —— KNN 与范围检索共用同一 struct。 +- `enable_attribute_filter_` + `attribute_filter_str_` —— SQL 风格的 + 属性过滤,见 [属性过滤](../advanced/attribute_filter.md)。 +- `enable_bitset_filter_` + `bitset_filter_` —— bitset 过滤。 +- `expected_labels_` —— 用于召回调试 / 归因分析。 + +### 代码迁移 + +迁移前: + +```cpp +vsag::SearchParam param( + /*iter_filter_flag=*/false, + R"({"hgraph": {"ef_search": 200}})", + /*filter=*/my_filter, + /*allocator=*/my_arena); +auto result = index->KnnSearch(query, /*k=*/10, param).value(); +``` + +迁移后: + +```cpp +vsag::SearchRequest req; +req.query_ = query; +req.mode_ = vsag::SearchMode::KNN_SEARCH; +req.topk_ = 10; +req.params_str_ = R"({"hgraph": {"ef_search": 200}})"; +req.enable_filter_ = static_cast(my_filter); +req.filter_ = my_filter; +req.search_allocator_ = my_arena; +auto result = index->SearchWithRequest(req).value(); +``` + +范围检索只需切换 `mode_`: + +```cpp +req.mode_ = vsag::SearchMode::RANGE_SEARCH; +req.radius_ = 0.42F; +req.limited_size_ = 1000; // -1 表示不限制 +auto result = index->SearchWithRequest(req).value(); +``` + +> **提示**:`SearchRequest` 是带默认值的 POD struct,包一层小型 +> builder/helper 通常比旧的多参数 `SearchParam` 构造函数更清晰。 + +## `CalDistanceById` 拼写问题与 `CalcDistancesById` 迁移路径 + +`Index` 上有两种按 ID 计算距离的 API: + +- **单条** ID,拼写正确:`CalcDistanceById(...)`。 +- **批量** IDs,历史上**拼写有误**:`CalDistanceById(...)`(少了 + `Calc` 中的 `c`)。 + +该命名不一致在 [按 ID 计算距离](../advanced/calc_distance_by_id.md) +里有说明,并由 [#2068](https://github.com/antgroup/vsag/issues/2068) +跟踪。 + +**1.0 的处理:** + +- 两个名字都继续可用;批量方法**不会**在 1.0 中改名。 +- 未来某个版本会把批量方法重命名为 `CalcDistancesById`,旧名字会以 + 弃用别名的形式至少保留一个 minor 版本。 + +**现在应该怎么做:** + +- 批量调用继续使用 `CalDistanceById`。 +- 在自己的代码中包一层 thin wrapper,未来重命名时只改 wrapper: + + ```cpp + // wrappers/vsag_calc_distance.h + inline auto CalcDistances(const vsag::IndexPtr& index, + const float* query, + const int64_t* ids, + int64_t count, + bool precise = true) { + // 当前:转发到拼写错误的旧名字。 + return index->CalDistanceById(query, ids, 
count, precise); + } + ``` + +## 序列化兼容性 + +VSAG 1.0 通过三种序列化接口(`BinarySet` / `ReaderSet`、文件流、 +自定义 `WriteFuncType`)均可**读取** 0.18.x 序列化产物;磁盘布局与 +元数据格式在前向方向上兼容。 + +建议: + +- 升级完成后**重新序列化一次**,让新产物使用 1.0 的布局改进。 +- 反向兼容(1.0 → 0.18.x)**不支持**。升级窗口期内,每个生产集群应固定 + 在单一 reader 版本上。 +- `Deserialize` 仍要求目标索引为空,且构建配置(`dim`、`dtype`、 + `metric_type` 等)与原索引一致;详见 + [序列化](../advanced/serialization.md)。 +- DiskANN 的磁盘文件仍独立管理;如果你正在从 `diskann` 迁出,把这些 + 磁盘文件当作可丢弃数据,在新的索引类型上重建即可。 + +之后版本之间的兼容性合约将由独立的 *API 稳定性* 页面规范,作为后续 PR +跟踪于 [#2069](https://github.com/antgroup/vsag/issues/2069)。 + +## 默认值与行为变化 + +升级 1.0 后建议确认: + +- **MKL 默认关闭。** `VSAG_ENABLE_INTEL_MKL`(CMake: + `ENABLE_INTEL_MKL`)默认 `OFF`。在原本期望开启 MKL 的 Intel CPU + 环境,请在构建时显式 `VSAG_ENABLE_INTEL_MKL=ON`。 + [标准环境性能参考](performance.md) 的数据是在 MKL 关闭下采集的。 +- **HGraph 默认值。** `max_degree` 默认 `64`,`ef_construction` 默认 + `400`,`graph_type` 默认 `"nsw"`。构建子对象的 key 为 + `index_param`;`base_quantization_type` 是必填字段。 +- **`support_remove` / `support_duplicate` 默认关闭。** 如果你依赖 + `Remove()` 或之前实验分支上的去重能力,请在 `index_param` 中显式 + 开启。 +- **`store_raw_vector`** 默认关闭,只在确实需要在构建后访问原始向量 + 时再开启(例如基础表征已被量化、需要 `cosine` 重排)。 + +本页未覆盖到的行为变化欢迎提 issue 反馈。 + +## 构建系统与打包说明 + +- **工具链版本约束不变。** `clang-format` / `clang-tidy` 必须**严格 + 等于 15**;GCC ≥ 9.4,Clang ≥ 13.0,CMake ≥ 3.18。 +- **ABI 变体不变。** 根据下游工具链选择对应的发行包: + - `make dist-pre-cxx11-abi` —— GCC `_GLIBCXX_USE_CXX11_ABI=0`。 + - `make dist-cxx11-abi` —— GCC `_GLIBCXX_USE_CXX11_ABI=1`。 + - `make dist-libcxx` —— Clang libc++。 +- **Python wheel。** 继续支持 `pip install pyvsag`;源码构建用 + `make pyvsag PY_VERSION=3.10` 或 `make pyvsag-all`。 +- **Node.js / TypeScript。** `npm install vsag`。 + +## 升级操作清单 + +驱动从 0.18.x 升级到 1.0 的一个简短有序清单: + +1. **通读本页**,并速览 [版本日志](release_notes.md)。 +2. **盘点代码中的弃用用法**: + - `vsag::Factory::CreateIndex("hnsw", ...)` / `("diskann", ...)`。 + - `Index::KnnSearch(query, k, SearchParam&)` 以及直接构造 + `vsag::SearchParam` 的代码。 + - 直接调用 `CalDistanceById`(批量重载)的位置;现在包一层 + wrapper,未来改名只需改 wrapper。 +3. 
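**Plan the replacements**, preferring HGraph and `SearchRequest`.
+4. **Validate in staging.** Build HGraph (or IVF) with the same `dim` / `metric_type`
+   and compare recall and latency with [`eval_performance`](eval.md).
+5. **Verify the serialization round trip.** Load 0.18.x artifacts with the 1.0 binary,
+   re-serialize them, and load them again.
+6. **Roll out gradually.** Keep the old-version cluster as a rollback pool until the new
+   cluster has been stable on a 1.0.x release for some time.
+7. **Update CI/CD version constraints.** `pip install pyvsag==1.0.*`,
+   `npm install vsag@^1.0.0`, and pin the C++ distribution to the matching ABI variant.
+
+Once the upgrade is done, you are welcome to open an issue or contribute a short "field notes" write-up to help keep this page current.
+
+## References
+
+- [Release Notes](release_notes.md)
+- *API Stability* (planned; see [#2069](https://github.com/antgroup/vsag/issues/2069))
+- [HGraph](../indexes/hgraph.md)
+- [IVF](../indexes/ivf.md)
+- [Per-Search Allocator](../advanced/search_allocator.md)
+- [Serialization](../advanced/serialization.md)
+- Serialization format compatibility statement.
+- Defaults and behavior changes.
+- Build system / packaging notes.
+- Upgrade checklist.
diff --git a/docs/docs/zh/src/resources/release_notes.md b/docs/docs/zh/src/resources/release_notes.md
index 4d6afed08..156659a42 100644
--- a/docs/docs/zh/src/resources/release_notes.md
+++ b/docs/docs/zh/src/resources/release_notes.md
@@ -1,49 +1,201 @@
 # Release Notes
 
-The official VSAG release history and change notes are maintained on the GitHub Releases page:
-
-- [Releases on GitHub](https://github.com/antgroup/vsag/releases)
-
-Each release includes:
-
-- **Features**
-- **Improvements**
-- **Bug Fixes**
-- **Breaking Changes** (if any)
-- **Contributor list**
-
-## Versioning scheme
+This page is the main changelog for the VSAG 1.x series. For history before 1.0 (the
+0.15 / 0.16 / 0.18 series), see [GitHub Releases](https://github.com/antgroup/vsag/releases).
 
 VSAG follows [Semantic Versioning 2.0](https://semver.org/):
 
 - `MAJOR.MINOR.PATCH`
-- `MAJOR` usually comes with incompatible changes to the API / serialization format;
+- `MAJOR` usually comes with incompatible changes to the API or the serialization format;
 - `MINOR` adds features while staying backward compatible;
 - `PATCH` contains only bug fixes and performance improvements.
 
+The compatibility contract that the 1.x series will honor is described on a dedicated
+*API Stability* page (tracked as a follow-up PR in
+[#2069](https://github.com/antgroup/vsag/issues/2069)).
+When upgrading from 0.18, read [Migrating to VSAG 1.0](migration_to_1_0.md) first; all
+breaking changes are collected there.
+
+---
+
+## VSAG 1.0.0: *target release 2026, exact date TBD*
+
+VSAG 1.0 is the first stable major release. It locks down the public C++/Python/Node.js
+APIs, the index serialization format, and the supported index families, so that later 1.x
+releases can keep iterating without breaking existing code.
+
+### Highlights
+
+- **Two production-ready index families**: `hgraph` for graph-based search and `ivf` for
+  inverted-index search. Both cover pure in-memory and hybrid memory+disk search modes.
+  The legacy `hnsw` and `diskann`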
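indexes are deprecated; see [Migrating to VSAG 1.0](migration_to_1_0.md).
+- **A complete quantization lineup**: RabitQ (BQ) for extreme compression, PQ for flexible
+  compression ratios, and SQ4 / SQ8 for standard quantization with a small recall loss.
+  Every quantizer can be combined with HGraph or IVF.
+- **Native non-FP32 input**: INT8, BF16, FP16, and sparse vectors are first-class input
+  types and no longer rely on FP32 emulation.
+- **Multi-platform SIMD**: x86_64 (SSE / AVX / AVX2 / AVX-512 / AMX) and ARM
+  (NEON / SVE) backends, plus optional Intel MKL and OpenBLAS matrix kernels.
+- **Tenant-level resource isolation**: per-index allocators and injectable thread pools
+  make hosting multiple tenants in a single process practical.
+- **Unified search API**: `Index::SearchWithRequest(SearchRequest)` replaces the
+  deprecated `KnnSearch(query, k, SearchParam&)` and natively supports per-search
+  allocators and reasoning.
+- **Stable public headers**: every header under `include/vsag/` is guaranteed to be
+  self-contained; minor releases within the 1.x series will not silently change the public ABI.
+
+### Indexes
+
+- **HGraph**: the recommended graph index for most scenarios.
+  - Supports reverse edges, an optional duplicate-distance threshold, and the
+    `hops_limit` search parameter for fine-grained latency control;
+  - Graph indexes support `Remove` (mark-remove combined with `ShrinkAndRepair`
+    reclamation with a timeout);
+  - Built-in `Train` API and ODescent offline graph construction; see
+    [Build and Train](../advanced/build_and_train.md);
+  - Reasoning diagnostics: `QueryContext` collects per-search visited nodes, hop counts,
+    distance-computation counts, and so on, without changing the search result format.
+- **IVF**: the inverted index recommended for batch / large-K search. It supports the
+  same quantizer set as HGraph and integrates with the per-search allocator.
+- **SINDI**: a sparse inverted index with built-in term ID remapping for sparse
+  vocabularies, vector updates, and analyzer hooks.
+- **Pyramid**: a hierarchical inverted index with deduplication, static optimization,
+  the `topk_factor` parameter on the base `IndexSearchParameter` class, and the
+  `PyramidAnalyzer` statistics tool.
+- **BruteForce**: the exact baseline, with parallel range search.
+- **WARP**: a multi-vector brute-force search backend, migrated to the new MultiVectors API.
+
+### Quantization
+
+- **RabitQ (BQ)** supports extend-bit and split-base reorder, with dedicated
+  SIMD implementations;
+- **PQ / SQ4 / SQ8** serve as the standard memory / recall trade-offs;
+- **Scalar quantizer** is hardened against NaN encoding cases;
+- The **Quantization Transform** advanced page describes the full quantization pipeline; see
+  [Quantization Transform](../advanced/quantization_transform.md).
+
+### Data types and datasets
+
+- **FP32 / INT8 / BF16 / FP16** vector inputs are first-class types;
+- **Sparse vectors** are supported end to end (SINDI + the sparse HDF5 helper in `pyvsag`);
+- **MultiVector datasets** are a first-class type; the evaluation tool and WARP both
+  consume the new MultiVectors API directly;
+- **`extra_info`** can be stored alongside vectors; see the HGraph `extra_info`
+  usage guide.
+
+### Search API
+
+- The new `SearchRequest` / `Index::SearchWithRequest` is the primary search entry point.
+  The query dataset, k, optional filter, reasoning hooks, and per-search allocator are
+  bundled into one struct, so the hot path no longer mixes positional and out parameters.
+- 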
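`SearchParam` and the old `KnnSearch(query, k, SearchParam&)` still work
+  but are marked `[[deprecated]]`. For the full mapping, see
+  [Migrating to VSAG 1.0](migration_to_1_0.md).
+- `CalDistanceById` (the batch interface) is being renamed to `CalcDistancesById`, with
+  unified return-value semantics; the old name is kept as a wrapper. See
+  [Calculate Distance by ID](../advanced/calc_distance_by_id.md) and
+  issue [#2068](https://github.com/antgroup/vsag/issues/2068).
+- The range search variant (`SearchWithRequest` with radius semantics) is available on
+  HGraph, IVF, and BruteForce.
+
+### Platforms and packaging
+
+- **x86_64 SIMD:** SSE, AVX, AVX2, AVX-512, plus a new AMX backend (SQ8U
+  INT8 IP, and BF16 GEMM for KMeans);
+- **ARM SIMD:** NEON and SVE;
+- **macOS (Darwin)** is a supported build platform;
+- **Intel MKL** is now optional (off by default: `VSAG_ENABLE_INTEL_MKL=OFF` / CMake
+  `ENABLE_INTEL_MKL=OFF`);
+- **OpenBLAS** can be linked from the system instead of using the bundled copy
+  (`VSAG_ENABLE_SYSTEM_OPENBLAS=ON`);
+- Third-party dependency downloads support custom mirror URLs, which helps in environments
+  that cannot reach GitHub directly.
+
+### Resource isolation and observability
+
+- **Per-index allocator**: inject a custom `Allocator` through `IndexCommonParam`;
+  every container under that index uses it;
+- **Injectable thread pool**: both build and search can run on an application-provided
+  thread pool;
+- **Per-search allocator**: see
+  [Per-Search Allocator](../advanced/search_allocator.md);
+- **Search statistics**: counters such as `io_cnt` and `io_time_ms` are exposed through
+  `SearchRequest` reasoning;
+- **Memory and diagnostics**: see [Memory](../advanced/memory.md) and
+  [Index Introspection](../advanced/introspection.md);
+- **Index lifecycle**: [Index Lifecycle Management](../advanced/index_lifecycle.md)
+  describes safe practices for add / remove / mark-remove / rebuild on a live index.
+
+### Tooling and ecosystem
+
+- **`pyvsag`** Python bindings now cover the full index interface, including the sparse
+  HDF5 helper and pyramid exports;
+- **Node.js / TypeScript bindings**: the `vsag` npm package, with quick-start examples
+  under `examples/typescript/`;
+- The **`eval_performance`** tool supports multi-vector datasets and a configurable
+  number of queries;
+- The **HTTP monitor service**, built on `cpp-httplib`, exposes live index metrics.
+
+### Breaking changes (relative to 0.18)
+
+For the full list and code diffs, see [Migrating to VSAG 1.0](migration_to_1_0.md). Only
+the key points are listed here:
+
+1. The `hnsw` and `diskann` index types are deprecated; migrate to `hgraph` (or the
+   hybrid memory+disk configuration) and `ivf`, respectively.
+2. `SearchParam` and `Index::KnnSearch(query, k, SearchParam&)` are deprecated;
+   use `SearchRequest` / `Index::SearchWithRequest(SearchRequest)` instead.
+3. `CalDistanceById` (batch) returns `-1` for invalid IDs and is being renamed to
+   `CalcDistancesById`; the old name is kept for one more minor release cycle.
+4. 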
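`VSAG_ENABLE_INTEL_MKL` defaults to `OFF`; if you previously relied on MKL, enable it explicitly.
+5. Several HGraph defaults changed (`max_degree=64`, `ef_construction=400`,
+   `graph_type="nsw"`); `support_remove`, `support_duplicate`, and
+   `store_raw_vector` are off by default.
+
+On serialization: files serialized by 0.18 are **not guaranteed** to deserialize on 1.0;
+rebuilding the index on the new version is recommended. See *Migration* for details.
+
+### Known issues
+
+- *To be filled in during the 1.0 RC phase.*
+
+### Acknowledgements
+
+VSAG 1.0 is the joint work of the Ant Group VSAG team and the open-source community. The
+full per-release contributor list is still maintained on the
+[GitHub Releases](https://github.com/antgroup/vsag/releases) page.
+
+---
+
 ## How to get a specific version
 
 ### C++ / source
 
 ```bash
-git checkout vX.Y.Z
+git checkout v1.0.0
 make release
 ```
 
 ### Python
 
 ```bash
-pip install pyvsag==X.Y.Z
+pip install pyvsag==1.0.0
 ```
 
 ### Node.js / TypeScript
 
 ```bash
-npm install vsag@X.Y.Z
+npm install vsag@1.0.0
 ```
 
 ## Upgrade advice
 
-- Before upgrading across major versions, first read the **Breaking Changes** section of the corresponding release;
-- When the serialization format changes, validate deserialization compatibility in a test environment first;
+- Before upgrading from any 0.x version, first read
+  [Migrating to VSAG 1.0](migration_to_1_0.md);
+- Before upgrading across major versions, read the **Breaking Changes** section of the corresponding release;
+- When the serialization format changes, validate deserialization compatibility in a test environment first;
 - Roll out gradually in production, using the [performance evaluation tool](eval.md) to compare recall and latency.
+
+## References
+
+- [Migrating to VSAG 1.0](migration_to_1_0.md)
+- [Roadmap](roadmap_2025.md)
+- [Best Practices](best_practices.md)
+- [Performance](performance.md)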
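
The release notes above move callers from positional `KnnSearch` arguments to a single `SearchRequest`, and the migration guide suggests wrapping that struct in a small helper. The following is a minimal sketch of such a helper, not VSAG code. It uses a stand-in `SearchRequest` that mirrors only field names quoted in the guide (`mode_`, `topk_`, `radius_`, `limited_size_`, `params_str_`); the field types, the default values, and the helper names `MakeKnnRequest` / `MakeRangeRequest` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Stand-in types so the sketch compiles without VSAG headers.
// Field names follow the migration guide; types and defaults are assumptions.
enum class SearchMode { KNN_SEARCH, RANGE_SEARCH };

struct SearchRequest {
    SearchMode mode_ = SearchMode::KNN_SEARCH;
    std::int64_t topk_ = 10;
    float radius_ = 0.0F;
    std::int64_t limited_size_ = -1;  // -1 means no limit
    std::string params_str_;
};

// Hypothetical helper: build a KNN request from k and a JSON parameter string.
inline SearchRequest MakeKnnRequest(std::int64_t topk, std::string params_json) {
    assert(topk > 0 && "topk must be positive");
    SearchRequest req;
    req.mode_ = SearchMode::KNN_SEARCH;
    req.topk_ = topk;
    req.params_str_ = std::move(params_json);
    return req;
}

// Hypothetical helper: build a range request; limited_size defaults to "no limit".
inline SearchRequest MakeRangeRequest(float radius,
                                      std::string params_json,
                                      std::int64_t limited_size = -1) {
    SearchRequest req;
    req.mode_ = SearchMode::RANGE_SEARCH;
    req.radius_ = radius;
    req.limited_size_ = limited_size;
    req.params_str_ = std::move(params_json);
    return req;
}
```

The stand-in omits `query_`, `filter_`, and `search_allocator_`; a helper written against the real `vsag::SearchRequest` would take those as extra arguments before the result is passed to `Index::SearchWithRequest`.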