Commit 5ba15c1
Port datastax#659: Streaming N:1 on-disk graph index compaction (#6)
* Add on-disk graph index compaction algorithm
Introduce OnDiskGraphIndexCompactor and PQRetrainer for streaming N:1
merging of on-disk HNSW indexes without full in-memory materialization.
Supports deletion filtering via live-node bitsets, custom ordinal
mapping, and PQ codebook retraining.
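
As a rough illustration of the deletion filtering and ordinal remapping described above, the sketch below assigns dense ordinals across several sources and skips nodes whose live bit is cleared. The class and method names are hypothetical and are not the OnDiskGraphIndexCompactor API. The sequential assignment order is an assumption, since the commit also allows custom ordinal mappings.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch; not the actual OnDiskGraphIndexCompactor code.
class OrdinalRemapSketch {
    /**
     * For each source index, map old ordinals to dense ordinals in the
     * merged index. Nodes whose live bit is cleared (deleted) map to -1.
     */
    static int[][] remap(int[] sourceSizes, BitSet[] liveNodes) {
        int[][] mapping = new int[sourceSizes.length][];
        int next = 0; // next free ordinal in the merged index
        for (int s = 0; s < sourceSizes.length; s++) {
            mapping[s] = new int[sourceSizes[s]];
            for (int old = 0; old < sourceSizes[s]; old++) {
                mapping[s][old] = liveNodes[s].get(old) ? next++ : -1;
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0);
        a.set(2);      // node 1 of the first source is deleted
        BitSet b = new BitSet();
        b.set(0, 2);   // both nodes of the second source are live
        int[][] m = remap(new int[] {3, 2}, new BitSet[] {a, b});
        System.out.println(Arrays.deepToString(m)); // prints [[0, -1, 1], [2, 3]]
    }
}
```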
* Add compaction unit tests
Tests for OnDiskGraphIndexCompactor covering basic compaction, deletions,
ordinal remapping, multi-source merging, and FusedPQ compaction scenarios.
* Add reporting and storage infrastructure for CompactorBenchmark
Add JFR recording, system stats collection, JSONL logging, git info
capture, thread allocation tracking, dataset partitioning, and cloud
storage layout utilities used by CompactorBenchmark. Switch
jvector-examples logging from logback to log4j2 for consistency with
benchmarks-jmh and to avoid duplicate SLF4J bindings in the fat jar.
* Add CompactorBenchmark and tooling
JMH-based benchmark with configurable workload modes (PARTITION_AND_COMPACT,
PARTITION_ONLY, COMPACT_ONLY, BUILD_FROM_SCRATCH), recall measurement, JFR
recording, and JSONL result logging. Includes BenchmarkParamCounter for
progress tracking, EventLogAnalyzer for post-run analysis, GHA workflow,
and exec-maven-plugin integration. Add forced vectorization provider
property to VectorizationProvider for benchmark reproducibility.
* Update build config and project metadata for compaction
Add result file patterns to .gitignore, update rat-excludes for the new
compaction workflow and catalog cache files.
* Fix JMH jar selection in run-compaction.yml
The benchmarks-jmh-*.jar glob matched the -javadoc jar first, which has
no Main-Class. Select the shaded JMH jar explicitly by excluding
-javadoc and -sources jars.
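
The selection rule can be sketched as a filter over candidate file names. This is only an illustration in Java: the actual fix lives in the run-compaction.yml workflow, and the helper name and the version string in the example are assumptions.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the jar selection rule; the real fix is in the CI workflow.
class JarSelectionSketch {
    /** Pick the first benchmarks-jmh jar that is neither the -javadoc nor the -sources jar. */
    static Optional<String> pickRunnableJar(List<String> fileNames) {
        return fileNames.stream()
                .filter(n -> n.startsWith("benchmarks-jmh-") && n.endsWith(".jar"))
                .filter(n -> !n.endsWith("-javadoc.jar") && !n.endsWith("-sources.jar"))
                .sorted()
                .findFirst();
    }

    public static void main(String[] args) {
        // The version number below is made up for illustration.
        List<String> names = List.of(
                "benchmarks-jmh-1.0.0-javadoc.jar",
                "benchmarks-jmh-1.0.0-sources.jar",
                "benchmarks-jmh-1.0.0.jar");
        System.out.println(pickRunnableJar(names).orElse("none")); // prints benchmarks-jmh-1.0.0.jar
    }
}
```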
* Fix CompactorBenchmark invocation in run-compaction.yml
Use -cp with CompactorBenchmark.main() instead of -jar with JMH Main
to avoid BenchmarkList discovery issues in CI's shaded jar.
* Address PR review feedback
- Extract CompactWriter into its own file to reduce OnDiskGraphIndexCompactor size
- Rewrite SystemStatsCollector to read /proc files directly in Java instead of spawning bash
- Clarify recall section description in docs/compaction.md
* Fix benchmark invocation in docs and default dataset
Use -cp instead of -jar in docs since the benchmarks-jmh-*.jar glob
matches the -javadoc jar first. Change default dataset from
glove-100-angular to ada002-100k. Note -Xmx should be adjusted to
fit the dataset.
* Fix jar selection: use fixed output name compactor-benchmark.jar
The benchmarks-jmh-*.jar glob expands to multiple jars (shaded +
javadoc), causing -cp to misinterpret the second jar as the main
class. Configure shade plugin outputFile to produce a fixed
compactor-benchmark.jar name. Update docs and CI workflow.
* Refactor workload modes and fix build-from-scratch timing
Simplify the WorkloadMode enum: collapse PARTITION_ONLY/COMPACT_ONLY/
COMPACT_AND_RECALL/BUILD_FROM_SCRATCH into PARTITION/COMPACT/BUILD/
PARTITION_AND_COMPACT, plus a separate measureRecall flag.
Fix buildFromScratch timing to include PQ computation and graph
construction (previously only timed the write step).
Add fair comparison guidelines to CompactorBenchmark.md.
* Add TIERED_10_90 and TIERED_1_99 split distributions
Support 10%/90% and 1%/99% partition splits for benchmarking compaction
of a small new segment into a large existing index. Add split
distribution reference table to CompactorBenchmark.md.
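
To make the tiered splits concrete, the sketch below turns percentage splits into partition sizes. The helper is hypothetical and is not the benchmark's partitioning code. It assumes the percentages follow the order given in the distribution name, and that the last partition absorbs any rounding remainder.

```java
import java.util.Arrays;

// Hypothetical sketch; not the CompactorBenchmark partitioning code.
class SplitSketch {
    /** Partition sizes for a total vector count and integer percentages that sum to 100. */
    static int[] sizes(int total, int... percents) {
        int[] out = new int[percents.length];
        int assigned = 0;
        for (int i = 0; i < percents.length - 1; i++) {
            out[i] = (int) ((long) total * percents[i] / 100); // integer math avoids rounding drift
            assigned += out[i];
        }
        out[percents.length - 1] = total - assigned; // last partition takes the remainder
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizes(100_000, 10, 90))); // prints [10000, 90000]
        System.out.println(Arrays.toString(sizes(100_000, 1, 99)));  // prints [1000, 99000]
    }
}
```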
* Fix bug when FusedPQ is used with no hierarchy (datastax#664)
---------
Co-authored-by: dian-lun-lin <cyc4542000@gmail.com>
Co-authored-by: Mark Wolters <mwolters138@gmail.com>
1 parent 17cf5d9 commit 5ba15c1
34 files changed
Lines changed: 7696 additions & 17 deletions
File tree
- .github/workflows
- benchmarks-jmh
- src/main
- java/io/github/jbellis/jvector/bench
- benchtools
- resources
- docs
- jvector-base/src/main/java/io/github/jbellis/jvector
- graph
- disk
- quantization
- vector
- jvector-examples
- src/main
- java/io/github/jbellis/jvector/example
- reporting
- util
- storage
- yaml
- resources
- jvector-tests/src/test/java/io/github/jbellis/jvector/graph/disk