MegaQuant RAG Compress

Low-bit compression experiments for stored RAG/document vectors.

At a glance

In a small SQuAD exact-search proxy benchmark, the best current document-only method gives:

| What you care about | Result | Compared with |
| --- | --- | --- |
| Stored-vector payload size | 9.77% of float32 | 10.24x compression / 90.23% saving vs float32 |
| Retrieval quality | 98.69% recall@1 retention | `0.440009` vs `0.445840` float32 recall@1 |
| Better than old local baseline | 55.34% smaller and +4.92% recall@1 | vs older `blockwise_seven_level_3bit` result |

Main method:

doconly_affine3_g64_meta4
3.126250 effective bits/dim
0.440009 recall@1
98.69% recall@1 retention vs float32

Why document-only? In this benchmark, document vectors are the stored index payload, while query vectors are transient. Keeping queries float32 preserves quality better than quantizing both sides.
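The document-only idea can be illustrated with a minimal NumPy sketch. It assumes that `affine3_g64` means 3-bit asymmetric (min/max) quantization over groups of 64 dimensions. That reading comes from the method name and is not confirmed by this README. The helper names `affine_quantize` and `affine_dequantize` are hypothetical, the vectors are random stand-ins, and the `meta4` step (low-bit storage of the per-group scale and offset) is left out, so the metadata stays in float32 here.

```python
import numpy as np

def affine_quantize(x, bits=3, group=64):
    """Per-group affine quantization of row vectors (illustrative only)."""
    n, d = x.shape
    assert d % group == 0, "dimension must be a multiple of the group size"
    levels = 2**bits - 1                      # 7 steps for 3-bit codes
    g = x.reshape(n, d // group, group)
    lo = g.min(axis=2, keepdims=True)         # per-group offset
    hi = g.max(axis=2, keepdims=True)
    scale = (hi - lo) / levels                # per-group step size
    scale[scale == 0] = 1.0                   # guard against constant groups
    codes = np.clip(np.rint((g - lo) / scale), 0, levels).astype(np.uint8)
    return codes, scale, lo

def affine_dequantize(codes, scale, lo):
    """Map integer codes back to approximate float32 vectors."""
    g = codes.astype(np.float32) * scale + lo
    return g.reshape(g.shape[0], -1)

# Random unit vectors stand in for the benchmark's 256-d document/query vectors.
rng = np.random.default_rng(0)
docs = rng.standard_normal((800, 256)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
queries = rng.standard_normal((5, 256)).astype(np.float32)
queries /= np.linalg.norm(queries, axis=1, keepdims=True)

codes, scale, lo = affine_quantize(docs)      # only the stored side is compressed
docs_hat = affine_dequantize(codes, scale, lo)

scores_fp32 = queries @ docs.T                # float32 baseline scores
scores_doconly = queries @ docs_hat.T         # queries stay float32
```

Only `codes`, `scale`, and `lo` would need to be stored. The queries never pass through the quantizer, so their rounding error does not add to the document-side error.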

Scope

This repository is a research proof-of-concept, not a production vector database.

The numbers above:

come from a small CPU/Python exact-search proxy benchmark,
are based on TF-IDF + random projection vectors, not modern embedding models,
reflect modeled stored-vector payload accounting, not total vector-database memory.

They are not claims about FAISS/Qdrant/Milvus/LanceDB, ANN serving, HNSW/IVF memory, GPU search, or BEIR/MTEB quality.

Related repository

KV-cache companion project:

https://github.com/CrazyAngelm/megaquant-kv-cache

Benchmark setup

Dataset: SQuAD v1.1 dev paragraphs/questions
Docs: 800
Queries: 4460
Embedding proxy: TF-IDF + GaussianRandomProjection -> 256d + L2 normalize
Search: exact dense matrix search on CPU/Python

This is a micro-scale proxy benchmark. TF-IDF + random projection is not a modern semantic embedding model such as BGE, E5, GTE, or OpenAI embeddings. Results may change on larger corpora, denser candidate sets, real embedding models, or ANN indexes.
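The setup above can be approximated with scikit-learn as follows. This is a sketch under assumptions, not the repository's script: the toy texts, the default `TfidfVectorizer` settings, and `random_state=0` are placeholders, and the real benchmark may tokenize or seed differently.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize
from sklearn.random_projection import GaussianRandomProjection

# Toy stand-ins for SQuAD paragraphs and questions.
docs = [
    "The Eiffel Tower is located in Paris",
    "Python is a popular programming language",
    "The Nile is a long river in Africa",
]
queries = ["Where is the Eiffel Tower", "What is Python"]

# TF-IDF fitted on documents only; queries reuse the same vocabulary.
tfidf = TfidfVectorizer()
doc_sparse = tfidf.fit_transform(docs)
query_sparse = tfidf.transform(queries)

# Random projection to 256 dimensions, then L2 normalization.
# With a vocabulary this small, scikit-learn warns that n_components
# exceeds the number of features; the real corpus is far larger.
proj = GaussianRandomProjection(n_components=256, random_state=0)
doc_vecs = normalize(proj.fit_transform(doc_sparse))
query_vecs = normalize(proj.transform(query_sparse))

# Exact search: one dense matrix product, then the best document per query.
scores = query_vecs @ doc_vecs.T
top1 = scores.argmax(axis=1)
```

Because both sides are L2-normalized, each entry of `scores` is a cosine similarity, and exact search is a single matrix product followed by an argmax.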

Headline result

Best method in this benchmark:

doconly_affine3_g64_meta4

Result:

effective_bits_per_dim     = 3.126250
stored-vector memory saved = 90.230%
compression vs float32     = 10.236x
recall@1                   = 0.440009
recall@1 retention         = 98.69% of float32
MRR retention              = 98.99% of float32
score correlation          = 0.984499

Float32 baseline:

recall@1 = 0.445840
MRR      = 0.542880
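For readers unfamiliar with the metrics, here is a minimal sketch of how recall@1, MRR, and retention can be computed from a query-by-document score matrix. It assumes each query has exactly one gold paragraph, which this README does not spell out. The function names and the toy scores are hypothetical.

```python
import numpy as np

def recall_at_1(scores, gold):
    """Fraction of queries whose top-scored document is the gold one."""
    return float(np.mean(scores.argmax(axis=1) == gold))

def mrr(scores, gold):
    """Mean reciprocal rank of the gold document (rank 1 = best)."""
    gold_scores = scores[np.arange(len(gold)), gold][:, None]
    ranks = 1 + (scores > gold_scores).sum(axis=1)
    return float(np.mean(1.0 / ranks))

# Toy example: 3 queries scored against 3 documents.
scores = np.array([[0.9, 0.1, 0.3],
                   [0.2, 0.5, 0.7],
                   [0.4, 0.6, 0.1]])
gold = np.array([0, 1, 1])

toy_recall = recall_at_1(scores, gold)   # 2 of 3 queries rank gold first
toy_mrr = mrr(scores, gold)              # (1 + 1/2 + 1) / 3

# Retention is the compressed metric divided by the float32 metric,
# here using the recall@1 values reported in this README.
retention_pct = 100 * 0.440009 / 0.445840
print(f"{retention_pct:.2f}%")           # 98.69%
```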

Plain-language summary for this benchmark:

The stored vector payload is about 10x smaller, while recall@1 retention remains about 98.7% versus float32.

The memory number refers to compressed vector payload accounting, not total vector database footprint. For simulated low-bit metadata, it includes a small shared metadata-range overhead term. It does not include HNSW/IVF graph structures, IDs, metadata columns, allocator overhead, or packed-kernel layout overhead.
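The payload figures follow directly from the effective bits per dimension, as the short calculation below shows. The last two lines go further and rest on an assumption: they read `affine3_g64_meta4` as 3-bit codes plus two 4-bit metadata values (scale and offset) per group of 64. This README does not confirm that decomposition, so the split is illustrative only.

```python
FLOAT32_BITS = 32

def payload_stats(effective_bits_per_dim):
    """Modeled payload size relative to float32 storage."""
    fraction = effective_bits_per_dim / FLOAT32_BITS
    return {
        "payload_pct_of_float32": 100 * fraction,
        "compression_ratio": 1 / fraction,
        "saved_pct": 100 * (1 - fraction),
    }

stats = payload_stats(3.126250)
print(round(stats["payload_pct_of_float32"], 2))  # 9.77
print(round(stats["compression_ratio"], 3))       # 10.236
print(round(stats["saved_pct"], 3))               # 90.23

# ASSUMPTION: 3-bit codes + two 4-bit metadata values per 64-dim group.
per_group_bits = 3 + 2 * 4 / 64                   # 3.125 bits/dim
shared_overhead = 3.126250 - per_group_bits       # about 0.00125 bits/dim
```

Under that reading, the remaining 0.00125 bits/dim would correspond to the shared metadata-range overhead term mentioned above.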

Current frontier table

| Method | Effective bits/dim | Stored-vector memory saved | Recall@1 | Recall@1 retention | MRR retention | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| `doconly_affine2_g64_meta4` | 2.126250 | 93.355% | 0.409509 | 91.85% | 93.50% | best ultra-compact point tested here |
| `doconly_affine3_g64_meta4` | 3.126250 | 90.230% | 0.440009 | 98.69% | 98.99% | best tradeoff tested here |
| `affine3_g64_meta4` | 3.126250 | 90.230% | 0.430366 | 96.53% | 97.47% | compress docs and queries |
| `nf3_g64_meta8` | 3.125625 | 90.232% | 0.428347 | 96.08% | 97.06% | nonuniform codebook variant |

Recommended methods

Main method

doconly_affine3_g64_meta4

Use as the main PoC configuration when you want near-float32 metrics in this small exact-search benchmark with a modeled stored-vector payload about 10.2x smaller than float32.

Ultra-compact method

doconly_affine2_g64_meta4

Result:

effective_bits_per_dim     = 2.126250
stored-vector memory saved = 93.355%
recall@1 retention         = 91.85%
MRR retention              = 93.50%

Use when stored-vector memory is more important than maximum benchmark recall.

Reproduce

Install dependencies:

python -m pip install -r requirements.txt

Place SQuAD files in the repository root as described in DATA.md.

Run the current frontier benchmark from the repository root:

python scripts/run_frontier_benchmark.py \
  --docs 800 \
  --components 256 \
  --output-csv results/frontier_rag_benchmark.csv \
  --output-md reports/frontier_benchmark_report.md

Reports

Current reports:

reports/frontier_summary.md
reports/frontier_benchmark_report.md

Results

results/frontier_rag_benchmark.csv

Changelog

CHANGELOG.md

Honest limitations

This project currently demonstrates a CPU/Python exact-search quality and modeled stored-vector-memory result.

Not yet proven:

production ANN/vector database speed,
HNSW/IVF/PQ integration,
GPU search,
modern embedding models such as OpenAI text-embedding models, BGE, E5, GTE,
large-scale BEIR/MTEB retrieval quality,
packed integer index implementation,
total vector database memory savings including graph/ID/metadata overhead.

Conservative claim:

In this small CPU/Python exact-search proxy benchmark, document-only affine3_g64_meta4 compression gives the best observed stored-vector memory/quality tradeoff among the tested MegaQuant RAG configurations: about 90% modeled stored-vector memory saving while retaining about 98.7% of float32 recall@1.

Repository positioning

This repository is public as a research PoC for compressed RAG/vector indexes. It is not a production vector database engine.

Suggested GitHub topics after public release:

rag vector-search embeddings compression quantization retrieval ai-search
