These benchmarks were executed for the local embedded vector path only, using Docker with GPU acceleration.
Models tested:

- `all-MiniLM-L6-v2`
- `bge-small-en-v1.5`
- `bge-base-en-v1.5`

Benchmark sets:

- `perf`
- `longmemeval`
- `locomo`
- `membench`
- `convomem`
- `FalseMemBench`
Environment:

- All runs executed inside `docker/docker-compose.yml`
- GPU exposed via the NVIDIA container runtime
- Persistent XDG and dataset directories mounted from `TAGMEM_DATA_ROOT` on the host
- Commands invoked through `just` wrappers and scripts in `scripts/cmd/`
Prepare the datasets:

```sh
cd /path/to/tagmem
just datasets
```

Run the benchmark suite for each model:

```sh
TAGMEM_EMBED_MODEL=all-MiniLM-L6-v2 just bench-suite
TAGMEM_EMBED_MODEL=bge-small-en-v1.5 just bench-suite
TAGMEM_EMBED_MODEL=bge-base-en-v1.5 just bench-suite
```

Run LongMemEval alone for each model:

```sh
TAGMEM_EMBED_MODEL=all-MiniLM-L6-v2 just bench-longmemeval
TAGMEM_EMBED_MODEL=bge-small-en-v1.5 just bench-longmemeval
TAGMEM_EMBED_MODEL=bge-base-en-v1.5 just bench-longmemeval
```

To select a benchmark path explicitly:
```sh
TAGMEM_BENCH_PATH=component TAGMEM_EMBED_MODEL=bge-small-en-v1.5 just bench-longmemeval
TAGMEM_BENCH_PATH=interface TAGMEM_EMBED_MODEL=bge-small-en-v1.5 just bench-longmemeval
TAGMEM_BENCH_PATH=both TAGMEM_EMBED_MODEL=bge-small-en-v1.5 just bench-longmemeval
```

Release check:

```sh
just release-check
```

This command runs focused Go tests and a guarded LongMemEval rerun for `bge-small-en-v1.5`, then compares the result against `benchmarks/guards/longmemeval-bge-small-en-v1.5.json` with a tolerance of 0.01 on the tracked quality metrics.
The release guardrail currently tracks the component path. The interface path is measured separately because it exercises the real repository and search pipeline and has different latency characteristics.
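The guard comparison that `just release-check` performs (a 0.01 tolerance on tracked quality metrics) can be sketched as follows. This is a minimal illustration, not the code behind the release check: it assumes, hypothetically, that the guard file and the fresh result are both flat JSON objects mapping metric names to floats, and that only a drop of more than the tolerance below the guard value counts as a failure. The metric names and the `check_guard` helper are made up for the example.

```python
import json
from pathlib import Path

# Assumed (hypothetical) guard layout: {"recall@5": 0.91, "ndcg@10": 0.84, ...}.
# The real benchmarks/guards/*.json format may differ.
TOLERANCE = 0.01


def load_metrics(path: Path) -> dict:
    """Read a flat JSON object of metric name -> numeric value."""
    with path.open() as f:
        return {name: float(value) for name, value in json.load(f).items()}


def check_guard(guard: dict, result: dict, tolerance: float = TOLERANCE) -> list:
    """Return one failure message per tracked metric that regressed.

    Only metrics present in the guard are tracked. A metric fails when it is
    missing from the result or falls more than `tolerance` below the guard.
    """
    failures = []
    for name, expected in sorted(guard.items()):
        actual = result.get(name)
        if actual is None:
            failures.append(f"{name}: missing from result")
        elif actual < expected - tolerance:
            failures.append(f"{name}: {actual:.4f} < {expected:.4f} - {tolerance}")
    return failures


if __name__ == "__main__":
    guard = {"recall@5": 0.90, "ndcg@10": 0.80}
    result = {"recall@5": 0.895, "ndcg@10": 0.78}
    # recall@5 is within tolerance; ndcg@10 dropped by 0.02 and is reported.
    print(check_guard(guard, result))
```

Treating the check as one-sided keeps quality improvements from failing the guard; a symmetric check would instead flag any drift larger than the tolerance in either direction.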
If a reachable local daemon socket is present, the interface path may reuse daemon-backed hot corpus state. Without a daemon, it falls back to per-run local corpus construction.
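The daemon-or-fallback decision described above can be illustrated with a small probe. This is a minimal sketch, not tagmem's implementation: it assumes the daemon listens on a Unix domain socket at a caller-supplied path, and the `choose_corpus_mode` helper and its `"daemon"`/`"local"` return values are hypothetical names.

```python
import os
import socket
import tempfile


def choose_corpus_mode(socket_path: str, timeout: float = 0.5) -> str:
    """Return "daemon" if a Unix socket at socket_path accepts a connection,
    otherwise "local" (meaning: build the corpus for this run)."""
    if not os.path.exists(socket_path):
        return "local"
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    probe.settimeout(timeout)
    try:
        probe.connect(socket_path)
    except OSError:
        # Stale socket file, permission error, or timeout: treat as unreachable.
        return "local"
    finally:
        probe.close()
    return "daemon"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "daemon.sock")
        print(choose_corpus_mode(path))  # local: no socket file yet
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)
        print(choose_corpus_mode(path))  # daemon: a listener accepts connections
        server.close()
        print(choose_corpus_mode(path))  # local: the socket file is now stale
```

Checking for a live listener rather than just the socket file matters because a crashed daemon can leave a stale socket file behind.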
This audit pass also reran a warm interface-path LongMemEval benchmark for `bge-small-en-v1.5`; the raw artifact is checked in at `benchmarks/raw/bge-small-en-v1.5/longmemeval-interface.json`.
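The per-model commands above form a small matrix over `TAGMEM_EMBED_MODEL` and, for LongMemEval, `TAGMEM_BENCH_PATH`. A minimal sketch of building that matrix from a script follows. The `build_runs` helper is hypothetical: it only constructs environment/argument pairs with the model names and path values listed in this document, and leaves execution to the caller (for example `subprocess.run(argv, env=env, check=True)`).

```python
import os

MODELS = ["all-MiniLM-L6-v2", "bge-small-en-v1.5", "bge-base-en-v1.5"]
BENCH_PATHS = ["component", "interface", "both"]


def build_runs(target, models=MODELS, bench_path=None, base_env=None):
    """Return (env, argv) pairs for `just <target>`, one per embedding model.

    bench_path, when given, is exported as TAGMEM_BENCH_PATH and must be one
    of the values used by the documented commands.
    """
    if bench_path is not None and bench_path not in BENCH_PATHS:
        raise ValueError(f"unknown bench path: {bench_path!r}")
    base = dict(os.environ if base_env is None else base_env)
    runs = []
    for model in models:
        env = dict(base, TAGMEM_EMBED_MODEL=model)
        if bench_path is not None:
            env["TAGMEM_BENCH_PATH"] = bench_path
        runs.append((env, ["just", target]))
    return runs


if __name__ == "__main__":
    for env, argv in build_runs("bench-longmemeval", bench_path="interface", base_env={}):
        print(env["TAGMEM_EMBED_MODEL"], env["TAGMEM_BENCH_PATH"], " ".join(argv))
```

Copying the base environment per run keeps `PATH` and Docker-related variables intact while varying only the two benchmark knobs.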
LongMemEval:

- Source URL: https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/resolve/main/longmemeval_s_cleaned.json
- SHA256: `d6f21ea9d60a0d56f34a05b609c79c88a451d2ae03597821ea3d5a9678c3a442`
LoCoMo:

- Source repo: https://github.com/snap-research/locomo.git
- File used: `data/locomo10.json`
- SHA256: `79fa87e90f04081343b8c8debecb80a9a6842b76a7aa537dc9fdf651ea698ff4`
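The pinned SHA256 digests for the LongMemEval and LoCoMo files can be checked before a run. This is a minimal sketch using Python's standard `hashlib`; the `sha256_file` and `verify_sha256` helpers are hypothetical and not part of the repo's scripts, and the file location you pass in is up to you (this document does not state where `just datasets` stores them).

```python
import hashlib
import tempfile
from pathlib import Path

# Digests pinned in this document.
PINNED = {
    "longmemeval_s_cleaned.json": "d6f21ea9d60a0d56f34a05b609c79c88a451d2ae03597821ea3d5a9678c3a442",
    "locomo10.json": "79fa87e90f04081343b8c8debecb80a9a6842b76a7aa537dc9fdf651ea698ff4",
}


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets are never fully loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sha256(path: Path, expected: str) -> None:
    """Raise ValueError if the file's digest does not match the pinned value."""
    actual = sha256_file(path)
    if actual != expected.lower():
        raise ValueError(f"{path}: sha256 {actual} != expected {expected}")


if __name__ == "__main__":
    # Self-check on a throwaway file whose digest is well known (SHA256 of b"abc").
    with tempfile.TemporaryDirectory() as d:
        sample = Path(d) / "sample.bin"
        sample.write_bytes(b"abc")
        verify_sha256(sample, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
        print("ok")
```

For a real check, call `verify_sha256(path_to_file, PINNED[path_to_file.name])` on each downloaded file.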
MemBench:

- Source repo: https://github.com/import-myself/Membench.git
- Dataset path: `MemData/FirstAgent`
- Commit: `f66d8d1028d3f68627d00f77a967b93fbb8694b6`
ConvoMem:

- Source dataset: HuggingFace `Salesforce/ConvoMem`
- Retrieval during run: downloaded and cached automatically to `${TAGMEM_DATA_ROOT}/datasets/convomem_cache`
FalseMemBench:

- Source project: a standalone benchmark project maintained outside the main repo
- Published artifacts: copied into `benchmarks/raw/adversarial/`
- Audit note: not rerun in this audit pass; the current evidence in the repo is the checked-in raw artifacts
- Systems compared currently include: tagmem, BM25, MemPalace raw-style, Contriever, and Stella
Runtime configuration:

- Embedded provider: `embedded`
- Default GPU model after evaluation: `bge-small-en-v1.5`
- Execution provider: `CUDA`
- Runtime library path pattern: `${TAGMEM_DATA_ROOT}/xdg/data/tagmem/models/<model>/runtime-cuda/libonnxruntime.so.1.24.1`
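The runtime library path pattern above can be expanded for a given model as shown here. This is a minimal sketch: `TAGMEM_DATA_ROOT` and the `<model>` placeholder come from the pattern, while the `cuda_runtime_path` helper is hypothetical. It only builds the path; it does not check that the file exists or load the library.

```python
import os
from pathlib import PurePosixPath
from typing import Optional

PATTERN = "${TAGMEM_DATA_ROOT}/xdg/data/tagmem/models/<model>/runtime-cuda/libonnxruntime.so.1.24.1"


def cuda_runtime_path(model: str, data_root: Optional[str] = None) -> PurePosixPath:
    """Expand the documented path pattern for one embedding model.

    data_root defaults to the TAGMEM_DATA_ROOT environment variable.
    """
    root = data_root if data_root is not None else os.environ.get("TAGMEM_DATA_ROOT")
    if not root:
        raise RuntimeError("TAGMEM_DATA_ROOT is not set")
    if not model or "/" in model:
        # Reject empty names and anything that could escape the models directory.
        raise ValueError(f"invalid model name: {model!r}")
    expanded = PATTERN.replace("${TAGMEM_DATA_ROOT}", root.rstrip("/")).replace("<model>", model)
    return PurePosixPath(expanded)


if __name__ == "__main__":
    print(cuda_runtime_path("bge-small-en-v1.5", data_root="/data/tagmem"))
```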
Notes:

- Raw outputs in `raw/` are copied verbatim from the benchmark run artifacts.
- The Docker image definition is versioned in the repo.
- Host GPU workload may affect exact timing numbers.
- The `ConvoMem` benchmark downloads files from HuggingFace and caches them; preserve the cache directory for exact reruns.
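Because exact ConvoMem reruns depend on an unchanged cache, one way to confirm the cache survived between runs is to fingerprint the directory. This is a minimal sketch that assumes only that the cache lives under `${TAGMEM_DATA_ROOT}/datasets/convomem_cache`, as noted above. The `fingerprint_dir` helper is hypothetical: it hashes each regular file's relative path, size, and contents in sorted order into a single digest.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint_dir(root: Path) -> str:
    """Return one SHA256 over every regular file under root.

    Each file contributes its relative path, its size, and its bytes, in
    sorted path order, so the digest is independent of listing order.
    """
    h = hashlib.sha256()
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    for rel, path in files:
        h.update(rel.encode("utf-8") + b"\0")
        h.update(str(path.stat().st_size).encode("ascii") + b"\0")
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = Path(d)
        (cache / "a.json").write_text("{}")
        before = fingerprint_dir(cache)
        (cache / "a.json").write_text('{"x": 1}')
        print(before != fingerprint_dir(cache))  # True: contents changed
```

Recording the digest next to a benchmark result makes it possible to tell later whether a rerun used the same cached ConvoMem files.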