Methodology

Scope

These benchmarks cover only the local embedded vector path and were run in Docker with GPU acceleration.

Models tested:

  • all-MiniLM-L6-v2
  • bge-small-en-v1.5
  • bge-base-en-v1.5

Benchmark sets:

  • perf
  • longmemeval
  • locomo
  • membench
  • convomem
  • FalseMemBench

Execution Environment

  • All runs executed inside the Docker Compose environment defined in docker/docker-compose.yml
  • GPU exposed via NVIDIA container runtime
  • Persistent XDG and dataset directories mounted from TAGMEM_DATA_ROOT on the host
  • Commands invoked through just wrappers and scripts in scripts/cmd/

Commands Used

Prepare datasets

cd /path/to/tagmem
just datasets

Full suite per model

TAGMEM_EMBED_MODEL=all-MiniLM-L6-v2 just bench-suite
TAGMEM_EMBED_MODEL=bge-small-en-v1.5 just bench-suite
TAGMEM_EMBED_MODEL=bge-base-en-v1.5 just bench-suite

LongMemEval only per model

TAGMEM_EMBED_MODEL=all-MiniLM-L6-v2 just bench-longmemeval
TAGMEM_EMBED_MODEL=bge-small-en-v1.5 just bench-longmemeval
TAGMEM_EMBED_MODEL=bge-base-en-v1.5 just bench-longmemeval

To select a benchmark path explicitly:

TAGMEM_BENCH_PATH=component TAGMEM_EMBED_MODEL=bge-small-en-v1.5 just bench-longmemeval
TAGMEM_BENCH_PATH=interface TAGMEM_EMBED_MODEL=bge-small-en-v1.5 just bench-longmemeval
TAGMEM_BENCH_PATH=both TAGMEM_EMBED_MODEL=bge-small-en-v1.5 just bench-longmemeval
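
The model and path selections above amount to a small run matrix. The following Python sketch enumerates that matrix, assuming `just` is on PATH and that the recipes read `TAGMEM_EMBED_MODEL` and `TAGMEM_BENCH_PATH` from the environment, as the commands above suggest. The helper names `build_runs` and `run_all` are hypothetical and not part of tagmem. The sketch lists `component` and `interface` separately rather than using `both`, so each run produces its own exit status.

```python
import itertools
import os
import subprocess

MODELS = ["all-MiniLM-L6-v2", "bge-small-en-v1.5", "bge-base-en-v1.5"]
BENCH_PATHS = ["component", "interface"]


def build_runs(models=MODELS, paths=BENCH_PATHS, recipe="bench-longmemeval"):
    """Return one (env_overrides, argv) pair per model/path combination."""
    runs = []
    for model, path in itertools.product(models, paths):
        env = {"TAGMEM_BENCH_PATH": path, "TAGMEM_EMBED_MODEL": model}
        runs.append((env, ["just", recipe]))
    return runs


def run_all(runs, dry_run=True):
    """Print each invocation; execute it only when dry_run is False."""
    for env, argv in runs:
        prefix = " ".join(f"{key}={value}" for key, value in env.items())
        print(f"{prefix} {' '.join(argv)}")
        if not dry_run:
            # Merge the overrides into the current environment; stop on the first failure.
            subprocess.run(argv, env={**os.environ, **env}, check=True)


if __name__ == "__main__":
    run_all(build_runs())
```

By default the script only prints the commands it would run; pass `dry_run=False` to `run_all` from the repository root to execute them.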

Release guardrail

just release-check

This command runs focused Go tests and a guarded LongMemEval rerun for bge-small-en-v1.5, then compares the result against benchmarks/guards/longmemeval-bge-small-en-v1.5.json with a 0.01 tolerance on the tracked quality metrics.
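
The tolerance check described above can be pictured as a per-metric comparison. The sketch below is illustrative only. It assumes the guard file and the fresh result are both flat mappings from metric name to a numeric score, and it treats only drops below the baseline as failures. Neither the schema nor the one-sided rule is confirmed by this document, and the metric names and the `check_guard` helper are hypothetical.

```python
def check_guard(baseline, result, tolerance=0.01):
    """Return a list of failure messages; an empty list means the guard passes."""
    failures = []
    for metric, expected in baseline.items():
        actual = result.get(metric)
        if actual is None:
            failures.append(f"{metric}: missing from result")
        # Small epsilon so a drop of exactly `tolerance` is not failed by float rounding.
        elif expected - actual > tolerance + 1e-9:
            failures.append(
                f"{metric}: {actual:.4f} is below {expected:.4f} by more than {tolerance}"
            )
    return failures


if __name__ == "__main__":
    # Hypothetical metric names and values, for illustration only.
    baseline = {"recall_at_5": 0.90, "ndcg_at_10": 0.80}
    result = {"recall_at_5": 0.895, "ndcg_at_10": 0.78}
    for line in check_guard(baseline, result):
        print(line)
```

In practice both mappings would be loaded with `json.load` from the guard file and the rerun artifact, after adapting the lookup to the actual file layout.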

The release guardrail currently tracks the component path. The interface path is measured separately because it exercises the real repository and search pipeline and has different latency characteristics.

If a reachable local daemon socket is present, the interface path may reuse daemon-backed hot corpus state. Without a daemon, it falls back to per-run local corpus construction.
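
The fallback described above hinges on whether the daemon socket is reachable, not merely present on disk. Here is a minimal sketch of such a check in Python, assuming a Unix domain socket. The socket path is supplied by the caller because this document does not state where tagmem places it, and the `corpus_mode` helper and its return labels are hypothetical, not tagmem's implementation.

```python
import socket


def daemon_reachable(socket_path, timeout=0.5):
    """Return True if something accepts connections on the Unix socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(socket_path)
        except OSError:
            # Covers a missing file, a stale socket file, and timeouts.
            return False
        return True


def corpus_mode(socket_path):
    """Pick daemon-backed hot state when reachable, else a per-run local build."""
    return "daemon-hot" if daemon_reachable(socket_path) else "local-per-run"
```

Attempting a connection, rather than testing for the file, avoids mistaking a stale socket left by a stopped daemon for a live one.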

This audit pass also reran LongMemEval on the interface path with a warm corpus for bge-small-en-v1.5; the raw artifact is checked in at benchmarks/raw/bge-small-en-v1.5/longmemeval-interface.json.

Dataset Sources

LongMemEval

  • Source URL:
    • https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/resolve/main/longmemeval_s_cleaned.json
  • SHA256:
    • d6f21ea9d60a0d56f34a05b609c79c88a451d2ae03597821ea3d5a9678c3a442

LoCoMo

  • Source repo:
    • https://github.com/snap-research/locomo.git
  • File used:
    • data/locomo10.json
  • SHA256:
    • 79fa87e90f04081343b8c8debecb80a9a6842b76a7aa537dc9fdf651ea698ff4
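
The SHA256 values listed for LongMemEval and LoCoMo can be checked against local copies before a run. A minimal sketch using only the Python standard library follows. The digests are the ones published above; the `sha256_of` and `verify` helpers are hypothetical names, and where the files live on disk is left to the caller.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's digest matches the published value."""
    return sha256_of(path) == expected_hex.lower()


# Published digests keyed by file name; local directories are up to the caller.
EXPECTED = {
    "longmemeval_s_cleaned.json": "d6f21ea9d60a0d56f34a05b609c79c88a451d2ae03597821ea3d5a9678c3a442",
    "locomo10.json": "79fa87e90f04081343b8c8debecb80a9a6842b76a7aa537dc9fdf651ea698ff4",
}
```

For example, `verify("/some/dir/locomo10.json", EXPECTED["locomo10.json"])` returns `False` if the file differs from the one used in these runs.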

MemBench

  • Source repo:
    • https://github.com/import-myself/Membench.git
  • Dataset path:
    • MemData/FirstAgent
  • Commit:
    • f66d8d1028d3f68627d00f77a967b93fbb8694b6

ConvoMem

  • Source dataset:
    • HuggingFace Salesforce/ConvoMem
  • Retrieval during run:
    • downloaded and cached automatically to ${TAGMEM_DATA_ROOT}/datasets/convomem_cache

FalseMemBench

  • Source project:
    • standalone benchmark project maintained outside the main repo
  • Published artifacts:
    • copied into benchmarks/raw/adversarial/
  • Audit note:
    • not rerun in this audit pass; current repo evidence is the checked-in raw artifacts
  • Systems currently measured for comparison:
    • tagmem
    • BM25
    • MemPalace raw-style
    • Contriever
    • Stella

Embedded Runtime Details

  • Embedded provider: embedded
  • Default GPU model after evaluation: bge-small-en-v1.5
  • Execution provider: CUDA
  • Runtime library path pattern:
    • ${TAGMEM_DATA_ROOT}/xdg/data/tagmem/models/<model>/runtime-cuda/libonnxruntime.so.1.24.1
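
The path pattern above can be expanded per model to confirm that the CUDA runtime library is in place before a run. A small sketch, assuming `TAGMEM_DATA_ROOT` is set in the environment or passed explicitly; the `runtime_library_path` helper is a hypothetical name, not a tagmem API.

```python
import os
from pathlib import Path


def runtime_library_path(model, data_root=None):
    """Expand the documented pattern for one model name."""
    root = Path(data_root or os.environ["TAGMEM_DATA_ROOT"])
    return (
        root / "xdg" / "data" / "tagmem" / "models" / model
        / "runtime-cuda" / "libonnxruntime.so.1.24.1"
    )


if __name__ == "__main__":
    for model in ("all-MiniLM-L6-v2", "bge-small-en-v1.5", "bge-base-en-v1.5"):
        path = runtime_library_path(model, data_root="/path/to/data-root")
        print(path, "present" if path.exists() else "missing")
```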

Notes on Repeatability

  • Raw outputs in raw/ are copied verbatim from the benchmark run artifacts.
  • Docker image definition is versioned in the repo.
  • Other workloads on the host GPU may affect exact timing numbers.
  • The ConvoMem benchmark downloads its files from HuggingFace and caches them locally; the cache directory should be preserved for exact reruns.