This folder contains publishable benchmark results for tagmem.
Contents:
- REPORT.md: executive summary and cross-model comparison tables
- MACHINE.md: hardware and software environment details
- METHODOLOGY.md: exact commands, dataset sources, hashes, and reproducibility notes
- raw/: raw JSON outputs for each model and benchmark set, plus the latest audited warm interface LongMemEval artifact for bge-small-en-v1.5
Current benchmark matrix:
- Models:
  - all-MiniLM-L6-v2
  - bge-small-en-v1.5
  - bge-base-en-v1.5
- Benchmarks:
  - perf
  - longmemeval
  - locomo
  - membench
  - convomem
  - FalseMemBench
Benchmark paths:
- component: direct retrieval harness over benchmark corpora
- interface: real repository and search path over benchmark-loaded corpora
When a reachable local daemon socket is present, the interface path may reuse daemon-backed hot corpus state. Otherwise it falls back to per-run local corpus construction.
The latest audited warm interface-path LongMemEval artifact is checked in at raw/bge-small-en-v1.5/longmemeval-interface.json.
Measured systems and source-reported reference values are intentionally separated in the detailed report.
FalseMemBench values in this repo come from checked-in artifacts under raw/adversarial/; the standalone harness that produced them is maintained outside this repository.
Recommended default after these runs:
- GPU default: bge-small-en-v1.5
- Throughput-first alternate among the measured models: all-MiniLM-L6-v2
Current release guardrail:
- just release-check runs focused Go tests plus LongMemEval for bge-small-en-v1.5
- the LongMemEval run must stay within 0.01 of the checked-in baseline in benchmarks/guards/
The checked-in release guardrail currently tracks the component path. The interface path is measured separately for real product-path behavior and may include daemon-backed corpus reuse when a reachable local daemon socket is present.
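
As a rough illustration of the 0.01 tolerance rule, the Go sketch below compares a measured score against a baseline. It is not the code behind the release check: the `withinTolerance` helper and the sample scores are hypothetical, and it assumes a two-sided absolute difference, whereas the real guard may compare a specific metric or only flag regressions.

```go
package main

import (
	"fmt"
	"math"
)

// withinTolerance reports whether measured lies within tol of baseline,
// using an absolute, two-sided comparison (an assumption for this sketch).
func withinTolerance(measured, baseline, tol float64) bool {
	return math.Abs(measured-baseline) <= tol
}

func main() {
	// Hypothetical values; real baselines live under benchmarks/guards/.
	baseline := 0.900
	for _, measured := range []float64{0.905, 0.880} {
		ok := withinTolerance(measured, baseline, 0.01)
		fmt.Printf("measured=%.3f within tolerance: %v\n", measured, ok)
	}
}
```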