Generic interface for producing embeddings from input data.
Fields / behavior:
- embeddingDim: Int - output vector size
- initialize() - async initialization
- isInitialized(): Boolean - initialization state
- closeSession() - optional cleanup
- embed(data: T): FloatArray - generates the embedding
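The interface above could be sketched roughly as follows. This is a minimal illustration, not the project's actual declaration: the name `EmbeddingProvider` is taken from the specialization described later, `initialize()` is assumed to be a `suspend` function because it is described as async, `embed()` is assumed to be synchronous, and `CharHashProvider` is a hypothetical toy implementation.

```kotlin
// Sketch of the interface described above. Whether embed() is a suspend
// function is not stated in the source; it is kept synchronous here.
interface EmbeddingProvider<T> {
    val embeddingDim: Int
    suspend fun initialize()
    fun isInitialized(): Boolean
    fun closeSession() {}            // optional cleanup: default is a no-op
    fun embed(data: T): FloatArray
}

// Hypothetical toy provider: counts characters into a fixed-size vector.
class CharHashProvider(override val embeddingDim: Int = 8) : EmbeddingProvider<String> {
    private var ready = false

    // A real provider would load a model or open a session here.
    override suspend fun initialize() { ready = true }

    override fun isInitialized(): Boolean = ready

    override fun embed(data: String): FloatArray {
        check(ready) { "call initialize() first" }
        val out = FloatArray(embeddingDim)
        for (ch in data) out[ch.code % embeddingDim] += 1f
        return out
    }
}

suspend fun main() {
    val provider = CharHashProvider()
    provider.initialize()
    val vector = provider.embed("hello")
    println("dim=${vector.size}, sum=${vector.sum()}")
    provider.closeSession()
}
```

Giving `closeSession()` a default body is one way to model "optional cleanup": providers that hold no native resources simply do not override it.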
Specialization of EmbeddingProvider<String>.
Fields:
- maxTokens: Int - maximum input length in tokens
Type alias:
EmbeddingProvider<Bitmap>
Represents a persisted embedding record.
Fields:
- id: Long - unique identifier
- date: Long - timestamp
- embedding: FloatArray - vector representation
Interface for embedding persistence and retrieval.
Fields / behavior:
- exists: Boolean - storage availability
Methods:
- add(embeddings): Int
- update(embeddings): Int
- remove(ids): Int
- get(): List<StoredEmbedding>
- clear()
- save()
- query(embedding, topK, threshold, ids): List<Long>
Notes:
- Supports optional ID-filtered queries
- Designed for disk or memory-backed implementations
File-backed implementation of EmbeddingStore.
Constructor:
FileEmbeddingStore(file: File, embeddingDimension: Int)
Behavior:
- Binary format with fixed-size records
- Little-endian encoding
- Header stores record count
- Supports lazy loading and in-memory caching
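A fixed-size, little-endian record layout like the one described above might look as follows. The exact layout is not given in the source, so this sketch assumes a 4-byte `Int` record count as the header and records of `id` (8 bytes), `date` (8 bytes), then one 4-byte float per dimension. The helper names `encode`, `decode`, `recordBytes`, and `offsetOf` are hypothetical.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

class StoredEmbedding(val id: Long, val date: Long, val embedding: FloatArray)

const val HEADER_BYTES = 4   // assumed: a single Int holding the record count

// Fixed record size: id (8 bytes) + date (8 bytes) + one 4-byte float per dimension.
fun recordBytes(dim: Int): Int = 8 + 8 + 4 * dim

fun encode(record: StoredEmbedding, dim: Int): ByteArray {
    require(record.embedding.size == dim) { "expected $dim dimensions" }
    val buf = ByteBuffer.allocate(recordBytes(dim)).order(ByteOrder.LITTLE_ENDIAN)
    buf.putLong(record.id).putLong(record.date)
    record.embedding.forEach { buf.putFloat(it) }
    return buf.array()
}

fun decode(bytes: ByteArray, dim: Int): StoredEmbedding {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val id = buf.getLong()
    val date = buf.getLong()
    val vector = FloatArray(dim) { buf.getFloat() }
    return StoredEmbedding(id, date, vector)
}

// Because every record has the same size, record i starts at a computable offset.
fun offsetOf(index: Int, dim: Int): Long = HEADER_BYTES + index.toLong() * recordBytes(dim)

fun main() {
    val original = StoredEmbedding(id = 7L, date = 1_700_000_000_000L, embedding = floatArrayOf(0.5f, -1f, 2f))
    val bytes = encode(original, dim = 3)
    val copy = decode(bytes, dim = 3)
    println("bytes=${bytes.size}, id=${copy.id}, vector=${copy.embedding.toList()}")
    println("offset of record 2: ${offsetOf(2, dim = 3)}")
}
```

Fixed-size records are what make in-place overwrites and an id-to-offset index cheap: no record ever needs to move when another one changes.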
Internal state:
- cache: LinkedHashMap<Long, StoredEmbedding>
- idToFileOffsetIndex: MutableMap<Long, Long>
- fileMutex: Mutex
add(embeddings):
- Appends new embeddings to the file
- Skips duplicates
- Updates the header and index
update(embeddings):
- Overwrites existing records in place using file offsets
- Updates the cache if it has been initialized
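The append and in-place update behavior can be illustrated with a small sketch built on `RandomAccessFile`. It is not the actual `FileEmbeddingStore` code: `TinyFileStore` is a hypothetical class, it assumes a 4-byte little-endian count header followed by fixed-size records, it starts from an empty file, and it omits the cache and the mutex mentioned under internal state.

```kotlin
import java.io.File
import java.io.RandomAccessFile
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical minimal store: 4-byte count header, then fixed-size records.
class TinyFileStore(private val file: File, private val dim: Int) {
    private val recordBytes = 8 + 8 + 4 * dim
    private val offsets = mutableMapOf<Long, Long>()   // id -> byte offset of its record

    private fun encode(id: Long, date: Long, vector: FloatArray): ByteArray {
        require(vector.size == dim) { "expected $dim dimensions" }
        val buf = ByteBuffer.allocate(recordBytes).order(ByteOrder.LITTLE_ENDIAN)
        buf.putLong(id).putLong(date)
        vector.forEach { buf.putFloat(it) }
        return buf.array()
    }

    private fun writeCount(raf: RandomAccessFile) {
        val header = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(offsets.size)
        raf.seek(0)
        raf.write(header.array())
    }

    /** Appends the record unless its id is already indexed. Returns the number added (0 or 1). */
    fun add(id: Long, date: Long, vector: FloatArray): Int {
        if (id in offsets) return 0                      // skip duplicates
        RandomAccessFile(file, "rw").use { raf ->
            val offset = maxOf(raf.length(), 4L)         // the first record starts after the header
            raf.seek(offset)
            raf.write(encode(id, date, vector))
            offsets[id] = offset                         // update the index
            writeCount(raf)                              // update the header
        }
        return 1
    }

    /** Overwrites an existing record in place. Returns the number updated (0 or 1). */
    fun update(id: Long, date: Long, vector: FloatArray): Int {
        val offset = offsets[id] ?: return 0
        RandomAccessFile(file, "rw").use { raf ->
            raf.seek(offset)
            raf.write(encode(id, date, vector))
        }
        return 1
    }
}

fun main() {
    val file = File.createTempFile("embeddings", ".bin")
    file.deleteOnExit()
    val store = TinyFileStore(file, dim = 2)
    println(store.add(1L, 100L, floatArrayOf(1f, 0f)))
    println(store.add(1L, 100L, floatArrayOf(1f, 0f)))    // duplicate id, skipped
    println(store.update(1L, 200L, floatArrayOf(0f, 1f)))
    println("file size: ${file.length()} bytes")
}
```

An update never changes the file length, since the new record has exactly the same size as the one it replaces.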
remove(ids):
- Removes entries from the in-memory structures only
get():
- Returns cached embeddings if available
- Otherwise loads the full file
save():
- Rewrites the entire file
- Rebuilds the header and index
query(embedding, topK, threshold, ids):
- Computes cosine similarity
- Returns the top-K results above the threshold
- Optional ID filtering
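The query steps can be sketched as a brute-force scan. The parameter types are assumptions (the source only names the parameters), a score equal to the threshold is treated as passing, and zero-length vectors are given a similarity of 0 to avoid dividing by zero.

```kotlin
import kotlin.math.sqrt

class StoredEmbedding(val id: Long, val date: Long, val embedding: FloatArray)

// Cosine similarity: dot(a, b) / (|a| * |b|).
fun cosine(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "dimension mismatch" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    val denom = sqrt(normA) * sqrt(normB)
    return if (denom == 0f) 0f else dot / denom
}

// Returns the ids of the topK most similar records whose score reaches the threshold.
// When ids is non-null, only records with those ids are considered.
fun query(
    records: Collection<StoredEmbedding>,
    embedding: FloatArray,
    topK: Int,
    threshold: Float,
    ids: Set<Long>? = null,
): List<Long> =
    records.asSequence()
        .filter { ids == null || it.id in ids }
        .map { it.id to cosine(embedding, it.embedding) }
        .filter { (_, score) -> score >= threshold }
        .sortedByDescending { (_, score) -> score }
        .take(topK)
        .map { (id, _) -> id }
        .toList()

fun main() {
    val records = listOf(
        StoredEmbedding(1L, 0L, floatArrayOf(1f, 0f)),
        StoredEmbedding(2L, 0L, floatArrayOf(0f, 1f)),
        StoredEmbedding(3L, 0L, floatArrayOf(1f, 1f)),
    )
    println(query(records, floatArrayOf(1f, 0f), topK = 2, threshold = 0.5f))
    println(query(records, floatArrayOf(1f, 0f), topK = 2, threshold = 0.5f, ids = setOf(2L, 3L)))
}
```

Filtering by id before scoring avoids computing similarities for records that can never be returned.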
Loading:
- Memory-maps the file
- Reconstructs the cache and index
- Validates file integrity
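Loading through a memory map might be sketched as below. The layout is assumed to be a 4-byte little-endian count header followed by fixed-size records, and the integrity check shown (file length must equal header plus count times record size) is only one plausible validation. `load`, `Loaded`, and `sampleFile` are hypothetical names.

```kotlin
import java.io.File
import java.io.RandomAccessFile
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.nio.channels.FileChannel

class StoredEmbedding(val id: Long, val date: Long, val embedding: FloatArray)

class Loaded(val cache: LinkedHashMap<Long, StoredEmbedding>, val offsets: Map<Long, Long>)

fun load(file: File, dim: Int): Loaded {
    val headerBytes = 4
    val recordBytes = 8 + 8 + 4 * dim
    RandomAccessFile(file, "r").use { raf ->
        val size = raf.length()
        require(size >= headerBytes) { "file too short for header" }
        val map = raf.channel.map(FileChannel.MapMode.READ_ONLY, 0, size)
            .order(ByteOrder.LITTLE_ENDIAN)
        val count = map.getInt()
        // Assumed integrity rule: the length must match the record count exactly.
        require(count >= 0 && size == headerBytes + count.toLong() * recordBytes) {
            "file length $size does not match record count $count"
        }
        val cache = LinkedHashMap<Long, StoredEmbedding>()
        val offsets = mutableMapOf<Long, Long>()
        repeat(count) {
            val offset = map.position().toLong()     // where this record starts in the file
            val id = map.getLong()
            val date = map.getLong()
            val vector = FloatArray(dim) { map.getFloat() }
            cache[id] = StoredEmbedding(id, date, vector)
            offsets[id] = offset
        }
        return Loaded(cache, offsets)
    }
}

// Writes a two-record sample file (dim = 2) in the assumed layout.
fun sampleFile(): File {
    val buf = ByteBuffer.allocate(4 + 2 * 24).order(ByteOrder.LITTLE_ENDIAN)
    buf.putInt(2)
    buf.putLong(10L).putLong(1L).putFloat(1f).putFloat(0f)
    buf.putLong(20L).putLong(2L).putFloat(0f).putFloat(1f)
    return File.createTempFile("embeddings", ".bin").apply {
        deleteOnExit()
        writeBytes(buf.array())
    }
}

fun main() {
    val loaded = load(sampleFile(), dim = 2)
    println("ids=${loaded.cache.keys}, offsets=${loaded.offsets}")
}
```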
JNI-backed approximate nearest-neighbor index built on hnswlib (https://github.com/nmslib/hnswlib).
Constructor:
HnswIndex(dim: Int, maxElements: Int = 1_000_000, efConstruction: Int = 200, m: Int = 16, efSearch: Int = 50)
Methods:
- init()
- add(id, vector)
- query(vector, k): List<Int>
- saveIndex(path)
- loadIndex(path, dim)
Notes:
- Requires the native hnswlib_jni library
- Optimized for large-scale similarity search
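Because the real index needs the native library, a pure-Kotlin stand-in with the same call shape can be handy for unit tests. The sketch below is such a stand-in, not the JNI wrapper. `BruteForceIndex` is a hypothetical class, it does an exact linear scan rather than an approximate HNSW search, it assumes `Int` ids and squared L2 distance (the metric used by the real index is not stated in the source), and it leaves out `saveIndex` and `loadIndex`.

```kotlin
// Hypothetical exact-search stand-in mirroring init(), add(id, vector), and query(vector, k).
class BruteForceIndex(private val dim: Int, private val maxElements: Int = 1_000_000) {
    private val vectors = LinkedHashMap<Int, FloatArray>()
    private var ready = false

    fun init() { ready = true }

    fun add(id: Int, vector: FloatArray) {
        check(ready) { "call init() first" }
        require(vector.size == dim) { "expected $dim dimensions" }
        check(id in vectors || vectors.size < maxElements) { "index is full" }
        vectors[id] = vector.copyOf()
    }

    /** Returns the ids of the k stored vectors closest to the query vector, nearest first. */
    fun query(vector: FloatArray, k: Int): List<Int> {
        check(ready) { "call init() first" }
        require(vector.size == dim) { "expected $dim dimensions" }
        return vectors.entries
            .sortedBy { (_, stored) -> squaredL2(vector, stored) }
            .take(k)
            .map { it.key }
    }

    // Assumed metric: squared Euclidean distance.
    private fun squaredL2(a: FloatArray, b: FloatArray): Float {
        var sum = 0f
        for (i in a.indices) {
            val d = a[i] - b[i]
            sum += d * d
        }
        return sum
    }
}

fun main() {
    val index = BruteForceIndex(dim = 2)
    index.init()
    index.add(1, floatArrayOf(0f, 0f))
    index.add(2, floatArrayOf(1f, 1f))
    index.add(3, floatArrayOf(5f, 5f))
    println(index.query(floatArrayOf(0.9f, 0.9f), k = 2))
}
```

A linear scan costs O(n) per query, which is the cost an HNSW graph is meant to avoid at large scale, so this stand-in only suits small test data sets.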