[Proposal] Native Vector + Graph: HNSW index on Node embeddings for on-device RAG #61

Description

@ElizioMartins

Hi ObjectBox team 👋

I built a proof-of-concept called objectbox-graph that layers a graph database (BFS, DFS, Dijkstra) on top of ObjectBox's storage model, extending it with vector similarity search via an HNSW-ready interface.

🔗 Repo: https://github.com/ElizioMartins/objectbox-graph


What it demonstrates

  1. Graph traversal (BFS / DFS / Dijkstra) using ObjectBox as the persistence backend
  2. CosineSimilarity + VectorBackend interface: MemoryStorage falls back to an O(n) scan; ObjectBoxStorage would delegate to HNSW for roughly O(log n) approximate search
  3. VectorTraversal = vector search → BFS expansion, the core primitive for on-device RAG without any cloud API
// Query: "I want to understand neural networks"
contextNodes, _ := store.VectorTraversal(queryEmbedding, 2, 2) // topK=2, depth=2
// Returns: Neural Networks • Deep Learning • Machine Learning (fully on-device)
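
To make the "vector search → BFS expansion" idea concrete, here is a minimal, self-contained sketch. It assumes a hypothetical in-memory Graph type, 2-dimensional toy embeddings, and a positional (query, topK, depth) signature; the repo's actual types and signatures may differ, and the seed selection here is a plain O(n) cosine scan rather than HNSW.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Node and Graph are hypothetical types for this sketch only.
type Node struct {
	ID        int
	Label     string
	Embedding []float32
}

type Graph struct {
	Nodes map[int]*Node
	Edges map[int][]int // adjacency list: node ID -> neighbor IDs
}

// cosine returns the cosine similarity of a and b,
// or 0 if the lengths differ or either vector is all zeros.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// VectorTraversal picks the topK nodes most similar to query (the seeds),
// then expands them breadth-first up to depth hops.
// Nodes are returned in visit order: seeds first, then each BFS level.
func (g *Graph) VectorTraversal(query []float32, topK, depth int) []*Node {
	ids := make([]int, 0, len(g.Nodes))
	for id := range g.Nodes {
		ids = append(ids, id)
	}
	sort.Ints(ids) // deterministic order before ranking by similarity
	sort.SliceStable(ids, func(i, j int) bool {
		return cosine(query, g.Nodes[ids[i]].Embedding) > cosine(query, g.Nodes[ids[j]].Embedding)
	})
	if topK < 0 {
		topK = 0
	}
	if topK > len(ids) {
		topK = len(ids)
	}
	visited := make(map[int]bool)
	var out []*Node
	frontier := append([]int(nil), ids[:topK]...)
	for _, id := range frontier {
		visited[id] = true
		out = append(out, g.Nodes[id])
	}
	for d := 0; d < depth && len(frontier) > 0; d++ {
		var next []int
		for _, id := range frontier {
			for _, nb := range g.Edges[id] {
				n, ok := g.Nodes[nb]
				if !ok || visited[nb] {
					continue // skip dangling edges and already-visited nodes
				}
				visited[nb] = true
				out = append(out, n)
				next = append(next, nb)
			}
		}
		frontier = next
	}
	return out
}

// demoGraph builds a tiny toy knowledge graph with made-up embeddings.
func demoGraph() *Graph {
	return &Graph{
		Nodes: map[int]*Node{
			1: {ID: 1, Label: "Neural Networks", Embedding: []float32{1, 0}},
			2: {ID: 2, Label: "Deep Learning", Embedding: []float32{0.8, 0.6}},
			3: {ID: 3, Label: "Machine Learning", Embedding: []float32{0, 1}},
			4: {ID: 4, Label: "Cooking", Embedding: []float32{-1, 0}},
		},
		Edges: map[int][]int{1: {2}, 2: {3}},
	}
}

func main() {
	for _, n := range demoGraph().VectorTraversal([]float32{1, 0}, 1, 2) {
		fmt.Println(n.Label)
	}
}
```

With one seed and depth 2, the toy graph yields the seed plus its two-hop neighborhood, and the unrelated "Cooking" node is never touched.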

Why this matters

Solution             | Vector + Graph | Footprint          | On-device
Neo4j                |                | ~500 MB (JVM)      |
pgvector             |                | ~100 MB (Postgres) |
ObjectBox + this POC |                | < 1 MB             |

The gap: no Go/Kotlin/Swift library exposes Vector + Graph together at embedded scale. ObjectBox already has HNSW in objectbox-dart 4.x — this POC shows how to expose the same capability in the Go binding.


Benchmark (Intel i5-10400, MemoryStorage)

Operation                | ns/op      | B/op
AddNode                  | 647        | 461
AddEdge                  | 307        | 161
BFS 100-node chain       | 109,860    | 8,913
Dijkstra 1000-node chain | 13,904,844 | 4.4 MB

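The 4.4 MB/call on MemoryStorage is the argument: with ObjectBoxStorage + HNSW, only visited nodes would load from disk, for a projected 10–50× less memory per traversal (an estimate, not a measurement: the numbers above are MemoryStorage only, and Phase 4 is not wired up yet). That is what would make this, to my knowledge, the only graph+vector DB that fits in a 512 MB IoT device.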
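
To illustrate the "only visited nodes load" idea, here is a hedged sketch of Dijkstra running against an on-demand neighbor loader instead of a fully materialized graph. The NeighborLoader callback, the Edge type, and the load counter are all assumptions made for this example (they are not the POC's API), edge weights are assumed non-negative, and no memory figures are implied.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Edge is a hypothetical weighted edge (weights must be non-negative).
type Edge struct {
	To     int
	Weight float64
}

// NeighborLoader stands in for a storage backend that fetches
// one node's outgoing edges on demand (e.g. from disk).
type NeighborLoader func(id int) []Edge

type item struct {
	id   int
	dist float64
}

// minHeap orders items by ascending distance for container/heap.
type minHeap []item

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].dist < h[j].dist }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(item)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	it := old[n-1]
	*h = old[:n-1]
	return it
}

// ShortestDistance runs Dijkstra from src and stops as soon as dst is settled.
// It returns the distance, how many nodes had their edges loaded,
// and whether dst was reachable.
func ShortestDistance(load NeighborLoader, src, dst int) (float64, int, bool) {
	dist := map[int]float64{src: 0}
	settled := make(map[int]bool)
	h := &minHeap{{id: src, dist: 0}}
	loads := 0
	for h.Len() > 0 {
		cur := heap.Pop(h).(item)
		if settled[cur.id] {
			continue // stale heap entry for an already-settled node
		}
		settled[cur.id] = true
		if cur.id == dst {
			return cur.dist, loads, true
		}
		loads++ // edges are fetched only for nodes that are actually expanded
		for _, e := range load(cur.id) {
			nd := cur.dist + e.Weight
			if old, ok := dist[e.To]; !ok || nd < old {
				dist[e.To] = nd
				heap.Push(h, item{id: e.To, dist: nd})
			}
		}
	}
	return 0, loads, false
}

// chain returns a loader for the chain 0 -> 1 -> ... -> n-1 with unit weights.
func chain(n int) NeighborLoader {
	return func(id int) []Edge {
		if id >= 0 && id+1 < n {
			return []Edge{{To: id + 1, Weight: 1}}
		}
		return nil
	}
}

func main() {
	d, loads, ok := ShortestDistance(chain(1000), 0, 10)
	fmt.Printf("dist=%v loads=%d ok=%v\n", d, loads, ok) // dist=10 loads=10 ok=true
}
```

On a 1000-node chain, a query that ends at node 10 expands only 10 nodes. How much that saves in a real backend depends on the graph shape and the query, which is why the traversal figures need measuring once Phase 4 lands.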


The VectorBackend capability pattern

The design uses an interface capability check, similar in spirit to how pgvector extends PostgreSQL without touching its core:

// graph/vector.go
type VectorBackend interface {
    NearestNeighbors(query []float32, maxResults int) ([]*ScoredNode, error)
}

func (g *GraphStore) FindSimilarNodes(query []float32, maxResults int, minScore float64) ([]*ScoredNode, error) {
    // O(log n) if backend has HNSW (ObjectBoxStorage).
    // Note: as written, minScore is not applied on this path.
    if vb, ok := g.storage.(VectorBackend); ok {
        return vb.NearestNeighbors(query, maxResults)
    }
    // O(n) cosine scan fallback (MemoryStorage)
    ...
}

ObjectBoxStorage only needs to implement NearestNeighbors() to upgrade automatically from an O(n) scan to roughly O(log n) approximate search, with no changes to the graph layer.
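
For reference, here is a runnable sketch of the whole pattern, including one possible shape for the elided fallback. The Storage interface with AllNodes(), the Node type, and the indexedStorage stub are assumptions for this example only. The stub merely counts calls to show the dispatch; it does not implement HNSW or call any ObjectBox API.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type Node struct {
	ID        uint64
	Embedding []float32
}

type ScoredNode struct {
	Node  *Node
	Score float64
}

// Storage is a hypothetical minimal storage interface for this sketch.
type Storage interface {
	AllNodes() []*Node
}

// VectorBackend is the optional capability from the proposal.
type VectorBackend interface {
	NearestNeighbors(query []float32, maxResults int) ([]*ScoredNode, error)
}

// MemoryStorage has no vector index, so it gets the O(n) fallback.
type MemoryStorage struct{ nodes []*Node }

func (m *MemoryStorage) AllNodes() []*Node { return m.nodes }

// indexedStorage is a stub standing in for a backend with a native index.
type indexedStorage struct {
	MemoryStorage
	calls int
}

func (s *indexedStorage) NearestNeighbors(query []float32, maxResults int) ([]*ScoredNode, error) {
	s.calls++ // a real backend would query its index here
	return nil, nil
}

type GraphStore struct{ storage Storage }

// cosine returns 0 for mismatched lengths or zero vectors.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func (g *GraphStore) FindSimilarNodes(query []float32, maxResults int, minScore float64) ([]*ScoredNode, error) {
	// Delegate when the backend advertises the capability.
	if vb, ok := g.storage.(VectorBackend); ok {
		return vb.NearestNeighbors(query, maxResults)
	}
	// Fallback: score every node, filter by minScore, keep the best maxResults.
	var scored []*ScoredNode
	for _, n := range g.storage.AllNodes() {
		if s := cosine(query, n.Embedding); s >= minScore {
			scored = append(scored, &ScoredNode{Node: n, Score: s})
		}
	}
	sort.SliceStable(scored, func(i, j int) bool { return scored[i].Score > scored[j].Score })
	if maxResults >= 0 && len(scored) > maxResults {
		scored = scored[:maxResults]
	}
	return scored, nil
}

func main() {
	mem := &MemoryStorage{nodes: []*Node{
		{ID: 1, Embedding: []float32{1, 0}},
		{ID: 2, Embedding: []float32{0, 1}},
		{ID: 3, Embedding: []float32{1, 1}},
	}}
	g := &GraphStore{storage: mem}
	hits, _ := g.FindSimilarNodes([]float32{1, 0}, 2, 0.5)
	for _, h := range hits {
		fmt.Printf("node %d score %.3f\n", h.Node.ID, h.Score)
	}
}
```

Swapping in a storage that implements NearestNeighbors changes the code path without touching GraphStore, which is the point of the capability check.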

The entity annotation already maps to ObjectBox's HNSW API:

type NodeEntity struct {
    Id         uint64
    Label      string
    Properties string
    Embedding  []float32 `objectbox:"hnsw:dimensions=384"`
}

Status

  • ✅ Phase 1: Graph layer + MemoryStorage (13 tests passing)
  • ✅ Phase 2: ObjectBoxStorage code (behind //go:build objectbox tag — compiles without CGO)
  • ✅ Phase 3: Vector + Graph (CosineSimilarity, VectorTraversal, knowledge graph example)
  • ⏳ Phase 4: Wire ObjectBoxStorage.NearestNeighbors to the real HNSW query API (needs CGO + ObjectBox C lib)

I'd love to discuss contributing this as an official extension, a community module, or even joining the team to work on it. Happy to share more details or adapt the design to fit ObjectBox's roadmap.

— Elizio Martins
GitHub: https://github.com/ElizioMartins
