Add New SemanticResolver component #10985

@maxdswain

Is your feature request related to a problem? Please describe.
When generating your own knowledge graph using LLMs, the LLMMetadataExtractor component can be used to encode entities and relationships in the metadata of Haystack Document objects. LLMMetadataExtractor supports ChatGenerators that can use structured output to constrain the graph ontology. However, when building a knowledge graph it is common not to have a predefined set of possible entities, especially when working with large amounts of data.

This means the generated metadata for document_one and document_two can end up looking like this:

document_one.meta = {'entities': [{'entity': 'deepset', 'entity_type': 'company'}]}
document_two.meta = {'entities': [{'entity': 'deepset GmbH', 'entity_type': 'company'}]}

even though both entities refer to the same concept. Such entities are often ingested into a graph as separate nodes.

Describe the solution you'd like
Implement a new SemanticResolver component that uses the Text Embeddings Inference (TEI) similarity endpoint, together with a user-supplied threshold, to decide whether two entities are similar enough to be merged, and then merges them.
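
To make the proposed behaviour concrete, here is a minimal sketch of threshold-based merging. It is not a Haystack component and does not call TEI: `resolve_entities` and `string_similarity` are hypothetical names, and the similarity function is an offline stand-in based on `difflib` so the example runs without a server. In a real component the `similarity` callable would presumably wrap the TEI similarity endpoint instead, and the threshold of 0.7 is an arbitrary illustrative value.

```python
from difflib import SequenceMatcher
from typing import Callable

Entity = dict[str, str]


def string_similarity(a: str, b: str) -> float:
    """Offline stand-in for an embedding similarity score (NOT the TEI endpoint)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_entities(
    entities: list[Entity],
    threshold: float = 0.7,
    similarity: Callable[[str, str], float] = string_similarity,
) -> list[Entity]:
    """Greedily rename each entity to the first earlier entity of the same
    type whose name scores at or above `threshold`."""
    canonical: list[Entity] = []  # one representative per merged group
    resolved: list[Entity] = []
    for ent in entities:
        match = next(
            (
                c
                for c in canonical
                if c["entity_type"] == ent["entity_type"]
                and similarity(ent["entity"], c["entity"]) >= threshold
            ),
            None,
        )
        if match is None:
            # No sufficiently similar entity seen so far: start a new group.
            canonical.append(ent)
            resolved.append(dict(ent))
        else:
            # Similar enough: reuse the representative's name.
            resolved.append({**ent, "entity": match["entity"]})
    return resolved


entities = [
    {"entity": "deepset", "entity_type": "company"},
    {"entity": "deepset GmbH", "entity_type": "company"},
    {"entity": "Hugging Face", "entity_type": "company"},
]
print(resolve_entities(entities))
```

With the stand-in similarity, 'deepset GmbH' is renamed to 'deepset' while 'Hugging Face' stays separate. Keeping the first-seen name as the canonical one is only one possible design choice; a component could equally prefer the longest or most frequent name.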

Describe alternatives you've considered
Using spaCy to determine similarity. This is much slower than TEI, although a separate component that uses spaCy could still be created.

Metadata

    Labels

    P2: Medium priority, add to the next sprint if no P1 available