Is your feature request related to a problem? Please describe.
When generating your own knowledge graph using LLMs, the LLMMetadataExtractor component can be used to encode entities and relationships inside the metadata of haystack Document objects. LLMMetadataExtractor supports ChatGenerators that can use structured output to constrain the possible graph ontology. However, when building a knowledge graph it is common not to have a predefined set of possible entities, especially when working with large amounts of data.
This means you can end up with generated metadata for document_one and document_two such as:
document_one.meta = {'entities': [{'entity': 'deepset', 'entity_type': 'company'}]}
document_two.meta = {'entities': [{'entity': 'deepset GmbH', 'entity_type': 'company'}]}
even though both entities refer to the same concept. Such entities are often ingested into a graph as separate nodes.
Describe the solution you'd like
Implement a new SemanticResolver component that uses the text-embeddings-inference (TEI) similarity endpoint, together with a configurable threshold, to determine whether entities are similar enough to merge, and then merges them.
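To make the idea concrete, here is a minimal sketch of the threshold-based merge step. It is not a proposed implementation: resolve_entities and string_similarity are hypothetical names, and the default scorer uses difflib from the standard library purely as a stand-in so the example runs without a server. In the actual component the similarity callable would instead query the TEI similarity endpoint.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

Entity = Dict[str, str]


def string_similarity(a: str, b: str) -> float:
    """Stand-in scorer (assumption): character-level ratio in [0, 1].

    A real SemanticResolver would replace this with a call to TEI.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_entities(
    entities: List[Entity],
    threshold: float = 0.7,
    similarity: Callable[[str, str], float] = string_similarity,
) -> Dict[str, str]:
    """Map every entity name to a canonical name.

    Greedy strategy: the first entity seen becomes canonical; a later entity
    of the same type is merged into the first canonical entity whose
    similarity score reaches the threshold.
    """
    canonical: List[Entity] = []
    mapping: Dict[str, str] = {}
    for ent in entities:
        name = ent["entity"]
        match = None
        for cand in canonical:
            same_type = cand["entity_type"] == ent["entity_type"]
            if same_type and similarity(name, cand["entity"]) >= threshold:
                match = cand
                break
        if match is None:
            # No sufficiently similar canonical entity: keep as a new node.
            canonical.append(ent)
            mapping[name] = name
        else:
            mapping[name] = match["entity"]
    return mapping


entities = [
    {"entity": "deepset", "entity_type": "company"},
    {"entity": "deepset GmbH", "entity_type": "company"},
    {"entity": "Hugging Face", "entity_type": "company"},
]
mapping = resolve_entities(entities)
```

With the stand-in scorer and a threshold of 0.7, 'deepset GmbH' is mapped onto 'deepset' while 'Hugging Face' stays separate. The greedy pass is order-dependent, so a real component might prefer clustering or a rule for choosing the canonical name; that choice is left open here.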
Describe alternatives you've considered
Using spaCy to determine similarity. This is much slower than TEI, although a separate component that utilises spaCy could still be created.
Additional context
Add any other context or screenshots about the feature request here.