Add new "textToLLMContext" field to improve embeddings #27485
🟡 Playwright Results: all passed (20 flaky)
✅ 3691 passed · ❌ 0 failed · 🟡 20 flaky · ⏭️ 89 skipped
🟡 20 flaky test(s) (passed on retry)
How to debug locally:
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace
Code Review 👍 Approved with suggestions · 0 resolved / 1 finding
Incorporates the textToLLMContext field to enhance embedding generation. Re-add the try-catch block for reflection calls to ensure lambda safety against unchecked casts.
💡 Edge Case: Removed try-catch makes unchecked casts in lambdas unguarded
📄 openmetadata-service/src/main/java/org/openmetadata/service/search/vector/VectorDocBuilder.java:554
📄 openmetadata-service/src/main/java/org/openmetadata/service/search/vector/VectorDocBuilder.java:352-354
if (spec == null) {
  return null;
}
List<String> childNames = readChildNames(spec.childGetter().apply(entity));
💡 Edge Case: Removed try-catch makes unchecked casts in lambdas unguarded
The old readChildNames wrapped the reflection call in a try-catch that returned Collections.emptyList() on any exception. The new code at line 554 calls spec.childGetter().apply(entity) without any exception handling. While the cast is logically guarded by the entityType map key (derived from entity.getEntityReference().getType() at line 151), a ClassCastException would now propagate uncaught if there is ever an inconsistency (e.g., a subclass returning an unexpected type string). The same applies to SEMANTIC_ENRICHERS at lines 352-354.
The risk is low since entityType is derived from the entity itself, but the behavioral change from silent failure to exception propagation is worth noting, especially since this runs during search reindexing, where a single failure could interrupt batch processing.
Suggested fix:
Consider wrapping the lambda invocations in a try-catch for ClassCastException, logging a warning and returning null/empty to preserve the previous fail-safe behavior:
List<String> childNames;
try {
  // Declared outside the try block so the result stays in scope afterwards.
  childNames = readChildNames(spec.childGetter().apply(entity));
} catch (ClassCastException e) {
  LOG.warn("Type mismatch for {}: {}", entityType, e.getMessage());
  return null;
}
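To make the failure mode concrete, here is a minimal, self-contained sketch of a type-keyed getter map whose lambdas cast unchecked, alongside a guarded variant that falls back to an empty list. All names here (GuardedChildNames, Table, Topic, CHILD_GETTERS) are hypothetical stand-ins for illustration and are not the actual VectorDocBuilder types; the guarded variant returns an empty list rather than null, which is one of the two options the suggestion mentions.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins for illustration; not the real OpenMetadata types.
final class GuardedChildNames {
  record Table(List<String> columns) {}

  record Topic(List<String> fields) {}

  // Type-keyed getters: each cast is only safe when the map key
  // matches the runtime type of the entity passed in.
  static final Map<String, Function<Object, List<String>>> CHILD_GETTERS =
      Map.of(
          "table", e -> ((Table) e).columns(),
          "topic", e -> ((Topic) e).fields());

  // Unguarded: a key/type mismatch propagates ClassCastException to the caller.
  static List<String> childNames(String entityType, Object entity) {
    Function<Object, List<String>> getter = CHILD_GETTERS.get(entityType);
    return getter == null ? null : getter.apply(entity);
  }

  // Guarded: a mismatch is logged and degrades to an empty list.
  static List<String> childNamesOrEmpty(String entityType, Object entity) {
    try {
      List<String> names = childNames(entityType, entity);
      return names == null ? Collections.emptyList() : names;
    } catch (ClassCastException e) {
      System.err.println("Type mismatch for " + entityType + ": " + e.getMessage());
      return Collections.emptyList();
    }
  }

  public static void main(String[] args) {
    Object mislabeled = new Topic(List.of("id"));
    // Key says "table" but the entity is a Topic: guarded call returns [].
    System.out.println(childNamesOrEmpty("table", mislabeled));
    // Matching key and type: returns the real child names.
    System.out.println(childNamesOrEmpty("topic", mislabeled));
  }
}
```

In a batch reindexing loop, the guarded form lets one mislabeled entity be skipped with a warning, while the unguarded form would abort the iteration unless the caller catches the exception itself.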



Closes https://github.com/open-metadata/ai-platform/issues/303
Summary by Gitar
- Split textToEmbed into textToLLMContext (legacy format for agents) and a new, clean textToEmbed for semantic vector search.
- buildSemanticMetaLightText generator that excludes structural noise (FQN, system fields) to improve search relevance.
- SEMANTIC_CHILDREN_SPECS.
- SEMANTIC_ENRICHERS map to replace instanceof branching for entity-specific metadata.
- Added textToLLMContext to all Elasticsearch index mappings to ensure backward compatibility for tooling.
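As a rough illustration of replacing instanceof branching with a lookup map, the sketch below registers one enricher per entity class. It is an assumption-laden example, not the PR's code: EnricherDispatch, Table, Dashboard, register, and describe are hypothetical names, and the map is keyed by Class rather than by an entity-type string so that each cast goes through Class.cast at registration time.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch; names and entity shapes are illustrative only.
final class EnricherDispatch {
  record Table(List<String> columns) {}

  record Dashboard(List<String> charts) {}

  // One enricher per entity class, replacing an if/else chain of instanceof checks.
  private static final Map<Class<?>, BiConsumer<Object, StringBuilder>> ENRICHERS =
      new LinkedHashMap<>();

  // Registration binds the key and the cast together, so they cannot disagree.
  private static <T> void register(Class<T> type, BiConsumer<T, StringBuilder> enricher) {
    ENRICHERS.put(type, (entity, out) -> enricher.accept(type.cast(entity), out));
  }

  static {
    register(Table.class,
        (t, out) -> out.append(" columns: ").append(String.join(", ", t.columns())));
    register(Dashboard.class,
        (d, out) -> out.append(" charts: ").append(String.join(", ", d.charts())));
  }

  // Builds a short text for an entity; unknown types get only the common part.
  static String describe(String name, Object entity) {
    StringBuilder out = new StringBuilder("name: ").append(name);
    BiConsumer<Object, StringBuilder> enricher = ENRICHERS.get(entity.getClass());
    if (enricher != null) {
      enricher.accept(entity, out);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(describe("orders", new Table(List.of("id", "total"))));
    System.out.println(describe("sales", new Dashboard(List.of("revenue"))));
  }
}
```

Adding support for a new entity type then means adding one register call instead of another branch; an exact-class lookup like this does not match subclasses, which may or may not suit the real entity hierarchy.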