Commit de950c6
feat(search): new embedding entity + body text extension hook (#27233)
* feat(search): enroll ContextMemory in vector embedding + body text extension hook
Add ContextMemory to AvailableEntityTypes.LIST so the existing vector
embedding pipeline (VectorEmbeddingHandler live path and the admin
reembed/reindex CLIs) iterates Collate memories alongside the built-in
data assets.
Memories store their semantic payload in title/question/answer/summary,
not description, so the default buildBodyText would feed an empty string
to the embedder. To fix that without pulling Collate schema classes into
OSS, expose a BodyTextExtractor functional interface plus a
registerBodyTextExtractor(entityType, extractor) hook on VectorDocBuilder.
buildBodyText consults the registry first and falls back to the existing
description-based logic when no extractor is registered. Collate
registers a typed extractor from its repository static initializer, so
both server and CLI paths go through the same code.
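The registry-then-fallback flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual VectorDocBuilder code: the `extract` method name, the `Map<String, Object>` entity shape, the class name `BodyTextRegistrySketch`, and the `"contextMemory"` / `"description"` keys are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the functional interface named in the commit
// message; the real method name and parameter type may differ.
@FunctionalInterface
interface BodyTextExtractor {
    String extract(Map<String, Object> entity);
}

final class BodyTextRegistrySketch {
    // Starts empty, so entity types without a registered extractor
    // always reach the default branch below.
    private static final Map<String, BodyTextExtractor> EXTRACTORS =
        new ConcurrentHashMap<>();

    private BodyTextRegistrySketch() {}

    static void registerBodyTextExtractor(String entityType, BodyTextExtractor extractor) {
        EXTRACTORS.put(entityType, extractor);
    }

    // Assumes entityType is non-null: ConcurrentHashMap rejects null keys.
    static String buildBodyText(String entityType, Map<String, Object> entity) {
        BodyTextExtractor custom = EXTRACTORS.get(entityType);
        if (custom != null) {
            return custom.extract(entity); // registry is consulted first
        }
        // Simplified stand-in for the existing description-based logic.
        Object description = entity.get("description");
        return description == null ? "" : description.toString();
    }
}
```

In this sketch, an entity type with no registered extractor and no description yields an empty string, which is the situation the hook is meant to avoid for memories.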
No behavior change for the default entity types: LIST gains one entry at
the end, and the registry starts empty so every existing call falls
through to the unchanged default branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(search): add VectorBodyTextContributor SPI marker interface
Document the contract for plugging in entity-specific body text extractors
into the vector embedding pipeline. Implementations declare their entity
type, return a typed BodyTextExtractor, and call the default register()
hook from a stable initialization site (typically the owning
EntityRepository static initializer).
Pure documentation of the shape — the backing mechanism is still
VectorDocBuilder.registerBodyTextExtractor(), so callers that use the raw
registration hook keep working unchanged. The interface exists so new
contributors get an IDE nudge ("implement this") instead of having to
grep an existing extension class for the pattern.
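One possible shape for such a contributor is sketched below. The method names `entityType()` and `bodyTextExtractor()`, the `VectorDocBuilderSketch` stand-in, the `ContextMemoryContributorSketch` class, and the `"contextMemory"` key are hypothetical; only `register()` and `registerBodyTextExtractor(entityType, extractor)` are named in the commit message, and their real signatures may differ.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical extractor shape; the real interface may differ.
@FunctionalInterface
interface BodyTextExtractor {
    String extract(Map<String, Object> entity);
}

// Minimal stand-in for the registration hook on VectorDocBuilder.
final class VectorDocBuilderSketch {
    static final Map<String, BodyTextExtractor> EXTRACTORS = new ConcurrentHashMap<>();

    private VectorDocBuilderSketch() {}

    static void registerBodyTextExtractor(String entityType, BodyTextExtractor extractor) {
        EXTRACTORS.put(entityType, extractor);
    }
}

// Assumed contract: declare the entity type, supply an extractor, and let
// the default register() delegate to the raw registration hook.
interface VectorBodyTextContributor {
    String entityType();

    BodyTextExtractor bodyTextExtractor();

    default void register() {
        VectorDocBuilderSketch.registerBodyTextExtractor(entityType(), bodyTextExtractor());
    }
}

// Illustrative implementation that joins the memory fields listed in the
// commit message, skipping any that are absent.
final class ContextMemoryContributorSketch implements VectorBodyTextContributor {
    @Override
    public String entityType() {
        return "contextMemory";
    }

    @Override
    public BodyTextExtractor bodyTextExtractor() {
        return memory -> Stream.of("title", "question", "answer", "summary")
            .map(memory::get)
            .filter(Objects::nonNull)
            .map(Object::toString)
            .collect(Collectors.joining("\n"));
    }
}
```

Under these assumptions, calling `new ContextMemoryContributorSketch().register()` once from a static initializer would be enough for both server and CLI paths to pick up the extractor.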
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix NPE in buildBodyText when entityType is null
ConcurrentHashMap.get(null) throws NullPointerException. GlossaryTerm
tests pass null entityType through buildEmbeddingFields. Guard with a
null check and wrap custom extractor calls in try/catch so a faulty
downstream extractor degrades gracefully instead of crashing the
embedding pipeline.
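The guard described above might look roughly like this. It is a sketch under assumptions, not the committed code: the class name `SafeBodyTextSketch`, the `Map<String, Object>` entity shape, and the choice to fall back to the description-based default when an extractor throws or returns null are all illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical extractor shape; the real interface may differ.
@FunctionalInterface
interface BodyTextExtractor {
    String extract(Map<String, Object> entity);
}

final class SafeBodyTextSketch {
    private static final Map<String, BodyTextExtractor> EXTRACTORS =
        new ConcurrentHashMap<>();

    private SafeBodyTextSketch() {}

    static void registerBodyTextExtractor(String entityType, BodyTextExtractor extractor) {
        EXTRACTORS.put(entityType, extractor);
    }

    static String buildBodyText(String entityType, Map<String, Object> entity) {
        // ConcurrentHashMap.get(null) throws NullPointerException,
        // so skip the lookup when no entity type is supplied.
        BodyTextExtractor custom = entityType == null ? null : EXTRACTORS.get(entityType);
        if (custom != null) {
            try {
                String text = custom.extract(entity);
                if (text != null) {
                    return text;
                }
            } catch (RuntimeException e) {
                // Assumed policy: a faulty extractor degrades to the default
                // text instead of failing the whole embedding run.
            }
        }
        // Simplified stand-in for the description-based default.
        Object description = entity.get("description");
        return description == null ? "" : description.toString();
    }
}
```

A production version would presumably log the caught exception; logging is left out here to keep the sketch dependency-free.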
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix
* add tests
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 512771d commit de950c6
4 files changed
Lines changed: 194 additions & 1 deletion
File tree
- openmetadata-service/src
- main/java/org/openmetadata/service/search/vector
- utils
- test/java/org/openmetadata/service/search/vector
Lines changed: 55 additions & 0 deletions
Lines changed: 51 additions & 0 deletions
Lines changed: 2 additions & 1 deletion
Lines changed: 86 additions & 0 deletions