Commit bc0ae4f
fix(verif): extend query-entity resolution to natural-language tokens (engineer follow-up)
Engineer flagged a residual gap in ddb5b58/024ea1a: extract_query_entities()
only matches CamelCase / path / backtick patterns. Natural-language queries
("what was discussed about authentication") resolved zero entity_ids and
fell through to the token-Jaccard fallback, which could mute the dendritic
ablation delta on most LongMemEval-S queries.
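A minimal sketch of why stage 1 alone misses natural-language queries. The three regexes below are hypothetical approximations of the CamelCase / path / backtick patterns. The real patterns inside extract_query_entities() are not shown in this commit and may differ.

```python
import re

# Hypothetical approximations of the stage-1 patterns named above
# (backtick-quoted, path-like, CamelCase); not the real regexes.
_BACKTICK = re.compile(r"`([^`]+)`")
_PATH = re.compile(r"\b[\w.-]+/[\w./-]+")
_CAMEL = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b")


def extract_query_entities_sketch(query: str) -> list[str]:
    """Return candidate entity names using only high-precision patterns."""
    found: list[str] = []
    for pattern in (_BACKTICK, _PATH, _CAMEL):
        for match in pattern.finditer(query):
            # Backtick pattern has a capture group; the others use the whole match.
            text = match.group(1) if pattern.groups else match.group(0)
            if text not in found:
                found.append(text)
    return found


print(extract_query_entities_sketch("why did `AuthService` touch shared/text.py?"))
# prints: ['AuthService', 'shared/text.py']
print(extract_query_entities_sketch("what was discussed about authentication"))
# prints: []
```

Code-shaped queries yield candidates, while a plain-English query yields nothing. That empty result is the case that previously fell through to token-Jaccard.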
Fix: two-stage resolution in _resolve_query_entity_ids():
- Stage 1: extract_query_entities() (high-precision, unchanged)
- Stage 2: extract_keywords() from shared/text.py (stopword-aware) →
  for each token of length >= 4, try get_entity_by_name(token).
  Skips tokens already resolved by stage 1.
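The two-stage flow can be sketched as follows. Only the control flow follows the description above. The toy ENTITY_INDEX dict, the STOPWORDS set, the backtick-only stage-1 placeholder, and the lower-casing of keywords are all hypothetical stand-ins, not the behaviour of the real extract_query_entities(), extract_keywords() or get_entity_by_name().

```python
from typing import Optional

# Hypothetical toy data standing in for the entity store and stopword list.
ENTITY_INDEX = {"authentication": 7, "AuthService": 3, "session": 11}
STOPWORDS = {"what", "was", "about", "the", "and", "with", "that"}


def extract_query_entities(query: str) -> list[str]:
    # Hypothetical stage-1 placeholder: only backtick-quoted names count.
    return query.split("`")[1::2]


def extract_keywords(query: str) -> list[str]:
    # Hypothetical stopword-aware tokenizer (lower-casing is an assumption).
    tokens = [t.strip(".,?!`").lower() for t in query.split()]
    return [t for t in tokens if t and t not in STOPWORDS]


def get_entity_by_name(name: str) -> Optional[int]:
    # Hypothetical exact-name lookup against the toy index.
    return ENTITY_INDEX.get(name)


def resolve_query_entity_ids(query: str, min_len: int = 4) -> set[int]:
    ids: set[int] = set()
    resolved: set[str] = set()
    # Stage 1: high-precision patterns, unchanged.
    for name in extract_query_entities(query):
        eid = get_entity_by_name(name)
        if eid is not None:
            ids.add(eid)
            resolved.add(name.lower())
    # Stage 2: stopword-filtered keywords of length >= min_len,
    # skipping names stage 1 already resolved and repeated tokens.
    for token in extract_keywords(query):
        if len(token) < min_len or token in resolved:
            continue
        resolved.add(token)
        eid = get_entity_by_name(token)
        if eid is not None:
            ids.add(eid)
    return ids


print(resolve_query_entity_ids("what was discussed about authentication"))  # prints: {7}
```

Stage 2 only runs lookups for content words, so "what", "was" and "about" never reach the index, and short tokens such as "did" are dropped by the length filter.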
Cost per recall: typically <=30 indexed lookups against the entities.name
index (sub-ms). The keyword extractor's stopword filter prevents swamping
the entity index with function-word junk.
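The sub-ms cost claim depends on each get_entity_by_name() call being an index search rather than a table scan. A sketch assuming a SQL store (SQLite here, purely for illustration; the commit only says an entities.name index exists) shows one way to confirm that with EXPLAIN QUERY PLAN. The table layout, index name, and lookup query are hypothetical.

```python
import sqlite3

# Assumption: entities live in a SQL table with an index on name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_entities_name ON entities(name)")
conn.executemany(
    "INSERT INTO entities(name) VALUES (?)",
    [("authentication",), ("AuthService",), ("session",)],
)

LOOKUP = "SELECT id FROM entities WHERE name = ?"


def get_entity_by_name(name: str):
    # Hypothetical lookup; returns the entity id or None.
    row = conn.execute(LOOKUP, (name,)).fetchone()
    return row[0] if row else None


# The last column of each plan row is the human-readable detail string.
plan = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, ("x",)).fetchall()
plan_detail = " ".join(str(row[-1]) for row in plan)
print(plan_detail)  # a SEARCH ... USING ... INDEX idx_entities_name line
print(get_entity_by_name("authentication"), get_entity_by_name("about"))  # prints: 1 None
```

A plan that reports SEARCH on the index (rather than SCAN) is what keeps roughly 30 lookups per recall cheap.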
This matters for dendritic ablation evidence: if natural-language queries
fall to token-Jaccard in BOTH active and ablated paths, the delta is zero
by construction — independent of whether the mechanism contributes. After
this fix, queries that name real entities exercise the entity-set Jaccard
on the active path and degrade to token-Jaccard only when ablated.

1 parent 024ea1a
1 file changed
Lines changed: 44 additions & 6 deletions
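The zero-delta-by-construction argument in the commit message can be illustrated with a toy scorer. The scoring rule (entity-set Jaccard when ids resolve, token-Jaccard otherwise), the token sets, and the entity ids 7 and 12 are hypothetical. Only the shape of the active-versus-ablated comparison comes from the message.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity; two empty sets score 0.0 (a convention assumed here)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(query_tokens, query_entities, mem_tokens, mem_entities, ablated):
    # Active path: entity-set Jaccard whenever the query resolved any entity ids.
    if query_entities and not ablated:
        return jaccard(query_entities, mem_entities)
    # No resolved ids, or mechanism ablated: token-Jaccard fallback.
    return jaccard(query_tokens, mem_tokens)


def ablation_delta(query_tokens, query_entities, mem_tokens, mem_entities):
    active = score(query_tokens, query_entities, mem_tokens, mem_entities, ablated=False)
    ablated_score = score(query_tokens, query_entities, mem_tokens, mem_entities, ablated=True)
    return active - ablated_score


q_tokens = {"discussed", "authentication"}
m_tokens = {"login", "token", "authentication", "refresh"}
m_entities = {7, 12}

# Before the fix: no ids resolved, both paths use token-Jaccard, delta is 0.
before = ablation_delta(q_tokens, set(), m_tokens, m_entities)
# After the fix: "authentication" resolves to a hypothetical entity id 7.
after = ablation_delta(q_tokens, {7}, m_tokens, m_entities)
print(before, round(after, 3))  # prints: 0.0 0.3
```

With no resolved ids, the active and ablated paths compute the identical quantity, so the delta is zero regardless of the mechanism. Once an id resolves, the two paths diverge and the ablation can measure something.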
Diff (code content not captured; hunk line numbers only): original lines 311-332 → new lines 311-370. Removed: old 314-319. Added: new 314-332, 335, 340-342, 344-346, 350-367.