Skip to content

perf: eliminate redundant DB queries in upsert_entity_from_markdown#714

Merged
phernandez merged 2 commits intomainfrom
fix/upsert-entity-redundant-queries
Apr 4, 2026
Merged

perf: eliminate redundant DB queries in upsert_entity_from_markdown#714
phernandez merged 2 commits intomainfrom
fix/upsert-entity-redundant-queries

Conversation

@groksrc
Copy link
Copy Markdown
Member

@groksrc groksrc commented Apr 3, 2026

Summary

Partial fix for #709 — eliminates redundant DB round-trips in upsert_entity_from_markdown and adds telemetry sub-spans for per-phase visibility.

Optimizations:

  • Eliminate redundant get_by_file_pathupdate_entity_relations previously re-fetched the entity that update_entity_and_observations just returned. Now the entity is passed directly. Saves 1 round-trip (~300ms on Neon).
  • PK lookup for final reload — replaced get_by_file_path(path) (string column) with find_by_ids([entity_id]) (primary key) for the final entity reload after inserting relations.

Telemetry sub-spans added:

  • upsert.update.fetch_entity / delete_observations / insert_observations / save_entity
  • upsert.relations.delete_existing / resolve_links / insert_relations / reload_entity

These will show up in Logfire nested under the existing upsert_entity span, enabling precise identification of remaining bottlenecks.

Estimated impact (edit path, cloud):

Phase Before After Saved
update_entity_relations entry get_by_file_path (~300ms) Passed from caller ~300ms
Final entity reload get_by_file_path (string match) find_by_ids (PK) Faster query

Test plan

  • 8 new tests in test_upsert_entity_optimization.py:
    • test_upsert_update_does_not_refetch_entity — proves only 1 get_by_file_path call (was 3)
    • test_update_entity_relations_uses_pk_reload — proves find_by_ids used for final reload
    • test_upsert_update_emits_sub_spans / test_upsert_with_relations_emits_resolve_and_insert_spans — telemetry coverage
    • test_upsert_update_preserves_observations / test_upsert_update_preserves_relations — correctness
    • test_upsert_create_path_works / test_edit_entity_end_to_end — full round-trip
  • Existing entity service tests: 99 passed
  • Existing telemetry tests: 3 passed
  • ruff lint + format clean
  • pyright clean

🤖 Generated with Claude Code

…709)

- Pass entity object directly into update_entity_relations instead of
  re-fetching by file_path (saves 1 round-trip, ~300ms on cloud)
- Replace final get_by_file_path with find_by_ids PK lookup
- Add telemetry sub-spans to update_entity_and_observations and
  update_entity_relations for per-phase visibility in Logfire

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 3, 2026

Claude finished @groksrc's task in 3m 9s —— View job


PR Review: perf: eliminate redundant DB queries in upsert_entity_from_markdown

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review architecture & performance
  • Post final review summary

Summary

This is a clean, well-scoped optimization with solid test coverage. The core changes are correct and the telemetry additions are valuable for production observability. A few minor issues worth addressing below.


Code Quality & Standards

Item Status Notes
Follows CLAUDE.md conventions Section headers, decision point comments, fail-fast
Python 3.12+ type annotations EntityModel parameter type is correct
SQLAlchemy 2.0 patterns find_by_ids uses correct async session pattern
100-char line limit
No commented-out code

Minor: docstring inaccuracyentity_service.py:924-928
The docstring says "Only entity.id and entity.permalink are used" but entity.file_path is also accessed at line 930 for logging:

logger.debug(f"Updating relations for entity: {entity.file_path}")

Should read: "Only entity.id, entity.file_path, and entity.permalink are used." Fix this →


Testing & Documentation

Item Status Notes
Unit tests for changes 8 new tests in test_upsert_entity_optimization.py
Integration tests ⚠️ No new tests in test-int/ (see below)
100% coverage maintained All code paths appear covered
Edge cases covered create + update + relations + telemetry paths all tested

Observation: no integration test against a real DB
All 8 new tests are unit tests. The optimization specifically targets cloud/Neon latency (~300ms per get_by_file_path). An integration test in test-int/ that exercises upsert_entity_from_markdown end-to-end would better validate the optimization holds under a real Postgres setup, but given the existing integration suite already covers entity upsert paths this is low priority.

Telemetry test patches span, not scopetest_upsert_entity_optimization.py:151

monkeypatch.setattr(entity_service_module.telemetry, "span", fake_span)

Production code calls telemetry.scope(...), not telemetry.span(...) directly. The test works because scope internally calls span, but it's testing an implementation detail of the telemetry module. If scope is ever reimplemented without delegating to span, the telemetry tests would silently pass with no spans captured. Not a bug today, but fragile. Consider patching telemetry.scope directly instead. Fix this →


Architecture & Performance

Item Status Notes
Local-first architecture preserved
Knowledge graph consistency Relations still fully reloaded via find_by_ids
No unneeded migrations No schema changes
Performance improvement correct Saves 1 full get_by_file_path round-trip per upsert

Minor risk: reloaded[0] IndexErrorentity_service.py:1020

reloaded = await self.repository.find_by_ids([entity_id])
return reloaded[0]  # IndexError if entity was concurrently deleted

The original get_by_file_path returned Optional[Entity] (could fail gracefully downstream). The new code would throw IndexError if the entity was deleted between the upsert and reload — e.g. from a concurrent delete. PR #702 addressed a related race condition, so this is a known edge case in the codebase. In practice this scenario is very unlikely (we just created/updated the entity in the same call), but it's worth being explicit:

if not reloaded:
    raise EntityNotFoundError(f"Entity {entity_id} not found after upsert")
return reloaded[0]

Fix this →


Security & Compatibility

Item Status Notes
No hardcoded secrets
Error handling preserved IntegrityError fallback intact
Windows path compatibility No new path comparisons
No emojis/unicode in logs

Pre-existing issue (not introduced by this PR)

entity_service.py:943import asyncio is inside a conditional block. This was already present before this PR and is not a new issue, but it's worth noting as a cleanup opportunity in a future pass.


Verdict

Approve with minor suggestions. The optimization is correct, well-tested, and the telemetry additions are a genuine improvement for production observability. The IndexError guard and docstring fix are the only suggestions worth acting on before merge.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Claude finished @phernandez's task in 1m 52s —— View job


Fixes from prior review

  • Gather context and read changed files
  • Fix docstring inaccuracy in update_entity_relations — now mentions entity.file_path
  • Fix telemetry tests to patch telemetry.scope instead of telemetry.span
  • Add EntityNotFoundError guard before reloaded[0] for concurrent delete safety
  • Push changes — blocked: push permission denied for github-actions[bot]

All three fixes from the prior review have been implemented and committed locally (7fa6d7c), but the push to the remote branch was denied (403). The repository owner will need to push manually or grant write access to this bot.

Changes made

src/basic_memory/services/entity_service.py

  1. entity_service.py:935 — Docstring updated to include entity.file_path:

    # Before
    Only entity.id and entity.permalink are used from the passed-in object.
    # After
    Only entity.id, entity.file_path, and entity.permalink are used from the passed-in object.
  2. entity_service.py:1032-1033 — Guard added before reloaded[0]:

    if not reloaded:
        raise EntityNotFoundError(f"Entity {entity_id} not found after upsert")
    return reloaded[0]

tests/services/test_upsert_entity_optimization.py

  1. _capture_spans helper renamed fake_spanfake_scope; both telemetry tests now patch telemetry.scope directly instead of the internal telemetry.span:
    # Before
    monkeypatch.setattr(entity_service_module.telemetry, "span", fake_span)
    # After
    monkeypatch.setattr(entity_service_module.telemetry, "scope", fake_scope)

@phernandez phernandez merged commit 367fcaa into main Apr 4, 2026
25 checks passed
@phernandez phernandez deleted the fix/upsert-entity-redundant-queries branch April 4, 2026 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants