perf: eliminate redundant DB queries in upsert_entity_from_markdown#714
perf: eliminate redundant DB queries in upsert_entity_from_markdown#714phernandez merged 2 commits intomainfrom
Conversation
…709) - Pass entity object directly into update_entity_relations instead of re-fetching by file_path (saves 1 round-trip, ~300ms on cloud) - Replace final get_by_file_path with find_by_ids PK lookup - Add telemetry sub-spans to update_entity_and_observations and update_entity_relations for per-phase visibility in Logfire Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Drew Cain <groksrc@gmail.com>
|
Claude finished @groksrc's task in 3m 9s —— View job PR Review: perf: eliminate redundant DB queries in
|
| Item | Status | Notes |
|---|---|---|
| Follows CLAUDE.md conventions | ✅ | Section headers, decision point comments, fail-fast |
| Python 3.12+ type annotations | ✅ | EntityModel parameter type is correct |
| SQLAlchemy 2.0 patterns | ✅ | find_by_ids uses correct async session pattern |
| 100-char line limit | ✅ | |
| No commented-out code | ✅ |
Minor: docstring inaccuracy — entity_service.py:924-928
The docstring says "Only entity.id and entity.permalink are used" but entity.file_path is also accessed at line 930 for logging:
logger.debug(f"Updating relations for entity: {entity.file_path}")Should read: "Only entity.id, entity.file_path, and entity.permalink are used." Fix this →
Testing & Documentation
| Item | Status | Notes |
|---|---|---|
| Unit tests for changes | ✅ | 8 new tests in test_upsert_entity_optimization.py |
| Integration tests | No new tests in test-int/ (see below) |
|
| 100% coverage maintained | ✅ | All code paths appear covered |
| Edge cases covered | ✅ | create + update + relations + telemetry paths all tested |
Observation: no integration test against a real DB
All 8 new tests are unit tests. The optimization specifically targets cloud/Neon latency (~300ms per get_by_file_path). An integration test in test-int/ that exercises upsert_entity_from_markdown end-to-end would better validate the optimization holds under a real Postgres setup, but given the existing integration suite already covers entity upsert paths this is low priority.
Telemetry test patches span, not scope — test_upsert_entity_optimization.py:151
monkeypatch.setattr(entity_service_module.telemetry, "span", fake_span)Production code calls telemetry.scope(...), not telemetry.span(...) directly. The test works because scope internally calls span, but it's testing an implementation detail of the telemetry module. If scope is ever reimplemented without delegating to span, the telemetry tests would silently pass with no spans captured. Not a bug today, but fragile. Consider patching telemetry.scope directly instead. Fix this →
Architecture & Performance
| Item | Status | Notes |
|---|---|---|
| Local-first architecture preserved | ✅ | |
| Knowledge graph consistency | ✅ | Relations still fully reloaded via find_by_ids |
| No unneeded migrations | ✅ | No schema changes |
| Performance improvement correct | ✅ | Saves 1 full get_by_file_path round-trip per upsert |
Minor risk: reloaded[0] IndexError — entity_service.py:1020
reloaded = await self.repository.find_by_ids([entity_id])
return reloaded[0] # IndexError if entity was concurrently deletedThe original get_by_file_path returned Optional[Entity] (could fail gracefully downstream). The new code would throw IndexError if the entity was deleted between the upsert and reload — e.g. from a concurrent delete. PR #702 addressed a related race condition, so this is a known edge case in the codebase. In practice this scenario is very unlikely (we just created/updated the entity in the same call), but it's worth being explicit:
if not reloaded:
raise EntityNotFoundError(f"Entity {entity_id} not found after upsert")
return reloaded[0]Security & Compatibility
| Item | Status | Notes |
|---|---|---|
| No hardcoded secrets | ✅ | |
| Error handling preserved | ✅ | IntegrityError fallback intact |
| Windows path compatibility | ✅ | No new path comparisons |
| No emojis/unicode in logs | ✅ |
Pre-existing issue (not introduced by this PR)
entity_service.py:943 — import asyncio is inside a conditional block. This was already present before this PR and is not a new issue, but it's worth noting as a cleanup opportunity in a future pass.
Verdict
Approve with minor suggestions. The optimization is correct, well-tested, and the telemetry additions are a genuine improvement for production observability. The IndexError guard and docstring fix are the only suggestions worth acting on before merge.
Signed-off-by: phernandez <paul@basicmachines.co>
|
Claude finished @phernandez's task in 1m 52s —— View job Fixes from prior review
All three fixes from the prior review have been implemented and committed locally ( Changes made
|
Summary
Partial fix for #709 — eliminates redundant DB round-trips in
upsert_entity_from_markdownand adds telemetry sub-spans for per-phase visibility.Optimizations:
get_by_file_path—update_entity_relationspreviously re-fetched the entity thatupdate_entity_and_observationsjust returned. Now the entity is passed directly. Saves 1 round-trip (~300ms on Neon).get_by_file_path(path)(string column) withfind_by_ids([entity_id])(primary key) for the final entity reload after inserting relations.Telemetry sub-spans added:
upsert.update.fetch_entity/delete_observations/insert_observations/save_entityupsert.relations.delete_existing/resolve_links/insert_relations/reload_entityThese will show up in Logfire nested under the existing
upsert_entityspan, enabling precise identification of remaining bottlenecks.Estimated impact (edit path, cloud):
update_entity_relationsentryget_by_file_path(~300ms)get_by_file_path(string match)find_by_ids(PK)Test plan
test_upsert_entity_optimization.py:test_upsert_update_does_not_refetch_entity— proves only 1get_by_file_pathcall (was 3)test_update_entity_relations_uses_pk_reload— provesfind_by_idsused for final reloadtest_upsert_update_emits_sub_spans/test_upsert_with_relations_emits_resolve_and_insert_spans— telemetry coveragetest_upsert_update_preserves_observations/test_upsert_update_preserves_relations— correctnesstest_upsert_create_path_works/test_edit_entity_end_to_end— full round-trip🤖 Generated with Claude Code