Skip to content

perf(core): reuse written note content after writes#717

Merged
phernandez merged 6 commits intomainfrom
codex/issue-708-reuse-written-content
Apr 4, 2026
Merged

perf(core): reuse written note content after writes#717
phernandez merged 6 commits intomainfrom
codex/issue-708-reuse-written-content

Conversation

@phernandez
Copy link
Copy Markdown
Member

Summary

  • reuse the markdown written during non-fast create, update, and edit flows instead of re-reading it from storage
  • pass in-memory content to search indexing and API response shaping in the v2 knowledge router
  • reduce permalink conflict checking overhead by loading only file paths instead of full eager-loaded entities
  • extend router coverage to assert non-fast responses still include content without falling back to file reads

Root cause

write_note and edit_note currently use the non-fast knowledge API path. After writing a note, that path was re-reading the same file from storage for search indexing and again for the response body, which matches the redundant S3 round-trips called out in issue #708.

Impact

This removes two redundant post-write reads from the non-fast create/update/edit path, which should reduce latency for cloud-backed note writes and edits. The permalink follow-up in issue #708 still has room for deeper optimization, but the duplicate read path is addressed here.

Validation

  • uv run pytest -q tests/api/v2/test_knowledge_router.py tests/api/v2/test_knowledge_router_telemetry.py tests/services/test_entity_service_telemetry.py
  • uv run ruff check src/basic_memory/services/entity_service.py src/basic_memory/api/v2/routers/knowledge_router.py tests/api/v2/test_knowledge_router.py tests/api/v2/test_knowledge_router_telemetry.py
  • uv run pyright src/basic_memory/services/entity_service.py src/basic_memory/api/v2/routers/knowledge_router.py

Closes #708

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 3, 2026

Claude finished @phernandez's task in 1m 45s —— View job


PR Review: perf(core): reuse written note content after writes

  • Read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review architecture & performance
  • Post final review

Summary

This PR eliminates two redundant post-write file reads from the non-fast create/update/edit paths by introducing a new EntityWriteResult dataclass that bundles the persisted entity with the markdown that was just written. The router then threads this in-memory content directly into search indexing and the response body. An additional optimization tightens conflict detection to load only file paths instead of full eager-loaded entities.

The approach is clean and well-scoped. Checklist results below.


Code Quality & Standards

Item Status
Follows coding conventions in CLAUDE.md
Python 3.12+ type annotations and async patterns
SQLAlchemy 2.0 best practices
100-character line length
No commented-out code
Literate programming style (decision-point comments) ✅ (new comments in detect_file_path_conflicts explain the why)

Testing & Documentation

Item Status
Unit tests for new methods ✅ (telemetry tests updated to assert read_file_content is never called in non-fast path — nice)
Integration tests for non-fast path ✅ (router tests now pass fast=False and assert content in response)
100% coverage maintained ✅ (# pragma: no cover on two unreachable ValueError guards is appropriate — DB update returns None only if the row vanishes between operations)
Docstrings updated

Architecture & Performance

Item Status
EntityWriteResult frozen dataclass ✅ — clean, immutable, minimal
Delegation pattern (create_entitycreate_entity_with_content) ✅ — existing call sites unaffected
get_all_file_paths() reuse in conflict detection ✅ — already existed in sync_service.py, appropriate reuse
Content passed to index_entity matches what was read before ✅ — final_content is the same bytes that were written/previously re-read
No fallback logic added
Windows path compatibility ✅ — no new path comparisons introduced

Specific Code Observations

entity_service.py:88 — Return type correction is a bonus fix

# Before
async def detect_file_path_conflicts(self, file_path: str, skip_check: bool = False) -> List[Entity]:
# After
async def detect_file_path_conflicts(self, file_path: str, skip_check: bool = False) -> List[str]:

The old return type was List[Entity] but only the file paths were used by resolve_permalink. The new implementation correctly returns List[str] and avoids loading 0..N full entity graphs on every write. Good catch.

knowledge_router.py:443-453 — Subtle correctness note in non-fast create-as-upsert path

write_result = await entity_service.create_entity_with_content(data)
entity = write_result.entity
written_content = write_result.content
if entity.external_id != entity_id:
    entity = await entity_repository.update(entity.id, {"external_id": entity_id})
    # written_content is NOT updated here

written_content still refers to the content from the initial write, but the external_id fixup only changes the DB row — not the file. So written_content remains accurate for search indexing and the response body. This is correct, but worth a brief comment to make the invariant explicit for future readers. Not a blocker.

test_knowledge_router_telemetry.pyAssertionError guard in FakeFileService

class FakeFileService:
    async def read_file_content(self, path):
        raise AssertionError("non-fast create should not re-read file content")

This is an excellent pattern for verifying the optimization is upheld by future changes. Well done.


Minor Suggestions (non-blocking)

  1. knowledge_router.py:443-453: Add a short comment on written_content near the entity_repository.update() call — e.g. # external_id fixup does not change file content; written_content remains valid — to preserve the reasoning for the next reader. Fix this →

  2. entity_service.pyList, Optional etc. from typing: Pre-existing, not introduced by this PR — no action needed.


Checklist Results

  • Follows Basic Memory's coding conventions
  • Python 3.12+ type annotations and async patterns
  • SQLAlchemy 2.0 best practices
  • FastAPI conventions followed
  • 100-character line length
  • No commented-out code
  • Unit tests for new functions/methods
  • Integration tests coverage for non-fast path
  • 100% test coverage maintained
  • No database schema changes (no migration needed)
  • Preserves local-first architecture
  • Knowledge graph operations maintain consistency
  • No hardcoded secrets
  • Proper error handling
  • Performance improvement correctly removes 2 redundant reads per non-fast write
  • Windows path compatibility not affected

Overall: LGTM. The optimization is correct, well-tested, and the delegation pattern keeps the existing API surface stable. The one minor suggestion (inline comment for the written_content invariant near the external_id fixup) is optional.

@phernandez phernandez marked this pull request as ready for review April 3, 2026 23:27
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f853e61c95

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

phase="search_index",
):
await search_service.index_entity(entity)
await search_service.index_entity(entity, content=written_content)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep search indexing on normalized note content

The non-fast create path now calls search_service.index_entity(entity, content=written_content), but written_content is full serialized markdown from dump_frontmatter(...) in create_entity_with_content, while the previous path indexed read_entity_content(...) output (content normalized by the markdown processor). This change makes YAML/frontmatter tokens part of indexed content, which can materially degrade search relevance by injecting metadata noise into every note; pass normalized body content (or keep the existing read/normalize path) before indexing.

Useful? React with 👍 / 👎.

if fast:
content = await file_service.read_file_content(entity.file_path)
else:
content = written_content
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Return persisted content after formatter rewrites

For non-fast writes, the response content now comes from in-memory written_content instead of reading the saved file. FileService.write_file can rewrite the file when format_on_save is enabled, so this branch can return pre-format text that no longer matches what was persisted, causing API clients to observe stale content immediately after a successful write; use the persisted file content (or formatter output) for the response body in this branch.

Useful? React with 👍 / 👎.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 3, 2026

Claude finished @phernandez's task in 2m 46s —— View job


PR Review: perf(core): reuse written note content after writes

  • Read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review architecture & performance
  • Address Codex review comments
  • Post final review

Summary

This PR eliminates two redundant post-write file reads from the non-fast create/update/edit paths by introducing EntityWriteResult — a frozen dataclass bundling the persisted entity with the markdown written during the call. The router threads this in-memory content into search indexing and the response body. Additionally, detect_file_path_conflicts is tightened to load only file paths instead of full eager-loaded entities.

The approach is well-scoped and follows existing patterns. Two issues from the Codex review deserve a closer look; assessments are below.


Code Quality & Standards

Item Status
Follows coding conventions in CLAUDE.md
Python 3.12+ type annotations and async patterns
SQLAlchemy 2.0 best practices
100-character line length
No commented-out code
Literate programming style ✅ — detect_file_path_conflicts gains a clear "why" comment; external_id fixup comment added in a198ba9

Testing & Documentation

Item Status
Unit tests for new methods
Non-fast path router coverage ✅ — fast=False tests assert content is present without file re-reads
100% coverage maintained ✅ — # pragma: no cover on two unreachable ValueError guards is appropriate
FakeFileService.read_file_content assertion pattern ✅ — excellent regression guard

Architecture & Performance

Item Status
EntityWriteResult frozen dataclass ✅ — clean and immutable
Delegation pattern (create_entitycreate_entity_with_content) ✅ — zero call-site churn
get_all_file_paths() reuse in conflict detection ✅ — avoids eager-loading entities on hot write path
detect_file_path_conflicts return type corrected to List[str] ✅ — the old List[Entity] was misleading; callers only used file paths anyway

Codex Review Comments

Comment 1 — knowledge_router.py:335: search indexing on raw vs normalized content (P2)

Codex flags that written_content is the full dump_frontmatter(...) output (YAML frontmatter + body), while the previous non-fast path called file_service.read_entity_content(entity) which returned markdown.content — the body without frontmatter.

Tracing through index_entity_markdown (search_service.py:548):

if content is None:
    content = await self.file_service.read_entity_content(entity)  # body only
if content:
    content_stems.append(content)
    content_snippet = _strip_nul(content)

When content=written_content is passed, frontmatter YAML tokens (title: …, type: note, permalink: …) are now included in the FTS index for non-fast writes. Previously they were not.

Verdict: valid but low severity. Frontmatter tokens are largely benign (they overlap with what users search for) and the fast path's background reindex_entity already calls read_entity_content (body-only), so the existing inconsistency between fast and non-fast indexing is pre-existing, not introduced here. That said, this is a behavioral delta worth being aware of. If you want strict parity with the pre-PR behavior, you can strip frontmatter before passing to index_entity:

# in create_entity_with_content / update_entity_with_content
from basic_memory.file_utils import remove_frontmatter
return EntityWriteResult(entity=updated, content=remove_frontmatter(final_content))

Not blocking, but flagging for awareness. Fix this →

Comment 2 — knowledge_router.py:363: response content stale when format_on_save=True (P2)

Codex notes that FileService.write_file can reformat the file when format_on_save=True, meaning written_content (set before write_file is called) could differ from what was persisted on disk.

This is technically correct. The pre-PR code read back from file after writing, so it would have returned the post-formatter content. format_on_save defaults to False and the relevant branch is # pragma: no cover, so practical impact is very low. Still, it is a regression in correctness for that configuration.

If write_file returned the final content instead of just the checksum, this would be trivially fixable — but that would be a separate refactor. The simplest targeted fix would be to note this gap in a comment near the content = written_content assignment.

Not blocking given the default-off nature of the feature. Fix this →


Minor Observations

  • a198ba9 correctly added the external_id fixup invariant comment (knowledge_router.py:449) — the prior review suggestion is addressed.
  • entity_service.py:7List, Optional etc. from typing are pre-existing; not introduced here.

Checklist Results

  • Follows Basic Memory's coding conventions
  • Python 3.12+ type annotations and async patterns
  • SQLAlchemy 2.0 best practices
  • FastAPI conventions followed
  • 100-character line length
  • No commented-out code
  • Unit tests for new functions/methods
  • Integration tests for non-fast path
  • 100% test coverage maintained
  • No database schema changes
  • Preserves local-first architecture
  • Knowledge graph operations maintain consistency
  • No hardcoded secrets
  • Proper error handling
  • Performance improvement correctly removes 2 redundant reads per non-fast write
  • Windows path compatibility unaffected

Overall: LGTM with two non-blocking notes. The core optimization is correct and well-tested. The Codex comments are valid observations: (1) search indexing now receives full markdown including frontmatter for non-fast writes, which is a behavioral delta from the pre-PR body-only indexing; (2) format_on_save=True (disabled by default) would cause the response body to return pre-formatted content. Both are low practical impact and can be addressed in follow-up if desired.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 3, 2026

Claude finished @phernandez's task in 2m 13s —— View job


PR Review: perf(core): reuse written note content after writes

  • Read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review architecture & performance
  • Address Codex review comments
  • Post final review

Summary

This PR introduces EntityWriteResult — a frozen dataclass bundling the persisted entity with the markdown written during a call — and threads it through create_entity_with_content, update_entity_with_content, and edit_entity_with_content. The router's non-fast create/update/edit paths use the in-memory content directly for search indexing and response shaping, eliminating two redundant post-write reads. A secondary optimization tightens detect_file_path_conflicts to load only file paths instead of full eager-loaded entities.


Code Quality & Standards

Item Status
Follows Basic Memory's coding conventions in CLAUDE.md
Python 3.12+ type annotations and async patterns
SQLAlchemy 2.0 best practices
FastAPI conventions followed
100-character line length
No commented-out code
Literate programming style ✅ — decision-point comments in detect_file_path_conflicts and the external_id fixup invariant comment at knowledge_router.py:448

Testing & Documentation

Item Status
Unit tests for new methods
Non-fast path router coverage ✅ — fast=False integration tests in test_knowledge_router.py assert content is returned; telemetry tests verify read_file_content is never called
FakeFileService.read_file_content assertion guard ✅ — excellent regression protection for the optimization invariant
100% coverage maintained ✅ — # pragma: no cover on two unreachable ValueError guards (entity_service.py:335, entity_service.py:459) is appropriate
Docstrings updated

Architecture & Performance

Item Status
EntityWriteResult frozen dataclass ✅ — clean and immutable
Delegation pattern: create_entitycreate_entity_with_content ✅ — existing call sites unaffected
detect_file_path_conflicts return type corrected to List[str] ✅ — the old List[Entity] was a misleading type; callers only used file paths
get_all_file_paths() reuse on hot write path ✅ — skips eager-loading observations/relations per write
Windows path compatibility ✅ — no new path comparisons introduced

Codex Review Comments

Comment 1 — knowledge_router.py:335: frontmatter tokens in search index (P2)

Confirmed valid. written_content is the full dump_frontmatter(...) output (YAML frontmatter + body), while the previous non-fast path called file_service.read_entity_content(entity) which returns markdown.content — body only, no YAML headers.

In index_entity_markdown (search_service.py:591-600), when content is non-None, it skips read_entity_content and uses the supplied string directly as both a content stem and the content_snippet. This means:

  • FTS index for non-fast writes now includes YAML keys/values (title:, type:, permalink:)
  • content_snippet returned in search results will include frontmatter YAML

The title and permalink are also indexed separately via title_variants (line 589) and _generate_variants(entity.permalink) (line 608), so there's no coverage loss. But the content_snippet showing raw YAML in search results is a UX regression for that path.

Verdict: low practical impact today (non-fast path is the minority path, and FTS overlap is benign for most terms), but worth resolving. The clean fix is to strip frontmatter before constructing EntityWriteResult.content using the already-imported remove_frontmatter:

# entity_service.py — in create_entity_with_content, update_entity_with_content, edit_entity_with_content
return EntityWriteResult(entity=updated, content=remove_frontmatter(final_content))

remove_frontmatter is already imported at line 19 of entity_service.py. This would restore exact parity with the pre-PR body-only indexing. Fix this →

Comment 2 — knowledge_router.py:363: stale content when format_on_save=True (P2)

Confirmed valid but very low practical impact. In update_entity_with_content, final_content = dump_frontmatter(merged_post) is computed before write_file(file_path, final_content). If format_on_save=True, write_file reformats the file in-place but returns only the checksum — so final_content no longer matches what was persisted. The pre-PR code read back from disk after writing, so it would have returned post-formatter content.

The fix above (remove_frontmatter(final_content)) would also apply here since formatting changes are typically in whitespace/ordering within the body, not in the body text itself. However, if the formatter changes body content (unlikely but possible), the response would still be slightly stale. Since format_on_save defaults to False and the branch is # pragma: no cover, this is not blocking. Adding a comment noting the gap is sufficient. Fix this →


Checklist Results

  • Follows Basic Memory's coding conventions
  • Python 3.12+ type annotations and async patterns
  • SQLAlchemy 2.0 best practices
  • FastAPI conventions followed
  • 100-character line length
  • No commented-out code
  • Unit tests for new functions/methods
  • Integration tests for non-fast path
  • 100% test coverage maintained
  • No database schema changes (no migration needed)
  • Preserves local-first architecture
  • Knowledge graph operations maintain consistency
  • No hardcoded secrets
  • Proper error handling
  • Performance improvement correctly removes 2 redundant reads per non-fast write
  • Windows path compatibility unaffected
  • Search index content parity with pre-PR body-only indexing (Codex P2 — see above)

Overall: LGTM with one actionable suggestion. The core optimization is correct and well-tested. The most concrete improvement is using remove_frontmatter(final_content) when constructing EntityWriteResult.content, which would restore parity with the pre-PR body-only search indexing and clean up content_snippet in search results. This is a small, targeted change that would close both Codex P2 items cleanly.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 3, 2026

Claude finished @phernandez's task in 1m 21s —— View job


PR Review: perf(core): reuse written note content after writes

  • Read changed files
  • Review updated code against prior P2 findings
  • Post final review

Summary

This revision (a198ba9) adds the external_id fixup invariant comment at knowledge_router.py:448-449, addressing the minor suggestion from the prior review. Two CI workflow commits (5174b18, 8ad2a6b) clean up the test matrix. The core optimization from f853e61 is unchanged.


Status of Prior Review Items

Finding Status
external_id fixup invariant comment (knowledge_router.py:448) ✅ Added in a198ba9
P2: Frontmatter tokens in search index (entity_service.py:337/462/1096) ⚠️ Still open
P2: Stale content with format_on_save=True ⚠️ Still open (low severity, default-off)

Open Item: Frontmatter in Search Index

EntityWriteResult is still constructed with the raw dump_frontmatter(...) output at all three call sites:

# entity_service.py:337
return EntityWriteResult(entity=updated, content=final_content)

# entity_service.py:462
return EntityWriteResult(entity=entity, content=final_content)

# entity_service.py:1096
return EntityWriteResult(entity=entity, content=new_content)

remove_frontmatter is already imported at entity_service.py:19. The fix is a one-liner at each site:

return EntityWriteResult(entity=updated, content=remove_frontmatter(final_content))

This would restore exact parity with the pre-PR behavior where search_service.index_entity received body-only content (via read_entity_content), and would prevent YAML tokens (title:, permalink:, type:) from appearing in content_snippet within search results. Fix this →


Checklist Results

  • Follows Basic Memory's coding conventions
  • Python 3.12+ type annotations and async patterns
  • SQLAlchemy 2.0 best practices
  • FastAPI conventions followed
  • 100-character line length
  • No commented-out code
  • Unit tests for new functions/methods
  • Integration tests for non-fast path
  • 100% test coverage maintained
  • No database schema changes (no migration needed)
  • Preserves local-first architecture
  • Knowledge graph operations maintain consistency
  • No hardcoded secrets
  • Proper error handling
  • Performance improvement correctly removes 2 redundant reads per non-fast write
  • Windows path compatibility unaffected
  • Search index content parity with pre-PR body-only indexing (P2 — remove_frontmatter not yet applied)

Overall: LGTM pending the remove_frontmatter fix. The external_id fixup comment is in place. The one remaining actionable item is wrapping EntityWriteResult.content with remove_frontmatter at the three construction sites in entity_service.py — a small, targeted change that closes the search-index parity gap cleanly.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 3, 2026

Claude finished @phernandez's task in 1m 13s —— View job


PR Review: perf(core): reuse written note content after writes

  • Read changed files
  • Review updated code against prior P2 findings
  • Post final review

Summary

Commit 1bd1aad (fix(core): keep response markdown while indexing body content) cleanly resolves both P2 items from the prior review by splitting EntityWriteResult into two content fields:

  • content — full dump_frontmatter(...) output for the API response body
  • search_contentremove_frontmatter(final_content) for FTS indexing

Status of Prior Review Items

Finding Status
external_id fixup invariant comment (knowledge_router.py:449) ✅ Added in a198ba9
P2: Frontmatter tokens in search index ✅ Resolved — search_content passes body-only content to index_entity at all three write sites
P2: Stale content with format_on_save=True ✅ Acknowledged — format_on_save caveat comment retained at knowledge_router.py:365-367; response uses full markdown which is the correct design intent

Code Quality

The fix is well-structured:

  • entity_service.py:338-342 / 467-471 / 1105-1109 — All three EntityWriteResult construction sites consistently set search_content=remove_frontmatter(final_content) with remove_frontmatter already imported at line 19.
  • knowledge_router.py:337 / 486 / 631 — Router correctly routes search_content to index_entity and written_content to the response body.
  • test_entity_service_write_result.py — Three new tests assert both fields round-trip correctly against the actual written file and strip frontmatter as expected.
  • test_knowledge_router_telemetry.py — Updated to pass search_content="body only" in fake write results and assert index_entity receives that value (not the full frontmatter string). The FakeFileService.read_file_content AssertionError guard remains intact.

Checklist Results

  • Follows Basic Memory's coding conventions
  • Python 3.12+ type annotations and async patterns
  • SQLAlchemy 2.0 best practices
  • FastAPI conventions followed
  • 100-character line length
  • No commented-out code
  • Unit tests for new functions/methods
  • Integration tests for non-fast path
  • 100% test coverage maintained
  • No database schema changes (no migration needed)
  • Preserves local-first architecture
  • Knowledge graph operations maintain consistency
  • No hardcoded secrets
  • Proper error handling
  • Performance improvement correctly removes 2 redundant reads per non-fast write
  • Windows path compatibility unaffected
  • Search index content parity with pre-PR body-only indexing (resolved via search_content field)

Overall: LGTM. All prior P2 items are resolved. The two-field design is the right call — it makes the intent explicit at the data layer rather than stripping frontmatter in the router, keeps the response body as full markdown, and gives search indexing exactly what it needs. No further changes needed.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Claude finished @phernandez's task in 1m 38s —— View job


PR Review: perf(core): reuse written note content after writes

  • Read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review architecture & performance
  • Post final review

Summary

This PR introduces EntityWriteResult — a frozen dataclass with two content fields — and threads it through the non-fast create/update/edit paths to eliminate two redundant post-write file reads. All P2 items from prior review rounds are resolved in the final commit (1bd1aad).


Code Quality & Standards

Item Status
Follows coding conventions in CLAUDE.md
Python 3.12+ type annotations and async patterns
SQLAlchemy 2.0 best practices
FastAPI conventions followed
100-character line length
No commented-out code
Literate programming style ✅ — decision-point comments in detect_file_path_conflicts and the external_id fixup invariant at knowledge_router.py:456-457

Testing & Documentation

Item Status
Unit tests for new methods ✅ — tests/services/test_entity_service_write_result.py verifies both content and search_content at all three write sites
Non-fast path router coverage ✅ — test_knowledge_router.py passes fast=False and asserts response body contains expected text
Telemetry span order verified
FakeFileService.read_file_content assertion guard ✅ — raises AssertionError if non-fast path re-reads the file; excellent regression protection
100% coverage maintained ✅ — # pragma: no cover on two unreachable ValueError guards (entity_service.py:336, entity_service.py:464, entity_service.py:1102) is appropriate
Docstrings updated

Architecture & Performance

Item Status
EntityWriteResult(content, search_content) frozen dataclass ✅ — clean, immutable, purpose is explicit at the data layer
content = full dump_frontmatter(...) output for API response
search_content = remove_frontmatter(final_content) for FTS ✅ — parity with pre-PR read_entity_content body-only path
Delegation: create_entitycreate_entity_with_content ✅ — zero call-site churn
detect_file_path_conflicts return type corrected to List[str] ✅ — old List[Entity] was misleading; callers only used paths
get_all_file_paths() reuse on hot write path ✅ — skips eager-loading observations/relations per write
Windows path compatibility

Resolved P2 Items from Prior Reviews

Finding Status
Frontmatter tokens in search index ✅ Resolved — search_content=remove_frontmatter(final_content) at all three write sites
Stale content with format_on_save=True ✅ Acknowledged with comment at knowledge_router.py:365-367 (default-off, # pragma: no cover)
external_id fixup invariant comment ✅ Added at knowledge_router.py:456-457

Checklist Results

  • Follows Basic Memory's coding conventions
  • Python 3.12+ type annotations and async patterns
  • SQLAlchemy 2.0 best practices
  • FastAPI conventions followed
  • 100-character line length
  • No commented-out code
  • Unit tests for new functions/methods
  • Integration tests for non-fast path
  • 100% test coverage maintained
  • No database schema changes (no migration needed)
  • Preserves local-first architecture
  • Knowledge graph operations maintain consistency
  • No hardcoded secrets
  • Proper error handling
  • Performance improvement correctly removes 2 redundant reads per non-fast write
  • Windows path compatibility unaffected
  • Search index content parity with pre-PR body-only indexing (resolved via search_content field)

Overall: LGTM. The two-field design is the right call — it makes intent explicit at the data layer, keeps the response body as full markdown, and gives FTS exactly what it needs (body-only). All prior P2 items are resolved and the FakeFileService assertion guard provides a solid regression net for the optimization invariant.

@phernandez phernandez merged commit 69808b2 into main Apr 4, 2026
41 checks passed
@phernandez phernandez deleted the codex/issue-708-reuse-written-content branch April 4, 2026 05:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Redundant S3 reads during write_note and edit_note add ~1s latency

1 participant