
[BUGFIX] : Persistent Storage Leak of Knowledge Graphs (GraphRAG)#546

Closed
hrshjswniii wants to merge 8 commits into param20h:dev from hrshjswniii:bugfix/Persistent-Storage-Leak


@hrshjswniii

Contributor

🔗 Related Issue


Closes #539



📝 What does this PR do?


This PR implements multi-document chat capabilities, introduces a Recycle Bin trash/restore flow, and resolves a critical persistent storage leak in GraphRAG background cleanup:

1. GraphRAG Persistent Storage Leak Fix (Bug Fix)

  • Background auto-cleanup in main.py: added a delete_graph call, wrapped in a try-except block, inside document_cleanup_job(). The job now removes the persisted knowledge graph JSON files of expired/inactive documents automatically; previously these files were never deleted and accumulated on the server's disk.
  • Unit test coverage: added test_cleanup_old_deleted_documents_purges_graph and test_document_cleanup_job_purges_graph in test_documents.py to assert correct file removal, database purging, vector cleanup, and graph cleanup.
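
The cleanup behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the graph directory layout, the `<document_id>.json` file naming, and the signatures of `delete_graph` and `document_cleanup_job` shown here are assumptions made for the example.

```python
import logging
from pathlib import Path
from typing import Iterable, List

logger = logging.getLogger(__name__)


def delete_graph(graph_dir: Path, document_id: str) -> None:
    """Hypothetical helper: remove the persisted graph JSON for one document."""
    # Raises FileNotFoundError when the document never had a graph.
    (graph_dir / f"{document_id}.json").unlink()


def document_cleanup_job(graph_dir: Path, expired_ids: Iterable[str]) -> List[str]:
    """Purge expired documents; a failed graph cleanup must not abort the job."""
    purged: List[str] = []
    for document_id in expired_ids:
        try:
            delete_graph(graph_dir, document_id)
        except Exception as exc:  # e.g. no graph was ever built for this document
            logger.warning("Graph cleanup skipped for %s: %s", document_id, exc)
        # File, database, and vector cleanup would continue here regardless.
        purged.append(document_id)
    return purged
```

Catching the exception per document keeps one missing graph file from blocking the remaining cleanup steps for that document or for the rest of the batch.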

2. Multi-Document Selection Chat (New Feature)

  • RAG Pipeline & Retrieval Pipeline (Backend): Enhanced retrieve(), PDFSearchTool, _candidate_graphs, get_entity_context, and agent execution helper signatures to accept document_ids: List[str] and verify user access for each. Updated tracing decorator signatures to support keyword arguments dynamically.
  • WebSocket & SSE handlers (Backend): Updated endpoints to parse document_ids, validate readiness, save multi-doc messages under document_id = None, and cache query results under a sorted comma-joined key.
  • Interactive Checkboxes & Indicators (Frontend): Rendered checkbox inputs in the sidebar document list, added selection count badges, updated textarea placeholders, and structured payloads to send document_ids.
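
Two of the backend ideas above, per-document access checks and the sorted comma-joined cache key, might look roughly like this. The function names, the `acl` mapping, and the `ids:query` key layout are hypothetical and chosen only for illustration; the PR's actual helpers may differ.

```python
from typing import Dict, List, Set


def accessible_document_ids(
    user_id: str, document_ids: List[str], acl: Dict[str, Set[str]]
) -> List[str]:
    """Return the requested IDs, raising if the user lacks access to any one."""
    for document_id in document_ids:
        # `acl` maps a document ID to the set of user IDs allowed to read it.
        if user_id not in acl.get(document_id, set()):
            raise PermissionError(f"{user_id} cannot access document {document_id}")
    return document_ids


def multi_doc_cache_key(query: str, document_ids: List[str]) -> str:
    """Build an order-independent cache key from a sorted, comma-joined ID list."""
    joined = ",".join(sorted(document_ids))
    return f"{joined}:{query}"
```

Sorting before joining means the same selection produces the same key regardless of the order in which the documents were ticked in the sidebar.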

3. Recycle Bin / Trash Modal (New Feature)

  • Added TrashModal.tsx and custom routing to list, restore, or immediately purge soft-deleted files.
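
The list / restore / purge flow behind the trash modal can be pictured with a small in-memory model. This is only a sketch under assumptions: the `Document` and `TrashStore` names and the `deleted_at` field are hypothetical and do not come from this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Document:
    id: str
    deleted_at: Optional[datetime] = None  # None means the document is active


class TrashStore:
    """Hypothetical in-memory stand-in for the trash routes."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}

    def add(self, doc_id: str) -> None:
        self.docs[doc_id] = Document(id=doc_id)

    def soft_delete(self, doc_id: str) -> None:
        # Mark the document as deleted without removing its data.
        self.docs[doc_id].deleted_at = datetime.now(timezone.utc)

    def list_trash(self) -> List[str]:
        return [d.id for d in self.docs.values() if d.deleted_at is not None]

    def restore(self, doc_id: str) -> None:
        self.docs[doc_id].deleted_at = None

    def purge(self, doc_id: str) -> None:
        # Only soft-deleted documents may be removed permanently.
        if self.docs[doc_id].deleted_at is None:
            raise ValueError("only soft-deleted documents can be purged")
        del self.docs[doc_id]
```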


🗂️ Type of Change


  • 🐛 Bug fix
  • ✨ New feature
  • 🔧 Refactor / code cleanup
  • 📝 Documentation update
  • 🎨 UI / styling change
  • ⚙️ CI / tooling / config change
  • 🧪 Tests


🧪 How was this tested?


  • Tested the affected API endpoints manually
  • Added / updated tests
    • Added test_cleanup_old_deleted_documents_purges_graph and test_document_cleanup_job_purges_graph in test_documents.py.
    • Added test_retrieve_with_document_ids_list_and_rbac_checks in test_retriever.py.
    • Added test_chat_ask_success_with_document_ids in test_chat.py.
    • Verified all 164 tests pass successfully via backend\.venv\Scripts\python -m pytest backend/tests
  • Ran frontend local typecheck and compilation (npx tsc --noEmit completed with no compilation errors)


⚠️ Anything to flag for reviewers?


  • In main.py, the delete_graph import and invocation are wrapped in a try-except block so that database updates and file deletions do not fail when a document has no associated graph yet.
  • In test_documents.py, MockDbSessionContext has been updated to commit transactions on context exit so that database purging side effects are accurately tracked by the tests.
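
For reviewers unfamiliar with the pattern, a mock session that commits on context exit could look something like the sketch below. Only the class name comes from this PR; the `pending`/`committed` lists, the `delete` method, and the choice to skip the commit when an exception is raised are assumptions for illustration.

```python
from typing import List


class MockDbSessionContext:
    """Hypothetical mock session that records deletes and commits on exit."""

    def __init__(self) -> None:
        self.pending: List[str] = []
        self.committed: List[str] = []

    def delete(self, document_id: str) -> None:
        # Stage a deletion; it only becomes visible after commit().
        self.pending.append(document_id)

    def commit(self) -> None:
        self.committed.extend(self.pending)
        self.pending.clear()

    def __enter__(self) -> "MockDbSessionContext":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Assumption: commit only when the block finished without an error.
        if exc_type is None:
            self.commit()
        return False  # never swallow exceptions raised inside the block
```

With this shape, a test can open the context, run the cleanup code, and then assert on `committed` to confirm the purge was actually persisted.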

@hrshjswniii hrshjswniii requested a review from param20h as a code owner June 9, 2026 20:19
@param20h

Owner

✅ This PR has been merged directly into dev by maintainer.

@param20h param20h closed this Jun 28, 2026
@param20h param20h added level:intermediate +35 pts mentor:param20h Mentor for this PR gssoc:approved Approved for GSSoC base points (+50 pts) labels Jun 28, 2026

Development

Successfully merging this pull request may close these issues.

[BUG] : Persistent Storage Leak of Knowledge Graphs (GraphRAG)

2 participants