chore: adopt structlog and add asyncio.Lock to engine singleton#15
Conversation
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (3)
WalkthroughReplace stdlib logging with structlog across modules, add an asyncio.Lock to serialize RAG engine singleton init/start/shutdown, convert startup/health/search logs to structured events, and update CHANGELOG entries to reference RSPEED-3047. ChangesStructured Logging Migration with Engine Thread-Safety
Sequence Diagram(s)(No diagram generated.) Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/docs2db_mcp/engine.py (1)
80-87:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix RAG engine shutdown to await teardown (don’t just drop the singleton reference)
In
src/docs2db_mcp/engine.py(lines 80-87),shutdown_engine()only does_engine = Noneand never awaits any shutdown/close/cleanup on the existingUniversalRAGEngineinstance—so the engine’s resources are left to whatever the GC feels like doing today (meanwhileget_engine()doesawait _engine.start()).async def shutdown_engine() -> None: """Shutdown the RAG engine and cleanup resources.""" global _engine if _engine is not None: logger.info("Shutting down UniversalRAGEngine") _engine = NoneUpdate
shutdown_engine()to call/await the engine’s teardown method(s) before clearing_engine(and keep the lock/lifecycle contract intact).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/docs2db_mcp/engine.py` around lines 80 - 87, shutdown_engine currently just drops the _engine reference, leaking resources; update shutdown_engine to, when _engine is not None, log the shutdown, call and await the engine's lifecycle teardown API (e.g., prefer await _engine.shutdown() or await _engine.teardown() or await _engine.close() — choose the actual async method implemented by UniversalRAGEngine; if only a sync close exists call it accordingly), handle missing-method fallback gracefully, then clear _engine = None and preserve any existing lock/lifecycle contract used by get_engine/start. Ensure you await the teardown before nulling _engine so resources are cleaned up.src/docs2db_mcp/__main__.py (1)
3-3:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPurge stdlib
loggingfromsrc/docs2db_mcp(structlog-only, no undead imports)
src/docs2db_mcp/__main__.pystill importsloggingand callslogging.basicConfig(...)(SSE + non-SSE branches), andsrc/docs2db_mcp/tools/search_documents.pyalso imports stdliblogging(vialogging.getLogger). Swap these bootstrap/logger usages to structlog so thesrc/**/*.pycontract isn’t haunted by zombie imports.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/docs2db_mcp/__main__.py` at line 3, The files import and configure stdlib logging (import logging and calls to logging.basicConfig plus uses of logging.getLogger) which must be replaced with structlog-only usage; remove any import logging from src/docs2db_mcp/__main__.py and src/docs2db_mcp/tools/search_documents.py, replace the logging.basicConfig(...) bootstrap in the SSE and non-SSE branches of __main__.py with structlog configuration (structlog.configure(...) + appropriate processors/sink) and swap uses of logging.getLogger(...) in search_documents (and any local module-level logger variables) to structlog.get_logger(...); ensure all logging calls use the structlog logger API and delete the now-unused stdlib logging references.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/docs2db_mcp/__main__.py`:
- Line 43: Replace the eager f-string formatting in the logger calls in
search_documents.py (the logger.info / logger.error calls around the lines
referenced—the calls at ~49, ~76, ~85) with lazy formatting: use %-style
placeholders like "%s" or "%r" with arguments, or pass values as kwargs to
logger (e.g., logger.info("message %s", var) or logger.info("message",
extra_field=var)) so logging remains lazy and avoids premature string
interpolation.
---
Outside diff comments:
In `@src/docs2db_mcp/__main__.py`:
- Line 3: The files import and configure stdlib logging (import logging and
calls to logging.basicConfig plus uses of logging.getLogger) which must be
replaced with structlog-only usage; remove any import logging from
src/docs2db_mcp/__main__.py and src/docs2db_mcp/tools/search_documents.py,
replace the logging.basicConfig(...) bootstrap in the SSE and non-SSE branches
of __main__.py with structlog configuration (structlog.configure(...) +
appropriate processors/sink) and swap uses of logging.getLogger(...) in
search_documents (and any local module-level logger variables) to
structlog.get_logger(...); ensure all logging calls use the structlog logger API
and delete the now-unused stdlib logging references.
In `@src/docs2db_mcp/engine.py`:
- Around line 80-87: shutdown_engine currently just drops the _engine reference,
leaking resources; update shutdown_engine to, when _engine is not None, log the
shutdown, call and await the engine's lifecycle teardown API (e.g., prefer await
_engine.shutdown() or await _engine.teardown() or await _engine.close() — choose
the actual async method implemented by UniversalRAGEngine; if only a sync close
exists call it accordingly), handle missing-method fallback gracefully, then
clear _engine = None and preserve any existing lock/lifecycle contract used by
get_engine/start. Ensure you await the teardown before nulling _engine so
resources are cleaned up.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 96f1137b-6178-4e7b-a107-cb407abf618d
📒 Files selected for processing (4)
CHANGELOG.mdsrc/docs2db_mcp/__main__.pysrc/docs2db_mcp/engine.pysrc/docs2db_mcp/server.py
4071f7b to
5088d70
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/docs2db_mcp/engine.py`:
- Around line 84-88: The shutdown block for the singleton engine must ensure the
module-level _engine is cleared even if UniversalRAGEngine.close() raises; wrap
the await _engine.close() call in a try/finally inside the async with
_engine_lock so that in the finally you set _engine = None (and optionally log
the close exception before re-raising), ensuring get_engine() cannot return a
half-closed instance; update the code referencing _engine_lock, _engine,
get_engine, and UniversalRAGEngine accordingly.
In `@src/docs2db_mcp/tools/search_documents.py`:
- Line 49: The current logger.info call in search_documents.py logs raw user
input (logger.info("Searching", query=query, max_chunks=max_chunks)); change
this to avoid logging the plain query text by removing or replacing the query
field with non-sensitive metadata (e.g., query length, a one-way hash like
SHA256 of the query, or a redacted flag) and keep max_chunks; update the
logger.info invocation in the same function/module (the line referencing
logger.info, query and max_chunks) so logs no longer contain raw query content.
- Around line 5-7: The import change should be reverted to keep ToolAnnotations
imported from mcp.types (restore the original import), and update the logging in
search_documents where logger.info("Searching", query=query, ...) is called: do
not log the raw user-controlled query string—either log a non-reversible
fingerprint (e.g., an SHA-256 hex of the query) or only non-sensitive metadata
such as query length and max_chunks; modify the usage in the same function where
logger.info is invoked to replace query=query with the redacted/hash/metadata
field(s) and ensure ToolAnnotations remains imported from mcp.types.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a2f6caba-ad11-425c-b2c0-618e3b46f938
📒 Files selected for processing (5)
CHANGELOG.mdsrc/docs2db_mcp/__main__.pysrc/docs2db_mcp/engine.pysrc/docs2db_mcp/server.pysrc/docs2db_mcp/tools/search_documents.py
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (2)
- src/docs2db_mcp/main.py
- src/docs2db_mcp/server.py
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
- Convert stdlib logging to structlog in engine.py, server.py, __main__.py - Add asyncio.Lock guard to get_engine() singleton initialization - Add RSPEED-3047 references to CHANGELOG entries - Consolidate startup logging into single structured message These changes were part of PR #14 but missed the merge due to a force-push/merge race condition.
5088d70 to
a05e58d
Compare
Follow-up to #14 — these changes were committed but missed the merge due to a force-push/merge timing issue.
Changes
loggingtostructloginengine.py,server.py,__main__.py(keyword args style)get_engine()singleton initialization against concurrent callsVerification
search_documentstool call returns results