Skip to content

chore: adopt structlog and add asyncio.Lock to engine singleton#15

Merged
major merged 1 commit into
mainfrom
chore/structlog-and-lock-cleanup
May 28, 2026
Merged

chore: adopt structlog and add asyncio.Lock to engine singleton#15
major merged 1 commit into
mainfrom
chore/structlog-and-lock-cleanup

Conversation

@Lifto
Copy link
Copy Markdown
Collaborator

@Lifto Lifto commented May 28, 2026

Follow-up to #14 — these changes were committed but missed the merge due to a force-push/merge timing issue.

Changes

  • structlog adoption: Convert stdlib logging to structlog in engine.py, server.py, __main__.py (keyword args style)
  • asyncio.Lock: Guard get_engine() singleton initialization against concurrent calls
  • CHANGELOG: Add RSPEED-3047 references to existing entries
  • Startup logging: Consolidate 4 separate log calls into 1 structured message

Verification

  • All 43 tests pass
  • Pre-commit clean (ruff, ruff-format, pyright)
  • Live-tested: server starts, health check passes, search_documents tool call returns results

@Lifto Lifto requested a review from a team as a code owner May 28, 2026 00:43
@Lifto
Copy link
Copy Markdown
Collaborator Author

Lifto commented May 28, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 608ef89a-107d-4472-90a3-fc2b08a1d6d5

📥 Commits

Reviewing files that changed from the base of the PR and between 5088d70 and a05e58d.

📒 Files selected for processing (5)
  • CHANGELOG.md
  • src/docs2db_mcp/__main__.py
  • src/docs2db_mcp/engine.py
  • src/docs2db_mcp/server.py
  • src/docs2db_mcp/tools/search_documents.py
✅ Files skipped from review due to trivial changes (2)
  • CHANGELOG.md
  • src/docs2db_mcp/server.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/docs2db_mcp/tools/search_documents.py
  • src/docs2db_mcp/engine.py
  • src/docs2db_mcp/main.py

Walkthrough

Replace stdlib logging with structlog across modules, add an asyncio.Lock to serialize RAG engine singleton init/start/shutdown, convert startup/health/search logs to structured events, and update CHANGELOG entries to reference RSPEED-3047.

Changes

Structured Logging Migration with Engine Thread-Safety

Layer / File(s) Summary
Module-level structlog setup
src/docs2db_mcp/__main__.py, src/docs2db_mcp/server.py, src/docs2db_mcp/engine.py, src/docs2db_mcp/tools/search_documents.py
Module-level loggers now use structlog.get_logger(__name__); __main__.py adjusts logging bootstrap to ensure MCP transport logs route to stderr and sets non-sse log level via DOCS2DB_LOG_LEVEL.
Application startup structured events
src/docs2db_mcp/__main__.py
main() emits a structured startup event (host, port, transport, DB connection, RAG params) and logs server errors via structlog with error=str(e) and exc_info=True.
Engine singleton init, shutdown, and health-check
src/docs2db_mcp/engine.py
get_engine() acquires _engine_lock, initializes configs, awaits _engine.start() under the lock and resets _engine on failure; shutdown_engine() awaits _engine.close() under _engine_lock and clears the singleton; health_check() emits structured success/failure logs.
Search tool structured events
src/docs2db_mcp/tools/search_documents.py
Search tool logs now emit structured fields: start (query_length, max_chunks), completion (num_results), and failure (error, exc_info).
Changelog updates
CHANGELOG.md
Append RSPEED-3047 to three existing bullets and add bullets noting structlog adoption, engine lock, and shutdown awaiting engine.close().

Sequence Diagram(s)

(No diagram generated.)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The PR description covers key changes (structlog adoption, asyncio.Lock, CHANGELOG updates, startup logging consolidation) and includes verification results, but deviates from the template structure by omitting the 'What', 'Why', 'How to Test', and explicit 'Checklist' sections. Restructure the description to follow the template: add explicit 'What' and 'Why' sections at the top, provide 'How to Test' steps separately, and include the checkbox checklist format from the template.
✅ Passed checks (3 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chore/structlog-and-lock-cleanup

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/docs2db_mcp/engine.py (1)

80-87: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix RAG engine shutdown to await teardown (don’t just drop the singleton reference)

In src/docs2db_mcp/engine.py (lines 80-87), shutdown_engine() only does _engine = None and never awaits any shutdown/close/cleanup on the existing UniversalRAGEngine instance—so the engine’s resources are left to whatever the GC feels like doing today (meanwhile get_engine() does await _engine.start()).

async def shutdown_engine() -> None:
    """Shutdown the RAG engine and cleanup resources."""
    global _engine

    if _engine is not None:
        logger.info("Shutting down UniversalRAGEngine")
        _engine = None

Update shutdown_engine() to call/await the engine’s teardown method(s) before clearing _engine (and keep the lock/lifecycle contract intact).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/docs2db_mcp/engine.py` around lines 80 - 87, shutdown_engine currently
just drops the _engine reference, leaking resources; update shutdown_engine to,
when _engine is not None, log the shutdown, call and await the engine's
lifecycle teardown API (e.g., prefer await _engine.shutdown() or await
_engine.teardown() or await _engine.close() — choose the actual async method
implemented by UniversalRAGEngine; if only a sync close exists call it
accordingly), handle missing-method fallback gracefully, then clear _engine =
None and preserve any existing lock/lifecycle contract used by get_engine/start.
Ensure you await the teardown before nulling _engine so resources are cleaned
up.
src/docs2db_mcp/__main__.py (1)

3-3: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Purge stdlib logging from src/docs2db_mcp (structlog-only, no undead imports)

src/docs2db_mcp/__main__.py still imports logging and calls logging.basicConfig(...) (SSE + non-SSE branches), and src/docs2db_mcp/tools/search_documents.py also imports stdlib logging (via logging.getLogger). Swap these bootstrap/logger usages to structlog so the src/**/*.py contract isn’t haunted by zombie imports.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/docs2db_mcp/__main__.py` at line 3, The files import and configure stdlib
logging (import logging and calls to logging.basicConfig plus uses of
logging.getLogger) which must be replaced with structlog-only usage; remove any
import logging from src/docs2db_mcp/__main__.py and
src/docs2db_mcp/tools/search_documents.py, replace the logging.basicConfig(...)
bootstrap in the SSE and non-SSE branches of __main__.py with structlog
configuration (structlog.configure(...) + appropriate processors/sink) and swap
uses of logging.getLogger(...) in search_documents (and any local module-level
logger variables) to structlog.get_logger(...); ensure all logging calls use the
structlog logger API and delete the now-unused stdlib logging references.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/docs2db_mcp/__main__.py`:
- Line 43: Replace the eager f-string formatting in the logger calls in
search_documents.py (the logger.info / logger.error calls around the lines
referenced—the calls at ~49, ~76, ~85) with lazy formatting: use %-style
placeholders like "%s" or "%r" with arguments, or pass values as kwargs to
logger (e.g., logger.info("message %s", var) or logger.info("message",
extra_field=var)) so logging remains lazy and avoids premature string
interpolation.

---

Outside diff comments:
In `@src/docs2db_mcp/__main__.py`:
- Line 3: The files import and configure stdlib logging (import logging and
calls to logging.basicConfig plus uses of logging.getLogger) which must be
replaced with structlog-only usage; remove any import logging from
src/docs2db_mcp/__main__.py and src/docs2db_mcp/tools/search_documents.py,
replace the logging.basicConfig(...) bootstrap in the SSE and non-SSE branches
of __main__.py with structlog configuration (structlog.configure(...) +
appropriate processors/sink) and swap uses of logging.getLogger(...) in
search_documents (and any local module-level logger variables) to
structlog.get_logger(...); ensure all logging calls use the structlog logger API
and delete the now-unused stdlib logging references.

In `@src/docs2db_mcp/engine.py`:
- Around line 80-87: shutdown_engine currently just drops the _engine reference,
leaking resources; update shutdown_engine to, when _engine is not None, log the
shutdown, call and await the engine's lifecycle teardown API (e.g., prefer await
_engine.shutdown() or await _engine.teardown() or await _engine.close() — choose
the actual async method implemented by UniversalRAGEngine; if only a sync close
exists call it accordingly), handle missing-method fallback gracefully, then
clear _engine = None and preserve any existing lock/lifecycle contract used by
get_engine/start. Ensure you await the teardown before nulling _engine so
resources are cleaned up.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 96f1137b-6178-4e7b-a107-cb407abf618d

📥 Commits

Reviewing files that changed from the base of the PR and between 1259372 and 4071f7b.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • src/docs2db_mcp/__main__.py
  • src/docs2db_mcp/engine.py
  • src/docs2db_mcp/server.py

Comment thread src/docs2db_mcp/__main__.py
@Lifto Lifto force-pushed the chore/structlog-and-lock-cleanup branch from 4071f7b to 5088d70 Compare May 28, 2026 00:54
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/docs2db_mcp/engine.py`:
- Around line 84-88: The shutdown block for the singleton engine must ensure the
module-level _engine is cleared even if UniversalRAGEngine.close() raises; wrap
the await _engine.close() call in a try/finally inside the async with
_engine_lock so that in the finally you set _engine = None (and optionally log
the close exception before re-raising), ensuring get_engine() cannot return a
half-closed instance; update the code referencing _engine_lock, _engine,
get_engine, and UniversalRAGEngine accordingly.

In `@src/docs2db_mcp/tools/search_documents.py`:
- Line 49: The current logger.info call in search_documents.py logs raw user
input (logger.info("Searching", query=query, max_chunks=max_chunks)); change
this to avoid logging the plain query text by removing or replacing the query
field with non-sensitive metadata (e.g., query length, a one-way hash like
SHA256 of the query, or a redacted flag) and keep max_chunks; update the
logger.info invocation in the same function/module (the line referencing
logger.info, query and max_chunks) so logs no longer contain raw query content.
- Around line 5-7: The import change should be reverted to keep ToolAnnotations
imported from mcp.types (restore the original import), and update the logging in
search_documents where logger.info("Searching", query=query, ...) is called: do
not log the raw user-controlled query string—either log a non-reversible
fingerprint (e.g., an SHA-256 hex of the query) or only non-sensitive metadata
such as query length and max_chunks; modify the usage in the same function where
logger.info is invoked to replace query=query with the redacted/hash/metadata
field(s) and ensure ToolAnnotations remains imported from mcp.types.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a2f6caba-ad11-425c-b2c0-618e3b46f938

📥 Commits

Reviewing files that changed from the base of the PR and between 4071f7b and 5088d70.

📒 Files selected for processing (5)
  • CHANGELOG.md
  • src/docs2db_mcp/__main__.py
  • src/docs2db_mcp/engine.py
  • src/docs2db_mcp/server.py
  • src/docs2db_mcp/tools/search_documents.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/docs2db_mcp/main.py
  • src/docs2db_mcp/server.py

Comment thread src/docs2db_mcp/engine.py Outdated
Comment thread src/docs2db_mcp/tools/search_documents.py
Comment thread src/docs2db_mcp/tools/search_documents.py Outdated
@Lifto
Copy link
Copy Markdown
Collaborator Author

Lifto commented May 28, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

- Convert stdlib logging to structlog in engine.py, server.py, __main__.py
- Add asyncio.Lock guard to get_engine() singleton initialization
- Add RSPEED-3047 references to CHANGELOG entries
- Consolidate startup logging into single structured message

These changes were part of PR #14 but missed the merge due to
a force-push/merge race condition.
@Lifto Lifto force-pushed the chore/structlog-and-lock-cleanup branch from 5088d70 to a05e58d Compare May 28, 2026 01:52
@major major merged commit d483887 into main May 28, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants