feat: add Medical Document Parser using Gemma 4 vision and PyMuPDF #47
Tiioluwani wants to merge 1 commit into
Conversation
📝 Walkthrough

This PR introduces a complete Gradio-based medical document parser that processes PDFs and images, extracts structured clinical information using Google's Gemini LLM, merges multi-page results with intelligent deduplication, and presents findings through a web interface with flagged abnormality highlighting.

Changes: Medical Document Parser with Gemini Integration
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Actionable comments posted: 7
🧹 Nitpick comments (8)
multimodal/medical_document_parser/requirements.txt (1)
1-6: 🏗️ Heavy lift
Add a lock/constraints file for reproducible installs.
Using only `>=` bounds makes runtime behavior drift over time. Consider pinning resolved versions (e.g., `requirements.lock`/constraints) for stable deploys.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@multimodal/medical_document_parser/requirements.txt` around lines 1-6: Create a reproducible install by adding a pinned lock/constraints file and updating the project metadata: generate a constraints or lock file (e.g., requirements.lock or constraints.txt) that pins exact versions for the packages listed in requirements.txt (gradio, google-genai, PyMuPDF, Pillow, python-dotenv, pydantic), and update CI/deploy/pip install commands to use --constraint/--requirement against that file; produce the locked file by resolving current working versions (using pip freeze, pip-tools' pip-compile, or poetry/pipenv lock) and commit the resulting lock so installs are deterministic.

multimodal/medical_document_parser/llm_extractor.py (2)
48-54: ⚡ Quick win
Add error handling for JSON parsing failures.
If the LLM returns malformed JSON or the regex fails to strip fences correctly, `json.loads` will raise an exception. This should be caught and return an empty `ClinicalProfile` rather than crashing the entire document processing pipeline.

🛡️ Proposed fix to add JSON parsing error handling
 def _parse_response_text(text: str) -> ClinicalProfile:
-    cleaned = text.strip()
-    if cleaned.startswith("```"):
-        cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned)
-        cleaned = re.sub(r"\s*```$", "", cleaned)
-    data = json.loads(cleaned)
-    return ClinicalProfile.model_validate(data)
+    try:
+        cleaned = text.strip()
+        if cleaned.startswith("```"):
+            cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned)
+            cleaned = re.sub(r"\s*```$", "", cleaned)
+        data = json.loads(cleaned)
+        return ClinicalProfile.model_validate(data)
+    except (json.JSONDecodeError, ValueError) as e:
+        print(f"Failed to parse LLM response: {e}")
+        return ClinicalProfile()

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@multimodal/medical_document_parser/llm_extractor.py` around lines 48 - 54, The _parse_response_text function can raise on malformed JSON or failed fence stripping; wrap the cleaning + json.loads + ClinicalProfile.model_validate calls in a try/except that catches json.JSONDecodeError and ValueError, log the parsing failure (use existing logger if available, otherwise a concise message) and return an empty ClinicalProfile() instead of letting the exception bubble up; ensure you still strip code fences (the existing re.sub lines) inside the try so failures there are also caught.
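As a standalone illustration of the guarded parsing suggested above (the function name and dict fallback are assumptions for this sketch; the real code returns a `ClinicalProfile`), the fence-stripping plus safe `json.loads` step could look like:

```python
import json
import re

FENCE = "`" * 3  # literal triple-backtick, built up to keep this example readable

def parse_fenced_json(text: str) -> dict:
    """Strip optional Markdown code fences, then parse JSON.

    Returns an empty dict instead of raising on malformed input, mirroring
    the "return an empty profile rather than crash" behavior proposed above.
    """
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        cleaned = re.sub(r"^" + FENCE + r"(?:json)?\s*", "", cleaned)
        cleaned = re.sub(r"\s*" + FENCE + r"$", "", cleaned)
    try:
        data = json.loads(cleaned)
        return data if isinstance(data, dict) else {}
    except (json.JSONDecodeError, ValueError):
        return {}
```

LLM responses often arrive wrapped in ```json fences, so stripping before parsing and degrading to an empty result keeps a single bad page from aborting the whole document.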
42-44: 💤 Low value
Document cost and latency implications of HIGH thinking level.
The `thinking_level=types.ThinkingLevel.HIGH` setting may significantly increase API costs and response latency. For a medical document parser processing multi-page documents, this could impact user experience and operational costs. Consider whether HIGH is necessary for all pages or if it should be configurable.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@multimodal/medical_document_parser/llm_extractor.py` around lines 42-44: The hardcoded thinking_config using types.ThinkingConfig(thinking_level=types.ThinkingLevel.HIGH) can drive up API cost and latency; make the thinking level configurable (e.g., expose a parameter on the LLMExtractor class or its factory function and/or per-page override) and document the cost/latency tradeoffs in the function/class docstring and README; update references to thinking_config in llm_extractor.py so the default is a lower-cost level (e.g., MEDIUM or a named constant) and allow callers to opt into HIGH when necessary.

multimodal/medical_document_parser/schemas.py (2)
9-11: ⚡ Quick win
Consider using None for missing values instead of empty strings.
Empty strings as defaults make it difficult to distinguish between "field was not provided" and "field was explicitly set to empty". For medical data, this distinction can be important for downstream validation and clinical decision support.
♻️ Proposed change to use Optional types
 class PatientInfo(BaseModel):
-    name: str = ""
-    age: str = ""
-    date: str = ""
+    name: str | None = None
+    age: str | None = None
+    date: str | None = None

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@multimodal/medical_document_parser/schemas.py` around lines 9 - 11, Change the default field values from empty strings to None and use Optional types so missing vs explicitly empty is distinguishable: update the declarations for name, age, and date in schemas.py to use Optional[str] = None (and add the necessary typing import, e.g., Optional) so all three fields are None by default instead of "" and downstream validation can detect absent values.
11-11: ⚡ Quick win
Use a validated date type for stronger guarantees.
Storing dates as strings bypasses Pydantic's built-in validation. Medical documents often have critical date requirements (e.g., lab result dates, prescription dates).
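Independent of Pydantic, the normalization such a validated field implies can be sketched in plain Python (the function name is hypothetical; a real model would do this inside a validator):

```python
from datetime import date

def normalize_date_field(value):
    """Coerce a raw extracted date field to a datetime.date or None.

    Accepts date objects and ISO-format strings; empty or unparseable
    input becomes None rather than an empty string, so downstream code
    can tell "missing" apart from "present but malformed".
    """
    if isinstance(value, date):
        return value
    if isinstance(value, str) and value.strip():
        try:
            return date.fromisoformat(value.strip())
        except ValueError:
            return None
    return None
```

Returning None for unparseable input is a lenient choice suited to noisy OCR output; a stricter validator could raise instead so bad dates surface immediately.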
📅 Proposed change to use datetime type
+from datetime import date
+
 class PatientInfo(BaseModel):
     name: str = ""
     age: str = ""
-    date: str = ""
+    date: date | None = None

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@multimodal/medical_document_parser/schemas.py` at line 11: The field currently declared as "date: str = \"\"" should be changed to a validated date type: replace it with "date: Optional[date] = None" (import date and Optional) and add a Pydantic validator (e.g., `@validator`('date', pre=True, allow_reuse=True) def parse_date(cls, v): ...) to parse strings and reject/normalize invalid values; update any code that relied on empty-string default to handle None instead. Ensure the new validator lives in the same model in multimodal/medical_document_parser/schemas.py and use datetime.date parsing/formatting for consistent validation.

multimodal/medical_document_parser/document_processor.py (2)
73-78: ⚡ Quick win
Add error handling for corrupted or malformed PDFs.
PyMuPDF can raise exceptions when opening corrupted PDF files. Without explicit error handling, users would see generic stack traces instead of helpful error messages.
🛡️ Proposed fix to add error handling
 if suffix == ".pdf":
     pages: list[ProcessedPage] = []
-    with fitz.open(path) as document:
-        for page in document:
-            pages.append(_classify_pdf_page(page))
+    try:
+        with fitz.open(path) as document:
+            for page in document:
+                pages.append(_classify_pdf_page(page))
+    except Exception as e:
+        raise ValueError(f"Failed to process PDF: {e}") from e
     return pages

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@multimodal/medical_document_parser/document_processor.py` around lines 73 - 78, Wrap the PDF handling block that uses fitz.open(path) and iterates pages in a try/except that catches PyMuPDF-specific exceptions (fitz.FitzError) and a general Exception fallback; on error, surface a clear message that includes the file path and the original exception (either by logging via the module logger or re-raising a ValueError/RuntimeError that wraps the original exception) instead of letting a raw traceback propagate; ensure the code calling _classify_pdf_page still runs for valid pages and that the exception path returns or raises a single, helpful error for the caller.
11-11: 💤 Low value
Document the TEXT_THRESHOLD rationale.
The threshold of 50 characters appears arbitrary. In medical documents, even short pages (e.g., a prescription with patient name, medication, and dosage) may contain critical text that should be processed as text rather than vision. Consider documenting why 50 was chosen or making it configurable.
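A minimal sketch of what making the threshold configurable could look like (function and parameter names are assumed; the real module only defines a constant):

```python
TEXT_THRESHOLD = 50  # default floor of extractable characters for the text path

def choose_page_mode(extracted_text: str, threshold: int = TEXT_THRESHOLD) -> str:
    """Route a page to "text" or "vision" processing.

    The threshold trades cost for recall: set too high, short but valid
    pages (e.g. a prescription) fall through to the pricier vision path;
    set too low, near-empty scans get treated as usable text.
    """
    return "text" if len(extracted_text.strip()) >= threshold else "vision"
```

A caller handling short prescriptions could then pass a lower threshold without touching the module default.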
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@multimodal/medical_document_parser/document_processor.py` at line 11: The TEXT_THRESHOLD constant (TEXT_THRESHOLD = 50) is undocumented and may be too low for medical pages; update the code to document the rationale and make it configurable: add a docstring or inline comment next to TEXT_THRESHOLD explaining why 50 was chosen (e.g., tradeoff between OCR cost and noise), and refactor code to allow overriding it via a configuration parameter or constructor argument (e.g., a config.TXT_THRESHOLD or DocumentProcessor(text_threshold=...)) so callers can adjust behavior for short but important pages; ensure all references to TEXT_THRESHOLD in functions/methods within DocumentProcessor are changed to read from the new configurable value and include a default of 50.

multimodal/medical_document_parser/merger.py (1)
55-62: ⚡ Quick win
Optimize the membership check for flagged_items.
The `if label not in merged.flagged_items` check on line 61 performs a linear O(n) scan for each lab finding. For documents with many abnormal findings, this could become a performance bottleneck. Since `_dedupe_strings` is called earlier (line 53), duplicates will be removed anyway.

⚡ Proposed optimization

Option 1: Skip the check and rely on deduplication
 for finding in merged.lab_findings:
     if finding.status in {"abnormal", "critical"}:
         label = f"{finding.test}: {finding.value}"
         if finding.normal_range:
             label += f" (ref: {finding.normal_range})"
         label += f" [{finding.status}]"
-        if label not in merged.flagged_items:
-            merged.flagged_items.append(label)
+        merged.flagged_items.append(label)
+
+# Deduplicate again after adding auto-generated items
+merged.flagged_items = _dedupe_strings(merged.flagged_items)

Option 2: Use a set for O(1) lookup
+existing_flags = {item.strip().lower() for item in merged.flagged_items}
 for finding in merged.lab_findings:
     if finding.status in {"abnormal", "critical"}:
         label = f"{finding.test}: {finding.value}"
         if finding.normal_range:
             label += f" (ref: {finding.normal_range})"
         label += f" [{finding.status}]"
-        if label not in merged.flagged_items:
+        if label.strip().lower() not in existing_flags:
             merged.flagged_items.append(label)
+            existing_flags.add(label.strip().lower())

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@multimodal/medical_document_parser/merger.py` around lines 55 - 62, The membership check "if label not in merged.flagged_items" is causing O(n^2) behavior; replace it by maintaining a temporary set (e.g. seen_flagged) local to the loop and use seen_flagged.add(label) for O(1) lookups when iterating merged.lab_findings, then extend merged.flagged_items from that set (or keep the list order by appending only when label not in seen_flagged). Alternatively, if you prefer relying on deduplication, remove the per-item check entirely since _dedupe_strings is already called earlier; choose one approach and apply it consistently to the code that builds merged.flagged_items.
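The seen-set pattern from Option 2 can be isolated into a small helper (hypothetical name) that keeps insertion order while making each membership check O(1):

```python
def append_unique(existing: list[str], new_labels: list[str]) -> list[str]:
    """Append labels that are not already present, preserving order.

    Membership is tracked in a set keyed on the stripped, lowercased
    label, so each check is O(1) instead of a linear scan of the list.
    """
    seen = {item.strip().lower() for item in existing}
    for label in new_labels:
        key = label.strip().lower()
        if key not in seen:
            existing.append(label)
            seen.add(key)
    return existing
```

Normalizing the key the same way `_dedupe_strings` presumably does keeps the two deduplication passes consistent.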
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@multimodal/medical_document_parser/app.py`:
- Around line 65-66: The Gradio css and theme kwargs are being passed to
demo.launch() but Gradio 5.x requires them on Blocks construction; update the
gr.Blocks(...) call that creates demo (the gr.Blocks(title="Medical Document
Parser") as demo block) to include the css and theme arguments (e.g., css=...,
theme=...) and remove css and theme from the demo.launch(...) invocation so
launch() is called without those kwargs.
- Around line 31-34: The parse_document function currently uses a mutable/shared
default by setting progress: gr.Progress = gr.Progress() at definition time;
change the signature to accept progress: gr.Progress | None = None and inside
parse_document check if progress is None then set progress = gr.Progress() so
each call gets a fresh Progress instance (refer to parse_document and its
progress parameter when making this change).
In `@multimodal/medical_document_parser/document_processor.py`:
- Around line 69-83: process_upload lacks file size checks and can load very
large PDFs/images into memory; add a configurable MAX_UPLOAD_SIZE_BYTES constant
and validate path.stat().st_size against it at the top of process_upload,
raising ValueError if exceeded, before calling fitz.open or path.read_bytes();
keep the checks when handling both the PDF branch (before looping into
_classify_pdf_page) and the image branch (before invoking _image_to_page) so
large files are rejected early.
In `@multimodal/medical_document_parser/llm_extractor.py`:
- Around line 57-82: Wrap the calls to client.models.generate_content inside
extract_from_page with a try/except that catches network/HTTP/auth/timeouts
(SDK-specific exceptions and a generic Exception), enforce a request timeout via
the SDK or by adding a timeout param from _generation_config (or client init)
and return an empty ClinicalProfile or propagate a clear, logged error when
failures occur; ensure both the text and vision branches use the same guarded
call, log or attach the exception details, and keep _parse_response_text usage
unchanged for successful responses.
In `@multimodal/medical_document_parser/README.md`:
- Around line 89-90: The project tree root in README.md is incorrect—replace the
top-level folder name "medical-document-parser/" with the repository path
"multimodal/medical_document_parser/" so the tree header matches the actual repo
layout and avoids confusion; update the tree line containing
"medical-document-parser/" (above the listed files like app.py) to
"multimodal/medical_document_parser/".
In `@multimodal/medical_document_parser/requirements.txt`:
- Line 4: Update the Pillow dependency in requirements.txt from "Pillow>=10.0.0"
to "Pillow>=12.2.0" to ensure the package version includes fixes for the listed
CVEs; locate the Pillow line in
multimodal/medical_document_parser/requirements.txt and change the version floor
to 12.2.0, then run dependency install/check (e.g., pip check or CI dependency
audit) to confirm no conflicts.
In `@multimodal/medical_document_parser/schemas.py`:
- Around line 8-11: The PatientInfo model currently holds PHI (name, age, date)
and must not be sent to external LLMs without safeguards: update the PatientInfo
class and the code paths in llm_extractor.py to require explicit user consent
(e.g., a boolean consent flag or signed consent token) before transmission, add
an option to de-identify/anonymize fields via a new method (e.g.,
PatientInfo.deidentify()) and use that by default, ensure all network calls use
TLS and that persisted PHI is stored encrypted (enable encryption-at-rest config
checks), add a runtime check that a configured BAA_with_google flag or
documented BAA approval is present before calling Gemini, and emit audit log
entries (who, what, when, purpose) whenever PatientInfo is accessed or sent to
an external API; implement these checks centrally where PatientInfo is
serialized for llm_extractor.py so they cannot be bypassed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 12eadcc4-ce8d-4fb1-927d-81f395e236a6
⛔ Files ignored due to path filters (1)
multimodal/medical_document_parser/assets/demo.png is excluded by !**/*.png
📒 Files selected for processing (8)
- multimodal/medical_document_parser/.env.example
- multimodal/medical_document_parser/README.md
- multimodal/medical_document_parser/app.py
- multimodal/medical_document_parser/document_processor.py
- multimodal/medical_document_parser/llm_extractor.py
- multimodal/medical_document_parser/merger.py
- multimodal/medical_document_parser/requirements.txt
- multimodal/medical_document_parser/schemas.py
def parse_document(
    upload: str | None,
    progress: gr.Progress = gr.Progress(),
) -> tuple[dict, str]:
🧩 Analysis chain
🌐 Web query:
In Python, are default argument expressions evaluated once at function definition time, and is this why linters warn about function calls in defaults (e.g., Ruff B008)?
💡 Result:
Yes, in Python, default argument expressions are evaluated once at function definition time, not on each call. This is explicitly stated in the official Python documentation: "The default values are evaluated at the point of function definition in the defining scope" and "Important warning: The default value is evaluated only once."[1] This behavior stores the defaults in the function's defaults tuple, reusing the same value (or object) across calls.[2][3][4] This is why linters like Ruff warn about function calls in defaults via rule B008: "Any function call that's used in a default argument will only be performed once, at definition time. The returned value will then be reused by all calls to the function, which can lead to unexpected behaviour."[5] For example, calling a function that returns a mutable object (like a list) shares that object across calls, causing mutations to persist unexpectedly. Even non-mutable calls (e.g., time.time or random) yield stale values from definition time.[5] The standard fix is to use None as the default and initialize inside the function body.[1][5]
Citations:
- 1: https://docs.python.org/3/tutorial/controlflow.html
- 2: https://stackoverflow.com/questions/1651154/why-are-default-arguments-evaluated-at-definition-time
- 3: https://stackoverflow.com/questions/1132941/the-mutable-default-argument-in-python
- 4: https://engineersofai.com/docs/python/python-foundation/functions-and-modularity/Default-Parameters-Pitfalls
- 5: https://docs.astral.sh/ruff/rules/function-call-in-default-argument/
🏁 Script executed:
cd multimodal/medical_document_parser && head -120 app.py | tail -70

Repository: Sumanth077/Hands-On-AI-Engineering
Length of output: 2416
🌐 Web query:
Gradio 5.0.0 API: How to pass css and theme arguments to Blocks and launch methods? What is the correct pattern?
💡 Result:
Correct pattern in Gradio (including 5.0.0-era and current docs): pass theme and CSS-related arguments to Blocks’ launch method, not to the Blocks constructor. 1) Theme - Use demo.launch(theme=...) Example: with gr.Blocks as demo:... demo.launch(theme=gr.themes.Soft) This is explicitly shown in the Gradio theming guide: “pass the theme= kwarg to the launch method of Blocks” [1]. 2) Custom CSS - Use demo.launch(css=...) for a CSS string - Use demo.launch(css_paths=...) for one or more CSS files (path objects) Example: with gr.Blocks as demo:... demo.launch(css=".gradio-container { background-color: red; }") demo.launch(css_paths=["./styles/base.css", "./styles/overrides.css"]) The “Custom CSS and JS” guide shows passing css= and css_paths= to launch [2]. 3) What changed vs passing to Blocks(...) - Gradio moved styling-related parameters (theme, css, css_paths, js, head, head_paths) from the Blocks constructor to the launch method. The codebase includes a warning message stating that these parameters “have been moved from the Blocks constructor to the launch method in Gradio 6.0: theme, css, css_paths, js, head, head_paths. Please pass these parameters to launch instead.” [3]. So, the “correct pattern” is: - Build UI inside the Blocks context - Call launch(...) with theme= and css=/css_paths= (and any other styling-related args) If you currently have something like gr.Blocks(theme=..., css=...), update it to gr.Blocks + demo.launch(theme=..., css=..., css_paths=...). [3][1][2]
Citations:
- 1: https://www.gradio.app/guides/theming-guide
- 2: https://www.gradio.app/guides/custom-CSS-and-JS
- 3: gradio-app/gradio@d3966ca
Avoid function-call default for progress (shared default evaluation).
At Line 33, gr.Progress() is evaluated at function definition time, creating a single shared instance. Use None as the default and instantiate inside the function instead to create a fresh instance per call.
Proposed fix
 def parse_document(
     upload: str | None,
-    progress: gr.Progress = gr.Progress(),
+    progress: gr.Progress | None = None,
 ) -> tuple[dict, str]:
+    if progress is None:
+        progress = gr.Progress()
+
     if upload is None:
         raise gr.Error("Please upload a medical PDF or image.")

🧰 Tools
🪛 Ruff (0.15.12)
[warning] 33-33: Do not perform function call gr.Progress in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@multimodal/medical_document_parser/app.py` around lines 31 - 34, The
parse_document function currently uses a mutable/shared default by setting
progress: gr.Progress = gr.Progress() at definition time; change the signature
to accept progress: gr.Progress | None = None and inside parse_document check if
progress is None then set progress = gr.Progress() so each call gets a fresh
Progress instance (refer to parse_document and its progress parameter when
making this change).
with gr.Blocks(title="Medical Document Parser") as demo:
    gr.Markdown(
🧩 Analysis chain
🌐 Web query:
For gradio==5.0.0, does gr.Blocks.launch() accept css and theme keyword arguments, or must they be passed to gr.Blocks(...) when constructing the app?
💡 Result:
For gradio==5.0.0, css and theme must be passed to gr.Blocks(...) when constructing the app, not to gr.Blocks.launch(). The move of these parameters from the Blocks constructor to launch() occurred in Gradio 6.x as a breaking change[1][2][3]. Gradio 5 docs and examples show them in the constructor[4][5][6][7][8], and the Gradio 6 migration guide contrasts "Before (Gradio 5.x)" with constructor usage vs "After (Gradio 6.x)" with launch[1]. The 5.0.0 release notes do not mention adding them to launch[9]. In later versions (post-6.0), they were temporarily restored to the constructor with warnings[3][10].
Citations:
- 1: https://www.gradio.app/main/guides/gradio-6-migration-guide
- 2: https://github.com/gradio-app/gradio/blob/main/gradio/CHANGELOG.md
- 3: gradio-app/gradio@d3966ca
- 4: https://www.gradio.app/4.44.1/docs/gradio/themes
- 5: https://www.gradio.app/docs/gradio/blocks
- 6: https://gradio.app/docs/gradio/blocks
- 7: https://mintlify.com/gradio-app/gradio/api/themes
- 8: gradio-app/gradio@c3d93be
- 9: https://github.com/gradio-app/gradio/releases/tag/gradio@5.0.0
- 10: Restore Blocks constructor args deprecated in 6.0 (theme, css, etc.) gradio-app/gradio#12502
🏁 Script executed:
cat -n multimodal/medical_document_parser/app.py | sed -n '60,115p'

Repository: Sumanth077/Hands-On-AI-Engineering
Length of output: 2198
🏁 Script executed:
# Check for CUSTOM_CSS definition and Gradio version requirement
head -30 multimodal/medical_document_parser/app.py

Repository: Sumanth077/Hands-On-AI-Engineering
Length of output: 804
🏁 Script executed:
# Look for requirements or version specs
find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" | head -5

Repository: Sumanth077/Hands-On-AI-Engineering
Length of output: 324
🏁 Script executed:
cat multimodal/medical_document_parser/requirements.txt

Repository: Sumanth077/Hands-On-AI-Engineering
Length of output: 180
Move css/theme to gr.Blocks(...); launch() should not receive them.
The code passes css and theme to launch() (line 110), which is incompatible with Gradio 5.x. Per the requirements.txt (gradio>=5.0.0), these parameters must be configured at Blocks construction (line 65). Gradio 5 does not accept these kwargs in launch().
Proposed fix
 def build_app() -> gr.Blocks:
-    with gr.Blocks(title="Medical Document Parser") as demo:
+    with gr.Blocks(
+        title="Medical Document Parser",
+        css=CUSTOM_CSS,
+        theme=gr.themes.Soft(),
+    ) as demo:
         gr.Markdown(
             """
             # Medical Document Parser
@@
 if __name__ == "__main__":
-    build_app().launch(css=CUSTOM_CSS, theme=gr.themes.Soft())
+    build_app().launch()
+ build_app().launch()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@multimodal/medical_document_parser/app.py` around lines 65 - 66, The Gradio
css and theme kwargs are being passed to demo.launch() but Gradio 5.x requires
them on Blocks construction; update the gr.Blocks(...) call that creates demo
(the gr.Blocks(title="Medical Document Parser") as demo block) to include the
css and theme arguments (e.g., css=..., theme=...) and remove css and theme from
the demo.launch(...) invocation so launch() is called without those kwargs.
def process_upload(file_path: str | Path) -> list[ProcessedPage]:
    path = Path(file_path)
    suffix = path.suffix.lower()

    if suffix == ".pdf":
        pages: list[ProcessedPage] = []
        with fitz.open(path) as document:
            for page in document:
                pages.append(_classify_pdf_page(page))
        return pages

    if suffix in {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".tif", ".tiff"}:
        return [_image_to_page(path.read_bytes())]

    raise ValueError(f"Unsupported file type: {suffix or 'unknown'}")
Add file size validation to prevent memory exhaustion.
The function processes uploaded files without checking their size. Large PDF files or high-resolution images could consume excessive memory when loaded entirely into RAM, potentially causing application crashes or denial-of-service conditions.
🛡️ Proposed fix to add size validation

```diff
+MAX_FILE_SIZE = 50 * 1024 * 1024  # 50 MB
+
 def process_upload(file_path: str | Path) -> list[ProcessedPage]:
     path = Path(file_path)
+
+    file_size = path.stat().st_size
+    if file_size > MAX_FILE_SIZE:
+        raise ValueError(f"File too large: {file_size / (1024*1024):.1f} MB (max {MAX_FILE_SIZE / (1024*1024):.0f} MB)")
+
     suffix = path.suffix.lower()
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
MAX_FILE_SIZE = 50 * 1024 * 1024  # 50 MB

def process_upload(file_path: str | Path) -> list[ProcessedPage]:
    path = Path(file_path)

    file_size = path.stat().st_size
    if file_size > MAX_FILE_SIZE:
        raise ValueError(f"File too large: {file_size / (1024*1024):.1f} MB (max {MAX_FILE_SIZE / (1024*1024):.0f} MB)")

    suffix = path.suffix.lower()
    if suffix == ".pdf":
        pages: list[ProcessedPage] = []
        with fitz.open(path) as document:
            for page in document:
                pages.append(_classify_pdf_page(page))
        return pages
    if suffix in {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".tif", ".tiff"}:
        return [_image_to_page(path.read_bytes())]
    raise ValueError(f"Unsupported file type: {suffix or 'unknown'}")
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@multimodal/medical_document_parser/document_processor.py` around lines 69 -
83, process_upload lacks file size checks and can load very large PDFs/images
into memory; add a configurable MAX_UPLOAD_SIZE_BYTES constant and validate
path.stat().st_size against it at the top of process_upload, raising ValueError
if exceeded, before calling fitz.open or path.read_bytes(); keep the checks when
handling both the PDF branch (before looping into _classify_pdf_page) and the
image branch (before invoking _image_to_page) so large files are rejected early.
```python
def extract_from_page(client: genai.Client, page: ProcessedPage) -> ClinicalProfile:
    config = _generation_config()
    page_context = f"{EXTRACTION_PROMPT}\n\nPage {page.page_number}."

    if page.kind == "text" and page.text:
        response = client.models.generate_content(
            model=MODEL_ID,
            contents=f"{page_context}\n\nDocument text:\n{page.text}",
            config=config,
        )
    elif page.kind == "vision" and page.image_bytes:
        response = client.models.generate_content(
            model=MODEL_ID,
            contents=[
                types.Part.from_bytes(data=page.image_bytes, mime_type="image/png"),
                page_context,
            ],
            config=config,
        )
    else:
        return ClinicalProfile()

    if not response.text:
        return ClinicalProfile()

    return _parse_response_text(response.text)
```
Add error handling and timeout for API calls.
The Gemini API calls lack error handling for network failures, rate limits, authentication issues, or timeouts. In production, external API calls should be wrapped with proper exception handling and timeout configuration to prevent indefinite hangs and provide meaningful error messages.
🛡️ Proposed fix to add error handling and timeout

```diff
+from google.genai.errors import ClientError
+
 def extract_from_page(client: genai.Client, page: ProcessedPage) -> ClinicalProfile:
     config = _generation_config()
     page_context = f"{EXTRACTION_PROMPT}\n\nPage {page.page_number}."
-    if page.kind == "text" and page.text:
-        response = client.models.generate_content(
-            model=MODEL_ID,
-            contents=f"{page_context}\n\nDocument text:\n{page.text}",
-            config=config,
-        )
-    elif page.kind == "vision" and page.image_bytes:
-        response = client.models.generate_content(
-            model=MODEL_ID,
-            contents=[
-                types.Part.from_bytes(data=page.image_bytes, mime_type="image/png"),
-                page_context,
-            ],
-            config=config,
-        )
-    else:
+    try:
+        if page.kind == "text" and page.text:
+            response = client.models.generate_content(
+                model=MODEL_ID,
+                contents=f"{page_context}\n\nDocument text:\n{page.text}",
+                config=config,
+            )
+        elif page.kind == "vision" and page.image_bytes:
+            response = client.models.generate_content(
+                model=MODEL_ID,
+                contents=[
+                    types.Part.from_bytes(data=page.image_bytes, mime_type="image/png"),
+                    page_context,
+                ],
+                config=config,
+            )
+        else:
+            return ClinicalProfile()
+    except ClientError as e:
+        # Log error and return empty profile rather than crashing
+        print(f"API error on page {page.page_number}: {e}")
         return ClinicalProfile()
+    except Exception as e:
+        print(f"Unexpected error on page {page.page_number}: {e}")
+        return ClinicalProfile()
```

Note: The google-genai SDK may require additional timeout configuration via client initialization or request parameters. Check the SDK documentation for the proper timeout mechanism.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@multimodal/medical_document_parser/llm_extractor.py` around lines 57 - 82,
Wrap the calls to client.models.generate_content inside extract_from_page with a
try/except that catches network/HTTP/auth/timeouts (SDK-specific exceptions and
a generic Exception), enforce a request timeout via the SDK or by adding a
timeout param from _generation_config (or client init) and return an empty
ClinicalProfile or propagate a clear, logged error when failures occur; ensure
both the text and vision branches use the same guarded call, log or attach the
exception details, and keep _parse_response_text usage unchanged for successful
responses.
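Beyond catching exceptions, transient failures (rate limits, network blips) are usually worth a bounded retry with backoff. A stdlib-only sketch of the pattern — `call_with_retry`, the exception tuple, and the backoff numbers are illustrative, not part of the google-genai API:

```python
import time

def call_with_retry(call, *, retries=3, base_delay=1.0, retryable=(TimeoutError,)):
    """Invoke call() with exponential backoff on retryable errors."""
    for attempt in range(retries):
        try:
            return call()
        except retryable:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

if __name__ == "__main__":
    attempts = []

    def flaky():
        attempts.append(1)
        if len(attempts) < 3:
            raise TimeoutError("transient")
        return "ok"

    print(call_with_retry(flaky, base_delay=0.01))  # ok, after two retried failures
```

In `extract_from_page`, the `generate_content` invocation could be wrapped as `call_with_retry(lambda: client.models.generate_content(...))`, with the SDK's own transient-error types listed in `retryable`.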
```text
medical-document-parser/
├── app.py                 # Gradio UI and orchestration
```
Use the real repository path/name in the project tree.
The tree root (medical-document-parser/) doesn’t match the documented repo location (multimodal/medical_document_parser). Aligning names avoids navigation confusion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@multimodal/medical_document_parser/README.md` around lines 89 - 90, The
project tree root in README.md is incorrect—replace the top-level folder name
"medical-document-parser/" with the repository path
"multimodal/medical_document_parser/" so the tree header matches the actual repo
layout and avoids confusion; update the tree line containing
"medical-document-parser/" (above the listed files like app.py) to
"multimodal/medical_document_parser/".
```text
gradio>=5.0.0
google-genai>=1.0.0
PyMuPDF>=1.24.0
Pillow>=10.0.0
```
🧩 Analysis chain
🌐 Web query:
What is the first Pillow version that fixes GHSA-3f63-hfp8-52jq, GHSA-44wm-f244-xhp3, GHSA-j7hp-h8jx-5ppr, GHSA-r73j-pqj5-w3x7, and GHSA-wjx4-4jcj-g98j?
💡 Result:
First fixed Pillow version for each GHSA:

- GHSA-3f63-hfp8-52jq: Pillow 10.2.0 [1]
- GHSA-44wm-f244-xhp3: Pillow 10.3.0 [2]
- GHSA-j7hp-h8jx-5ppr: Pillow 10.0.1 [3]
- GHSA-r73j-pqj5-w3x7: Pillow 12.2.0 [4]
- GHSA-wjx4-4jcj-g98j: Pillow 12.2.0 [5]

(“First fixed” interpreted as the patched version specified by each advisory/summary.)
Citations:
- 1: https://duartecsantos.github.io/2024-01-02-CVE-2023-50447/
- 2: GHSA-44wm-f244-xhp3
- 3: https://security.snyk.io/vuln/SNYK-PYTHON-PILLOW-5918878
- 4: GHSA-r73j-pqj5-w3x7
- 5: https://github.com/python-pillow/Pillow/security
Bump Pillow to ≥12.2.0 to address critical image parsing vulnerabilities.
The current floor of Pillow>=10.0.0 allows multiple known vulnerabilities (GHSA-3f63-hfp8-52jq, GHSA-44wm-f244-xhp3, GHSA-j7hp-h8jx-5ppr, GHSA-r73j-pqj5-w3x7, GHSA-wjx4-4jcj-g98j) that are particularly relevant since this application parses user-supplied images and PDF-derived content. Pillow 12.2.0 is the first version that patches all identified advisories.
Suggested diff

```diff
-Pillow>=10.0.0
+Pillow>=12.2.0
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```text
Pillow>=12.2.0
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@multimodal/medical_document_parser/requirements.txt` at line 4, Update the
Pillow dependency in requirements.txt from "Pillow>=10.0.0" to "Pillow>=12.2.0"
to ensure the package version includes fixes for the listed CVEs; locate the
Pillow line in multimodal/medical_document_parser/requirements.txt and change
the version floor to 12.2.0, then run dependency install/check (e.g., pip check
or CI dependency audit) to confirm no conflicts.
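For a quick post-bump sanity check, a stdlib-only version comparison can confirm the installed release meets the new floor (this simple dotted-integer parser is illustrative; `packaging.version` is the robust choice for arbitrary release strings):

```python
def version_tuple(v: str) -> tuple[int, ...]:
    """Parse a simple dotted release string like '12.2.0' into comparable ints."""
    return tuple(int(part) for part in v.split("."))

def satisfies_floor(installed: str, floor: str) -> bool:
    """True when the installed version is at or above the required minimum."""
    return version_tuple(installed) >= version_tuple(floor)

if __name__ == "__main__":
    print(satisfies_floor("10.3.0", "12.2.0"))  # False: still below the patched floor
    print(satisfies_floor("12.2.0", "12.2.0"))  # True
    print(satisfies_floor("13.0.1", "12.2.0"))  # True
```

The installed version itself can be read with `importlib.metadata.version("Pillow")` in the target environment.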
```python
class PatientInfo(BaseModel):
    name: str = ""
    age: str = ""
    date: str = ""
```
Flag PII handling and compliance requirements.
This model stores Protected Health Information (PHI) including patient names. The codebase sends this data to Google's Gemini API (as seen in llm_extractor.py). Medical document parsers must comply with HIPAA, GDPR, or similar regulations when handling patient data.
Ensure:
- Users provide explicit consent for external API transmission
- Data is encrypted in transit and at rest
- A Business Associate Agreement (BAA) exists with Google
- Audit logging tracks all PHI access
- Consider de-identification or anonymization before LLM processing
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@multimodal/medical_document_parser/schemas.py` around lines 8 - 11, The
PatientInfo model currently holds PHI (name, age, date) and must not be sent to
external LLMs without safeguards: update the PatientInfo class and the code
paths in llm_extractor.py to require explicit user consent (e.g., a boolean
consent flag or signed consent token) before transmission, add an option to
de-identify/anonymize fields via a new method (e.g., PatientInfo.deidentify())
and use that by default, ensure all network calls use TLS and that persisted PHI
is stored encrypted (enable encryption-at-rest config checks), add a runtime
check that a configured BAA_with_google flag or documented BAA approval is
present before calling Gemini, and emit audit log entries (who, what, when,
purpose) whenever PatientInfo is accessed or sent to an external API; implement
these checks centrally where PatientInfo is serialized for llm_extractor.py so
they cannot be bypassed.
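The de-identification step suggested above can be prototyped without the real pydantic model. A minimal sketch in which the dataclass and `deidentify` are illustrative stand-ins, not the project's actual API:

```python
from dataclasses import dataclass, replace

@dataclass
class PatientInfo:
    name: str = ""
    age: str = ""
    date: str = ""

def deidentify(info: PatientInfo) -> PatientInfo:
    """Blank direct identifiers before the record leaves the process.

    Age and date are kept here for clinical context; a full HIPAA Safe
    Harbor pass would also generalize ages over 89 and strip exact dates.
    """
    return replace(info, name="[REDACTED]")

if __name__ == "__main__":
    record = PatientInfo(name="Jane Doe", age="42", date="2024-05-01")
    safe = deidentify(record)
    print(safe.name)    # [REDACTED]
    print(record.name)  # Jane Doe — the original record is untouched
```

Applying such a transform at the single serialization point used by `llm_extractor.py` ensures identifiers never reach the external API, regardless of the calling path.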
Description
Adds a Medical Document Parser that ingests lab reports, prescriptions, and clinical notes — classifying each PDF page as text or vision, routing to the appropriate extraction path, and using Gemma 4 with thinking mode to return a unified JSON clinical profile with abnormal and critical values flagged in red.
Summary by CodeRabbit
New Features
Documentation