Fix/caption failure logging visibility by Nancy-3012 · Pull Request #608 · param20h/PDF-Assistant-RAG

Nancy-3012 · 2026-06-15T04:24:13Z

📝 What does this PR do?

Improves visibility of image captioning failures in generate_captions_for_chunks
and _openai_caption. Previously, exceptions were silently swallowed with
logger.debug, meaning admins had no way to know that image data was being
permanently discarded. This PR upgrades both to logger.warning with actionable
messages and full tracebacks via exc_info=True.
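
For illustration only, the pattern described above might look roughly like the sketch below. This is not the actual diff: `caption_or_empty` and the `captioner` argument are hypothetical names, and only `logger.warning(..., exc_info=True)` reflects what this PR changes.

```python
import logging

logger = logging.getLogger("caption_sketch")  # hypothetical logger name


def caption_or_empty(captioner, image_bytes):
    """Return captioner(image_bytes), or "" if it raises.

    The failure is logged at WARNING (not DEBUG) so it shows up under a
    default production log level, and exc_info=True attaches the traceback.
    """
    try:
        return captioner(image_bytes)
    except Exception as exc:
        logger.warning(
            "Image caption failed; image data will be discarded. Error: %s",
            exc,
            exc_info=True,
        )
        return ""
```

With logger.debug the same failure would be invisible unless the log level were lowered to DEBUG, which is the visibility gap described above.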

Fixes #590

🗂️ Type of Change

🐛 Bug fix
✨ New feature

🧪 How was this tested?

Ran the frontend locally (npm run dev inside frontend/)

📸 Screenshots (if UI change)

N/A

⚠️ Anything to flag for reviewers?

No flags.

✅ Self-Review Checklist

My branch is based on dev, not main
I have not added any secrets / API keys
I have not modified main branch or any HuggingFace deployment config
My code follows the existing style (no unnecessary formatting changes)
I have updated relevant docs / comments if needed

param20h · 2026-06-15T10:17:34Z

I'll retrieve the logs for job 81396718755 to identify the failure and provide a solution.

The primary failure is a Python syntax error in the code. Let me retrieve the problematic file to see the exact issue:

Solution for Job 81396718755

The job is failing due to an indentation error in backend/app/rag/vision.py at line 185.

Root Cause

Line 185 has incorrect indentation. The except statement uses 3 spaces instead of 4, causing Python to raise an IndentationError: unindent does not match any outer indentation level.

The Problem

183|         return response.choices[0].message.content.strip()
184| 
185|    except Exception as exc:  # ❌ Only 3 spaces (should be 4)
186|         logger.warning(

The Fix

Change line 185 to use 4 spaces for indentation:

183|         return response.choices[0].message.content.strip()
184| 
185|     except Exception as exc:  # ✅ 4 spaces
186|         logger.warning(
187|             "OpenAI vision caption failed — falling back to OCR/placeholder. "
188|             "This may be a transient API error (rate-limit, timeout). Error: %s",
189|             exc,
190|             exc_info=True,
191|         )
192|         return ""
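
To check this kind of failure locally without running the whole job, a small sketch like the following may help. It compiles a reduced stand-in for the snippet above (not the real vision.py), and `indentation_problem` is a hypothetical helper name.

```python
# Reduced stand-in for the failing code: the `except` line is indented
# with 3 spaces, which matches neither the `try` (4) nor the body (8).
BROKEN = (
    "def f():\n"
    "    try:\n"
    "        return 1\n"
    "   except Exception:\n"
    "        return 0\n"
)

# Same source with the `except` line moved to 4 spaces.
FIXED = BROKEN.replace("\n   except", "\n    except")


def indentation_problem(source):
    """Return the IndentationError raised when compiling source, or None."""
    try:
        compile(source, "<sketch>", "exec")
    except IndentationError as err:
        return err
    return None


if __name__ == "__main__":
    print(indentation_problem(BROKEN))
    print(indentation_problem(FIXED))
```

Running `python -m py_compile backend/app/rag/vision.py` performs the same check against the real file.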

Additionally, there's duplicate code in the file (lines 234-254). The caption_image() function has redundant logic that should be removed. Here's the corrected version:

def caption_image(
    image_bytes: "bytes | List[bytes]",
    page: "int | List[int] | None" = None,
) -> "str | List[str]":
    """Generate a caption for a single image or a batch of images.

    Resolution order:
      1. Configured VLM provider (set VISION_PROVIDER in .env)
      2. Local OCR via pytesseract
      3. Placeholder string with page number and dimensions
    """
    if isinstance(image_bytes, list):
        pages = (
            page if isinstance(page, list)
            else ([page] * len(image_bytes) if page is not None else [None] * len(image_bytes))
        )
        return [caption_image(img, pg) for img, pg in zip(image_bytes, pages)]

    # Strategy: try the configured VLM provider
    provider = get_vision_provider(getattr(settings, "VISION_PROVIDER", None))
    if provider is not None:
        result = provider.caption(image_bytes)
        if result:
            return result

    # Fallback 1: local OCR
    ocr = _ocr_caption(image_bytes)
    if ocr:
        return ocr

    # Fallback 2: placeholder
    try:
        pix = fitz.Pixmap(image_bytes)
        dims = f"{pix.width}x{pix.height} px"
    except Exception:
        dims = "unknown size"

    return f"Figure on page {page} ({dims})." if page is not None else f"Figure ({dims})."

Remove lines 235-254 as they duplicate the logic already present above.
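
The resolution order in the docstring boils down to "try each strategy in turn and return the first non-empty result". A minimal sketch of that idea, assuming each strategy is a plain callable that returns "" on failure (`first_caption` and the `Captioner` alias are hypothetical and not part of vision.py):

```python
from typing import Callable, Iterable, Optional

# Hypothetical alias: a strategy takes image bytes and returns a caption,
# or an empty string when it cannot produce one.
Captioner = Callable[[bytes], str]


def first_caption(
    image_bytes: bytes,
    strategies: Iterable[Captioner],
    page: Optional[int] = None,
) -> str:
    """Return the first non-empty caption, else a placeholder string."""
    for strategy in strategies:
        result = strategy(image_bytes)
        if result:
            return result
    # No strategy produced text: fall back to a placeholder.
    return f"Figure on page {page}." if page is not None else "Figure."
```

For example, `first_caption(data, [vlm, ocr], page=3)` would try `vlm` first, then `ocr`, and only then return the placeholder, where `vlm` and `ocr` stand in for whatever provider and OCR callables are configured.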

Nancy-3012 · 2026-06-15T12:11:24Z

@param20h please merge this branch

Nancy-3012 · 2026-06-16T04:25:06Z

@param20h please merge this branch

Nancy-3012 · 2026-06-21T10:42:01Z

@param20h please merge this branch

param20h · 2026-06-22T04:27:09Z

> @param20h please merge this branch

Running the checks.

param20h · 2026-06-22T04:30:26Z

Backend issues still

Nancy-3012 added 2 commits June 15, 2026 08:54

Abstract VLM providers into an extensible Strategy Pattern

ffc80e2

fix: improve log visibility when image_bytes discarded on caption failure (param20h#590)

bf4e059

Nancy-3012 requested a review from param20h as a code owner June 15, 2026 04:24

param20h previously approved these changes Jun 15, 2026

fix: resolve indentation error and remove dead code in vision.py

55b923f

Nancy-3012 dismissed param20h’s stale review via 55b923f June 15, 2026 11:21

param20h previously approved these changes Jun 15, 2026

Merge branch 'dev' into fix/caption-failure-logging-visibility

fbccd93

Nancy-3012 dismissed param20h’s stale review via fbccd93 June 22, 2026 04:20

param20h previously approved these changes Jun 22, 2026

Nancy-3012 added 2 commits June 22, 2026 10:17

fix: remove broken app.vision registry import in rag/vision.py

9fc472a

merge: resolve vision.py conflict, keep _openai_caption call

5028dc2

Nancy-3012 dismissed param20h’s stale review via 5028dc2 June 22, 2026 06:15

Nancy-3012 wants to merge 6 commits into param20h:dev from Nancy-3012:fix/caption-failure-logging-visibility

2 participants
