
[Question]: Cannot process scanned PDF in Knowledge Base - "Skipped empty document" error #431

@Colada-K

Description

Do you need to ask a question?

  • I have searched the existing questions and discussions and this question is not already answered.
  • I believe this is a legitimate question, not just a bug or feature request.

Your Question

Problem

When uploading a scanned PDF (image-based, no text layer) to the Knowledge Base,
the initialization fails with the following error:

Skipped empty document: xxx.pdf
No valid documents found
RAG pipeline returned failure
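
The three log lines are consistent with a loader that extracts a PDF's embedded text layer and drops any document whose text comes back empty. A scanned PDF has no such layer, so nothing survives and the pipeline has no input. The sketch below reproduces that behavior. It is an assumption, not DeepTutor's actual code: `Document` and `filter_documents` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical container: `text` holds whatever the PDF's text layer yielded.
    # For a scanned, image-only PDF this is typically an empty string.
    name: str
    text: str


def filter_documents(docs: list[Document]) -> list[Document]:
    """Hypothetical loader step mirroring the logged messages (not DeepTutor's code)."""
    valid = []
    for doc in docs:
        if not doc.text.strip():
            # Nothing to chunk or embed, so the document is dropped.
            print(f"Skipped empty document: {doc.name}")
            continue
        valid.append(doc)
    if not valid:
        # Every document was skipped, so indexing has nothing to work on.
        raise ValueError("No valid documents found")
    return valid


if __name__ == "__main__":
    try:
        filter_documents([Document("xxx.pdf", "")])
    except ValueError as err:
        print(err)  # No valid documents found
```

Under this assumption, the fix is to produce text for image-only pages (OCR) before this filter runs, not after.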

Steps to reproduce

  1. Upload a scanned PDF to Knowledge Base
  2. Initialization starts but fails immediately

My setup

  • DeepTutor version: v1.3.3
  • OS: Windows
  • LLM: DeepSeek
  • Embedding: Jina

Question

I installed magic-pdf hoping it would enable OCR for scanned PDFs,
but the error persists.
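
One thing worth ruling out is that magic-pdf was installed into a different Python environment than the one running DeepTutor. This is easy to do on Windows, where several interpreters often coexist. The sketch uses only the standard library. It assumes magic-pdf's import name is `magic_pdf`, and `module_available` is a hypothetical helper.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by the *current* interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Run this with the same python.exe that launches DeepTutor.
    print("interpreter:", sys.executable)
    # Assumption: the magic-pdf package imports as `magic_pdf`.
    print("magic_pdf importable:", module_available("magic_pdf"))
```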

Is there a way to enable MinerU/magic-pdf for scanned PDF processing
in the current v1.x architecture? If so, how should it be configured?
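
There is another plausible reason the error persists after installing magic-pdf, though it has not been verified against DeepTutor v1.3.3. Installing a package does not by itself change which parser the pipeline calls, so an OCR path only helps if extraction is wired to use it. The sketch below shows that routing with a pluggable backend. `extract_text` and `OcrBackend` are hypothetical, and no real MinerU/magic-pdf API is shown or implied.

```python
from typing import Callable, Optional

# Hypothetical OCR backend signature: takes a file path, returns recognized text.
# A MinerU/magic-pdf integration would need to be wrapped behind something like this.
OcrBackend = Callable[[str], str]


def extract_text(path: str, text_layer: str, ocr: Optional[OcrBackend] = None) -> str:
    """Prefer the embedded text layer; fall back to OCR only when it is empty."""
    if text_layer.strip():
        return text_layer
    if ocr is None:
        # No OCR configured: the text stays empty and the document would be skipped.
        return ""
    return ocr(path)


if __name__ == "__main__":
    fake_ocr = lambda path: f"recognized text from {path}"  # stand-in for a real OCR call
    print(repr(extract_text("scan.pdf", "")))          # ''
    print(extract_text("scan.pdf", "", ocr=fake_ocr))  # recognized text from scan.pdf
```

Keeping the text layer as the first choice avoids running slow OCR on PDFs that already contain text.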

Thank you!

Related Module

Knowledge Base Management

Additional Context

No response

Metadata

Labels: question (Further information is requested)