
[Bug] - Document extraction fails with Ollama models for both PDF and TXT files for Synthetic Data Generation #928

@casualcomputer


Describe the bug
I cannot extract documents using Ollama models. I've tried both PDF files and TXT files, and both fail to extract with the error "2 documents failed to extract. Retry or delete documents."


To Reproduce
Steps to reproduce the behavior:

  1. Go to 'Synthetic Data Generation'
  2. Add two documents (doc1.pdf and doc2.pdf on the first try; doc1.txt and doc2.txt on the second)
  3. Click 'Retry Extraction'
  4. Select extractor: 'Qwen 3 VL Thinking 8B (Vision-Language) (Ollama) - Markdown'
  5. Click 'Run Extraction'
  6. See error: "2 documents failed to extract"
  7. Convert the PDFs to TXT files and upload those instead
  8. Change the extractor to 'Llama 3.2 11B (Vision) (Ollama) - Text'
  9. Run the extraction again; it still fails with the same error

Expected behavior
Documents should extract successfully and show "Extracted" status, allowing me to proceed to Q&A generation.

Screenshots
The UI shows "2 documents failed to extract. Retry or delete documents." The TXT files show as "Extracted" in the list, but the error message persists.

System Information:

  • OS: Win11
  • Browser: Comet
  • Kiln app Version: 0.23.0

Additional context
This appears to be related to issue #814 "[Bug] - RAG Feature not working when using Ollama". I've tried both PDF and TXT file formats with different Ollama vision models (Qwen 3 VL Thinking 8B and Llama 3.2 11B Vision), but extraction consistently fails.
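One basic cause worth ruling out is the Ollama server not running, or a model not being pulled under the name Kiln expects. The sketch below is a minimal check under stated assumptions: it assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint (which lists locally pulled models), and the two model tags in `wanted` are guesses that should be replaced with the names `ollama list` actually reports. It does not exercise Kiln's extraction pipeline itself.

```python
import json
import urllib.request

# Assumption: Ollama is running on its default local address.
OLLAMA_URL = "http://localhost:11434"


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """Query Ollama's /api/tags endpoint, which lists locally pulled models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def missing_models(tags: dict, wanted: list[str]) -> list[str]:
    """Return the entries of `wanted` that are not among the pulled model names."""
    installed = {m.get("name", "") for m in tags.get("models", [])}
    # Ollama stores untagged names as "<name>:latest", so accept either form.
    return [w for w in wanted if w not in installed and f"{w}:latest" not in installed]


if __name__ == "__main__":
    # Hypothetical tags; substitute the names shown by `ollama list`.
    wanted = ["qwen3-vl:8b", "llama3.2-vision:11b"]
    try:
        tags = fetch_tags()
    except OSError as exc:  # URLError and timeouts are both OSError subclasses
        print(f"Ollama server not reachable at {OLLAMA_URL}: {exc}")
    else:
        missing = missing_models(tags, wanted)
        print("missing models:", ", ".join(missing) if missing else "none")
```

If the server responds and no models are reported missing, the failure is more likely inside Kiln's extraction step than in the local Ollama setup.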


Labels: bug (Something isn't working)
