PDF Processing

This folder contains the PDF workflow used by CV optimization: a native PDF path, plus a text fallback that runs only when the extracted text passes validation.

Main entry points

Data flow

flowchart TD
    A["Sanitized PDF upload"] --> B["Native provider PDF attempt"]
    B -->|"success"| C["Optimized HTML"]
    B -->|"explicit files/PDF failure"| D["Page-by-page text extraction"]
    D --> E["Quality heuristics"]
    E -->|"pass"| F["Same provider/model with extracted text"]
    E -->|"fail"| G["Explicit user-facing extraction error"]
    F --> C

Important behavior

  • Native PDF input is always attempted first.
  • Text fallback is allowed only for explicit file/PDF capability, file-credits, or file-endpoint failures.
  • Generic provider failures do not trigger fallback.
  • If extraction quality is suspicious, the workflow fails with an explicit no-OCR message instead of sending incomplete text to the LLM.
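The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code in this folder: every name here (`FailureKind`, `ProviderError`, `ExtractionQualityError`, `looks_complete`, `optimize_cv`) is hypothetical, the provider and extractor are passed in as plain callables, and the per-page character threshold is an assumed stand-in for whatever quality heuristics the real workflow applies.

```python
from enum import Enum, auto
from typing import Callable, List


class FailureKind(Enum):
    """Hypothetical classification of provider failures."""
    FILE_CAPABILITY = auto()  # provider/model cannot accept files or PDFs
    FILE_CREDITS = auto()     # file-specific credits exhausted
    FILE_ENDPOINT = auto()    # file upload endpoint failed
    GENERIC = auto()          # anything else (rate limit, timeout, ...)


# Only explicit file/PDF failures may trigger the text fallback.
FALLBACK_KINDS = {
    FailureKind.FILE_CAPABILITY,
    FailureKind.FILE_CREDITS,
    FailureKind.FILE_ENDPOINT,
}


class ProviderError(Exception):
    def __init__(self, kind: FailureKind, message: str = "") -> None:
        super().__init__(message)
        self.kind = kind


class ExtractionQualityError(Exception):
    """Raised instead of sending incomplete text to the LLM."""


def looks_complete(pages: List[str], min_chars_per_page: int = 200) -> bool:
    # Assumed heuristic: at least one page, and every page carries some text.
    return bool(pages) and all(
        len(page.strip()) >= min_chars_per_page for page in pages
    )


def optimize_cv(
    pdf_bytes: bytes,
    native_call: Callable[[bytes], str],
    text_call: Callable[[str], str],
    extract_pages: Callable[[bytes], List[str]],
) -> str:
    # 1. Native PDF input is always attempted first.
    try:
        return native_call(pdf_bytes)
    except ProviderError as err:
        # 2. Generic provider failures do not trigger fallback.
        if err.kind not in FALLBACK_KINDS:
            raise

    # 3. Page-by-page extraction, then quality heuristics.
    pages = extract_pages(pdf_bytes)
    if not looks_complete(pages):
        raise ExtractionQualityError(
            "Could not extract reliable text from this PDF; OCR is not supported."
        )

    # 4. Same provider/model, this time with the extracted text.
    return text_call("\n\n".join(pages))
```

Passing the provider calls and the extractor as arguments keeps the decision logic testable without a network connection or a real PDF library. A caller would catch `ExtractionQualityError` and show its message to the user.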