Feature Description
Include confidence scores for each extracted element to help downstream processing decide which elements to trust.
Use Case
In our enterprise RAG pipeline, we process thousands of PDFs daily. Some elements are extracted with low confidence (rotated tables, scanned handwritten notes). Having confidence scores would let us:
- Filter out low-confidence extractions
- Route uncertain elements to manual review
- Weight chunk importance in retrieval
Current Behavior
All extracted elements are treated equally regardless of extraction quality.
Proposed Enhancement
element.metadata.confidence_score # 0.0 - 1.0
element.metadata.extraction_method # 'ocr', 'native', 'inferred'
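For illustration, here is a rough sketch of how we would consume these fields downstream. The `Element` and `Metadata` classes below are stand-ins rather than the library's real types, `confidence_score` and `extraction_method` are the proposed fields (they do not exist today), and the `triage` and `retrieval_weight` helpers and their thresholds are hypothetical choices on our side.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Metadata:
    # Proposed fields from this request; assumed, not part of any current API.
    confidence_score: Optional[float] = None  # 0.0 - 1.0
    extraction_method: Optional[str] = None   # 'ocr', 'native', 'inferred'


@dataclass
class Element:
    # Minimal stand-in for an extracted element.
    text: str
    metadata: Metadata = field(default_factory=Metadata)


def triage(
    elements: List[Element], keep_at: float = 0.8, review_at: float = 0.5
) -> Tuple[List[Element], List[Element], List[Element]]:
    """Split elements into (kept, needs_review, dropped) by confidence."""
    kept, review, dropped = [], [], []
    for el in elements:
        score = el.metadata.confidence_score
        if score is None:
            # No score available: send to manual review rather than guess.
            review.append(el)
        elif score >= keep_at:
            kept.append(el)
        elif score >= review_at:
            review.append(el)
        else:
            dropped.append(el)
    return kept, review, dropped


def retrieval_weight(el: Element, default: float = 0.5) -> float:
    """Use the confidence score as a chunk weight, with a neutral fallback."""
    score = el.metadata.confidence_score
    return default if score is None else score


elements = [
    Element("Invoice total: $120", Metadata(0.97, "native")),
    Element("Rotated table row", Metadata(0.62, "ocr")),
    Element("Handwritten note", Metadata(0.21, "ocr")),
    Element("Unscored paragraph"),
]
kept, review, dropped = triage(elements)
```

With scores exposed this way, the three use cases above reduce to a threshold check and a multiplier at indexing time, so no changes to the partitioning call itself would be needed on our side.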
This would significantly improve RAG quality for noisy document sources. Thank you!