Merged
Changes from 2 commits
CHANGELOG.md: 3 changes (2 additions, 1 deletion)
@@ -1,10 +1,11 @@
-## 0.17.6-dev0
+## 0.17.6-dev1

### Enhancements

### Features

### Fixes
+- **Do not use NLP to determine element types for extracted elements with hi_res.** This avoids extraneous Title elements in hi_res outputs.

## 0.17.5

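For context on this fix: re-typing text from its content alone can promote short, capitalized fragments such as running headers or figure labels to titles. The sketch below illustrates that failure mode with a deliberately simplified heuristic. `Element`, `looks_like_title`, and `reclassify` are hypothetical names invented for illustration; they are not unstructured's actual implementation or API.

```python
from dataclasses import dataclass


@dataclass
class Element:
    # Hypothetical stand-in for a partitioned element: a category label plus text.
    category: str
    text: str


def looks_like_title(text: str) -> bool:
    """Hypothetical, simplified title heuristic (NOT unstructured's real logic)."""
    stripped = text.strip()
    words = stripped.split()
    if not words or len(words) > 12:
        return False
    # Sentence-final punctuation suggests body text rather than a heading.
    if stripped[-1] in ".!?":
        return False
    # Call it a title if at least half of the words start with an uppercase letter.
    capitalized = sum(1 for w in words if w[0].isupper())
    return capitalized / len(words) >= 0.5


def reclassify(elements: list[Element]) -> list[Element]:
    """Re-type UncategorizedText elements based only on their text."""
    out = []
    for el in elements:
        if el.category == "UncategorizedText" and looks_like_title(el.text):
            out.append(Element("Title", el.text))
        else:
            out.append(el)
    return out


# Text found outside detected layout blocks: a running header, a figure label, body text.
extracted = [
    Element("UncategorizedText", "Annual Report 2023"),
    Element("UncategorizedText", "Figure 3"),
    Element("UncategorizedText", "Revenue grew in every region."),
]
print([el.category for el in reclassify(extracted)])
# ['Title', 'Title', 'UncategorizedText']
```

Neither the header nor the figure label is a real section title, yet both pass the text-only check. That is the kind of extraneous Title the changelog entry describes.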
unstructured/__version__.py: 2 changes (1 addition, 1 deletion)
@@ -1 +1 @@
-__version__ = "0.17.6-dev0" # pragma: no cover
+__version__ = "0.17.6-dev1" # pragma: no cover
unstructured/partition/pdf.py: 5 changes (4 additions, 1 deletion)
@@ -362,7 +362,10 @@ def partition_pdf_or_image(
table_ocr_agent=table_ocr_agent,
**kwargs,
)
-out_elements = _process_uncategorized_text_elements(elements)
+# NOTE(crag): do not call _process_uncategorized_text_elements here, because
+# extracted elements (which are text blocks outside of OD-determined blocks)
+# are likely not Titles and should not be identified as such.
+return elements

elif strategy == PartitionStrategy.FAST:
out_elements = _partition_pdf_with_pdfparser(
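The `pdf.py` change removes one post-processing pass from the hi_res branch. Previously the partitioned `elements` were passed through `_process_uncategorized_text_elements`; now `elements` is returned directly. The sketch below models that before/after difference. `hi_res_old`, `hi_res_new`, `count_titles`, and the toy `reclassify` rule (promote any UncategorizedText of four words or fewer to Title) are all hypothetical. The sketch mirrors only the shape of the diff, not the library's real classification logic.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Element:
    # Hypothetical element type: category label and text only.
    category: str
    text: str


def reclassify(elements: list[Element]) -> list[Element]:
    """Hypothetical stand-in for the removed pass: any short
    UncategorizedText element is promoted to Title."""
    return [
        replace(el, category="Title")
        if el.category == "UncategorizedText" and len(el.text.split()) <= 4
        else el
        for el in elements
    ]


def hi_res_old(elements: list[Element]) -> list[Element]:
    # Before the change: UncategorizedText elements were re-typed from their text.
    return reclassify(elements)


def hi_res_new(elements: list[Element]) -> list[Element]:
    # After the change: elements are returned as produced, so text extracted
    # outside detected blocks keeps its UncategorizedText category.
    return elements


def count_titles(elements: list[Element]) -> int:
    return sum(el.category == "Title" for el in elements)


detected = [
    Element("Title", "Introduction"),
    Element("NarrativeText", "This report covers the fiscal year."),
    Element("UncategorizedText", "Confidential draft"),
]
print(count_titles(hi_res_old(detected)), count_titles(hi_res_new(detected)))
# prints: 2 1
```

In this toy model the old path reports two titles and the new path one, because "Confidential draft" keeps its UncategorizedText category instead of being promoted.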