Call this method after Doctor has extracted text from the binary content. It cleans up extraction artifacts, formatting issues, and unwanted text in the extracted output. It is typically needed when extraction introduces stray characters, preserves headers or footers from the original document, or includes metadata that should not appear in the final plain text.
This will help us solve issue #6443 from CL.
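
To make the intended behavior concrete, here is a minimal sketch of what such a post-extraction cleanup step could look like. The function name `clean_extracted_text`, the `min_repeats` parameter, and all of the heuristics (treating lines repeated several times as running headers or footers, dropping bare page-number lines) are assumptions for illustration only. They are not Doctor's actual API or the implementation in this change.

```python
import re
from collections import Counter


def clean_extracted_text(text: str, min_repeats: int = 3) -> str:
    """Hypothetical cleanup pass run on text after extraction.

    Assumed heuristics, not taken from Doctor itself:
    - form feeds mark page breaks and become newlines
    - other control characters are extraction debris
    - a non-empty line seen `min_repeats` or more times is a header/footer
    - a line that is only a page number ("3", "Page 3", "Page 3 of 9") is dropped
    """
    # Treat form feeds as page breaks, then strip remaining control characters
    # (newline and tab are kept).
    text = text.replace("\f", "\n")
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)

    lines = [line.strip() for line in text.splitlines()]

    # Find lines that repeat often enough to look like running headers/footers.
    counts = Counter(line for line in lines if line)
    repeated = {line for line, n in counts.items() if n >= min_repeats}

    page_number = re.compile(r"(Page\s+)?\d+(\s+of\s+\d+)?", re.IGNORECASE)

    kept = []
    for line in lines:
        if line in repeated:
            continue
        if page_number.fullmatch(line):
            continue
        kept.append(line)

    # Collapse runs of three or more newlines into a single blank line.
    cleaned = re.sub(r"\n{3,}", "\n\n", "\n".join(kept))
    return cleaned.strip()
```

A caller would pass the raw extracted string and store the return value as the final plain text. The repeat threshold is a trade-off: a low value removes more header and footer noise but risks dropping body lines that legitimately recur, so it would need tuning against real documents.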