This Python script extracts the text content of a PDF file, cleans it up (removing unwanted strings and inserting blank lines before lines that start with "Frage"), and saves the result to a text file.
- Python 3.x
- pdfplumber library (`pip install pdfplumber`)
- Set the `path_to_pdf` variable to the path of your input PDF file.
- Set the `original_txt` variable to the desired output text file name.
- (Optional) Add strings to the `strings_to_delete` list if you want to remove specific content from the text.
- Run the script.
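As an illustration, the three settings might look like this at the top of the script. The file names and the strings in `strings_to_delete` are hypothetical placeholders, not values taken from the script itself.

```python
# Hypothetical example values; replace them with your own.
path_to_pdf = "documents/input.pdf"  # PDF to read
original_txt = "output.txt"          # text file the script writes and then rewrites in place
strings_to_delete = [                # optional: every occurrence of these is removed
    "CONFIDENTIAL",
    "Draft version",
]
```

Leave `strings_to_delete` as an empty list (`[]`) if you only want the "Frage" spacing.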
- 📂 The script opens the specified PDF file using pdfplumber.
- 📝 It extracts text from each page of the PDF and writes it to the `original_txt` file.
- 🔄 The script then reads the `original_txt` file and processes each line by:
  - 🗑️ Removing specified strings (if any)
  - ➕ Adding two newline characters before lines starting with "Frage"
- 💾 The processed text is written to a temporary file (`temp.txt`).
- 🔁 Finally, the temporary file replaces the original text file.
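The steps above can be sketched roughly as follows. This is a minimal reconstruction from the description, not the script's actual source. The function names (`extract_text`, `process_line`, `rewrite_file`, `main`) are hypothetical. The sketch also assumes that deletion happens before the "Frage" check (the order listed above) and that `temp.txt` is created next to the output file so that `os.replace` can swap it in.

```python
import os
from typing import Iterable


def extract_text(pdf_path: str, txt_path: str) -> None:
    """Write the text of every page in pdf_path to txt_path."""
    import pdfplumber  # imported here so the line-processing helpers work without it

    with pdfplumber.open(pdf_path) as pdf, open(txt_path, "w", encoding="utf-8") as out:
        for page in pdf.pages:
            text = page.extract_text()  # may be None/empty for image-only pages
            if text:
                out.write(text + "\n")


def process_line(line: str, to_delete: Iterable[str]) -> str:
    """Remove unwanted strings, then put two newlines before lines starting with 'Frage'."""
    for s in to_delete:
        line = line.replace(s, "")
    if line.startswith("Frage"):
        line = "\n\n" + line
    return line


def rewrite_file(txt_path: str, to_delete: Iterable[str]) -> None:
    """Process txt_path line by line via temp.txt, then replace the original file."""
    to_delete = list(to_delete)  # allow any iterable to be reused for every line
    temp_path = os.path.join(os.path.dirname(os.path.abspath(txt_path)), "temp.txt")
    with open(txt_path, encoding="utf-8") as src, open(temp_path, "w", encoding="utf-8") as tmp:
        for line in src:
            tmp.write(process_line(line, to_delete))
    os.replace(temp_path, txt_path)  # temp.txt takes the original file's place


def main(path_to_pdf: str, original_txt: str, strings_to_delete: Iterable[str]) -> None:
    extract_text(path_to_pdf, original_txt)
    rewrite_file(original_txt, strings_to_delete)
```

Calling `main(path_to_pdf, original_txt, strings_to_delete)` runs both stages. Writing to a temporary file and then replacing the original avoids reading and writing the same file at once.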
Happy PDF processing! 📚✨