As part of the corpus creation process, the PDF content should be converted to text and aggregated into a single large dataset.
This dataset should be stored in the data/papers/processed folder, and the script that creates it should be saved as src/papers/data/make_dataset.py.
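
The following is a minimal sketch of what make_dataset.py could look like, not a definitive implementation. Several details are assumptions that the text above does not specify: the input folder data/papers/raw, the output file name papers.jsonl, the JSON Lines layout (one record per PDF), and the use of the pypdf library for text extraction. The function names build_dataset and extract_text_with_pypdf are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def extract_text_with_pypdf(pdf_path: Path) -> str:
    """Extract text from one PDF using pypdf (an assumed choice of library)."""
    # Imported lazily so the rest of the module works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() can yield nothing for image-only pages, hence the fallback.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def build_dataset(
    raw_dir: Path,
    out_file: Path,
    extract: Callable[[Path], str] = extract_text_with_pypdf,
) -> int:
    """Convert every PDF in raw_dir to text and write one JSON record per line.

    Returns the number of documents written. PDFs that fail to parse are
    skipped with a message rather than aborting the whole run.
    """
    out_file.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with out_file.open("w", encoding="utf-8") as fh:
        # Sorted so the dataset order is reproducible across runs.
        for pdf_path in sorted(raw_dir.glob("*.pdf")):
            try:
                text = extract(pdf_path)
            except Exception as exc:
                print(f"skipping {pdf_path.name}: {exc}")
                continue
            record = {"source": pdf_path.name, "text": text}
            fh.write(json.dumps(record) + "\n")
            written += 1
    return written
```

Under these assumptions, the script would be invoked as build_dataset(Path("data/papers/raw"), Path("data/papers/processed/papers.jsonl")). Passing the extractor as a parameter keeps the aggregation logic independent of the PDF library, so a different extraction tool can be swapped in without touching the rest of the script.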