Create a dataset from the PDF content

As part of the corpus creation process, the content of each PDF should be converted to text and aggregated into a single large dataset.

This dataset should be stored in the `data/papers/processed` folder, and the script that creates it should be saved as `src/papers/data/make_dataset.py`.
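
As a starting point, `make_dataset.py` could look roughly like the sketch below. It is only one possible approach and rests on several assumptions that are not part of this task description: text extraction uses the third-party `pypdf` package, the dataset is written as a JSON Lines file with one record per PDF, and the function names `extract_text_pypdf` and `make_dataset`, the record keys `filename` and `text`, and the output file name are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def extract_text_pypdf(pdf_path: Path) -> str:
    """Extract the text of one PDF with pypdf (an assumed dependency)."""
    # Imported lazily so the aggregation logic can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # Fall back to "" for pages where no text could be extracted.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def make_dataset(
    pdf_dir: Path,
    out_file: Path,
    extract: Callable[[Path], str] = extract_text_pypdf,
) -> int:
    """Write one JSON record per PDF to out_file and return the record count."""
    out_file.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out_file.open("w", encoding="utf-8") as fh:
        # Sort the paths so the output order is reproducible.
        for pdf_path in sorted(pdf_dir.glob("*.pdf")):
            record = {"filename": pdf_path.name, "text": extract(pdf_path)}
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

The script's entry point could then call, for example, `make_dataset(Path("data/papers/raw"), Path("data/papers/processed/papers.jsonl"))`, where the input folder `data/papers/raw` is also an assumption. Passing the extractor as a parameter keeps the aggregation step independent of the PDF library, so a different extraction tool can be swapped in later.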