Hi Maarten,
I was wondering why \n characters are replaced differently for the UN dataset than for the Trump dataset: https://github.com/MaartenGr/BERTopic_evaluation/blob/main/evaluation/data.py#L227
I guess it has something to do with the UN documents being longer, since they come from debates rather than short-form tweets. But what benefit does marking new paragraphs with \p have over simply replacing them with a space?
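To make the question concrete, here is a minimal sketch of the two approaches as I understand them. The function names and the sample text are my own and are not taken from data.py, and the exact replacement strings used in the repository may differ.

```python
def flatten_newlines(doc: str) -> str:
    """Replace every newline with a space; paragraph boundaries are lost."""
    return doc.replace("\n", " ")


def mark_paragraphs(doc: str, marker: str = r"\p") -> str:
    """Replace every newline with a marker; paragraph boundaries stay recoverable."""
    return doc.replace("\n", f" {marker} ")


# Hypothetical two-paragraph document, for illustration only.
doc = "First paragraph.\nSecond paragraph."

flat = flatten_newlines(doc)         # one continuous string
marked = mark_paragraphs(doc)        # same text, with a marker between paragraphs
paragraphs = marked.split(r" \p ")   # the marker lets us split the text again later

print(flat)
print(marked)
print(paragraphs)
```

The only benefit I can think of is that the marker lets the text be split back into paragraphs later, whereas a plain space does not. Is that what it is used for?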
Thanks for your efforts on BERTopic.