replacing new line characters #10

@ptear

Hi Maarten,

I was wondering why \n characters are replaced differently for the UN dataset than for the Trump dataset: https://github.com/MaartenGr/BERTopic_evaluation/blob/main/evaluation/data.py#L227.

I guess it has to do with the UN documents being longer, since they come from debates rather than short-form tweets. But what benefit does marking new paragraphs with \p have over just using a space?
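
For concreteness, the two strategies being compared can be sketched as below. This is only a minimal illustration, not the code in `data.py`: the function names are hypothetical, and it assumes the marker is the literal two-character string `\p`.

```python
# Hypothetical helpers illustrating the two strategies; not taken from data.py.

PARAGRAPH_MARKER = "\\p"  # assumption: a literal backslash followed by "p"


def newlines_to_space(text: str) -> str:
    """Flatten the document: every newline becomes a single space."""
    return text.replace("\n", " ")


def newlines_to_marker(text: str) -> str:
    """Keep paragraph boundaries: every newline becomes the marker."""
    return text.replace("\n", PARAGRAPH_MARKER)


doc = "First paragraph.\nSecond paragraph."

flat = newlines_to_space(doc)
marked = newlines_to_marker(doc)

# With the marker, the original paragraphs can still be recovered later,
# which is not possible once newlines have been turned into plain spaces.
paragraphs = marked.split(PARAGRAPH_MARKER)
```

One visible difference is that the marked version can still be split back into paragraphs afterwards, whereas the space version cannot. Whether that is the actual motivation here is exactly what I am asking.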

Thanks for your efforts on BERTopic.
