I noticed by chance that, in every data format provided for this particular dataset, the lines seem to end with trailing spaces. The original source files, from https://www.elrc-share.eu/repository/browse/covid-19-health-wikipedia-dataset-bilingual-en-zh/c6236d148de811ea913100155d026706c2a9a16f8fc74d0487006e8379d322a0/, don't seem to have this issue.
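For anyone who wants to reproduce the trailing-space check, here is a minimal sketch in Python. The function name `count_trailing_whitespace` and the command-line usage are my own illustration, not part of any dataset tooling, and it assumes the files are UTF-8 plain text.

```python
import sys


def count_trailing_whitespace(lines):
    """Return (flagged, total): lines ending in a space or tab, and all lines."""
    flagged = 0
    total = 0
    for line in lines:
        total += 1
        # Drop the line terminator first so only real trailing blanks count.
        body = line.rstrip("\r\n")
        if body != body.rstrip(" \t"):
            flagged += 1
    return flagged, total


if __name__ == "__main__":
    # Usage (hypothetical file names): python check_spaces.py file.en file.zh
    for path in sys.argv[1:]:
        # newline="" keeps the original line endings untouched.
        with open(path, encoding="utf-8", newline="") as handle:
            flagged, total = count_trailing_whitespace(handle)
        print(f"{path}: {flagged} of {total} lines end with whitespace")
```

Running it on both the downloaded files and the original ELRC files should show whether the spaces were introduced during import.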
Also, these might be duplicates. The samples are different, but the en-zh TMX is exactly the same except for the creation header:
I haven't checked the other datasets imported from ELRC, but another en-zh dataset didn't seem to have this issue.
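To check the possible duplicates without being thrown off by the creation header, the translation units of two TMX files can be compared directly. This is only a sketch: `tmx_units` and `same_body` are names I made up, and it assumes the usual TMX layout (`body` > `tu` > `tuv` > `seg`) with no default XML namespace on the elements.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_units(source):
    """Return the translation units of a TMX file as a list of tuples.

    Each unit is a tuple of (language, segment text) pairs. The <header>
    element, including its creation attributes, is ignored entirely.
    """
    root = ET.parse(source).getroot()
    body = root.find("body")
    units = []
    if body is None:
        return units
    for tu in body.iter("tu"):
        unit = []
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            text = "".join(seg.itertext()) if seg is not None else ""
            # Older TMX files may use a plain "lang" attribute instead.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            unit.append((lang, text))
        units.append(tuple(unit))
    return units


def same_body(first, second):
    """True if both TMX files contain the same units in the same order."""
    return tmx_units(first) == tmx_units(second)
```

Because the segment text is compared verbatim, trailing spaces inside `seg` elements still count as differences, which is useful for the first issue as well.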