Machine Translation:
Convert from Spanish to English.
Corpus BLEU score: 27.06437722023294 (based on the model trained for 13 epochs).
The gist is clear, but the translations have significant grammatical errors.
For details, see the table that explains what each score range means.
Columns:
- source: Spanish (source) sentences
- translation_reference: English (target) reference sentences
- translation_hypothesis: English translation by NMT model
- sentence_bleu_score: Sentence BLEU score
- Gaussian kernel density estimate plot using Seaborn's distplot.
- The distribution shows that a significant number of translated sentences have a very poor sentence BLEU score (close to 0).
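
To illustrate how a sentence-level score can end up at exactly 0, here is a minimal pure-Python sketch of unsmoothed sentence BLEU. It is an assumption for illustration, not the scoring code used to produce the column above: the function names `ngrams` and `sentence_bleu` and the example sentences are hypothetical. In unsmoothed BLEU, a hypothesis that shares no 4-gram with its reference scores 0 no matter how many shorter n-grams match.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Unsmoothed BLEU for one hypothesis against one reference, scaled to [0, 100]."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # one zero precision collapses the geometric mean
        log_precisions.append(math.log(matches / total))
    # The brevity penalty punishes hypotheses shorter than the reference.
    if len(hypothesis) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat".split()
exact = sentence_bleu(reference, "the cat sat on the mat".split())
near_miss = sentence_bleu(reference, "the cat is on the mat".split())
print(exact, near_miss)
```

The second hypothesis differs from the reference by a single word, yet it has no matching 4-gram, so its unsmoothed score is 0. Smoothed variants of BLEU exist to soften this effect on short sentences.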
The assignment code had an error in the following function:
utils.py # read_corpus()
Many sentences in test.en have consecutive multiple space characters.
line.strip().split(' ') therefore produces empty strings in the split output,
whereas split with its default sep parameter (i.e. None) treats runs of whitespace as one separator and discards empty strings from the output.
Fixing this increased the BLEU score by approximately 4.6.
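
The difference between the two calls can be seen on a single line. This is a minimal sketch; the example sentence is made up and is not taken from test.en.

```python
# A made-up line with repeated spaces, similar in shape to the problem described above.
line = "the  cat sat   on the mat \n"

# Explicit single-space separator: every extra space yields an empty-string token.
with_sep = line.strip().split(' ')

# Default separator (None): runs of whitespace count as one separator.
default_sep = line.strip().split()

print(with_sep)     # ['the', '', 'cat', 'sat', '', '', 'on', 'the', 'mat']
print(default_sep)  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

The empty strings are counted as tokens in the reference sentences, so they take part in n-gram matching and length computation when BLEU is calculated.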
- Corpus BLEU score: 34.55419013672562
- Google Colab notebook
- Notebook uses Marian NMT's HuggingFace model
- Probability density distribution of sentence BLEU scores:

- Observation: Unlike the implementation in this repository, MarianNMT's model produces very few translations with a sentence BLEU score close to 0.
- Translation Output: csv file
- Website: https://marian-nmt.github.io/
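
For readers who want to reproduce a comparison like the one above, the following is a minimal sketch of translating Spanish to English with a MarianMT model through the HuggingFace `transformers` library. The checkpoint name `Helsinki-NLP/opus-mt-es-en` is an assumption, since the notebook's exact checkpoint is not stated here, and the helper name `translate_es_to_en` is hypothetical. Running it requires `transformers`, `sentencepiece`, and PyTorch, and it downloads the model on first use.

```python
def translate_es_to_en(sentences, model_name="Helsinki-NLP/opus-mt-es-en"):
    """Translate a list of Spanish sentences to English with a MarianMT model.

    The default checkpoint name is an assumption, not confirmed by the notebook.
    """
    # Imported lazily so that defining the function does not require transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Tokenize the sentences as one padded batch of PyTorch tensors.
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)

    # Generate translations, then decode token ids back into text.
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

A call such as `translate_es_to_en(["¿Dónde está la biblioteca?"])` would return a list with one English string per input sentence.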
This assignment is heavily inspired by the https://github.com/pcyin/pytorch_nmt repository.
