Neural Machine Translation (NMT) Assignment

Task

Machine translation: translate Spanish sentences into English.

Data

zip file

Result

Corpus BLEU score

27.06437722023294 (based on the model trained for 13 epochs)
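For readers unfamiliar with the metric, the following is a minimal sketch of unsmoothed corpus BLEU with a single reference per sentence, written in plain Python. It is an illustration under stated assumptions, not the scoring code used in this repository (which is not shown here); the function names `ngrams` and `corpus_bleu` are hypothetical. The sketch returns a value in [0, 1], whereas the score above is reported on a 0 to 100 scale.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[List[str]],
                hypotheses: List[List[str]],
                max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU, one reference per hypothesis, in [0, 1]."""
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts, hyp_counts = ngrams(ref, n), ngrams(hyp, n)
            # Counter intersection clips each hypothesis n-gram count
            # by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    # Geometric mean of the modified n-gram precisions (uniform weights).
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = ["the", "cat", "sat", "on", "the", "mat"]
hypothesis = ["the", "cat", "sat", "on", "a", "mat"]
print(corpus_bleu([reference], [hypothesis]))
```

Because the counts are pooled over the whole corpus before the precisions are computed, the corpus score is not the average of the sentence-level scores.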

Interpretation of BLEU score

A score in this range means the gist is clear, but the translations contain significant grammatical errors.

For details, see the table that describes what each score range means.

Output

csv file
Columns:
- source: Spanish (source) sentences
- translation_reference: English (target) reference sentences
- translation_hypothesis: English translation by NMT model
- sentence_bleu_score: Sentence BLEU score

Probability density distribution of sentence BLEU scores

The plot is a Gaussian kernel density estimate drawn with Seaborn's distplot.
The distribution shows that a significant number of translated sentences have a very poor sentence BLEU score (close to 0).
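The share of near-zero sentences can also be read directly from the output csv. The sketch below uses only the standard library and assumes the column names listed under Output; the rows are made up for illustration, and both the helper name `near_zero_fraction` and the threshold of 0.01 are hypothetical choices that should be adjusted to the scale of the scores in the real file.

```python
import csv
import io
from typing import Dict, Iterable

# Made-up rows for illustration only; the real csv file is not reproduced here.
sample_csv = io.StringIO(
    "source,translation_reference,translation_hypothesis,sentence_bleu_score\n"
    "Gracias por venir.,Thank you for coming.,Thank you for coming.,1.0\n"
    "No lo sé.,I do not know.,I know not it.,0.0\n"
    "Vamos a casa.,Let us go home.,We go to the house.,0.004\n"
    "Es muy tarde.,It is very late.,It is very late now.,0.67\n"
)


def near_zero_fraction(rows: Iterable[Dict[str, str]],
                       threshold: float = 0.01) -> float:
    """Fraction of rows whose sentence BLEU score is at or below threshold."""
    scores = [float(row["sentence_bleu_score"]) for row in rows]
    if not scores:
        return 0.0
    return sum(score <= threshold for score in scores) / len(scores)


rows = list(csv.DictReader(sample_csv))
print(near_zero_fraction(rows))
```

To run this on the actual output, replace `sample_csv` with an open handle to the csv file.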

Errata

The assignment code had the following error in the read_corpus() function of utils.py:

Many sentences in test.en contain runs of consecutive space characters. line.strip().split(' ') therefore produces empty strings in its output, whereas split with its default sep parameter (None) treats a run of whitespace as a single separator and returns no empty strings.

Correcting this increased the BLEU score by approximately 4.6.
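The difference between the two calls can be reproduced on a single line of text. The sentence below is a made-up example, not a line from test.en.

```python
line = "the  cat sat   on the mat \n"  # made-up line with repeated spaces

# Explicit single-space separator: every extra space yields an empty token.
with_sep = line.strip().split(' ')

# Default separator (None): runs of whitespace collapse, so no empty tokens.
default_sep = line.strip().split()

print(with_sep)     # ['the', '', 'cat', 'sat', '', '', 'on', 'the', 'mat']
print(default_sep)  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

The empty strings count as tokens, so they inflate the sentence length and break up n-grams that would otherwise match.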

Reference

https://stackoverflow.com/questions/2492415/how-can-i-split-by-1-or-more-occurrences-of-a-delimiter-in-python

Comparison

Marian NMT

Corpus BLEU score: 34.55419013672562
Google Colab notebook
- Notebook uses Marian NMT's HuggingFace model
Probability density distribution of sentence BLEU scores:
Observation: Unlike the implementation in this repository, Marian NMT's model produces very few sentences with a sentence BLEU score close to 0.
Translation Output: csv file
Website: https://marian-nmt.github.io/

Note

This assignment is heavily inspired by the https://github.com/pcyin/pytorch_nmt repository.
