logs18:RL test train seq2seq first

Higepon Taro Minowa edited this page May 17, 2018 · 4 revisions
Log Type | Detail
1: What specific output am I working on right now? In logs16, RL seemed to work when starting from scratch. In this experiment, we test whether it can still converge when the model is trained with seq2seq first. That should ensure the most likely reply with the best length appears in the result.
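The setup above (seq2seq pretraining, then RL fine-tuning) is a two-stage schedule. A minimal sketch with stubbed-out update functions; all names here are hypothetical, not from the actual training code:

```python
def mle_step(params):
    """Stub: one seq2seq (maximum-likelihood) update."""
    return params + 1  # placeholder for a real gradient step

def rl_step(params):
    """Stub: one policy-gradient update on sampled replies."""
    return params + 1  # placeholder

def train(pretrain_steps, rl_steps):
    params = 0
    for _ in range(pretrain_steps):  # stage 1: seq2seq warm start
        params = mle_step(params)
    for _ in range(rl_steps):        # stage 2: RL fine-tuning
        params = rl_step(params)
    return params

# step counts match num_train_steps in the hparams recorded in this log
total = train(1248, 1560)
```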
2: Thinking out loud
- hypotheses about the current problem
- what to work on next
- how can I verify
If it converges to len == 2 with a reasonable reply, it's working.
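The convergence check above can be expressed as a toy length-based reward. This is a minimal sketch of the criterion, not the log's actual reward function; `reward_for_reply` and the reward scale are assumptions:

```python
def reward_for_reply(tokens, target_len=2):
    """Toy reward: +1 if the reply length matches the target, else -1.

    `tokens` is the decoded reply without EOS; target_len=2 mirrors the
    len == 2 convergence criterion from this log. (Hypothetical helper.)
    """
    return 1.0 if len(tokens) == target_len else -1.0
```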
3: A record of currently ongoing runs along with a short reminder of what question each run is supposed to answer
4: Results of runs and conclusion: same as logs17 (RL test len equals 2 is the best); it got stuck with all -1 reward.
5: Next steps: see if some kind of random negative reward makes it better.
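One reading of the random-negative-reward idea above: replace the constant -1 with a sampled negative value, so rollouts stop receiving an identical signal. A sketch under that assumption; the function name and the sampling range are not from the log:

```python
import random

def randomized_negative_reward(low=-1.0, high=-0.1, rng=random):
    """Sample a negative reward uniformly instead of returning a constant -1.

    Breaking the constant -1 signal is a hypothetical remedy for the
    all -1 reward situation described in the results above.
    """
    return rng.uniform(low, high)
```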
6: mega.nz rl_test_20180517111949 (plots: seq2seq, RL)

hparam (src, seq2seq): {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1248, 'model_path': 'model/tweet_large'}

hparam (dst, RL): {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
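The src and dst hparam sets differ in only three fields; a quick diff over the values copied from the log makes that explicit (the `hparam_diff` helper is ad hoc, not part of the project):

```python
# Only the fields that differ between the two configurations, copied from the log.
src = {'learning_rate': 0.5, 'num_train_steps': 1248, 'model_path': 'model/tweet_large'}
dst = {'learning_rate': 0.1, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}

def hparam_diff(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ."""
    return {k: (a[k], b.get(k)) for k in a if a[k] != b.get(k)}

print(hparam_diff(src, dst))
```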
