The paper describes two steps, a maximization step and a minimization step, and it contains two loss functions. Does that mean we have to run distillation.py twice?
What does the Quick start in the README try to do?
And could you please provide the pseudo-training samples?
When I run it, I get an error: --config: command not found. The option is already written in the code, so why does this error occur?
It also prints the following message, and I don't know if I need to do anything about it:
"Some weights of the model checkpoint at bert-base-uncased were not used when initializing Bert_For_Att_output_MLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
This IS expected if you are initializing Bert_For_Att_output_MLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing Bert_For_Att_output_MLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model)."