Hello @TIAN-viola, I got this running on a VM with an Nvidia T4, running Debian 11 and Python 3.7.10. Thank you for providing this code and the instructions for the dataset and model weights. When I run `python test.py`, it indeed reports an F1 score of 0.93, as described in your 2023 paper (doi:10.18653/v1/2023.acl-long.139).
I did two experiments:
Replaced all images

Then I re-ran `python convert_image_to_tensor_save.py` (so all the `.npy` files are identical). `python test.py` still reports an F1 score of 0.93, with only a single changed prediction across the 2,373 test cases. It seems that the model performs nearly the same sarcasm/non-sarcasm classification without the original images.
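For anyone reproducing this comparison, counting the changed predictions between two runs and recomputing F1 from saved outputs could look roughly like the sketch below. It assumes binary labels (1 = sarcastic, 0 = not) and predictions already collected as Python lists; `changed_predictions` and `binary_f1` are hypothetical helpers, not part of the repository, and `test.py` may average F1 differently (e.g. macro-F1), so treat this only as an illustration.

```python
from typing import Sequence


def changed_predictions(before: Sequence[int], after: Sequence[int]) -> int:
    """Count test cases whose predicted label differs between two runs."""
    if len(before) != len(after):
        raise ValueError("runs must cover the same test cases")
    return sum(1 for a, b in zip(before, after) if a != b)


def binary_f1(labels: Sequence[int], preds: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (assumed here to be 1 = sarcastic)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == positive and p == positive)
    fp = sum(1 for y, p in zip(labels, preds) if y != positive and p == positive)
    fn = sum(1 for y, p in zip(labels, preds) if y == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy values for illustration only, not real outputs of test.py
    labels = [1, 0, 1, 1, 0]
    run_original = [1, 0, 1, 0, 0]
    run_replaced = [1, 0, 1, 1, 0]
    print(changed_predictions(run_original, run_replaced))  # 1
    print(binary_f1(labels, run_original))  # 0.8
```

Comparing per-case predictions, not just the aggregate score, is what shows that replacing the images changed only one prediction.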
Swapped ending whitespace
In `clean_dataset.py` I modified the captions by:
- Removing (`rstrip`) ending whitespace from all lines that end in a space.
- Appending a single space character to all lines that do not end in a space.
```diff
32c64,69
< test_text_new.append(line)
---
> if line.endswith(" "):
>     test_text_new.append(line.rstrip())
>     print("strip")
> else:
>     test_text_new.append(line + " ")
>     print(" added space")
```
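The caption edit can also be written as a standalone function, which makes it easy to confirm that the modified captions differ from the originals only in ending whitespace. This is a minimal sketch; `swap_trailing_whitespace` and `only_trailing_whitespace_differs` are hypothetical names, and the sample captions are made up. Note that `str.rstrip()` with no argument removes all trailing whitespace (tabs and newlines included), not just a single space.

```python
from typing import List


def swap_trailing_whitespace(line: str) -> str:
    """Mirror the modified clean_dataset.py logic for one caption."""
    if line.endswith(" "):
        # rstrip() with no argument removes *all* trailing whitespace
        return line.rstrip()
    # Lines that do not end in a space get exactly one space appended
    return line + " "


def only_trailing_whitespace_differs(a: List[str], b: List[str]) -> bool:
    """True if two caption lists match once trailing whitespace is ignored."""
    return len(a) == len(b) and all(x.rstrip() == y.rstrip() for x, y in zip(a, b))


# Hypothetical captions, for illustration only
captions = ["example caption one ", "example caption two"]
swapped = [swap_trailing_whitespace(c) for c in captions]
print(swapped)  # ['example caption one', 'example caption two ']
print(only_trailing_whitespace_differs(captions, swapped))  # True
```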
After re-running `python clean_dataset.py`, `python test.py` reports an F1 score of 0.4 (with either the original or the changed images). It seems that the model is sensitive to ending whitespace, and I am having difficulty understanding why ending whitespace should affect the sarcasm/non-sarcasm classification.
Could you help? Perhaps I made a silly mistake in running the model, preparing the files, or installing the prerequisites. I am happy to walk through this in detail.