Help validating model? #8

@rcyeh

Hello @TIAN-viola, I got this running on a VM with an Nvidia T4, running Debian 11 and Python 3.7.10. Thank you for providing this code and the instructions for the data set and model weights. When I run python test.py, it does report an F1 score of 0.93, as described in your 2023 paper (doi:10.18653/v1/2023.acl-long.139).

I did two experiments:

Replaced all images

I replaced every image with the same single image (alt text: "inconceivable"), then re-ran python convert_image_to_tensor_save.py, so all the .npy files are identical. python test.py still reports an F1 score of 0.93, with only a single changed prediction over the set of 2,373 test cases. It seems that the model can perform nearly the same sarcasm/non-sarcasm classification without the original images.
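To make the "single changed prediction" comparison reproducible, the two runs can be diffed index by index. The following is a minimal sketch: the function name `changed_predictions` and the toy prediction lists are hypothetical, and it assumes the predictions from each run of test.py have already been collected into plain Python lists (test.py may not expose them in this form).

```python
from typing import List, Sequence, Tuple


def changed_predictions(
    before: Sequence[int], after: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """Return (index, before, after) for every test case whose prediction changed."""
    if len(before) != len(after):
        raise ValueError("prediction lists must have the same length")
    return [(i, b, a) for i, (b, a) in enumerate(zip(before, after)) if b != a]


# Toy data for illustration only; these are NOT actual model outputs.
original_run = [1, 0, 1, 1, 0]
replaced_images_run = [1, 0, 0, 1, 0]

diffs = changed_predictions(original_run, replaced_images_run)
print(f"{len(diffs)} of {len(original_run)} predictions changed: {diffs}")
```

Listing the indices, rather than only comparing F1 scores, shows exactly which test cases depend on the image input.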

Swapped ending whitespace

In clean_dataset.py I modified the captions by:

  • Removing (rstrip) trailing whitespace from all lines that end in a space.
  • Appending a single space character to all lines that do not end in a space.

32c64,69
<             test_text_new.append(line)
---
>             if line.endswith(" "):
>                 test_text_new.append(line.rstrip())
>                 print("strip")
>             else:
>                 test_text_new.append(line + " ")
>                 print("      added space")

After re-running python clean_dataset.py, python test.py reports an F1 score of 0.4 (with either the original or the replaced images). It seems that the model is sensitive to trailing whitespace, and I do not understand why trailing whitespace should affect the sarcasm/non-sarcasm classification.
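One possible check, offered as an unverified hypothesis rather than a diagnosis, is whether trailing whitespace is correlated with the label in the data set itself. The sketch below cross-tabulates the two. The function name `trailing_space_by_label`, the `(text, label)` pair format, and the convention that 1 means sarcastic are all assumptions for illustration; the real data files would need to be loaded into this shape first.

```python
from collections import Counter
from typing import Iterable, Tuple


def trailing_space_by_label(examples: Iterable[Tuple[str, int]]) -> Counter:
    """Count examples by (label, ends_with_space).

    `examples` is a hypothetical iterable of (caption_text, label) pairs.
    """
    counts: Counter = Counter()
    for text, label in examples:
        text = text.rstrip("\r\n")  # ignore the line terminator itself
        counts[(label, text.endswith(" "))] += 1
    return counts


# Toy data for illustration only; NOT taken from the real data set.
examples = [
    ("nice weather today ", 1),
    ("love mondays ", 1),
    ("cat photo", 0),
    ("new phone", 0),
    ("great ", 0),
]

counts = trailing_space_by_label(examples)
for (label, has_space), n in sorted(counts.items()):
    print(f"label={label} trailing_space={has_space}: {n}")
```

If nearly all examples of one label end in a space and nearly all of the other do not, the classifier could be reading the label from the whitespace, which would be consistent with both experiments above. If the table is balanced, the cause lies elsewhere.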

Could you please help? Perhaps I made some silly mistake in running the model, preparing the files, or installing the prerequisites. I am happy to walk through this in detail.
