Text-Analytics-Project--Fall-2020

This project addresses hate speech by building a multimodal classifier on the Facebook Hateful Memes dataset.

Dataset: A customized version of the Facebook Hateful Memes dataset with sarcasm/hateful labels

A Google API was used to extract the text from the images; the extracted text was stored in a row.
Various pre-processing techniques were used to fix OCR errors: word segmentation, expansion of Internet slang contractions, and spelling correction using language models.
Conducted exploratory data analysis to understand the data (using topic modelling, word frequency plots, Named Entity Recognition with the spaCy language model, and bigrams and trigrams to understand the context of a meme).
Used a pre-trained fastText model with Urban Dictionary embeddings to get better representations of Internet slang words.
Built four models to compare which type of model outperforms the others and could be used to improve current algorithms.
Insights from our current work and future work
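
As an illustration of the pre-processing and n-gram steps listed above, here is a minimal Python sketch using only the standard library. It is not the project's actual code: the slang table `SLANG` and the helper names `expand_slang` and `ngram_counts` are hypothetical, chosen only to show the general idea of expanding Internet slang before counting bigrams and trigrams.

```python
import re
from collections import Counter

# Hypothetical slang table (an assumption for this sketch);
# the mapping actually used by the project is not given in this README.
SLANG = {"u": "you", "r": "are", "idk": "i do not know", "dont": "do not"}


def expand_slang(text, table=SLANG):
    """Lowercase the text and replace whole-word slang tokens with expansions."""
    # Keep only runs of letters, digits, and apostrophes; punctuation is dropped.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(table.get(tok, tok) for tok in tokens)


def ngram_counts(text, n):
    """Count word n-grams in whitespace-tokenized text."""
    words = text.split()
    # zip over n shifted copies of the word list yields consecutive n-word tuples.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


if __name__ == "__main__":
    caption = expand_slang("IDK why u r like this, u r like this")
    print(caption)
    print(ngram_counts(caption, 2).most_common(3))
```

Expanding slang before counting means that variants such as "u r" and "you are" contribute to the same bigram, which may make frequency plots easier to read. A real pipeline would likely need a much larger slang dictionary and a proper tokenizer.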