This project analyzes YouTube comments to extract insights about audience sentiment, emotions, sarcasm, and discussion topics. The pipeline combines text preprocessing, sentiment analysis, sarcasm detection, emotion classification, and topic modeling to deliver a comprehensive understanding of community engagement.
- Automated data collection from the YouTube API
- Data cleaning & preprocessing pipeline (text, authors, replies, timestamps)
- Sentiment analysis (discrete + thread-aware)
- Sarcasm detection with wordclouds for sarcastic comments
- Emotion classification using a model trained on Google’s GoEmotions dataset
- Topic modeling with BERTopic
- Keyword extraction with KeyBERT per sentiment class
- Rich visualizations (wordclouds, bar charts, emotion/sentiment distributions)
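
The cleaning step listed above could look roughly like the sketch below. The README does not state the actual cleaning rules, so the rules here (strip HTML tags, decode entities, drop URLs, collapse whitespace, parse timestamps as UTC) are assumptions, and `clean_text`, `parse_timestamp`, `clean_comment`, and the row keys are hypothetical names.

```python
import html
import re
from datetime import datetime, timezone

# Assumed cleaning rules; the project's real rules are not documented here.
TAG_RE = re.compile(r"<[^>]+>")        # HTML tags such as <br> or <a href=...>
URL_RE = re.compile(r"https?://\S+")   # bare links
WS_RE = re.compile(r"\s+")             # runs of whitespace


def clean_text(raw):
    """Strip tags, decode HTML entities, drop URLs, collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    text = URL_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware UTC datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)


def clean_comment(row):
    """Normalise one raw comment row (hypothetical keys) into a cleaned row."""
    return {
        "author": row["author"].strip(),
        "text": clean_text(row["text"]),
        "published_at": parse_timestamp(row["published_at"]),
        "parent_id": row.get("parent_id"),  # None for top-level comments
    }
```

Keeping `parent_id` on every row is what would later let a thread-aware step group replies under their parent comment.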
- Python (pandas, numpy, matplotlib, seaborn, wordcloud)
- NLP: HuggingFace Transformers (GoEmotions, sentiment models), BERTopic, KeyBERT
- Visualization: Matplotlib, WordCloud, Plotly
- Data: YouTube API
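
As a rough illustration of the data source, the sketch below pages through `commentThreads.list` in the YouTube Data API v3 and flattens each thread into one row per comment. It assumes the `google-api-python-client` package and an API key; `fetch_comment_threads`, `flatten_comment_threads`, the row keys, and the `max_pages` limit are hypothetical and not taken from this project.

```python
def _to_row(comment, parent_id):
    """Map one API comment resource to a flat row (hypothetical column names)."""
    snippet = comment["snippet"]
    return {
        "comment_id": comment["id"],
        "parent_id": parent_id,
        "author": snippet.get("authorDisplayName", ""),
        "text": snippet.get("textDisplay", ""),
        "published_at": snippet.get("publishedAt"),
        "like_count": snippet.get("likeCount", 0),
    }


def flatten_comment_threads(response):
    """Turn one commentThreads.list response into rows, replies after their parent."""
    rows = []
    for thread in response.get("items", []):
        top = thread["snippet"]["topLevelComment"]
        rows.append(_to_row(top, parent_id=None))
        # The "replies" key is absent when a thread has no replies.
        for reply in thread.get("replies", {}).get("comments", []):
            rows.append(_to_row(reply, parent_id=top["id"]))
    return rows


def fetch_comment_threads(api_key, video_id, max_pages=5):
    """Fetch up to max_pages pages of comment threads for one video."""
    # Imported lazily so the pure helpers above work without the client installed.
    from googleapiclient.discovery import build

    youtube = build("youtube", "v3", developerKey=api_key)
    rows, page_token = [], None
    for _ in range(max_pages):
        params = {
            "part": "snippet,replies",
            "videoId": video_id,
            "maxResults": 100,
            "textFormat": "plainText",
        }
        if page_token:
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()
        rows.extend(flatten_comment_threads(response))
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return rows
```

Note that the `replies` part of a thread may contain only a subset of the replies; retrieving all of them would need a separate `comments.list` call per parent comment.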
- Percentage of positive, negative, and neutral comments
- Thread-aware sentiment shifts
- Percentage of sarcastic comments
- Wordcloud of sarcasm-heavy terms
- Top 5 most frequent emotions
- Wordclouds for selected emotions
- Top topics discussed by viewers
- Representative keywords per topic
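
The summary figures above (label percentages, share of sarcastic comments, top emotions) can be derived from per-comment predictions with a few lines of standard-library code. This is a minimal sketch that assumes each comment already carries a predicted label; the function names and the one-decimal rounding are illustrative choices, not the project's own.

```python
from collections import Counter


def label_percentages(labels):
    """Share of each label as a percentage, rounded to one decimal place."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def sarcasm_rate(flags):
    """Percentage of comments flagged as sarcastic (flags are booleans)."""
    flags = list(flags)
    return round(100 * sum(flags) / len(flags), 1) if flags else 0.0


def top_emotions(emotions, k=5):
    """The k most frequent emotion labels with their counts."""
    return Counter(emotions).most_common(k)
```

For example, `label_percentages(["positive", "positive", "negative", "neutral"])` returns `{"positive": 50.0, "negative": 25.0, "neutral": 25.0}`.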
- Key phrases per sentiment/emotion
- Thread-aware sentiment shifts
- Sarcasm detection
- Deploy interactive dashboard with Streamlit / Power BI / Looker Studio
- Fine-tune sarcasm/emotion models on domain-specific data