Performance Improvement Suggestions #3
Varunkumar2516 asked this question in Q&A
Title: Improving Sentiment Analysis Accuracy (Currently 90%) – Suggestions Needed
Hello,
I built a movie sentiment analysis model using TF-IDF features with Logistic Regression and reached 90% accuracy on the IMDB dataset.
Current pipeline:
Text cleaning (HTML removal, contractions, punctuation removal)
Stopword removal (keeping negations)
Lemmatization with POS tagging
TF-IDF (max_features=45000, ngram_range=(1,2))
Models tried: Naive Bayes, KNN, Logistic Regression, SVM, Decision Tree
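For reference, the cleaning and stopword steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the `CONTRACTIONS` and `STOPWORDS` tables and the `clean_review` helper are hypothetical and deliberately tiny, and the POS-aware lemmatization step is left out because it would need NLTK or spaCy.

```python
import re
import string
from typing import List

# Hypothetical, deliberately tiny tables for illustration only; a real
# pipeline would use a full contraction map and stopword list (e.g. NLTK's).
CONTRACTIONS = {
    "don't": "do not",
    "doesn't": "does not",
    "isn't": "is not",
    "wasn't": "was not",
    "can't": "can not",
    "won't": "will not",
    "it's": "it is",
}
STOPWORDS = {"a", "an", "the", "this", "it", "is", "was", "i", "do", "does",
             "of", "to", "and", "not", "no", "nor"}
NEGATIONS = {"not", "no", "nor", "never"}
# Stopwords minus negations, so negation words survive the filter.
EFFECTIVE_STOPWORDS = STOPWORDS - NEGATIONS

HTML_TAG = re.compile(r"<[^>]+>")
CONTRACTION_RE = re.compile(r"\b(" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def clean_review(text: str) -> List[str]:
    """Strip HTML, expand contractions, drop punctuation, remove stopwords (keeping negations)."""
    text = HTML_TAG.sub(" ", text).lower()
    # Expand contractions first: apostrophes are treated as punctuation below.
    text = CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0)], text)
    text = text.translate(PUNCT_TABLE)
    return [tok for tok in text.split() if tok not in EFFECTIVE_STOPWORDS]


print(clean_review("This movie <br /><br />wasn't good. I don't recommend it!"))
```

The order matters: expanding contractions before stripping punctuation keeps "don't" as "do not" instead of "don t", so the negation survives. The resulting tokens would then be joined and passed to a vectorizer such as scikit-learn's `TfidfVectorizer(max_features=45000, ngram_range=(1, 2))`.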
Goal: improve accuracy to roughly 92–95% or higher
What I’ve tried:
Ensemble methods (no significant improvement)
Hyperparameter tuning
Questions:
Are there better feature engineering techniques I should try?
Would word embeddings (Word2Vec, GloVe) help here?
Any suggestions for handling tricky cases such as negations more effectively?
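One common option for the negation question is negation-scope marking, a technique from early sentiment-analysis work and not something the notebook is known to do: every token after a negation cue gets a `NOT_` prefix until the next punctuation mark, so "not good" produces a feature distinct from "good" even for a unigram model. This is a minimal sketch under stated assumptions: `mark_negation`, the cue list, and the regex tokenizer are hypothetical, contractions such as "don't" are assumed to be expanded already (as in the cleaning step listed above), and the function must run before punctuation is removed, because punctuation is what closes each scope.

```python
import re
from typing import List

# Hypothetical cue list and scope terminators; tune them for real data.
NEGATIONS = {"not", "no", "never", "nor", "cannot"}
SCOPE_END = set(".,!?;:")
# Words (allowing apostrophes) or single clause-ending punctuation marks.
TOKEN_RE = re.compile(r"[a-z']+|[.,!?;:]")


def mark_negation(text: str) -> List[str]:
    """Prefix tokens after a negation cue with 'NOT_' until the next punctuation mark."""
    out: List[str] = []
    negated = False
    for tok in TOKEN_RE.findall(text.lower()):
        if tok in SCOPE_END:
            negated = False  # punctuation closes the current scope
            out.append(tok)
        elif tok in NEGATIONS:
            negated = True  # open a scope; the cue word itself is kept as-is
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out


print(mark_negation("The acting was not good, but the plot is great."))
```

The resulting tokens could be fed to scikit-learn's `TfidfVectorizer` through its `tokenizer` parameter while keeping `ngram_range=(1, 2)`. Whether this beats plain bigrams on IMDB is something to check on a held-out split rather than assume.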
Here is my notebook:
https://github.com/Varunkumar2516/IMDb-Sentiment-Analysis-NLP-Project/blob/master/1%20IMDB_Sentiment_Analyzer_Notebook%20.ipynb
Any suggestions or feedback would be really helpful. Thanks!