This project uses Natural Language Processing (NLP) and Machine Learning to classify consumer complaint narratives into banking product categories.
The goal is to help financial institutions and regulatory teams organize customer complaints more efficiently, identify common issue areas, and support faster response and resolution.
Financial institutions receive large volumes of customer complaints in free-text format. Manually reviewing and categorizing these complaints can be time-consuming and inconsistent.
This project automates complaint classification by predicting the product category based on the complaint narrative.
Input: Consumer complaint narrative
Output: Predicted banking product category
Example categories: Mortgage, Credit Card, Student Loan, Credit Reporting
The dataset is based on the public Consumer Financial Protection Bureau (CFPB) Consumer Complaint Database.
Main columns used:
- consumer_complaint_narrative: Complaint text
- product: Target product category
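As a rough illustration of how these two columns might be pulled out of the CFPB export, here is a minimal sketch with pandas. The in-memory sample, the extra `state` column, and the helper name `load_complaints` are assumptions made for the example; the real file name and full schema are not given here.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the CFPB export (hypothetical rows).
SAMPLE_CSV = io.StringIO(
    "product,consumer_complaint_narrative,state\n"
    "Mortgage,My escrow payment was applied to the wrong account.,CA\n"
    "Credit Card,,NY\n"
    "Student Loan,The servicer keeps reporting a late payment in error.,TX\n"
)


def load_complaints(source):
    """Read the CSV and keep only the narrative and target columns."""
    df = pd.read_csv(source, usecols=["consumer_complaint_narrative", "product"])
    # Rows without a narrative cannot be classified from text, so drop them.
    df = df.dropna(subset=["consumer_complaint_narrative"])
    return df.reset_index(drop=True)


complaints = load_complaints(SAMPLE_CSV)
print(complaints["product"].value_counts())
```

With the real dataset, `source` would be the path to the downloaded CSV instead of the in-memory sample.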
- Removed missing complaint narratives
- Converted text to lowercase
- Removed punctuation and special characters
- Removed stopwords
- Applied lemmatization
- Applied TF-IDF vectorization to convert text into numerical features
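The preprocessing steps above can be sketched roughly as follows. This is not the project's actual code: the function name `clean_text` is hypothetical, scikit-learn's built-in `ENGLISH_STOP_WORDS` stands in for the NLTK stopword list so the example runs without downloading corpora, and lemmatization is passed in as an optional callable (for instance NLTK's `WordNetLemmatizer().lemmatize`).

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer


def clean_text(text, lemmatize=None):
    """Lowercase, strip non-letters, drop stopwords, optionally lemmatize."""
    text = text.lower()
    # Replace punctuation, digits, and special characters with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [t for t in text.split() if t not in ENGLISH_STOP_WORDS]
    if lemmatize is not None:
        # Assumption: `lemmatize` maps one token to its base form.
        tokens = [lemmatize(t) for t in tokens]
    return " ".join(tokens)


narratives = [
    "The MORTGAGE servicer lost my escrow payment!!",
    "My credit card was charged twice for one purchase.",
    "Student loan interest keeps increasing after refinancing.",
]

cleaned = [clean_text(n) for n in narratives]

# Turn the cleaned text into a sparse TF-IDF feature matrix.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(cleaned)
print(X.shape)
```

Keeping the cleaning logic in one function makes it easy to apply the same transformation to new complaints at prediction time.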
The following machine learning models were tested:
- Logistic Regression
- Random Forest Classifier
- XGBoost Classifier
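A comparison of this kind could be set up along the following lines. The sketch uses invented toy complaints and arbitrary hyperparameters, and it covers only the two scikit-learn models to stay dependency-light. `xgboost.XGBClassifier` follows the same fit/predict interface, though recent versions expect integer-encoded labels (for example via `LabelEncoder`), so it would need one extra step.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the real project trains on CFPB narratives.
texts = [
    "mortgage escrow payment applied to wrong account",
    "mortgage servicer refused loan modification",
    "foreclosure notice sent despite mortgage payment",
    "credit card charged twice for same purchase",
    "credit card late fee added after on time payment",
    "unauthorized charge on my credit card statement",
]
labels = ["Mortgage"] * 3 + ["Credit Card"] * 3

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

fitted = {}
for name, model in candidates.items():
    # Each candidate gets its own TF-IDF step so the pipelines stay independent.
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(texts, labels)
    fitted[name] = pipeline

new_complaint = ["the bank raised my mortgage escrow payment without notice"]
for name, pipeline in fitted.items():
    print(name, pipeline.predict(new_complaint)[0])
```

Wrapping the vectorizer and the classifier in one pipeline keeps the TF-IDF vocabulary fitted only on training data when a train/test split is used.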
Models were evaluated using:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
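These metrics are all available in scikit-learn. The sketch below computes them on a handful of made-up labels, not on the project's predictions. Macro averaging is assumed for precision, recall, and F1, since the results are reported as a macro F1-score.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Invented labels for illustration only.
class_names = ["Credit Card", "Mortgage", "Student Loan"]
y_true = ["Mortgage", "Mortgage", "Mortgage", "Credit Card", "Credit Card", "Student Loan"]
y_pred = ["Mortgage", "Mortgage", "Credit Card", "Credit Card", "Credit Card", "Mortgage"]

accuracy = accuracy_score(y_true, y_pred)

# Macro average: compute each metric per class, then take the unweighted mean.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=class_names, average="macro", zero_division=0
)

# Rows are true classes, columns are predicted classes, in `class_names` order.
matrix = confusion_matrix(y_true, y_pred, labels=class_names)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} macro_f1={f1:.2f}")
print(matrix)
```

The confusion matrix is the most direct way to see which product categories get mistaken for each other.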
The best-performing model was the XGBoost Classifier.
Key results:
- Accuracy: 0.90
- Macro F1-score: 0.74
- Strong performance for categories such as Mortgage and Credit Reporting
- Lower performance for underrepresented categories, which have fewer training samples
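The gap between accuracy and macro F1-score is what one would expect with imbalanced classes: accuracy is dominated by the large categories, while macro F1 weights every category equally. The following pure-Python sketch shows the effect on invented counts (90 Mortgage complaints, 10 Student Loan complaints). The numbers are illustrative and are not the project's results.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each true label to its F1-score."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Majority class is always right; minority class is right 4 times out of 10.
y_true = ["Mortgage"] * 90 + ["Student Loan"] * 10
y_pred = ["Mortgage"] * 90 + ["Student Loan"] * 4 + ["Mortgage"] * 6

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_class = f1_per_class(y_true, y_pred)
macro_f1 = sum(per_class.values()) / len(per_class)

print(f"accuracy={accuracy:.2f}")   # 0.94
print(f"macro_f1={macro_f1:.2f}")   # 0.77
```

Here the minority class drags the macro F1 well below accuracy even though 94 of 100 predictions are correct.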
- Python
- Pandas
- NumPy
- Scikit-learn
- XGBoost
- NLTK
- Matplotlib
- Seaborn
- Jupyter Notebook
- Clone the repository:
git clone https://github.com/bahar-data/bank-customer-complaints-nlp.git
cd bank-customer-complaints-nlp