bahar-data/banking-complaints-nlp-classification

Bank Customer Complaints - NLP Classification

Overview

This project uses Natural Language Processing (NLP) and Machine Learning to classify consumer complaint narratives into banking product categories.

The goal is to help financial institutions and regulatory teams organize customer complaints more efficiently, identify common issue areas, and support faster response and resolution.

Business Objective

Financial institutions receive large volumes of customer complaints as free text. Reviewing and categorizing them manually is time-consuming and can be inconsistent from one reviewer to the next.

This project automates complaint classification by predicting the product category based on the complaint narrative.

  • Input: Consumer complaint narrative
  • Output: Predicted banking product category
  • Example categories: Mortgage, Credit Card, Student Loan, Credit Reporting

Dataset

The dataset is based on the public Consumer Financial Protection Bureau (CFPB) Consumer Complaint Database.

Main columns used:

  • consumer_complaint_narrative: Complaint text
  • product: Target product category
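
A minimal sketch of pulling out these two columns, assuming the data arrives as a CSV export whose header uses the column names listed above. The sample rows are invented for illustration. The standard-library `csv` module keeps the sketch dependency-free; `pandas.read_csv` with `usecols` would do the same job.

```python
import csv
import io

# Invented sample rows standing in for a CSV export (illustration only).
SAMPLE_CSV = """product,consumer_complaint_narrative,state
Mortgage,My escrow payment was misapplied.,CA
Credit card,,NY
Student loan,The servicer lost my paperwork.,TX
"""

def load_complaints(handle):
    """Return (narrative, product) pairs, skipping rows without a narrative."""
    rows = []
    for record in csv.DictReader(handle):
        text = (record.get("consumer_complaint_narrative") or "").strip()
        if text:  # drop missing or blank narratives
            rows.append((text, record["product"]))
    return rows

complaints = load_complaints(io.StringIO(SAMPLE_CSV))
for text, product in complaints:
    print(f"{product}: {text}")
```

Rows with an empty narrative are skipped at load time, since they carry no text for the classifier to use.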

Methodology

1. Data Cleaning and Preprocessing

  • Removed missing complaint narratives
  • Converted text to lowercase
  • Removed punctuation and special characters
  • Removed stopwords
  • Applied lemmatization
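
The cleaning steps above can be sketched as a single function. This is an illustration under stated assumptions, not the notebook's code: the stopword set is a tiny hypothetical stand-in for NLTK's English list, and the default `lemmatize` argument is a placeholder that returns each token unchanged.

```python
import re

# Tiny illustrative stopword set (assumption); NLTK's English list is much longer.
STOPWORDS = {"a", "an", "the", "my", "was", "is", "on", "to", "and", "of", "i"}

def identity(token):
    """Placeholder lemmatizer: returns the token unchanged."""
    return token

def clean_text(text, lemmatize=identity, stopwords=STOPWORDS):
    """Lowercase, strip punctuation, drop stopwords, then lemmatize each token."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in stopwords]
    return " ".join(lemmatize(tok) for tok in tokens)

print(clean_text("My ESCROW payment was mis-applied, twice!"))
```

To apply real lemmatization, pass `nltk.stem.WordNetLemmatizer().lemmatize` as the `lemmatize` argument after downloading the WordNet corpus.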

2. Feature Engineering

  • Applied TF-IDF vectorization to convert text into numerical features
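
The weighting itself can be sketched in a few lines of plain Python. The version below assumes the smoothed inverse document frequency, idf = ln((1 + n) / (1 + df)) + 1, and the L2 row normalization that scikit-learn's `TfidfVectorizer` applies by default. The README does not state which settings were used, and the two documents are invented.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, rows) using smoothed IDF and L2-normalized rows."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        weights = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        rows.append([w / norm for w in weights])
    return vocab, rows

docs = ["mortgage escrow payment", "credit card payment"]
vocab, rows = tfidf(docs)
for term, weight in zip(vocab, rows[0]):
    print(f"{term:10s} {weight:.3f}")
```

A term that appears in every document, such as "payment" here, receives a lower weight than terms that distinguish one document from another.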

3. Modeling

The following machine learning models were tested:

  • Logistic Regression
  • Random Forest Classifier
  • XGBoost Classifier
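
A minimal sketch of how one of these models can be trained on TF-IDF features, assuming scikit-learn is installed. The four training texts and their labels are invented, and the hyperparameters are defaults rather than the settings used in this project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training examples (illustration only, not CFPB data).
texts = [
    "mortgage escrow payment misapplied",
    "mortgage servicer lost loan modification",
    "credit card charged annual fee twice",
    "unauthorized charge credit card statement",
]
labels = ["Mortgage", "Mortgage", "Credit card", "Credit card"]

# Fitting the vectorizer and classifier together means the same vocabulary
# is reused when new text is classified.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

prediction = model.predict(["escrow shortage on my mortgage"])[0]
print(prediction)
```

The final pipeline step can be swapped for `RandomForestClassifier` or `xgboost.XGBClassifier`. `XGBClassifier` generally expects integer-encoded labels, so a `LabelEncoder` step may be needed before fitting.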

4. Evaluation

Models were evaluated using:

  • Accuracy
  • Precision
  • Recall
  • F1-score
  • Confusion Matrix
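
These metrics all derive from the confusion matrix. The sketch below computes them by hand on invented labels to show the definitions; `sklearn.metrics.classification_report` and `confusion_matrix` produce the same figures in practice.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))

def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall, and F1 with `label` treated as the positive class."""
    pairs = confusion_counts(y_true, y_pred)
    tp = pairs[(label, label)]
    fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
    fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Invented labels for illustration.
y_true = ["Mortgage", "Mortgage", "Mortgage", "Student loan", "Student loan"]
y_pred = ["Mortgage", "Mortgage", "Student loan", "Student loan", "Mortgage"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.2f}")
for label in sorted(set(y_true)):
    p, r, f = precision_recall_f1(y_true, y_pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```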

Results

The best-performing model was the XGBoost classifier.

Key results:

  • Accuracy: 0.90
  • Macro F1-score: 0.74
  • Strong performance for categories such as Mortgage and Credit Reporting
  • Lower performance for underrepresented categories, which pulls the macro F1-score below the overall accuracy
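
A gap between accuracy and macro F1-score is what class imbalance typically produces, because macro averaging gives every category equal weight regardless of size. The sketch below uses a hypothetical two-class test set to show the effect. The counts are invented and do not reproduce this project's results.

```python
def f1_for(label, y_true, y_pred):
    """F1 for one class, computed as 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true))
    return sum(f1_for(label, y_true, y_pred) for label in labels) / len(labels)

# Hypothetical imbalanced test set: 90 common-class and 10 rare-class complaints.
y_true = ["common"] * 90 + ["rare"] * 10
# The hypothetical model gets 88 of the common class and 4 of the rare class right.
y_pred = ["common"] * 88 + ["rare"] * 2 + ["rare"] * 4 + ["common"] * 6

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.2f}  macro F1: {macro_f1(y_true, y_pred):.2f}")
```

Accuracy stays high because the common class dominates the count, while the weak rare-class F1 drags the macro average down.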

Tools and Technologies

  • Python
  • Pandas
  • NumPy
  • Scikit-learn
  • XGBoost
  • NLTK
  • Matplotlib
  • Seaborn
  • Jupyter Notebook

How to Run

  1. Clone the repository:

     git clone https://github.com/bahar-data/bank-customer-complaints-nlp.git
     cd bank-customer-complaints-nlp
