Intelligent SMS filtering using NLP and Deep Learning
- Overview
- Features
- Demo
- How It Works
- Tech Stack
- Installation
- Usage
- Model Architecture
- Performance
- Dataset
- API Documentation
- Deployment
- Contributing
- License
A machine learning-powered application that identifies spam messages using Natural Language Processing (NLP). The system:
- Analyzes SMS/text messages in real-time
- Classifies text as Spam or Ham (legitimate)
- Provides confidence scores for predictions
- Processes each message in under 100 ms
- Features an intuitive web interface
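The confidence score is the probability the classifier assigns to its chosen class. The Keras model described below already ends in a softmax layer, so its output is a probability pair. The minimal sketch that follows makes that step explicit, assuming the model produces two raw scores and that index 0 is HAM and index 1 is SPAM (the class order is an assumption, not documented by the project):

```python
import math

# Assumed class order (index 0 = HAM, index 1 = SPAM); the project does not document it.
LABELS = ("HAM", "SPAM")


def softmax(scores):
    """Numerically stable softmax: shift by the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return (label, confidence), where confidence is the winning class probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, confidence = classify([-1.2, 3.1])  # hypothetical raw scores
print(f"{label}: {confidence:.1%}")  # SPAM: 98.7%
```

Subtracting the maximum score before calling `math.exp` does not change the result but avoids overflow for large scores.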
With over 45% of SMS messages being spam globally, this tool helps:
- Protect users from phishing attempts
- Prevent financial scams
- Filter malicious links and content
- Save time by auto-filtering unwanted messages
- NLP-Based Classification - Text cleaning and tokenization before classification
- TF-IDF Vectorization - Weighted word and n-gram features
- Deep Learning Model - TensorFlow/Keras neural network
- Real-Time Prediction - Instant message analysis
- Confidence Scoring - Probability-based results
- Interactive Dashboard - Streamlit web interface
- Pattern Recognition - Identify common spam patterns
- URL Detection - Flag suspicious links
- Phone Number Extraction - Identify spam sender patterns
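URL detection and phone number extraction can be approximated with regular expressions. The sketch below is illustrative only: the patterns, the seven-digit threshold, and the `extract_indicators` helper are hypothetical and are not the project's actual rules.

```python
import re

# Hypothetical URL pattern: anything starting with http(s):// or www. up to whitespace.
URL_RE = re.compile(r"\b(?:https?://|www\.)\S+", re.IGNORECASE)
# Hypothetical phone pattern: optional +, optional (, then digits mixed with spaces, (), . or -.
PHONE_RE = re.compile(r"\+?\(?\d[\d\s().-]{5,}\d")


def extract_indicators(text, min_digits=7):
    """Return the URLs and phone-like numbers found in a message (heuristic)."""
    # Strip trailing sentence punctuation that the greedy \S+ picks up.
    urls = [u.rstrip(".,!?)") for u in URL_RE.findall(text)]
    # Remove URLs first so digits inside a link are not reported as phone numbers.
    without_urls = URL_RE.sub(" ", text)
    phones = [
        p.strip()
        for p in PHONE_RE.findall(without_urls)
        if sum(ch.isdigit() for ch in p) >= min_digits  # skip prices, times, etc.
    ]
    return {"urls": urls, "phones": phones}


print(extract_indicators(
    "WINNER!! Claim your prize at http://bit.ly/xyz or call +44 7700 900123."
))
```

Patterns like these produce false positives (dates with enough digits, for example), so they are better used as extra signals alongside the classifier than as a standalone filter.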
# Launch the Streamlit app
streamlit run app.py

┌──────────────┐
│  Input Text  │
│ "FREE PRIZE" │
└──────┬───────┘
       │
       ▼
┌──────────────────┐
│ Text Cleaning    │
│ • Lowercase      │
│ • Remove punct.  │
│ • Tokenization   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Feature          │
│ Extraction       │
│ • TF-IDF         │
│ • N-grams        │
│ • Word vectors   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Neural Network   │
│ Classification   │
│ • Dense layers   │
│ • Dropout        │
│ • Softmax output │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Prediction       │
│ SPAM: 98.7%      │
│ HAM: 1.3%        │
└──────────────────┘
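The Text Cleaning stage in the pipeline (lowercase, remove punctuation, tokenize) and the n-gram part of feature extraction can be sketched in plain Python. This is a simplified stand-in, assuming whitespace tokenization; the project lists NLTK, whose tokenizers may split text differently. `clean_text` and `with_ngrams` are hypothetical names.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Lowercase, strip ASCII punctuation, then tokenize on whitespace."""
    return text.lower().translate(_PUNCT_TABLE).split()


def with_ngrams(tokens, n_max=2):
    """Append contiguous n-grams (2..n_max) to the unigram tokens, joined by spaces."""
    features = list(tokens)
    for n in range(2, n_max + 1):
        features.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return features


tokens = clean_text("FREE PRIZE!!! Call now.")
print(tokens)               # ['free', 'prize', 'call', 'now']
print(with_ngrams(tokens))  # adds 'free prize', 'prize call', 'call now'
```

Bigrams such as "free prize" let the model pick up phrases that are more indicative of spam than either word on its own.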
- Frontend: Streamlit
- Machine Learning: TensorFlow / Keras
- NLP: NLTK, scikit-learn (TF-IDF vectorization)
- Data Handling: Joblib, NumPy, Pandas
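To show what the TF-IDF step computes, here is a dependency-free sketch that mirrors scikit-learn's `TfidfVectorizer` defaults: raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization. It is an illustration, not the project's code; the project presumably uses scikit-learn directly, and the model's `input_shape=(5000,)` suggests a vocabulary capped at 5000 terms (for example via `max_features=5000`), which is an inference rather than a documented setting.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed IDF per term, as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term counts times IDF, then L2-normalized; terms not seen in fitting are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


docs = [
    ["free", "prize", "call", "now"],
    ["see", "you", "at", "lunch"],
    ["call", "me", "later"],
]
idf = fit_idf(docs)
# 'free' appears in fewer documents than 'call', so it receives the higher weight.
print(tfidf_vector(["free", "call"], idf))
```

The vectorizer fitted on training data (and the trained model) would then typically be saved with Joblib so the app can reload them at prediction time.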
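# 1. Clone the repository, then create and activate a virtual environment
python -m venv .venv
.\.venv\Scripts\activate        # Windows
# source .venv/bin/activate     # macOS / Linux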
# 2. Install dependencies
pip install -r requirements.txt
# 3. Run the app
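streamlit run app.py

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

model = Sequential([
    # Input: one 5000-dimensional TF-IDF vector per message
    Input(shape=(5000,)),
    # Hidden layers, each followed by dropout for regularization
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dropout(0.4),
    Dense(32, activation='relu'),
    Dropout(0.3),
    # Output layer: two-class softmax (ham vs. spam)
    Dense(2, activation='softmax')
])

| Metric | Score |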
|---|---|
| Accuracy | 98.2% |
| Precision | 97.8% |
| Recall | 96.5% |
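Accuracy, precision, and recall measure different things, which matters because legitimate messages usually outnumber spam. The sketch below shows how the three metrics are derived when spam is treated as the positive class; the toy labels are made up for illustration and are not the project's evaluation data, whose split is not documented here.

```python
def spam_metrics(y_true, y_pred, positive="spam"):
    """Accuracy, precision and recall, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one prediction")
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham wrongly flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Toy example: one spam message is missed, nothing legitimate is flagged.
print(spam_metrics(
    ["spam", "ham", "spam", "ham", "spam"],
    ["spam", "ham", "ham", "ham", "spam"],
))
```

High precision means few legitimate messages are wrongly flagged as spam; high recall means little spam slips through.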
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Open a pull request
This project is licensed under the Apache License 2.0.
Au Amores - AI/ML Engineer
