Hackathon-ready full-stack ML project that analyzes real transaction datasets, detects behavioral patterns, flags anomalies, predicts spend direction, and delivers recommendations through a fintech-style React dashboard.
- Primary dataset: PaySim from Kaggle
  https://www.kaggle.com/datasets/ealaxi/paysim1
- Secondary anomaly reference: Credit Card Fraud Detection from Kaggle
  https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
The backend downloads both public datasets through kagglehub. No randomly generated training data is used.
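
A minimal sketch of how the two datasets could be fetched, assuming `kagglehub` is installed and the machine has network access (plus Kaggle credentials if the environment requires them). `kagglehub.dataset_download` returns a local directory path; the helper names `find_csv_files` and `download_datasets` are hypothetical and not taken from the backend.

```python
from pathlib import Path

# Dataset handles taken from the Kaggle URLs listed above.
PRIMARY_HANDLE = "ealaxi/paysim1"
SECONDARY_HANDLE = "mlg-ulb/creditcardfraud"


def find_csv_files(directory):
    """Return every CSV file under a downloaded dataset directory, sorted by path."""
    return sorted(Path(directory).rglob("*.csv"))


def download_datasets():
    """Download both datasets and list their CSV files (needs network access)."""
    import kagglehub  # imported lazily so find_csv_files works without it

    return {
        handle: find_csv_files(kagglehub.dataset_download(handle))
        for handle in (PRIMARY_HANDLE, SECONDARY_HANDLE)
    }
```

Calling `download_datasets()` once at startup and caching the returned paths would avoid repeated lookups.
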
graph LR
A[PaySim Kaggle Dataset] --> B[Preprocessing Layer]
A2[Credit Card Fraud Kaggle Dataset] --> B
B --> C[Categorization Model<br/>TF-IDF + Logistic Regression]
B --> D[Anomaly Model<br/>Isolation Forest]
B --> E[Behavior Clustering<br/>KMeans]
B --> F[Recurring Pattern Detection]
B --> G[Trend + Forecast Engine]
C --> H[FastAPI Service]
D --> H
E --> H
F --> H
G --> H
H --> I[React + Vite Dashboard]
- PaySim is the main transaction source.
- `creditcardfraud` is used as a fraud-pattern reference to calibrate anomaly severity on top of Isolation Forest.
- PaySim does not provide true calendar dates, so `step` is anchored to `2024-01-15` to derive dates.
- PaySim account histories are sparse, so clustering and recurring analysis operate on stable account cohorts derived from real origin accounts. This is an inference layer on top of the original data, not synthetic training data.
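
The date derivation described above can be sketched as follows. PaySim documents one `step` as one hour of simulated time; mapping step 1 to midnight of the anchor date is an assumption, and `step_to_timestamp` is a hypothetical name.

```python
from datetime import datetime, timedelta

ANCHOR = datetime(2024, 1, 15)  # anchor date stated above


def step_to_timestamp(step, anchor=ANCHOR):
    """Map a PaySim step to a timestamp.

    Assumes one step equals one hour and step 1 starts at the anchor's midnight.
    """
    if step < 1:
        raise ValueError("PaySim steps start at 1")
    return anchor + timedelta(hours=step - 1)
```

Under these assumptions the 744 steps of the PaySim simulation span `2024-01-15 00:00` through `2024-02-14 23:00`.
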
- Input features:
  - generated transaction description
  - amount
  - transaction type
- Model: TF-IDF + Logistic Regression
- Output:
  - predicted category label
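
A minimal sketch of a TF-IDF + Logistic Regression categorizer using scikit-learn. It uses only the text description; how the backend combines amount and transaction type with the text is not shown in this README. The descriptions and category names below are illustrative, not rows from the real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative descriptions and labels (assumed, not taken from PaySim).
descriptions = [
    "PAYMENT to merchant grocery store",
    "PAYMENT to merchant supermarket",
    "TRANSFER to account rent landlord",
    "TRANSFER to account monthly rent",
    "CASH_OUT atm withdrawal",
    "CASH_OUT cash withdrawal agent",
]
labels = ["Groceries", "Groceries", "Housing", "Housing", "Cash", "Cash"]

# Word and bigram TF-IDF features feed a multiclass logistic regression.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(descriptions, labels)

# Predict a category label for an unseen description.
predicted = model.predict(["PAYMENT to merchant grocery market"])
```
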
- Model: Isolation Forest
- Signals:
  - unusual amount size
  - rare merchant/type behavior
  - similarity to real fraud amount/time patterns from the secondary Kaggle dataset
- Output:
  - `anomaly_flag`
  - `anomaly_score`
  - human-readable anomaly reason
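
A minimal sketch of the Isolation Forest step on a single amount feature, using scikit-learn. The log transform, the `contamination` value, and the synthetic amounts are assumptions for illustration; the fraud-pattern calibration and the reason text are not covered here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 200 ordinary amounts plus one extreme value (illustrative data only).
amounts = np.append(rng.uniform(20.0, 200.0, size=200), 50_000.0)
X = np.log1p(amounts).reshape(-1, 1)  # compress the heavy right tail

forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
forest.fit(X)

anomaly_flag = forest.predict(X) == -1    # True where the forest flags a row
anomaly_score = -forest.score_samples(X)  # higher means more anomalous
```

Negating `score_samples` is a presentation choice so that larger scores mean stronger anomalies.
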
- Model: KMeans
- Features:
  - average spend
  - transaction frequency
  - amount variance
  - volatility
  - anomaly ratio
  - savings ratio
  - category mix
- Output labels:
  - Saver
  - Impulse spender
  - Balanced
  - Risky spender
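
A minimal sketch of the clustering step with scikit-learn, using three of the listed features and made-up values. Scaling before KMeans is an assumption, and the rule that maps each cluster to a label such as Saver or Risky spender is not shown in this README, so it is left out.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Columns: average spend, transactions per month, anomaly ratio (illustrative).
features = np.array([
    [40.0, 30, 0.00], [45.0, 28, 0.01],    # frequent small spend
    [900.0, 3, 0.02], [950.0, 4, 0.01],    # rare large spend
    [300.0, 12, 0.40], [320.0, 11, 0.45],  # many flagged transactions
    [150.0, 10, 0.02], [160.0, 9, 0.03],   # middle of the road
])

# Standardize so no single feature dominates the Euclidean distance.
scaled = StandardScaler().fit_transform(features)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(scaled)  # one cluster id per account
```
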
- Detects repeated merchant/category patterns with similar amounts and repeat intervals.
- Produces medium/high confidence recurring candidates.
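
One way to express the recurring-pattern check above is a simple rule over grouped transactions. This is a sketch, not the backend's detector: the function name, the tolerances, and the rule that four or more occurrences mean high confidence are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def find_recurring(transactions, min_occurrences=3,
                   amount_tolerance=0.10, interval_tolerance=3):
    """Return recurring candidates from (merchant, date, amount) tuples.

    A merchant qualifies when every amount is within `amount_tolerance` of the
    mean amount and every gap is within `interval_tolerance` days of the mean gap.
    """
    groups = defaultdict(list)
    for merchant, day, amount in transactions:
        groups[merchant].append((day, amount))

    candidates = []
    for merchant, rows in groups.items():
        if len(rows) < min_occurrences:
            continue
        rows.sort()  # chronological order
        amounts = [amount for _, amount in rows]
        gaps = [(b[0] - a[0]).days for a, b in zip(rows, rows[1:])]
        avg_amount, avg_gap = mean(amounts), mean(gaps)
        stable_amount = all(
            abs(a - avg_amount) <= amount_tolerance * avg_amount for a in amounts
        )
        stable_gap = all(abs(g - avg_gap) <= interval_tolerance for g in gaps)
        if stable_amount and stable_gap:
            candidates.append({
                "merchant": merchant,
                "interval_days": round(avg_gap),
                "confidence": "high" if len(rows) >= 4 else "medium",
            })
    return candidates
```
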
- Factors:
  - savings ratio
  - spending volatility
  - anomaly ratio
  - recurring payment load
- Returns:
  - score (0-100)
  - health label
  - score breakdown
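
A minimal sketch of how the four factors could be combined into a 0-100 score with a breakdown. The weights (40/25/20/15), the label names, and the cut-offs are hypothetical; the README does not state the actual formula.

```python
def _clamp(value):
    """Clamp a ratio to the range 0-1."""
    return max(0.0, min(1.0, value))


def financial_health_score(savings_ratio, volatility, anomaly_ratio, recurring_load):
    """Combine four 0-1 factors into a 0-100 score, a label, and a breakdown.

    Savings raises the score; volatility, anomalies, and recurring load lower it.
    Weights and label cut-offs are illustrative assumptions.
    """
    breakdown = {
        "savings": 40 * _clamp(savings_ratio),
        "stability": 25 * (1 - _clamp(volatility)),
        "anomaly": 20 * (1 - _clamp(anomaly_ratio)),
        "recurring": 15 * (1 - _clamp(recurring_load)),
    }
    score = round(sum(breakdown.values()))
    if score >= 70:
        label = "Healthy"
    elif score >= 40:
        label = "Fair"
    else:
        label = "At risk"
    return {"score": score, "label": label, "breakdown": breakdown}
```
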
- Daily and monthly spend aggregation
- category growth
- spending momentum
- next-month projection via Linear Regression
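
The next-month projection can be illustrated with an ordinary least squares line fitted to monthly totals. This sketch uses the closed-form solution in plain Python rather than a library estimator; `project_next_month` is a hypothetical name, and using the month index as the only predictor is an assumption.

```python
from statistics import mean


def project_next_month(monthly_totals):
    """Fit y = slope * t + intercept over month indices 0..n-1 by least squares
    and evaluate the line at t = n (the next month)."""
    n = len(monthly_totals)
    if n < 2:
        raise ValueError("need at least two months to fit a trend")
    t = list(range(n))
    t_bar, y_bar = mean(t), mean(monthly_totals)
    covariance = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, monthly_totals))
    variance = sum((ti - t_bar) ** 2 for ti in t)
    slope = covariance / variance
    intercept = y_bar - slope * t_bar
    return slope * n + intercept
```

For a perfectly linear history such as 100, 110, 120 the fitted line continues the sequence, which makes the helper easy to sanity-check.
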
- Produces product-ready structured recommendations such as:
- category concentration warnings
- recurring load insights
- anomaly review prompts
- momentum-based spend warnings
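
A minimal sketch of a rule-based generator for the four recommendation kinds listed above. The record fields (`type`, `severity`, `message`), the function name, and every threshold are hypothetical.

```python
def build_recommendations(category_share, recurring_load, anomaly_count, momentum):
    """Turn summary metrics into structured recommendation records.

    category_share maps category name to its share of total spend (0-1),
    recurring_load is recurring spend divided by total spend, and momentum is
    the relative change in spend versus the previous period.
    """
    recs = []
    if category_share:
        top, share = max(category_share.items(), key=lambda item: item[1])
        if share > 0.40:  # assumed concentration threshold
            recs.append({"type": "category_concentration", "severity": "warning",
                         "message": f"{top} accounts for {share:.0%} of spend."})
    if recurring_load > 0.30:  # assumed recurring-load threshold
        recs.append({"type": "recurring_load", "severity": "info",
                     "message": f"Recurring payments make up {recurring_load:.0%} of spend."})
    if anomaly_count > 0:
        recs.append({"type": "anomaly_review", "severity": "warning",
                     "message": f"{anomaly_count} flagged transactions need review."})
    if momentum > 0.15:  # assumed momentum threshold
        recs.append({"type": "spend_momentum", "severity": "warning",
                     "message": f"Spend is up {momentum:.0%} on the previous period."})
    return recs
```
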
- `GET /dashboard`: dashboard-oriented summary payload for the React app
- `POST /upload`: upload a compatible CSV or submit `dataset_name=default`
- `GET /analyze`: run or return the cached analysis payload
- `GET /transactions`: processed transactions table data
- `GET /anomalies`: anomaly table data
- `GET /insights`: recommendations, behavior profile, score, model explanation
- `GET /health`: API health check
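
A small standard-library client sketch for the GET endpoints above. It assumes the backend is running at the local address given in the run instructions and that every endpoint returns JSON; the response shapes are not documented here, so the parsed body is returned as-is. `endpoint_url` and `fetch` are hypothetical helper names.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # local backend address from the run instructions
GET_ENDPOINTS = ("dashboard", "analyze", "transactions", "anomalies", "insights", "health")


def endpoint_url(name, base_url=BASE_URL):
    """Build the full URL for one of the documented GET endpoints."""
    if name not in GET_ENDPOINTS:
        raise ValueError(f"unknown GET endpoint: {name}")
    return urljoin(base_url.rstrip("/") + "/", name)


def fetch(name, base_url=BASE_URL, timeout=30.0):
    """GET an endpoint and decode the JSON body (assumes a JSON response)."""
    with urlopen(endpoint_url(name, base_url), timeout=timeout) as response:
        return json.load(response)
```

With the backend running, `fetch("health")` is a quick way to confirm the API is reachable before loading the dashboard.
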
- React + Vite
- Tailwind CSS
- Material UI chips
- Recharts
- Axios
- Framer Motion
- KPI cards
- category pie chart
- monthly trend line chart
- top categories bar chart
- anomaly scatter chart
- transactions table
- anomaly table
- recurring payments table
- behavior personality card
- financial health gauge
- AI recommendations panel
backend/
  main.py
  routes.py
  preprocessing.py
  categorization.py
  anomaly.py
  clustering.py
  recommendations.py
  financial_score.py
  trend_analysis.py
  pipeline.py
  data/
  models/saved_models/
frontend/
  src/components/
    common/
    dashboard/
    insights/
    layout/
    transactions/
    upload/
  src/pages/
  src/services/
  src/styles/
  src/utils/
python -m pip install -r requirements.txt
python -m uvicorn backend.main:app --reload

Backend runs at http://127.0.0.1:8000.
cd frontend
npm install
npm run dev

Frontend runs at http://localhost:5173.
npm run backend:dev
npm run frontend:dev
npm run frontend:build

Copy .env.example to .env when you want to override defaults.
Important settings:
- `FINANCE_MAX_PRIMARY_ROWS`
- `FINANCE_ANCHOR_DATE`
- `FINANCE_CREDIT_ANCHOR_DATE`
- `FINANCE_FRONTEND_ORIGIN`
- `VITE_API_BASE_URL` in `frontend/.env`
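
A minimal sketch of reading three of these settings from the environment. The variable names come from the list above; the `Settings` class, the ISO date format, and the fallback values are assumptions (the row cap of 200000 in particular is made up, while the date and origin fallbacks reuse values mentioned elsewhere in this README).

```python
import os
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Settings:
    max_primary_rows: int
    anchor_date: date
    frontend_origin: str


def load_settings(env=os.environ):
    """Read settings from a mapping of environment variables.

    Fallback values are illustrative assumptions, not the project's defaults.
    """
    return Settings(
        max_primary_rows=int(env.get("FINANCE_MAX_PRIMARY_ROWS", "200000")),
        anchor_date=date.fromisoformat(env.get("FINANCE_ANCHOR_DATE", "2024-01-15")),
        frontend_origin=env.get("FINANCE_FRONTEND_ORIGIN", "http://localhost:5173"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.
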
A real processed sample generated from the Kaggle-backed pipeline is saved at:
backend/data/sample_processed_dataset.csv
The latest cached analysis artifacts are written into:
backend/data/cache/
The current sample analysis run produced:
- 128,447 processed transactions
- 10,328 anomalies
- 256 recurring payment candidates
- dominant profile: Balanced
- add Prophet or ARIMA as an optional forecasting backend
- add SHAP-based explanation views for anomalies and categorization
- support multi-user uploads with explicit user IDs and real merchant descriptions
- move cached results into a database or object store
- add authentication, saved workspaces, and downloadable PDF reports
- add websocket progress updates for long-running dataset analysis