dheeraj574/FBA-Finance-Behaviour-Analyzer

Personal Finance Behavior Analyzer

Hackathon-ready full-stack ML project that analyzes real transaction datasets, detects behavioral patterns, flags anomalies, predicts spend direction, and delivers recommendations through a fintech-style React dashboard.

Real datasets used

The backend downloads both public Kaggle datasets, PaySim and creditcardfraud, through kagglehub. No randomly generated training data is used.

Architecture

graph LR
  A[PaySim Kaggle Dataset] --> B[Preprocessing Layer]
  A2[Credit Card Fraud Kaggle Dataset] --> B
  B --> C[Categorization Model<br/>TF-IDF + Logistic Regression]
  B --> D[Anomaly Model<br/>Isolation Forest]
  B --> E[Behavior Clustering<br/>KMeans]
  B --> F[Recurring Pattern Detection]
  B --> G[Trend + Forecast Engine]
  C --> H[FastAPI Service]
  D --> H
  E --> H
  F --> H
  G --> H
  H --> I[React + Vite Dashboard]

Architecture notes

  • PaySim is the main transaction source.
  • creditcardfraud is used as a fraud-pattern reference to calibrate anomaly severity on top of Isolation Forest.
  • PaySim does not provide true calendar dates, only an integer step column (one step is one simulated hour), so step is anchored to 2024-01-15 to derive dates.
  • PaySim account histories are sparse, so clustering and recurring analysis operate on stable account cohorts derived from real origin accounts. This is an inference layer on top of the original data, not synthetic training data.
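
The date anchoring described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name step_to_timestamp is hypothetical, and treating the anchor as midnight with step 1 mapping to the anchor itself is an assumption.

```python
from datetime import datetime, timedelta

# Anchor date from the architecture notes; midnight is an assumption.
ANCHOR = datetime(2024, 1, 15)


def step_to_timestamp(step: int, anchor: datetime = ANCHOR) -> datetime:
    """Map a 1-based PaySim step (one simulated hour) to a timestamp."""
    if step < 1:
        raise ValueError("PaySim steps start at 1")
    # Assumption: step 1 is the first hour of the anchor day.
    return anchor + timedelta(hours=step - 1)


print(step_to_timestamp(1))   # 2024-01-15 00:00:00
print(step_to_timestamp(25))  # 2024-01-16 00:00:00
```

With this mapping, the 744 steps in PaySim cover roughly one month starting at the anchor date.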

ML pipeline

1. Transaction categorization

  • Input features:
    • generated transaction description
    • amount
    • transaction type
  • Model:
    • TF-IDF + Logistic Regression
  • Output:
    • predicted category label
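
A TF-IDF + Logistic Regression categorizer of this kind might look like the sketch below. The training rows and category names are toy placeholders, the transaction type is simply prepended to the description text, and the amount feature is left out for brevity; none of this is taken from the repository's categorization.py.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy examples: "<transaction type> <generated description>" (hypothetical).
train_text = [
    "PAYMENT grocery market",
    "PAYMENT grocery store",
    "TRANSFER rent landlord",
    "TRANSFER rent apartment",
    "CASH_OUT atm withdrawal",
    "CASH_OUT atm cash",
]
train_labels = ["groceries", "groceries", "housing", "housing", "cash", "cash"]

# TF-IDF turns each text into a sparse vector; Logistic Regression
# learns one set of weights per category on top of it.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_text, train_labels)

predicted = model.predict(["PAYMENT grocery shop"])[0]
print(predicted)
```

In a fuller version, amount and transaction type would more likely enter as separate numeric and one-hot columns alongside the TF-IDF features.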

2. Anomaly detection

  • Model:
    • Isolation Forest
  • Signals:
    • unusual amount size
    • rare merchant/type behavior
    • similarity to real fraud amount/time patterns from the secondary Kaggle dataset
  • Output:
    • anomaly_flag
    • anomaly_score
    • human-readable anomaly reason
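
The three outputs listed above could be produced along these lines. This is a sketch on toy values: the two features, the thresholds in reason, and the sign flip that makes higher scores more anomalous are all assumptions, and the fraud-pattern calibration from the secondary dataset is not shown.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy data: 200 ordinary transactions plus one extreme one at the end.
amounts = np.append(np.linspace(20.0, 80.0, 200), 5000.0)
# Share of all transactions that use the same type (rarity signal).
type_share = np.append(np.full(200, 0.6), 0.01)
X = np.column_stack([amounts, type_share])

model = IsolationForest(n_estimators=100, random_state=0).fit(X)
anomaly_flag = model.predict(X) == -1    # predict() returns -1 for outliers
anomaly_score = -model.score_samples(X)  # flipped so higher = more anomalous


def reason(amount: float, share: float, median_amount: float) -> str:
    """Build a human-readable reason from simple, assumed thresholds."""
    parts = []
    if amount > 10 * median_amount:
        parts.append("amount far above typical size")
    if share < 0.05:
        parts.append("rare transaction type")
    return "; ".join(parts) or "no dominant signal"


print(anomaly_flag[-1], reason(amounts[-1], type_share[-1], np.median(amounts)))
```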

3. Spending behavior clustering

  • Model:
    • KMeans
  • Features:
    • average spend
    • transaction frequency
    • amount variance
    • volatility
    • anomaly ratio
    • savings ratio
    • category mix
  • Output labels:
    • Saver
    • Impulse spender
    • Balanced
    • Risky spender
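
KMeans returns numbered clusters, so some rule has to map them to the labels above. The sketch below uses a subset of the listed features on toy accounts and names clusters by comparing centroids; the naming rule in name_clusters is a hypothetical heuristic, not the repository's logic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy accounts: avg_spend, volatility, savings_ratio, anomaly_ratio.
features = np.array([
    [40.0, 0.10, 0.45, 0.00],
    [45.0, 0.12, 0.50, 0.01],
    [120.0, 0.80, 0.05, 0.02],
    [130.0, 0.85, 0.04, 0.03],
    [80.0, 0.30, 0.20, 0.01],
    [85.0, 0.28, 0.22, 0.02],
    [200.0, 0.60, 0.02, 0.30],
    [210.0, 0.65, 0.01, 0.35],
])

# Scale first so no single feature dominates the distance metric.
scaled = StandardScaler().fit_transform(features)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)


def name_clusters(centers: np.ndarray) -> dict:
    """Assign one label per cluster by comparing centroids (assumed rule)."""
    remaining = set(range(len(centers)))
    names = {}
    # (label, feature column whose highest centroid earns that label)
    for label, col in (("Saver", 2), ("Risky spender", 3), ("Impulse spender", 1)):
        best = max(remaining, key=lambda c: centers[c, col])
        names[best] = label
        remaining.remove(best)
    names[remaining.pop()] = "Balanced"
    return names


names = name_clusters(km.cluster_centers_)
profiles = [names[c] for c in km.labels_]
print(profiles)
```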

4. Recurring payment detection

  • Detects repeated merchant/category patterns with similar amounts and repeat intervals.
  • Produces medium/high confidence recurring candidates.
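
One way to detect such patterns is to group by merchant and check that amounts and gaps between payments are both stable. The function, tolerances, confidence rule, and merchant names below are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def find_recurring(transactions, amount_tol=0.05, gap_tol_days=3, min_count=3):
    """Return recurring candidates from (merchant, date, amount) rows."""
    by_merchant = defaultdict(list)
    for merchant, day, amount in transactions:
        by_merchant[merchant].append((day, amount))

    candidates = []
    for merchant, rows in by_merchant.items():
        if len(rows) < min_count:
            continue
        rows.sort()
        amounts = [a for _, a in rows]
        gaps = [(b[0] - a[0]).days for a, b in zip(rows, rows[1:])]
        avg = mean(amounts)
        # Amounts must stay within a relative tolerance of their mean.
        if any(abs(a - avg) > amount_tol * avg for a in amounts):
            continue
        # Gaps between payments must be roughly constant.
        spread = pstdev(gaps)
        if spread > gap_tol_days:
            continue
        confidence = "high" if len(rows) >= 4 and spread <= 1 else "medium"
        candidates.append({
            "merchant": merchant,
            "avg_amount": round(avg, 2),
            "avg_interval_days": round(mean(gaps), 1),
            "confidence": confidence,
        })
    return candidates


rows = [("StreamCo", date(2024, m, 15), 9.99) for m in (1, 2, 3, 4)]
rows += [("Cafe", date(2024, 1, d), amt) for d, amt in ((16, 4.5), (18, 12.0), (29, 7.25))]
print(find_recurring(rows))
```

Here the monthly StreamCo payments qualify with high confidence, while the irregular Cafe purchases are filtered out by the amount check.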

5. Financial health score

  • Factors:
    • savings ratio
    • spending volatility
    • anomaly ratio
    • recurring payment load
  • Returns:
    • score 0-100
    • health label
    • score breakdown
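
A score of this shape can be built as a weighted sum of the four factors. The weights, saturation points, and label cut-offs in this sketch are assumptions chosen for illustration; the README does not state the actual values.

```python
def _clamp(x: float) -> float:
    """Limit a value to the range 0..1."""
    return max(0.0, min(1.0, x))


def financial_health(savings_ratio, volatility, anomaly_ratio, recurring_load):
    """Return a 0-100 score, a label, and a per-factor breakdown.

    Weights (40/25/20/15) and cut-offs are illustrative assumptions.
    """
    breakdown = {
        # More savings is better; full marks at a 30% savings ratio.
        "savings": 40 * _clamp(savings_ratio / 0.30),
        # The remaining factors are penalties: lower is better.
        "stability": 25 * (1 - _clamp(volatility)),
        "anomalies": 20 * (1 - _clamp(anomaly_ratio / 0.20)),
        "recurring": 15 * (1 - _clamp(recurring_load / 0.50)),
    }
    score = round(sum(breakdown.values()))
    label = "Healthy" if score >= 75 else "Fair" if score >= 50 else "At risk"
    return {"score": score, "label": label, "breakdown": breakdown}


result = financial_health(
    savings_ratio=0.15, volatility=0.4, anomaly_ratio=0.08, recurring_load=0.2
)
print(result["score"], result["label"])  # 56 Fair
```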

6. Trend analysis

  • daily and monthly spend aggregation
  • category growth
  • spending momentum
  • next-month projection via Linear Regression
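
The next-month projection amounts to fitting a straight line to monthly totals and extending it one step. The sketch below uses NumPy's least-squares polyfit, which fits the same line a linear regression on the month index would; the function names and the momentum definition (latest month-over-month change) are assumptions.

```python
import numpy as np


def project_next_month(monthly_totals) -> float:
    """Fit total = slope * month_index + intercept and extrapolate one month."""
    totals = np.asarray(monthly_totals, dtype=float)
    if totals.size < 2:
        raise ValueError("need at least two months to fit a trend")
    months = np.arange(totals.size)
    slope, intercept = np.polyfit(months, totals, deg=1)
    return float(slope * totals.size + intercept)


def momentum(monthly_totals) -> float:
    """Relative change between the last two months (assumed definition)."""
    prev, last = monthly_totals[-2], monthly_totals[-1]
    return (last - prev) / prev


totals = [1000.0, 1100.0, 1200.0, 1300.0]
print(round(project_next_month(totals), 2))  # 1400.0
print(round(momentum(totals), 3))            # 0.083
```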

7. Recommendation engine

  • Produces product-ready structured recommendations such as:
    • category concentration warnings
    • recurring load insights
    • anomaly review prompts
    • momentum-based spend warnings
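
Structured recommendations of these four kinds can come from simple rules over summary metrics. The field names, thresholds, and message wording below are hypothetical; the actual payload shape is not documented here.

```python
def build_recommendations(category_share, recurring_load, anomaly_count, momentum):
    """Return a list of recommendation dicts; thresholds are assumptions."""
    recs = []
    top_category, top_share = max(category_share.items(), key=lambda kv: kv[1])
    if top_share >= 0.40:
        recs.append({
            "kind": "category_concentration",
            "severity": "medium",
            "message": f"{top_category} makes up {top_share:.0%} of spend.",
        })
    if recurring_load >= 0.30:
        recs.append({
            "kind": "recurring_load",
            "severity": "medium",
            "message": f"Recurring payments take {recurring_load:.0%} of spend.",
        })
    if anomaly_count > 0:
        recs.append({
            "kind": "anomaly_review",
            "severity": "high",
            "message": f"Review {anomaly_count} flagged transactions.",
        })
    if momentum >= 0.10:
        recs.append({
            "kind": "spend_momentum",
            "severity": "low",
            "message": f"Spend is up {momentum:.0%} month over month.",
        })
    return recs


recs = build_recommendations(
    {"groceries": 0.55, "transport": 0.25, "other": 0.20},
    recurring_load=0.12,
    anomaly_count=3,
    momentum=0.18,
)
for rec in recs:
    print(rec["kind"], "-", rec["message"])
```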

API

  • GET /dashboard
    • dashboard-oriented summary payload for the React app
  • POST /upload
    • upload a compatible CSV or submit dataset_name=default
  • GET /analyze
    • run or return the cached analysis payload
  • GET /transactions
    • processed transactions table data
  • GET /anomalies
    • anomaly table data
  • GET /insights
    • recommendations, behavior profile, score, model explanation
  • GET /health
    • API health check
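
The GET endpoints can be exercised from Python with the standard library alone. This sketch assumes the backend is running at its default local address, http://127.0.0.1:8000, and that every endpoint returns JSON; the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # default local backend address
GET_ENDPOINTS = ("dashboard", "analyze", "transactions", "anomalies", "insights", "health")


def endpoint_url(name: str, base_url: str = BASE_URL) -> str:
    """Build the URL for one of the documented GET endpoints."""
    if name not in GET_ENDPOINTS:
        raise ValueError(f"unknown endpoint: {name}")
    return f"{base_url.rstrip('/')}/{name}"


def get_json(name: str, base_url: str = BASE_URL, timeout: float = 30.0):
    """Fetch an endpoint and decode the body (assumed to be JSON)."""
    with urlopen(endpoint_url(name, base_url), timeout=timeout) as response:
        return json.load(response)


print(endpoint_url("health"))  # http://127.0.0.1:8000/health
```

With the server running, get_json("health") is a quick connectivity check before requesting the larger dashboard or analysis payloads.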

Frontend

  • React + Vite
  • Tailwind CSS
  • Material UI chips
  • Recharts
  • Axios
  • Framer Motion

Dashboard sections

  • KPI cards
  • category pie chart
  • monthly trend line chart
  • top categories bar chart
  • anomaly scatter chart
  • transactions table
  • anomaly table
  • recurring payments table
  • behavior personality card
  • financial health gauge
  • AI recommendations panel

Project structure

backend/
  main.py
  routes.py
  preprocessing.py
  categorization.py
  anomaly.py
  clustering.py
  recommendations.py
  financial_score.py
  trend_analysis.py
  pipeline.py
  data/
  models/saved_models/

frontend/
  src/components/
    common/
    dashboard/
    insights/
    layout/
    transactions/
    upload/
  src/pages/
  src/services/
  src/styles/
  src/utils/

Run locally

1. Backend

python -m pip install -r requirements.txt
python -m uvicorn backend.main:app --reload

Backend runs at http://127.0.0.1:8000.

2. Frontend

cd frontend
npm install
npm run dev

Frontend runs at http://localhost:5173.

3. Optional root scripts

npm run backend:dev
npm run frontend:dev
npm run frontend:build

Environment variables

Copy .env.example to .env when you want to override defaults.

Important settings:

  • FINANCE_MAX_PRIMARY_ROWS
  • FINANCE_ANCHOR_DATE
  • FINANCE_CREDIT_ANCHOR_DATE
  • FINANCE_FRONTEND_ORIGIN
  • VITE_API_BASE_URL in frontend/.env
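
For orientation, the backend settings might be read roughly as follows. The value formats (an integer row cap, ISO dates) are assumptions; the only defaults used are ones stated elsewhere in this README (the 2024-01-15 anchor and the local frontend address), and settings with no documented default fall back to None.

```python
import os
from datetime import date


def load_settings(env=None) -> dict:
    """Read the FINANCE_* settings from an environment mapping."""
    env = os.environ if env is None else env

    def optional_date(key, default=None):
        raw = env.get(key)
        return date.fromisoformat(raw) if raw else default

    raw_rows = env.get("FINANCE_MAX_PRIMARY_ROWS")
    return {
        # No documented default, so None means "not set".
        "max_primary_rows": int(raw_rows) if raw_rows else None,
        "anchor_date": optional_date("FINANCE_ANCHOR_DATE", date(2024, 1, 15)),
        "credit_anchor_date": optional_date("FINANCE_CREDIT_ANCHOR_DATE"),
        "frontend_origin": env.get("FINANCE_FRONTEND_ORIGIN", "http://localhost:5173"),
    }


settings = load_settings({"FINANCE_MAX_PRIMARY_ROWS": "50000"})
print(settings["max_primary_rows"], settings["anchor_date"])  # 50000 2024-01-15
```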

Sample processed dataset

A real processed sample generated from the Kaggle-backed pipeline is saved at:

  • backend/data/sample_processed_dataset.csv

The latest cached analysis artifacts are written into:

  • backend/data/cache/

Verified outputs

The current sample analysis run produced:

  • 128,447 processed transactions
  • 10,328 anomalies
  • 256 recurring payment candidates
  • dominant profile: Balanced

Future improvements

  • add Prophet or ARIMA as an optional forecasting backend
  • add SHAP-based explanation views for anomalies and categorization
  • support multi-user uploads with explicit user IDs and real merchant descriptions
  • move cached results into a database or object store
  • add authentication, saved workspaces, and downloadable PDF reports
  • add websocket progress updates for long-running dataset analysis
