Personal Finance Behavior Analyzer

Hackathon-ready full-stack ML project that analyzes real transaction datasets, detects behavioral patterns, flags anomalies, predicts spend direction, and delivers recommendations through a fintech-style React dashboard.

Real datasets used

Primary dataset: PaySim from Kaggle
https://www.kaggle.com/datasets/ealaxi/paysim1
Secondary anomaly reference: Credit Card Fraud Detection from Kaggle
https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

The backend downloads both public datasets through kagglehub. No randomly generated training data is used.

Architecture

graph LR
  A[PaySim Kaggle Dataset] --> B[Preprocessing Layer]
  A2[Credit Card Fraud Kaggle Dataset] --> B
  B --> C[Categorization Model<br/>TF-IDF + Logistic Regression]
  B --> D[Anomaly Model<br/>Isolation Forest]
  B --> E[Behavior Clustering<br/>KMeans]
  B --> F[Recurring Pattern Detection]
  B --> G[Trend + Forecast Engine]
  C --> H[FastAPI Service]
  D --> H
  E --> H
  F --> H
  G --> H
  H --> I[React + Vite Dashboard]

Architecture notes

PaySim is the main transaction source.
creditcardfraud is used as a fraud-pattern reference to calibrate anomaly severity on top of Isolation Forest.
PaySim does not provide true calendar dates; its step column counts simulated hours, so each step is offset from an anchor date of 2024-01-15 to derive timestamps.
PaySim account histories are sparse, so clustering and recurring analysis operate on stable account cohorts derived from real origin accounts. This is an inference layer on top of the original data, not synthetic training data.
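
The date derivation in the notes above can be sketched as follows. This is a minimal illustration, assuming one PaySim step equals one simulated hour and that step 1 maps to midnight on the anchor date; the function name step_to_timestamp is hypothetical and not taken from the backend.

```python
from datetime import datetime, timedelta

# Anchor date from the note above; the backend may read it from FINANCE_ANCHOR_DATE.
ANCHOR = datetime(2024, 1, 15)

def step_to_timestamp(step: int, anchor: datetime = ANCHOR) -> datetime:
    """Map a 1-based PaySim step (assumed to be one hour) to a timestamp."""
    if step < 1:
        raise ValueError("PaySim steps start at 1")
    # Assumption: step 1 corresponds to the first hour of the anchor date.
    return anchor + timedelta(hours=step - 1)

print(step_to_timestamp(1))   # first simulated hour
print(step_to_timestamp(25))  # 24 hours later
```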

ML pipeline

1. Transaction categorization

Input features:
- generated transaction description
- amount
- transaction type
Model:
- TF-IDF + Logistic Regression
Output:
- predicted category label
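
A TF-IDF + Logistic Regression categorizer of this kind might look like the sketch below. It uses scikit-learn and covers only the description text; the sample descriptions and category names are hypothetical, and the project's actual feature handling for amount and transaction type (for example through a ColumnTransformer) is not shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical generated descriptions and category labels, for illustration only.
descriptions = [
    "PAYMENT to merchant grocery store",
    "PAYMENT to merchant supermarket",
    "TRANSFER to account rent landlord",
    "TRANSFER to account monthly rent",
    "CASH_OUT at agent atm withdrawal",
    "CASH_OUT at agent cash withdrawal",
]
labels = ["Groceries", "Groceries", "Housing", "Housing", "Cash", "Cash"]

# Unigram and bigram TF-IDF features feeding a multiclass logistic regression.
categorizer = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
categorizer.fit(descriptions, labels)

predicted = categorizer.predict(["PAYMENT to merchant grocery market"])[0]
print(predicted)
```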

2. Anomaly detection

Model:
- Isolation Forest
Signals:
- unusual amount size
- rare merchant/type behavior
- similarity to real fraud amount/time patterns from the secondary Kaggle dataset
Output:
- anomaly_flag
- anomaly_score
- human-readable anomaly reason
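
The flag and score outputs can be illustrated with scikit-learn's IsolationForest. This sketch scores amounts only; the contamination value and the sample data are assumptions, and the calibration against the credit card fraud dataset is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical transaction amounts: twenty ordinary values and one extreme one.
amounts = np.array([[20.0 + i] for i in range(20)] + [[50_000.0]])

# contamination is an assumed share of anomalies, not a value from the project.
model = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
model.fit(amounts)

# predict() returns -1 for outliers and 1 for inliers.
anomaly_flag = model.predict(amounts) == -1
# score_samples() is lower for more abnormal points; negate so higher means more anomalous.
anomaly_score = -model.score_samples(amounts)

print(int(anomaly_flag.sum()), float(anomaly_score[-1]))
```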

3. Spending behavior clustering

Model:
- KMeans
Features:
- average spend
- transaction frequency
- amount variance
- volatility
- anomaly ratio
- savings ratio
- category mix
Output labels:
- Saver
- Impulse spender
- Balanced
- Risky spender
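
As a rough illustration of the clustering step, the sketch below scales a few per-account features and fits KMeans with scikit-learn. It uses two clusters and three features for brevity, and the rule that names clusters is an assumption; how the project maps clusters to its four labels is not documented here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-account features: [average spend, transaction frequency, anomaly ratio].
features = np.array([
    [40.0, 12, 0.00], [45.0, 10, 0.01], [50.0, 11, 0.00],      # low, steady spend
    [900.0, 60, 0.20], [950.0, 55, 0.25], [1000.0, 65, 0.22],  # high, erratic spend
])

# Scale first so large-valued columns do not dominate the Euclidean distance.
scaled = StandardScaler().fit_transform(features)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(scaled)

# Assumed naming rule: the cluster with the lowest mean anomaly ratio is "Saver".
anomaly_by_cluster = {c: features[cluster_ids == c, 2].mean() for c in set(cluster_ids)}
saver_cluster = min(anomaly_by_cluster, key=anomaly_by_cluster.get)
profiles = ["Saver" if c == saver_cluster else "Risky spender" for c in cluster_ids]
print(profiles)
```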

4. Recurring payment detection

Detects repeated merchant/category patterns with similar amounts and regular repeat intervals.
Each recurring candidate is assigned a medium or high confidence level.
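
One way to express this kind of detection is shown below. It is a sketch rather than the project's algorithm: the function name find_recurring, the tolerance thresholds, and the confidence rule are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev

def find_recurring(transactions, amount_tol=0.05, interval_tol_days=3, min_count=3):
    """Return merchants whose payments repeat with similar amounts and spacing.

    transactions: iterable of (merchant, date, amount) tuples.
    All thresholds are illustrative assumptions.
    """
    by_merchant = defaultdict(list)
    for merchant, day, amount in transactions:
        by_merchant[merchant].append((day, amount))

    candidates = []
    for merchant, rows in by_merchant.items():
        if len(rows) < min_count:
            continue
        rows.sort()
        amounts = [a for _, a in rows]
        gaps = [(b[0] - a[0]).days for a, b in zip(rows, rows[1:])]
        avg_amount = mean(amounts)
        amounts_similar = all(abs(a - avg_amount) <= amount_tol * avg_amount for a in amounts)
        gaps_regular = pstdev(gaps) <= interval_tol_days
        if amounts_similar and gaps_regular:
            # Assumed rule: four or more occurrences count as high confidence.
            confidence = "high" if len(rows) >= 4 else "medium"
            candidates.append({"merchant": merchant, "interval_days": round(mean(gaps)),
                               "amount": round(avg_amount, 2), "confidence": confidence})
    return candidates

sample = [
    ("M_STREAM", date(2024, 1, 15), 9.99),
    ("M_STREAM", date(2024, 2, 14), 9.99),
    ("M_STREAM", date(2024, 3, 15), 9.99),
    ("M_STREAM", date(2024, 4, 14), 9.99),
    ("M_CAFE", date(2024, 1, 20), 4.50),
    ("M_CAFE", date(2024, 1, 22), 31.00),
    ("M_CAFE", date(2024, 3, 30), 12.75),
]
print(find_recurring(sample))
```

Here M_STREAM repeats every 30 days at a constant amount and is reported, while M_CAFE is skipped because its amounts differ too much.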

5. Financial health score

Factors:
- savings ratio
- spending volatility
- anomaly ratio
- recurring payment load
Returns:
- score 0-100
- health label
- score breakdown
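
A score of this shape could be computed as a weighted sum of the four factors. The weights, label cut-offs, and function name in this sketch are assumptions; the README does not state the project's actual formula.

```python
def health_score(savings_ratio, volatility, anomaly_ratio, recurring_load):
    """Combine four factors, each expected in [0, 1], into a 0-100 score.

    The weights and label cut-offs are illustrative assumptions.
    """
    def clamp(x):
        return max(0.0, min(1.0, x))

    # Higher savings is good; the other three factors are penalties, so invert them.
    breakdown = {
        "savings": 40 * clamp(savings_ratio),
        "stability": 25 * (1 - clamp(volatility)),
        "anomaly": 20 * (1 - clamp(anomaly_ratio)),
        "recurring": 15 * (1 - clamp(recurring_load)),
    }
    score = round(sum(breakdown.values()))
    label = "Healthy" if score >= 70 else "Fair" if score >= 40 else "At risk"
    return {"score": score, "label": label, "breakdown": breakdown}

result = health_score(savings_ratio=0.5, volatility=0.2, anomaly_ratio=0.1, recurring_load=0.4)
print(result["score"], result["label"])
```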

6. Trend analysis

- daily and monthly spend aggregation
- category growth
- spending momentum
- next-month projection via Linear Regression
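
The next-month projection amounts to fitting a straight line through the monthly totals and evaluating it one step ahead. The sketch below does this with ordinary least squares in plain Python; the function name and the sample totals are hypothetical, and the backend may use a library regressor instead.

```python
def project_next(values):
    """Fit y = a + b*x by ordinary least squares over x = 0..n-1 and return y at x = n."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two months to fit a line")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * n

monthly_spend = [1000.0, 1100.0, 1200.0, 1300.0]  # hypothetical monthly totals
print(project_next(monthly_spend))
```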

7. Recommendation engine

Produces product-ready structured recommendations such as:
- category concentration warnings
- recurring load insights
- anomaly review prompts
- momentum-based spend warnings
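
A rule-based generator for structured recommendations like these might look as follows. The thresholds, field names, and message wording are assumptions made for this sketch, not the project's schema.

```python
def build_recommendations(category_share, recurring_load, anomaly_count, momentum):
    """Turn summary metrics into structured recommendation dicts.

    Thresholds and field names are illustrative assumptions.
    """
    recs = []
    top_category, top_share = max(category_share.items(), key=lambda kv: kv[1])
    if top_share > 0.4:
        recs.append({"type": "category_concentration", "severity": "medium",
                     "message": f"{top_category} accounts for {top_share:.0%} of spend."})
    if recurring_load > 0.3:
        recs.append({"type": "recurring_load", "severity": "medium",
                     "message": f"Recurring payments take {recurring_load:.0%} of spend."})
    if anomaly_count > 0:
        recs.append({"type": "anomaly_review", "severity": "high",
                     "message": f"Review {anomaly_count} flagged transactions."})
    if momentum > 0.1:
        recs.append({"type": "spend_momentum", "severity": "low",
                     "message": f"Spend is up {momentum:.0%} on the previous period."})
    return recs

recs = build_recommendations({"Groceries": 0.55, "Transport": 0.45},
                             recurring_load=0.1, anomaly_count=3, momentum=0.02)
for rec in recs:
    print(rec["type"], "-", rec["message"])
```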

API

GET /dashboard
- dashboard-oriented summary payload for the React app
POST /upload
- upload a compatible CSV or submit dataset_name=default
GET /analyze
- run or return the cached analysis payload
GET /transactions
- processed transactions table data
GET /anomalies
- anomaly table data
GET /insights
- recommendations, behavior profile, score, model explanation
GET /health
- API health check
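
For quick checks outside the dashboard, the GET endpoints can be called from a small standard-library client. This sketch assumes the backend is running at its local address and that each endpoint returns JSON; the helper names are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000/"  # local backend address from "Run locally"

def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path such as "/anomalies"."""
    return urljoin(base_url, path.lstrip("/"))

def get_json(path: str, timeout: float = 30.0):
    """GET an endpoint and decode the JSON body (assumes the API returns JSON)."""
    with urlopen(endpoint_url(path), timeout=timeout) as response:
        return json.load(response)

# With the backend running, call for example:
#   get_json("/health")
#   get_json("/anomalies")
print(endpoint_url("/anomalies"))
```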

Frontend

React + Vite
Tailwind CSS
Material UI chips
Recharts
Axios
Framer Motion

Dashboard sections

KPI cards
category pie chart
monthly trend line chart
top categories bar chart
anomaly scatter chart
transactions table
anomaly table
recurring payments table
behavior personality card
financial health gauge
AI recommendations panel

Project structure

backend/
  main.py
  routes.py
  preprocessing.py
  categorization.py
  anomaly.py
  clustering.py
  recommendations.py
  financial_score.py
  trend_analysis.py
  pipeline.py
  data/
  models/saved_models/

frontend/
  src/components/
    common/
    dashboard/
    insights/
    layout/
    transactions/
    upload/
  src/pages/
  src/services/
  src/styles/
  src/utils/

Run locally

1. Backend

python -m pip install -r requirements.txt
python -m uvicorn backend.main:app --reload

Backend runs at http://127.0.0.1:8000.

2. Frontend

cd frontend
npm install
npm run dev

Frontend runs at http://localhost:5173.

3. Optional root scripts

npm run backend:dev
npm run frontend:dev
npm run frontend:build

Environment variables

Copy .env.example to .env when you want to override defaults.

Important settings:

FINANCE_MAX_PRIMARY_ROWS
FINANCE_ANCHOR_DATE
FINANCE_CREDIT_ANCHOR_DATE
FINANCE_FRONTEND_ORIGIN
VITE_API_BASE_URL in frontend/.env
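
Reading these settings with fallbacks could look like the sketch below. Only the 2024-01-15 anchor date comes from this README; the other defaults, the sample row cap, and the load_settings helper are assumptions, and FINANCE_CREDIT_ANCHOR_DATE is left out because its default is not stated.

```python
import os
from datetime import date

def load_settings(env=os.environ):
    """Read the documented settings, falling back to assumed defaults."""
    max_rows = env.get("FINANCE_MAX_PRIMARY_ROWS")
    return {
        # None means "no row cap"; the real default is not stated in this README.
        "max_primary_rows": int(max_rows) if max_rows else None,
        # 2024-01-15 is the anchor date given in the architecture notes.
        "anchor_date": date.fromisoformat(env.get("FINANCE_ANCHOR_DATE", "2024-01-15")),
        # The Vite dev server address from "Run locally" is assumed as the default origin.
        "frontend_origin": env.get("FINANCE_FRONTEND_ORIGIN", "http://localhost:5173"),
    }

# Hypothetical override of the row cap, passed as a plain dict instead of os.environ.
settings = load_settings({"FINANCE_MAX_PRIMARY_ROWS": "200000"})
print(settings)
```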

Sample processed dataset

A real processed sample generated from the Kaggle-backed pipeline is saved at:

backend/data/sample_processed_dataset.csv

The latest cached analysis artifacts are written into:

backend/data/cache/

Verified outputs

The current sample analysis run produced:

128,447 processed transactions
10,328 anomalies
256 recurring payment candidates
dominant profile: Balanced

Future improvements

add Prophet or ARIMA as an optional forecasting backend
add SHAP-based explanation views for anomalies and categorization
support multi-user uploads with explicit user IDs and real merchant descriptions
move cached results into a database or object store
add authentication, saved workspaces, and downloadable PDF reports
add websocket progress updates for long-running dataset analysis
