
krishnaarora023/Movies_Classification-Analysis-and-Prediction


🎬📊 Movie Classification Analysis & Prediction

Python Pandas Seaborn Matplotlib Machine Learning


📌 Project Overview

This project focuses on Movie Classification Analysis & Prediction 🎥 using data science techniques.
It includes:

  • 📂 Data Loading & Cleaning
  • 🔍 Exploratory Data Analysis (EDA)
  • 📊 Data Visualization
  • 🤖 Machine Learning Model (Classification)

The goal is to analyze movie-related data and build a model that predicts whether a movie will be a Hit or a Flop from features such as budget, marketing spend, ratings, and trailer views.


📁 Dataset

  • Dataset used: Movie_classification.csv
  • Contains information about movies such as:
    • 🎭 Genre
    • ⭐ Ratings
    • 💰 Revenue
    • 🎬 Other attributes used for classification

βš™οΈ Tech Stack

Tool Purpose
🐍 Python Programming
πŸ“Š Pandas Data Manipulation
πŸ”’ NumPy Numerical Operations
πŸ“‰ Matplotlib Basic Visualizations
🎨 Seaborn Advanced Visualizations
πŸ€– Scikit-learn Machine Learning

🚀 Project Workflow

1️⃣ Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

2️⃣ Load Dataset 📂

data = pd.read_csv("Movie_classification.csv")
df = data.copy()

3️⃣ Exploratory Data Analysis 🔍

View the dataset structure with:

  • df.head()
  • df.tail()
  • df.info()
  • df.describe()

Use these to understand:

  • ❌ Missing values
  • 🔢 Data types
  • 📊 Feature distributions
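The inspection steps above can be sketched in pandas as follows. This is a minimal sketch on a small hypothetical DataFrame instead of the real CSV, so the column names (Budget, Critic_rating, Genre, Hit) and the helper eda_summary are assumptions, not code from the notebook.

```python
import numpy as np
import pandas as pd


def eda_summary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: data type, missing values and number of distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "n_unique": df.nunique(),  # NaN is not counted as a distinct value
        }
    )


# Hypothetical stand-in for Movie_classification.csv; the column names are assumptions.
# With the real file this would be: df = pd.read_csv("Movie_classification.csv")
df = pd.DataFrame(
    {
        "Budget": [35000.0, 42000.0, np.nan, 38000.0],
        "Critic_rating": [7.9, 6.1, 8.3, np.nan],
        "Genre": ["Drama", "Action", "Drama", "Comedy"],
        "Hit": [1, 0, 1, 0],  # assumed encoding: 1 = Hit, 0 = Flop
    }
)

print(df.head())      # first rows
print(df.tail())      # last rows
df.info()             # dtypes and non-null counts (prints directly)
print(df.describe())  # summary statistics of the numeric columns
print(eda_summary(df))
```

The summary table puts missing values, data types and cardinality side by side, which is usually enough to decide the cleaning steps that follow.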


4️⃣ Data Cleaning 🧹

  • Handle missing values
  • Remove duplicates
  • Fix data types
  • Feature selection
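The four cleaning steps above could be combined into one helper like the sketch below. The function clean_movies, its parameters, the median/mode imputation and the one-hot encoding are assumptions chosen for illustration; the README does not say how the notebook handles each step.

```python
import numpy as np
import pandas as pd


def clean_movies(df, target="Hit", numeric_cols=(), drop_cols=()):
    """Hypothetical cleaning helper mirroring the four listed steps."""
    # Remove duplicates, and rows whose target label is missing.
    out = df.drop_duplicates().dropna(subset=[target]).copy()

    # Feature selection: drop columns the analyst judged irrelevant.
    out = out.drop(columns=list(drop_cols), errors="ignore")

    # Fix data types: numbers stored as text become floats; unparseable entries become NaN.
    for col in numeric_cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")

    # Handle missing values: median for numeric features, most frequent value otherwise.
    num_cols = out.select_dtypes(include="number").columns.drop(target, errors="ignore")
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    cat_cols = out.columns.difference(num_cols).drop(target, errors="ignore")
    for col in cat_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])

    # One-hot encode the remaining text columns so a logistic regression can use them.
    out = pd.get_dummies(out, columns=list(cat_cols), drop_first=True, dtype=int)
    return out.reset_index(drop=True)


# Hypothetical raw data: a duplicated row, a missing budget, a missing genre,
# and a rating column read as text with an unparseable "n/a".
raw = pd.DataFrame(
    {
        "Budget": [35000.0, 42000.0, np.nan, 38000.0, 38000.0],
        "Critic_rating": ["7.9", "6.1", "8.3", "n/a", "n/a"],
        "Genre": ["Drama", "Action", None, "Comedy", "Comedy"],
        "Hit": [1, 0, 1, 0, 0],
    }
)

cleaned = clean_movies(raw, numeric_cols=["Critic_rating"])
print(cleaned)
# cleaned.to_csv("cleaned_movie.csv", index=False)  # one way to write a cleaned copy to disk
```

Median imputation is used here because it is less sensitive than the mean to the skewed budgets and view counts that movie data often has.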


5️⃣ Data Visualization 📊

Some key visualizations:

  • 📈 Distribution plots
  • 📊 Count plots
  • 🔥 Heatmaps (correlation)
  • 📉 Pairplots

Example:

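# numeric_only=True skips text columns such as Genre (required since pandas 2.0)
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f", cmap="coolwarm")
plt.show()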

6️⃣ Model Building 🤖

Used binomial Logistic Regression for classification, since the target has two classes (Hit or Flop).

Steps:

  • Split the data into training and test sets
  • Train the model on the training set
  • Predict on the test set
  • Evaluate performance
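The steps above could look like this with scikit-learn. This is a sketch, not the notebook's code: the demo data, column names, 80/20 split and the StandardScaler + LogisticRegression pipeline are assumptions. Scaling is not required for logistic regression, but it helps the default lbfgs solver converge when features such as trailer views and ratings sit on very different scales.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_demo_data(n=500, seed=0):
    """Hypothetical movie data in which Hit depends on critic rating and trailer views."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(
        {
            "Budget": rng.normal(40_000, 5_000, n),
            "Marketing_expense": rng.normal(90, 20, n),
            "Critic_rating": rng.normal(7.5, 1.0, n),
            "Trailer_views": rng.normal(450_000, 80_000, n),
        }
    )
    score = (df["Critic_rating"] - 7.5) + (df["Trailer_views"] - 450_000) / 80_000
    df["Hit"] = (score + rng.normal(0, 0.5, n) > 0).astype(int)  # 1 = Hit, 0 = Flop
    return df


def train_logit(df, target="Hit", test_size=0.2, seed=42):
    """Split the data, then fit a scaled binomial logistic regression on the training part."""
    X = df.drop(columns=[target])
    y = df[target]
    # stratify=y keeps the Hit/Flop ratio the same in both splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, X_test, y_test


model, X_test, y_test = train_logit(make_demo_data())
y_pred = model.predict(X_test)             # predicted classes (0 or 1)
p_hit = model.predict_proba(X_test)[:, 1]  # estimated probability of a Hit
```

Keeping the scaler inside the pipeline means it is fitted on the training split only, so no test-set statistics leak into training.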


7️⃣ Model Evaluation 📏

Metrics used:

  • ✅ Accuracy
  • 📉 Confusion Matrix
  • 📊 Classification Report
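A minimal sketch of computing the three metrics with sklearn.metrics, on hypothetical two-feature data so that it runs on its own (the feature names and the 1 = Hit, 0 = Flop encoding are assumptions). With labels=[0, 1], confusion_matrix puts the actual classes in rows and the predicted classes in columns, so ravel() yields tn, fp, fn, tp.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data and model (1 = Hit, 0 = Flop).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame(
    {
        "Critic_rating": rng.normal(7.5, 1.0, n),
        "Trailer_views": rng.normal(450_000, 80_000, n),
    }
)
y = ((X["Critic_rating"] - 7.5) + (X["Trailer_views"] - 450_000) / 80_000
     + rng.normal(0, 0.5, n) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_train, y_train)

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
# Rows = actual class, columns = predicted class, in the order given by `labels`
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
# Per-class precision, recall and F1, with readable class names
report = classification_report(y_test, y_pred, labels=[0, 1], target_names=["Flop", "Hit"])

print(f"Accuracy: {acc:.3f}")
print(cm)
print(report)
```

Accuracy is simply (tn + tp) divided by the total; the classification report adds per-class precision and recall, which matter when Hits and Flops are unevenly represented.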


📌 Key Insights 💡

  • 🎯 Identified important features affecting classification
  • 📊 Found correlations between variables
  • 🤖 Built a predictive model for movie classification
  • 📉 Improved understanding of dataset patterns
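One common way to rank features for a logistic regression is to compare each feature's correlation with the target and the model's coefficients fitted on standardized features, where each coefficient is the change in log-odds of a Hit per one standard deviation of that feature. The sketch below does both on hypothetical data in which only Critic_rating and Trailer_views drive the outcome; the README does not say which method the notebook used.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: Budget and Marketing_expense are pure noise here.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame(
    {
        "Budget": rng.normal(40_000, 5_000, n),
        "Marketing_expense": rng.normal(90, 20, n),
        "Critic_rating": rng.normal(7.5, 1.0, n),
        "Trailer_views": rng.normal(450_000, 80_000, n),
    }
)
score = (df["Critic_rating"] - 7.5) + (df["Trailer_views"] - 450_000) / 80_000
df["Hit"] = (score + rng.normal(0, 0.5, n) > 0).astype(int)

# 1) Correlation of each numeric feature with the target, strongest first
corr_with_target = (
    df.corr(numeric_only=True)["Hit"].drop("Hit").sort_values(key=np.abs, ascending=False)
)

# 2) Coefficients on standardized features are comparable across features
X, y = df.drop(columns=["Hit"]), df["Hit"]
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
coefs = pd.Series(model.named_steps["logisticregression"].coef_[0], index=X.columns)
ranking = coefs.sort_values(key=np.abs, ascending=False)
odds_ratios = np.exp(ranking)  # multiplicative change in the odds of a Hit per +1 std

print(corr_with_target)
print(ranking)
print(odds_ratios)
```

Correlation looks at one feature at a time, while the coefficients account for all features jointly, so the two rankings can disagree when features are correlated with each other.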


🧠 Future Improvements

  • 🔄 Try advanced models (Random Forest, XGBoost)
  • ⚙️ Hyperparameter tuning
  • 📊 More feature engineering
  • 🚀 Deployment as a web app
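The first two improvements could be explored together with scikit-learn's GridSearchCV, as in this sketch. The parameter grids and the demo data are assumptions, not tuned values. XGBoost's scikit-learn wrapper (xgboost.XGBClassifier) could be added to the same dictionary if the xgboost package is installed; it is left out so the sketch only needs scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical movie data (1 = Hit, 0 = Flop).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame(
    {
        "Budget": rng.normal(40_000, 5_000, n),
        "Critic_rating": rng.normal(7.5, 1.0, n),
        "Trailer_views": rng.normal(450_000, 80_000, n),
    }
)
y = ((X["Critic_rating"] - 7.5) + (X["Trailer_views"] - 450_000) / 80_000
     + rng.normal(0, 0.5, n) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

candidates = {
    "logistic_regression": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},  # inverse regularization strength
    ),
    "random_forest": (
        RandomForestClassifier(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validation on the training set only; the test set stays untouched
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "test_accuracy": search.score(X_test, y_test),  # uses the refitted best model
    }

for name, res in results.items():
    print(name, res)
```

Choosing the model by cross-validated accuracy and reporting test accuracy only once at the end keeps the final score an honest estimate.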


📂 Project Structure

📁 Movie-Classification-Project
│── 📄 Movies.ipynb
│── 📄 Movie_classification.csv
│── 📄 cleaned_movie.csv
│── 📄 README.md


🙌 Author

Krishna Arora 🚀

📊 Data Science Enthusiast


⭐ Show Your Support

If you like this project:

⭐ Star this repository

🍴 Fork it

🧠 Learn & Build more!

💬 “Data is the new oil, but insights are the real fuel!” 🚀

About

🚀 Movie Success Prediction using Logistic Regression. A data analysis and machine learning project that predicts whether a movie is a Hit or a Flop based on factors such as budget, marketing, ratings, and trailer views. It includes data cleaning, visualization, and model evaluation using Python.
