This project focuses on Movie Classification Analysis & Prediction using data science techniques.
It includes:
- Data Loading & Cleaning
- Exploratory Data Analysis (EDA)
- Data Visualization
- Machine Learning Model (Classification)
The goal is to analyze movie-related data and build a model that can predict movie categories/classes based on features.

Dataset used: `Movie_classification.csv`, which contains information about movies such as:
- Genre
- Ratings
- Revenue
- Other attributes used for classification

| Tool | Purpose |
|---|---|
| Python | Programming |
| Pandas | Data Manipulation |
| NumPy | Numerical Operations |
| Matplotlib | Basic Visualizations |
| Seaborn | Advanced Visualizations |
| Scikit-learn | Machine Learning |

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset and work on a copy so the raw data stays untouched
data = pd.read_csv("Movie_classification.csv")
df = data.copy()
```

- `head()`
- `tail()`
- `info()`
- `describe()`
- Understand:
  - Missing values
  - Data types
  - Feature distributions
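
The inspection calls above can be condensed into a single per-column summary. The following is a minimal sketch, not the notebook's actual code: the `summarize` helper and the sample columns (`Genre`, `Rating`, `Revenue`) are hypothetical stand-ins, since the real column names in `Movie_classification.csv` may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Movie_classification.csv; real column names may differ.
sample = pd.DataFrame({
    "Genre": ["Drama", "Comedy", "Action", None],
    "Rating": [7.1, 6.4, np.nan, 8.0],
    "Revenue": [120.5, 80.0, 310.2, 45.9],
})

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, number of missing values, number of unique values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })

summary = summarize(sample)
print(summary)
```

With the real dataset, `summarize(df)` would be called on the loaded frame instead of `sample`.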

- Handle missing values
- Remove duplicates
- Fix data types
- Feature selection
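
One possible way to carry out these cleaning steps is sketched below. The `clean` helper, the sample rows, the median imputation, and the `"Unknown"` placeholder are all assumptions made for illustration; the notebook may handle each step differently.

```python
import pandas as pd

# Hypothetical raw data: one duplicated row, missing values, and ratings stored as text.
raw = pd.DataFrame({
    "Genre": ["Drama", "Comedy", "Comedy", "Action", None],
    "Rating": ["7.1", "6.4", "6.4", None, "8.0"],
    "Revenue": [120.5, 80.0, 80.0, 310.2, 45.9],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, convert Rating to numeric, and fill missing values."""
    out = df.drop_duplicates().copy()
    # Fix data types: unparseable or missing ratings become NaN
    out["Rating"] = pd.to_numeric(out["Rating"], errors="coerce")
    # Handle missing values: median for numeric, placeholder for categorical
    out["Rating"] = out["Rating"].fillna(out["Rating"].median())
    out["Genre"] = out["Genre"].fillna("Unknown")
    return out.reset_index(drop=True)

cleaned = clean(raw)

# Feature selection: keep only the columns intended as model inputs (illustrative choice)
features = cleaned[["Rating", "Revenue"]]
print(cleaned)
```

Median imputation is used here because it is less sensitive to outliers than the mean; whether that suits the real data depends on its distributions.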

- Distribution plots
- Count plots
- Heatmaps (correlation)
- Pairplots
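
```python
# Correlation heatmap; numeric_only=True skips text columns such as genre
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()
```

- Split data (train/test)
- Train model
- Predict results
- Evaluate performance
  - Accuracy
  - Confusion matrix
  - Classification report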
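
The modelling steps above can be sketched end to end as follows. The README does not name the classifier, so logistic regression is used purely as a stand-in, and the data is synthetic (generated with `make_classification`) rather than the movie dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the cleaned movie features
X, y = make_classification(n_samples=300, n_features=6, n_informative=4, random_state=42)

# Split data (train/test), keeping the class balance in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train model: scale the features, then fit a logistic regression (assumed model)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Predict results on the held-out set
y_pred = model.predict(X_test)

# Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy:.3f}")
print(cm)
print(report)
```

Accuracy alone can mislead when classes are imbalanced, which is why the confusion matrix and per-class report are printed alongside it.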

- Identified important features affecting classification
- Found correlations between variables
- Built a predictive model for movie classification
- Improved understanding of dataset patterns

- Try advanced models (Random Forest, XGBoost)
- Hyperparameter tuning
- More feature engineering
- Deployment as a web app
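
As a possible starting point for the first two items, a random forest can be tuned with a small grid search. This is a sketch on synthetic data from `make_classification`; the parameter grid is arbitrary and illustrative, and none of it comes from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the movie features and labels
X, y = make_classification(n_samples=200, n_features=6, n_informative=4, random_state=0)

# Illustrative grid; real values should be chosen for the actual dataset
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation for each parameter combination
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))

# Importance of each feature in the best model (values sum to 1)
importances = search.best_estimator_.feature_importances_
print(importances)
```

The `feature_importances_` array gives one way to rank which features drive the classification.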

```
├── Movies.ipynb
├── Movie_classification.csv
├── cleaned_movie.csv
└── README.md
```