A machine learning project that predicts whether a Pokémon is Legendary based on its base stats. This project demonstrates data preprocessing, feature engineering, and binary classification using Logistic Regression, achieving 94.17% accuracy.
This project analyzes Pokémon statistics from Generations 1-6 to build a predictive model that classifies Pokémon as Legendary or non-Legendary. The model uses various features including base stats (HP, Attack, Defense, etc.), type information, and engineered features to make predictions.
Key Features:
- Binary classification task (Legendary vs. Non-Legendary)
- Comprehensive data preprocessing and feature engineering
- Feature analysis with correlation matrices
- Logistic Regression model with 94.17% accuracy
- Exploratory data analysis with visualizations
The dataset contains information about 800 Pokémon from Generations 1-6, including:
Features:
- #: Pokémon ID number
- Name: Pokémon name
- Type 1/Type 2: Pokémon types (e.g., Grass, Poison, Fire)
- Total: Sum of all base stats
- HP: Hit Points
- Attack: Physical attack power
- Defense: Physical defense
- Sp. Atk: Special attack power
- Sp. Def: Special defense
- Speed: Speed statistic
- Generation: Generation number (1-6)
- Legendary: Target variable (True/False)
Engineered Features:
- Dual Type: Indicator if Pokémon has two different types
- Mega: Indicator if Pokémon is a Mega Evolution
Source: The dataset is included in the repository as Pokemon.csv
- Python 3.x
- Libraries:
  - pandas: Data manipulation and analysis
  - numpy: Numerical computations
  - scikit-learn: Machine learning models and preprocessing
  - matplotlib: Data visualization
  - seaborn: Statistical data visualization
- Python 3.7 or higher
- pip package manager
Clone the repository:
    git clone https://github.com/yourusername/pokemon-classification.git
    cd pokemon-classification
Create a virtual environment (recommended):
    python -m venv venv
    # On Windows
    venv\Scripts\activate
    # On macOS/Linux
    source venv/bin/activate
Install required packages:
    pip install pandas numpy scikit-learn matplotlib seaborn tensorflow
Or install from requirements file (if available):
    pip install -r requirements.txt
Launch Jupyter Notebook:
    jupyter notebook
Open PokemonClassification.ipynb and run all cells.
- Open the Jupyter notebook PokemonClassification.ipynb
- Run cells sequentially from top to bottom
- The notebook will:
- Load and explore the Pokémon dataset
- Perform data preprocessing and feature engineering
- Train a Logistic Regression model
- Display model performance metrics
- Show visualizations and insights
Expected Output:
- Model accuracy: ~94.17%
- Correlation heatmap of features
- Feature importance analysis
- Predictions on test data
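A correlation heatmap like the one listed above is usually drawn from a pairwise correlation matrix. The following is a minimal sketch of that step, not the notebook's actual code: the rows are made-up illustrative values (not taken from Pokemon.csv), and only three stat columns are used to keep it short.

```python
import pandas as pd

# Illustrative rows only; these values are made up, not taken from Pokemon.csv.
df = pd.DataFrame({
    "HP":        [45, 60, 80, 100, 106],
    "Attack":    [49, 62, 82, 120, 110],
    "Defense":   [49, 63, 83, 100, 90],
    "Legendary": [0, 0, 0, 1, 1],
})

# Total is the sum of the stat columns, so it is correlated with them by construction.
df["Total"] = df[["HP", "Attack", "Defense"]].sum(axis=1)

# Pearson correlation between every pair of numeric columns.
corr = df.corr()

# Correlation of each feature with the target, strongest first.
print(corr["Legendary"].drop("Legendary").sort_values(ascending=False))
```

Passing `corr` to `seaborn.heatmap(corr, annot=True)` would render the matrix as a heatmap.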
- Algorithm: Logistic Regression
- Accuracy: 94.17%
- Train/Test Split: 70/30 (stratified)
- Scaling: MinMaxScaler for feature normalization
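The setup above (stratified 70/30 split, MinMaxScaler, Logistic Regression) can be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's code: the data is synthetic, `random_state` and `max_iter` are arbitrary choices, and fitting the scaler on the training split only is an assumption about the workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (NOT the Pokémon dataset): 800 rows, 6 stat-like features.
rng = np.random.default_rng(0)
X = rng.integers(20, 160, size=(800, 6)).astype(float)

# Label roughly the top 8% of rows by stat total as the positive class.
totals = X.sum(axis=1)
y = (totals > np.quantile(totals, 0.92)).astype(int)

# 70/30 split; stratify=y keeps the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

accuracy = model.score(X_test_scaled, y_test)
print(f"Test accuracy on synthetic data: {accuracy:.4f}")
```

The accuracy printed here describes the synthetic data only and says nothing about the 94.17% reported for the real dataset.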
Strong Predictors:
- Total base stats show the strongest correlation with Legendary status
- Individual stats (HP, Attack, Defense, Sp. Atk, Sp. Def, Speed) all contribute positively
Feature Correlations:
- High multicollinearity exists between Total stats and individual stats
- Attack and Sp. Atk show positive correlation
- Defense and Sp. Def are moderately correlated
Type Analysis:
- Both Type 1 and Type 2 show weak correlation with Legendary status
- Dual-type Pokémon are slightly more likely to be Legendary
Mega Evolution:
- Mega Pokémon show strong correlation with Legendary classification
- Only 65 Legendary Pokémon in the dataset (8.1%)
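Because only 65 of the 800 Pokémon are Legendary, it helps to compare the reported accuracy with a trivial classifier that always predicts non-Legendary. The arithmetic below uses only the counts stated above. It is computed over the full dataset, so the baseline on the actual test split may differ slightly.

```python
# Counts taken from the text above: 65 Legendary out of 800 Pokémon.
n_total = 800
n_legendary = 65

legendary_share = n_legendary / n_total

# Accuracy of a trivial classifier that labels every Pokémon non-Legendary.
baseline_accuracy = (n_total - n_legendary) / n_total

print(f"Legendary share: {legendary_share:.1%}")      # 8.1%
print(f"Majority baseline: {baseline_accuracy:.1%}")  # 91.9%
```

A baseline of about 91.9% means accuracy alone gives limited information on a target this imbalanced.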
The model outputs a probability of being Legendary, which is compared against a 50% threshold. For example, for a Pokémon with these stats:
- Type: Dragon/Psychic
- Total: 700
- HP: 80, Attack: 130, Defense: 100
- Sp. Atk: 160, Sp. Def: 120, Speed: 110
- Generation: 3
Prediction: 35.4% probability of being Legendary. This is below the 50% threshold, so the model classifies this Pokémon as non-Legendary despite its high stats.
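As a minimal sketch of how a 50% threshold turns that probability into a class label: `classify` is a hypothetical helper written for this illustration, not a function from the notebook, and the probability is the value quoted above rather than one recomputed from a model.

```python
# Example input from the text above.
stats = {"HP": 80, "Attack": 130, "Defense": 100,
         "Sp. Atk": 160, "Sp. Def": 120, "Speed": 110}
total = sum(stats.values())  # matches the listed Total of 700


def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 (Legendary) if the probability reaches the threshold, else 0."""
    return int(probability >= threshold)


reported_probability = 0.354  # value quoted in the text, not recomputed here
label = classify(reported_probability)
print(total, label)  # 700 0
```

A label of 0 corresponds to non-Legendary, since 0.354 falls below the 0.5 threshold.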
pokemon-classification/
│
├── PokemonClassification.ipynb # Main Jupyter notebook
├── Pokemon.csv # Dataset (in sample_data/ if using Colab)
├── README.md # This file
└── requirements.txt # Python dependencies (optional)
- The notebook was originally created in Google Colab (note the sample_data/ path)
- Mega Evolutions are included in the dataset; Meganium and Yanmega are excluded when identifying Megas, since they are regular Pokémon whose names merely contain "mega"
- Missing Type 2 values are filled with Type 1 (single-type Pokémon)
- The Legendary column is converted to binary (0/1) for modeling
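The preprocessing notes above could be sketched in pandas roughly as follows. This is an illustration under assumptions, not the notebook's code: the small DataFrame stands in for Pokemon.csv, the rows are illustrative, and detecting Megas by a case-insensitive name match is an assumed approach suggested by the Meganium/Yanmega exclusion.

```python
import pandas as pd

# Tiny stand-in for Pokemon.csv; the rows are illustrative only.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "VenusaurMega Venusaur",
             "Meganium", "Yanmega", "Mewtwo"],
    "Type 1": ["Grass", "Fire", "Grass", "Grass", "Bug", "Psychic"],
    "Type 2": ["Poison", None, "Poison", None, "Flying", None],
    "Legendary": [False, False, False, False, False, True],
})

# Single-type Pokémon have no Type 2; reuse Type 1 so the column has no gaps.
df["Type 2"] = df["Type 2"].fillna(df["Type 1"])

# Dual Type: 1 when the two type columns differ, 0 otherwise.
df["Dual Type"] = (df["Type 1"] != df["Type 2"]).astype(int)

# Mega: name contains "mega" (case-insensitive), except the two regular
# Pokémon whose names happen to contain it.
not_mega = ["Meganium", "Yanmega"]
has_mega = df["Name"].str.contains("mega", case=False)
df["Mega"] = (has_mega & ~df["Name"].isin(not_mega)).astype(int)

# Legendary: True/False -> 1/0 for modeling.
df["Legendary"] = df["Legendary"].astype(int)

print(df[["Name", "Dual Type", "Mega", "Legendary"]])
```

Filling Type 2 before computing Dual Type matters here: once single-type rows carry their Type 1 value in both columns, a simple inequality check separates them from true dual-type Pokémon.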
This is a personal portfolio project, but suggestions and feedback are welcome! Feel free to:
- Open an issue for bugs or suggestions
- Fork the repository and submit pull requests
- Share your own improvements or variations
This project is available under the MIT License. Feel free to use it for learning and portfolio purposes.
- Dataset source: Kaggle Pokémon Dataset
- Pokémon and all related properties are © Nintendo, Game Freak, and The Pokémon Company
- Created as a machine learning portfolio project