tbplong/PokemonClassification

Pokémon Legendary Classification

A machine learning project that predicts whether a Pokémon is Legendary from its base stats, types, and a few engineered features. It demonstrates data preprocessing, feature engineering, and binary classification with Logistic Regression, reaching 94.17% accuracy on a held-out test split.

Project Overview

This project analyzes Pokémon statistics from Generations 1-6 to build a predictive model that classifies Pokémon as Legendary or non-Legendary. The model uses various features including base stats (HP, Attack, Defense, etc.), type information, and engineered features to make predictions.

Key Features:

  • Binary classification task (Legendary vs. Non-Legendary)
  • Comprehensive data preprocessing and feature engineering
  • Feature analysis with correlation matrices
  • Logistic Regression model with 94%+ accuracy
  • Exploratory data analysis with visualizations

Dataset

The dataset contains information about 800 Pokémon from Generations 1-6, including:

Features:

  • #: Pokémon ID number
  • Name: Pokémon name
  • Type 1/Type 2: Pokémon types (e.g., Grass, Poison, Fire)
  • Total: Sum of all base stats
  • HP: Hit Points
  • Attack: Physical attack power
  • Defense: Physical defense
  • Sp. Atk: Special attack power
  • Sp. Def: Special defense
  • Speed: Speed statistic
  • Generation: Generation number (1-6)
  • Legendary: Target variable (True/False)

Engineered Features:

  • Dual Type: Indicator of whether a Pokémon has two different types
  • Mega: Indicator of whether a Pokémon is a Mega Evolution

Source: The dataset is included in the repository as Pokemon.csv
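
As a rough illustration of the two engineered features, the sketch below derives Dual Type and Mega with pandas. It is not taken from the notebook: the tiny in-memory table, the name-matching rule for Megas, and the exclusion list (based on the Meganium/Yanmega note in the Notes section) are assumptions made for the example.

```python
import pandas as pd

# Hand-made sample rows using the column names listed above.
# The real data would come from Pokemon.csv; these rows are illustrative only.
df = pd.DataFrame(
    {
        "Name": [
            "Bulbasaur",
            "VenusaurMega Venusaur",
            "Charmander",
            "Meganium",
            "Yanmega",
            "MewtwoMega Mewtwo X",
        ],
        "Type 1": ["Grass", "Grass", "Fire", "Grass", "Bug", "Psychic"],
        "Type 2": ["Poison", "Poison", None, None, "Flying", "Fighting"],
    }
)

# Dual Type: 1 when a second type exists and differs from the first.
df["Dual Type"] = (
    df["Type 2"].notna() & (df["Type 1"] != df["Type 2"])
).astype(int)

# Mega (assumed rule): the name contains "mega", ignoring case,
# except for Meganium and Yanmega, which are not Mega Evolutions.
not_mega = ["Meganium", "Yanmega"]
df["Mega"] = (
    df["Name"].str.contains("mega", case=False) & ~df["Name"].isin(not_mega)
).astype(int)

print(df[["Name", "Dual Type", "Mega"]])
```

Checking the name against an explicit exclusion list keeps the rule easy to audit, although a stricter pattern could serve the same purpose.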

Technologies Used

  • Python 3.x
  • Libraries:
    • pandas - Data manipulation and analysis
    • numpy - Numerical computations
    • scikit-learn - Machine learning models and preprocessing
    • matplotlib - Data visualization
    • seaborn - Statistical data visualization

Installation

Prerequisites

  • Python 3.7 or higher
  • pip package manager

Setup Instructions

  1. Clone the repository:

    git clone https://github.com/tbplong/PokemonClassification.git
    cd PokemonClassification
  2. Create a virtual environment (recommended):

    python -m venv venv
    
    # On Windows
    venv\Scripts\activate
    
    # On macOS/Linux
    source venv/bin/activate
  3. Install required packages:

    pip install pandas numpy scikit-learn matplotlib seaborn tensorflow notebook

    Or install from requirements file (if available):

    pip install -r requirements.txt
  4. Launch Jupyter Notebook:

    jupyter notebook
  5. Open PokemonClassification.ipynb and run all cells

Usage

  1. Open the Jupyter notebook PokemonClassification.ipynb
  2. Run cells sequentially from top to bottom
  3. The notebook will:
    • Load and explore the Pokémon dataset
    • Perform data preprocessing and feature engineering
    • Train a Logistic Regression model
    • Display model performance metrics
    • Show visualizations and insights

Expected Output:

  • Model accuracy: ~94.17%
  • Correlation heatmap of features
  • Feature importance analysis
  • Predictions on test data
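
The workflow in this section can be sketched end to end with scikit-learn. This is a minimal sketch under stated assumptions, not the notebook's code: the data is synthetic (not the Pokémon dataset), and `random_state`, `max_iter`, and the use of a pipeline are choices made for the example. Only the 70/30 stratified split, MinMaxScaler, and Logistic Regression come from the README.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (NOT the Pokémon dataset): 800 rows, 6 stat-like
# columns, roughly 8% positives whose values are shifted upward.
rng = np.random.default_rng(0)
y = (rng.random(800) < 0.08).astype(int)
X = rng.normal(loc=70, scale=20, size=(800, 6)) + 40 * y[:, None]

# 70/30 stratified split, so both parts keep a similar share of positives.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# The pipeline fits the scaler on the training split only, then the classifier.
model = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy on synthetic data: {accuracy:.4f}")
```

Wrapping the scaler and the model in one pipeline keeps test-set statistics out of the scaling step. The notebook may handle scaling differently.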

Results

Model Performance

  • Algorithm: Logistic Regression
  • Accuracy: 94.17%
  • Train/Test Split: 70/30 (stratified)
  • Scaling: MinMaxScaler for feature normalization
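
To put the accuracy figure in context, the arithmetic below compares it with a majority-class baseline, using only counts stated in this README (800 Pokémon, 65 Legendary). The assumption that the 30% test split holds exactly 240 rows is mine. The baseline is computed on the full dataset, which a stratified test split should closely mirror.

```python
# Counts stated in this README: 800 Pokémon, 65 of them Legendary.
n_total = 800
n_legendary = 65

# A classifier that always answers "non-Legendary" is correct on every
# non-Legendary row and wrong on every Legendary one.
baseline_accuracy = (n_total - n_legendary) / n_total

# Assumed test-set size for a 30% split of 800 rows, and the number of
# correct predictions that a 94.17% accuracy would imply on it.
n_test = round(0.30 * n_total)
n_correct = round(0.9417 * n_test)

print(f"Majority-class baseline accuracy: {baseline_accuracy:.4f}")
print(f"Implied correct test predictions: {n_correct} of {n_test}")
```

Because the classes are imbalanced, precision and recall on the Legendary class would be informative companions to accuracy.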

Key Insights

  1. Strong Predictors:

    • Total base stats show the strongest correlation with Legendary status
    • Individual stats (HP, Attack, Defense, Sp. Atk, Sp. Def, Speed) all contribute positively
  2. Feature Correlations:

    • High multicollinearity exists between Total stats and individual stats
    • Attack and Sp. Atk show positive correlation
    • Defense and Sp. Def are moderately correlated
  3. Type Analysis:

    • Both Type 1 and Type 2 show weak correlation with Legendary status
    • Dual-type Pokémon are slightly more likely to be Legendary
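  4. Mega Evolution:

    • Mega Pokémon show strong correlation with Legendary classification
  5. Class Imbalance:

    • Only 65 of the 800 Pokémon in the dataset are Legendary (8.1%), so accuracy should be read against a majority-class baseline of about 91.9%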
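
The correlation findings above come from a feature correlation matrix. A minimal sketch of how such a matrix can be computed with pandas follows. The numbers in the table are made up for illustration and are not rows from Pokemon.csv.

```python
import pandas as pd

# Made-up values for illustration only (not taken from Pokemon.csv).
df = pd.DataFrame(
    {
        "Total": [318, 405, 525, 680, 600, 309, 580, 495],
        "Attack": [49, 62, 82, 110, 100, 52, 134, 95],
        "Speed": [45, 60, 80, 130, 90, 65, 100, 70],
        "Legendary": [0, 0, 0, 1, 1, 0, 1, 0],
    }
)

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Correlation of each feature with the target, strongest first.
legendary_corr = corr["Legendary"].drop("Legendary").sort_values(ascending=False)
print(legendary_corr)

# A heatmap of the full matrix could then be drawn with seaborn,
# for example: sns.heatmap(corr, annot=True)
```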

Sample Prediction

The model does not flag every high-stat Pokémon as Legendary. For example, a Pokémon with these stats:

  • Type: Dragon/Psychic
  • Total: 700
  • HP: 80, Attack: 130, Defense: 100
  • Sp. Atk: 160, Sp. Def: 120, Speed: 110
  • Generation: 3

Prediction: 35.4% probability of being Legendary. This is below the 50% decision threshold, so the model classifies it as non-Legendary despite its base stat total of 700.
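
The relationship between the reported probability and the final class can be sketched in a few lines. The helper names `sigmoid` and `classify` are hypothetical, introduced only for this example. The 0.354 probability and the 50% threshold are the values reported above.

```python
import math


def sigmoid(z: float) -> float:
    """Map a linear score (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 (Legendary) when the probability reaches the threshold, else 0."""
    return int(probability >= threshold)


p_legendary = 0.354  # probability reported above
label = classify(p_legendary)
print("Legendary" if label else "Non-Legendary")  # prints "Non-Legendary"

# Same decision on the log-odds scale: a probability below 0.5
# corresponds to a negative linear score.
score = math.log(p_legendary / (1 - p_legendary))
print(f"Log-odds: {score:.3f}")
```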

Project Structure

PokemonClassification/
│
├── PokemonClassification.ipynb   # Main Jupyter notebook
├── Pokemon.csv                    # Dataset (in sample_data/ if using Colab)
├── README.md                      # This file
└── requirements.txt              # Python dependencies (optional)

Notes

  • The notebook was originally created in Google Colab (note the sample_data/ path)
  • The dataset includes Mega Evolutions. Meganium and Yanmega are excluded when identifying Megas, since their names contain "mega" but they are not Mega Evolutions
  • Missing Type 2 values are filled with Type 1 (single-type Pokémon)
  • The Legendary column is converted to binary (0/1) for modeling
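
The last two notes can be illustrated with pandas. This is a sketch on a three-row hand-made table, not the notebook's code. Only the column names come from the dataset description.

```python
import pandas as pd

# Hand-made rows for illustration; column names follow the dataset description.
df = pd.DataFrame(
    {
        "Type 1": ["Fire", "Grass", "Psychic"],
        "Type 2": [None, "Poison", None],
        "Legendary": [False, False, True],
    }
)

# Single-type Pokémon have no Type 2, so reuse Type 1 for the missing values.
df["Type 2"] = df["Type 2"].fillna(df["Type 1"])

# Convert the boolean target to 0/1 for modeling.
df["Legendary"] = df["Legendary"].astype(int)

print(df)
```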

Contributing

This is a personal portfolio project, but suggestions and feedback are welcome! Feel free to:

  • Open an issue for bugs or suggestions
  • Fork the repository and submit pull requests
  • Share your own improvements or variations

License

This project is available under the MIT License. Feel free to use it for learning and portfolio purposes.

Acknowledgments

  • Dataset source: Kaggle Pokémon Dataset
  • Pokémon and all related properties are © Nintendo, Game Freak, and The Pokémon Company
  • Created as a machine learning portfolio project
