Galaxy Classification with Deep Learning

This repository documents a deep learning project on galaxy image classification using the Galaxy10 SDSS dataset. The project focuses on two core tasks: implementing a custom convolutional neural network (CNN) and applying transfer learning with DenseNet121.

Both approaches were trained and evaluated on the same dataset within a shared workflow for preprocessing, augmentation, training, and model evaluation.

Project Overview

The repository contains the training pipeline, model evaluation, and selected result visualizations for a deep learning classification task based on galaxy images.

The project covers:

  • data loading and preprocessing
  • stratified train/validation/test splitting
  • image augmentation
  • training a custom CNN
  • transfer learning with DenseNet121
  • model evaluation using accuracy, loss curves, and confusion matrices

Dataset

This project was developed for the Galaxy10 SDSS dataset.

The dataset file Galaxy10.h5 is not included in this repository.
To run the project, download the dataset separately and place it in the repository's data/ directory:

data/Galaxy10.h5
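The following is a minimal sketch of reading the file with h5py. It assumes the HDF5 keys "images" and "ans", which the astroNN Galaxy10 documentation uses for the image array and the integer class labels. The function name load_galaxy10 is hypothetical and does not come from galaxy_classifier_train.py.

```python
import h5py
import numpy as np


def load_galaxy10(path="data/Galaxy10.h5"):
    """Load Galaxy10 SDSS images and integer labels from an HDF5 file.

    Assumption: the file stores images under "images" and labels under "ans"
    (as in the astroNN Galaxy10 docs); adjust the keys if your copy differs.
    """
    with h5py.File(path, "r") as f:
        images = np.array(f["images"])                # expected shape (N, 69, 69, 3), uint8
        labels = np.array(f["ans"]).astype(np.int64)  # class indices 0..9
    # Pixels are returned unscaled (0-255); scaling is left to each model,
    # because DenseNet121 uses its own preprocessing.
    return images, labels
```

Calling images, labels = load_galaxy10() from the repository root reads data/Galaxy10.h5.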

Repository Structure

galaxy-classification-deep-learning/
├── README.md
├── requirements.txt
├── .gitignore
├── data/
│   └── Galaxy10.h5
├── src/
│   └── galaxy_classifier_train.py
├── images/
└── models/

Installation

Create and activate a virtual environment, then install the required dependencies:

pip install -r requirements.txt

Requirements

The final version of this project does not require astroNN as an installation dependency.

A suitable requirements.txt for this repository is:

tensorflow>=2.13,<2.18
keras>=2.13,<3.0
h5py>=3.8,<4.0
numpy>=1.24,<2.0
scikit-learn>=1.3,<1.6
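matplotlib>=3.7,<3.10

TensorFlow 2.16 and later depend on Keras 3, so combined with the keras<3.0 pin these constraints resolve to TensorFlow 2.13 to 2.15.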

Usage

Set the model type in the training script:

MODEL_TYPE = "custom_cnn"   # Options: "custom_cnn" or "densenet121_transfer"

Then run:

python src/galaxy_classifier_train.py

The script will:

  • load the dataset
  • split the data into train, validation, and test sets
  • train the selected model
  • evaluate the trained model
  • save the trained model in models/
  • save result plots in images/
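The stratified train/validation/test split can be sketched with two calls to scikit-learn's train_test_split. Each call passes the labels to stratify, so every split keeps roughly the original class proportions, which matters for a strongly imbalanced dataset. The 70/15/15 proportions, the seed, and the helper name stratified_split are assumptions, not the settings used in the project script.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_split(images, labels, val_size=0.15, test_size=0.15, seed=42):
    """Split data into train/validation/test sets while preserving class proportions.

    The default fractions and seed are illustrative assumptions.
    """
    # Step 1: hold out the test set, stratified on the full label array.
    x_rest, x_test, y_rest, y_test = train_test_split(
        images, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    # Step 2: split the remainder; rescale val_size so it is a fraction
    # of the original dataset rather than of the remaining samples.
    rel_val = val_size / (1.0 - test_size)
    x_train, x_val, y_train, y_val = train_test_split(
        x_rest, y_rest, test_size=rel_val, stratify=y_rest, random_state=seed
    )
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```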

Implemented Approaches

1. Custom CNN

A convolutional neural network built from scratch using multiple Conv2D, MaxPooling2D, Dropout, and Dense layers.
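The sketch below shows a network of this kind in tf.keras. It assumes 69×69 RGB inputs with raw 0 to 255 pixels, ten classes, and integer labels. The filter counts, dropout rates, depth, and the name build_custom_cnn are illustrative assumptions rather than the architecture in galaxy_classifier_train.py.

```python
import tensorflow as tf

NUM_CLASSES = 10  # Galaxy10 has ten morphology classes


def build_custom_cnn(input_shape=(69, 69, 3), num_classes=NUM_CLASSES):
    """Small Conv2D/MaxPooling2D/Dropout/Dense classifier (layer sizes are assumptions)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Rescaling(1.0 / 255),  # map 0-255 pixels to [0, 1]
        # Three conv blocks; each pooling step halves the spatial size (69 -> 34 -> 17 -> 8).
        tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(128, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # labels are integer class indices
        metrics=["accuracy"],
    )
    return model
```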

2. Transfer Learning with DenseNet121

A pretrained DenseNet121 backbone with ImageNet weights, used as a frozen feature extractor and extended with a final classification layer.
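A sketch of the frozen-backbone setup in tf.keras follows. It assumes 69×69 RGB inputs with raw 0 to 255 pixels, integer labels, and global average pooling between the backbone and the classification layer. The name build_densenet_transfer and the optimizer are hypothetical choices, and the weights argument exists only so that weights=None can skip the ImageNet download.

```python
import tensorflow as tf


def build_densenet_transfer(input_shape=(69, 69, 3), num_classes=10, weights="imagenet"):
    """Frozen DenseNet121 feature extractor plus a single softmax classification layer."""
    backbone = tf.keras.applications.DenseNet121(
        include_top=False,      # drop the 1000-class ImageNet head
        weights=weights,
        input_shape=input_shape,
        pooling="avg",          # global average pooling -> 1024-dim feature vector
    )
    backbone.trainable = False  # freeze every pretrained layer

    inputs = tf.keras.Input(shape=input_shape)
    # DenseNet's own input normalization, applied to 0-255 pixel values.
    x = tf.keras.applications.densenet.preprocess_input(inputs)
    # training=False keeps the backbone's BatchNormalization layers in inference mode.
    x = backbone(x, training=False)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

With the backbone frozen, only the final Dense layer's kernel and bias are updated during training. Calling the backbone with training=False also matters if it is later unfrozen for fine-tuning, because the BatchNormalization statistics then stay fixed.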

Data Augmentation

The final training pipeline uses geometric augmentation:

  • random flipping
  • random rotation

Color-channel-based augmentation was also explored during development but was left out of the final pipeline because it did not yield a robust performance improvement on this dataset.
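Flip-and-rotate augmentation can be expressed with Keras preprocessing layers. This sketch applies them inside a tf.data pipeline for training batches only, leaving validation and test data untouched. The rotation range (a factor of 0.1 of a full turn, up to ±36°), the fill mode, the batch size, and the helper make_train_dataset are illustrative assumptions.

```python
import tensorflow as tf

# Geometric augmentation only: random flips plus random rotations.
# Assumed settings; the project script may use different values.
data_augmentation = tf.keras.Sequential(
    [
        tf.keras.layers.RandomFlip("horizontal_and_vertical"),
        tf.keras.layers.RandomRotation(0.1, fill_mode="reflect"),
    ],
    name="data_augmentation",
)


def make_train_dataset(images, labels, batch_size=64, seed=42):
    """Build a shuffled, batched tf.data pipeline with augmentation applied per batch."""
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    ds = ds.shuffle(len(images), seed=seed).batch(batch_size)
    # training=True activates the random layers outside of model.fit.
    ds = ds.map(
        lambda x, y: (data_augmentation(tf.cast(x, tf.float32), training=True), y),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    return ds.prefetch(tf.data.AUTOTUNE)
```

Alternatively, data_augmentation can be placed as the first layer of a model. Keras then applies the random transformations only during training and passes inputs through unchanged at inference.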

Evaluation

Model performance is evaluated using:

  • training and validation accuracy
  • training and validation loss
  • test set evaluation
  • normalized confusion matrix
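A sketch of the normalized confusion matrix with scikit-learn follows. With normalize="true", each row is divided by the number of true samples in that class, so the diagonal holds per-class recall. This keeps small classes readable, whereas raw counts on an imbalanced dataset are dominated by the largest classes. The function names, output path, and plot styling are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render to file without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix


def normalized_confusion_matrix(y_true, y_pred, num_classes=10):
    """Row-normalized confusion matrix: each row sums to 1, diagonal = per-class recall."""
    return confusion_matrix(
        y_true, y_pred, labels=np.arange(num_classes), normalize="true"
    )


def save_confusion_matrix(y_true, y_pred, path="images/confusion_matrix.png", num_classes=10):
    """Plot the normalized matrix and save it as an image; returns the matrix."""
    cm = normalized_confusion_matrix(y_true, y_pred, num_classes)
    disp = ConfusionMatrixDisplay(cm, display_labels=np.arange(num_classes))
    fig, ax = plt.subplots(figsize=(8, 8))
    disp.plot(ax=ax, values_format=".2f", cmap="Blues", colorbar=False)
    ax.set_title("Normalized confusion matrix (test set)")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
    return cm
```

With a trained Keras model, the predicted classes would typically come from y_pred = np.argmax(model.predict(x_test), axis=1).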

Notes

  • This repository is intended as a portfolio project and demonstrates a deep learning workflow for image classification rather than a deployment-ready end-user application.
  • The dataset is strongly imbalanced across classes.
  • Extremely small classes make some balancing strategies less useful in practice.

Acknowledgments

This project builds on the Galaxy10 dataset resources published by Henry Leung (henrysky). The Galaxy10 repository also credits Jo Bovy as co-author of the dataset work.


During development, the project also drew on the surrounding documentation and examples from astroNN, an astronomy-focused deep learning project created by Henry Leung, with Jo Bovy credited in the project documentation.


Authors

  • Andreas Schulz
  • Stefan Anell