🖼️ Image Caption Generator

Automatically generate captions for images using a deep learning model.

🔗 Live App: https://image-caption-generator-using-cnn-lstm-nddimension.streamlit.app/

📓 Notebook: https://www.kaggle.com/code/nddimension/image-captioning-using-cnn-rnn

🚀 Model: https://drive.google.com/file/d/1d-qOyZaU34_N-cxEDtFPG9iApCbLrlmu/view?usp=drive_link

🗣️ Dataset: https://drive.google.com/file/d/1QNCjQCsQBoxlMyc9WFM_fg2LU5NEh5iJ/view?usp=drive_link


🎯 Project Overview

Image Caption Generator is a web app that generates natural-language descriptions of images. It pairs a CNN, which extracts image features, with an LSTM decoder, which turns those features into a caption one word at a time.

📷 Upload or select a sample image
🧠 AI generates descriptive captions
🗣️ Powered by a pre-trained CNN-LSTM model
🚀 Interactive and educational experience

✅ Pre-trained models loaded automatically
✅ Sample images included for quick testing
✅ Supports image uploads (JPG, PNG, JPEG)
✅ Built with Streamlit for ease of use


🔍 Features

| Feature | Description |
|---|---|
| 🖼️ Image Upload | Upload your own image or select from sample images |
| 🧠 AI Captioning | Generate natural-language captions using deep learning |
| 📝 Caption Display | Clean, styled caption output with real-time preview |
| ⚙️ Model Caching | Speeds up inference using Streamlit caching |
| 📖 Educational Sections | Learn how the model architecture works |
| 🔍 Debug Mode | Optional debug panel for technical details |
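
The Model Caching row is easier to picture with a small example. The sketch below is not the app's code: it uses Python's standard `functools.lru_cache` as a stand-in to show the idea of loading a model once and reusing it on later calls. The loader name, the file name, and the returned dictionary are hypothetical. In a Streamlit app, the `st.cache_resource` decorator plays a similar role, though which caching decorator this project uses is an assumption here.

```python
from functools import lru_cache

load_count = 0  # counts how many times the expensive load actually runs


@lru_cache(maxsize=1)
def load_captioning_model(path: str) -> dict:
    """Hypothetical loader standing in for reading model weights from disk."""
    global load_count
    load_count += 1
    return {"path": path, "weights": [0.1, 0.2, 0.3]}


# The second call with the same argument is served from the cache,
# so the loader body runs only once and the same object is returned.
first = load_captioning_model("model.keras")
second = load_captioning_model("model.keras")
print(first is second, load_count)  # prints: True 1
```

Without caching, an interactive app would typically reload the model on every user interaction, since Streamlit reruns the script each time a widget changes.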

📌 Workflow

  1. Load Pre-trained Models

  2. Image Preprocessing

    • Resize, normalize, and format image for the CNN
  3. Feature Extraction

    • CNN extracts image features (e.g., ResNet, Inception)
  4. Caption Generation

    • LSTM decoder predicts words one by one (auto-regressive)
  5. Display Output

    • Caption is cleaned and shown in real time
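
The preprocessing step in the workflow above (resize, normalize, format) can be sketched in plain Python. This is a minimal illustration on nested lists, not the app's implementation: a real pipeline would normally use an image library, and the 224x224 default is an assumption (the required input size depends on which CNN is used). All function names are hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2D grid of pixels using nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image):
    """Scale 0-255 RGB channel values to floats in [0, 1]."""
    return [[tuple(ch / 255.0 for ch in px) for px in row] for row in image]


def preprocess(image, size=(224, 224)):
    """Resize, normalize, then wrap in a list to add a batch dimension."""
    resized = resize_nearest(image, size[0], size[1])
    return [normalize(resized)]


# A tiny 2x2 RGB image, upscaled to 4x4 to keep the example readable.
tiny = [[(0, 0, 0), (255, 255, 255)], [(255, 0, 0), (0, 0, 255)]]
batch = preprocess(tiny, size=(4, 4))
print(len(batch), len(batch[0]), len(batch[0][0]))  # prints: 1 4 4
```

The leading batch dimension is there because CNN feature extractors generally expect a batch of images, even when captioning a single one.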

⚙️ How It Works

  1. Architecture

    • A CNN (e.g., ResNet) is used to extract image features
    • A pre-trained LSTM model takes these features and generates a caption word-by-word
  2. Tokenizer & Sequence

    • A tokenizer encodes/decodes the text data
    • Input sequences are padded to a fixed max length
  3. Inference

    • Starts with the token startseq
    • Predicts next word using softmax
    • Ends at endseq or when max length is reached
  4. Interface

    • Streamlit UI allows users to upload images or choose from samples
    • Captions are generated and displayed on the same page
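
The tokenizer, padding, and inference steps above can be sketched as a small greedy decoding loop. This is an illustration under stated assumptions, not the project's code: the six-word vocabulary, the left-padding choice, and `toy_next_word_probs` (a scripted stand-in for the LSTM decoder's softmax output) are all hypothetical. Only the `startseq`/`endseq` tokens and the stop conditions come from the description above.

```python
START, END = "startseq", "endseq"
VOCAB = ["<pad>", START, END, "a", "dog", "runs"]  # toy vocabulary (assumption)
WORD_TO_ID = {word: i for i, word in enumerate(VOCAB)}
MAX_LEN = 6


def encode(words):
    """Tokenizer encode step: map words to integer ids."""
    return [WORD_TO_ID[w] for w in words]


def pad(ids, max_len=MAX_LEN):
    """Left-pad with 0 (the <pad> id) to a fixed length."""
    ids = ids[-max_len:]
    return [0] * (max_len - len(ids)) + ids


def toy_next_word_probs(image_features, padded_ids):
    """Hypothetical decoder: returns a probability for each word in VOCAB."""
    script = {START: "a", "a": "dog", "dog": "runs", "runs": END}
    target = script.get(VOCAB[padded_ids[-1]], END)
    rest = 0.1 / (len(VOCAB) - 1)
    return [0.9 if w == target else rest for w in VOCAB]


def greedy_caption(image_features, max_len=MAX_LEN):
    """Start at startseq, pick the most likely word each step, stop at endseq."""
    words = [START]
    while len(words) < max_len:
        probs = toy_next_word_probs(image_features, pad(encode(words), max_len))
        next_word = VOCAB[probs.index(max(probs))]  # argmax over the distribution
        if next_word == END:
            break
        words.append(next_word)
    return " ".join(words[1:])  # drop startseq from the final caption


print(greedy_caption(image_features=None))  # prints: a dog runs
```

In the real app the decoder would condition on the CNN features; the toy function ignores them so the example stays self-contained.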

🎹 App Preview

🧠 Image + Caption

[Image: Main]


📦 Requirements

Install everything using:

pip install -r requirements.txt

🚀 Getting Started

1️⃣ Clone the repository

git clone https://github.com/NDDimension/Image-Caption-Generator-using-CNN-LSTM.git
cd Image-Caption-Generator-using-CNN-LSTM

2️⃣ Install Dependencies

pip install -r requirements.txt

3️⃣ Run the Streamlit App

streamlit run app.py

✨ Highlights

✅ Automatic download of pre-trained models and tokenizer
✅ Streamlit-based interactive interface
✅ Works with sample and user-uploaded images
✅ Educational explanations included
✅ Debug mode for inspecting internals

🔮 Future Improvements

🧠 Add beam search for more accurate caption generation
🌐 Deploy to HuggingFace Spaces
📤 Allow batch caption generation
🗂️ Add support for custom training datasets
🎯 Add attention visualization for interpretability
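
To illustrate why beam search is on this list, here is a minimal sketch of the technique on a toy decoder. It is not part of the project: the probability table and function names are hypothetical, chosen so that the greedy choice (beam width 1) misses the most probable caption.

```python
import math

END = "endseq"


def toy_next_word_probs(words):
    """Hypothetical decoder: next-word probabilities given the last word."""
    table = {
        "startseq": {"a": 0.6, "the": 0.4},
        "a": {"dog": 0.55, "cat": 0.45},
        "the": {"dog": 0.9, "cat": 0.1},
        "dog": {END: 1.0},
        "cat": {END: 1.0},
    }
    return table[words[-1]]


def beam_search(beam_width=2, max_len=5):
    """Keep the beam_width best partial captions by total log-probability."""
    beams = [(0.0, ["startseq"])]  # (log-probability, words so far)
    for _ in range(max_len):
        candidates = []
        for score, words in beams:
            if words[-1] == END:
                candidates.append((score, words))  # finished captions carry over
                continue
            for word, p in toy_next_word_probs(words).items():
                candidates.append((score + math.log(p), words + [word]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(words[-1] == END for _, words in beams):
            break
    best_words = beams[0][1]
    return " ".join(w for w in best_words if w not in ("startseq", END))


print(beam_search(beam_width=1))  # greedy: "a dog" (probability 0.33)
print(beam_search(beam_width=2))  # beam:   "the dog" (probability 0.36)
```

Greedy decoding commits to "a" because it is the most likely first word, while a beam of width 2 keeps "the" alive long enough to find the higher-probability caption. This sketch omits length normalization, which practical implementations often add.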


🙌 Credits & Contributors

Notebook Revamped & Curated by: NISHTHA SHARMA

📌 GitHub: https://github.com/711nishtha

📌 Kaggle: https://www.kaggle.com/nishtha711

App and Training by: DHANRAJ SHARMA

📌 GitHub: https://github.com/NDDimension

Inspired by:


📜 License

Licensed under the MIT License.

Image Caption Generator: AI that sees and speaks. ❤️ Made with love by Dhanraj Sharma.