Welcome to Urban Sounds Flask Deployment, a dynamic project blending deep learning and web development to classify urban sounds. This repository showcases an end-to-end pipeline, from data preprocessing to deploying a Flask-based application capable of predicting urban sound classes with high accuracy.
The project is powered by a Convolutional Neural Network (CNN) trained on the UrbanSound8K dataset, achieving 94.5% accuracy. Through an intuitive user interface, users can upload audio files or image files (such as spectrograms) and receive real-time predictions for urban sound categories such as sirens, dog barks, or engine idling. Supporting both input types makes the platform versatile and broadens its practical usability.
Whether you are a researcher exploring urban noise or a developer looking to deploy an AI-powered application, this repository offers a seamless integration of machine learning and web deployment.
- AI Roadmap: Comprehensive pipeline covering EDA, preprocessing, hyperparameter tuning with Optuna, model training, testing, and performance visualization.
- Flexible Deployment: Supports both Development (SQLite) and Production (PostgreSQL) environments for seamless scalability.
- Interactive Web App: Built using Flask, featuring clean UI/UX, dark mode support, and a database to manage predictions.
- Visual Insights: Includes spectrogram visualizations, loss and accuracy plots, and model architecture diagrams.
Dive into the repository to explore the intersection of AI innovation and practical deployment! 🚀
The UrbanSound8K dataset is a collection of 8,732 labeled sound excerpts (≤ 4s) from field recordings. It contains 10 urban sound categories, including:
URBANSOUND (Image by Author)
The dataset is structured into 10 cross-validation folds and provides metadata such as class labels, fold IDs, and file paths. It’s a valuable resource for audio classification tasks. Learn more on the UrbanSound8K dataset homepage.
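The fold-based split described above can be sketched as follows. This is a minimal illustration using mocked rows with the real metadata's column names (`slice_file_name`, `fold`, `classID`, `class`); the actual file lives at `UrbanSound8K/metadata/UrbanSound8K.csv`.

```python
import pandas as pd

# Mock a few rows with the same schema as UrbanSound8K.csv.
metadata = pd.DataFrame({
    "slice_file_name": ["100032-3-0-0.wav", "100263-2-0-117.wav", "100648-1-0-0.wav"],
    "fold": [5, 5, 10],
    "classID": [3, 2, 1],
    "class": ["dog_bark", "children_playing", "car_horn"],
})

# Hold out one fold for testing and train on the rest. The dataset's
# recommended protocol repeats this over all 10 folds (cross-validation).
test_fold = 10
train_df = metadata[metadata["fold"] != test_fold]
test_df = metadata[metadata["fold"] == test_fold]

print(len(train_df), len(test_df))  # 2 1
```

Because clips from the same source recording always share a fold, splitting by fold (rather than by row) avoids leaking near-duplicate audio between train and test sets.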
The AI pipeline for this project follows these steps:
- EDA & Data Visualization: Initial data exploration and spectrogram visualizations.
- Feature Extraction: Generating audio features and spectrograms (optional).
- Preprocessing: Data normalization and augmentation.
- Train-Test Split: Splitting the data for training and testing.
- Hyperparameter Tuning: Using Optuna to optimize CNN architecture (optional).
- Model Compilation & Training: Training a CNN using Keras/TensorFlow.
- Performance Evaluation: Visualizing accuracy, loss, confusion matrix, and predictions.
- Model Export: Saving the best-performing model (UrbanSound_CNN_94.5.h5).
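The model-building step of the pipeline might look like the sketch below. The input shape `(128, 128, 1)` and the layer sizes are illustrative assumptions, not the architecture of the exported `UrbanSound_CNN_94.5.h5` model (which was tuned with Optuna).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(128, 128, 1), num_classes=10):
    """Build a small CNN for spectrogram classification (illustrative)."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        # 10 outputs, one per UrbanSound8K class.
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
model.summary()
```

After training, `model.save("UrbanSound_CNN_94.5.h5")` produces the exported artifact referenced above.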
The Flask app transforms the AI model into an interactive web application:
- Development Config: Uses SQLite for local testing.
- Production Config: Supports PostgreSQL for scalable deployment.
- UI/UX Features:
- Upload audio files and image files for predictions.
- Displays spectrograms of uploaded audio files.
- Shows predictions for both audio and image categories.
- Key Components:
- app.py: The main entry point for the Flask application.
- templates/: Contains HTML files for the web pages.
- static/: Holds CSS, JavaScript, and image assets.
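The development/production split and the prediction endpoint can be sketched as below. The config class names and the `/predict` route are illustrative assumptions, not the repository's actual `app.py`; the real handler converts the upload to a spectrogram and runs the CNN.

```python
from flask import Flask, request, jsonify

# Illustrative config classes: SQLite for local development,
# PostgreSQL for production (credentials are placeholders).
class DevelopmentConfig:
    DEBUG = True
    SQLALCHEMY_DATABASE_URI = "sqlite:///predictions.db"

class ProductionConfig:
    DEBUG = False
    SQLALCHEMY_DATABASE_URI = "postgresql://user:password@localhost/urbansound"

app = Flask(__name__)
app.config.from_object(DevelopmentConfig)

@app.route("/predict", methods=["POST"])
def predict():
    # In the real app, the uploaded file would be preprocessed and
    # passed to the CNN; here we only validate that a file arrived.
    if "file" not in request.files:
        return jsonify({"error": "no file uploaded"}), 400
    return jsonify({"class": "placeholder"}), 200

if __name__ == "__main__":
    app.run()
```

Swapping `DevelopmentConfig` for `ProductionConfig` (or selecting it via an environment variable) is all that changes between local testing and deployment.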
- High-Accuracy Model: CNN achieving 94.5% classification accuracy.
- Interactive App: User-friendly interface for real-time predictions on audio and image files.
- Data Insights: Includes spectrogram and model performance visualizations.
- Customizable Deployment: Scales from development to production environments.
Project Demo (Image by Author)
Project Dark Mode Demo (Image by Author)
- Python 3.8+
- Libraries: TensorFlow, Keras, Librosa, Optuna, Flask, NumPy, Pandas, Matplotlib, Seaborn
- Install dependencies with:
pip install -r requirements.txt


