Audio DeepFake Detection using CNN-BiLSTM

APP Demo

Audio-DeepFake-Demo.mp4

Overview

This project detects audio deepfakes with a hybrid model that combines Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory networks (BiLSTM). The system classifies audio clips as genuine or fake, addressing the growing problem of audio-based misinformation.

Key Features

Hybrid Model Architecture: Combines the feature extraction power of CNNs with the sequential processing capabilities of BiLSTMs.
High Accuracy: Reports 98.3% accuracy, 97.8% precision, and 98.8% recall (see Results), making it suitable for practical applications.
Research Contribution: Includes detailed insights and a research paper explaining the methodology and findings.

Table of Contents

Overview
Key Features
Dataset
Model Architecture
Installation
Results
Future Work
Contributing
Acknowledgments

Dataset

- Real audio: https://www.kaggle.com/datasets/mathurinache/the-lj-speech-dataset
- Fake audio: https://www.kaggle.com/datasets/andreadiubaldo/wavefake-test
- Each audio recording is labeled as either genuine or deepfake.
- Preprocessing steps involve:
  - Feature extraction using Mel-frequency cepstral coefficients (MFCCs).
  - Data augmentation techniques to enhance model robustness.
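
The preprocessing steps above can be illustrated with a minimal sketch. This is not the project's actual preprocessing code: it is a from-scratch NumPy version of MFCC extraction, plus one possible augmentation (additive white noise), since the README does not say which augmentations are used. The function names and every parameter value (`n_fft`, `hop`, `n_mels`, `n_mfcc`, the 22,050 Hz sample rate) are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale: (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def dct2(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))  # (n_out, n)
    return x @ basis.T

def mfcc(signal, sr=22050, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Return MFCCs with shape (n_frames, n_mfcc). All defaults are assumed values."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)             # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)                 # epsilon avoids log(0)
    return dct2(log_mel, n_mfcc)

def add_noise(signal, snr_db, rng):
    """Hypothetical augmentation: add white Gaussian noise at a given SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

# One second of a 440 Hz tone stands in for a real recording.
tone = np.sin(2 * np.pi * 440 * np.arange(22050) / 22050)
features = mfcc(add_noise(tone, snr_db=20.0, rng=np.random.default_rng(0)))
print(features.shape)  # (85, 13)
```

In practice a library such as librosa is commonly used for MFCC extraction; the sketch only shows what the feature matrix fed to the model looks like (one row of coefficients per time frame).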

Model Architecture

The model leverages the strengths of:

CNN:
- Extracts spatial features from MFCCs.
- Efficiently identifies patterns and anomalies.
BiLSTM:
- Processes sequential data to capture temporal dependencies.
- Bidirectional design uses both past and future context at each time step.
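
To make the data flow concrete, here is a shape-level sketch of a CNN-BiLSTM forward pass in plain NumPy. It is not the project's model: the README does not specify the framework, layer counts, or layer sizes, so every size below (kernel width 3, 16 channels, 8 hidden units) and every function name is an assumption, and the weights are random rather than trained. A real implementation would normally be built with a deep learning framework.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_relu(x, kernels, bias):
    """x: (T, F); kernels: (K, F, C). Valid convolution over time, then ReLU -> (T-K+1, C)."""
    K = kernels.shape[0]
    T_out = x.shape[0] - K + 1
    out = np.stack([np.tensordot(x[t:t + K], kernels, axes=([0, 1], [0, 1]))
                    for t in range(T_out)])
    return np.maximum(out + bias, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling over the time axis."""
    T = (x.shape[0] // size) * size
    return x[:T].reshape(-1, size, x.shape[1]).max(axis=1)

def lstm(x, W, U, b):
    """One LSTM direction. x: (T, D); W: (D, 4H); U: (H, 4H); b: (4H,). Returns (T, H)."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x_t in x:
        gates = x_t @ W + h @ U + b
        i, f, o = sigmoid(gates[:H]), sigmoid(gates[H:2 * H]), sigmoid(gates[2 * H:3 * H])
        g = np.tanh(gates[3 * H:])
        c = f * c + i * g          # update the cell state
        h = o * np.tanh(c)         # emit the hidden state
        outputs.append(h)
    return np.stack(outputs)

def bilstm(x, fwd, bwd):
    """Concatenate a left-to-right pass with a right-to-left pass: (T, 2H)."""
    fwd_out = lstm(x, *fwd)
    bwd_out = lstm(x[::-1], *bwd)[::-1]   # run reversed, then restore time order
    return np.concatenate([fwd_out, bwd_out], axis=1)

def init_params(rng, n_mfcc=13, kernel=3, channels=16, hidden=8):
    """Random, untrained weights; all sizes are hypothetical."""
    def lstm_params(d):
        return (rng.normal(0, 0.1, (d, 4 * hidden)),
                rng.normal(0, 0.1, (hidden, 4 * hidden)),
                np.zeros(4 * hidden))
    return {
        "conv_w": rng.normal(0, 0.1, (kernel, n_mfcc, channels)),
        "conv_b": np.zeros(channels),
        "fwd": lstm_params(channels),
        "bwd": lstm_params(channels),
        "out_w": rng.normal(0, 0.1, 2 * hidden),
        "out_b": 0.0,
    }

def predict_fake_probability(mfcc_frames, params):
    """MFCCs (T, n_mfcc) -> CNN -> BiLSTM -> mean over time -> sigmoid score."""
    feats = max_pool(conv1d_relu(mfcc_frames, params["conv_w"], params["conv_b"]))
    seq = bilstm(feats, params["fwd"], params["bwd"])
    return sigmoid(seq.mean(axis=0) @ params["out_w"] + params["out_b"])

rng = np.random.default_rng(0)
params = init_params(rng)
dummy_mfcc = rng.normal(size=(85, 13))   # stand-in for real MFCC features
print(predict_fake_probability(dummy_mfcc, params))  # a value in (0, 1), meaningless until trained
```

The `bilstm` function shows why the bidirectional design matters: the backward half of the output at the first time step already depends on the last frame of the clip, which a unidirectional LSTM could not see.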

Installation

Clone the repository:

```
git clone https://github.com/VivekShinde7/Audio-DeepFake-Detection-using-CNN-BiLSTM.git
cd Audio-DeepFake-Detection-using-CNN-BiLSTM
```

Create a virtual environment (optional but recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Run app.py:
```
streamlit run app.py
```

Results

Performance Metrics:
- Accuracy: 98.3%
- Precision: 97.8%
- Recall: 98.8%
Visualizations of the confusion matrix, system architecture, and evaluation are available in the results folder.
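
For readers who want to see how these three metrics relate to a confusion matrix, the following sketch computes them from raw counts. The counts are hypothetical and are not the project's results; treating "fake" as the positive class is also an assumption, since the README does not state which class is positive.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    Assumes "fake" is the positive class:
      tp = fake clips flagged as fake, fp = genuine clips flagged as fake,
      tn = genuine clips passed as genuine, fn = fake clips missed.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all clips classified correctly
        "precision": tp / (tp + fp),      # share of "fake" predictions that were right
        "recall": tp / (tp + fn),         # share of actual fakes that were caught
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics["accuracy"])  # 0.925
print(metrics["recall"])    # 0.9
```

High recall with slightly lower precision, as in the reported figures, would mean that under this convention the model misses few fakes at the cost of occasionally flagging genuine audio.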

Future Work

Enhance the dataset to include diverse languages and accents.
Optimize the model for real-time detection.
Explore integrating transformer-based architectures such as wav2vec 2.0.

Contributing

Contributions are welcome! Please follow these steps:

Fork the repository.
Create a feature branch:
```
git checkout -b feature-name
```
Commit your changes:
```
git commit -m "Add your message here"
```
Push to the branch:
```
git push origin feature-name
```
Create a pull request.

Acknowledgments

Special thanks to open-source contributors and dataset providers.
Inspiration drawn from advancements in audio deepfake detection research.

For queries or suggestions, feel free to open an issue or contact Vivek Shinde.
