TinyML NLP for Semantic Wireless Sentiment Classification

This repository contains implementations of various approaches for performing sentiment analysis using TinyML techniques in a wireless communication context. The goal is to provide energy-efficient, privacy-preserving methods for natural language processing on resource-constrained devices.

Overview

The code implements three main learning paradigms for NLP sentiment classification:

Centralized Learning (CL) - Traditional approach where raw data is sent to a central server
Federated Learning (FL) - Distributed approach where model updates are shared
Split Learning (SL) - Hybrid approach where the model is split between client and server

All implementations are designed with resource constraints in mind and utilize the Sentiment140 dataset for tweet sentiment classification.
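
To make the FL paradigm more concrete, the sketch below shows weighted federated averaging, a common way for a server to combine client updates. This README does not state which aggregation rule the notebooks use, so treat the rule, the function name `federated_average`, and the toy numbers as assumptions for illustration only.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average flattened client weight vectors, weighted by local sample count."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    averaged = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Two hypothetical clients holding 100 and 300 local samples.
global_weights = federated_average([[1.0, 2.0], [3.0, 6.0]], [100, 300])
print(global_weights)
```

Only the weight vectors leave the clients in this scheme; the raw tweets stay on the device, which is the privacy argument for FL over CL.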

Main Files

SentimentEco2AiCentral_Final.ipynb: Implementation of centralized learning
SentimentEco2Ai_FLnoQuan.ipynb: Federated learning without quantization
SentimentEco2Ai_Q8.ipynb: Federated learning with 8-bit quantization
SentimentEco2Ai_Q8FN.ipynb: Federated learning with 8-bit quantization and fading/noise
SentimentEco2Ai_Q8N_Final.ipynb: Final federated learning implementation with 8-bit quantization and noise
SentimentEco2AiSplitLearning_Final.ipynb: Split learning implementation

Key Features

Implementations of FL, CL, and SL for sentiment analysis
Energy consumption tracking using Eco2AI
Privacy evaluation through reconstruction error metrics
Simulation of wireless channel effects (Rayleigh fading and noise)
Quantization techniques for efficient communication
Communication energy optimization
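
As a rough illustration of the quantization feature, the sketch below applies uniform 8-bit (min/max affine) quantization to a list of weights before transmission. The exact scheme used in the notebooks is not described in this README, so the scheme and the names `quantize_uint8` and `dequantize_uint8` are assumptions.

```python
from typing import List, Tuple


def quantize_uint8(values: List[float]) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 255] using a min/max affine scale."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant input
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_uint8(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]


# Hypothetical weight values, chosen only for illustration.
weights = [-0.5, 0.0, 0.25, 0.5]
codes, scale, lo = quantize_uint8(weights)
restored = dequantize_uint8(codes, scale, lo)
print(codes)
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Each weight then costs one byte instead of four for float32, at the price of a rounding error of at most half a quantization step per value.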

Dependencies

To run the notebooks, you'll need:

tensorflow>=2.0.0
keras
numpy
pandas
matplotlib
nltk
eco2ai

You'll also need the Sentiment140 dataset (see Usage below) and the NLTK stop-word list, which can be downloaded with:

```
import nltk
nltk.download('stopwords')
```

Usage

Download or clone the repository
Download the Sentiment140 dataset:
- Dataset: Sentiment140
- Download file: training.1600000.processed.noemoticon.csv
- Place it in the main directory

Install required dependencies:

```
pip install tensorflow keras numpy pandas matplotlib nltk eco2ai
```

Configure the environment:
```
import nltk
nltk.download('stopwords')
```
Run the notebooks in the following order to fully replicate the experiments:
- First run SentimentEco2AiCentral_Final.ipynb to establish the centralized learning baseline
- Then run SentimentEco2Ai_Q8N_Final.ipynb for federated learning with 8-bit quantization
- Finally run SentimentEco2AiSplitLearning_Final.ipynb for the split learning approach

Implementation Details for Replication

Data Preprocessing

All notebooks follow similar preprocessing steps:

Read the Sentiment140 CSV file (with Latin encoding)
Extract only the sentiment labels and text
Convert sentiment labels (0 for negative, 4 for positive) to "Negative" and "Positive"
Remove stop words, URLs, special characters, and usernames
Limit vocabulary to 10,000 most frequent words
Pad sequences to a maximum length of 30
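
The steps above can be sketched in plain Python as follows. This is a simplified stand-in, not the notebook code: the regular expressions, the lowercasing step, the tiny `STOP_WORDS` set (the notebooks use NLTK's list), and the front-padding convention are all assumptions.

```python
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stop-word set; the notebooks use NLTK's English list.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "i", "my", "at"}
LABELS = {0: "Negative", 4: "Positive"}  # Sentiment140 label codes
MAX_LEN = 30
VOCAB_SIZE = 10_000


def clean_tweet(text: str) -> List[str]:
    """Lowercase, then drop usernames, URLs, special characters, and stop words."""
    text = re.sub(r"@\S+|https?://\S+|www\.\S+", " ", text.lower())
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_vocab(token_lists: List[List[str]], size: int = VOCAB_SIZE) -> Dict[str, int]:
    """Index the `size` most frequent words from 1; 0 is reserved for padding."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {word: i for i, (word, _) in enumerate(counts.most_common(size), start=1)}


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int = MAX_LEN) -> List[int]:
    """Map tokens to ids, drop out-of-vocabulary words, and zero-pad at the front."""
    ids = [vocab[t] for t in tokens if t in vocab][-max_len:]
    return [0] * (max_len - len(ids)) + ids


# Two made-up rows in (label, text) form.
rows = [(0, "@bob I hate Mondays!!! http://t.co/xyz"), (4, "Loving the new phone :)")]
tokens = [clean_tweet(text) for _, text in rows]
vocab = build_vocab(tokens)
print([LABELS[label] for label, _ in rows])
print(tokens[0])
print(encode(tokens[0], vocab))
```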

Model Architecture

All approaches use a similar architecture with:
- Embedding layer (8 dimensions)
- Conv1D layer with 32 filters and kernel size 3
- BatchNormalization
- MaxPooling1D
- LSTM layer with 32 units
- Dense layer with 16 units
- Output layer with sigmoid activation
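
To give a sense of how small this model is, the sketch below counts parameters per layer using the standard formulas for each layer type. It assumes the embedding table has exactly 10,000 rows, the Conv1D, LSTM, and Dense layers use biases, and the output layer has a single sigmoid unit; none of these details are confirmed by this README.

```python
VOCAB_SIZE = 10_000   # from the preprocessing settings
EMBED_DIM = 8
CONV_FILTERS = 32
KERNEL_SIZE = 3
LSTM_UNITS = 32
DENSE_UNITS = 16
OUTPUT_UNITS = 1      # assumption: single sigmoid unit for binary sentiment

params = {
    "embedding": VOCAB_SIZE * EMBED_DIM,
    "conv1d": (KERNEL_SIZE * EMBED_DIM + 1) * CONV_FILTERS,
    "batch_norm": 4 * CONV_FILTERS,  # gamma, beta, moving mean, moving variance
    "max_pooling": 0,                # pooling has no parameters
    "lstm": 4 * (CONV_FILTERS + LSTM_UNITS + 1) * LSTM_UNITS,
    "dense": (LSTM_UNITS + 1) * DENSE_UNITS,
    "output": (DENSE_UNITS + 1) * OUTPUT_UNITS,
}
total = sum(params.values())

for layer, count in params.items():
    print(f"{layer:>12}: {count}")
print(f"total parameters: {total}")
print(f"float32: {total * 4 / 1024:.1f} KiB, int8: {total / 1024:.1f} KiB")
```

Under these assumptions the total is 89,793 parameters, of which 80,000 sit in the embedding table, so the embedding dominates both the memory footprint and the size of any transmitted model update.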

Wireless Channel Simulation

To replicate the wireless channel effects:

Binary Phase Shift Keying (BPSK) modulation is used
The Rayleigh fading coefficient is computed as f = np.sqrt(X**2 + Y**2), where X and Y are independent zero-mean Gaussian random variables
Additive white Gaussian noise (AWGN) n is added at varying SNR levels
The received signal is modeled as Z̃ = f·Z + n, where Z is the transmitted signal
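
A minimal sketch of this channel model, written with the standard library only (the notebooks use NumPy). The bit-to-symbol mapping, the variance of 0.5 for X and Y (which gives unit average fading power), the definition of SNR as symbol power over noise power, and the sign detector are assumptions made for this example.

```python
import math
import random
from typing import List


def transmit_bits(bits: List[int], snr_db: float, rng: random.Random) -> List[int]:
    """Send bits over a Rayleigh-fading AWGN channel with BPSK; return detected bits."""
    noise_std = math.sqrt(1.0 / 10 ** (snr_db / 10))  # unit symbol power assumed
    detected = []
    for bit in bits:
        z = 1.0 if bit else -1.0              # BPSK mapping: 0 -> -1, 1 -> +1
        x = rng.gauss(0.0, math.sqrt(0.5))
        y = rng.gauss(0.0, math.sqrt(0.5))
        f = math.sqrt(x ** 2 + y ** 2)        # Rayleigh fading coefficient
        n = rng.gauss(0.0, noise_std)         # AWGN sample
        z_tilde = f * z + n                   # received symbol
        detected.append(1 if z_tilde >= 0 else 0)  # sign detector, valid since f >= 0
    return detected


def bit_error_rate(snr_db: float, num_bits: int = 20_000, seed: int = 0) -> float:
    """Estimate the fraction of flipped bits at a given SNR."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(num_bits)]
    detected = transmit_bits(bits, snr_db, rng)
    return sum(b != d for b, d in zip(bits, detected)) / num_bits


for snr_db in (0, 10, 20):
    print(f"SNR = {snr_db:2d} dB, bit error rate = {bit_error_rate(snr_db):.4f}")
```

The bit error rate should fall as the SNR rises, which is the effect the noise and fading notebooks examine on transmitted model data.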

Energy Consumption Tracking

The Eco2AI library is used to track energy consumption
Measurements are taken every 5-10 seconds during training
For accurate measurements, ensure your system has proper power monitoring capabilities

Results

The experiments evaluate and compare CL, FL, and SL based on:

Model accuracy
Computational energy consumption
Communication energy consumption
Privacy preservation (reconstruction error)
Robustness to noise and Rayleigh fading

Key findings:

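FL with 8-bit quantization (Q8) offers the best balance between accuracy, communication efficiency, and privacy: it has the lowest communication energy (0.0021 J) at comparable accuracy, although its user-side computational energy is the highest (60.8200 J)
SL provides the highest privacy preservation (highest reconstruction error, 0.2681) and far lower client-side computational energy than FL (3.4512 J vs. 60.8200 J)
CL offers good accuracy but poor privacy protection (lowest reconstruction error, 0.0154) and requires raw data transmission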

Quantitative Results

Based on our experiments (averaged over 10 runs):

| Algorithm | Accuracy | Reconstruction Error | Computational Energy (J) | Communication Energy (J) | Total Energy (J) |
|---|---|---|---|---|---|
| Central | 0.7803 | 0.0154 | 0 | 0.3459 | 0.3459 |
| FL Q8 | 0.7806 | 0.0671 | 60.8200 | 0.0021 | 60.8221 |
| SL | 0.7800 | 0.2681 | 3.4512 | 7.7162 | 11.1674 |

Note: Computational energy is reported on the user side. For centralized learning, all computation runs on the server, so the user-side computational energy is 0.
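
As a quick consistency check, the total energy column is the sum of the computational and communication columns. The short script below recomputes the totals from the table values and also reports the share of user-side energy spent on communication; the variable names are illustrative.

```python
# (computational J, communication J) on the user side, copied from the table above.
energy = {
    "Central": (0.0, 0.3459),
    "FL Q8": (60.8200, 0.0021),
    "SL": (3.4512, 7.7162),
}

totals = {name: round(comp + comm, 4) for name, (comp, comm) in energy.items()}
comm_share = {name: comm / (comp + comm) for name, (comp, comm) in energy.items()}

for name in energy:
    print(f"{name}: total = {totals[name]:.4f} J, "
          f"communication share = {comm_share[name]:.1%}")
```

Communication accounts for roughly 69% of the user-side energy in SL, whereas in FL Q8 almost all of the user-side energy goes to local computation.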

How to Reproduce Results

To verify these results:

Run each of the final notebooks (SentimentEco2AiCentral_Final.ipynb, SentimentEco2Ai_Q8N_Final.ipynb, and SentimentEco2AiSplitLearning_Final.ipynb)
For each notebook, set the number of runs to 10 (look for the parameter num_runs=10)
The averaged results will be saved in CSV files: Central_Model_Results_Avg.csv, FL_Avg_Results.csv, and Split_Learning_Vanilla_Privacy_Average.csv
Plots comparing accuracy, energy consumption, and privacy metrics will be automatically generated

Notes on Experimental Setup

All experiments use the Sentiment140 dataset (reduced to 50% size)
The maximum sequence length is set to 30
Vocabulary size is limited to the 10,000 most frequent words
Binary Phase Shift Keying (BPSK) is used for digital modulation
Noise is simulated with varying Signal-to-Noise Ratio (SNR) levels
Rayleigh fading is applied to simulate realistic wireless channel conditions

Citation

If you use this code for your research, please cite the paper:

@article{radwan2024tinyml,
  title={TinyML NLP Approach for Semantic Wireless Sentiment Classification},
  author={Radwan, Ahmed Y and Shehab, Mohammad and Alouini, Mohamed-Slim},
  journal={arXiv preprint arXiv:2411.06291},
  year={2024}
}

License

MIT License
