This repository contains implementations of various approaches for performing sentiment analysis using TinyML techniques in a wireless communication context. The goal is to provide energy-efficient, privacy-preserving methods for natural language processing on resource-constrained devices.
The code implements three main learning paradigms for NLP sentiment classification:
- Centralized Learning (CL) - Traditional approach where raw data is sent to a central server
- Federated Learning (FL) - Distributed approach where model updates are shared
- Split Learning (SL) - Hybrid approach where the model is split between client and server
All implementations are designed with resource constraints in mind and utilize the Sentiment140 dataset for tweet sentiment classification.
- SentimentEco2AiCentral_Final.ipynb: Implementation of centralized learning
- SentimentEco2Ai_FLnoQuan.ipynb: Federated learning without quantization
- SentimentEco2Ai_Q8.ipynb: Federated learning with 8-bit quantization
- SentimentEco2Ai_Q8FN.ipynb: Federated learning with 8-bit quantization and fading/noise
- SentimentEco2Ai_Q8N_Final.ipynb: Final federated learning implementation with 8-bit quantization and noise
- SentimentEco2AiSplitLearning_Final.ipynb: Split learning implementation
- Implementations of FL, CL, and SL for sentiment analysis
- Energy consumption tracking using Eco2AI
- Privacy evaluation through reconstruction error metrics
- Simulation of wireless channel effects (Rayleigh fading and noise)
- Quantization techniques for efficient communication
- Communication energy optimization
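As an illustration of the quantization feature listed above, here is a minimal sketch of uniform 8-bit quantization applied to a weight tensor before transmission. The README does not specify the exact scheme used in the notebooks, so the min/max affine mapping and the helper names `quantize_uint8` and `dequantize_uint8` are assumptions made for this example.

```python
import numpy as np

def quantize_uint8(w):
    """Uniformly map float weights to 8-bit integers plus (scale, offset)."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 or 1.0   # fall back to 1.0 for constant tensors
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_uint8(q, scale, w_min):
    """Recover approximate float weights on the receiving side."""
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
weights = rng.normal(size=(32, 16)).astype(np.float32)  # stand-in for a layer's weights

q, scale, w_min = quantize_uint8(weights)
restored = dequantize_uint8(q, scale, w_min)

print("bytes before:", weights.nbytes, "after:", q.nbytes)
print("max abs error:", float(np.abs(weights - restored).max()))
```

Sending `uint8` values instead of `float32` cuts the payload to a quarter of its size, at the cost of a rounding error bounded by half a quantization step.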
To run the notebooks, you'll need:
tensorflow>=2.0.0
keras
numpy
pandas
matplotlib
nltk
eco2ai
You'll also need to download the Sentiment140 dataset and NLTK stopwords:
import nltk
nltk.download('stopwords')
- Download or clone the repository
- Download the Sentiment140 dataset:
  - Dataset: Sentiment140
  - Download file: training.1600000.processed.noemoticon.csv
  - Place it in the main directory
- Install required dependencies:
pip install tensorflow keras numpy pandas matplotlib nltk eco2ai
- Configure the environment:
import nltk
nltk.download('stopwords')
- Run the notebooks in the following order to fully replicate the experiments:
  - First run SentimentEco2AiCentral_Final.ipynb to establish the centralized learning baseline
  - Then run SentimentEco2Ai_Q8N_Final.ipynb for federated learning with 8-bit quantization
  - Finally run SentimentEco2AiSplitLearning_Final.ipynb for the split learning approach
All notebooks follow similar preprocessing steps:
- Read the Sentiment140 CSV file (with Latin encoding)
- Extract only the sentiment labels and text
- Convert sentiment labels (0 for negative, 4 for positive) to "Negative" and "Positive"
- Remove stop words, URLs, special characters, and usernames
- Limit vocabulary to 10,000 most frequent words
- Pad sequences to a maximum length of 30
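The preprocessing steps above can be sketched in plain Python and NumPy. This is an illustration rather than the notebooks' code: the tiny `STOP_WORDS` set stands in for the NLTK English stopwords, the two example tweets are made up, and the helper names, regular expressions, and post-padding with zeros are assumptions.

```python
import re
from collections import Counter

import numpy as np

# Tiny illustrative stop-word set; the notebooks use NLTK's English stopwords.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "i", "it", "of", "in"}
MAX_WORDS = 10_000  # vocabulary size from the README
MAX_LEN = 30        # padded sequence length from the README
LABELS = {0: "Negative", 4: "Positive"}  # Sentiment140 label mapping

def clean_tweet(text):
    """Lowercase, then strip usernames, URLs, special characters and stop words."""
    text = re.sub(r"@\S+|https?://\S+|www\.\S+", " ", text.lower())
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return [w for w in text.split() if w not in STOP_WORDS]

def build_vocab(token_lists, max_words=MAX_WORDS):
    """Map the most frequent words to integer ids; id 0 is reserved for padding."""
    counts = Counter(w for tokens in token_lists for w in tokens)
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common(max_words - 1))}

def encode(token_lists, vocab, max_len=MAX_LEN):
    """Turn token lists into a fixed-size integer matrix, dropping unknown words."""
    out = np.zeros((len(token_lists), max_len), dtype=np.int32)
    for row, tokens in enumerate(token_lists):
        ids = [vocab[w] for w in tokens if w in vocab][:max_len]
        out[row, :len(ids)] = ids  # post-padding with zeros (assumption)
    return out

# Made-up rows in (label, text) form, mimicking the two columns kept from the CSV.
tweets = [(0, "@bob I hate the rain http://t.co/x"), (4, "Loving this sunny day!!!")]
tokens = [clean_tweet(text) for _, text in tweets]
vocab = build_vocab(tokens)
X = encode(tokens, vocab)
y = [LABELS[label] for label, _ in tweets]
print(tokens[0], X.shape, y)
```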
All approaches use a similar architecture with:
- Embedding layer (8 dimensions)
- Conv1D layer with 32 filters and kernel size 3
- BatchNormalization
- MaxPooling1D
- LSTM layer with 32 units
- Dense layer with 16 units
- Output layer with sigmoid activation
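The layer list above could be expressed in Keras roughly as follows. The layer types and sizes come from the list; the ReLU activations, the pooling size of 2, the optimizer, and the loss are not stated in this README and are assumptions here, so treat this as a sketch rather than the exact model in the notebooks.

```python
import tensorflow as tf
from tensorflow.keras import layers

MAX_WORDS = 10_000  # vocabulary size from the preprocessing steps
MAX_LEN = 30        # padded sequence length from the preprocessing steps

model = tf.keras.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(input_dim=MAX_WORDS, output_dim=8),
    layers.Conv1D(filters=32, kernel_size=3, activation="relu"),  # activation assumed
    layers.BatchNormalization(),
    layers.MaxPooling1D(pool_size=2),                             # pool size assumed
    layers.LSTM(32),
    layers.Dense(16, activation="relu"),                          # activation assumed
    layers.Dense(1, activation="sigmoid"),                        # binary sentiment output
])

# Optimizer and loss are assumptions; binary cross-entropy matches a sigmoid output.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

With these settings the model has fewer than 100,000 parameters, most of them in the embedding table.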
To replicate the wireless channel effects:
- Binary Phase Shift Keying (BPSK) modulation is used
- Rayleigh fading coefficient is applied via f = np.sqrt(X**2 + Y**2), where X and Y are Gaussian random variables
- Additive white Gaussian noise (AWGN) is applied with varying SNR levels
- The signal transmission is modeled as Z̃ = f·Z + n
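A minimal NumPy sketch of this channel model, Z̃ = f·Z + n, is shown below. The function name `transmit_bpsk`, the variance of X and Y (chosen so that E[f²] = 1), the unit signal power used to set the noise level, the independent fading coefficient per bit, and the sign-based detector are all assumptions made for illustration.

```python
import numpy as np

def transmit_bpsk(bits, snr_db, rng):
    """Send bits over a Rayleigh-fading channel with AWGN: z_rx = f * z + n."""
    z = 2.0 * bits - 1.0                            # BPSK mapping: 0 -> -1, 1 -> +1
    # X, Y ~ N(0, 1/2) so that E[f^2] = 1 (normalisation is an assumption)
    x = rng.normal(0.0, np.sqrt(0.5), size=z.shape)
    y = rng.normal(0.0, np.sqrt(0.5), size=z.shape)
    f = np.sqrt(x**2 + y**2)                        # Rayleigh fading coefficient
    noise_std = np.sqrt(10.0 ** (-snr_db / 10.0))   # unit signal power assumed
    n = rng.normal(0.0, noise_std, size=z.shape)    # additive white Gaussian noise
    z_rx = f * z + n
    return (z_rx > 0).astype(bits.dtype)            # hard decision; valid since f >= 0

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=20_000)
for snr_db in (0, 10, 20):
    ber = np.mean(transmit_bpsk(bits, snr_db, rng) != bits)
    print(f"SNR = {snr_db:2d} dB  ->  bit error rate = {ber:.4f}")
```

The bit error rate should fall as the SNR rises, which is the effect the noise and fading experiments measure on the transmitted model updates.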
- The Eco2AI library is used to track energy consumption
- Measurements are taken every 5-10 seconds during training
- For accurate measurements, ensure your system has proper power monitoring capabilities
The experiments evaluate and compare CL, FL, and SL based on:
- Model accuracy
- Computational energy consumption
- Communication energy consumption
- Privacy preservation (reconstruction error)
- Robustness to noise and Rayleigh fading
Key findings:
- FL with 8-bit quantization (Q8) offers the best balance between accuracy, communication efficiency, and privacy
- SL provides the highest privacy preservation (highest reconstruction error) and far lower client-side computational energy than FL
- CL offers good accuracy but poor privacy protection and requires raw data transmission
Based on our experiments (averaged over 10 runs):
| Algorithm | Accuracy | Reconstruction Error | Computational Energy (J) | Communication Energy (J) | Total Energy (J) |
|---|---|---|---|---|---|
| Central | 0.7803 | 0.0154 | 0 | 0.3459 | 0.3459 |
| FL Q8 | 0.7806 | 0.0671 | 60.8200 | 0.0021 | 60.8221 |
| SL | 0.7800 | 0.2681 | 3.4512 | 7.7162 | 11.1674 |
Note: Computational energy is reported on the user side. For Central Learning, computational load is on the server.
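The Total Energy column is the sum of the computational and communication columns. The short check below recomputes it from the values in the table and also prints the share of each total that is spent on communication; the dictionary layout and variable names are illustrative only.

```python
# Values copied from the results table (joules, user side).
results = {
    "Central": {"computational": 0.0,     "communication": 0.3459},
    "FL Q8":   {"computational": 60.8200, "communication": 0.0021},
    "SL":      {"computational": 3.4512,  "communication": 7.7162},
}

totals = {
    name: round(e["computational"] + e["communication"], 4)
    for name, e in results.items()
}

for name, total in totals.items():
    comm_share = results[name]["communication"] / total
    print(f"{name:8s} total = {total:8.4f} J, communication share = {comm_share:6.1%}")
```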
To verify these results:
- Run each of the final notebooks (SentimentEco2AiCentral_Final.ipynb, SentimentEco2Ai_Q8N_Final.ipynb, and SentimentEco2AiSplitLearning_Final.ipynb)
- For each notebook, set the number of runs to 10 (look for the parameter num_runs=10)
- The averaged results will be saved in CSV files: Central_Model_Results_Avg.csv, FL_Avg_Results.csv, and Split_Learning_Vanilla_Privacy_Average.csv
- Plots comparing accuracy, energy consumption, and privacy metrics will be automatically generated
- All experiments use the Sentiment140 dataset (reduced to 50% size)
- The maximum sequence length is set to 30
- Vocabulary size is limited to the 10,000 most frequent words
- Binary Phase Shift Keying (BPSK) is used for digital modulation
- Noise is simulated with varying Signal-to-Noise Ratio (SNR) levels
- Rayleigh fading is applied to simulate realistic wireless channel conditions
If you use this code for your research, please cite the paper:
@article{radwan2024tinyml,
title={TinyML NLP Approach for Semantic Wireless Sentiment Classification},
author={Radwan, Ahmed Y and Shehab, Mohammad and Alouini, Mohamed-Slim},
journal={arXiv preprint arXiv:2411.06291},
year={2024}
}