AhmedRadwan02/TinyEco2AI-NLP

TinyML NLP for Semantic Wireless Sentiment Classification

This repository contains implementations of various approaches for performing sentiment analysis using TinyML techniques in a wireless communication context. The goal is to provide energy-efficient, privacy-preserving methods for natural language processing on resource-constrained devices.

Overview

The code implements three main learning paradigms for NLP sentiment classification:

  1. Centralized Learning (CL) - Traditional approach where raw data is sent to a central server
  2. Federated Learning (FL) - Distributed approach where model updates are shared
  3. Split Learning (SL) - Hybrid approach where the model is split between client and server

All implementations are designed with resource constraints in mind and utilize the Sentiment140 dataset for tweet sentiment classification.

Main Files

  • SentimentEco2AiCentral_Final.ipynb: Implementation of centralized learning
  • SentimentEco2Ai_FLnoQuan.ipynb: Federated learning without quantization
  • SentimentEco2Ai_Q8.ipynb: Federated learning with 8-bit quantization
  • SentimentEco2Ai_Q8FN.ipynb: Federated learning with 8-bit quantization and fading/noise
  • SentimentEco2Ai_Q8N_Final.ipynb: Final federated learning implementation with 8-bit quantization and noise
  • SentimentEco2AiSplitLearning_Final.ipynb: Split learning implementation

Key Features

  • Implementations of FL, CL, and SL for sentiment analysis
  • Energy consumption tracking using Eco2AI
  • Privacy evaluation through reconstruction error metrics
  • Simulation of wireless channel effects (Rayleigh fading and noise)
  • Quantization techniques for efficient communication
  • Communication energy optimization
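
As an illustration of the quantization feature, here is a minimal sketch of uniform 8-bit quantization applied to a weight tensor before transmission. The function names `quantize_uint8` and `dequantize_uint8` are hypothetical, and the min/max affine scheme is an assumption; the notebooks may quantize differently.

```python
import numpy as np

def quantize_uint8(w):
    """Map a float tensor onto 256 evenly spaced levels (assumed scheme)."""
    w_min, w_max = float(w.min()), float(w.max())
    # Fall back to 1.0 so a constant tensor does not divide by zero.
    scale = (w_max - w_min) / 255.0 or 1.0
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_uint8(q, scale, w_min):
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale + w_min
```

Sending `q` instead of 32-bit floats cuts the payload to a quarter of its size, at the cost of a rounding error of at most half a quantization step per weight.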

Dependencies

To run the notebooks, you'll need:

tensorflow>=2.0.0
keras
numpy
pandas
matplotlib
nltk
eco2ai

You'll also need to download the Sentiment140 dataset and NLTK stopwords:

import nltk
nltk.download('stopwords')

Usage

  1. Download or clone the repository
  2. Download the Sentiment140 dataset:
    • Dataset: Sentiment140
    • Download file: training.1600000.processed.noemoticon.csv
    • Place it in the main directory
  3. Install required dependencies:
    pip install tensorflow keras numpy pandas matplotlib nltk eco2ai
  4. Configure the environment:
    import nltk
    nltk.download('stopwords')
  5. Run the notebooks in the following order to fully replicate the experiments:
    • First run SentimentEco2AiCentral_Final.ipynb to establish the centralized learning baseline
    • Then run SentimentEco2Ai_Q8N_Final.ipynb for federated learning with 8-bit quantization and noise
    • Finally run SentimentEco2AiSplitLearning_Final.ipynb for the split learning approach

Implementation Details for Replication

Data Preprocessing

All notebooks follow similar preprocessing steps:

  • Read the Sentiment140 CSV file (with Latin encoding)
  • Extract only the sentiment labels and text
  • Convert sentiment labels (0 for negative, 4 for positive) to "Negative" and "Positive"
  • Remove stop words, URLs, special characters, and usernames
  • Limit vocabulary to 10,000 most frequent words
  • Pad sequences to a maximum length of 30
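
The steps above can be sketched in plain Python as follows. This is a simplified illustration, not the notebooks' code: the helper names, the tiny `STOP_WORDS` set (standing in for NLTK's English stop-word list), and the regular expressions are all assumptions. Padding and truncation mirror the Keras `pad_sequences` defaults (zeros on the left, last tokens kept).

```python
import re
from collections import Counter

# Label mapping described above: Sentiment140 uses 0 and 4.
LABELS = {0: "Negative", 4: "Positive"}

# Tiny illustrative stop-word set; a stand-in for NLTK's English list.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "it", "i"}

def clean_tweet(text):
    """Lowercase, strip URLs, usernames and special characters, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)                    # usernames
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # special characters
    return [w for w in text.split() if w not in STOP_WORDS]

def build_vocab(token_lists, max_words=10_000):
    """Index the most frequent words; index 0 is reserved for padding."""
    counts = Counter(w for tokens in token_lists for w in tokens)
    return {w: i for i, (w, _) in enumerate(counts.most_common(max_words), start=1)}

def encode(tokens, vocab, max_len=30):
    """Convert tokens to ids, keep the last max_len, and left-pad with zeros."""
    ids = [vocab[w] for w in tokens if w in vocab][-max_len:]
    return [0] * (max_len - len(ids)) + ids
```

In the notebooks the raw rows would come from the CSV file, for example via `pandas.read_csv` with `encoding="latin-1"`, before being passed through functions like these.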

Model Architecture

  • All approaches use a similar architecture with:
    • Embedding layer (8 dimensions)
    • Conv1D layer with 32 filters and kernel size 3
    • BatchNormalization
    • MaxPooling1D
    • LSTM layer with 32 units
    • Dense layer with 16 units
    • Output layer with sigmoid activation
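
A minimal Keras sketch of this layer stack is shown below. Only the layer types and sizes come from the list above; the activations, pooling size, optimizer, and loss are assumptions, and `build_model` is a hypothetical name.

```python
def build_model(vocab_size=10_000, max_len=30):
    """Build the Embedding -> Conv1D -> LSTM classifier described above."""
    # Imported lazily so this file can be loaded without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(max_len,)),
        layers.Embedding(vocab_size, 8),           # 8-dimensional embeddings
        layers.Conv1D(32, 3, activation="relu"),   # activation is an assumption
        layers.BatchNormalization(),
        layers.MaxPooling1D(pool_size=2),          # pool size is an assumption
        layers.LSTM(32),
        layers.Dense(16, activation="relu"),       # activation is an assumption
        layers.Dense(1, activation="sigmoid"),     # binary sentiment output
    ])
    # Optimizer and loss are assumptions suited to binary classification.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Calling `build_model().summary()` prints the per-layer shapes and parameter counts, which is a quick way to check the footprint against a target device.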

Wireless Channel Simulation

To replicate the wireless channel effects:

  • Binary Phase Shift Keying (BPSK) modulation is used
  • Rayleigh fading coefficient is applied via f = np.sqrt(X**2 + Y**2) where X and Y are independent zero-mean Gaussian random variables
  • Additive white Gaussian noise (AWGN) is applied with varying SNR levels
  • The signal transmission is modeled as Z̃ = f·Z + n, where Z is the transmitted BPSK signal, f is the fading coefficient, and n is the AWGN term
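
The channel model above can be sketched with NumPy as follows. The function name `transmit_bpsk`, the unit-power normalization of the fading coefficient, and the SNR-to-noise-variance mapping are assumptions made for illustration; the notebooks may scale these quantities differently.

```python
import numpy as np

def transmit_bpsk(bits, snr_db, rng):
    """Send bits over a Rayleigh-fading AWGN channel and return decoded bits."""
    z = 2.0 * bits - 1.0                              # BPSK: 0 -> -1, 1 -> +1
    # Two independent zero-mean Gaussians; variance 0.5 each gives E[f^2] = 1.
    x = rng.normal(0.0, np.sqrt(0.5), size=z.shape)
    y = rng.normal(0.0, np.sqrt(0.5), size=z.shape)
    f = np.sqrt(x**2 + y**2)                          # Rayleigh fading coefficient
    # Assumed mapping: unit symbol energy, real noise variance 1 / (2 * SNR).
    noise_std = np.sqrt(1.0 / (2.0 * 10 ** (snr_db / 10)))
    n = rng.normal(0.0, noise_std, size=z.shape)
    z_tilde = f * z + n                               # received signal
    # f is non-negative, so the sign of z_tilde is enough to decide each bit.
    return (z_tilde > 0).astype(int)
```

A quick usage example: draw random bits with `rng = np.random.default_rng(0)` and `bits = rng.integers(0, 2, size=10_000)`, then compare `transmit_bpsk(bits, snr_db, rng)` against `bits` to estimate the bit error rate at a given SNR.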

Energy Consumption Tracking

  • The Eco2AI library is used to track energy consumption
  • Measurements are taken every 5-10 seconds during training
  • For accurate measurements, ensure your system has proper power monitoring capabilities
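
As a rough sketch of how a training run might be wrapped with Eco2AI, consider the following. The helper names, project name, and output file name are hypothetical. Since Eco2AI logs energy in kWh, the conversion helper shows how such a value maps to Joules, on the assumption that the Joule figures were derived this way.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ

def kwh_to_joules(kwh):
    """Convert an energy reading from kWh to Joules."""
    return kwh * JOULES_PER_KWH

def track_training(train_fn, file_name="emission.csv"):
    """Run train_fn() while Eco2AI samples power draw (hypothetical wrapper)."""
    import eco2ai  # imported lazily so the helper above works without eco2ai

    tracker = eco2ai.Tracker(
        project_name="TinyEco2AI-NLP",         # hypothetical label
        experiment_description="training run",
        file_name=file_name,
        measure_period=10,                     # seconds between measurements
    )
    tracker.start()
    try:
        train_fn()
    finally:
        tracker.stop()                         # writes the results to file_name
```

For example, a logged value of 1e-6 kWh corresponds to `kwh_to_joules(1e-6)`, which is about 3.6 J.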

Results

The experiments evaluate and compare CL, FL, and SL based on:

  • Model accuracy
  • Computational energy consumption
  • Communication energy consumption
  • Privacy preservation (reconstruction error)
  • Robustness to noise and Rayleigh fading

Key findings:

  • FL with 8-bit quantization (Q8) offers the best balance between accuracy, communication efficiency, and privacy
  • SL provides the highest privacy preservation (highest reconstruction error) and far lower client-side computational energy than FL
  • CL offers good accuracy but poor privacy protection and requires raw data transmission

Quantitative Results

Based on our experiments (averaged over 10 runs):

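| Algorithm | Accuracy | Reconstruction Error | Computational Energy (J) | Communication Energy (J) | Total Energy (J) |
|-----------|----------|----------------------|--------------------------|--------------------------|------------------|
| Central   | 0.7803   | 0.0154               | 0                        | 0.3459                   | 0.3459           |
| FL Q8     | 0.7806   | 0.0671               | 60.8200                  | 0.0021                   | 60.8221          |
| SL        | 0.7800   | 0.2681               | 3.4512                   | 7.7162                   | 11.1674          |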

Note: Computational energy is reported on the user side. For Central Learning, computational load is on the server.

How to Reproduce Results

To verify these results:

  1. Run each of the final notebooks (SentimentEco2AiCentral_Final.ipynb, SentimentEco2Ai_Q8N_Final.ipynb, and SentimentEco2AiSplitLearning_Final.ipynb)
  2. For each notebook, set the number of runs to 10 (look for the parameter num_runs=10)
  3. The averaged results will be saved in CSV files: Central_Model_Results_Avg.csv, FL_Avg_Results.csv, and Split_Learning_Vanilla_Privacy_Average.csv
  4. Plots comparing accuracy, energy consumption, and privacy metrics will be automatically generated

Notes on Experimental Setup

  • All experiments use the Sentiment140 dataset (reduced to 50% size)
  • The maximum sequence length is set to 30
  • Vocabulary size is limited to the 10,000 most frequent words
  • Binary Phase Shift Keying (BPSK) is used for digital modulation
  • Noise is simulated with varying Signal-to-Noise Ratio (SNR) levels
  • Rayleigh fading is applied to simulate realistic wireless channel conditions

Citation

If you use this code for your research, please cite the paper:

@article{radwan2024tinyml,
  title={TinyML NLP Approach for Semantic Wireless Sentiment Classification},
  author={Radwan, Ahmed Y and Shehab, Mohammad and Alouini, Mohamed-Slim},
  journal={arXiv preprint arXiv:2411.06291},
  year={2024}
}

License

MIT License
