A professional, modular malware classification system that analyzes network traffic patterns to detect and classify malware using machine learning.
- Multiple Feature Sets: Core flow features, SPL packet size features, and combined feature sets
- Multiple Models: Neural Networks, Random Forest, and FAISS k-NN
- Binary & Multiclass Classification: Detect malware vs benign, or classify specific malware families
- Reproducible Results: Fixed random seeds ensure consistent results across runs
- Comprehensive Analysis: Detailed metrics, confusion matrices, ROC-AUC, cross-validation, and feature importance analysis
- Professional Architecture: Clean, modular, and extensible codebase
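The fixed-seed guarantee can be sketched as a small helper. The function name and the optional TensorFlow handling below are illustrative assumptions; the project itself centralizes seed configuration in config/hyperparameters.py.

```python
import os
import random

import numpy as np

# Illustrative helper mirroring the fixed-seed policy (seed 42).
SEED = 42

def set_global_seeds(seed: int = SEED) -> None:
    """Seed every source of randomness the pipeline touches."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import tensorflow as tf  # only seeded if TensorFlow is installed
        tf.random.set_seed(seed)
    except ImportError:
        pass
```

Calling such a helper at the start of every run makes repeated experiments draw identical random numbers, which is what makes results comparable across runs.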
├── config/ # Configuration files
│ ├── features.py # Feature definitions
│ └── hyperparameters.py # Model configurations
├── data/ # Data handling modules
│ ├── loader.py # Data loading utilities
│ └── preprocessor.py # Feature preprocessing
├── models/ # Model implementations
│ ├── base.py # Abstract base class
│ ├── neural_network.py # Neural network classifier
│ ├── random_forest.py # Random forest classifier
│ └── faiss_knn.py # FAISS k-NN classifier
├── evaluation/ # Evaluation and visualization
│ ├── metrics.py # Metrics calculation and analysis
│ └── visualization.py # Result visualization
├── experiments/ # Experiment orchestration
│ └── runner.py # Experiment runner
└── notebooks/ # Jupyter notebooks and results
├── experiments.ipynb # Main experimental notebook
├── results/ # Output CSV files
│ ├── classification_results.csv
│ ├── cv_results.csv
│ └── feature_importance.csv
└── plots/ # Confusion matrices and visualizations
- Setup Environment:

  ```bash
  # Install required packages
  pip install pandas numpy scikit-learn tensorflow faiss-cpu matplotlib seaborn
  ```

- Run Experiments:

  ```python
  from experiments.runner import ExperimentRunner

  # Initialize runner with your data path
  runner = ExperimentRunner('path/to/nfs_all_datasets_clean_final.csv')

  # Run a single experiment
  results = runner.run_single_experiment(
      feature_set='core',          # 'core', 'splt', or 'core+splt'
      task='binary',               # 'binary', 'malware_multiclass', or 'multiclass'
      model_type='random_forest'   # 'neural_network', 'random_forest', or 'faiss'
  )

  # Run all experiments
  all_results = runner.run_all_experiments()
  runner.save_results('results.csv')
  ```

- Use Jupyter Notebook: Open notebooks/experiments.ipynb for an interactive walkthrough of all experiments, including ROC curves, cross-validation, and feature importance analysis.
- core: 33 network flow features (duration, bytes, packets, packet sizes, inter-arrival times)
- splt: 25 packet size features from SPL analysis
- core+splt: Combined 58 features
- Neural Network: Multi-layer perceptron with batch normalization and dropout
- Random Forest: Ensemble classifier with 300 estimators
- FAISS k-NN: Fast similarity search with cosine similarity
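To illustrate the FAISS k-NN idea, the sketch below implements cosine-similarity k-NN in plain NumPy. FAISS obtains the same result much faster with an inner-product index (faiss.IndexFlatIP) over L2-normalized vectors, since the inner product of unit vectors is exactly their cosine similarity. The function and toy data are illustrative, not the project's implementation.

```python
import numpy as np

def cosine_knn(train_X, train_y, query_X, k=5):
    """Majority-vote label for each query row using cosine similarity."""
    def normalize(m):
        return m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = normalize(query_X) @ normalize(train_X).T   # cosine similarity matrix
    nn_idx = np.argsort(-sims, axis=1)[:, :k]          # indices of top-k neighbors
    votes = train_y[nn_idx]
    return np.array([np.bincount(v).argmax() for v in votes])

# Toy example: two well-separated clusters
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([1, 0], 0.1, (20, 2)),
               rng.normal([0, 1], 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
pred = cosine_knn(X, y, np.array([[1.0, 0.05], [0.05, 1.0]]), k=5)
# pred is [0, 1]: each query is assigned to its nearest cluster
```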
Preprocessing policy:
- Random Forest is scale-invariant, so feature scaling is skipped for RF across all feature sets.
- Neural Network and FAISS use standardized features.
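The policy above amounts to a conditional standardization step. The sketch below shows one way to express it; the model-name strings are illustrative, not the project's actual keys.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Models whose predictions do not depend on feature scale.
SCALE_FREE_MODELS = {"random_forest"}

def preprocess(X_train, X_test, model_type):
    """Return (train, test) features, standardized unless the model is scale-invariant."""
    if model_type in SCALE_FREE_MODELS:
        return X_train, X_test              # trees split on thresholds; scaling is a no-op
    scaler = StandardScaler().fit(X_train)  # fit on training data only, to avoid leakage
    return scaler.transform(X_train), scaler.transform(X_test)
```

Fitting the scaler on the training split only, then applying it to the test split, keeps test-set statistics out of the preprocessing step.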
All hyperparameters are centrally configured in config/hyperparameters.py to ensure reproducible results:
- Fixed random seeds (42)
- Model-specific configurations
- Preprocessing parameters
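A plausible shape for config/hyperparameters.py is sketched below. Only the seed (42), the Random Forest estimator count (300), and the fold count (5) come from this README; every other key and value is an assumption for illustration.

```python
# Illustrative configuration layout; actual keys in the project may differ.
RANDOM_SEED = 42

HYPERPARAMETERS = {
    "random_forest": {
        "n_estimators": 300,            # stated in this README
        "random_state": RANDOM_SEED,
        "n_jobs": -1,
    },
    "neural_network": {
        "hidden_layers": [256, 128],    # assumed architecture
        "dropout": 0.3,                 # assumed rate
        "batch_norm": True,
        "random_state": RANDOM_SEED,
    },
    "faiss": {
        "k": 5,                         # assumed neighbor count
        "metric": "cosine",
    },
}

CV_FOLDS = 5
```

Keeping every tunable in one module means a single import gives any experiment the same configuration, which is what makes runs reproducible and comparable.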
The system expects a CSV file with the following structure:
- SourceFolder: Data source (desktop-malware, mobile-malware, desktop-apps, mobile-apps)
- StandardizedAppName: Malware family name (for multiclass classification)
- splt_ps: Packet size sequence (for SPL features)
- Network flow features (bidirectional_*, src2dst_*, dst2src_*)
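Given that layout, the binary label follows directly from SourceFolder. A minimal sketch with invented rows:

```python
import io

import pandas as pd

# Column names follow the data-format description above; the CSV rows are invented.
csv = io.StringIO(
    "SourceFolder,StandardizedAppName,bidirectional_bytes\n"
    "desktop-malware,emotet,1200\n"
    "mobile-apps,whatsapp,800\n"
)
df = pd.read_csv(csv)

MALWARE_SOURCES = {"desktop-malware", "mobile-malware"}
df["is_malware"] = df["SourceFolder"].isin(MALWARE_SOURCES).astype(int)
# df["is_malware"] is [1, 0]: the malware row is labeled 1, the benign app 0
```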
Notes on SPL features:
- NFStream typically provides 25-length SPL arrays (direction, packet sizes, and inter-arrival times). This project extracts all 25 elements into ps_1..ps_25.
- Shorter flows are padded by NFStream (with -1) and longer flows are truncated, so all sequences are equal-length and require no imputation.
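A minimal sketch of that extraction, assuming splt_ps is stored in the CSV as a bracketed, comma-separated string (the storage format is an assumption; NFStream's -1 padding means every sequence already has length 25):

```python
import pandas as pd

def expand_splt(df, col="splt_ps", n=25):
    """Expand the packet-size sequence column into ps_1..ps_n feature columns."""
    parsed = df[col].apply(lambda s: [int(v) for v in s.strip("[]").split(",")][:n])
    ps = pd.DataFrame(parsed.tolist(),
                      columns=[f"ps_{i}" for i in range(1, n + 1)],
                      index=df.index)
    return pd.concat([df.drop(columns=[col]), ps], axis=1)
```

Because every array is already length 25, the resulting frame needs no imputation; the -1 padding values simply become ordinary feature values.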
The system provides comprehensive analysis including:
- Classification metrics (accuracy, precision, recall, F1-score, ROC-AUC)
- ROC curves for binary and multiclass classification
- 5-fold stratified cross-validation for stability assessment
- Feature importance analysis from Random Forest models
- Confusion matrices with heatmap visualizations
- Per-class error analysis for multiclass tasks
- Feature set and model comparisons
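For reference, the core binary metrics listed above can be reproduced with scikit-learn; the prediction vectors below are invented toy data, not project output.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, roc_auc_score)

y_true  = np.array([0, 0, 1, 1, 1, 0])
y_pred  = np.array([0, 1, 1, 1, 0, 0])
y_score = np.array([0.1, 0.6, 0.9, 0.8, 0.4, 0.2])  # predicted P(malware)

acc = accuracy_score(y_true, y_pred)      # fraction of correct labels
f1  = f1_score(y_true, y_pred)            # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_score)      # ranks scores, not hard labels
cm  = confusion_matrix(y_true, y_pred)    # rows = true class, cols = predicted
```

Note that ROC-AUC is computed from the continuous scores rather than the thresholded predictions, which is why the runner needs access to class probabilities.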
```python
# Compare all feature sets for binary classification
feature_comparison = runner.run_feature_set_comparison(
    task='binary',
    model_type='random_forest'
)

# Compare all models for a specific configuration
model_comparison = runner.run_model_comparison(
    feature_set='core',
    task='binary'
)

# Get best-performing configurations
best_results = runner.get_best_results(metric='f1')

# Run 5-fold cross-validation
cv_results = runner.run_kfold_experiment(
    feature_set='core+splt',
    task='binary',
    model_type='random_forest',
    n_folds=5
)
```