The Electrical Fault Dataset is a multivariate time series dataset obtained by modelling a three-phase power system transmission line in MATLAB Simulink. Three-phase electrical power systems typically use three sinusoidal voltages (Va, Vb, Vc) that are phase-shifted by 120 degrees from one another, and each phase also carries a current (Ia, Ib, Ic). The dataset is designed for detecting faults in the transmission line and classifying the type of fault from the line voltages and currents.
A fault can be line-to-line, line-to-ground, line-to-line-to-ground, and more:
- Line-to-line (LL) fault: a fault between two phase conductors (e.g., A-B). It typically appears as a short-duration, high-energy event affecting particular frequency bins.
- Line-to-ground (LG) fault: a fault between a phase conductor and ground. Its pattern differs from that of LL faults and can be identified using combinations of voltage and current measurements.
- LLG / LLLG: multi-conductor faults involving two or three lines and ground. These create distinct signatures across the voltage and current channels.
Users can choose between two dataset options, each processed by a different script to produce a distinct output file:
- 2-class dataset (`detect_dataset.xlsx`): processed by `electrical_fault.py` to create `electrical_fault_dataset.zip` for fault detection (binary classification: fault vs. no fault)
- 6-class dataset (`classData.csv`): processed by `electrical_fault_6class.py` to create `electrical_fault_6class_dataset.zip` for fault type classification (6 fault types based on G, C, B, A combinations)
There are 6 measured variables (Va, Vb, Vc, Ia, Ib, Ic), i.e. the voltages and currents of the three phases. The compressed archive `electrical_fault_raw.zip` contains two dataset files:
- `detect_dataset.xlsx`: indicates whether there is a fault or not. Target = [Output (S)]. There is a single target, Output (S), which has two unique values (0 and 1) that distinguish the fault and no-fault cases.
- `classData.csv`: classifies the type of fault. Target = [G, C, B, A]. There are 4 target variables: G (Ground), C (Node C), B (Node B), and A (Node A). The value of each target is either 0 or 1.
  - Examples of [G, C, B, A]:
    - [0, 0, 0, 0] means no fault
    - [0, 1, 1, 0] means an LL fault between Node B and Node C
    - [0, 1, 1, 1] means an LLL fault between all nodes
    - [1, 0, 0, 1] means an LG fault between Ground and Node A
    - [1, 0, 1, 1] means an LLG fault between Node A, Node B, and Ground
    - [1, 1, 1, 1] means an LLLG fault between all nodes and Ground
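The [G, C, B, A] encoding can be turned into a readable label with a small lookup. The following is a minimal sketch, not part of the example scripts: the function name `decode_fault` and the label strings are hypothetical, and it assumes that the six combinations listed above are the ones of interest.

```python
from typing import Sequence

# The six [G, C, B, A] combinations listed in this document.
# The label strings are illustrative, not taken from the dataset scripts.
FAULT_LABELS = {
    (0, 0, 0, 0): "No Fault",
    (0, 1, 1, 0): "LL (B-C)",
    (0, 1, 1, 1): "LLL (A-B-C)",
    (1, 0, 0, 1): "LG (A-G)",
    (1, 0, 1, 1): "LLG (A-B-G)",
    (1, 1, 1, 1): "LLLG (A-B-C-G)",
}


def decode_fault(target: Sequence[int]) -> str:
    """Map a [G, C, B, A] target row to a human-readable fault label."""
    key = tuple(int(v) for v in target)
    if len(key) != 4 or any(v not in (0, 1) for v in key):
        raise ValueError("expected four binary values in the order [G, C, B, A]")
    # Combinations that are not in the list above are reported, not guessed.
    return FAULT_LABELS.get(key, "Unlisted combination")


for row in ([0, 0, 0, 0], [1, 0, 0, 1], [1, 0, 1, 1]):
    print(row, "->", decode_fault(row))
```

Returning "Unlisted combination" rather than raising keeps the helper usable on rows that fall outside the six listed examples.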
Depending on the dataset chosen, users can either detect whether there is an electrical fault (binary classification) or classify the type of fault (6-class classification).
Users can prepare the zipped dataset using either of two Python scripts, depending on which dataset they want to use.
For the 2-class dataset (fault detection):

    cd examples/electrical_fault
    python electrical_fault.py

This creates `electrical_fault_dataset.zip`.
For the 6-class dataset (fault type classification):

    cd examples/electrical_fault
    python electrical_fault_6class.py

This creates `electrical_fault_6class_dataset.zip`.
The path to the appropriate zipped dataset file should be set in the YAML configuration under `dataset.input_data_path`. Make sure it matches the script you ran.

    dataset:
      input_data_path: 'examples/electrical_fault/electrical_fault_dataset.zip'  # OR 'examples/electrical_fault/electrical_fault_6class_dataset.zip'

This zipped dataset is designed to work with Tiny ML ModelZoo. Run the modelzoo with the YAML configuration using the command below.

    run_tinyml_modelzoo.sh examples/electrical_fault/config.yaml

`run_tinyml_modelzoo.sh` is the script that runs the modelzoo. It is invoked here with the following argument:
- `examples/electrical_fault/config.yaml`: path of the YAML configuration to run
Users can edit the YAML configuration to change parameters related to data processing and feature extraction, training, testing, the model, and model compilation. In this example, we configure the feature extraction parameters.
Multicollinearity is a condition in which an independent variable has a linear relationship with one or more other independent variables. Raw voltage and current channels can be strongly correlated (for example, Vc may be nearly a linear combination of Va and Vb in some operating conditions). In the time domain this redundancy can confuse simple models and lead to poorly estimated model weights during training.
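To see why Vc can be a linear combination of Va and Vb, consider the idealised balanced case, where the three voltages are equal-amplitude sinusoids shifted by 120 degrees and therefore sum to zero at every sample. The sketch below checks this numerically. The helper `three_phase`, the 50 Hz line frequency, and the 2048 Hz sampling rate are assumptions for illustration only and do not come from the dataset.

```python
import math


def three_phase(n_samples, freq_hz=50.0, fs_hz=2048.0, amplitude=1.0):
    """Return (va, vb, vc): balanced sinusoids shifted by 120 degrees.

    freq_hz and fs_hz are illustrative assumptions, not dataset values.
    """
    shift = 2.0 * math.pi / 3.0  # 120 degrees in radians
    phase_offsets = (0.0, -shift, shift)
    return tuple(
        [amplitude * math.sin(2.0 * math.pi * freq_hz * k / fs_hz + offset)
         for k in range(n_samples)]
        for offset in phase_offsets
    )


va, vb, vc = three_phase(256)

# In the balanced case Va + Vb + Vc = 0, so Vc = -(Va + Vb) exactly:
# one channel carries no information beyond the other two.
max_residual = max(abs(a + b + c) for a, b, c in zip(va, vb, vc))
print(f"max |Va + Vb + Vc| = {max_residual:.2e}")
```

Real measurements, especially during a fault, deviate from this ideal, which is why the text says "nearly" a linear combination.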
Transforming to the frequency domain (FFT) captures how energy is distributed across frequencies, and binning groups similar spectral regions together. Spectral features often decorrelate spatially correlated channels because they focus on frequency content rather than instantaneous amplitude. After binning and optional log-scaling, the model sees compact, lower-dimensional features that are more robust to correlated input channels.
In this electrical fault dataset, the independent variables are the currents and voltages of phases A, B, and C. Checking the correlation between these variables shows that multicollinearity exists between some of them.
| Independent Variable | Ia | Ib | Ic | Va | Vb | Vc |
|---|---|---|---|---|---|---|
| Ia | 1.00 | -0.49 | -0.45 | 0.23 | 0.69 | -0.94 |
| Ib | -0.49 | 1.00 | -0.54 | -0.95 | 0.26 | 0.72 |
| Ic | -0.45 | -0.54 | 1.00 | 0.74 | -0.94 | 0.17 |
| Va | 0.23 | -0.95 | 0.74 | 1.00 | -0.52 | -0.51 |
| Vb | 0.69 | 0.26 | -0.94 | -0.52 | 1.00 | -0.46 |
| Vc | -0.94 | 0.72 | 0.17 | -0.51 | -0.46 | 1.00 |
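A correlation table of this kind can be computed with a pairwise Pearson correlation over the channels. The sketch below is dependency-free and runs on synthetic balanced voltages rather than on the dataset itself. The function names, the 50 Hz frequency, and the 2048 Hz sampling rate are assumptions. For ideal balanced phases the off-diagonal values come out close to -0.5, which is in the same range as the voltage-to-voltage entries in the table above (-0.52, -0.51, -0.46).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(channels):
    """Return {row: {col: r}} for a dict mapping channel name -> samples."""
    names = list(channels)
    return {r: {c: pearson(channels[r], channels[c]) for c in names}
            for r in names}


# Synthetic balanced voltages (assumed 50 Hz line, 2048 Hz sampling, 1 second).
fs_hz, freq_hz, n_samples = 2048.0, 50.0, 2048
shift = 2.0 * math.pi / 3.0
channels = {
    name: [math.sin(2.0 * math.pi * freq_hz * k / fs_hz + offset)
           for k in range(n_samples)]
    for name, offset in (("Va", 0.0), ("Vb", -shift), ("Vc", shift))
}

corr = correlation_matrix(channels)
for row_name, row in corr.items():
    print(row_name, " ".join(f"{value:6.2f}" for value in row.values()))
```

With the real data, the same `correlation_matrix` call would take the six measured columns instead of the synthetic channels.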
To solve the problem of Multicollinearity, we can do one or more of the following:
- Remove the highly correlated features
- Perform feature extraction to do dimensionality reduction of features
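For the first option, a simple way to find candidates for removal is to list the variable pairs whose absolute correlation exceeds a threshold. The sketch below applies this to the values from the table above. The threshold of 0.9 and the function name `highly_correlated_pairs` are assumptions chosen for illustration.

```python
NAMES = ["Ia", "Ib", "Ic", "Va", "Vb", "Vc"]

# Correlation values copied from the table above (rows and columns in NAMES order).
CORR = [
    [1.00, -0.49, -0.45, 0.23, 0.69, -0.94],
    [-0.49, 1.00, -0.54, -0.95, 0.26, 0.72],
    [-0.45, -0.54, 1.00, 0.74, -0.94, 0.17],
    [0.23, -0.95, 0.74, 1.00, -0.52, -0.51],
    [0.69, 0.26, -0.94, -0.52, 1.00, -0.46],
    [-0.94, 0.72, 0.17, -0.51, -0.46, 1.00],
]


def highly_correlated_pairs(names, corr, threshold=0.9):
    """Return (name_i, name_j, r) for each pair with |r| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if abs(corr[i][j]) >= threshold:
                pairs.append((names[i], names[j], corr[i][j]))
    return pairs


pairs = highly_correlated_pairs(NAMES, CORR)
for first, second, r in pairs:
    print(f"{first} and {second}: r = {r:+.2f}")
```

With this threshold the flagged pairs are Ia/Vc, Ib/Va, and Ic/Vb, so each current is strongly (negatively) correlated with one of the voltages.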
In this example we explore dimensionality reduction using FFT and binning of features. The `data_processing_feature_extraction` section of the YAML configuration is used to configure it.
When you modify the configuration file to disable feature extraction (as illustrated below), modelmaker reports an error that prevents the model from being trained properly with effective hyperparameters. This issue stems from multicollinearity in the data.

    data_processing_feature_extraction:
      feature_extraction_name: Custom_Default
      feat_ext_transform: []

Now, with this information, let's see how feature extraction affects this error.
- FFT related options: `FFT_FE`, `FFT_POS_HALF`, `DC_REMOVE`, `ABS`
  - `FFT_FE` performs an FFT on a frame
  - `FFT_POS_HALF` takes the first half of the FFT, which is symmetrical about the middle
  - `DC_REMOVE` removes the DC component of the FFT
  - `ABS` takes the magnitude of the real and imaginary values of the FFT
- Binning related options: `BINNING`
  - `BINNING` performs binning of the FFT magnitude values
- Other options: `LOG_DB`, `CONCAT`
  - `LOG_DB` takes the log of the binned values
  - `CONCAT` concatenates the current features with features from previous frames
- `frame_size = 256`: using 256-sample frames. If the sampling rate is 2048 Hz, this corresponds to a time window of 125 ms.
- After the FFT, the positive half of the spectrum has `frame_size/2 = 128` bins.
- `feature_size_per_frame = 32` with binning: each feature aggregates `128 / 32 = 4` FFT bins (simple uniform grouping).
- `num_frame_concat = 4`: the final model input has `32 * 4 = 128` values per channel per example.
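The shape arithmetic above can be followed end to end in a short single-channel sketch. This is an illustration of the general technique, not the toolchain's implementation. It makes several assumptions: a naive DFT is used to stay dependency-free, DC removal is done by zeroing the DC bin, binning averages magnitudes over uniform groups (magnitude is taken before binning), the log is `20 * log10`, and frames do not overlap. All function names are hypothetical.

```python
import cmath
import math

FRAME_SIZE = 256
FEATURE_SIZE_PER_FRAME = 32
NUM_FRAME_CONCAT = 4


def positive_half_spectrum(frame):
    """Naive DFT of one frame, keeping bins 0 .. len(frame)/2 - 1."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n // 2)
    ]


def frame_features(frame, n_features=FEATURE_SIZE_PER_FRAME):
    """FFT -> positive half -> DC removal -> magnitude -> binning -> log."""
    spectrum = positive_half_spectrum(frame)       # 256 samples -> 128 bins
    spectrum[0] = 0j                               # assumed DC removal: zero bin 0
    magnitudes = [abs(value) for value in spectrum]
    group = len(magnitudes) // n_features          # 128 / 32 = 4 bins per feature
    binned = [sum(magnitudes[i:i + group]) / group  # assumed: mean of each group
              for i in range(0, len(magnitudes), group)]
    return [20.0 * math.log10(m + 1e-12) for m in binned]  # small offset avoids log(0)


def example_features(samples):
    """Concatenate features from NUM_FRAME_CONCAT non-overlapping frames."""
    features = []
    for f in range(NUM_FRAME_CONCAT):
        frame = samples[f * FRAME_SIZE:(f + 1) * FRAME_SIZE]
        features.extend(frame_features(frame))
    return features


# Test signal: a 400 Hz tone sampled at an assumed 2048 Hz.
fs_hz, tone_hz = 2048.0, 400.0
signal = [math.sin(2.0 * math.pi * tone_hz * k / fs_hz)
          for k in range(FRAME_SIZE * NUM_FRAME_CONCAT)]

features = example_features(signal)
peak_index = max(range(FEATURE_SIZE_PER_FRAME), key=lambda i: features[i])
print(len(features), peak_index)
```

With a 2048 Hz sampling rate each FFT bin spans 8 Hz, so the 400 Hz tone falls in FFT bin 50 and therefore in binned feature 12 of each frame, and the concatenated vector has 128 values.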
We have to add the FFT and a few more transforms to the `feat_ext_transform` variable.

    data_processing_feature_extraction:
      feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'DC_REMOVE', 'ABS', 'LOG_DB', 'CONCAT']

Next, we define the shape of our features.

    data_processing_feature_extraction:
      feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'DC_REMOVE', 'ABS', 'LOG_DB', 'CONCAT']
      frame_size: 256
      feature_size_per_frame: 128
      num_frame_concat: 1

- `frame_size`: slices frames of size `frame_size` from the dataset
- `feature_size_per_frame`: size of the binned features from one frame
- `num_frame_concat`: number of frames whose features are concatenated
After making the above changes in the YAML configuration file, run the modelzoo again for this dataset.

    run_tinyml_modelzoo.sh examples/electrical_fault/config.yaml

You should no longer encounter the error during the modelmaker run, because feature extraction successfully mitigates the multicollinearity problem and allows the model to train properly with good hyperparameters.
Another feature extraction option is to perform FFT with binning. For this, we add `BINNING` to the transforms. The feature size for each frame becomes half of the frame size and can be reduced further based on the selected `feature_size_per_frame`. The YAML configuration then looks like this:

    data_processing_feature_extraction:
      feat_ext_transform: ['FFT_FE', 'FFT_POS_HALF', 'DC_REMOVE', 'BINNING', 'ABS', 'LOG_DB', 'CONCAT']
      frame_size: 256
      feature_size_per_frame: 32
      num_frame_concat: 4

The first configuration uses individual frames (`num_frame_concat: 1`), while the second uses a sequence of 4 frames (`num_frame_concat: 4`). To decrease the model's size, you can lower the number of frames processed. For instance, setting `num_frame_concat` to 2 in the second configuration would approximately halve both the model size and the processing time on your device.
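The effect of these settings on the model input size is simple arithmetic: the number of feature values per channel is `feature_size_per_frame * num_frame_concat`. The sketch below compares the two configurations and the suggested `num_frame_concat: 2` variant. The `FeatureConfig` class is a hypothetical helper for this illustration, not part of the toolchain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureConfig:
    """Hypothetical container for the three shape-related YAML settings."""
    frame_size: int
    feature_size_per_frame: int
    num_frame_concat: int

    def values_per_channel(self) -> int:
        # Features from num_frame_concat frames are concatenated.
        return self.feature_size_per_frame * self.num_frame_concat


configs = {
    "FFT only": FeatureConfig(256, 128, 1),
    "FFT + binning": FeatureConfig(256, 32, 4),
    "FFT + binning, 2 frames": FeatureConfig(256, 32, 2),
}

sizes = {name: cfg.values_per_channel() for name, cfg in configs.items()}
for name, size in sizes.items():
    print(f"{name}: {size} values per channel")
```

Both original configurations give 128 values per channel, while the two-frame variant gives 64, half the input size.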
We benchmarked the performance of the CLS_ResCat_3k model. The device used is the F28P55x, which comes with a hardware accelerator (TINPU) that gives low-latency performance on ML models. Numbers are provided for running the model on the NPU and on the CPU. Both feature extraction configurations produce the same model architecture, so the model performance is the same; the two configurations are therefore grouped as 'with Feature Extraction'.
| Configuration | AI Model Cycles | Inference Time (us) | Flash Usage (B) | SRAM Usage (B) |
|---|---|---|---|---|
| NPU (without Feature Extraction) | Bad Training | Bad Training | Bad Training | Bad Training |
| NPU (with Feature Extraction) | 125509 | 836.73 | 3175 | 3846 |
| CPU (with Feature Extraction) | 500860 | 3339.07 | 2995 | 4992 |
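The cycle counts and inference times in the table can be cross-checked against each other: cycles divided by microseconds gives cycles per microsecond, which is the implied clock in MHz. The sketch below does this for both rows and also computes the NPU speedup over the CPU. The clock value is inferred from the table here, not stated in this document, and the function name is hypothetical.

```python
# Values copied from the benchmark table above.
BENCHMARKS = {
    "NPU": {"cycles": 125509, "inference_us": 836.73},
    "CPU": {"cycles": 500860, "inference_us": 3339.07},
}


def implied_clock_mhz(cycles, inference_us):
    """Cycles per microsecond, which is numerically the clock in MHz."""
    return cycles / inference_us


clocks = {name: implied_clock_mhz(**row) for name, row in BENCHMARKS.items()}
speedup = BENCHMARKS["CPU"]["cycles"] / BENCHMARKS["NPU"]["cycles"]

for name, mhz in clocks.items():
    print(f"{name}: implied clock of about {mhz:.1f} MHz")
print(f"NPU speedup over CPU: {speedup:.2f}x")
```

Both rows imply roughly the same clock, about 150 MHz, so the two timing figures are mutually consistent, and the NPU needs about a quarter of the cycles the CPU does.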
Update history:
- [29th Dec 2025]: Compatible with v1.2 of Tiny ML Modelmaker
- [12th Mar 2025]: Compatible with v1.0 of Tiny ML Modelmaker
