Skip to content

Latest commit

 

History

History
81 lines (59 loc) · 2.56 KB

File metadata and controls

81 lines (59 loc) · 2.56 KB

Feature Scaling in Machine Learning (This guide provides an overview of feature scaling techniques and their importance in machine learning.

)

Introduction

Feature scaling is an essential preprocessing step in machine learning that ensures numerical values across different features have similar ranges. This is especially crucial for algorithms that rely on distance-based calculations or gradient-based optimization.

Why Feature Scaling?

Different features in a dataset may have varying ranges, which can lead to:

  • Biased models where features with larger magnitudes dominate.
  • Slower convergence during training.
  • Poor model performance, especially in distance-based algorithms.

Common Feature Scaling Techniques

There are two main techniques used to scale features:

1. Min-Max Scaling (Normalization)

Rescales features to a fixed range, usually [0,1] or [-1,1].

Formula:

minmax

When to Use:

  • When features have different scales.
  • Suitable for deep learning models.

Example in Python:

from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Example data
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [4.7, 3.2, 1.3, 0.2]])

# Apply Min-Max Scaling
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)

2. Standardization (Z-score Scaling)

Standardizes features by removing the mean and scaling to unit variance.

Formula:

standard

When to Use:

  • When data follows a normal distribution.
  • Required for algorithms like logistic regression, SVM, and k-means clustering.

Example in Python:

from sklearn.preprocessing import StandardScaler

# Apply Standardization
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)
print(X_standardized)

When is Feature Scaling Necessary?

Algorithm Requires Scaling?
Linear Regression Recommended
Logistic Regression Recommended
K-Means Clustering Yes
K-Nearest Neighbors (KNN) Yes
Support Vector Machine (SVM) Yes
Neural Networks Yes
Decision Trees No
Random Forest No

Conclusion

Feature scaling is an important step in machine learning that ensures fair treatment of all features. Using Min-Max Scaling or Standardization improves model accuracy and convergence speed, making it an essential preprocessing step.