Feature Scaling in Machine Learning ⚖️

Muhammad Taha
4 min read · Feb 9, 2025

Feature scaling is a crucial step in data preprocessing that ensures machine learning algorithms perform efficiently. It helps standardize data, improve convergence speed, and enhance model performance. In this blog, we’ll explore feature scaling, its types, advantages, disadvantages, and how to implement it with Python.

What is a Feature?

In machine learning, a feature is an individual measurable property of data that serves as input for the model.

For example, in a dataset predicting house prices, features can include square footage, number of bedrooms, and location.

What is Scaling?

Scaling is the process of transforming data to a specific range or distribution. It ensures that no particular feature dominates others due to differences in scale or unit measurement.

What is Feature Scaling?

Feature scaling refers to the techniques used to bring all features onto the same scale. Since machine learning models rely on numerical computations, large differences in feature scales can negatively impact their performance.

Why Use Feature Scaling?

  1. Prevents bias: Algorithms like gradient descent work best when features have similar scales.
  2. Speeds up convergence: Scaling improves the efficiency of optimization algorithms.
  3. Avoids dominance issues: Large-valued features don’t overshadow small-valued ones.
  4. Improves distance-based models: KNN, SVM, and K-Means rely heavily on distances, making scaling essential (see the sketch after this list).
  5. Ensures consistency: Techniques such as regularization treat all features comparably when they are on the same scale.
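
To make reason 4 concrete, here is a minimal sketch (the income and age values are made up for illustration) showing how a large-scale feature can dominate Euclidean distances until the data is standardized:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: income (large scale) and age (small scale)
X = np.array([
    [30000, 25],
    [32000, 50],
    [90000, 26],
])

# Raw distances: income dominates, so row 1 looks much closer to row 0
# than row 2 does, even though its age differs by 25 years.
print(np.linalg.norm(X[0] - X[1]))  # ~2000.2
print(np.linalg.norm(X[0] - X[2]))  # ~60000.0

# After standardization both features contribute on comparable scales,
# and the two distances become comparable as well.
X_scaled = StandardScaler().fit_transform(X)
print(np.linalg.norm(X_scaled[0] - X_scaled[1]))
print(np.linalg.norm(X_scaled[0] - X_scaled[2]))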

Types of Feature Scaling

1. Min-Max Scaling (Normalization)

Brings all values into a specific range, typically [0, 1] or [-1, 1].

Formula: X_scaled = (X - X_min) / (X_max - X_min). For example, with a minimum of 100 and a maximum of 2000, the value 500 maps to (500 - 100) / (2000 - 100) ≈ 0.211.

Python Implementation:

from sklearn.preprocessing import MinMaxScaler
import numpy as np
# Dummy data
data = np.array([[100], [500], [1000], [2000]])
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)
print(scaled_data)

Output:

[[0.   ]
[0.211]
[0.474]
[1.   ]]

2. Standardization (Z-Score Scaling)

Transforms data to have mean = 0 and standard deviation = 1.

Formula: X_scaled = (X - μ) / σ, where μ is the mean and σ is the standard deviation of the feature.

Python Implementation:

from sklearn.preprocessing import StandardScaler
import numpy as np
# Dummy data
data = np.array([[10], [50], [100], [200]])
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
print(scaled_data)

Output:

[[-1.126]
[-0.563]
[ 0.141]
[ 1.548]]

3. Robust Scaling

Handles outliers by using the median and interquartile range instead of mean and standard deviation.

Formula: X_scaled = (X - median) / IQR, where IQR is the interquartile range (Q3 - Q1).

Python Implementation:

from sklearn.preprocessing import RobustScaler
import numpy as np
data = np.array([[10], [50], [100], [200], [5000]])  # Contains an outlier
scaler = RobustScaler()
scaled_data = scaler.fit_transform(data)
print(scaled_data)

Output:

[[-0.6  ]
[-0.333]
[ 0.   ]
[ 0.667]
[32.667]]

4. Log Scaling

Applies a logarithmic transformation to compress large value ranges and reduce skew. Note that np.log is the natural logarithm and is only defined for positive values.

Python Implementation:

import numpy as np
data = np.array([[1], [10], [100], [1000]])
log_scaled_data = np.log(data)
print(log_scaled_data)

Output:

[[0.   ]
[2.302]
[4.605]
[6.908]]
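
One caveat: np.log is undefined for zero and negative values. A common workaround, sketched below on made-up counts, is np.log1p, which computes log(1 + x) and maps 0 to 0:

import numpy as np

# Made-up counts that include a zero, where np.log would return -inf
data = np.array([[0], [9], [99], [999]])

# log1p computes log(1 + x): zero stays at 0, large values are compressed
log_scaled_data = np.log1p(data)
print(log_scaled_data)
# Output (approximately): 0, 2.3026, 4.6052, 6.9078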

How Do You Know Which Scaling Technique to Apply?

Here are some examples to help determine which feature scaling technique to apply:

1. Min-Max Scaling Example

Use Case: When features have different ranges but no extreme outliers.

from sklearn.preprocessing import MinMaxScaler
import numpy as np
data = np.array([[50], [100], [150], [200]])
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)
print(scaled_data)

Output:

[[0.   ]
[0.333]
[0.667]
[1.   ]]

Best for: Neural networks, deep learning models.

2. Standardization Example

Use Case: When data follows a normal distribution.

from sklearn.preprocessing import StandardScaler
import numpy as np
data = np.array([[50], [100], [150], [200]])
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
print(scaled_data)

Output:

[[-1.341]
[-0.447]
[ 0.447]
[ 1.341]]

Best for: Logistic Regression, SVM, PCA.

3. Robust Scaling Example

Use Case: When the dataset contains outliers.

from sklearn.preprocessing import RobustScaler
import numpy as np
data = np.array([[10], [50], [100], [200], [5000]])  # Contains an outlier
scaler = RobustScaler()
scaled_data = scaler.fit_transform(data)
print(scaled_data)

Output:

[[-0.6  ]
[-0.333]
[ 0.   ]
[ 0.667]
[32.667]]

Best for: SVM, KNN, K-Means when outliers are present.

4. Log Scaling Example

Use Case: When data is highly skewed or follows an exponential distribution.

import numpy as np
data = np.array([[1], [10], [100], [1000]])
log_scaled_data = np.log(data)
print(log_scaled_data)

Output:

[[0.   ]
[2.302]
[4.605]
[6.908]]

Best for: Datasets with exponentially distributed features.

Advantages of Feature Scaling

  • Prevents bias due to different feature scales.
  • Enhances model convergence and accuracy.
  • Improves distance-based model performance.
  • Handles numerical stability issues in calculations.

Disadvantages of Feature Scaling

  • Can remove interpretability (values lose their original meaning).
  • Sensitive to outliers (except RobustScaler).
  • Some models (like decision trees) don’t require it.

Algorithms That Require Feature Scaling

  • Distance-based algorithms: KNN, SVM, K-Means.
  • Gradient descent-based models: Logistic Regression, Neural Networks (see the pipeline sketch after this list).
  • PCA (Principal Component Analysis): Needs scaled data for correct variance calculation.
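
In practice, the scaler should be fitted on the training data only and then reused to transform the test data, so no information leaks from the test set. Below is a minimal sketch of that workflow (using scikit-learn's Pipeline and the built-in Iris dataset purely as an example) with StandardScaler feeding Logistic Regression:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Example dataset and train/test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training data only and reuses
# those statistics when transforming the test data.
model = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))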

Algorithms That Don’t Require Feature Scaling

  • Tree-based models: Decision Trees, Random Forest, XGBoost (they are not sensitive to feature scales).

Conclusion

Feature scaling is an essential preprocessing step that enhances machine learning model performance by ensuring consistency and efficiency. Depending on the dataset and algorithm, different scaling methods (Min-Max, Standardization, Robust Scaling, Log Scaling) can be used.

By implementing the right feature scaling technique, you can significantly improve model accuracy and convergence speed.
