Feature Scaling in Machine Learning: Understanding Normalization and Standardization

Why Should I Standardize or Normalize?

Terms like feature scaling, normalization and standardization are very common in the data science and machine learning world. They can be confusing when one is navigating the waters of machine learning for the first time. The objective of this article is to develop an understanding of these terms.

Feature scaling is a data transformation that keeps the numeric scale of a feature from influencing how much importance a model assigns to it. We use it when the features in a dataset have widely varying scales.

When features are measured on a multitude of scales, such as kilograms, grams, milligrams, litres or cubic metres, and feature scaling is not done, then the model can become biased towards the features with the largest numeric values. In the case of gradient descent, the scale of the data directly influences the partial derivatives with respect to the model parameters, so features on larger scales produce larger gradient steps.
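To make the gradient descent point concrete, here is a minimal sketch. It assumes a hypothetical one-feature linear model y_hat = w * x trained with mean squared error; the function name `mse_gradient` and the data are made up for illustration and are not from the article or any library.

```python
def mse_gradient(xs, ys, w):
    """Gradient of mean squared error w.r.t. w for the model y_hat = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

# The same masses expressed in kilograms and in grams (illustrative data).
mass_kg = [1.0, 2.0, 3.0]
mass_g = [m * 1000 for m in mass_kg]
target = [2.0, 4.0, 6.0]

grad_kg = mse_gradient(mass_kg, target, w=0.0)
grad_g = mse_gradient(mass_g, target, w=0.0)

# At the same starting weight, the gradient for the gram-scaled feature
# is 1000 times larger than for the kilogram-scaled one.
print(grad_kg, grad_g)
```

The information in the feature is identical in both cases; only the unit changed, yet the size of the update step differs by three orders of magnitude.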

Feature scaling is very important for some machine learning algorithms while some algorithms are not affected by feature scaling at all. That happens because different algorithms utilise different learning techniques.

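For example, distance-based algorithms (such as k-nearest neighbours) and those that make use of gradient descent can benefit greatly from feature scaling. On the other hand, for decision trees feature scaling is inconsequential, because a tree splits on one feature at a time by comparing it against a threshold.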
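The effect on distance-based methods can be sketched with a tiny nearest-neighbour lookup. The three people below and the helper `nearest_to` are hypothetical, chosen only to show that changing the unit of one feature can change which point counts as "nearest".

```python
import math

# Hypothetical people described by (height in metres, weight in kg).
people_m = {"a": (1.60, 60.0), "b": (1.90, 60.1), "c": (1.61, 61.0)}
# The same people with height expressed in centimetres instead.
people_cm = {name: (h * 100, w) for name, (h, w) in people_m.items()}

def nearest_to(query, points):
    """Return the name of the point closest to `query` by Euclidean distance."""
    others = {k: v for k, v in points.items() if k != query}
    return min(others, key=lambda k: math.dist(points[query], others[k]))

nearest_in_m = nearest_to("a", people_m)    # weight dominates the distance
nearest_in_cm = nearest_to("a", people_cm)  # height dominates the distance
print(nearest_in_m, nearest_in_cm)  # b c
```

Nothing about the people changed between the two runs, only the unit of height, which is the kind of arbitrariness feature scaling is meant to remove.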

The normalisation method of feature scaling (also called min-max scaling) is based on the minimum and maximum values of the feature. Let’s say we want to normalise a feature; how do we go about it? The new normalised value for each element in the feature is given by the formula below:

    x’ = (x - min(x)) / (max(x) - min(x))

  • where x’ is the normalised value and x is the original value.
  • All x’ values lie between 0 and 1.

Consider this simple example where we want to normalise a feature containing the electricity consumption of each house in a town. The consumption spans from 60 units to 360 units. To rescale this data we subtract 60 (the minimum value) from each house’s electricity reading and divide the result by 300 (the maximum value minus the minimum value).
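The electricity example can be sketched in a few lines of Python. The individual readings between 60 and 360 are invented for illustration, and `min_max_normalise` is a hypothetical helper rather than a library function.

```python
def min_max_normalise(values):
    """Rescale values to the [0, 1] range using the feature's min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalise a constant feature")
    return [(x - lo) / (hi - lo) for x in values]

# Illustrative readings; only the 60 and 360 endpoints come from the example.
consumption = [60, 135, 210, 360]
normalised = min_max_normalise(consumption)
print(normalised)  # [0.0, 0.25, 0.5, 1.0]
```

The house with the lowest reading maps to 0, the one with the highest maps to 1, and every other house lands proportionally in between.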

Min-max normalisation guarantees that all features end up on the same scale, but it does not handle outliers well. An outlier becomes the new minimum or maximum, so it is mapped to an end of the [0, 1] range and the rest of the data gets squished into a narrow band.
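A small sketch of the squishing effect, assuming made-up readings and a single extreme value of 3060 units; `min_max_normalise` is again a hypothetical helper.

```python
def min_max_normalise(values):
    """Rescale values to the [0, 1] range using the feature's min and max."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

typical = [60, 90, 120, 150]     # illustrative "ordinary" houses
with_outlier = typical + [3060]  # one extreme reading (assumed)

scaled_typical = min_max_normalise(typical)
scaled_with_outlier = min_max_normalise(with_outlier)

# Without the outlier the four houses spread over the whole [0, 1] range.
# With it, the same four houses are confined to roughly [0, 0.03].
print(scaled_typical)
print(scaled_with_outlier)
```

The ordinary houses are still ordered correctly, but the differences between them become very small relative to the full range.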

Standardisation is a way of feature scaling that mitigates the outlier issue, because it does not force the data into a fixed range. It does not remove outliers; for that there are other techniques such as clipping. Standardisation is given by the formula below:

    x’ = (x - x̄) / σ

  • x’ is the standardised value and x is the original unstandardised value.
  • x̄ (x bar) is the mean and σ (sigma) is the standard deviation of the original feature.
  • The mean of the new standardised feature is 0 and its standard deviation is 1.
  • The values are not restricted to a particular range.
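The properties listed above can be checked with a short sketch. It assumes the population standard deviation (dividing by n), which the article does not specify; the readings and the helper `standardise` are illustrative.

```python
import statistics

def standardise(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population std, not sample std
    if std == 0:
        raise ValueError("cannot standardise a constant feature")
    return [(x - mean) / std for x in values]

# Illustrative electricity readings with mean 150 and standard deviation 60.
consumption = [60, 120, 120, 120, 150, 150, 210, 270]
z = standardise(consumption)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The standardised values have mean 0 and standard deviation 1, and some fall outside [0, 1], which matches the point that the values are not restricted to a particular range.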

There is no specific rule defining when to use which technique.

Standardisation is generally preferred when the data follows a Gaussian (normal) distribution.

Normalisation is preferred when the distribution of the data is unknown. Keep in mind that normalisation is heavily affected by outliers.
