Machine Learning: What is “Curse of Dimensionality”?

Sandeep Gurjar
2 min read · Mar 17, 2020

The phrase “curse of dimensionality” sounds scary, like some Egyptian horror story, but it is one of the most interesting concepts in machine learning. The curse of dimensionality refers to the set of challenges that arise as the dimensionality of data increases.

Let’s take a simple example: if I were catching a puppy that can run only in a straight line, I could catch it easily by running behind it. How much harder does it get if the puppy can run in two dimensions, say on a football field? Now imagine the puppy could also fly and move in three dimensions; it would get really difficult to catch the flying puppy.

The flying-puppy example above is an oversimplified illustration of the curse of dimensionality. As the dimensionality of data increases, the data starts exhibiting counter-intuitive and somewhat weird behaviors. This curse of dimensionality is the reason we use dimensionality reduction techniques like PCA and t-SNE: reducing the dimensionality helps us get the data to the point and unearth the underlying trends.

Hughes Phenomenon

Coming back to machine learning: it becomes increasingly difficult to predict the right answer as the dimensionality increases while the number of training points remains the same. Beyond a certain point, adding more features to a fixed training set actually hurts the model’s performance. This situation is also called the Hughes phenomenon.
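As a rough illustration, here is a minimal sketch (assuming NumPy and scikit-learn are available; the sample size, the five informative features, the logistic regression classifier, and the noise features are arbitrary choices for demonstration, not from the original post) that keeps the training set fixed while piling on irrelevant dimensions and watches cross-validated accuracy typically degrade:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_samples = 60                                  # fixed, small training set
y = rng.integers(0, 2, n_samples)               # binary labels
# 5 features that actually carry signal about the label
signal = y[:, None] + 0.5 * rng.normal(size=(n_samples, 5))

for n_noise in [0, 20, 200, 2000]:
    # Add purely random, irrelevant dimensions
    noise = rng.normal(size=(n_samples, n_noise))
    X = np.hstack([signal, noise])
    score = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"{X.shape[1]:5d} features -> CV accuracy {score:.2f}")
```

With the number of training points held constant, the extra dimensions give the model more room to fit noise, so the cross-validated accuracy tends to fall as the feature count grows.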

What Happens to Distances?

When the dimensionality increases, a really weird thing happens with Euclidean distance: all the data points start to appear roughly equidistant from each other. The distances to the nearest and farthest neighbors become approximately equal, which makes clustering and nearest-neighbor methods much harder. The intuition for Euclidean distance that we have in 2D or 3D simply does not carry over to higher dimensions.
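A small sketch of this effect (assuming NumPy; the 500 points and the uniform unit cube are illustrative choices, not from the original post): as the dimension grows, the ratio between the nearest and farthest distance from a reference point drifts toward 1.

```python
import numpy as np

rng = np.random.default_rng(0)

for d in [2, 10, 100, 1000]:
    # 500 points sampled uniformly in the d-dimensional unit cube
    points = rng.random((500, d))
    # Euclidean distances from the first point to all the others
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    print(f"d={d:5d}  min/max distance ratio = {dists.min() / dists.max():.3f}")
```

When that ratio is close to 1, “nearest” and “farthest” neighbors are nearly the same distance away, so distance-based comparisons lose their meaning.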

Dimensionality Reduction

To get a good prediction from a machine learning model on high-dimensional data, we need a very large number of data points. The number of data points required for the model to perform well grows exponentially with the dimensionality.

In many situations it is not feasible to train the model on such a large data set, either because we lack the computational power or because time is an issue. Dimensionality reduction techniques like principal component analysis (PCA) can be used to bring the number of dimensions down to a manageable size.
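A minimal PCA sketch (assuming NumPy and scikit-learn; the toy data with 50 observed features driven by 5 latent directions, and the 95% variance threshold, are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Toy data: 200 samples with 50 features, where most of the variance
# actually lives in a handful of latent directions
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 50))

# Keep just enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print("original shape:", X.shape)         # (200, 50)
print("reduced shape:", X_reduced.shape)  # (200, ~5)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

The model can then be trained on the reduced representation, which keeps most of the structure of the data while sidestepping much of the curse of dimensionality.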

