*Photo by Daniele Levis Pelusi on Unsplash*

0) Introduction

Distance metrics, like Euclidean, Manhattan, Minkowski and so on, suffer a lot when we increase the number of dimensions (features) of our data. It is as if they dilute and lose their meaning. This is not very easy to understand, and it comes into conflict with the general belief that more features equal better Machine Learning models, which is far from being true.

Reducing the dimensions of our data is something we should always try to do in order to remove redundant or noisy features. You can learn all about feature selection in this article. Also, dimensionality reduction techniques like PCA or Kernel PCA are a very clever treatment to apply to our data when we are going to use an algorithm that computes distance metrics (a short PCA sketch at the end of this post shows the idea).

There is a seminal paper in the field of Machine Learning about how distance metrics especially suffer from the Curse of Dimensionality. That paper, however, is not amazingly intuitive and relies on a lot of complex mathematical formulation, so in this post I will explain in a simple and gradual manner why distance metrics like the Euclidean distance suffer so much when calculated in high dimensions.

Firstly, we will see what the famous Curse of Dimensionality is and how it can affect our Machine Learning algorithms. Then, we will see specifically how it makes distance metrics lose their meaning, and go through the most important parts of the paper. Lastly, we will merge everything together in a highly visual manner so that the puzzle is solved, introducing other resources and insights along the way.

1) The Curse of Dimensionality

The Curse of Dimensionality refers to certain behaviours or effects that appear when analysing or working with data in high dimensions (with many features), and which do not appear when the number of dimensions is low.

Our human intuition and understanding are limited to a three-dimensional world. Kept inside this frame of reference, we can visually deduce why things happen and logically reach some conclusions. We are terrible, however, at understanding what happens when we are taken out of this limited dimensional space. Because of this, when our data has a lot of dimensions, the effects of that high dimensionality on our models and algorithms can escape our intuition, even though we know that more features do not always lead to better results in our Machine Learning pipelines.

2) How do Higher Dimensions affect our Machine Learning algorithms?

*How exactly do more features affect Machine Learning? Let's check it out!*

Most times, we tend to throw data sets with many features at our models and wait to see what comes out, hoping that if the results are not good enough, more features will do the trick. One of the most important steps in a Machine Learning pipeline, however, is selecting those features that really improve our models and keep them compact, understandable, and scalable. This is done via a feature selection or dimensionality reduction technique (which can, in a way, be seen as a form of feature selection).

Let's see what happens as we add more features, and therefore dimensions, to our data:
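To make this "dilution" concrete before we dive into the theory, here is a minimal NumPy sketch (my own illustration, not code from the paper): it draws random points in `d` dimensions and measures the *relative contrast*, the gap between the farthest and nearest neighbour of a query point, divided by the nearest distance. The number of points (500) and the dimensions tried are arbitrary choices for the demo.

```python
# Minimal sketch: how Euclidean distances "dilute" as dimensions grow.
# For points drawn uniformly at random, the nearest and farthest
# neighbours of a query point become almost equally far away.
import numpy as np

rng = np.random.default_rng(42)

for d in [2, 10, 100, 1000]:
    points = rng.random((500, d))   # 500 random points in the unit cube [0, 1]^d
    query = rng.random(d)           # one random query point
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distances
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:>4}: relative contrast = {contrast:.3f}")
```

Running this, you should see the relative contrast shrink steadily as `d` grows: in 2 dimensions the nearest point is many times closer than the farthest one, while in 1000 dimensions all 500 points sit at nearly the same distance from the query. This is exactly the effect the rest of the post unpacks.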
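And, as promised in the introduction, here is a hedged sketch (again my own, assuming scikit-learn is available; the digits data set, `n_components=16` and `n_neighbors=5` are just illustrative choices) of using PCA before a distance-based model such as k-NN:

```python
# Sketch: compare k-NN on raw features vs. on a PCA-reduced space.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)  # 64 pixel features per image

raw_knn = KNeighborsClassifier(n_neighbors=5)
pca_knn = make_pipeline(PCA(n_components=16),      # keep 16 components
                        KNeighborsClassifier(n_neighbors=5))

print("raw 64-d accuracy:", cross_val_score(raw_knn, X, y, cv=5).mean())
print("PCA 16-d accuracy:", cross_val_score(pca_knn, X, y, cv=5).mean())
```

Which of the two scores wins depends on the data; the point is that a distance-based model can often keep, or even improve, its accuracy while working in a fraction of the original dimensions, which is why PCA-style treatments pair so well with these algorithms.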