The world of machine learning is full of learning algorithms, and each learning algorithm can produce different classes of models. For example, a linear regression algorithm may produce a best-fit straight-line model or even a polynomial curve model for the same data. Looking at the data, we may be inclined to choose the model that best fits the data at hand. But is that always the best class of model to choose?
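
To make the "one algorithm, different classes of models" point concrete, here is a minimal sketch that fits a straight line and a quadratic curve with the same least-squares routine. The helper name `fit_polynomial` and the noise-free toy data are assumptions made for illustration; they are not part of any particular library.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Least-squares fit of a polynomial of the given degree.

    Returns coefficients ordered from the constant term upwards.
    """
    # Design matrix with columns x**0, x**1, ..., x**degree.
    design = np.vander(x, degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


# Hypothetical toy data generated from y = 1 + 2x + 3x^2, without noise.
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x**2

line = fit_polynomial(x, y, degree=1)   # straight-line model: 2 coefficients
curve = fit_polynomial(x, y, degree=2)  # polynomial curve model: 3 coefficients

print("line coefficients: ", line)
print("curve coefficients:", curve)
```

The algorithm is identical in both calls; only the class of model (the degree) changes, and that choice is left to us.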
When creating a model, we often think that the major concern is predicting the outcome accurately on the data at hand. But one of the underlying principles of selecting a learning algorithm is Occam’s razor. Occam’s razor is commonly stated as “entities should not be multiplied unnecessarily”. In other words: choose simplicity, everything else being equal. In the parlance of model selection, it means choosing the simpler, more generic model, everything else being equal.
But a simple model has the following properties:
- It is more generic and does not capture the entire pattern of the given data.
- It makes more errors on the training data.
Then why is simplicity considered better in model selection? Well, simple models are more generic. They do not memorise the exact pattern of the training data, so on a new set of data they usually perform better than models that have been fitted very accurately to the training data (overfitting). Simpler models also need fewer training samples: they capture only the differentiating characteristics of the data and build on those.
Let us consider an analogy. Suppose we train a person to drive a car only on the closed roads of our housing society. A society is usually a calm place, and its roads are not crowded with people or vehicles. Will a person who has only ever driven on those roads perform well when exposed to the rush-hour madness of city traffic? Driving in rush-hour traffic has a lot of challenges: you need to handle the chaos of people stepping out from everywhere, avoid bumping into vehicles that are lined up almost back to back, and judge narrow gaps. Hence, the answer is a big, emphatic “No”.
The same is the case with models. The merit of a model is not just that it performs decently on the data at hand; its main usefulness lies in how well it performs on data it has not come across. Hence, simplicity is sensibility.
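
The claim about unseen data can be checked with a small experiment. The sketch below is an illustration under stated assumptions, not a general proof: the ground truth is assumed to be a straight line, the noise level and sample sizes are arbitrary, and `true_function` and `compare_degrees` are hypothetical names. It fits a straight line (degree 1) and a degree-9 polynomial to ten noisy training points, then averages the mean squared error on the training points and on fresh points over many repetitions.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_function(x):
    # Assumed ground truth for this illustration: a plain straight line.
    return 2.0 * x + 1.0


def mean_squared_error(model, x, y):
    return float(np.mean((model(x) - y) ** 2))


def compare_degrees(degrees=(1, 9), trials=200, noise=0.5, seed=0):
    """Return {degree: (mean train MSE, mean test MSE)} averaged over trials."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(-1.0, 1.0, 10)  # only ten training points
    totals = {d: np.zeros(2) for d in degrees}

    for _ in range(trials):
        # Noisy training data, plus fresh data the models never see while fitting.
        y_train = true_function(x_train) + rng.normal(0.0, noise, x_train.size)
        x_test = rng.uniform(-1.0, 1.0, 200)
        y_test = true_function(x_test) + rng.normal(0.0, noise, x_test.size)

        for d in degrees:
            model = Polynomial.fit(x_train, y_train, d)
            totals[d] += (
                mean_squared_error(model, x_train, y_train),
                mean_squared_error(model, x_test, y_test),
            )

    return {d: (float(t[0] / trials), float(t[1] / trials)) for d, t in totals.items()}


if __name__ == "__main__":
    for degree, (train_mse, test_mse) in compare_degrees().items():
        print(f"degree {degree}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

In this setup the degree-9 model passes through every training point, so its training error is close to zero, yet on average it does worse than the straight line on the fresh points. That is the driver who has only ever seen the quiet society roads.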