Bias and Variance: How to Deal with Them
The best way to evaluate a model is to measure how it performs on data it has never seen, because that is the closest approximation to how its predictions will behave in a production environment. Before training starts, the available data set is divided into subsets, each with its own purpose. The test set is the subset that is left out of training, but it is of the utmost importance at evaluation time.
In the best case, and being optimistic, the model comes out right at the first attempt. In practice this rarely happens, so the task of evaluating the model is usually, to be honest, quite entertaining. We will typically come across one of two scenarios: the model is overfitted, or the model is underfitted. Let’s find out what each of these scenarios means and what to do to solve it.
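As a minimal sketch of that split, assuming NumPy arrays and a hypothetical helper named `split_train_test` (the 20% test fraction and the toy data are illustrative choices, not something prescribed here):

```python
import numpy as np

def split_train_test(X, y, test_fraction=0.2, seed=0):
    """Shuffle the examples, then hold out a fraction of them as the test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data (made up for illustration): 20 examples with one variable each.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3.0 * X[:, 0] + 1.0

X_train, X_test, y_train, y_test = split_train_test(X, y)
print("training examples:", len(X_train))
print("test examples:", len(X_test))
```

The model is fitted only on `X_train` and `y_train`; the test arrays are touched once, at evaluation time.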
Underfitting
A model that suffers from underfitting fails to capture the patterns in the data, and its metrics are bad even on the training set (Fig. 1). This behavior is usually detected quickly, often before the test set is even evaluated, but it is still necessary to verify that this is really what is happening, so we cannot skip that step.
Possible causes of underfitting are:
- A very simple architecture that cannot fit the data set.
- An overly aggressive regularization. It penalizes the model too much, so the model discards patterns that are important for the prediction.
- Too few variables supplied to the model.
In technical terms, this scenario is known as high bias. To reduce it, we can apply the following:
- Increase the complexity of the model, for example by using a deeper architecture.
- Look for additional variables that can affect the prediction. For example, a model trained with only 2 variables may not have enough information to represent reality.
- Apply a weaker regularization, so that the model is free to fit more of the patterns in the data.
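The second remedy can be sketched with a small synthetic example. The data below is made up so that the target depends on two variables; `fit_and_score` is a hypothetical helper that fits a linear model by least squares and reports the training error.

```python
import numpy as np

def fit_and_score(features, target):
    """Least-squares linear fit; returns the mean squared error on the same data."""
    design = np.column_stack([np.ones(len(features)), features])  # add intercept
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ weights
    return float(np.mean(residuals ** 2))

# Synthetic data (an assumption for illustration): y depends on x1 AND x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 2.0 * x1 - 3.0 * x2 + rng.normal(scale=0.1, size=200)

mse_one_variable = fit_and_score(x1.reshape(-1, 1), y)
mse_two_variables = fit_and_score(np.column_stack([x1, x2]), y)

print("training MSE with x1 only:  ", mse_one_variable)
print("training MSE with x1 and x2:", mse_two_variables)
```

With only `x1` the model cannot explain the part of the target driven by `x2`, so the error stays high even on the training data, which is the high bias symptom.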
How to notice an underfitting problem
Analyze the metrics: if the model makes poor predictions on the test set and even on the data set it was trained with, that is the signal that the model is underfitting.
Overfitting
A model that suffers from overfitting has learned the training data so well that it obtains extremely good metrics on it. This can be very dangerous if it is not detected: getting carried away by such good metrics can result in problems in production (Fig. 1).
Possible causes of overfitting are:
- The model memorizes the training data instead of generalizing the patterns. It can make almost exact predictions for the data it was trained with, but it performs poorly on new inputs.
- The architecture is so complex that the model can fit the training set almost perfectly, noise included.
- Many of the variables contribute little to the prediction, but the model still takes them into account.
- The data set is very small.
In technical terms, overfitting is known as high variance. Its effect can be reduced by applying certain techniques. The most important and widely used are:
- Create a simpler model. Since high variance is often caused by an overly complex architecture, this is usually the first step: the goal is for the model to find patterns and generalize them instead of memorizing the training set.
- Apply regularization techniques, which penalize large weights and so reduce the influence of variables that contribute little to the model.
- Reduce the number of variables supplied to the model.
- Increase the number of training examples, so that the model sees more of the variability in the data.
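Regularization can be illustrated with ridge (L2) regression, which is one common choice and only an assumption here, since no specific technique is named above. The sketch uses the closed-form solution on degree-9 polynomial features of made-up data, and for simplicity it also penalizes the intercept.

```python
import numpy as np

def ridge_fit(features, target, strength):
    """Closed-form ridge regression: solve (F^T F + strength * I) w = F^T y."""
    n_features = features.shape[1]
    gram = features.T @ features + strength * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ target)

def training_mse(features, target, weights):
    return float(np.mean((features @ weights - target) ** 2))

# Synthetic data (an assumption for illustration): a noisy quadratic.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12)
y = x ** 2 + rng.normal(scale=0.1, size=x.size)

features = np.vander(x, 10, increasing=True)  # columns 1, x, ..., x^9

w_weak = ridge_fit(features, y, strength=1e-3)
w_strong = ridge_fit(features, y, strength=1.0)

for name, w in (("weak", w_weak), ("strong", w_strong)):
    print(name, "penalty: weight norm =", np.linalg.norm(w),
          "training MSE =", training_mse(features, y, w))
```

A stronger penalty shrinks the weights and gives up some training accuracy. Whether that trade improves predictions on new data has to be checked on held-out examples, and too strong a penalty pushes the model back toward underfitting.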
How to notice an overfitting problem
High accuracy on the training set combined with low accuracy on the test set is the signal that the model is overfitted. In that case it will be necessary to apply one or more of the techniques already mentioned.
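Both checks (this one and the one for underfitting) can be written down as a simple rule of thumb. The function name `diagnose` and its two thresholds are hypothetical; what counts as an acceptable error or an acceptable gap depends entirely on the problem.

```python
def diagnose(train_error, test_error, acceptable_error, max_gap):
    """Rough rule of thumb for reading training and test errors together."""
    if train_error > acceptable_error:
        # Poor results even on the data the model was trained with.
        return "high bias (underfitting)"
    if test_error - train_error > max_gap:
        # Good on the training set, much worse on unseen data.
        return "high variance (overfitting)"
    return "just right"

# Illustrative error values, not measurements from a real model.
print(diagnose(train_error=10.7, test_error=11.2, acceptable_error=1.0, max_gap=0.5))
print(diagnose(train_error=0.01, test_error=7.4, acceptable_error=1.0, max_gap=0.5))
print(diagnose(train_error=0.24, test_error=0.30, acceptable_error=1.0, max_gap=0.5))
```

The three calls return the high bias, high variance and just right labels, in that order.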
Just Right
Taking both scenarios into account, the best outcome is a balanced model: complex enough to find the patterns in the data, but as simple as possible so that it does not memorize the training set instead of generalizing (Fig. 1).
This requires a data set with the right variables, and the right number of them: add variables if the model suffers from high bias, and apply dimensionality reduction if it suffers from high variance.
In this context, the metrics are our main allies for evaluating the behavior of the model and avoiding these two big problems in the world of machine learning.
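One way to let the metrics choose the level of complexity is to fit several candidate models and keep the one with the lowest error on held-out data. The sketch below assumes polynomial models, synthetic data, and a held-out subset made of every second point; all of these are illustrative choices.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic data (an assumption for illustration): a noisy quadratic.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 40)
y = x ** 2 + rng.normal(scale=0.5, size=x.size)

# Every second point is held out and never used for fitting.
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]

def mse(model, inputs, targets):
    return float(np.mean((model(inputs) - targets) ** 2))

val_errors = {}
for degree in range(1, 9):
    model = Polynomial.fit(x_train, y_train, degree)
    val_errors[degree] = mse(model, x_val, y_val)
    print("degree", degree,
          "training MSE:", mse(model, x_train, y_train),
          "held-out MSE:", val_errors[degree])

best_degree = min(val_errors, key=val_errors.get)
print("degree with the lowest held-out error:", best_degree)
```

The training error can only go down as the degree grows, so it cannot be used to pick the model; the held-out error is the one that reflects how the model treats new inputs.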
Understanding the models
To understand the three scenarios, we will take the header picture and analyze each situation.
Figure 1: Bias, Variance and just right plot.
The figure shows three plots, corresponding to a model with high bias, a correct model, and a model with high variance.
- High Bias (Underfitting): In the first plot of Figure 1 a linear model is used. The weights try to represent the patterns of the data as well as possible, but the result is poor because a linear model is not complex enough to capture them. The final cost is high and the model cannot be optimized any further.
- High Variance (Overfitting): In the third plot a high-degree polynomial model is used. Here the training data is learned so well that the cost is almost zero. It may look like the best scenario, but if we introduce new data, the model will not be able to make good predictions.
- Just Right: In the second plot a quadratic model is used. It is also a polynomial model, but not as complex as the third one, so it neither underfits the data like the first plot nor overfits it like the third. The model now captures the patterns in the data and can make good predictions for new inputs.
Let’s introduce a new input to each model and see what happens.
Figure 2: Plots with new input added.
The new input is marked with an X. In the first and third plots, the difference between the prediction and the new input is large, which means we will obtain a bad prediction for the new data. In contrast, for the model that neither overfits nor underfits, the prediction line almost passes through the new input, giving a lower cost than the first and third models.
Before sending a model to production, make sure it does not suffer from either of these two problems. This is of the utmost importance if you want high accuracy and good predictions. Remember that each model is different: a technique that fixes one model may not work for another. Understand your data, understand the problem, and learn from each situation.
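The same experiment can be reproduced numerically. This is only a sketch under stated assumptions: the data is hand-made (a quadratic pattern plus a fixed alternating perturbation standing in for noise, so the run is deterministic), the three models are polynomials of degree 1, 2 and 9, and the new input is placed near the edge of the data, where a high-degree polynomial tends to oscillate most.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hand-made training data: quadratic pattern plus an alternating +/-0.5 perturbation.
x = np.linspace(-3.0, 3.0, 10)
noise = 0.5 * (-1.0) ** np.arange(x.size)  # deterministic stand-in for noise
y = x ** 2 + noise

# New input: halfway between the last two training points, following the clean pattern.
x_new = (x[-2] + x[-1]) / 2
y_new = x_new ** 2

train_cost, new_input_error = {}, {}
for name, degree in (("linear", 1), ("quadratic", 2), ("degree 9", 9)):
    model = Polynomial.fit(x, y, degree)
    train_cost[name] = float(np.mean((model(x) - y) ** 2))
    new_input_error[name] = float(abs(model(x_new) - y_new))
    print(name, "training cost:", train_cost[name],
          "error on new input:", new_input_error[name])
```

In this setup the degree-9 model reaches a training cost of practically zero because it passes through every training point, yet the quadratic model is the one with the smallest error on the new input.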