mean squared error (MSE), the average squared difference between the values observed in a statistical study and the values predicted by a model. When comparing observations with predicted values, it is necessary to square the differences because some data values will be greater than the prediction (so their differences are positive) and others will be less (so their differences are negative). Since positive and negative differences can offset one another, summing the raw differences can produce a total near zero even when the individual errors are large. Squaring the differences eliminates this cancellation.
The formula for the mean squared error is MSE = Σ(yᵢ − pᵢ)²/n, where yᵢ is the ith observed value, pᵢ is the corresponding predicted value for yᵢ, and n is the number of observations. The Σ indicates that a summation is performed over all values of i.
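The formula can be sketched directly in Python; this is a minimal illustration, and the function name is chosen here for clarity rather than taken from any particular library:

```python
def mean_squared_error(observed, predicted):
    """Average of the squared differences between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    # Sum the squared differences (yi - pi)^2, then divide by n.
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n
```

For example, `mean_squared_error([1, 2, 3], [1, 2, 4])` returns 1/3, since only the last pair differs and its squared difference is 1.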
If the prediction passes through all data points, the mean squared error is zero. As the distance between the data points and the model's predicted values increases, the mean squared error increases. Thus, a model with a lower mean squared error more accurately predicts dependent values for given values of the independent variable.
For example, when temperature data is studied, forecast temperatures often differ from the actual temperatures. To measure the error in such data, the mean squared error can be calculated. Here, it is not necessarily the case that the differences will add to zero, because the forecasts come from weather models that change over time, so the prediction errors need not balance out. The table below shows the actual monthly temperature in degrees Fahrenheit, the predicted temperature, the error, and the square of the error.
Month | Actual (°F) | Predicted (°F) | Error | Squared Error |
---|---|---|---|---|
January | 42 | 46 | −4 | 16 |
February | 51 | 48 | 3 | 9 |
March | 53 | 55 | −2 | 4 |
April | 68 | 73 | −5 | 25 |
May | 74 | 77 | −3 | 9 |
June | 81 | 83 | −2 | 4 |
July | 88 | 87 | 1 | 1 |
August | 85 | 85 | 0 | 0 |
September | 79 | 75 | 4 | 16 |
October | 67 | 70 | −3 | 9 |
November | 58 | 55 | 3 | 9 |
December | 43 | 41 | 2 | 4 |
The squared errors are now added to generate the value of the summation in the numerator of the mean squared error formula: Σ(yᵢ − pᵢ)² = 16 + 9 + 4 + 25 + 9 + 4 + 1 + 0 + 16 + 9 + 9 + 4 = 106. Applying the mean squared error formula gives MSE = Σ(yᵢ − pᵢ)²/n = 106/12 ≈ 8.83.
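The calculation above can be reproduced with a short Python sketch using the twelve monthly values from the table:

```python
# Monthly actual and predicted temperatures (°F), January through December,
# taken from the table above.
actual    = [42, 51, 53, 68, 74, 81, 88, 85, 79, 67, 58, 43]
predicted = [46, 48, 55, 73, 77, 83, 87, 85, 75, 70, 55, 41]

# Squared error for each month: (yi - pi)^2.
squared_errors = [(y - p) ** 2 for y, p in zip(actual, predicted)]

sse = sum(squared_errors)   # sum of squared errors: 106
mse = sse / len(actual)     # mean squared error: 106/12

print(sse)            # 106
print(round(mse, 2))  # 8.83
```

The printed values match the worked example: a summation of 106 over 12 observations, for an MSE of about 8.83.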
After calculating the mean squared error, one must interpret it. How can a value of 8.83 for the MSE in the above example be interpreted? Is 8.83 close enough to zero to represent a “good” value? Such questions sometimes do not have a simple answer.
However, what can be done in this particular example is to compare the predictions from different years. If one year had an MSE of 8.83 and the next year the MSE for the same type of data was 5.23, this would show that the prediction methods used in the second year were better than those used in the first. While, ideally, the MSE for predicted and actual values would be zero, in practice this is almost never attainable. The results can nevertheless be used to evaluate how the methods for predicting temperatures should be changed.