Evaluation Metrics to Check Performance of Regression Models

In regression problems, we map input variables to one or more continuous output variables, for example, predicting a stock's share price or the atmospheric temperature. Because of these wide applications, much research goes into building more accurate models. When we build a solution for a regression problem, we compare its performance with existing work. But to compare two results, there must be a standard measure, just as we measure distance in meters or plot size in square feet. Similarly, we need standard evaluation metrics to compare two regression models.

Popular methods covered in this article:

  1. Mean Absolute Error (MAE)
  2. Mean Absolute Percentage Error (MAPE)
  3. Mean Squared Error (MSE)
  4. Root Mean Squared Error (RMSE)
  5. R-Squared (R²)
  6. Adjusted R-Squared (Adjusted-R²)

But before moving ahead, let’s understand one crucial question.

Why can’t we use accuracy for Regression Problems?

Whenever we say we have built a model, the first question that comes up is, “What is the accuracy of our model?”. Accuracy is a general term formulated as “Out of all the predictions our model made, how many were correct?”. As regression problems use supervised data, we know Y_actual, and a prediction counts as correct only when Y_predicted is exactly equal to Y_actual. But in regression problems, the target variable is continuous, so predictions almost never match the actual values exactly, and accuracy would be near zero even for an excellent model.

To avoid this, we use evaluation metrics under which a model is considered good if its predictions are very close to the actual values, even when they are not exactly equal. There are several well-defined metrics for comparing the performance of regression models, and based on them we can decide which model performs better. So let’s understand the most common metrics for regression problems.
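
To see this concretely, here is a minimal sketch (with made-up values for y_true and y_pred) showing why exact-match accuracy is meaningless for continuous targets:

import numpy as np

# hypothetical actual and predicted values: close, but never exactly equal
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.9, -0.4, 2.1, 7.2])

accuracy = np.mean(y_pred == y_true)        # fraction of exact matches
print("Exact-match accuracy =", accuracy)   # 0.0, despite good predictions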

Mean Absolute Error (MAE)

MAE is one of the most fundamental and widely used evaluation metrics for regression problems. Here we calculate the difference between the actual and predicted values; this difference is termed the error. Let’s say Ŷi is the predicted value and Yi is the actual value, so we can define the error in a prediction as Error = Yi − Ŷi. This error can be positive or negative, but we are more concerned about its magnitude, hence we take the absolute value: Error = |Yi − Ŷi|.

If we have N such samples in the data, the total error is the sum of errors over all those samples, i.e., Total Error = Σ |Yi − Ŷi|. But we cannot report the total error directly, as the number of samples can differ across experiments. Hence, we use the mean of this error. Intuitively, MAE tells us that at inference time, the actual value can be expected to lie, on average, within the range (Ŷ − MAE) ≤ Y ≤ (Ŷ + MAE).

         1      N
MAE =  ----- *  Σ  | Yi - Ŷi |
         N     i=1
# y_true and y_pred hold the actual and predicted values, respectively
from sklearn.metrics import mean_absolute_error
print("MAE = ",mean_absolute_error(y_true, y_pred))

Mean Absolute Percentage Error (MAPE)

Across different research works, when the target variable has a single dimension, some works normalize that feature and some don’t. For example, suppose our target variable takes values in [0, 100]. One method keeps the feature as it is, and a second method normalizes it into the range [0, 1], where 0 represents 0 and 100 represents 1. In such a scenario, the MAE would differ for the same model: the error in the first method, where the feature is kept as it is, would be 100 times the error in the second.
To take care of these situations, we can define our Error in percentage variation from the actual values. In the equation below, Yi is the actual value, Ŷi is the predicted value, and N is the total number of samples.

         1      N    
MAPE =  ---- *  Σ  |(Yi - Ŷi)/Yi|*100
         N     i=1
from sklearn.metrics import mean_absolute_percentage_error
# note: scikit-learn returns a fraction (e.g., 0.25), not a percentage;
# multiply by 100 to match the formula above
print("MAPE = ",mean_absolute_percentage_error(y_true, y_pred))

Mean Squared Error (MSE)

MSE is a prevalent evaluation metric for regression problems. It is similar to the mean absolute error, but here the error is squared: Error = (Yi − Ŷi)². When this squared error is calculated for N samples, the total error is Σ (Yi − Ŷi)². The formula below represents the Mean Squared Error, which reflects the average squared error per sample:

         1      N
MSE =  ----- *  Σ  (Yi - Ŷi)^2
         N     i=1
from sklearn.metrics import mean_squared_error
print("MSE = ",mean_squared_error(y_true, y_pred))

Root Mean Squared Error (RMSE)

RMSE is one of the most popular evaluation metrics for regression models. Its calculation is the same as MSE, except the final value is square-rooted, since MSE squares the errors. We learned with MAE that at inference time, a new prediction roughly brackets the actual value in the range [Ŷ − Error, Ŷ + Error]. In MSE we squared the error, so we take the square root to bring the error back to the original units of the target. That’s RMSE for us.

               1      N
RMSE = sqrt (----- *  Σ  (Yi - Ŷi)^2)
               N     i=1

# np.sqrt computes the square root
from sklearn.metrics import mean_squared_error
import numpy as np
print("RMSE = ",np.sqrt(mean_squared_error(y_true, y_pred)))

R-Squared (R²)

The correlation between two variables measures the strength of their relationship. In contrast, R-squared measures to what extent the variance of one variable explains the variance of the other. It is also known as the Coefficient of Determination. This metric is essential, so let’s understand it with an example. Suppose we know the salaries of 10 government employees and want to guess the salary of the 11th. Assume we are beginners and don’t know any machine learning techniques. What would be our most reasonable guess?

The mean of the 10 employees’ salaries. This mean value is considered the baseline, and we calculate the baseline error as the sum of squared differences between each actual Y and the mean value. Let’s call this error TSS (Total Sum of Squares).

Now suppose we know ML and build a machine learning model to predict the salary. Having learned better techniques, we expect to improve on the naive guess, so the total squared prediction error, Σ (Yi − Ŷi)², often called RSS (Residual Sum of Squares), should be smaller than TSS.

R-squared can be calculated using the equation below, in which y̅ is the mean value and Ŷi is the predicted value.

             Σ (Yi - Ŷi)^2
R^2 =  1 -  ----------------
             Σ (Yi - y̅)^2
from sklearn.metrics import r2_score
print("R_Squared = ",r2_score(y_true, y_pred))

How does R-Squared represent the quality of a regression model?

In theory, the R-squared value always lies in the range [0, 1], while in practice, values lie in (−∞, 1]. Can you guess when we will get a negative R²?

The problem lies in our assumption. We expected the ML model to beat the performance of the naive average guess, but it did not happen. Possible reasons behind a negative R² include (a minimal example follows the list):

  1. The model is not learning the trend in the train data.
  2. Too little data has been used to evaluate the model compared to train data.
  3. Too many outliers are present in the dataset.
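
A minimal example of a negative R² (with made-up values): when predictions are worse than simply guessing the mean, RSS exceeds TSS and the score drops below zero.

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([4.0, 3.0, 2.0, 1.0])   # anti-correlated with the truth

print(r2_score(y_true, y_pred))   # -3.0: far worse than the mean baseline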

R² is a good measure and is widely used in the industry to measure the performance of regression models. But it has a serious flaw that can misguide machine learning engineers and researchers: if we look carefully, we can inflate the R² value without genuinely improving the model. Can you guess how?

We can keep adding input features. On the training data, an extra independent variable can never increase the residual error of a least-squares fit, so R² keeps rising even when the new features carry no real signal. With too many independent variables, the model overfits: R² looks high, but on the test data the model performs poorly.

Adjusted R-Squared (Adjusted R²)

To tackle R²’s problems, researchers formulated a metric that is considered an improvement over R² and is known as adjusted R². In the equation below, N is the total number of data samples, and k is the number of independent variables.

                             N - 1
Adjusted_R-square = 1 - [(-----------)*(1-R^2)]
                           N - k - 1
from sklearn.metrics import r2_score
r_sqr = r2_score(y_true, y_pred)
N = len(y_true)
k = X.shape[1]  # number of independent variables in the input features X
print("Adjusted R_Squared = ",1 - (1 - r_sqr)*(N - 1)/(N - k - 1))

The adjusted R-squared value will always be less than or equal to the traditional R-squared value. Adding a new independent variable increases k, so adjusted R² rises only if the variable improves the fit by more than chance alone would explain, which makes the score more trustworthy.
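
The sketch below (on synthetic data, with hypothetical variable names) shows this mechanism: appending a pure-noise feature to a least-squares fit can only raise the training R², while adjusted R² applies a penalty for the extra feature and rises only if the feature genuinely helps.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
N = 50
X = rng.rand(N, 1)                          # one informative feature
y = 3 * X[:, 0] + rng.normal(scale=0.3, size=N)
X_noisy = np.hstack([X, rng.rand(N, 1)])    # append an irrelevant feature

for features in (X, X_noisy):
    k = features.shape[1]
    r2 = LinearRegression().fit(features, y).score(features, y)  # training R²
    adj_r2 = 1 - (N - 1)/(N - k - 1)*(1 - r2)
    print(f"k={k}: R² = {r2:.4f}, adjusted R² = {adj_r2:.4f}")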

Possible Interview Questions

  • What are the standard methods used for evaluating our machine learning model’s performance?
  • What is MAPE, and why is it more helpful?
  • What is the range of R² values? What can be its limitations?
  • Why do people prefer R² rather than adjusted R²?

Recommendation and Conclusion

Industry and research papers lean more toward RMSE or MSE values, so we should compare our results using these metrics. There is also a slight inclination toward R-squared, since it is easy to interpret, much like accuracy. Adjusted R² is the only metric among these that accounts for the overfitting problem, but because it depends on the number of independent features, most frameworks (scikit-learn included) provide no direct function to calculate it. We hope you enjoyed the article.

Next Blog: Evaluation of classification models

Enjoy Learning, Enjoy Algorithms!
