Evaluation Metrics for Regression Models in Machine Learning

In regression problems, we map input variables to one or more continuous output variables, for example predicting a share price in the stock market or the atmospheric temperature. Because regression has so many applications, a lot of research aims at building more accurate models. When we build a solution for a regression problem, we compare its performance with existing work. But to compare two pieces of work, we need a standard measure, just as we measure distance in meters or plot size in square feet. Similarly, we need standard evaluation metrics to compare two regression models.

Popular methods covered in this article:

  1. Mean Absolute Error (MAE)
  2. Mean Absolute Percentage Error (MAPE)
  3. Mean Squared Error (MSE)
  4. Root Mean Squared Error (RMSE)
  5. R-Squared (R²)
  6. Adjusted R-Squared (Adjusted-R²)

But before moving ahead, let's understand one crucial question.

Why can't we use accuracy for Regression Problems?

Whenever we say that we have built a model, the first question that comes up is, "What is the accuracy of our model?". Accuracy is a general term that can be formulated as "Out of all the predictions our model made, how many of them were accurate?". As regression problems use supervised data, we know Y_actual, and a prediction would count as accurate only when Y_predicted is exactly equal to Y_actual. But in regression problems the target variable is continuous, so an exact match almost never happens, and a model whose predictions are very close to the actual values would still score near zero. Pushing a model toward exact matches on the training data would also encourage overfitting.

Regression model to fit the data

To avoid that, we use other evaluation metrics under which a model is considered good when its predictions are very close to the actual values, even if they are not exactly equal. Hence we do not measure accuracy here. Instead, there are well-defined metrics for comparing regression models and deciding which one performs better. So let's understand the most common of these metrics.
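As a small illustration of why exact-match accuracy is uninformative for continuous targets, the sketch below scores a set of close predictions the way a classifier would be scored. The temperature values are hypothetical and chosen only for this example.

```python
# Hypothetical temperatures in degrees Celsius, made up for illustration.
y_true = [21.0, 23.5, 19.8, 25.1]
y_pred = [21.2, 23.4, 19.9, 25.0]

# Accuracy in the classification sense: share of predictions that match exactly.
exact_matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = exact_matches / len(y_true)

# Largest absolute gap between a prediction and the actual value.
max_gap = max(abs(t - p) for t, p in zip(y_true, y_pred))

print("Exact-match accuracy =", accuracy)
print("Largest absolute error =", round(max_gap, 2))
```

Every prediction here is within a fraction of a degree of the actual value, yet the exact-match accuracy is zero, which is why regression needs error-based metrics instead.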

Mean Absolute Error (MAE)

MAE is one of the most fundamental and widely used evaluation metrics for regression problems. Here we calculate the difference between the actual and predicted values. This difference is termed the error. Let's say Ŷi is the predicted value, and Yi is the actual value. So, we can define the error in a prediction as Error = Yi-Ŷi. This error can be positive or negative, but we are more concerned about its magnitude. Hence we take the modulus, Error = |Yi-Ŷi|.

If we have N samples in the data, the total error is the sum of the errors over all those samples, i.e., Total error = Σ |Yi-Ŷi|. But we cannot compare models on total error, as the number of samples can differ between experiments. Hence, we use the mean of this error:

MAE = (1/N) Σ |Yi-Ŷi|

MAE says that, on average, a prediction is off from the actual value by MAE. So whenever we do inference using this model, we can roughly expect Y_actual to lie around Y_predicted ± MAE. This is an average deviation, not a guaranteed bound.

from sklearn.metrics import mean_absolute_error
# y_true: actual values, y_pred: model predictions (sequences of equal length)
print("MAE = ", mean_absolute_error(y_true, y_pred))
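To make the definition concrete, here is a minimal sketch that computes MAE by hand, without scikit-learn. The function name `mean_absolute_error_manual` and the sample values are assumptions introduced for illustration.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average of |Yi - Ŷi| over all N samples."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Absolute errors are 0.5, 0.0, 1.5 and 1.0, so the mean is 3.0 / 4.
mae = mean_absolute_error_manual(y_true, y_pred)
print("MAE =", mae)  # 0.75
```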

Mean Absolute Percentage Error (MAPE)

Different research works treat the target variable differently: some normalize it, and some don't. For example, suppose our target variable takes values in [0, 100]. One method keeps it as it is, and a second method normalizes it to the range [0, 1], where 0 maps to 0 and 100 maps to 1. In such a scenario, the same model reports different MAE values: the error on the original scale is about 100 times the error on the normalized scale, although the quality of the predictions is identical.
To take care of these situations, we can define our error as a percentage deviation from the actual values. In the equation below, Yi is the actual value, Ŷi is the predicted value, and N is the total number of samples.

MAPE = (100% / N) Σ |Yi-Ŷi| / |Yi|

from sklearn.metrics import mean_absolute_percentage_error
# scikit-learn returns MAPE as a fraction (0.05 means 5%); multiply by 100 for a percentage
print("MAPE = ", mean_absolute_percentage_error(y_true, y_pred))
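The scaling argument above can be checked with a short sketch. It rescales a hypothetical target from [0, 100] to [0, 1] and compares MAE and MAPE on both scales. The helper names `mae` and `mape` and the data are assumptions made for this example; `mape` returns a fraction and assumes that no actual value is zero, since the ratio is undefined there.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; assumes no actual value is 0."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical target on the original [0, 100] scale.
y_true = [20.0, 50.0, 80.0]
y_pred = [22.0, 45.0, 84.0]

# The same data normalized to [0, 1] by dividing by 100.
y_true_scaled = [v / 100 for v in y_true]
y_pred_scaled = [v / 100 for v in y_pred]

mae_raw, mae_scaled = mae(y_true, y_pred), mae(y_true_scaled, y_pred_scaled)
mape_raw, mape_scaled = mape(y_true, y_pred), mape(y_true_scaled, y_pred_scaled)

print("MAE  original vs normalized:", mae_raw, mae_scaled)
print("MAPE original vs normalized:", mape_raw, mape_scaled)
```

MAE shrinks by the scaling factor, while MAPE stays the same up to floating-point rounding, because the factor cancels in the ratio |Yi-Ŷi| / |Yi|.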

Mean Squared Error (MSE)

MSE is a very popular evaluation metric for regression problems. It is similar to the mean absolute error, but the error is squared here: Error = (Yi-Ŷi)². When this squared error is summed over N samples, the total error is Σ (Yi-Ŷi)². The formula below expresses the Mean Squared Error, which is the average squared error per sample:

MSE = (1/N) Σ (Yi-Ŷi)²

from sklearn.metrics import mean_squared_error
print("MSE = ",mean_squared_error(y_true, y_pred))
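A hand-written version of the same calculation is sketched below. The function name `mean_squared_error_manual` and the values are assumptions for illustration. Because each error is squared, the largest error contributes the most to the total.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of (Yi - Ŷi)² over all N samples."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Squared errors are 0.25, 0.0, 2.25 and 1.0, so the mean is 3.5 / 4.
mse = mean_squared_error_manual(y_true, y_pred)
print("MSE =", mse)  # 0.875
```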

Root Mean Squared Error (RMSE)

RMSE is one of the most widely reported evaluation metrics for regression models. Its calculation is the same as for MSE, except that we take the square root of the final value. Because MSE squares the errors, it is expressed in the squared units of the target. Taking the square root brings the value back to the original units, so it can be read like MAE, as a typical distance between Y_predicted and Y_actual at the time of inference. That's RMSE for us.

RMSE = √MSE = √( (1/N) Σ (Yi-Ŷi)² )

from sklearn.metrics import mean_squared_error
import numpy as np
print("RMSE = ",np.sqrt(mean_squared_error(y_true, y_pred)))
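The sketch below computes RMSE with only the standard library and compares it with MAE on the same hypothetical data. The function name `rmse_manual` is an assumption for this example.

```python
import math


def rmse_manual(y_true, y_pred):
    """Square root of the mean squared error, in the units of the target."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return math.sqrt(mse)


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

rmse = rmse_manual(y_true, y_pred)
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print("RMSE =", rmse)
print("MAE  =", mae)
```

On any dataset RMSE is at least as large as MAE, and the gap widens when a few errors are much larger than the rest.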

R-Squared (R²)

Correlation between two variables explains the strength of their relationship. In contrast, R-squared explains to what extent the variance of one variable explains the variance of the second variable. It is also known as the Coefficient of Determination. This metric is interesting and important, so let's understand it through an example. Suppose we know the salaries of 10 government employees and want to guess the salary of the 11th employee. Assume that we are beginners and don't know any machine learning techniques. What will be our most reasonable guess?

The mean of the salaries of the 10 employees, right? This mean value is our baseline prediction. We calculate the baseline error as the sum of the squared differences between each actual Y and the mean value. Let's call this error TSS (Total Sum of Squares).

Now suppose we know ML and build a machine learning model to predict the salary. With a better technique like ML, we expect to improve on the naive guess. Hence the total squared prediction error Σ (Yi-Ŷi)², often called RSS (Residual Sum of Squares), should be less than TSS.

R-squared can be calculated using the equation below, in which Yi is the actual value, Ŷi is the predicted value, and Ȳ is the mean of the actual values.

R² = 1 - Σ (Yi-Ŷi)² / Σ (Yi-Ȳ)²

from sklearn.metrics import r2_score
print("R_Squared = ",r2_score(y_true, y_pred))

The higher the R², the better the fit
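The salary example can be sketched in a few lines: the baseline error (TSS) comes from always guessing the mean, and R² measures how much of that error the model removes. The function name `r2_manual` and the salary figures are hypothetical.

```python
def r2_manual(y_true, y_pred):
    """R² = 1 - (sum of squared prediction errors) / TSS."""
    mean_y = sum(y_true) / len(y_true)
    tss = sum((t - mean_y) ** 2 for t in y_true)              # baseline error
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # model error
    return 1 - rss / tss


# Hypothetical salaries (in thousands) and model predictions.
salaries = [30.0, 35.0, 40.0, 45.0, 50.0]
predicted = [32.0, 34.0, 41.0, 44.0, 49.0]

# The naive baseline predicts the mean salary for everyone.
baseline = [sum(salaries) / len(salaries)] * len(salaries)

r2_model = r2_manual(salaries, predicted)
r2_baseline = r2_manual(salaries, baseline)

print("R² of the model    =", r2_model)
print("R² of the baseline =", r2_baseline)  # 0.0
```

Here TSS is 250 and the model's squared error is 8, so R² is close to 1, while the mean-guessing baseline scores exactly 0 by construction.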

In theory, for a least-squares linear model with an intercept evaluated on its own training data, R² lies in the range [0, 1]. In practice, especially on unseen data, values lie in (-∞, 1]. Can you guess when we will have a negative R²?

The problem is with our assumption. We expected the ML model to beat the naive guess of the average, but it did not. Possible reasons behind a negative R² are:

  1. The model is not learning the trend that is present in the training data.
  2. Too little data has been used to evaluate the model compared to the training data.
  3. Too many outliers are present in the dataset.
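A negative R² is easy to reproduce with a model that gets the trend wrong. In the sketch below the predictions run in the opposite direction of the hypothetical salaries, so the model's squared error exceeds the baseline error of guessing the mean. The function name `r2_manual` is an assumption for this example.

```python
def r2_manual(y_true, y_pred):
    """R² = 1 - (sum of squared prediction errors) / TSS."""
    mean_y = sum(y_true) / len(y_true)
    tss = sum((t - mean_y) ** 2 for t in y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - rss / tss


# Hypothetical salaries and a model that predicts the reverse trend.
salaries = [30.0, 35.0, 40.0, 45.0, 50.0]
wrong_trend = [50.0, 45.0, 40.0, 35.0, 30.0]

# TSS is 250, the model's squared error is 1000, so R² = 1 - 4.
r2 = r2_manual(salaries, wrong_trend)
print("R² =", r2)  # -3.0
```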

R² is a good measure and is widely used in the industry to measure the performance of regression models. But it has a serious problem that can mislead machine learning engineers and researchers. If we look carefully, we can raise the R² value without making the model any more useful. Can you guess how?

We can add more input features. For a least-squares linear model, adding an independent variable can never increase the prediction error on the training data, while the baseline error (TSS) stays the same. So R² on the training data never decreases, even if the new variable is irrelevant. With too many independent variables, the model can overfit: R² on the training data will be high, but the model will perform poorly on the test data.
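The effect of extra features can be sketched with NumPy alone. The example below fits an ordinary least-squares model once on a single informative feature and once with 20 additional pure-noise features. All data is synthetic, and the names `fit_and_score`, `X_small`, and `X_big` are assumptions made for illustration.

```python
import numpy as np


def fit_and_score(X_train, y_train, X_test, y_test):
    """Fit least squares with an intercept; return (train R², test R²)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

    def r2(A, y):
        rss = np.sum((y - A @ coef) ** 2)
        tss = np.sum((y - y.mean()) ** 2)
        return 1 - rss / tss

    return r2(A_train, y_train), r2(A_test, y_test)


rng = np.random.default_rng(0)
n_train, n_test = 30, 200
n = n_train + n_test

x = rng.normal(size=(n, 1))                 # one informative feature
y = 2.0 * x[:, 0] + rng.normal(size=n)      # target = signal + noise
noise = rng.normal(size=(n, 20))            # 20 irrelevant features

X_small = x
X_big = np.hstack([x, noise])

train_small, test_small = fit_and_score(
    X_small[:n_train], y[:n_train], X_small[n_train:], y[n_train:])
train_big, test_big = fit_and_score(
    X_big[:n_train], y[:n_train], X_big[n_train:], y[n_train:])

print("1 feature  : train R² = %.3f, test R² = %.3f" % (train_small, test_small))
print("21 features: train R² = %.3f, test R² = %.3f" % (train_big, test_big))
```

The training R² of the larger model cannot be lower than that of the smaller one, because the smaller model is a special case of it. The test R² carries no such guarantee and typically drops when the added features are noise.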

Adjusted R-Squared (Adjusted R²)

To tackle this problem, researchers introduced a metric that is considered an improvement over R² and is known as adjusted R². In the equation below, N is the total number of data samples, and k is the number of independent variables.

Adjusted R² = 1 - (1 - R²)(N - 1) / (N - k - 1)

from sklearn.metrics import r2_score
r_sqr = r2_score(y_true, y_pred)
N = len(y_true)
# k = number of independent variables; X is assumed to be the 2-D feature matrix used to produce y_pred
k = X.shape[1]
print("Adjusted R_Squared = ", 1 - (1 - r_sqr) * (N - 1) / (N - k - 1))

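The adjusted R-squared value is never greater than the traditional R-squared value when k ≥ 1. Whenever we add a new independent variable, the penalty factor (N - 1) / (N - k - 1) grows, so adjusted R² increases only if the new variable improves the fit by more than the penalty. This makes it much harder to be misled by a score inflated with irrelevant features.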
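The penalty can be seen by holding R² fixed and increasing the number of features. The sketch below uses the standard definition, adjusted R² = 1 - (1 - R²)(N - 1) / (N - k - 1). The function name `adjusted_r2` and the numbers are assumptions chosen for illustration.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R² for a model with n_features independent variables."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Same hypothetical R² and sample size, growing number of features.
r2 = 0.90
for k in (1, 5, 10):
    print("k =", k, "-> adjusted R² =", round(adjusted_r2(r2, 50, k), 4))
```

With R² held at 0.90 and 50 samples, the adjusted value falls as k grows, so a feature has to earn its place by raising R² enough to offset the penalty.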

Possible Interview Questions

  • What are the standard methods used for evaluating our machine learning model's performance?
  • What is MAPE, and why is it more helpful?
  • What is the range of R² values? What can be its limitations?
  • Why might people prefer R² rather than adjusted R²?

Recommendation and Conclusion

Industry reports and research papers lean toward RMSE or MSE values, so we should report our results on these metrics to stay comparable. There is also some preference for R-squared, since it is unitless and loosely plays the role that accuracy plays in classification. Among the metrics covered here, adjusted R² is the only one that accounts for the number of independent variables and hence for this form of overfitting. But because it depends on the number of features, most frameworks do not offer it as a ready-made function (scikit-learn's metrics module, for example, does not), and we compute it from R² ourselves. We hope you enjoyed the article.

Enjoy Learning, Enjoy Algorithms!

© 2020 EnjoyAlgorithms Inc.

All rights reserved.