Time Series Forecasting Using Machine Learning


A time series is a set of observations recorded at regular time intervals. Time series forecasting refers to the use of statistical models to predict future values from previously recorded observations. It is broadly classified into two parts:

  • Univariate Time Series Forecasting: Involves a single variable
  • Multivariate Time Series Forecasting: Involves multiple variables

Building a statistical forecasting model usually follows a standard procedure. The image below depicts the steps involved in building a forecasting model. For this tutorial, we will use the ARIMA algorithm, which is extensively used for short-run forecasts.

Let’s dive into data analysis:

Pipeline of building time series model

Data Analysis

For this tutorial, we will be using the Gold Price Forecasting Dataset available on Kaggle. It is a reasonably simple time series with “Date” and “Gold Price” as its features. It contains approximately 10.8k rows and covers the last 50 years of the gold price trend.

Let’s take a look at the trend.

import pandas as pd
import plotly.express as px

# Load the Kaggle gold price dataset and plot the price trend
gold_price = pd.read_csv("gold_price_data.csv")
fig = px.line(gold_price, x='Date', y="Gold Price")
fig.show()

Kaggle gold price Data visualization

Gold Price in USD/oz

Time Series Decomposition 

Understanding a whole time series can be a challenging task. Fortunately, a time series can be decomposed into a combination of trend, seasonality, and noise components. These components provide a valuable conceptual model for thinking about time series prediction.

  • Additive Decomposition: Assumes that the time series is a sum of the Trend, Seasonality, and Residual components.

Yt = Tt + St + Rt

  • Multiplicative Decomposition: Assumes that the time series is a product of the Trend, Seasonality, and Residual components.

Yt = Tt x St x Rt

Additive decomposition is most suitable when the absolute value of the seasonal fluctuation does not change with the level of the time series. On the contrary, when the variation in the seasonal pattern changes with the level of the time series, multiplicative decomposition is preferred. A multiplicative time series can be converted into an additive one using a log transformation, as the quick check after the decomposition below illustrates.

Yt = Tt x St x Rt
log(Yt) = log(Tt) + log(St) + log(Rt)

Let’s decompose our time series!

import seaborn as sns
from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose the gold price series into trend, seasonal, and residual parts
series = gold_price["Gold Price"]
result = seasonal_decompose(series, model='additive', period=120)
sns.set()
result.plot()

Decomposed components of time series data
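As a quick check of the relationship between the two decomposition types, a multiplicative decomposition of the raw series should recover essentially the same structure as an additive decomposition of its logarithm. A minimal sketch, assuming series and seasonal_decompose from the snippet above (the period of 120 is the same illustrative choice):

import numpy as np

# Multiplicative decomposition of Yt vs. additive decomposition of log(Yt):
# per the identity log(Yt) = log(Tt) + log(St) + log(Rt), these should agree
mult = seasonal_decompose(series, model='multiplicative', period=120)
addl = seasonal_decompose(np.log(series), model='additive', period=120)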

Stationarity Test

Stationary vs. non-stationary time series

Before applying any forecasting model to a time series, the series needs to be stationary: its mean, variance, and autocorrelation must not change with time. Forecasting models assume this time-invariance when projecting into the future.

Several methods are available for testing the stationarity of a time series. One such method is the Dickey-Fuller Test.

Dickey-Fuller Test:

In the null hypothesis, we first assume that the time series is non-stationary and then compute the Augmented Dickey-Fuller (ADF) test statistic. If the ADF test statistic is less than the critical value(s), we reject the null hypothesis of non-stationarity (the series is stationary). On the contrary, if the ADF statistic is greater than the critical value(s), we fail to reject the null hypothesis (the series is non-stationary).

Let’s test the stationarity of a time series using the Dickey-Fuller Test:

from statsmodels.tsa.stattools import adfuller

# Run the Augmented Dickey-Fuller test on the raw gold price series
result = adfuller(gold_price["Gold Price"])
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
  print('\t%s: %.3f' % (key, value))

# Compare the test statistic against the 5% critical value
if result[0] < result[4]["5%"]:
  print("Reject Ho - Time Series is Stationary!")
else:
  print("Failed to reject Ho - Time Series is not Stationary!")

ADF values for gold data

The bad news is that our series is non-stationary. This is evident from the trend itself, but it is always good practice to verify. We need to apply some transformation to make the series stationary, and there are several options available:

  • Log Transformation
  • Power Transformation
  • First-Order Differencing
  • Seasonal Differencing

There’s no single perfect way of making a series stationary. Finding an optimum transformation is an empirical process of trial and error.

Let’s implement Log Transformation to our time series:

import numpy as np

# Log-transform the price to stabilize the variance, then plot a
# 24-observation rolling mean to smooth out short-term fluctuations
gold_price["log_series"] = np.log(gold_price["Gold Price"])
rm = gold_price["log_series"].rolling(window=24, center=False).mean()
rm.dropna(inplace=True)
fig = px.line(rm)
fig.show()

Rolling mean of the log-transformed gold price
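The log transform stabilizes the variance, but the upward trend remains, so the log series alone is typically still non-stationary. Below is a minimal sketch of the usual next step: first-order differencing followed by a re-run of the ADF test (assuming gold_price["log_series"] and adfuller from the snippets above):

# First-order differencing of the log series removes the trend
diff_series = gold_price["log_series"].diff().dropna()

# Re-run the ADF test; the differenced series is expected to pass
result = adfuller(diff_series)
print('ADF Statistic: %f, p-value: %f' % (result[0], result[1]))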

ARIMA

Now that we have a stationary series, we can move ahead with our forecasting model. We will be using the ARIMA model, which stands for Auto-Regressive Integrated Moving Average. It is a generalized version of the ARMA model: a combination of the distinct Auto-Regressive and Moving Average models, with an added integration (differencing) step.

Let’s understand it component-wise:

Auto-Regressive (AR(p)): An Auto-Regressive model predicts the value at the current timestamp using a regression equation built from the values at previous timestamps. Only past values are used to predict the current value. A series exhibiting autocorrelation indicates the need for an auto-regressive component in ARIMA.

Yt = c + φ1*Yt-1 + φ2*Yt-2 + ... + φp*Yt-p + εt

Integrated (I(d)): This component makes the series stationary using differencing. Differencing simply replaces each observation with the difference between it and the previous observation, i.e., Y't = Yt - Yt-1. This operation removes the trend from the time series (and seasonal differencing can likewise remove the seasonality).

Moving Average (MA(q)): A Moving Average model uses the errors from past forecasts in a regression equation. This model assumes that the value at the current timestamp can be predicted using the errors from the past forecasts.

Yt = c + εt + θ1*εt-1 + θ2*εt-2 + ... + θq*εt-q

Integrate all these components to form the ARIMA equation:

Y't = c + φ1*Y't-1 + ... + φp*Y't-p + θ1*εt-1 + ... + θq*εt-q + εt

where Y't denotes the series after differencing d times.

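To build intuition for the AR and MA components, here is a small illustrative sketch (separate from the gold price workflow) that simulates an ARMA(1, 1) process with statsmodels:

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# Simulate 500 points from an ARMA(1, 1) with phi1 = 0.6 and theta1 = 0.3.
# ArmaProcess takes lag-polynomial coefficients, so the AR term is negated.
ar = np.array([1, -0.6])
ma = np.array([1, 0.3])
simulated = ArmaProcess(ar, ma).generate_sample(nsample=500)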

How to find the p, d, q values? 

When using an auto-ARIMA model, finding the optimum auto-regressive order (p), differencing order (d), and moving average order (q) is a lot easier. However, we are interested in determining the p, d, q components manually.

Following are some ways of determining the p, d, & q parameters of ARIMA:

  • The partial autocorrelation (PACF) plot helps determine the optimal p parameter for the Auto-Regressive model.
  • The autocorrelation (ACF) plot helps determine the optimal q parameter for the Moving Average model.
  • An extended autocorrelation plot of the data confirms whether a combination of AR and MA terms is required for forecasting.
  • Akaike’s Information Criterion (AIC) assists in determining the optimal set of p, d, q. The model with the smaller AIC is preferred.
  • Schwarz Bayesian Information Criterion (BIC) is an alternative to AIC; again, the model with the lower BIC is preferred. A small grid search over candidate orders, sketched below, makes this selection concrete.
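Here is a minimal sketch of such an AIC-based search; select_arima_order is a hypothetical helper name, and the small search ranges are illustrative assumptions:

import itertools
import warnings
from statsmodels.tsa.arima.model import ARIMA

def select_arima_order(series, max_p=2, max_d=1, max_q=2):
    # Try every (p, d, q) in the small grid and keep the lowest-AIC fit
    best_aic, best_order = float("inf"), None
    for p, d, q in itertools.product(range(max_p + 1), range(max_d + 1), range(max_q + 1)):
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                aic = ARIMA(series, order=(p, d, q)).fit().aic
        except Exception:
            continue  # some orders may fail to converge
        if aic < best_aic:
            best_aic, best_order = aic, (p, d, q)
    return best_order, best_aic

# Example usage on the log-transformed gold price series:
# order, aic = select_arima_order(gold_price["log_series"])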

Finding the p and q values from the ACF and PACF plots:

import matplotlib.pyplot as plt
from pylab import rcParams
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# ACF and PACF of the first-differenced log series
rcParams['figure.figsize'] = 16, 5
plot_acf(gold_price["log_series"].diff().dropna())
plot_pacf(gold_price["log_series"].diff().dropna())
plt.show()

ACF and PACF plots of the first-differenced log series

We are not concerned about the spike at the 0th lag, as it simply represents the correlation of the series with itself, which is always 1. However, there is a short spike at lag 1 that rises above the significance band, so let's consider it.

The final candidate for both p and q is therefore 1, and first-order differencing is still required, so d = 1. Let's build our ARIMA(1, 1, 1) forecasting model.

from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
import seaborn as sns

# Split the log-transformed series into train and test sets
train = gold_price["log_series"][:17500]
test = gold_price["log_series"][17500:]

# Fit ARIMA(1, 1, 1) on the training data
model = ARIMA(train, order=(1, 1, 1))
model_fit = model.fit()
print(model_fit.summary())

# Forecast over the entire test horizon
fc = model_fit.forecast(steps=len(test))
fc_series = pd.Series(fc.values, index=test.index)

# Plot train, actual, and forecast together
f, ax = plt.subplots(figsize=(16, 4))
sns.lineplot(x=train.index, y=train, color='orange', label='Train')
sns.lineplot(x=test.index, y=test, label='Actual')
sns.lineplot(x=fc_series.index, y=fc_series, color='g', label='Forecast')
plt.show()

ARIMA (1, 1, 1) forecast of the gold price

The AIC and BIC values are quite high, but our ARIMA order is optimal as per the ACF and PACF plots. Further improvements can be made using the Box-Jenkins method, which is used to find the best-fitting ARIMA model, but the process is quite involved and requires some prerequisites to implement.
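To quantify forecast quality, a common next step is to back-transform the log-space forecasts and score them against the actual prices. A minimal sketch, assuming fc_series and test from the snippet above and a recent version of scikit-learn:

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error

# Undo the log transform so the errors are in USD/oz
actual = np.exp(test)
predicted = np.exp(fc_series)

rmse = np.sqrt(mean_squared_error(actual, predicted))
mape = mean_absolute_percentage_error(actual, predicted)
print("RMSE: %.2f USD/oz, MAPE: %.2f%%" % (rmse, mape * 100))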

Strengths & Limitations of ARIMA 

Limitations:

  • Forecasts are unreliable for an extended window
  • Data needs to be univariate 
  • Data should be stationary 
  • Outliers are challenging to forecast
  • Poor at predicting Turning Points

Strengths:

  • Highly reliable Forecasts for a short window 
  • Short-run forecasts frequently outperform the results from complex models, but that also depends on data
  • Easy to implement 
  • Unbiased Forecast
  • Realistic Confidence Intervals
  • High Interpretability

More Forecasting Models

  • Exponential Smoothing (a minimal sketch follows this list)
  • Dynamic Linear Model
  • Linear Regression
  • Neural Network Models
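For comparison, here is a minimal sketch of the first alternative, Holt's linear-trend exponential smoothing, fit on the same log-space train/test split used for the ARIMA model above:

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Fit exponential smoothing with an additive trend on the training series
es_model = ExponentialSmoothing(train, trend="add").fit()

# Forecast over the test horizon (still in log space, like the ARIMA forecast)
es_forecast = es_model.forecast(len(test))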

Possible Interview Questions

Time series problems are quite common and very useful across different industries. Interviewers ask questions on time series in two cases:

  • If you have mentioned a time series project in your resume.
  • If the interviewer wants to hire you for a time series project.

Questions on this topic will either be generic, like those covered in this blog, or very specific to your projects. Possible questions include:

  1. What is time-series data, and what makes it different from other datasets?
  2. What is a forecasting technique? Can you name some popular applications of it?
  3. Why do we need to decompose the time-series dataset, and what are the possible ways?
  4. What does the stationary test signify in time-series datasets?
  5. What is the ARIMA model, and what do we do to find the value of parameters involved in this algorithm?

Conclusion

In this article, we discussed the essential details of time series data and forecasting models. We worked with real-world gold price data, on which we applied stationarity testing, log transformation, and decomposition techniques. After that, we built and evaluated an ARIMA model on the series. We hope you enjoyed the article.

Enjoy Learning! Enjoy Algorithms!
