What Is Autoregressive Integrated Moving Average (ARIMA)?
An autoregressive integrated moving average (ARIMA) model is a statistical analysis model that uses time series data to better understand a data set or to project future trends. It gives a business’s managers dependable guidance for decision-making, for example, in managing supply chains.
Individuals and organizations can create such models in data science and data analytics software, for example, R and Python. The components of an ARIMA model are autoregression (AR), integration (I), and moving average (MA). The model is extremely popular for forecasting demand, and it is also widely used to predict stock prices, the spread of diseases, etc.
Key Takeaways
- The autoregressive integrated moving average is a time series forecasting model that enables individuals to estimate future values based on past values. The model’s purpose is to explain a data set using its own past values and to make predictions based on linear regression.
- There are three components of ARIMA: ‘AR,’ ‘I,’ and ‘MA.’
- This model offers various advantages. For example, it helps individuals and businesses in short-term forecasting. Moreover, it involves the use of only historical data.
- A noteworthy disadvantage of ARIMA is that one cannot utilize it to predict turning points accurately.
Autoregressive Integrated Moving Average Explained
The autoregressive integrated moving average is a kind of regression analysis that measures a dependent variable’s strength relative to other changing variables. The main objective of the model is to estimate future price movements of financial markets or securities by considering the differences between the values within a series rather than the actual values.
One must know all the components of the ARIMA model to understand how it works fully. So, let us look at them.
- Autoregression: AR or autoregression is a model in which a changing variable is regressed on its own prior, or lagged, values.
- Integrated: Integrated or ‘I’ denotes the differencing of raw observations that makes the time series data stationary, meaning the data values are replaced by the differences between them and the prior values.
- Moving Average: Also referred to as MA, it indicates that the outcome or forecast of such a model depends linearly on past forecast errors; that is, the forecasting errors are linear functions of the prior errors. That said, individuals must note that moving average or MA models are not the same as statistical moving averages.
The above three components are included in such a model as parameters, which are assigned particular integer values indicating the kind of ARIMA model.
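To make the three components concrete, here is a minimal from-scratch sketch in pure Python. It is a toy illustration only: the price series, the helper names, and the AR(1)-on-differences setup are assumptions for demonstration, and real analyses would use a dedicated library.

```python
# Toy illustration of the 'I' (differencing) and 'AR' (autoregression)
# components of ARIMA, written from scratch for clarity.

def difference(series, d=1):
    """'I' component: difference the series d times to remove trend."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def ar1_coefficient(series):
    """'AR' component: least-squares slope of x[t] on x[t-1] (no intercept)."""
    x, y = series[:-1], series[1:]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Toy series with an upward trend; differencing once makes it stationary.
prices = [10, 12, 15, 14, 17, 19, 18, 21]
diffed = difference(prices, d=1)        # [2, 3, -1, 3, 2, -1, 3]
phi = ar1_coefficient(diffed)           # AR(1) coefficient on the changes
forecast_change = phi * diffed[-1]      # one-step-ahead forecast of the change
forecast = prices[-1] + forecast_change # undo the differencing ('integrate')
print(round(forecast, 3))
```

The final line shows why the model is called "integrated": the forecast is made on the differenced series and then added back to the last observed level.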
As noted above, the ARIMA components function as parameters, and they have a standard notation. For ARIMA models, the standard notation is ARIMA(p, d, q), where integer values substitute for the parameters to denote the type of ARIMA model used. These parameters are defined as follows:
- p: This denotes the number of lag observations in the model. This is the lag order.
- d: This is the number of times the raw observations in the model are differenced. One can also refer to it as the ‘degree of differencing.’
- q: Also called the order of the MA, this refers to the size of the moving average window.
The parameters work much like the terms of a linear regression model. A value of 0 for a parameter means that the corresponding component is not used in the model. This way, one can construct an ARIMA model to perform its function, or even reduce it to a simple AR, I, or MA model.
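As a small illustration of how zeroed-out parameters reduce ARIMA to simpler models, the hypothetical helper below (our own construction, not a library function) maps a (p, d, q) triple to the conventional name it implies:

```python
# Illustrative helper: setting a parameter to 0 drops that component,
# so ARIMA(p, d, q) can collapse to a plain AR, I, or MA model.

def model_name(p, d, q):
    """Return the conventional name implied by the (p, d, q) orders."""
    parts = []
    if p:
        parts.append(f"AR({p})")
    if d:
        parts.append(f"I({d})")
    if q:
        parts.append(f"MA({q})")
    return "+".join(parts) if parts else "white noise"

print(model_name(1, 0, 0))  # pure autoregression: AR(1)
print(model_name(0, 1, 1))  # differencing plus a moving average term
print(model_name(1, 1, 2))  # a full ARIMA(1, 1, 2)
```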
Let us look at a few autoregressive integrated moving average examples to understand the concept better.
A hybrid of ARIMA and LSTM (long short-term memory) was utilized to build a model for forecasting HIV (human immunodeficiency virus) mortality and incidence in the territories and nations of East Asia. The ARIMA time series model was implemented via Python’s statsmodels package. The LSTM model, employed via Python’s PyTorch package, was trained to predict the ARIMA model’s residual values.
The hybrid model’s performance was assessed through performance metrics, including the root mean square error (RMSE), the mean absolute error (MAE), and the mean squared error (MSE). In North Korea, the mortality and incidence of HIV increased rapidly. In Mainland China, by contrast, the estimated incidence was stationary while the mortality dropped rapidly.
Overall, the incidence of HIV in combination with different diseases in the post-neonatal population surged prior to 2010 and then dropped during 2010-2019, while those patients’ mortality decreased in East Asia.
An ARIMA model was compared with a model combining the Elman recurrent neural network (ERNN) with ARIMA, with the aim of predicting the pertussis incidence in Mainland China. The ARIMA and ARIMA-ERNN models were established using SAS (ver. 9.4) and MATLAB (ver. R2019a) software, respectively.
The fitting and projection performance of the combined ARIMA-ERNN model was better than that of the ARIMA model alone. Besides offering theoretical support, such a model is beneficial for decision-making concerning public health.
How To Create?
Individuals must download as much price data as possible to start constructing an ARIMA model for any investment. Once they spot the trends in the price data, they must identify the lowest order of differencing, ‘d,’ by observing the autocorrelations. One must remember that if the lag-1 autocorrelation is zero or negative, the series is already sufficiently differenced. That said, if the lag-1 autocorrelation is greater than zero, individuals must difference the series further.
After that, one needs to determine ‘p’ and ‘q’, the autoregression and moving average orders, by examining the autocorrelations and partial autocorrelations. Once individuals find the required information, they can select the model they will utilize.
Pros And Cons
Let us discuss the benefits and limitations of the ARIMA model.

Benefits:
- This model can help individuals in forecasting for the short term. For example, one can use it to predict a stock’s short-term price movements. Moreover, one can project a business’s sales and interpret the seasonal changes in revenue.
- It helps estimate the effect of new product launches, marketing events, and more.
- This model only requires historical data.
- It utilizes lagged moving averages to smooth time series data.
Limitations:

- It does not help one in long-term forecasting.
- This model is poor at forecasting turning points.
- It is computationally expensive.
- This model is unsuitable for time series data with multiple variables.
- Such a model requires a lot of data processing and tuning, as individuals must check the autocorrelation, stationarity, and partial autocorrelation of the data, besides finding the parameters’ optimal values using grid search or trial and error.
Frequently Asked Questions (FAQs)
What is the ARIMA model used for?
One can use this model as a tool to forecast how something will behave in the future on the basis of past performance. For example, technical analysts can utilize such a model to predict how an asset, for example, a stock or a mutual fund, will perform in the future.
What is the difference between ARMA and ARIMA?
The difference between the two is the integration component. The ‘I,’ or integrated, represents the number of times differencing is necessary for making the time series stationary. ARIMA models are popular for real-life time series analysis because, in most cases, the data are non-stationary and require differencing.
What are the types of ARIMA models?
There are two types of ARIMA models: non-seasonal and seasonal.
How much data does an ARIMA model need?
At least 50 observations are recommended for such a model to capture seasonal effects and variations.
This article has been a guide to what Autoregressive Integrated Moving Average (ARIMA) is. Here, we explain its examples, parameters, pros and cons, and how to create it. You may also find some useful articles here –