Time Series Introduction with R code


1 What is a Time Series

  • A set of observed values ordered in time; in other words, repeated measurements of something, usually taken at a fixed interval of time (hourly, weekly, monthly).

  • A collection of observations made sequentially in time[1].

  • If the variable we are measuring is a count variable, we may have a Poisson Time Series (that is for later).

  • A time series $T \in \mathbb{R}^n$ is a sequence of real-valued numbers $t_i \in \mathbb{R}$: $T = [t_1, t_2, \ldots, t_n]$, where $n$ is the length of $T$.

Most of the classic statistical theory is based on the assumption of sample randomness and independent observations. Time series data are just the opposite: observations are usually dependent on previous values, and their analysis must take their temporal order into account.

For example, in a prospective cohort study comparing the injury rate before and after an implemented program, analyses of time trends, such as Poisson regression and time series analysis, consider the variability that occurs over the study period apart from the change associated with the intervention. They also avoid the loss of information about variability in incidence over time that happens when rates are aggregated into a single before rate and a single after rate. The population is its own control.

If previous observations can predict future observations exactly, we have a deterministic process. However, the exact prediction is usually impossible since past values determine only part of the future value. In this case, we say the process is stochastic, and the future values must be seen as a probability conditioned on the past values.

There are many models available to describe the behavior of a particular series. The choice of such a model depends on factors such as the behavior of the phenomenon or the prior knowledge of its nature and the purpose of the analysis.

1.1 Types of Time Series

The R base installation already gives us lots of datasets to work with for time series. For this article, I'll first load the MASS package, which contains some of the datasets we will use. We can list all available datasets with the function data() from the utils package.

library(MASS)

data() # the output is not shown, but you can check in your RStudio
  1. Measured at regular time intervals (discrete), examples:

    • Economic: stock market;
    plot(EuStockMarkets[, 1], main = "Daily closing prices of DAX", ylab = "Price")

    • Physical/Biological: Pluviometry, DNA;
    plot(((nhtemp) - 32) * 5 / 9, main = "*Average* Yearly Temperatures in New Haven", ylab = "Temperature in °C")

    • Marketing: sales per month;
    plot(BJsales, main = "Daily Sales Data (150 days)", ylab = "Sales")

    • Demographics: population per year, car accidents per day;
    plot(austres, main = "Quarterly Time Series of the Number of Australian Residents", ylab = "Residents")

    • Process control: factory measurements like final can weights, quality scores;
    knitr::include_graphics("control_chart.png") # here I borrowed an image, sorry.

    image from package qicharts

  2. Measured at irregular time intervals (events), examples:

    • Point processes: earthquakes (events)
    eq_data <- readr::read_csv("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv")
    eq_data$yes <- 0 # constant dummy value so each event can be drawn as a point along the time axis
    plot(eq_data$time, eq_data$yes, main = "USGS Magnitude 2.5+ Earthquakes, Past Day",
         ylab = "Event", xlab = "Time of the day", yaxt = 'n', sub = paste("Data from ", Sys.Date()))

    Data downloaded from usgs.gov

  3. Measured continuously, examples:

    • Binary processes: communication theory (turn-on/turn-off, zeros and ones);
    set.seed(2114)
    binary_data <- unlist(replicate(10, rep(runif(1) < 0.5, floor(runif(1, 10, 20)))))
    binary_data <- stepfun(seq_len(length(binary_data) - 1), binary_data)
    plot(binary_data, main = "Synthetic binary example", ylim = c(0,1),
         ylab = "Value", xlab = "Continuous time", do.points = FALSE, yaxp = c(0, 1, 1))

    • Analog signals: sound, temperature, humidity, ECG1.
    ecg_example <- readr::read_csv("ecg_example.csv") # this data you can find on physionet
    plot.ts(ecg_example, main = "ECG example", ylim = c(-4, 7), yaxp = c(-2, 4, 3),
         ylab = "Value", xlab = "Continuous time (ms)", xlim = c(0, 1000))

    Data from Physionet, record a103l

2 The goals of Time Series Analysis

2.1 Description

Like any kind of data analysis, the first step is to know the data. It is imperative to plot a time series before trying to analyze it. This simple step will show us any noticeable trend or seasonal variation and allow us to spot outliers2 or some turning point in the data, which may require the use of more than one model to fit each part of the data.
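As an illustration (my own, not part of the original code), a quick plot plus a classical decomposition of the built-in AirPassengers dataset already reveals the trend and the seasonal swing; decompose() is just one simple way to split a series into trend, seasonal, and random components.

plot(AirPassengers, main = "Monthly totals of international airline passengers", ylab = "Passengers")
plot(decompose(AirPassengers, type = "multiplicative")) # trend + seasonal + random panels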

2.2 Explanation

When we have multiple variables collected simultaneously, it may be possible to find some correlation between them. The variation of one time series may explain the variation in another time series. Multiple regression models may be helpful here. We can also convert an input series into an output series by a linear operation and try to understand the relationship between both series.

2.3 Prediction

Using the previously available data, we may want to predict the future values of that series. Historically, the terms "prediction" and "forecasting" may or may not be used interchangeably, so it is essential to pay attention to how the literature refers to both terms. Sometimes "prediction" may refer to subjective methods or to the procedure used to achieve the "forecast" (the objective method or the actual future values). There is a close relationship between prediction and control problems, where manufacturing processes that are going to move off-target can be proactively corrected.

2.4 Control

When the time series originates from quality measurements of a manufacturing process, the objective of the analysis is to control that process. There are specific methods for such control, and this topic is outside the scope of this article. Further information is available in the specific literature on Statistical Quality Control[3].

3 Before Modeling Time Series

3.1 Transformations

  • Stabilize variance: a logarithmic transformation may reduce the variance;
# I'll explain this once: par(mfrow = c(2, 1)) sets the output to hold two stacked plots,
# and we store the old settings in oldpar. The last line restores those settings.
oldpar <- par(mfrow = c(2, 1), mar = c(3.1, 4.1, 2.1, 1.1))

# Get only the Claims column and transform it into a time series with annual frequency
claims <- ts(Insurance$Claims, start = c(1973, 365.25*3/4), frequency = 365.25)
plot.ts(claims, main = "Numbers of Car Insurance claims (1973, Q3)", ylab = "Claims")
par(mar = c(5.1, 4.1, 0.1, 1.1))
plot(log(claims), ylab = "log(Claims)") # here we log-transform the data

par(oldpar)

  • Make seasonality additive: a logarithmic transformation transforms a multiplicative seasonality into an additive one. However, this will only stabilize the variance if the error term is also multiplicative.
oldpar <- par(mfrow = c(2, 1), mar = c(3.1, 4.1, 2.1, 1.1))

plot(AirPassengers, main = "Monthly totals of international airline passengers", ylab = "Passengers")
par(mar = c(5.1, 4.1, 0.1, 1.1))
plot(log(AirPassengers), ylab = "log(Passengers)")

par(oldpar)

  • Normalize data distribution: data is usually assumed to be normal. Logarithmic and square-root transformations are often used; however, they are just special cases of the Box-Cox transformation. The parameter may be estimated by inference, and in general, a transformation cannot satisfy all requirements at the same time.
layout(mat = matrix(c(1, 1, 2, 3), ncol = 2, byrow = TRUE)) # just plot format

plot.ts(airquality$Ozone, main = "New York Air Quality Measurements, May to September 1973", ylab = "Ozone", xlab = "Time")
# Here we get the Ozone values, and remove the NA's so we can make 'statistics'
ozone <- ts(na.omit(airquality$Ozone))
hist(ozone, probability = 1); lines(density(ozone), col = "red") # data distribution
hist(log(ozone), probability = 1); lines(density(log(ozone)), col = "red") # fairly normalized

It is interesting to note that Nelson and Granger, in 1979, found little benefit in applying a general Box-Cox transformation to several datasets. Usually, it is advised to apply as few transformations as possible, except when the transformed variable has a direct physical interpretation.
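As a hedged sketch (not part of the original code), and since MASS is already loaded, boxcox() can profile the likelihood of the Box-Cox parameter lambda for the ozone data used above; a value near 0 supports the log transformation, while a value near 0.5 supports the square root.

ozone_values <- na.omit(airquality$Ozone)
boxcox(lm(ozone_values ~ 1), lambda = seq(-1, 1, by = 0.05)) # profile likelihood of lambda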

3.3 Dealing with Seasons

As with trends, the analysis of seasonal variation depends on whether we want to measure the seasonal effect and/or remove the seasonality.

For a time series with a slight trend, a straightforward estimate of the seasonal effect is to take the average of every January (for example) and subtract from it (in the additive case) or divide it by (in the multiplicative case) the yearly average.

For a time series with a significant trend, a more robust approach may be taken. For monthly data, we can use:

$$\mathrm{Sm}(x_t) = \frac{\tfrac{1}{2}x_{t-6} + x_{t-5} + x_{t-4} + \cdots + x_{t+5} + \tfrac{1}{2}x_{t+6}}{12}$$
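The formula above is a centered 12-month moving average; a minimal sketch with stats::filter() on the AirPassengers dataset (my own illustration, not from the original) gives half weight to the two end months so the window spans exactly one year, and the seasonal effect can then be estimated from the difference between the series and this smooth curve.

weights <- c(0.5, rep(1, 11), 0.5) / 12   # half weight on the two end months
trend_est <- stats::filter(AirPassengers, filter = weights, sides = 2) # centered moving average
plot(AirPassengers, main = "Centered 12-month moving average", ylab = "Passengers")
lines(trend_est, col = "red")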

The seasonality can also be eliminated by differencing at the seasonal lag. For example, with monthly data, we can use the operator $\nabla_{12}$:

oldpar <- par(mfrow = c(2, 1), mar = c(3.1, 4.1, 2.1, 1.1))

plot(USAccDeaths, main = "Accidental Deaths in the US 1973-1978", ylab = "Accidental Deaths")
par(mar = c(5.1, 4.1, 0.1, 1.1))
# This is still first-order differencing, but with a lag of 12 months.
# Note how it removes the seasonality, but not the trend.
plot(diff(USAccDeaths, lag = 12), ylab = "diff(lag = 12)")

par(oldpar)

$$\nabla_{12} x_t = x_t - x_{t-12}$$

See that the example above removed the seasonality, but kept the trend.

3.4 Autocorrelation

Autocorrelation may be the fundamental property of a time series. As pointed out in section 1, time series data are assumed to be non-independent, and autocorrelation is how we turn this property into an objective and measurable value. Autocorrelation measures the correlation between observations at different time lags. At lag zero, the coefficient is one because the observed value agrees perfectly with itself, and this coefficient is sometimes skipped in plots. The pattern of these coefficients gives us insight into the probability model that generated the series. Later, in section 4.1, we will write the formal definition of this property.

3.4.1 The correlogram

The graphic representation of the autocorrelation coefficients is called a correlogram, in which the coefficient $r_k$ is plotted against the lag $k$.
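A small illustration (my own, not from the original text): acf() draws the correlogram directly and, by default, adds dashed bands at roughly $\pm 2/\sqrt{N}$ that help judge whether a series behaves like a random one.

set.seed(42)
white_noise <- rnorm(200)            # a purely random series
acf(white_noise, main = "Correlogram of a purely random series")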

Interpretation. Here we offer some general advice:

  • Random series: for a large $N$, $r_k \approx 0$ for all non-zero values of $k$. Usually, 19 out of 20 of the values of $r_k$ lie between $\pm 2/\sqrt{N}$.

  • Short-term correlation: Stationary series usually present short-term correlation. What we see is a large value of $r_1$, followed by a geometric decay.

  • Alternating series: If the data tends to alternate sequentially around the overall mean, the correlogram will also show this behavior: the value of $r_1$ will be negative, $r_2$ will be positive, and so on.

  • Non-stationary series: If the data has a trend, the values of $r_k$ will not come down to zero except at large lags. This kind of correlogram has little to offer since the trend muffles any other features we may be interested in. We can conclude that the correlogram is only helpful after removing any trend (in other words, after making the series stationary).

  • Seasonal fluctuations: If the data contains seasonal fluctuations, the correlogram will display an oscillation of the same frequency.

3.4.2 Testing for randomness

There are valuable tools to test if the data is random or not. For the sake of simplicity, this subject will not be covered in this article. However, testing residuals for randomness is a different problem[1] and will be discussed later.

4 Stochastic Processes

In the real world, most processes have a random component in their structure. The term "stochastic" comes from a Greek word meaning "pertaining to chance." A more formal definition of a stochastic process is: "a collection of random variables which are ordered in time and defined at a set of time points which may be continuous or discrete"[1].

Most statistical problems are focused on estimating the properties of a population from a sample. In time series, we need to realize that each data point is a "sample" of the "population" at that given time. When we read a value from a sensor (for a plausible example, let us think of an arterial line that measures blood pressure directly), we read a unique value at that time, not the distribution of values that could possibly be read. This infinite set of possible values that could compose our time series is called an ensemble. The actual time series we have is one of the possible realizations of the stochastic process.

As with other mathematical functions, a simple way to describe the stochastic process (as probability function) is using its moments. The first moment is the mean, and the second moment is the variance (and the autocovariance, for a sequence of random variables).

4.1 Stationary Processes

A stationary process is one that is ready to be modeled. In other words, the steps taken before modeling a time series aim to make it stationary. Loosely speaking, a stationary process is a process that has a constant mean and variance and no periodic variations.

Formally, we can say a process is strictly stationary if the joint distribution of $X(t_1), \ldots, X(t_n)$ is the same as the joint distribution of $X(t_1 + \tau), \ldots, X(t_n + \tau)$ for all $t_1, \ldots, t_n, \tau$. Strict stationarity implies that

$$\mu(t) = \mu \qquad \sigma^2(t) = \sigma^2$$

are constants, independent of the value of $t$. In addition, the joint distribution of $X(t_1)$ and $X(t_2)$ depends only on $(t_2 - t_1)$, which is called the lag. Thus the autocovariance function (ACVF) $\gamma(t_1, t_2)$ also depends only on $(t_2 - t_1)$ and may be written as $\gamma(\tau)$ (the autocovariance coefficient at lag $\tau$).

As the autocovariance coefficient depends on the units in which X(t) is measured, the ACVF is standardized to what is called the autocorrelation function (ACF), which is given by

$$\rho(\tau) = \gamma(\tau)/\gamma(0)$$

which measures the correlation between X(t) and X(t+τ).
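As a quick numerical check of this definition (an illustrative sketch, not part of the original text), the sample autocovariance at lag 1 divided by the lag-0 autocovariance reproduces what acf() reports; acf() uses the divisor n and subtracts the sample mean.

set.seed(1)
x <- arima.sim(list(ar = 0.6), n = 500)   # any stationary series will do
n <- length(x)
gamma0 <- sum((x - mean(x))^2) / n                       # sample ACVF at lag 0
gamma1 <- sum((x[-n] - mean(x)) * (x[-1] - mean(x))) / n # sample ACVF at lag 1
c(manual = gamma1 / gamma0, acf = acf(x, plot = FALSE)$acf[2])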

The reasoning behind the suggestion that the distribution of $X(t)$ should be the same for all $t$ is that many processes converge to an equilibrium as $t \to \infty$, in which the probability distribution of $X(t)$ no longer depends on the initial conditions. Under this assumption, after the process has been running for some time, the probability distribution of $X(t)$ will change very little.

Strict stationarity is very restrictive and few processes achieve it.

4.1.1 Second-order Stationarity

A more helpful, less restrictive definition of stationarity, for practical reasons, requires only that the mean is constant and that the ACVF depends only on the lag. Such a process is called second-order stationary.

This simplified definition of stationarity will generally be adequate as long as the properties of the process depend only on its structure as specified by its first and second moments.

4.2 The autocorrelation function

As shown in section 3.4.1, the autocorrelation coefficients are helpful in describing a time series. The autocorrelation function (ACF) is an essential tool for assessing the properties of a stationary process.

Here we will describe the properties of the ACF.

Suppose a stationary stochastic process $X(t)$ has mean $\mu$, variance $\sigma^2$, ACVF $\gamma(\tau)$, and ACF $\rho(\tau)$. Then

$$\rho(\tau) = \gamma(\tau)/\gamma(0) = \gamma(\tau)/\sigma^2, \qquad \text{so that } \rho(0) = 1$$

4.2.1 Property 1

The ACF is an even function of the lag in that

$$\rho(\tau) = \rho(-\tau)$$

This property just states that the correlation between $X(t)$ and $X(t+\tau)$ is the same as that between $X(t)$ and $X(t-\tau)$. The result is easily proved using $\gamma(\tau) = \rho(\tau)\sigma^2$ by

$$\gamma(\tau) = \mathrm{Cov}[X(t), X(t+\tau)] = \mathrm{Cov}[X(t-\tau), X(t)] \quad \text{(since } X(t) \text{ is stationary)} = \gamma(-\tau)$$

4.2.2 Property 2

$|\rho(\tau)| \le 1$. This is the "usual" property of a correlation. It is proved by noting that

$$\mathrm{Var}[\lambda_1 X(t) + \lambda_2 X(t+\tau)] \ge 0$$

for any constants $\lambda_1, \lambda_2$, since variance is always non-negative. This variance is equal to

$$\lambda_1^2 \mathrm{Var}[X(t)] + \lambda_2^2 \mathrm{Var}[X(t+\tau)] + 2\lambda_1\lambda_2 \mathrm{Cov}[X(t), X(t+\tau)] = (\lambda_1^2 + \lambda_2^2)\sigma^2 + 2\lambda_1\lambda_2\gamma(\tau)$$

When $\lambda_1 = \lambda_2 = 1$, we find

$$\gamma(\tau) \ge -\sigma^2$$

so that $\rho(\tau) \ge -1$. When $\lambda_1 = 1, \lambda_2 = -1$, we find

$$\sigma^2 \ge \gamma(\tau)$$

so that $\rho(\tau) \le +1$.

4.2.3 Property 3

Lack of uniqueness. A stochastic process has a unique covariance structure. However, the converse is not true: we can find different processes that produce the same ACF, which adds another level of difficulty to interpreting the sample ACF. To overcome this problem, we have the invertibility condition, which will be described later in the section on moving average processes.

4.3 Some useful stochastic processes

4.3.1 Purely random process

A process can be called purely random if it is composed of a sequence of random variables $\{Z_t\}$ that are independent and identically distributed. By definition, it has constant mean and variance and, in addition,

$$\gamma(k) = \mathrm{Cov}(Z_t, Z_{t+k}) = 0 \qquad k = \pm 1, \pm 2, \ldots$$

Since the mean and ACVF do not depend on time, the process is second-order stationary. In fact, it also satisfies the condition for a strictly stationary process. The ACF is given by

$$\rho(k) = \begin{cases} 1 & k = 0 \\ 0 & k = \pm 1, \pm 2, \ldots \end{cases}$$

set.seed(2021)
oldpar <- par(mfrow = c(2, 1), mar = c(3.1, 4.1, 2.1, 1.1))

normal_random <- rnorm(500)
plot.ts(normal_random, main = "Purely random series - N(0, 1)", ylab = "Value")
par(mar = c(5.1, 4.1, 0.1, 1.1))
acf(normal_random, lag.max = 1000)

par(oldpar)

In the example above, the ACF was computed over the entire dataset for academic purposes. Normally only $10\log_{10}(N/m)$ lags are shown, where $N$ is the number of observations and $m$ the number of series (from the acf {stats} manual).

This type of process is of particular importance as a building block of more complicated processes, such as moving average processes.

4.3.2 Random walk

The random walk is very similar to the previous process. The difference is that the current observation is the sum of the previous observation and the current random variable, instead of the observations being independent. The definition is given by

$$X_t = X_{t-1} + Z_t$$

Usually, the process starts at zero when $t = 0$, so that

$$X_1 = Z_1$$

and

$$X_t = \sum_{i=1}^{t} Z_i$$

Then we find that $E(X_t) = t\mu$ and that $\mathrm{Var}(X_t) = t\sigma_Z^2$. As the mean and variance change with $t$, the process is non-stationary.

set.seed(2021)

random_walk <- cumsum(rnorm(500))
plot.ts(random_walk, main = "Random Walk - N(0, 1)", ylab = "Value")

An interesting feature is that the first difference of a random walk forms a purely random process, which is therefore stationary.

set.seed(2021)

random_norm <- rnorm(500)
random_walk <- cumsum(random_norm) # same from the last plot
diff_walk <- diff(random_walk)
# below we use 2:500 because with diff, we lose the first observation
all.equal(diff_walk, random_norm[2:500])
## [1] TRUE

4.3.3 Moving average processes

First, this is not to be confused with the moving average smoothing algorithm. The moving average process is a common approach to modeling a univariate time series. The concept is that the current value $X_t$ depends linearly on $q$ past values of a stochastic process. Another practical way to see this process is to imagine it as a finite impulse applied to white noise: the impulse affects the current value and has affected the $q$ previous ones.

The moving average process only remembers the $q$ previous components of the random process3, so it is also limited to predicting $q$ steps into the future. After that, one cannot predict any value without new random values being generated[4].

Here we will say that $\{Z_t\}$ is a purely random process with mean zero and variance $\sigma_Z^2$. Then the process $\{X_t\}$ is said to be a moving average process of order $q$ (abbreviated to an MA(q) process) if

$$X_t = \beta_0 Z_t + \beta_1 Z_{t-1} + \cdots + \beta_q Z_{t-q} \tag{4.3.3.1}$$

where {βi} are constants.

This equation is very similar to a linear regression $y = a + bx$, where the dependent process $\{X_t\}$ is modeled by an independent process $\{Z_t\}$ (a purely random process). For simplicity, we omit the "intercept," a constant $\mu$ that may be added to the right-hand side of the equation. The "intercept" is the overall mean of this process.

Usually, the random process is scaled so that $\beta_0 = 1$. Then we find that

$$E(X_t) = 0 \qquad \mathrm{Var}(X_t) = \sigma_Z^2 \sum_{i=0}^{q} \beta_i^2$$

since the Zs are independent. We also have

$$\gamma(k) = \mathrm{Cov}(X_t, X_{t+k}) = \mathrm{Cov}(\beta_0 Z_t + \cdots + \beta_q Z_{t-q},\; \beta_0 Z_{t+k} + \cdots + \beta_q Z_{t+k-q}) = \begin{cases} 0 & k > q \\ \sigma_Z^2 \sum_{i=0}^{q-k} \beta_i \beta_{i+k} & k = 0, 1, \ldots, q \\ \gamma(-k) & k < 0 \end{cases}$$

since

$$\mathrm{Cov}(Z_s, Z_t) = \begin{cases} \sigma_Z^2 & s = t \\ 0 & s \ne t \end{cases}$$

As γ(k) does not depend on t, and the mean is constant, the process is second-order stationary for all values of the {βi}. Furthermore, if the Zs are normally distributed, so are the Xs, and we have a strictly stationary normal process.

The ACF of the MA(q) process is given by

$$\rho(k) = \begin{cases} 1 & k = 0 \\ \sum_{i=0}^{q-k} \beta_i \beta_{i+k} \Big/ \sum_{i=0}^{q} \beta_i^2 & k = 1, \ldots, q \\ 0 & k > q \\ \rho(-k) & k < 0 \end{cases}$$

Note that the ACF is “clipped” to zero at lag q. This is a feature of MA processes that we can spot on ACF plots. In practice, the “zero” will be a value below the significance line.

oldpar <- par(mfrow = c(3, 1), mar = c(3.1, 4.1, 2.1, 1.1))
set.seed(2021)

mov_avg <- arima.sim(list(order = c(0,0,1), ma = 0.8), n = 200)
plot.ts(mov_avg, main = "Moving Average MA(1)", ylab = "Value")
par(mar = c(5.1, 4.1, 0.1, 1.1))
acf(mov_avg)
par(mar = c(5.1, 4.1, 0.1, 1.1))
pacf(mov_avg)

par(oldpar)

There is no restriction on the $\{\beta_i\}$ values for the MA process to be stationary. However, as we briefly mentioned in ACF Property 3, it is desirable that the process be invertible (e.g., Box and Jenkins, 1970, p. 50).

4.3.3.1 First-order process

The invertibility issue is shown below, where two different MA(1) processes result in the same ACF:

$$\text{A: } X_t = Z_t + \theta Z_{t-1} \qquad \text{B: } X_t = Z_t + \frac{1}{\theta} Z_{t-1}$$

We can see the problem better if we express those processes by writing $Z_t$ in terms of $X_t, X_{t-1}, \ldots$:

$$\text{A: } Z_t = X_t - \theta X_{t-1} + \theta^2 X_{t-2} - \cdots \qquad \text{B: } Z_t = X_t - \frac{1}{\theta} X_{t-1} + \frac{1}{\theta^2} X_{t-2} - \cdots$$

In this form, if $|\theta| < 1$, the series for process A converges whereas the series for process B does not. Thus, if $|\theta| < 1$, process A is said to be invertible, whereas process B is not. The invertibility condition assures that there will be only one MA process for each ACF (uniqueness).
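A quick numerical illustration of this non-uniqueness (my own sketch, not from the original): the theoretical ACFs for $\theta$ and $1/\theta$ coincide, so only the invertible parametrization (the one with $|\theta| < 1$) is kept.

theta <- 0.5
ARMAacf(ma = theta, lag.max = 3)      # MA(1) with coefficient theta
ARMAacf(ma = 1 / theta, lag.max = 3)  # MA(1) with coefficient 1/theta: same ACF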

To simplify the expression of the invertibility condition, we can use the backward shift operator $B$, which is defined by

$$B^j X_t = X_{t-j} \quad \text{for all } j$$

Then equation (4.3.3.1) may be written as

$$X_t = (\beta_0 + \beta_1 B + \cdots + \beta_q B^q) Z_t = \theta(B) Z_t$$

where $\theta(B)$ is a polynomial of order $q$ in $B$. An MA process of order $q$ is invertible if the roots of the following equation (regarding $B$ as a complex variable and not as an operator) all lie outside the unit circle (Box and Jenkins, 1970, p. 50):

$$\theta(B) = \beta_0 + \beta_1 B + \cdots + \beta_q B^q = 0$$

For example, in the first-order case, MA(1), we have $\theta(B) = 1 + \theta B$, which has root $B = -1/\theta$. Thus the root is outside the unit circle provided that $|\theta| < 1$4.

4.3.3.2 General-order case

We can also extend the concept to the general case of MA(q), where we can decompose the polynomial $\theta(B)$ as $\theta(B) = (1 + \theta_1 B) \cdots (1 + \theta_q B)$. In this case, if all the roots $-1/\theta_1, \ldots, -1/\theta_q$ lie outside the unit circle, the process is invertible.
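As a sketch of this root check (an illustration under assumed coefficients, not from the original), polyroot() returns the roots of $\theta(B) = 0$ and Mod() gives their distance from the origin; all moduli greater than one means the MA process is invertible.

beta <- c(0.4, 0.3)             # assumed MA(2) coefficients: theta(B) = 1 + 0.4B + 0.3B^2
roots <- polyroot(c(1, beta))   # roots of theta(B) = 0
Mod(roots)                      # both moduli > 1 here
all(Mod(roots) > 1)             # TRUE => invertible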

4.3.4 Autoregressive processes

In the previous section about the moving average process, we imagined the process as an experiment where an impulse applied to a random process had a finite time span of influence. Here we can think of an experiment where this impulse persists in time: its influence is not restricted to a fixed window, and we see it as a pattern repeated over and over. This also implies that the AR process may not be stationary, in contrast with the MA process. Moreover, the MA process only "remembers" the previous components of the underlying random process, whereas the AR process depends directly on the previous observations, hence the prefix "auto" regressive.

Again, we will say that $\{Z_t\}$ is a purely random process with mean zero and variance $\sigma_Z^2$. Then the process $\{X_t\}$ is said to be an autoregressive process of order $p$ if

$$X_t = \alpha_1 X_{t-1} + \cdots + \alpha_p X_{t-p} + Z_t \tag{4.3.4.1}$$

In contrast with the MA process, the AR process looks like a multiple regression model, since $X_t$ is regressed not on independent variables but on past values of $X_t$. An autoregressive process of order $p$ will be abbreviated to an AR(p) process.

4.3.4.1 First-order process

For a better understanding, we will analyze the first-order case, for p=1. Then

$$X_t = \alpha X_{t-1} + Z_t \tag{4.3.4.1.1}$$

The AR(1) process is a special case of a Markov process, named after the Russian mathematician Andrey Markov. By successive substitution in equation (4.3.4.1.1) we may write

$$X_t = \alpha(\alpha X_{t-2} + Z_{t-1}) + Z_t = \alpha^2(\alpha X_{t-3} + Z_{t-2}) + \alpha Z_{t-1} + Z_t$$

and eventually we find that $\{X_t\}$ can be represented as an infinite-order MA process as

$$X_t = Z_t + \alpha Z_{t-1} + \alpha^2 Z_{t-2} + \cdots \quad \text{provided } -1 < \alpha < +1 \tag{4.3.4.1.2}$$

This duality between AR and MA processes is useful for a variety of purposes. The same transformation can be accomplished using the backward shift operator B. Then equation (4.3.4.1.1) may be written

$$(1 - \alpha B) X_t = Z_t$$

so that5

$$X_t = Z_t/(1 - \alpha B) = (1 + \alpha B + \alpha^2 B^2 + \cdots) Z_t = Z_t + \alpha Z_{t-1} + \alpha^2 Z_{t-2} + \cdots \quad \text{(same as eq. 4.3.4.1.2)}$$

Comparing with the previous results for the moving average process, we see that

$$E(X_t) = 0 \qquad \mathrm{Var}(X_t) = \sigma_Z^2(1 + \alpha^2 + \alpha^4 + \cdots)$$

The variance is finite if we assume that $|\alpha| < 1$, in which case

$$\mathrm{Var}(X_t) = \sigma_X^2 = \sigma_Z^2/(1 - \alpha^2)$$

The ACVF is given by

$$\begin{aligned} \gamma(k) &= E[X_t X_{t+k}] = E\left[\left(\textstyle\sum_i \alpha^i Z_{t-i}\right)\left(\textstyle\sum_j \alpha^j Z_{t+k-j}\right)\right] \\ &= \sigma_Z^2 \sum_{i=0}^{\infty} \alpha^i \alpha^{k+i} \quad \text{for } k \ge 0 \\ &= \alpha^k \sigma_Z^2/(1 - \alpha^2) \quad \text{provided } |\alpha| < 1 \\ &= \alpha^k \sigma_X^2 \end{aligned}$$

For $k < 0$, we find $\gamma(k) = \gamma(-k)$. We see that $\gamma(k)$ is independent of $t$, thus the AR process of order 1 is second-order stationary given that $|\alpha| < 1$. The ACF is given by

$$\rho(k) = \alpha^k \qquad k = 0, 1, 2, \ldots$$

The ACF may also be obtained more simply by assuming a priori that the process is stationary, in which case $E(X_t)$ must be zero. Multiply through equation (4.3.4.1.1) by $X_{t-k}$ and take expectations. Then we find, for $k > 0$, that

$$\gamma(-k) = \alpha\gamma(-k+1)$$

assuming that $E(Z_t X_{t-k}) = 0$ for $k > 0$. Since $\gamma(k)$ is an even function, we must also have

$$\gamma(k) = \alpha\gamma(k-1) \qquad \text{for } k > 0$$

Now $\gamma(0) = \sigma_X^2$, and so $\gamma(k) = \alpha^k \sigma_X^2$ for $k \ge 0$. This means that, keeping the same restrictions, $\rho(k) = \alpha^k$. But if $|\alpha| = 1$, then $|\rho(k)| = 1$ for all $k$, which is a degenerate case. Thus $|\alpha| < 1$ is required for a proper stationary process.

The above method of obtaining the ACF is the one commonly used, under the assumption that the time series is stationary.
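As a short check of $\rho(k) = \alpha^k$ (illustrative only, with an assumed $\alpha = 0.8$), the theoretical ACF returned by ARMAacf() matches the powers of $\alpha$:

alpha <- 0.8
rbind(theoretical = ARMAacf(ar = alpha, lag.max = 5),  # lags 0 to 5
      alpha_power = alpha^(0:5))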

oldpar <- par(mfrow = c(3, 1), mar = c(3.1, 4.1, 2.1, 1.1))
set.seed(2021)

auto_reg <- arima.sim(list(order = c(1,0,0), ar = 0.8), n = 200)
plot.ts(auto_reg, main = "Autoregressive AR(1)", ylab = "Value")
par(mar = c(5.1, 4.1, 0.1, 1.1))
acf(auto_reg)
par(mar = c(5.1, 4.1, 0.1, 1.1))
pacf(auto_reg)

par(oldpar)

4.3.4.2 General-order case

In the general-order case, the same property as in the first-order case holds true: an AR process of finite order can be represented as an MA process of infinite order. We can use the same methods as before, by successive substitution or by using the backward shift operator. Then equation (4.3.4.1) may be written as

$$(1 - \alpha_1 B - \cdots - \alpha_p B^p) X_t = Z_t$$

or

$$X_t = Z_t/(1 - \alpha_1 B - \cdots - \alpha_p B^p) = f(B) Z_t$$

where

$$f(B) = (1 - \alpha_1 B - \cdots - \alpha_p B^p)^{-1} = (1 + \beta_1 B + \beta_2 B^2 + \cdots)$$

The relationship between the $\alpha$s and the $\beta$s may then be found. Having expressed $X_t$ as an MA process, it follows that $E(X_t) = 0$. The variance is finite provided that $\sum \beta_i^2$ converges, and this is a necessary condition for stationarity. The ACVF is given by

$$\gamma(k) = \sigma_Z^2 \sum_{i=0}^{\infty} \beta_i \beta_{i+k} \qquad \text{where } \beta_0 = 1$$

We can simply state that if $\sum |\beta_i|$ converges, the process is stationary.

Yule-Walker equations

We can, in principle, find the ACF of the general-order AR process using the above procedure, but the $\{\beta_i\}$ may be hard to find by algebraic methods. We can simplify this by assuming the process is stationary, multiplying through equation (4.3.4.1) by $X_{t-k}$, taking expectations, and dividing by $\sigma_X^2$, assuming that the variance of $X_t$ is finite. Then, using the fact that $\rho(k) = \rho(-k)$ for all $k$, we have

$$\rho(k) = \alpha_1 \rho(k-1) + \cdots + \alpha_p \rho(k-p) \qquad \text{for all } k > 0$$

This set of equations is called the Yule-Walker equations, named after G. Yule and G. Walker. Its solution has the general form

$$\rho(k) = A_1 \pi_1^{|k|} + \cdots + A_p \pi_p^{|k|}$$

where {πi} are the roots of the auxiliary equation

$$y^p - \alpha_1 y^{p-1} - \cdots - \alpha_p = 0$$

The constants $\{A_i\}$ must satisfy the initial condition $\sum A_i = 1$, since $\rho(0) = 1$.
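A quick numerical sketch of the Yule-Walker recursion (my own illustration, with assumed AR(2) coefficients $\alpha_1 = 0.5$ and $\alpha_2 = 0.3$): the theoretical ACF from ARMAacf() satisfies $\rho(k) = \alpha_1\rho(k-1) + \alpha_2\rho(k-2)$.

alpha <- c(0.5, 0.3)
rho <- ARMAacf(ar = alpha, lag.max = 5)   # rho(0), ..., rho(5)
k <- 3                                    # check one lag, e.g. k = 3
c(lhs = unname(rho[k + 1]),
  rhs = alpha[1] * unname(rho[k]) + alpha[2] * unname(rho[k - 1]))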

Stationarity conditions

From the general form of $\rho(k)$, it is clear that $\rho(k)$ tends to zero as $k$ increases provided that $|\pi_i| < 1$ for all $i$, and this is a sufficient condition for the AR(p) process to be stationary.

We can also say that the process is stationary if the roots of the following equation lie outside the unit circle (Box and Jenkins, 1970, Section 3.2):

$$\phi(B) = 1 - \alpha_1 B - \cdots - \alpha_p B^p = 0 \tag{4.3.4.2.1}$$

Of particular interest is the AR(2) process, where $\pi_1$ and $\pi_2$ are the roots of the quadratic equation

$$y^2 - \alpha_1 y - \alpha_2 = 0$$

Thus $|\pi_i| < 1$ if

$$\left| \frac{\alpha_1 \pm \sqrt{\alpha_1^2 + 4\alpha_2}}{2} \right| < 1$$

from which the stationarity region is the triangular region satisfying

$$\alpha_1 + \alpha_2 < 1 \qquad \alpha_1 - \alpha_2 > -1 \qquad \alpha_2 > -1$$

The roots are real if $\alpha_1^2 + 4\alpha_2 > 0$, in which case the ACF decreases exponentially with $k$, but the roots are complex if $\alpha_1^2 + 4\alpha_2 < 0$, in which case we find that the ACF is a damped sinusoidal wave.

When the roots are real, the constants $A_1$, $A_2$ are also real and are found as follows. Since $\rho(0) = 1$, we have

$$A_1 + A_2 = 1$$

From the first of the Yule-Walker equations, we have

$$\rho(1) = \alpha_1 \rho(0) + \alpha_2 \rho(-1) = \alpha_1 + \alpha_2 \rho(1)$$

Thus

$$\rho(1) = \alpha_1/(1 - \alpha_2) = A_1 \pi_1 + A_2 \pi_2 = A_1 \pi_1 + (1 - A_1)\pi_2$$

Hence we find

$$A_1 = \left[\alpha_1/(1 - \alpha_2) - \pi_2\right]/(\pi_1 - \pi_2) \qquad A_2 = 1 - A_1$$
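As with the MA invertibility check, the stationarity condition of equation (4.3.4.2.1) can be verified numerically (an illustrative sketch with the same assumed coefficients as above, not from the original): the roots of $\phi(B) = 1 - \alpha_1 B - \alpha_2 B^2$ must lie outside the unit circle.

alpha <- c(0.5, 0.3)
roots <- polyroot(c(1, -alpha))   # roots of 1 - alpha1*B - alpha2*B^2 = 0
Mod(roots)                        # both moduli > 1 => stationary AR(2)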

AR processes are useful for modeling time series in which we assume that the current observation depends on the immediately preceding values plus a random error. It is usual to assume that the mean of the process is zero, as a way to simplify computation, even though in reality this is not true for the observed values. We can turn the process into a zero-mean process by rewriting equation (4.3.4.1) in the form

$$X_t - \mu = \alpha_1(X_{t-1} - \mu) + \cdots + \alpha_p(X_{t-p} - \mu) + Z_t$$

This does not affect the ACF.

4.3.5 Mixed ARMA models

Using the previous knowledge of MA and AR processes, and their relations, we can combine both into a mixed autoregressive/moving-average process containing p AR terms and q MA terms. This is the ARMA process of order (p,q), and it is given by

$$X_t = \alpha_1 X_{t-1} + \cdots + \alpha_p X_{t-p} + Z_t + \beta_1 Z_{t-1} + \cdots + \beta_q Z_{t-q} \tag{4.3.5.1}$$

Using the backward shift operator B, equation (4.3.5.1) may be written in the form

$$\phi(B) X_t = \theta(B) Z_t \tag{4.3.5.1a}$$

where ϕ(B), θ(B) are polynomials of order p, q respectively, such that

$$\phi(B) = 1 - \alpha_1 B - \cdots - \alpha_p B^p$$

and

$$\theta(B) = 1 + \beta_1 B + \cdots + \beta_q B^q$$

4.3.5.1 Stationarity and invertibility conditions

The values of $\{\alpha_i\}$ that make the AR part stationary must be such that the roots of

$$\phi(B) = 0$$

lie outside the unit circle.

Likewise, the values of $\{\beta_i\}$ that make the MA part invertible must be such that the roots of

$$\theta(B) = 0$$

lie outside the unit circle.

In this article we will not explain how to compute the ACF for an ARMA process. It is not obscure, but it is tedious. You can find it in Box and Jenkins, 1970, Section 3.4.

But in short, for the ARMA(1,1) case, we have

$$\rho(k) = \alpha\rho(k-1) \qquad k \ge 2$$

set.seed(2021)
oldpar <- par(mfrow = c(3, 1), mar = c(3.1, 4.1, 2.1, 1.1))

stoch_arma <- arima.sim(list(order = c(1,0,1), ma = 0.8, ar = 0.8), n = 200)
plot.ts(stoch_arma, main = "ARMA AR(1) MA(1)", ylab = "Value")
par(mar = c(5.1, 4.1, 0.1, 1.1))
acf(stoch_arma)
par(mar = c(5.1, 4.1, 0.1, 1.1))
pacf(stoch_arma)

par(oldpar)

And, in general (for $k \ge 1$), we have

$$\rho(k) = \frac{(1 + \alpha\beta)(\alpha + \beta)}{1 + 2\alpha\beta + \beta^2}\,\alpha^{k-1} \qquad k \ge 1$$
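A quick check of this closed form (an illustrative sketch using the same assumed $\alpha = \beta = 0.8$ as in the simulation above): the values agree with the theoretical ACF from ARMAacf().

alpha <- 0.8; beta <- 0.8
k <- 1:5
closed_form <- (1 + alpha * beta) * (alpha + beta) / (1 + 2 * alpha * beta + beta^2) * alpha^(k - 1)
rbind(ARMAacf = ARMAacf(ar = alpha, ma = beta, lag.max = 5)[-1],  # drop lag 0
      closed_form)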

Our primary objective here is to see that we can describe a stationary time series with an ARMA model using fewer parameters than if we used an MA or AR process alone. This is also known as the Principle of Parsimony: we want a model with as few parameters as possible that still represents our data adequately.

4.3.5.2 The AR and MA representations

It is sometimes clearer to express an ARMA model as a pure MA process, in the form

$$X_t = \psi(B) Z_t \tag{4.3.5.1b}$$

or as a pure AR process, in the form

$$\pi(B) X_t = Z_t \tag{4.3.5.1c}$$

where $\psi(B) = \sum_{i \ge 0} \psi_i B^i$ is the MA operator, which may be of infinite order. The $\psi$ weights can be used to make predictions and assess the validity of the model.

Moving the operators around, we can see several identities: $\psi(B) = \theta(B)/\phi(B)$ and $\pi(B) = \phi(B)/\theta(B)$. Also, $\pi(B)\psi(B) = 1$ and $\psi(B)\phi(B) = \theta(B)$.

By convention we write $\pi(B) = 1 - \sum_{i \ge 1} \pi_i B^i$, since the natural way to write an AR model is in the form

$$X_t = \sum_{i=1}^{\infty} \pi_i X_{t-i} + Z_t$$
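The $\psi$ weights of the MA($\infty$) representation can be computed directly with ARMAtoMA() from the stats package (an illustrative sketch with assumed ARMA(1,1) coefficients):

ARMAtoMA(ar = 0.8, ma = 0.8, lag.max = 5)  # psi_1, ..., psi_5
# For an ARMA(1,1), psi_j = alpha^(j-1) * (alpha + beta):
0.8^(0:4) * (0.8 + 0.8)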

4.3.6 Integrated ARIMA models

In real life, most of the time series we have are non-stationary. This means we either have to remove this source of variation before working with the models we have seen until now, or we use a formulation that already takes the non-stationarity into account. As suggested in section 3.3, we can difference the time series to make it stationary.

Formally, we replace $X_t$ by $\nabla^d X_t$, where $d$ is the number of times we take the difference ($\nabla$) of $X_t$. This model is called an "integrated" model because the model fitted to the differenced data needs to be summed (or "integrated") to fit the original data.

If we define

$$W_t = \nabla^d X_t = (1 - B)^d X_t \qquad d \in \mathbb{N}_0$$

then the general autoregressive integrated moving average process (abbreviated to an ARIMA process) is of the form

$$W_t = \alpha_1 W_{t-1} + \cdots + \alpha_p W_{t-p} + Z_t + \beta_1 Z_{t-1} + \cdots + \beta_q Z_{t-q} \tag{4.3.6.1}$$

By analogy with equation (4.3.5.1a), we may write equation (4.3.6.1) in the form

$$\phi(B) W_t = \theta(B) Z_t \tag{4.3.6.1a}$$

or

$$\phi(B)(1 - B)^d X_t = \theta(B) Z_t \tag{4.3.6.1b}$$

oldpar <- par(mfrow = c(3, 1), mar = c(3.1, 4.1, 2.1, 1.1))
set.seed(2021)

stoch_arma <- arima.sim(list(order = c(2,1,2), ma = c(-0.2279, 0.2488), ar = c(0.8897, -0.4858)), n = 200)
plot.ts(stoch_arma, main = "ARIMA(2, 1, 2)", ylab = "Value")
par(mar = c(5.1, 4.1, 0.1, 1.1))
acf(stoch_arma)
par(mar = c(5.1, 4.1, 0.1, 1.1))
pacf(stoch_arma)

par(oldpar)

Thus we have an ARIMA process of order $(p, d, q)$. The model for $X_t$ is clearly non-stationary, as the AR operator $\phi(B)(1 - B)^d$ has $d$ roots on the unit circle. As a curiosity, note that the random walk process can be modeled as an ARIMA(0,1,0) process.
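As a small check of that last remark (illustrative only), fitting an ARIMA(0,1,0) to a simulated random walk simply models its first difference as white noise, and the estimated innovation variance comes out close to the 1 used in the simulation:

set.seed(2021)
random_walk <- cumsum(rnorm(500))      # same construction as in section 4.3.2
arima(random_walk, order = c(0, 1, 0)) # sigma^2 estimated close to 1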

Glossary

E

Expected value or expectation

B

Backward shift operator: $BX_t = X_{t-1}$

τ

Period of time, lag

λ

Transformation parameter or some other constant

μ

Mean

$\sigma^2$

Variance

$r_k$

Correlogram coefficient at lag index ‘k’

$\nabla$

Difference operator

γ(τ)

Autocovariance of a period of time: $\gamma(t_1, t_2)$

ρ(τ)

Autocorrelation of a period of time: $\gamma(\tau)/\gamma(0)$

References

1.
Chatfield, C.: The Analysis of Time Series: An Introduction. Chapman and Hall/CRC (1996).
2.
3.
Montgomery, D.: Introduction to Statistical Quality Control. Wiley (2012).
4.
Siegel, A.F.: Time Series. In: Practical Business Statistics. pp. 431–466 Elsevier (2016). https://doi.org/10.1016/b978-0-12-804250-2.00014-6.

  1. Usually, analog signals are “digitized” by reading the value at discrete intervals. The pitfalls of this approach are beyond this article.↩︎

  2. Treatment of outliers is another field of research and out of the scope of this article. An outlier may be a perfectly valid but extreme value, or a sensor that has gone entirely wild, or even a strange observation that may or may not repeat for some reason. So common sense is as important as the theory.↩︎

  3. For those looking for more trouble: the moving average process, despite what was said about the moving average algorithm, is, in fact, a (weighted) moving average of a random process! Try this at home: simulate a MA(2) process with coefficients 1 (so we skip the weighted part). Then, apply a rollmean() function with window 3 to the random process used in the simulation (yes, window is 3, because we need to consider the current value). Then compare both resulting time series. If all correct, you will have the same values. Tip: you may need to align the starting value and multiply the rollmean() output by 2 (the order of the moving average process).↩︎

  4. In the R language, we often see the inverse roots when we plot an ARIMA model. In this case, we will see the roots inside the unit circle when the process is invertible.↩︎

  5. Remember the power series $\frac{a}{1-x} = \sum_{n=0}^{\infty} a x^n$. Thus we have $\frac{Z_t}{1 - \alpha B} = \sum_{n=0}^{\infty} Z_t(\alpha B)^n$↩︎