- Anisha Mohanty

# Time Series and Forecasting

Updated: Apr 30, 2021

Time series forecasting uses a model to predict future values of a series from its previously observed values. In the classical statistical handling of time series data, making predictions about the future is called extrapolation.

**Forecasting Process**

Define Goal

Get data

Explore and visualize series

Pre-process data

Partition series

Apply forecasting methods

Evaluate and compare performance

Implement forecast/systems

**Goal Definition**

Goal definition covers the purpose and type of the forecast, the cost of forecast errors, and the data that will be available in the future. The goal is one of two types:

Descriptive Analysis

Predictive Analysis

**Descriptive Analysis**

Determine components and relations

Models with explanation

Retrospective in nature

**Predictive Analysis**

Forecast future values

High accuracy

**Basic Notations**

t: time period

yₜ: value of the series at time t (actual value)

Fₜ: forecast value for time t

Fₜ₊ₖ: k-step-ahead forecast

eₜ: forecast error for time t (inaccuracy of the forecast), eₜ = yₜ - Fₜ

k: forecast horizon (the larger the horizon, the greater the uncertainty and the lower the accuracy)
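The notation can be made concrete with a small sketch. The actual values are the first five MilesMM observations used later in this post; the forecast values are hypothetical numbers chosen only for illustration, and the mean absolute error is just one common way to summarise the errors.

```python
# Actual values y_t: the first five MilesMM observations used in this post
actual = [6827, 6178, 7084, 8162, 8462]
# Forecasts F_t: hypothetical values, chosen only for illustration
forecast = [6900, 6300, 7000, 8000, 8500]

# Forecast error e_t = y_t - F_t for each time period t
errors = [y - f for y, f in zip(actual, forecast)]
print(errors)  # → [-73, -122, 84, 162, -38]

# Mean absolute error: the average size of the errors, ignoring their sign
mae = sum(abs(e) for e in errors) / len(errors)
print(mae)
```

A negative error means the forecast was too high for that period, and a positive error means it was too low.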

**Some Important Concepts**

White Noise

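White noise is a sequence of uncorrelated random numbers with zero mean and constant variance. If a series is white noise, its future values cannot be predicted from its past. Hence the series itself should not be white noise, but the forecast errors should be: white-noise errors indicate that the model has captured all the predictable structure in the series.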
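As a rough illustration, the sketch below generates white noise with NumPy and checks that consecutive values are essentially uncorrelated. The seed, the sample size, and the helper name `lag1_autocorr` are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
# White noise: independent draws with zero mean and constant variance
noise = rng.normal(loc=0.0, scale=1.0, size=5000)


def lag1_autocorr(x):
    """Sample correlation between x_t and x_(t-1)."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]


# For white noise this is close to zero: the previous value says
# almost nothing about the next one
print(round(lag1_autocorr(noise), 3))
```

The same check applied to the forecast errors of a model gives a quick sanity test: a lag-1 correlation far from zero suggests that some structure is still left in the errors.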

Random walk/Drunken walk

A random walk is like taking random steps in any direction: each value equals the previous value plus a random step, so the next value stays close to the current one. In this case we can use naive forecasting, which takes the most recent observed value as the forecast for the next one.
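A minimal sketch of a random walk and its naive forecast, assuming normally distributed steps (the seed and the length of 200 are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
steps = rng.normal(size=200)   # random steps (white noise)
walk = np.cumsum(steps)        # random walk: y_t = y_(t-1) + step_t

# Naive forecast: F_t = y_(t-1), available from the second observation onward
naive_forecast = walk[:-1]
errors = walk[1:] - naive_forecast   # e_t = y_t - F_t

# The naive forecast errors of a random walk are exactly its random steps
print(np.allclose(errors, steps[1:]))  # → True
```

Because the errors of the naive forecast are themselves white noise here, no other method can extract more from a pure random walk.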

Decomposing Time Series in Python

Additive Model

y(t) = Level + Trend + Seasonality + Noise

Multiplicative Model

y(t) = Level * Trend * Seasonality * Noise
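To see how the two models differ, the sketch below builds one synthetic series of each kind. All component values are hypothetical. In the multiplicative case the trend, seasonality, and noise are written as factors around 1 that scale the level, which is one common way to read the formula.

```python
import numpy as np

t = np.arange(48)                      # 48 monthly time periods
rng = np.random.default_rng(seed=2)
level = 100.0

# Additive: components are in the units of the series and are summed
trend = 0.5 * t
seasonality = 10.0 * np.sin(2 * np.pi * t / 12)
noise = rng.normal(scale=1.0, size=t.size)
y_additive = level + trend + seasonality + noise

# Multiplicative: components are factors around 1 that scale the level,
# so the seasonal swings grow as the series grows
trend_f = 1.0 + 0.01 * t
season_f = 1.0 + 0.1 * np.sin(2 * np.pi * t / 12)
noise_f = 1.0 + rng.normal(scale=0.01, size=t.size)
y_multiplicative = level * trend_f * season_f * noise_f

# Taking logs turns the multiplicative model into an additive one
log_sum = np.log(level) + np.log(trend_f) + np.log(season_f) + np.log(noise_f)
print(np.allclose(np.log(y_multiplicative), log_sum))  # → True
```

A practical rule of thumb: if the size of the seasonal swings stays roughly constant, the additive model is the natural choice; if the swings grow with the level of the series, the multiplicative model usually fits better.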

```
from statsmodels.tsa.seasonal import seasonal_decompose

# miles_decomp_df is assumed to be already loaded, with 'Month' and 'MilesMM' columns
miles_decomp_df.head()
```

| Month | MilesMM |
| --- | --- |
| 1963-01-01 | 6827 |
| 1963-02-01 | 6178 |
| 1963-03-01 | 7084 |
| 1963-04-01 | 8162 |
| 1963-05-01 | 8462 |

```
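# Index by date; 'Month' is assumed to hold datetimes so the monthly period can be inferred
miles_decomp_df.index = miles_decomp_df['Month']
# Additive model: observed = trend + seasonal + residual
result = seasonal_decompose(miles_decomp_df['MilesMM'], model='additive')
result.plot()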
```

```
# Multiplicative model: observed = trend * seasonal * residual (needs strictly positive values)
result2 = seasonal_decompose(miles_decomp_df['MilesMM'], model='multiplicative')
result2.plot()
```

Differencing

Differencing is a simple and popular method for removing trend and seasonality from a series. It means taking the difference between two values of the series that are a fixed number of periods apart; in the simplest case, two consecutive values.

Lag-1 differencing: **yₜ - yₜ₋₁**

Lag-k differencing: **yₜ - yₜ₋ₖ**

One round of lag-1 differencing removes a linear trend. For a quadratic trend, and often for an exponential trend, we apply another round of lag-1 differencing to the differenced series. To remove a yearly seasonal pattern from monthly data, such as monthly sales, we apply lag-12 differencing. If the data contain both trend and seasonality, we difference twice: once to remove the trend and once more to remove the seasonality.
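These claims are easy to check on small synthetic series. The sketch below uses hypothetical noise-free data so that the effect of each round of differencing is exact.

```python
import numpy as np

t = np.arange(10)
linear = 3 * t + 5            # linear trend
quadratic = t ** 2 + 2 * t    # quadratic trend

# One round of lag-1 differencing turns a linear trend into a constant
print(np.diff(linear))                # → [3 3 3 3 3 3 3 3 3]

# A quadratic trend needs a second round on the differenced series
print(np.diff(np.diff(quadratic)))    # → [2 2 2 2 2 2 2 2]

# Lag-k differencing y_t - y_(t-k) removes a pattern that repeats every k periods
k = 4
seasonal = np.tile([10, 20, 15, 5], 3)   # hypothetical pattern with period 4
print(seasonal[k:] - seasonal[:-k])   # → [0 0 0 0 0 0 0 0]
```

Each round of lag-k differencing shortens the series by k observations, which is why the first rows of a differenced column are missing.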

Differencing in Python

| Month | MilesMM |
| --- | --- |
| 1963-01-01 | 6827 |
| 1963-02-01 | 6178 |
| 1963-03-01 | 7084 |
| 1963-04-01 | 8162 |
| 1963-05-01 | 8462 |

```
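# Previous month's value (lag 1), kept for comparison
miles_df['lag1'] = miles_df['MilesMM'].shift(1)
# Lag-1 difference y_t - y_(t-1), the same as MilesMM - lag1
miles_df['MilesMM_diff_1'] = miles_df['MilesMM'].diff(periods=1)
miles_df.head()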
```

| Month | MilesMM | lag1 | MilesMM_diff_1 |
| --- | --- | --- | --- |
| 1963-01-01 | 6827 | NaN | NaN |
| 1963-02-01 | 6178 | 6827.0 | -649.0 |
| 1963-03-01 | 7084 | 6178.0 | 906.0 |
| 1963-04-01 | 8162 | 7084.0 | 1078.0 |
| 1963-05-01 | 8462 | 8162.0 | 300.0 |

```
miles_df.index = miles_df['Month']
# Decompose the original series, which still contains trend and seasonality
result_a = seasonal_decompose(miles_df['MilesMM'], model='additive')
result_a.plot()
```

```
miles_df.index = miles_df['Month']
# Decompose the lag-1 differenced series; the first row is NaN, so skip it
result_b = seasonal_decompose(miles_df['MilesMM_diff_1'].iloc[1:], model='additive')
result_b.plot()
```

`miles_df['MilesMM'].plot()`

`miles_df['MilesMM_diff_1'].plot()`

```
# Lag-12 difference of the lag-1 differenced series: removes the yearly seasonality as well
miles_df['MilesMM_diff_12'] = miles_df['MilesMM_diff_1'].diff(periods=12)
miles_df['MilesMM_diff_12'].plot()
```
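The two-step differencing above can be reproduced on a synthetic series where the outcome is known in advance. This is a sketch with hypothetical data: a linear trend plus a pattern that repeats every 12 months, with no noise.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend plus a 12-month repeating pattern
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
t = np.arange(36)
pattern = np.tile([5, 3, 8, 12, 15, 20, 22, 21, 14, 9, 4, 2], 3)
series = pd.Series(10 * t + pattern, index=idx)

diff_1 = series.diff(periods=1)      # removes the linear trend, seasonality remains
diff_12 = diff_1.diff(periods=12)    # removes the yearly pattern as well

# 13 values are lost: 1 to the lag-1 difference and 12 more to the lag-12 difference
print(diff_12.isna().sum())            # → 13
# With no noise in the data, everything that remains is exactly zero
print((diff_12.dropna() == 0).all())   # → True
```

On real data the remainder is not zero but should look like noise without visible trend or seasonal pattern. The 13 missing values also explain why the first rows of a doubly differenced column are all NaN.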

`miles_df.head()`

| Month | MilesMM | lag1 | MilesMM_diff_1 | MilesMM_diff_12 |
| --- | --- | --- | --- | --- |
| 1963-01-01 | 6827 | NaN | NaN | NaN |
| 1963-02-01 | 6178 | 6827.0 | -649.0 | NaN |
| 1963-03-01 | 7084 | 6178.0 | 906.0 | NaN |
| 1963-04-01 | 8162 | 7084.0 | 1078.0 | NaN |
| 1963-05-01 | 8462 | 8162.0 | 300.0 | NaN |