Anisha Mohanty

Apr 29, 2021 · 2 min

Time Series and Forecasting

Updated: Apr 30, 2021

Time series forecasting uses a model to predict future values of a series from its previously observed values. In the classical statistical treatment of time series data, predicting values beyond the observed period is called extrapolation.

Forecasting Process

  1. Define Goal

  2. Get data

  3. Explore and visualize series

  4. Pre-process data

  5. Partition series

  6. Apply forecasting methods

  7. Evaluate and compare performance

  8. Implement forecast/systems
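Steps 5 to 7 can be sketched in a few lines. The numbers below are made up for illustration, and the split point, the naive method, and the mean absolute error metric are assumptions chosen for this sketch, not choices prescribed by the process itself.

```python
# Hypothetical monthly values, ordered by time (illustrative only).
series = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]

# Step 5: partition chronologically; a time series is never shuffled.
n_valid = 3
train, valid = series[:-n_valid], series[-n_valid:]

# Step 6: naive method, every future value is forecast as the last training value.
forecasts = [train[-1]] * len(valid)

# Step 7: evaluate on the validation period with the mean absolute error.
errors = [y - f for y, f in zip(valid, forecasts)]
mae = sum(abs(e) for e in errors) / len(errors)
print(forecasts, errors, mae)
```

Any other forecasting method can be swapped into step 6 and compared on the same validation period.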

Goal Definition

Goal definition covers the purpose of the forecast, its type, the cost of forecast errors, and the data that will be available in the future. The analysis can be of two types:

  1. Descriptive Analysis

  2. Predictive Analysis

Descriptive Analysis

  1. Determine components and relations

  2. Models with explanation

  3. Retrospective in nature

Predictive Analysis

  1. Forecast future values

  2. High accuracy

Basic Notations

  • t Time Period

  • yₜ value of the series at time t (actual value)

  • Fₜ forecast value for time t

  • Fₜ₊ₖ k-step-ahead forecast

  • eₜ forecast error for time t (inaccuracy of the forecast), eₜ = yₜ - Fₜ

  • k forecast horizon (the larger the horizon, the greater the uncertainty and the lower the accuracy)
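As a small illustration of the notation, the sketch below computes the forecast errors eₜ = yₜ - Fₜ. The actual values are the first five MilesMM values used later in this post; the forecast values are hypothetical and exist only to make the arithmetic concrete.

```python
# Actual values y_t (first five MilesMM values from this post).
y = [6827, 6178, 7084, 8162, 8462]

# Hypothetical forecasts F_t for the same periods (illustrative only).
F = [6700, 6400, 7000, 8000, 8500]

# Forecast error for each period: e_t = y_t - F_t.
e = [y_t - f_t for y_t, f_t in zip(y, F)]
print(e)  # [127, -222, 84, 162, -38]
```

A positive error means the forecast was too low, a negative error means it was too high.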

Some Important Concepts

  • White Noise

White noise is a sequence of uncorrelated random numbers. If a series is white noise, we cannot forecast its future values. Hence the series itself should not be white noise, but the forecast errors should be.
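One quick way to see why white noise cannot be forecast is to check that consecutive values are uncorrelated. The sketch below assumes Gaussian white noise (white noise does not have to be Gaussian) and uses a fixed seed and sample size picked only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
noise = rng.normal(loc=0.0, scale=1.0, size=5000)

# Lag 1 autocorrelation: correlation between the series and itself shifted by one step.
lag1_corr = np.corrcoef(noise[:-1], noise[1:])[0, 1]

# For white noise this value should be close to zero:
# the previous value carries no information about the next one.
print(round(lag1_corr, 3))
```

The same check applied to forecast errors gives a rough indication of whether a model has left any structure unexplained.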

  • Random walk/Drunken walk

A random walk is like taking random steps in any direction: each step is unpredictable, but the next value is always near the previous one. In this case we can use naive forecasting, which takes the most recent observed value as the forecast for the next value.
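A minimal sketch of this idea, assuming Gaussian steps and an arbitrary seed: the random walk is the running sum of white noise, and the errors of the naive forecast are exactly the random steps.

```python
import numpy as np

rng = np.random.default_rng(1)
steps = rng.normal(size=200)  # random steps (white noise)
walk = np.cumsum(steps)       # random walk: each value = previous value + step

# Naive forecast: F_t = y_(t-1), the previous value is the forecast for the next one.
naive_forecast = walk[:-1]
actual = walk[1:]
errors = actual - naive_forecast

# The naive forecast errors are the steps themselves, which are white noise.
print(np.allclose(errors, steps[1:]))
```

Since the errors are white noise, nothing forecastable is left over, which is why the naive forecast suits a random walk.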

  • Decomposing Time Series in Python

Additive Model

y(t) = Level + Trend + Seasonality + Noise

Multiplicative Model

y(t) = Level * Trend * Seasonality * Noise
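To make the two models concrete, the sketch below builds one synthetic series of each kind. All component values (level 100, a trend of 2 units or 2% per period, a 12-period sine-shaped seasonality) are made up for illustration, and noise is left out for clarity. It also shows that taking logarithms turns the multiplicative model into an additive one.

```python
import numpy as np

t = np.arange(24)  # 24 hypothetical monthly periods
level = 100.0

# Additive model: components are summed.
trend_add = 2.0 * t                             # +2 units per period
season_add = 10.0 * np.sin(2 * np.pi * t / 12)  # swings of +/- 10 units
y_additive = level + trend_add + season_add

# Multiplicative model: components are multiplied.
trend_mult = 1.02 ** t                                # +2% per period
season_mult = 1.0 + 0.1 * np.sin(2 * np.pi * t / 12)  # swings of +/- 10%
y_multiplicative = level * trend_mult * season_mult

# log(Level * Trend * Seasonality) = log(Level) + log(Trend) + log(Seasonality)
log_sum = np.log(level) + np.log(trend_mult) + np.log(season_mult)
print(np.allclose(np.log(y_multiplicative), log_sum))
```

In the additive series the seasonal swings keep the same size over time, while in the multiplicative series they grow with the level of the series.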

from statsmodels.tsa.seasonal import seasonal_decompose

# miles_decomp_df is assumed to be loaded already, with 'Month' and 'MilesMM' columns
miles_decomp_df.head()

Month       MilesMM
1963-01-01  6827
1963-02-01  6178
1963-03-01  7084
1963-04-01  8162
1963-05-01  8462

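# Index by month ('Month' is assumed to hold datetime values so the 12-month period can be inferred)
miles_decomp_df.index = miles_decomp_df['Month']

# Additive decomposition: observed = trend + seasonal + residual
result = seasonal_decompose(miles_decomp_df['MilesMM'], model='additive')
result.plot()

# Multiplicative decomposition: observed = trend * seasonal * residual
result2 = seasonal_decompose(miles_decomp_df['MilesMM'], model='multiplicative')
result2.plot()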

  • Differencing

Differencing is a simple and popular method for removing trends and seasonality from a series. It means taking the difference between two values of the series that are a fixed number of periods apart; for lag 1, these are two consecutive values.

Lag 1 differencing: Yₜ - Yₜ₋₁

Lag k differencing: Yₜ - Yₜ₋ₖ

A single round of lag 1 differencing removes a linear trend. For quadratic and exponential trends, another round of lag 1 differencing on the differenced series is often needed. To remove a seasonal pattern that repeats every 12 months from monthly sales data, we can apply lag 12 differencing. If the data has both trend and seasonality, we have to difference twice: once to remove the trend and once to remove the seasonality.
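These claims can be checked on synthetic data. The series below are made up for illustration (a straight line, a parabola, and a pattern that repeats every 12 periods), so the results are exact instead of approximate as they would be on real data.

```python
import numpy as np

t = np.arange(36)

# Linear trend: one round of lag 1 differencing leaves a constant.
linear = 5.0 + 3.0 * t
diff_linear = linear[1:] - linear[:-1]

# Quadratic trend: the first difference still trends, the second is constant.
quadratic = t ** 2
first_diff = np.diff(quadratic)
second_diff = np.diff(first_diff)

# Seasonality with period 12: lag 12 differencing removes it.
seasonal = np.tile(np.arange(12.0), 3)
diff_seasonal = seasonal[12:] - seasonal[:-12]

# Trend plus seasonality: difference twice, lag 1 for the trend, lag 12 for the seasonality.
both = linear + seasonal
lag1 = both[1:] - both[:-1]
lag1_then_lag12 = lag1[12:] - lag1[:-12]

print(diff_linear[:3], second_diff[:3], diff_seasonal[:3], lag1_then_lag12[:3])
```

Each round of differencing shortens the series by the size of the lag, which is why NaN values appear at the start of differenced columns in pandas.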

  • Differencing in Python


 
Month       MilesMM
1963-01-01  6827
1963-02-01  6178
1963-03-01  7084
1963-04-01  8162
1963-05-01  8462

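# Previous month's value (lag 1)
miles_df['lag1'] = miles_df['MilesMM'].shift(1)

# Lag 1 differencing: y_t - y_(t-1); the first row is NaN because it has no predecessor
miles_df['MilesMM_diff_1'] = miles_df['MilesMM'].diff(periods=1)

miles_df.head()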

Month       MilesMM  lag1    MilesMM_diff_1
1963-01-01  6827     NaN     NaN
1963-02-01  6178     6827.0  -649.0
1963-03-01  7084     6178.0  906.0
1963-04-01  8162     7084.0  1078.0
1963-05-01  8462     8162.0  300.0

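# Index by month ('Month' is assumed to hold datetime values so the 12-month period can be inferred)
miles_df.index = miles_df['Month']

# Decompose the original series
result_a = seasonal_decompose(miles_df['MilesMM'], model='additive')
result_a.plot()

# Decompose the lag 1 differenced series, skipping the leading NaN
result_b = seasonal_decompose(miles_df['MilesMM_diff_1'].iloc[1:], model='additive')
result_b.plot()

# Plot the original series and the differenced series for comparison
miles_df['MilesMM'].plot()

miles_df['MilesMM_diff_1'].plot()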

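# Lag 12 differencing of the already differenced series removes the yearly seasonality;
# the first 13 rows are NaN (1 from the lag 1 step plus 12 from the lag 12 step)
miles_df['MilesMM_diff_12'] = miles_df['MilesMM_diff_1'].diff(periods=12)

miles_df['MilesMM_diff_12'].plot()

miles_df.head()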

Month       MilesMM  lag1    MilesMM_diff_1  MilesMM_diff_12
1963-01-01  6827     NaN     NaN             NaN
1963-02-01  6178     6827.0  -649.0          NaN
1963-03-01  7084     6178.0  906.0           NaN
1963-04-01  8162     7084.0  1078.0          NaN
1963-05-01  8462     8162.0  300.0           NaN
