Time series forecasting uses a model to predict future values of a series from its previously observed values. In the classical statistical treatment of time series data, making predictions about the future is called extrapolation.
Forecasting Process
Define Goal
Get data
Explore and visualize series
Pre-process data
Partition series
Apply forecasting methods
Evaluate and compare performance
Implement forecast/systems
Goal Definition
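Defining the goal includes stating the purpose of the forecast, the type of analysis (descriptive or predictive), the cost of forecast errors, and which data will be available in the future.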
Descriptive Analysis
Predictive Analysis
Descriptive Analysis
Determine components and relations
Models with explanation
Retrospective in nature
Predictive Analysis
Forecast future values
High accuracy
Basic Notations
t Time period
yₜ Value of the series at time t (actual value)
Fₜ Forecast value for time t
Fₜ₊ₖ k-step-ahead forecast
eₜ Forecast error for time t (inaccuracy of the forecast), eₜ = yₜ - Fₜ
k Forecast horizon (the larger the horizon, the greater the uncertainty and the lower the accuracy)
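As a small illustration of this notation, the sketch below computes the forecast error eₜ = yₜ - Fₜ for five periods. The actual values are the first five MilesMM observations shown later in this article; the forecast values are made up for the example, and the mean absolute error is included only as one possible summary of the errors.

```python
import pandas as pd

# Actual values y_t (first five MilesMM observations from the article)
actual = pd.Series([6827, 6178, 7084, 8162, 8462], dtype=float)

# Hypothetical forecasts F_t, invented purely for illustration
forecast = pd.Series([6800, 6300, 7000, 8000, 8500], dtype=float)

# Forecast error e_t = y_t - F_t
errors = actual - forecast

# One possible summary of the errors: mean absolute error
mae = errors.abs().mean()

print(errors.tolist())
print(mae)
```

A positive error means the forecast was too low, and a negative error means it was too high.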
Some Important Concepts
White Noise
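White noise is a sequence of uncorrelated random values with a constant mean and variance. If a series is white noise, its future values cannot be predicted from its past. The series we want to forecast should therefore not be white noise, but the forecast errors should be: errors that still show a pattern mean the model has missed some structure in the data.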
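One rough way to see what "no structure" means is to look at the sample autocorrelation, which stays close to zero at every lag for white noise. The sketch below assumes Gaussian noise with a fixed seed; the `autocorr` helper is written for this example and is not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
noise = rng.normal(loc=0.0, scale=1.0, size=5000)

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag (illustrative helper)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# For white noise both values should be close to zero
lag1 = autocorr(noise, 1)
lag12 = autocorr(noise, 12)

# For comparison, a series with a clear trend is strongly autocorrelated
trend_lag1 = autocorr(np.arange(100.0), 1)

print(lag1, lag12, trend_lag1)
```

The same check can be applied to the errors of a forecasting model: autocorrelations far from zero suggest there is still something left to model.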
Random walk/Drunken walk
A random walk is like taking steps in random directions: each value equals the previous value plus a random step, so the next value will be near the previous one. In this case we can use naive forecasting, which takes the most recent observed value as the forecast for the next one.
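A minimal sketch of a random walk and its naive forecast, assuming Gaussian steps with a fixed seed. It shows that the errors of the naive forecast are exactly the random steps, which is why nothing simpler or better is available for a pure random walk.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is reproducible
steps = rng.normal(size=500)    # random steps (white noise)
walk = np.cumsum(steps)         # y_t = y_(t-1) + step_t

# Naive forecast: F_t = y_(t-1)
naive_forecast = walk[:-1]
actual = walk[1:]

# Errors of the naive forecast are the random steps themselves
errors = actual - naive_forecast

print(np.allclose(errors, steps[1:]))
```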
Decomposing Time Series in Python
Additive Model
y(t) = Level + Trend + Seasonality + Noise
Multiplicative Model
y(t) = Level * Trend * Seasonality * Noise
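The two models are related: taking logarithms of a multiplicative series gives an additive one, since log(a * b) = log(a) + log(b). The sketch below checks this on a synthetic series; the level, trend and seasonal factors are made-up values, and noise is left out to keep the check exact.

```python
import numpy as np

t = np.arange(48)  # four years of monthly time steps

# Hypothetical multiplicative components (all strictly positive)
level = 100.0
trend = 1.0 + 0.01 * t
seasonality = 1.0 + 0.2 * np.sin(2 * np.pi * t / 12)

# Multiplicative model (noise omitted)
y_mult = level * trend * seasonality

# The log of the series is the sum of the logs of its components
log_sum = np.log(level) + np.log(trend) + np.log(seasonality)

print(np.allclose(np.log(y_mult), log_sum))
```

This is why a multiplicative series is sometimes log-transformed and then treated with additive methods.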
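import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
miles_decomp_df.head()
Month MilesMM
1963-01-01 6827
1963-02-01 6178
1963-03-01 7084
1963-04-01 8162
1963-05-01 8462
# Index by date; pd.to_datetime is a no-op if Month is already a datetime column
miles_decomp_df.index = pd.to_datetime(miles_decomp_df['Month'])
# period=12 states the seasonal cycle explicitly: 12 months per year
result = seasonal_decompose(miles_decomp_df['MilesMM'], model='additive', period=12)
result.plot()
result2 = seasonal_decompose(miles_decomp_df['MilesMM'], model='multiplicative', period=12)
result2.plot()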
Differencing
Differencing is a simple and popular method for removing trends and seasonality from a series. It means taking the difference between two values of the series that are a fixed number of periods apart.
Lag 1 differencing: yₜ - yₜ₋₁
Lag k differencing: yₜ - yₜ₋ₖ
A single round of lag 1 differencing removes a linear trend. For a quadratic trend, we apply another round of lag 1 differencing to the differenced series; exponential trends are often handled the same way, although taking logs first turns an exponential trend into a linear one. To remove a yearly seasonal pattern from monthly sales data, we can apply lag 12 differencing. If the data has both trend and seasonality, we have to difference twice: once to remove the trend and once to remove the seasonality.
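These rules can be checked on synthetic data. The sketch below assumes a noise-free monthly series made of a linear trend plus a 12-period sine pattern (made-up values), and uses pandas `Series.diff` as in the rest of this article.

```python
import numpy as np
import pandas as pd

t = np.arange(48)                             # four years of monthly steps
trend = 2.0 * t                               # linear trend, slope 2
seasonal = 10.0 * np.sin(2 * np.pi * t / 12)  # pattern repeating every 12 steps
y = pd.Series(trend + seasonal)

# Lag 1 differencing turns a linear trend into a constant (the slope)
trend_diff = pd.Series(trend).diff(periods=1)

# A quadratic trend needs two rounds of lag 1 differencing
quad_diff2 = pd.Series(t ** 2).diff(periods=1).diff(periods=1)

# Lag 12 differencing removes the seasonal pattern; the trend leaves a constant 2 * 12
diff_12 = y.diff(periods=12)

# A further lag 1 difference removes what is left of the trend
diff_12_1 = diff_12.diff(periods=1)

print(diff_12.dropna().head())
print(diff_12_1.dropna().head())
```

Each round of differencing shortens the usable series, which is where the leading NaN values in the differenced columns come from.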
Differencing in Python
Month MilesMM
1963-01-01 6827
1963-02-01 6178
1963-03-01 7084
1963-04-01 8162
1963-05-01 8462
miles_df['lag1'] = miles_df['MilesMM'].shift(1)
miles_df['MilesMM_diff_1'] = miles_df['MilesMM'].diff(periods=1)
miles_df.head()
Month MilesMM lag1 MilesMM_diff_1
1963-01-01 6827 NaN NaN
1963-02-01 6178 6827.0 -649.0
1963-03-01 7084 6178.0 906.0
1963-04-01 8162 7084.0 1078.0
1963-05-01 8462 8162.0 300.0
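# Assumes pandas is imported as pd; index by date so the monthly data is recognised as such
miles_df.index = pd.to_datetime(miles_df['Month'])
result_a = seasonal_decompose(miles_df['MilesMM'], model='additive', period=12)
result_a.plot()
# Decompose the lag 1 differenced series, dropping its leading NaN
result_b = seasonal_decompose(miles_df['MilesMM_diff_1'].dropna(), model='additive', period=12)
result_b.plot()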
miles_df['MilesMM'].plot()
miles_df['MilesMM_diff_1'].plot()
miles_df['MilesMM_diff_12'] = miles_df['MilesMM_diff_1'].diff(periods=12)
miles_df['MilesMM_diff_12'].plot()
miles_df.head()
Month MilesMM lag1 MilesMM_diff_1 MilesMM_diff_12
1963-01-01 6827 NaN NaN NaN
1963-02-01 6178 6827.0 -649.0 NaN
1963-03-01 7084 6178.0 906.0 NaN
1963-04-01 8162 7084.0 1078.0 NaN
1963-05-01 8462 8162.0 300.0 NaN