Time Series:

  • A time series is a collection of data points recorded in order over time, typically at regular intervals such as every day, week, or month.
  • Each point shows how a value changes with time, and preserving the time order is essential when working with time series data.
  • We can use time series to analyze daily stock prices, energy consumption rates, social media engagement metrics, retail demand, etc.
  • Analyzing time series data yields insights such as trends, seasonal patterns, and forecasts of future values that can inform business decisions and improve profitability.
  • For example, companies can plan promotions to maximize sales throughout the year by understanding the seasonal trends in demand for retail products.

Components of a Time Series:

A time series typically consists of the following components:

  1. Trend
    A trend represents the long-term progression of the series, i.e., the general direction in which the data is moving over time.

    • It can be upward, downward, or stationary (no trend).

    • The trend can follow different forms: linear, exponential, or nonlinear.

  2. Seasonality
    Seasonality refers to patterns that repeat at regular, fixed intervals due to seasonal factors (e.g., months, quarters, days of the week).

    • These fluctuations are consistent and predictable.

    • For example, retail sales might peak every December due to holiday shopping.

  3. Cyclic Patterns
    Cyclic behavior involves rises and falls in the data not tied to fixed calendar intervals (unlike seasonality).

    • Cycles are typically influenced by economic or business conditions.

    • Their duration is usually longer than one year and irregular in length.

  4. Random (Irregular) Component / Noise
    This represents the unpredictable variation in the data that cannot be attributed to trend, seasonality, or cycles.

    • Often referred to as white noise, this component consists of random shocks or anomalies.

    • In a well-modeled time series, the residuals (errors) should resemble white noise: independent, identically distributed, and mean zero.
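
To see these components in practice, a series can be decomposed into trend, seasonal, and residual parts. The sketch below is illustrative only (it builds a small synthetic monthly series rather than using real data) and relies on statsmodels' seasonal_decompose:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Build a small synthetic monthly series: upward trend + yearly seasonality + noise
idx = pd.date_range('2015-01-01', periods=60, freq='MS')
values = np.arange(60) * 2 + 10 * np.sin(2 * np.pi * np.arange(60) / 12) + np.random.normal(0, 2, 60)
ts_example = pd.Series(values, index=idx)

# Split the series into trend, seasonal, and residual (noise) components
decomposition = seasonal_decompose(ts_example, model='additive', period=12)
decomposition.plot()
plt.show()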

Types of Time Series Models:

  1. Moving Average (MA) Model
  • The Moving Average (MA) model captures the dependency between an observation and the residual errors (random shocks) from previous time steps of the series.
  • Despite the name, it is not the same as smoothing a series with a rolling average of past values; a short fitting sketch follows this subsection.
  • In an MA model, the current value of the series depends only on past error terms (random shocks), not on past observed values.
  • Mathematical form (MA(q)):

Y_t = μ + ε_t + θ_1 ε_{t-1} + θ_2 ε_{t-2} + … + θ_q ε_{t-q}

where:

    • ε_t: white noise (random error terms)
    • θ_1, θ_2, …, θ_q: MA coefficients
    • μ: the mean of the series

A lag is the amount of time by which a time series is shifted relative to itself; for example, lag 1 refers to the previous observation.
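
As a minimal illustration (not part of the original write-up), a pure MA(q) model can be estimated with statsmodels by using the ARIMA class with the AR and differencing orders set to zero; the simulated series below is only an example:

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Simulate an MA(2) process: y_t = 5 + e_t + 0.6*e_{t-1} + 0.3*e_{t-2}
np.random.seed(0)
e = np.random.normal(size=300)
y = 5 + e[2:] + 0.6 * e[1:-1] + 0.3 * e[:-2]

# A pure MA(2) model is ARIMA with zero AR and differencing orders: order=(0, 0, 2)
ma_fit = ARIMA(pd.Series(y), order=(0, 0, 2)).fit()
print(ma_fit.params)   # estimated constant, MA coefficients, and noise variance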

  2. Autoregressive (AR) Model
  • The Autoregressive (AR) model uses the past values of the time series to predict current and future values.
  • The assumption is that the current value of the series is a linear function of its previous values.
  • Mathematical form (AR(p)):

Y_t = β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + … + β_p Y_{t-p} + ε_t

where:

    • Y_{t-k}: past observations
    • β_k: AR coefficients
    • εt: white noise
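
A quick sketch (illustrative only, on a simulated series) of fitting an AR model with statsmodels' AutoReg class:

import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

# Simulate an AR(1) process: y_t = 0.7*y_{t-1} + e_t
np.random.seed(1)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.7 * y[t - 1] + np.random.normal()

# Fit an autoregressive model with one lag
ar_fit = AutoReg(pd.Series(y), lags=1).fit()
print(ar_fit.params)                        # intercept and AR(1) coefficient
print(ar_fit.predict(start=300, end=304))   # forecast the next 5 values
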
  3. ARMA Model (Autoregressive Moving Average)
  • The ARMA(p, q) model combines both AR and MA components.
  • It is suitable for modeling stationary time series data.
  • Mathematical form (ARMA(p, q)):

Y_t = β_0 + β_1 Y_{t-1} + … + β_p Y_{t-p} + ε_t + θ_1 ε_{t-1} + … + θ_q ε_{t-q}

where:

    • p: order of autoregression
    • q: order of moving average
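
Since modern statsmodels has no standalone ARMA class, an ARMA(p, q) model is usually fitted as ARIMA with d = 0. The sketch below simulates a stationary ARMA(1, 1) series purely for illustration:

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

# Simulate a stationary ARMA(1, 1) series (lag-polynomial convention: ar=[1, -0.6], ma=[1, 0.4])
np.random.seed(2)
process = ArmaProcess(ar=[1, -0.6], ma=[1, 0.4])
series = pd.Series(process.generate_sample(nsample=300))

# ARMA(1, 1) is just ARIMA with no differencing: order=(p, 0, q)
arma_fit = ARIMA(series, order=(1, 0, 1)).fit()
print(arma_fit.params)   # const, ar.L1, ma.L1, sigma2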

4. ARIMA Model (Autoregressive Integrated Moving Average)

  • The ARIMA(p, d, q) model extends ARMA by adding a differencing step to make the series stationary, which is essential for many time series models.
  • Suitable for non-stationary univariate time series.

  • It combines:

    • AR: autoregression

    • I: differencing (to remove trend and stabilize the mean)

    • MA: moving average

  • Model parameters (p, d, q):

    • p: number of autoregressive terms (lags)

    • d: number of differences needed to make the series stationary

    • q: number of moving average terms (lagged forecast errors)

Model        | Stationarity Required | Based on Past Values | Based on Errors | Uses Differencing
MA(q)        | Yes                   | No                   | Yes             | No
AR(p)        | Yes                   | Yes                  | No              | No
ARMA(p,q)    | Yes                   | Yes                  | Yes             | No
ARIMA(p,d,q) | No                    | Yes                  | Yes             | Yes
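
The "Uses Differencing" column refers to the d step in ARIMA. A tiny sketch (on a made-up trending series) of what first-order differencing does:

import numpy as np
import pandas as pd

# A small made-up series with a clear upward trend
trend_series = pd.Series(np.arange(10) * 3 + np.random.normal(0, 0.5, 10))

# First-order differencing (d=1, the "I" in ARIMA): each value becomes the change
# from the previous value, which removes a linear trend and stabilizes the mean
diffed = trend_series.diff().dropna()
print(diffed)

# A second difference (d=2) would simply be: trend_series.diff().diff().dropna()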

5. SARIMA model (Seasonal AutoRegressive Integrated Moving Average):

  • An advanced version of ARIMA that is specifically designed to handle time series data with seasonality patterns that repeat at regular intervals (like every week, month, or quarter).
  • SARIMA = ARIMA + Seasonality

  • The full model is written as: SARIMA(p, d, q) × (P, D, Q, s)

Term | Meaning
p    | Number of autoregressive terms
d    | Number of differences to make the data stationary
q    | Number of moving average terms
P    | Seasonal autoregressive order
D    | Seasonal differencing order
Q    | Seasonal moving average order
s    | Seasonal period (e.g., 12 for monthly data with yearly seasonality)

6. SARIMAX model (Seasonal AutoRegressive Integrated Moving Average with eXogenous variables):

  • It is an extension of the SARIMA model that allows you to include external variables (called exogenous variables) that might help explain or improve your forecast.
  • The full model is written as SARIMAX(p, d, q) × (P, D, Q, s), exog = X

Stationarity in Time Series:

Stationarity means that a time series’s statistical properties, such as its mean, variance, and autocovariance, remain constant over time.

Why is Stationarity Important?

  1. Stationary processes are easier to model and interpret.

  2. Models like AR, MA, ARMA, and ARIMA are based on the assumption that the series is stationary (or made stationary through differencing).

How to Check for Stationarity?

One way to check is through the Autocorrelation Function (ACF):

  • Autocorrelation measures how similar a time series is to its past values (lagged versions of itself).

  • Plotting autocorrelation values for increasing lags gives a correlogram (ACF plot).

What to look for:

  • In a stationary series, the autocorrelation drops off quickly (usually within a few lags).

  • In a non-stationary series, autocorrelation declines slowly, suggesting long-term dependency.

Check Stationarity:

  • Visual inspection: Plot the series — if it shows obvious trends or changing variance, it’s likely non-stationary.

  • Statistical tests:

    • Augmented Dickey-Fuller (ADF) test

    • KPSS test

These tests formally assess whether a series is stationary.
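
Both tests are available in statsmodels. Below is a minimal sketch, assuming ts is a pandas Series such as the AirPassengers series loaded in the implementation section later on:

from statsmodels.tsa.stattools import adfuller, kpss

# ADF test: the null hypothesis is that the series is NON-stationary
adf_stat, adf_pvalue = adfuller(ts)[:2]
print(f"ADF p-value: {adf_pvalue:.4f}")    # p < 0.05 -> reject the null -> likely stationary

# KPSS test: the null hypothesis is that the series IS stationary (opposite of the ADF test)
kpss_stat, kpss_pvalue = kpss(ts, regression='c', nlags='auto')[:2]
print(f"KPSS p-value: {kpss_pvalue:.4f}")  # p < 0.05 -> reject the null -> likely non-stationary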

Autocorrelation:

  • Autocorrelation tells us how strongly a time series is related to its own past values.
  • It checks if there’s a pattern that repeats over time.

Partial Autocorrelation:

  • Partial autocorrelation shows the direct connection between today’s value and a specific past value, without the influence of the values in between.
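
ACF and PACF plots are easy to produce with statsmodels; a short sketch, again assuming ts is a pandas Series:

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(ts, lags=24, ax=axes[0])    # ACF: slow decay suggests non-stationarity; spikes hint at the MA order q
plot_pacf(ts, lags=24, ax=axes[1])   # PACF: significant spikes hint at the AR order p
plt.show()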

Python Implementation for ARIMA & SARIMAX:

  • MA, AR, and ARMA models are more academic and serve as stepping stones toward understanding ARIMA and SARIMAX.
  • In addition, SARIMA is a subset of SARIMAX: if we use SARIMAX without any exogenous variables, it behaves just like SARIMA.
  • So we will focus on ARIMA and SARIMAX for the implementation.
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
import matplotlib.pyplot as plt

# Loading Data
data = pd.read_csv('/content/drive/MyDrive/AirPassengers.csv',parse_dates=[0],index_col='Month')

ts = data['#Passengers']

# Fit ARIMA model (p, d, q)
model = ARIMA(ts, order=(2, 1, 2))
model_fit = model.fit()

# Forecast next 10 steps
forecast = model_fit.forecast(steps=10)

# Plot original series and forecast
plt.figure(figsize=(10, 5))
plt.plot(ts, label='Original')
plt.plot(forecast.index, forecast, label='Forecast', color='red')
plt.title('ARIMA Forecast')
plt.legend()
plt.show()


# Fit SARIMAX model
sarimax_model = SARIMAX(ts, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
sarimax_fit = sarimax_model.fit(disp=False)

# Forecast next 12 steps
sarimax_forecast = sarimax_fit.forecast(steps=12)

# Plot original series and forecast
plt.figure(figsize=(10, 5))
plt.plot(ts, label='Original')
plt.plot(sarimax_forecast.index, sarimax_forecast, label='SARIMAX Forecast', color='green')
plt.title('SARIMAX Forecast')
plt.legend()
plt.show()

Notes:

order=(2, 1, 2) → for ARIMA

Position | Parameter | Meaning
2        | p         | Autoregressive (AR) terms: uses 2 past values
1        | d         | Differencing: subtracts the previous value once to make the series stationary
2        | q         | Moving Average (MA) terms: uses 2 past error terms

order=(1, 1, 1), seasonal_order=(1, 1, 1, 12) → for SARIMAX

Position | Parameter | Meaning
1        | P         | Seasonal AR (lag of the seasonal period, i.e., 12 months)
1        | D         | Seasonal differencing (removes seasonal patterns)
1        | Q         | Seasonal MA (lag of past seasonal residuals)
12       | s         | Seasonal period (12 months for monthly data)

Prophet Model:

Prophet is an open-source forecasting tool by Facebook (Meta) built for:

  • Business users and data scientists

  • Handling seasonality, holidays, and trend changes

  • Working well out-of-the-box with minimal tuning

  • The model works as y(t) = trend(t) + seasonality(t) + holiday(t) + error(t)

    • trend(t): growth (linear or logistic)

    • seasonality(t): repeating cycles (like yearly or weekly)

    • holiday(t): impact of known events (e.g., Black Friday)

    • error(t): unpredictable noise

Python Implementation for Prophet Model:

Required Column Names: ds and y
ds (Date Stamp): This must contain the dates of your time series. Prophet uses this column to understand the timeline and seasonality patterns.

y: This must contain the numeric values (target variable) you want to forecast.

from prophet import Prophet
import pandas as pd
import matplotlib.pyplot as plt

# Prepare our data (Prophet requires two columns: ds = date, y = value)
# 'Month' was set as the index when the CSV was loaded, so bring it back as a column first
data = data.reset_index()
data.rename(columns={'Month': 'ds', '#Passengers': 'y'}, inplace=True)

# Fit the model
model = Prophet(yearly_seasonality=True)
model.fit(data)

# Make future dataframe (next 12 months)
future = model.make_future_dataframe(periods=12, freq='M')

# Forecast
forecast = model.predict(future)

# Plot
model.plot(forecast)
plt.title('Prophet Model Forecast & Components')
model.plot_components(forecast)

Feature Engineering for Time Series:

Lag Features (Past Values):

  • Capture the value of the target variable from previous time steps.
  • This helps the model understand how the past affects the present.
  • Use for
    • Tree-based models (XGBoost, LightGBM),
    • Neural networks
    • Prophet (only indirectly, via external regressors)
# Creating a new feature that shows the previous time step's value.
data['lag_1'] = data['y'].shift(1)   

# Creating a new feature that shows value from 12 months ago
data['lag_12'] = data['y'].shift(12) 

Rolling/Window Statistics (Smoothing/Trends):

  • Calculate values by averaging (or summarizing) over a moving window of past data.
  • Use for
    • Any machine learning model
    • Also useful for exploratory data analysis
# Shift the target column down by 1 row so only past data is used,
# then, for each row, take the mean of the 3 values just before it
data['rolling_mean_3'] = data['y'].shift(1).rolling(window=3).mean()

# Similarly, take the standard deviation of the 6 values just before it
data['rolling_std_6'] = data['y'].shift(1).rolling(window=6).std()

Date-Based Features (Calendar):

  • Break down the timestamp into separate calendar components.
  • Use for
    • All models
    • Prophet already uses month, day, and year internally, but we can add custom date-based regressors if needed
data['month'] = data['ds'].dt.month

data['dayofweek'] = data['ds'].dt.dayofweek

data['year'] = data['ds'].dt.year

data['is_weekend'] = data['dayofweek'].isin([5, 6]).astype(int)  # 5 = Saturday, 6 = Sunday

Trend Features (Time Index):

  • A simple number that increases over time is useful for capturing long-term growth or decline.
  • Use for
    • Linear Regression
    • Polynomial Regression
    • Prophet (if trend isn’t captured automatically)
data['t'] = np.arange(len(data))          # 0, 1, 2, 3, ...

# Captures curvature of the trend (introduces a non-linear trend component into the model)
data['t_squared'] = data['t'] ** 2

External Regressors (Additional Inputs):

  • Other factors that influence the target variable like promotions, weather, or holidays.
  • Use for
    • Prophet
    • Tree-based models
    • Deep learning
# promo_dates: a predefined collection of promotion dates (defined elsewhere)
data['promo'] = [1 if x in promo_dates else 0 for x in data['ds']]
model = Prophet()
model.add_regressor('promo')
model.fit(data)
# Note: the future dataframe passed to model.predict() must also contain the 'promo' column

Cross-Validation for Time Series:

  • Splitting the data into training and testing sets to evaluate performance without peeking into the future.
# Sklearn Example:

from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=5)

# Prophet Example:

from prophet.diagnostics import cross_validation
df_cv = cross_validation(model, initial='730 days', period='180 days', horizon='365 days')
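
TimeSeriesSplit only produces index ranges; below is a minimal usage sketch with made-up NumPy arrays X and y (ordered by time) and a simple linear model, purely for illustration:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Made-up data: a single time-index feature and a noisy linear target
X = np.arange(100).reshape(-1, 1)
y = 2 * np.arange(100) + np.random.normal(0, 1, 100)

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    # Each fold trains on an expanding window of past data and tests on the block right after it
    reg = LinearRegression().fit(X[train_idx], y[train_idx])
    preds = reg.predict(X[test_idx])
    print(f"Fold {fold}: MAE = {mean_absolute_error(y[test_idx], preds):.2f}")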

Evaluation Metrics (Measuring Accuracy):

Metric                                | Description                | When to Use
MAE (Mean Absolute Error)             | Average of absolute errors | Easy to interpret; doesn't penalize large errors too harshly
RMSE (Root Mean Squared Error)        | Penalizes large errors     | Good when large errors are very bad (e.g., inventory planning)
MAPE (Mean Absolute Percentage Error) | Percentage error           | Great for business, but fails when values are near 0
SMAPE (Symmetric MAPE)                | Better version of MAPE     | Avoids the divide-by-zero issue; balances over/under forecasting
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# sklearn has no built-in SMAPE function, so we define it ourselves
def smape(y_true, y_pred):
    denominator = (np.abs(y_true) + np.abs(y_pred)) / 2
    diff = np.abs(y_pred - y_true) / denominator
    return np.mean(diff) * 100

smape_score = smape(y_true, y_pred)
print(f"SMAPE: {smape_score:.2f}%")

Let's move on to Time Series Forecasting - Machine Learning Models >>>
