Dr. Kay
Time Series, ARIMA, Python
10 Dec, 2025
import pandas as pd
from datetime import datetime

# 1. Load the dataset
data = pd.read_csv('london_daily_temperature.csv')

# 2. Keep only the DATE and TX columns
data = pd.DataFrame(data=data, columns=['DATE', 'TX'])

# 3. Convert the DATE column to datetime
data['DATE'] = pd.to_datetime(data['DATE'], format='%Y%m%d')

# 4. Rename the columns to meaningful names
data.rename(columns={'DATE': 'Date', 'TX': 'Temperature'}, inplace=True)

# 5. Set the Date column as the index of the dataframe
data.set_index('Date', inplace=True)

Note that in step 5 we set the index of the dataframe to the Date column instead of keeping the default integer index. In a time-series analysis each observation is identified by the date/time at which it was recorded, and that timestamp is generally expected to be unique. A date index also makes it possible to select observations by date.
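To make the point about the date index concrete, here is a minimal sketch on a tiny made-up dataframe (the four rows and their values are hypothetical, not taken from the London dataset). It follows the same steps as above and then checks that the index is unique and can be sliced by date.

```python
import pandas as pd

# Hypothetical miniature of the dataset: dates stored as YYYYMMDD strings
raw = pd.DataFrame({
    "DATE": ["20100101", "20100102", "20100103", "20100104"],
    "TX": [54, 31, 12, 40],
})
raw["DATE"] = pd.to_datetime(raw["DATE"], format="%Y%m%d")

# Same renaming and indexing steps as in the article, without inplace=True
ts = raw.rename(columns={"DATE": "Date", "TX": "Temperature"}).set_index("Date")

print(ts.index.is_unique)                # True when every date appears once
print(ts.index.is_monotonic_increasing)  # True when dates are in chronological order

# Label-based slicing on a DatetimeIndex includes both end dates
window = ts.loc["2010-01-02":"2010-01-03"]
print(window)
```

If `is_unique` came back `False` on real data, duplicated dates would need to be resolved (for example by aggregating them) before modelling.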
# Decompose the time series data
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

decomposition = seasonal_decompose(train['Temperature'], model='additive', period=365)

# Create a figure with 4 subplots, one per component
fig, axes = plt.subplots(4, 1, figsize=(12, 12))
decomposition.observed.plot(ax=axes[0], title='Observed')
decomposition.seasonal.plot(ax=axes[1], title='Seasonal Component')
decomposition.trend.plot(ax=axes[2], title='Trend Component')
decomposition.resid.plot(ax=axes[3], title='Residual Component')
plt.tight_layout()
plt.show()

Figure: Time-Series Decomposition
# Create and fit an ARIMA model
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(train['Temperature'], order=(1, 1, 1))
model_fit = model.fit()
# Plot both the original and the fitted series
train['Temperature'].plot(figsize=(14, 6), title='Daily Temperature in London')
model_fit.fittedvalues.plot(color='red')  # fitted values
plt.show()

Figure: Time Series Fit
# Plot the fit for a 5-month slice of the data
mask = (train.index > '2010-01-01') & (train.index <= '2010-05-28')
train['Temperature'][mask].plot(figsize=(12, 6), label='Original')
model_fit.fittedvalues[mask].plot(label='Fitted')
plt.legend()
plt.show()

Figure: Time Series Fit on Data Slice
# Compute the model metrics on the training data
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import numpy as np

mse = mean_squared_error(train['Temperature'], model_fit.fittedvalues)
rmse = np.sqrt(mse)
mae = mean_absolute_error(train['Temperature'], model_fit.fittedvalues)
r2 = r2_score(train['Temperature'], model_fit.fittedvalues)

print('MSE:', mse)
print('RMSE:', rmse)
print('MAE:', mae)
print('R2:', r2)

The output is given below:

MSE: 566.5997912577614
RMSE: 23.80335672248268
MAE: 18.618313872554147
R2: 0.868029565993606
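To show what these four numbers measure, here is a minimal sketch that computes them by hand with NumPy on a few made-up values (the helper name `regression_metrics` and the sample numbers are hypothetical). MSE is expressed in squared units of the Temperature column, RMSE and MAE are in the same units as the column, and R2 is unitless.

```python
import numpy as np

def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R2 for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted

    mse = np.mean(errors ** 2)          # mean of squared errors
    rmse = np.sqrt(mse)                 # back in the units of the data
    mae = np.mean(np.abs(errors))       # mean of absolute errors
    ss_res = np.sum(errors ** 2)        # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot          # share of variance explained

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}

# Made-up observed and fitted values, for illustration only
metrics = regression_metrics([10, 12, 14, 16], [11, 12, 13, 18])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because RMSE squares the errors before averaging, a few large misses raise it more than they raise MAE, which is why RMSE is at least as large as MAE in the output above.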