Time Series Models vs. DSGE: A Forecasting Horse Race
“All models are wrong, but some are useful. The right question is not which model is true, but which model forecasts better — and under what conditions.” — George Box (paraphrased)
Cross-reference: Principles Ch. 39 (future of macroeconomics: forecasting, ML, model uncertainty); Appendix B (DSGE forecasting, real-time data) [P:Ch.39, P:AppB]
39.1 The Forecasting Problem
Macroeconomic forecasting — predicting GDP growth, inflation, and unemployment — is one of the most practically important applications of the methods in this book. Central banks use forecasts to set policy rates; firms use them to make investment decisions; governments use them to project revenues and plan spending.
The formal forecasting problem: Given observations $y_1, \dots, y_T$ up to time $T$, produce forecasts $\hat{y}_{T+h|T}$ for horizons $h = 1, \dots, H$.
Competing approaches:
Reduced-form time series: BVAR with Minnesota prior, time-varying parameter VAR, factor models.
DSGE models: The Kalman filter prediction step from Chapter 20 generates $h$-step-ahead forecasts.
Machine learning: LASSO, Ridge, random forests, neural networks applied to large datasets (FRED-MD: 130 monthly series).
This chapter conducts a horse race: all three approaches on the same data, evaluated by the same metrics.
39.2 Forecast Evaluation Metrics
Definition 39.1 (Root Mean Squared Error). For a sequence of $h$-step-ahead forecasts $\hat{y}_{t+h|t}$ evaluated over $t \in \mathcal{P}$, an evaluation sample of size $P$:
$$\mathrm{RMSE}(h) = \sqrt{\frac{1}{P} \sum_{t \in \mathcal{P}} \big(y_{t+h} - \hat{y}_{t+h|t}\big)^2}.$$
Definition 39.2 (Continuous Ranked Probability Score). For density forecasts with CDF $F_{t+h|t}$:
$$\mathrm{CRPS}\big(F_{t+h|t}, y_{t+h}\big) = \int_{-\infty}^{\infty} \big(F_{t+h|t}(z) - \mathbf{1}\{z \ge y_{t+h}\}\big)^2 \, dz.$$
The CRPS reduces to the absolute error $|y_{t+h} - \hat{y}_{t+h|t}|$ for a point (degenerate) forecast. It penalizes both poor location (bias) and poor spread (overconfidence or underconfidence). Lower is better.
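For a Gaussian density forecast the CRPS has a known closed form, which gives a quick numerical check of the definition; the `crps_gaussian` helper below is our illustration, not part of the chapter's evaluation code:

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of the forecast N(mu, sigma^2) at realization y."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

# As sigma -> 0 the density forecast collapses to a point forecast and the
# CRPS approaches the absolute error |y - mu|
print(crps_gaussian(1.0, 0.0, 1e-6))   # close to |1.0 - 0.0| = 1.0
```

Averaging `crps_gaussian` over the evaluation sample gives the CRPS entry of a forecast-comparison table; note that an overdispersed forecast ($\sigma$ too large) scores worse than a well-calibrated one even when the mean is correct.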
39.3 BVAR with Minnesota Prior
Definition 39.3 (Minnesota Prior). The Minnesota prior (Doan, Litterman, and Sims, 1984) for a VAR($p$) with $n$ variables imposes:
Own lags: the coefficient on lag $\ell$ of variable $i$ in equation $i$ has prior $N(\delta_i, (\lambda/\ell)^2)$, where $\delta_i = 1$ for I(1) variables (random walk prior) and $\delta_i = 0$ for I(0).
Cross-variable lags: the coefficient on lag $\ell$ of variable $j$ in equation $i$ ($j \neq i$) has prior $N\big(0, (\lambda \sigma_i / (\ell \sigma_j))^2\big)$, where $\lambda$ is the overall tightness.
Intercepts: loose prior centered at zero.
The Minnesota prior shrinks coefficients toward the random walk ($\delta_i = 1$) or zero ($\delta_i = 0$), with shrinkage tightening at longer lags through the $1/\ell$ factor. This regularizes the VAR for forecasting.
Theorem 39.1 (Minnesota Prior as Augmented Regression). The posterior for the VAR coefficients under the Minnesota prior is equivalent to OLS on augmented data, where dummy observations $(Y^{*}, X^{*})$ encode the prior beliefs:
$$\hat{B}_{MN} = (X_{aug}' X_{aug})^{-1} X_{aug}' Y_{aug}, \qquad Y_{aug} = \begin{bmatrix} Y^{*} \\ Y \end{bmatrix}, \quad X_{aug} = \begin{bmatrix} X^{*} \\ X \end{bmatrix}.$$
In APL: B_MN ← (⌹ X_aug) +.× Y_aug — one expression for the BVAR estimator.
Proof. The Minnesota prior is conjugate (Normal–Wishart) for the VAR with fixed covariance. The posterior mode equals the OLS estimator on the augmented system where the dummy observations implement the prior beliefs as data. Kadiyala and Karlsson (1997) show this equivalence formally.
Constructing the dummy observations: For the standard Minnesota prior with hyperparameters $(\lambda, \delta)$:
import numpy as np

def minnesota_dummies(Y, p, lambda_tight=0.2, delta=1.0):
    """
    Generate Minnesota prior dummy observations.
    Y: T×n data matrix, p: lag order, lambda_tight: shrinkage, delta: prior mean on own lags.
    Returns Y_dum, X_dum to be prepended to actual data for BVAR.
    """
    T, n = Y.shape
    sigma = np.std(Y, axis=0)  # marginal standard deviations
    dummies = []
    # Own-lag dummies: one dummy per variable per lag
    for j in range(1, p+1):
        for i in range(n):
            y_d = np.zeros(n); x_d = np.zeros(n*p + 1)
            if j == 1:
                y_d[i] = delta * sigma[i] / lambda_tight   # prior mean delta on first own lag only
            x_d[i + (j-1)*n] = sigma[i] * j / lambda_tight  # tighter shrinkage at longer lags
            dummies.append((y_d, x_d))
    # Sum-of-coefficients dummies
    y_d = delta * np.mean(Y[:5], axis=0)
    x_d = np.zeros(n*p + 1)
    for j in range(p):
        x_d[j*n:(j+1)*n] = delta * np.mean(Y[:5], axis=0)
    dummies.append((y_d, x_d))
    Y_dum = np.array([d[0] for d in dummies])
    X_dum = np.array([d[1] for d in dummies])
    return Y_dum, X_dum
39.4 DSGE Forecasting via Kalman Filter
From the gensys solution (Chapter 28), the NK-DSGE has the state-space form:
$$s_t = A s_{t-1} + D \varepsilon_t, \qquad y_t = H s_t + u_t,$$
with $\varepsilon_t \sim N(0, Q)$ and $u_t \sim N(0, R)$.
The $h$-step-ahead forecast from the Kalman filter prediction step:
$$\hat{y}_{T+h|T} = H A^h \hat{s}_{T|T}.$$
The DSGE forecasts are computed in real time: at each evaluation date $T$, the state $\hat{s}_{T|T}$ is updated by the Kalman filter using all available data, then projected forward $h$ steps.
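The filter-then-project logic fits in a few lines of Python; this is a minimal sketch assuming the state-space matrices $(A, D, H, Q, R)$ are already in hand from the model solution (the function name and diffuse initialization are our choices):

```python
import numpy as np

def kalman_hstep_forecast(A, D, H, Q, R, y_obs, h):
    """Kalman-filter the sample, then project: y_hat_{T+h|T} = H A^h s_{T|T}."""
    m = A.shape[0]
    s = np.zeros(m)
    P = 10.0 * np.eye(m)                      # loose prior on the initial state
    for y_t in y_obs:
        s_pred = A @ s                        # prediction step
        P_pred = A @ P @ A.T + D @ Q @ D.T
        v = y_t - H @ s_pred                  # update step: forecast error
        F = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(F)
        s = s_pred + K @ v
        P = (np.eye(m) - K @ H) @ P_pred
    return H @ np.linalg.matrix_power(A, h) @ s
```

For a stable $A$, the forecast decays geometrically toward the steady state as $h$ grows, which is why DSGE forecasts revert to trend at long horizons.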
39.5 Machine Learning: LASSO for Macro Forecasting
With $K \approx 130$ predictor variables (FRED-MD), standard OLS runs into a degrees-of-freedom problem. LASSO (Tibshirani, 1996) adds an $\ell_1$ penalty to shrink many coefficients to exactly zero:
$$\hat{\beta}^{\mathrm{LASSO}} = \arg\min_{\beta} \; \frac{1}{2T} \sum_{t=1}^{T} \big(y_t - x_t'\beta\big)^2 + \lambda \sum_{j=1}^{K} |\beta_j|.$$
Theorem 39.2 (LASSO Solution via Coordinate Descent). The LASSO objective has a component-wise closed-form update. For each coordinate $j$ (with standardized predictors, $x_j' x_j / T = 1$):
$$\beta_j \leftarrow S\!\big(x_j' r^{(j)} / T, \; \lambda\big), \qquad r^{(j)} = y - \textstyle\sum_{k \neq j} x_k \beta_k,$$
where $S(z, \lambda) = \mathrm{sign}(z) \max(|z| - \lambda, 0)$ is the soft-thresholding operator. Cycling through all $j = 1, \dots, K$ until convergence gives the LASSO solution.
Proof. The $j$-th component subproblem (with all other coefficients fixed) is $\min_{\beta_j} \frac{1}{2T} \| r^{(j)} - x_j \beta_j \|^2 + \lambda |\beta_j|$, where $r^{(j)}$ is the partial residual. This is a univariate LASSO regression of $r^{(j)}$ on $x_j$, with solution $\beta_j = S(x_j' r^{(j)}/T, \lambda)$ by the KKT conditions for the non-differentiable $|\beta_j|$ term.
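As a numerical check of Theorem 39.2, a plain-Python version of the cyclic update can be compared against scikit-learn's Lasso, which minimizes the same $\frac{1}{2T}$-scaled objective; standardizing the columns makes the denominator $x_j'x_j/T$ equal to one. This cross-check script is ours, not the chapter's horse-race code:

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_pass=500):
    """Cyclic coordinate descent for (1/2T)||y - Xb||^2 + lam*||b||_1,
    assuming columns of X have mean 0 and variance 1."""
    T, K = X.shape
    beta = np.zeros(K)
    for _ in range(n_pass):
        for j in range(K):
            r_j = y - X @ beta + X[:, j] * beta[j]      # partial residual r^(j)
            beta[j] = soft_threshold(X[:, j] @ r_j / T, lam)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(200)
y = y - y.mean()
beta_cd = lasso_cd(X, y, lam=0.1)
beta_sk = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_
print(np.max(np.abs(beta_cd - beta_sk)))   # near zero: the two solvers agree
```

The agreement is up to solver tolerance; note that both shrink the nonzero coefficients toward zero by roughly $\lambda$, the usual LASSO bias.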
In APL, the soft-threshold operator and coordinate descent step:
⍝ APL — LASSO coordinate descent (Dyalog)
⎕IO←0 ⋄ ⎕ML←1
soft_threshold←{z lam←⍵ ⋄ (×z)×0⌈(|z)-lam}        ⍝ sign(z)×max(|z|-λ, 0)

⍝ One full coordinate descent pass over all K predictors
cd_pass←{X y lam beta←⍵
    T←≢y ⋄ K←≢beta
    step←{(b j)←⍵
        r←y-X+.×b×j≠⍳K                            ⍝ partial residual (coordinate j excluded)
        b[j]←soft_threshold((X[;j]+.×r)÷T)lam     ⍝ univariate coefficient, soft-thresholded
        b(j+1)}
    ⊃step⍣K⊢beta 0}

⍝ LASSO: iterate full passes until convergence
lasso←{X y lam←⍵
    K←1⊃⍴X
    pass←{cd_pass X y lam ⍵}
    pass⍣{1e¯6>⌈/|⍺-⍵}⊢K⍴0}
39.6 Forecast Combination
No single model dominates across all variables, horizons, and sample periods. Forecast combination pools multiple models to reduce forecast risk.
Simple average: $\hat{y}^{\mathrm{comb}} = \frac{1}{M} \sum_{m=1}^{M} \hat{y}^{(m)}$.
Optimal linear pool: Minimize $\mathbb{E}\big[(y - \sum_m w_m \hat{y}^{(m)})^2\big]$ s.t. $\sum_m w_m = 1$, $w_m \ge 0$ — a constrained OLS problem.
Bayesian Model Averaging (BMA): Weight each model by its marginal likelihood (posterior probability). For models $m = 1, \dots, M$: $\hat{y}^{\mathrm{BMA}} = \sum_m p(M_m \mid y) \, \hat{y}^{(m)}$, with $p(M_m \mid y) \propto p(y \mid M_m) \, p(M_m)$.
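The three schemes can be sketched on simulated forecasts; the nnls-then-renormalize shortcut and the Gaussian pseudo-marginal likelihoods below are simplifying assumptions for illustration, not the estimators used in the horse race:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
y = rng.standard_normal(120)                      # realized values
# M = 3 models: forecasts = truth + noise of increasing variance
f = y[:, None] + rng.standard_normal((120, 3)) * np.array([0.5, 1.0, 2.0])

# 1. Simple average
y_avg = f.mean(axis=1)

# 2. Optimal linear pool: nonnegative least squares, then renormalize to sum to 1
#    (a shortcut for the constrained OLS problem in the text)
w_raw, _ = nnls(f, y)
w_pool = w_raw / w_raw.sum()

# 3. BMA-style weights: exponentiate (pseudo) log marginal likelihoods,
#    proxied here by Gaussian log predictive scores with equal model priors
ll = -0.5 * np.sum((y[:, None] - f) ** 2, axis=0)
w_bma = np.exp(ll - ll.max())
w_bma /= w_bma.sum()
print(w_pool, w_bma)   # both load most weight on the most accurate model
```

A characteristic contrast: the BMA weights concentrate almost entirely on the best model (marginal likelihoods diverge exponentially in the sample size), while the optimal pool keeps interior weights close to inverse-variance proportions.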
39.7 The Diebold–Mariano Test
Definition 39.4 (Diebold–Mariano Test). The DM test compares the predictive accuracy of two models, $1$ and $2$. Define the differential loss $d_t = L(e_{1t}) - L(e_{2t})$, where $e_{it}$ is model $i$'s forecast error and $L$ is the loss function (e.g., $L(e) = e^2$ for MSE). The test statistic:
$$DM = \frac{\bar{d}}{\sqrt{\widehat{\mathrm{Var}}(\bar{d})}} \; \xrightarrow{d} \; N(0, 1),$$
where $\bar{d} = \frac{1}{P} \sum_t d_t$ and $\widehat{\mathrm{Var}}(\bar{d})$ is the Newey–West HAC variance estimator (with $h-1$ lags for $h$-step forecasts).
$H_0$: equal predictive accuracy ($\mathbb{E}[d_t] = 0$). $H_1$: one model is significantly better.
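The statistic fits in a few lines; this helper (ours, hard-coded to squared-error loss with Bartlett weights at $h-1$ lags) mirrors the inline computation in the worked example:

```python
import numpy as np
from scipy.stats import norm

def diebold_mariano(e1, e2, h=1):
    """DM statistic for squared-error loss; HAC variance with h-1 Bartlett lags."""
    d = e1**2 - e2**2                 # loss differential d_t
    n = len(d)
    u = d - d.mean()                  # demeaned series for the autocovariances
    var_dbar = u @ u / n              # gamma_0
    for l in range(1, h):
        var_dbar += 2 * (1 - l / h) * (u[l:] @ u[:-l]) / n
    dm = d.mean() / np.sqrt(var_dbar / n)
    return dm, 2 * (1 - norm.cdf(abs(dm)))
```

A positive statistic favors model 2 (lower loss), and swapping the two error series flips the sign exactly.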
39.8 Worked Example: U.S. Inflation and GDP Forecasting
Data: U.S. quarterly GDP growth and CPI inflation, 1960Q1–2019Q4. Pseudo-real-time evaluation: estimate on rolling 20-year windows, forecast 1–8 quarters ahead.
Python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from scipy.stats import norm

np.random.seed(42)
T_total = 200
T_train = 80
H_horizon = 4

# --- 1. SIMULATE DATA ---
y_gdp = np.zeros(T_total); y_gdp[0] = 2.0
y_infl = np.zeros(T_total); y_infl[0] = 2.0
for t in range(1, T_total):
    y_gdp[t] = 0.5 * y_gdp[t-1] + np.random.normal(0, 1.0)
    y_infl[t] = 0.6 * y_infl[t-1] + 0.2 * y_gdp[t-1] + np.random.normal(0, 0.5)

rmse = {'AR1': [], 'BVAR': [], 'LASSO': []}

# --- 2. FORECASTING LOOP ---
# Stop at T_total - H_horizon to ensure a 'true' value to compare against
for T in range(T_train, T_total - H_horizon):
    # Current point is T-1. Target is y_gdp[T + H_horizon - 1]
    y_train = y_gdp[:T]
    y_infl_train = y_infl[:T]
    y_target = y_gdp[T + H_horizon - 1]

    # --- AR(1) Forecast ---
    # Fit: y_t = b0 + b1*y_{t-1}; polyfit returns [slope, intercept]
    ar_coef = np.polyfit(y_train[:-1], y_train[1:], 1)
    f_ar = y_train[-1]
    for _ in range(H_horizon):
        f_ar = ar_coef[0] * f_ar + ar_coef[1]
    rmse['AR1'].append((y_target - f_ar)**2)

    # --- BVAR (Minnesota Prior via Augmented OLS) ---
    p = 2; n_vars = 2
    Y = np.column_stack([y_train, y_infl_train])
    # Construct X = [Lag1, Lag2, Constant]
    X_bvar = np.column_stack([Y[1:-1], Y[:-2], np.ones(T-2)])
    Y_dep = Y[2:]
    # Minnesota dummies (simplified)
    sig = np.std(np.diff(Y, axis=0), axis=0)
    lam = 0.2  # tightness
    Y_dum = []; X_dum = []
    for j in range(1, p + 1):
        for i in range(n_vars):
            # Prior: random walk (first lag=1, others=0)
            row_y = np.zeros(n_vars)
            if j == 1: row_y[i] = sig[i] / lam
            row_x = np.zeros(n_vars * p + 1)
            row_x[i + (j-1)*n_vars] = (sig[i] * j) / lam  # higher lags = tighter shrinkage
            Y_dum.append(row_y); X_dum.append(row_x)
    Y_aug = np.vstack([np.array(Y_dum), Y_dep])
    X_aug = np.vstack([np.array(X_dum), X_bvar])
    B = np.linalg.lstsq(X_aug, Y_aug, rcond=None)[0]
    # Iterative forecast
    curr_state = np.concatenate([Y[-1], Y[-2], [1]])
    for _ in range(H_horizon):
        next_vals = curr_state @ B
        curr_state = np.concatenate([next_vals, curr_state[:n_vars], [1]])
    rmse['BVAR'].append((y_target - curr_state[0])**2)

    # --- LASSO (Direct Forecast) ---
    # Predict y_{t+h} using y_t, y_{t-1}, ...
    Lags = 6
    # Construct features: rows are time, columns are lags
    X_lasso_full = np.column_stack([y_train[i:-(H_horizon+Lags-i-1)] for i in range(Lags)])
    y_lasso_target = y_train[H_horizon + Lags - 1:]
    if len(y_lasso_target) > 30:
        scaler = StandardScaler()
        X_sc = scaler.fit_transform(X_lasso_full)
        model_lasso = LassoCV(cv=5).fit(X_sc, y_lasso_target)
        # Predict using the most recent available lags
        x_latest = scaler.transform(y_train[-Lags:].reshape(1, -1))
        f_lasso = model_lasso.predict(x_latest)[0]
    else:
        f_lasso = y_train[-1]  # fallback
    rmse['LASSO'].append((y_target - f_lasso)**2)

# --- 3. EVALUATION ---
print("Model Performance (RMSE):")
for model, losses in rmse.items():
    print(f"{model:5}: {np.sqrt(np.mean(losses)):.4f}")

# Diebold-Mariano test (BVAR vs AR1)
d = np.array(rmse['AR1']) - np.array(rmse['BVAR'])
n = len(d)

# Simple Newey-West variance of the mean (autocovariances on the demeaned series)
def nw_var(resids, max_lag):
    u = resids - np.mean(resids)
    gamma0 = np.mean(u**2)
    acc = 0.0
    for l in range(1, max_lag + 1):
        gamma_l = np.mean(u[l:] * u[:-l])
        acc += 2 * (1 - l/(max_lag+1)) * gamma_l
    return (gamma0 + acc) / len(resids)

dm_stat = np.mean(d) / np.sqrt(nw_var(d, H_horizon - 1))
p_val = 2 * (1 - norm.cdf(abs(dm_stat)))
print(f"\nDM Test (BVAR vs AR1): Stat={dm_stat:.3f}, p={p_val:.4f}")
Model Performance (RMSE):
AR1 : 1.1415
BVAR : 1.1473
LASSO: 1.1623
DM Test (BVAR vs AR1): Stat=-1.957, p=0.0503
Julia
using Statistics, GLMNet

# Julia: LASSO for macro forecasting
function lasso_forecast(y, T_train, H)
    # Features for row r: y[r], ..., y[r+7]; target: y[r+7+H]
    X_lags = hcat([y[j:T_train-H-8+j] for j in 1:8]...)
    y_target = y[8+H:T_train]
    path = glmnet(X_lags, y_target)
    # Cross-validate to select lambda
    cv = glmnetcv(X_lags, y_target)
    lambda_opt = cv.lambda[argmin(cv.meanloss)]
    # Predict using the 8 most recent observations
    x_new = reshape(y[T_train-7:T_train], 1, 8)
    pred = GLMNet.predict(path, x_new)
    idx = argmin(abs.(path.lambda .- lambda_opt))
    return pred[1, idx]
end

println("Macro forecasting in Julia: use GLMNet.jl for LASSO")
println("Key finding: LASSO often beats AR(1) at 4-8 quarter horizons due to variable selection")
println("BVAR beats LASSO at short horizons due to Minnesota prior regularization")
println("DSGE beats both when the model is correctly specified but loses during crises")
39.9 Programming Exercises
Exercise 39.1 (APL — LASSO Coordinate Descent)
Implement the LASSO coordinate descent from Section 39.5 in APL: (a) soft_threshold ← {(×⍺)×0⌈(|⍺)-⍵} as a dfn taking $z$ as left argument and $\lambda$ as right argument; (b) one full coordinate descent pass over all $K$ predictors; (c) iterate until convergence using ⍣≡; (d) test on simulated data with 50 predictors, half of which are relevant. Verify LASSO selects approximately the right 25 predictors.
Exercise 39.2 (Python — Forecasting Horse Race)
Run the full horse race on FRED-MD data (or simulated data with the same structure): (a) three models: AR(4), BVAR(2) with Minnesota prior, LASSO with 8 lags of 20 macro variables; (b) evaluation period: 1990Q1–2019Q4, rolling 40-quarter estimation windows; (c) target: 4-quarter GDP growth and 4-quarter inflation; (d) report RMSE ratios relative to AR(4) and DM test p-values.
Exercise 39.3 (Julia — DSGE Real-Time Forecasting)
# Real-time DSGE forecasting via Kalman filter
using LinearAlgebra

function dsge_forecast(A, D, H_obs, Q, R, y_obs, h)
    # h-step-ahead forecast from the DSGE Kalman filter
    T = size(y_obs, 1)
    m = size(A, 1)
    # Filter (Chapter 20)
    alpha = zeros(m); P = 10*I(m)
    for t in 1:T
        # Predict
        alpha_pred = A * alpha
        P_pred = A * P * A' + D * Q * D'
        # Update
        v = y_obs[t,:] - H_obs * alpha_pred
        Fv = H_obs * P_pred * H_obs' + R
        K = P_pred * H_obs' * inv(Fv)
        alpha = alpha_pred + K * v
        P = (I(m) - K*H_obs) * P_pred
    end
    # h-step forecast: alpha_{T+h|T} = A^h * alpha_{T|T}
    alpha_fcast = A^h * alpha
    return H_obs * alpha_fcast
end

println("DSGE h-step forecast: H_obs * A^h * alpha_filtered")
println("Error grows with h due to uncertainty accumulation:")
for h in [1, 4, 8]
    uncertainty_factor = h^0.5  # approx: error ~ sqrt(h) for stable systems
    println("  h=$h: relative uncertainty ≈ $(round(uncertainty_factor, digits=2))x baseline")
end
Exercise 39.4 — Model Uncertainty
Implement Bayesian Model Averaging (BMA) for the three-model forecast combination: (a) compute the log marginal likelihood for each model using the Laplace approximation; (b) compute BMA weights with equal model priors; (c) compare BMA weights to the ex-post optimal combination weights (OLS on the forecasts). Do the BMA weights select the model that actually forecast best?
39.10 Chapter Summary
Key results:
BVAR Minnesota prior: $N(\delta_i, (\lambda/\ell)^2)$ prior on own lags, mean-zero priors on cross-lags; equivalent to OLS on augmented data (Theorem 39.1), computed as
B_MN ← (⌹ X_aug) +.× Y_aug.
DSGE forecasting: $h$-step forecast $\hat{y}_{T+h|T} = H A^h \hat{s}_{T|T}$ from the Kalman filter; valid iff the model is correctly specified (fails during structural breaks).
LASSO coordinate descent (Theorem 39.2): soft-threshold update $\beta_j \leftarrow S(x_j' r^{(j)}/T, \lambda)$; in APL:
{soft_threshold (X[;j]+.×partial_residual÷T) lam}⍣≡.
DM test (Definition 39.4): $DM = \bar{d} / \sqrt{\widehat{\mathrm{Var}}(\bar{d})} \sim N(0, 1)$; tests equal predictive accuracy; uses the Newey–West variance.
Findings from the literature: BVAR dominates AR at short horizons (1–2 quarters); LASSO dominates BVAR when many predictors are relevant (large-data settings); DSGE dominates during normal times but fails during crises; combination always helps.
Next: Chapter 40 — Policy Analysis with a New Keynesian Model