
Time Series Models vs. DSGE: A Forecasting Horse Race

“All models are wrong, but some are useful. The right question is not which model is true, but which model forecasts better — and under what conditions.” — George Box (paraphrased)

Cross-reference: Principles Ch. 39 (future of macroeconomics: forecasting, ML, model uncertainty); Appendix B (DSGE forecasting, real-time data) [P:Ch.39, P:AppB]


39.1 The Forecasting Problem

Macroeconomic forecasting — predicting GDP growth, inflation, and unemployment — is one of the most practically important applications of the methods in this book. Central banks use forecasts to set policy rates; firms use them to make investment decisions; governments use them to project revenues and plan spending.

The formal forecasting problem: given observations $\mathbf{Y}_{1:T} = \{\mathbf{y}_1, \ldots, \mathbf{y}_T\}$ up to time $T$, produce forecasts $\hat{\mathbf{y}}_{T+h|T}$ for horizons $h = 1, 2, \ldots, H$.

Competing approaches:

  1. Reduced-form time series: BVAR with Minnesota prior, time-varying parameter VAR, factor models.

  2. DSGE models: The Kalman filter prediction step from Chapter 20 generates $h$-step-ahead forecasts.

  3. Machine learning: LASSO, Ridge, random forests, neural networks applied to large datasets (FRED-MD: 130 monthly series).

This chapter conducts a horse race: all three approaches on the same data, evaluated by the same metrics.


39.2 Forecast Evaluation Metrics

Definition 39.1 (Root Mean Squared Error). For a sequence of $h$-step-ahead forecasts $\hat{y}_{T+h|T}$ evaluated over $T = T_0, \ldots, T_1$:

$$\mathrm{RMSE}_h = \sqrt{\frac{1}{T_1 - T_0 + 1}\sum_{T=T_0}^{T_1}\left(y_{T+h} - \hat{y}_{T+h|T}\right)^2}.$$

Definition 39.2 (Continuous Ranked Probability Score). For density forecasts $\hat{F}_{T+h|T}(\cdot)$:

$$\mathrm{CRPS} = \int_{-\infty}^{\infty}\left[\hat{F}_{T+h|T}(z) - \mathbf{1}(y_{T+h} \leq z)\right]^2\,dz.$$

For a point forecast (a degenerate predictive distribution), the CRPS reduces to the absolute error. It penalizes both poor location (bias) and poor spread (overconfidence or underconfidence). Lower is better.
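The Gaussian case makes the definition concrete. The sketch below (illustrative values assumed for $\mu$, $\sigma$, and the realization $y$) evaluates the CRPS of a $\mathcal{N}(\mu, \sigma^2)$ forecast both by the closed form of Gneiting and Raftery and by brute-force integration of Definition 39.2; as $\sigma \to 0$ the score approaches the absolute error $|y - \mu|$.

```python
import numpy as np
from scipy.stats import norm

mu, sigma, y = 1.0, 2.0, 2.5          # assumed forecast mean/std and realization

def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of a N(mu, sigma^2) density forecast at realization y."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

# Brute-force integral of Definition 39.2 on a wide grid
x = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 400001)
integrand = (norm.cdf(x, mu, sigma) - (y <= x)) ** 2
crps_numeric = integrand.sum() * (x[1] - x[0])

print(crps_gaussian(y, mu, sigma), crps_numeric)    # the two should agree
print(crps_gaussian(y, mu, 1e-8), abs(y - mu))      # point forecast -> absolute error
```

The second print illustrates the reduction to absolute error for a degenerate (point) forecast.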


39.3 BVAR with Minnesota Prior

Definition 39.3 (Minnesota Prior). The Minnesota prior (Doan, Litterman, and Sims, 1984) for a VAR($p$) with $n$ variables imposes:

  • Own lags: the coefficient on lag $j$ of variable $i$ in equation $i$ has prior $\mathcal{N}(\delta_i, \sigma^2_{ii}/j^2)$, where $\delta_i = 1$ for I(1) variables (random-walk prior) and $\delta_i = 0$ for I(0) variables.

  • Cross-variable lags: the coefficient on lag $j$ of variable $l \neq i$ in equation $i$ has prior $\mathcal{N}(0, \lambda^2\sigma^2_{ii}/(j^2\sigma^2_{ll}))$, where $\lambda$ is the overall tightness.

  • Intercepts: loose prior centered at zero.

The Minnesota prior shrinks coefficients toward the random walk ($\delta_i = 1$) or zero ($\delta_i = 0$), with shrinkage tightening at longer lags. This regularizes the VAR for forecasting.

Theorem 39.1 (Minnesota Prior as Augmented Regression). The Minnesota-prior posterior mode for the VAR coefficients $B$ is equivalent to OLS on augmented data $(\mathbf{Y}_{aug}, \mathbf{X}_{aug})$, where dummy observations encode the prior beliefs:

$$\hat{B}_{Minnesota} = (X_{aug}'X_{aug})^{-1}X_{aug}'Y_{aug}.$$

In APL: B_MN ← (⌹ X_aug) +.× Y_aug — one expression for the BVAR estimator.

Proof. The Minnesota prior is conjugate (Normal–Wishart) for the VAR with fixed covariance. The posterior mode equals the OLS estimator on the augmented system where the dummy observations implement the prior beliefs as data. Kadiyala and Karlsson (1997) show this equivalence formally. \square

Constructing the dummy observations: for the standard Minnesota prior with hyperparameters $(\lambda, \delta, \sigma^2_i)$:

import numpy as np

def minnesota_dummies(Y, p, lambda_tight=0.2, delta=1.0):
    """
    Generate Minnesota prior dummy observations.
    Y: T×n data matrix, p: lag order, lambda_tight: shrinkage, delta: prior mean on own lags.
    Returns Y_dum, X_dum to be prepended to actual data for BVAR.
    """
    T, n = Y.shape
    sigma = np.std(Y, axis=0)  # marginal standard deviations
    
    dummies = []
    # Own-lag dummies: one dummy per variable per lag
    for j in range(1, p+1):
        for i in range(n):
            y_d = np.zeros(n); x_d = np.zeros(n*p + 1)
            if j == 1:
                y_d[i] = delta * sigma[i] / lambda_tight  # prior mean delta on first own lag only
            x_d[i + (j-1)*n] = sigma[i] * j / lambda_tight
            dummies.append((y_d, x_d))
    
    # Sum-of-coefficients dummies
    y_d = delta * np.mean(Y[:5], axis=0)
    x_d = np.zeros(n*p+1)
    for j in range(p): x_d[j*n:(j+1)*n] = delta * np.mean(Y[:5], axis=0)
    dummies.append((y_d, x_d))
    
    Y_dum = np.array([d[0] for d in dummies])
    X_dum = np.array([d[1] for d in dummies])
    return Y_dum, X_dum
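As a quick check on Theorem 39.1, the self-contained sketch below (a univariate AR(2) with $\delta = 1$, assumed for illustration) builds the own-lag dummies by hand: a near-dogmatic prior ($\lambda \to 0$) drives the estimate to the random walk, while a loose prior recovers OLS.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 80
y = np.cumsum(rng.standard_normal(T))       # simulated I(1) series

# AR(2) regression: y_t on (y_{t-1}, y_{t-2}, constant)
Y = y[2:]
X = np.column_stack([y[1:-1], y[:-2], np.ones(T - 2)])
sigma = np.std(np.diff(y))

def minnesota_ar2(lam):
    """Posterior mode via augmented OLS with own-lag dummies (delta = 1)."""
    Y_dum = np.array([sigma / lam, 0.0])              # prior mean 1 on lag 1, 0 on lag 2
    X_dum = np.array([[sigma / lam, 0.0, 0.0],
                      [0.0, 2 * sigma / lam, 0.0]])   # lag 2 shrunk twice as hard
    Y_aug = np.concatenate([Y_dum, Y])
    X_aug = np.vstack([X_dum, X])
    return np.linalg.lstsq(X_aug, Y_aug, rcond=None)[0]

b_tight = minnesota_ar2(1e-4)               # near-dogmatic prior -> random walk
b_loose = minnesota_ar2(1e4)                # near-flat prior -> plain OLS
b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
print(b_tight, b_loose, b_ols)
```

With $\lambda = 10^{-4}$ the lag-1 coefficient is pinned near one and the lag-2 coefficient near zero; with $\lambda = 10^4$ the dummies carry essentially no weight and the augmented OLS coincides with unrestricted OLS.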

39.4 DSGE Forecasting via Kalman Filter

From the gensys solution (Chapter 28), the NK-DSGE has the state-space form:

$$\boldsymbol\alpha_{t+1} = F\boldsymbol\alpha_t + G\boldsymbol\varepsilon_{t+1}, \qquad \mathbf{y}_t = H\boldsymbol\alpha_t + \mathbf{e}_t,$$

with $\boldsymbol\varepsilon_t \sim \mathcal{N}(0, Q)$ and $\mathbf{e}_t \sim \mathcal{N}(0, R)$.

The $h$-step-ahead forecast from the Kalman filter prediction step:

$$\hat{\mathbf{y}}_{T+h|T} = H F^h \hat{\boldsymbol\alpha}_{T|T},$$

$$\mathrm{MSE}(\hat{\mathbf{y}}_{T+h|T}) = H F^h P_{T|T} (F^h)' H' + \sum_{j=1}^{h} H F^{j-1} G Q G' (F^{j-1})' H' + R.$$

The DSGE forecasts are computed in real time: at each evaluation date $T$, the state $\hat{\boldsymbol\alpha}_{T|T}$ is updated by the Kalman filter using all available data, then projected forward $h$ steps.
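The prediction recursion is a few lines of NumPy. This is an illustrative sketch with an assumed scalar state: $F$, $G$, $H$, $Q$, $R$ and the filtered moments below are made-up values, not estimates. It evaluates the point forecast and the MSE formula above, and shows forecast uncertainty accumulating with $h$.

```python
import numpy as np

# Assumed toy state space and filtered moments at date T (illustrative values)
F = np.array([[0.9]]); G = np.array([[1.0]]); H = np.array([[1.0]])
Q = np.array([[0.2]]); R = np.array([[0.05]])
alpha_TT = np.array([1.5])                 # filtered state alpha_{T|T}
P_TT = np.array([[0.1]])                   # filtered covariance P_{T|T}

def dsge_forecast(h):
    """h-step point forecast and MSE from the Kalman prediction step."""
    Fh = np.linalg.matrix_power(F, h)
    y_hat = H @ Fh @ alpha_TT
    mse = H @ Fh @ P_TT @ Fh.T @ H.T + R
    for j in range(1, h + 1):              # shock uncertainty accumulates
        Fj = np.linalg.matrix_power(F, j - 1)
        mse = mse + H @ Fj @ G @ Q @ G.T @ Fj.T @ H.T
    return y_hat, mse

for h in (1, 4, 8):
    y_hat, mse = dsge_forecast(h)
    print(f"h={h}: forecast={y_hat[0]:.3f}, MSE={mse[0, 0]:.3f}")
```

For a stable $F$ the point forecast decays toward the steady state while the MSE rises monotonically in $h$ toward its unconditional limit.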


39.5 Machine Learning: LASSO for Macro Forecasting

With $K = 130$ predictor variables (FRED-MD), OLS runs out of degrees of freedom as $K$ approaches the sample size $T$. LASSO (Tibshirani, 1996) adds an $\ell_1$ penalty that shrinks many coefficients to exactly zero:

$$\hat{\boldsymbol\beta}^{LASSO} = \arg\min_{\boldsymbol\beta}\;\frac{1}{2T}\|\mathbf{y} - \mathbf{X}\boldsymbol\beta\|_2^2 + \lambda_{LASSO}\|\boldsymbol\beta\|_1.$$

Theorem 39.2 (LASSO Solution via Coordinate Descent). With standardized predictors ($\mathbf{x}_j'\mathbf{x}_j = T$), the LASSO objective admits a component-wise closed-form update. For each coordinate $j$:

$$\hat\beta_j \leftarrow \mathcal{S}\!\left(\frac{1}{T}\mathbf{x}_j'\left(\mathbf{y} - \mathbf{X}_{-j}\hat{\boldsymbol\beta}_{-j}\right),\; \lambda_{LASSO}\right),$$

where $\mathcal{S}(z, \lambda) = \mathrm{sign}(z)\max(|z|-\lambda, 0)$ is the soft-thresholding operator. Cycling through all $j$ until convergence gives the LASSO solution.

Proof. The $j$-th component subproblem (with all other coefficients fixed) is $\min_{\beta_j}\frac{1}{2T}\|\mathbf{r}_j - \mathbf{x}_j\beta_j\|_2^2 + \lambda|\beta_j|$, where $\mathbf{r}_j = \mathbf{y} - \mathbf{X}_{-j}\hat{\boldsymbol\beta}_{-j}$ is the partial residual. This is a univariate LASSO of $\mathbf{r}_j$ on $\mathbf{x}_j$; the KKT conditions for the $|\beta_j|$ term give $\hat\beta_j = \mathcal{S}(\mathbf{x}_j'\mathbf{r}_j/T, \lambda)$. $\square$
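The update can be checked against a reference implementation. The sketch below (simulated data, assumed coefficients) runs plain cyclic coordinate descent on standardized predictors; with the $\tfrac{1}{2T}$ scaling, the objective coincides with scikit-learn's Lasso, whose `alpha` plays the role of $\lambda_{LASSO}$.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
T, K = 200, 10
X = rng.standard_normal((T, K))
X = (X - X.mean(0)) / X.std(0)              # standardize: x_j'x_j = T
beta_true = np.zeros(K); beta_true[:3] = [2.0, -1.0, 0.5]   # assumed sparse truth
y = X @ beta_true + 0.1 * rng.standard_normal(T)
y -= y.mean()

lam = 0.1

def soft(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

beta = np.zeros(K)
for _ in range(500):                        # cyclic coordinate descent passes
    for j in range(K):
        r_j = y - X @ beta + X[:, j] * beta[j]    # partial residual
        beta[j] = soft(X[:, j] @ r_j / T, lam)

sk = Lasso(alpha=lam, fit_intercept=False, tol=1e-10, max_iter=100_000).fit(X, y)
print(np.max(np.abs(beta - sk.coef_)))      # agreement up to solver tolerance
```

Both solvers minimize the same convex objective, so the coefficient vectors agree to numerical tolerance.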

In APL, the soft-threshold operator and coordinate descent step:

⍝ APL — LASSO coordinate descent
⎕IO←0 ⋄ ⎕ML←1

soft_threshold ← {(z lam)←⍵
    (×z)×0⌈(|z)-lam}              ⍝ sign(z)×max(|z|−λ, 0)

⍝ One full coordinate descent pass over all K predictors.
⍝ The sequential loop needs control structures, so cd_pass is a tradfn.
∇ beta ← cd_pass(X y lam beta0);j;K;idx;r_j;z_j
  beta ← beta0
  K ← 1⊃⍴X
  :For j :In ⍳K
      idx ← (⍳K)~j                          ⍝ all predictors except j
      r_j ← y - X[;idx] +.× beta[idx]       ⍝ partial residual
      z_j ← (X[;j] +.× r_j) ÷ ≢y            ⍝ univariate OLS coefficient
      beta[j] ← soft_threshold z_j lam      ⍝ soft-threshold
  :EndFor
∇

⍝ LASSO: repeat full passes until the coefficients stop changing
∇ beta ← lasso(X y lam);new
  beta ← (1⊃⍴X)⍴0
  :Repeat
      new ← cd_pass X y lam beta
      :If 1e¯6 > ⌈/|new-beta
          beta ← new ⋄ :Leave
      :EndIf
      beta ← new
  :EndRepeat
∇

39.6 Forecast Combination

No single model dominates across all variables, horizons, and sample periods. Forecast combination pools multiple models to reduce forecast risk.

Simple average: $\hat{y}^{pool}_{T+h|T} = \frac{1}{M}\sum_{m=1}^M \hat{y}^m_{T+h|T}$.

Optimal linear pool: minimize $\sum_T \left(y_{T+h} - \sum_m w_m \hat{y}^m_{T+h|T}\right)^2$ subject to $w_m \geq 0$, $\sum_m w_m = 1$ — a constrained OLS problem.

Bayesian Model Averaging (BMA): weight each model by its posterior model probability. For $M$ models: $w_m \propto p(\mathbf{Y}|\mathcal{M}_m)\, p(\mathcal{M}_m)$, where $p(\mathbf{Y}|\mathcal{M}_m)$ is the marginal likelihood.
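The optimal linear pool is a small constrained quadratic program. The sketch below (simulated forecasts with assumed noise scales, not estimates) solves it with scipy's SLSQP; because each single model is itself a feasible weight vector, the in-sample pooled loss can never exceed the best individual model's.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, M = 120, 3
y = rng.standard_normal(N)                        # stand-in realizations
# stand-in forecasts: truth plus model-specific noise (assumed scales)
F = np.column_stack([y + s * rng.standard_normal(N) for s in (0.5, 0.8, 1.2)])

def pool_loss(w):
    """Mean squared error of the pooled forecast F @ w."""
    return np.mean((y - F @ w) ** 2)

cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)   # weights sum to one
bounds = [(0.0, 1.0)] * M                                  # nonnegative weights
res = minimize(pool_loss, np.full(M, 1.0 / M), bounds=bounds,
               constraints=cons, method='SLSQP')
w_opt = res.x
print("weights:", np.round(w_opt, 3), " pooled MSE:", round(pool_loss(w_opt), 4))
```

With independent noise, the optimal weights are roughly inversely proportional to each model's error variance, so the most accurate model receives the largest weight.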


39.7 The Diebold–Mariano Test

Definition 39.4 (Diebold–Mariano Test). The DM test compares the predictive accuracy of two models, $m = 1$ and $m = 2$. Define the loss differential $d_T = L(e^1_T) - L(e^2_T)$, where $e^m_T = y_{T+h} - \hat{y}^m_{T+h|T}$ and $L(\cdot)$ is the loss function (e.g., $L(e) = e^2$ for squared-error loss). The test statistic:

$$DM = \frac{\bar{d}}{\sqrt{\hat{V}(\bar{d})/N}} \xrightarrow{d} \mathcal{N}(0,1) \quad \text{under } H_0: \mathbb{E}[d_T] = 0,$$

where $\bar{d} = N^{-1}\sum_T d_T$ and $\hat{V}(\bar{d})$ is a Newey–West HAC variance estimator ($h$-step forecast errors are serially correlated up to order $h-1$, so a HAC estimator is required).

$H_0$: equal predictive accuracy. $H_1$: one model forecasts systematically better.


39.8 Worked Example: U.S. Inflation and GDP Forecasting

Data: U.S. quarterly GDP growth and CPI inflation, 1960Q1–2019Q4. Pseudo-real-time evaluation: estimate on rolling 20-year windows, forecast 1–8 quarters ahead.

Python

import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from scipy.stats import norm

np.random.seed(42)
T_total = 200   
T_train = 80    
H_horizon = 4  

# --- 1. SIMULATE DATA ---
y_gdp = np.zeros(T_total); y_gdp[0] = 2.0
y_infl = np.zeros(T_total); y_infl[0] = 2.0
for t in range(1, T_total):
    y_gdp[t] = 0.5 * y_gdp[t-1] + np.random.normal(0, 1.0)
    y_infl[t] = 0.6 * y_infl[t-1] + 0.2 * y_gdp[t-1] + np.random.normal(0, 0.5)

rmse = {'AR1': [], 'BVAR': [], 'LASSO': []}

# --- 2. FORECASTING LOOP ---
# We stop at T_total - H_horizon to ensure we have a 'true' value to compare against
for T in range(T_train, T_total - H_horizon):
    # Current point is T-1. Target is y_gdp[T + H_horizon - 1]
    y_train = y_gdp[:T]
    y_infl_train = y_infl[:T]
    y_target = y_gdp[T + H_horizon - 1]
    
    # --- AR(1) Forecast ---
    # Fit: y_t = b0 + b1*y_{t-1}
    ar_coef = np.polyfit(y_train[:-1], y_train[1:], 1)
    f_ar = y_train[-1]
    for _ in range(H_horizon): 
        f_ar = ar_coef[0] * f_ar + ar_coef[1]
    rmse['AR1'].append((y_target - f_ar)**2)
    
    # --- BVAR (Minnesota Prior via Augmented OLS) ---
    p = 2; n_vars = 2
    Y = np.column_stack([y_train, y_infl_train])
    
    # Construct X = [Lag1, Lag2, Constant]
    X_bvar = np.column_stack([Y[1:-1], Y[:-2], np.ones(T-2)])
    Y_dep = Y[2:]
    
    # Minnesota Dummies (Simplified)
    sig = np.std(np.diff(Y, axis=0), axis=0)
    lam = 0.2 # Tightness
    Y_dum = []; X_dum = []
    for j in range(1, p + 1):
        for i in range(n_vars):
            # Prior: random walk (first lag=1, others=0)
            row_y = np.zeros(n_vars)
            if j == 1: row_y[i] = sig[i] / lam
            
            row_x = np.zeros(n_vars * p + 1)
            row_x[i + (j-1)*n_vars] = (sig[i] * j) / lam # Higher lags = tighter shrinkage
            Y_dum.append(row_y); X_dum.append(row_x)
            
    Y_aug = np.vstack([np.array(Y_dum), Y_dep])
    X_aug = np.vstack([np.array(X_dum), X_bvar])
    
    B = np.linalg.lstsq(X_aug, Y_aug, rcond=None)[0]
    
    # Iterative Forecast
    curr_state = np.concatenate([Y[-1], Y[-2], [1]])
    for _ in range(H_horizon):
        next_vals = curr_state @ B
        curr_state = np.concatenate([next_vals, curr_state[:n_vars], [1]])
    rmse['BVAR'].append((y_target - curr_state[0])**2)
    
    # --- LASSO (Direct Forecast) ---
    # We want to predict y_{t+h} using y_{t}, y_{t-1}...
    Lags = 6
    # Construct features: rows are time, columns are lags
    X_lasso_full = np.column_stack([y_train[i:-(H_horizon+Lags-i-1)] for i in range(Lags)])
    y_lasso_target = y_train[H_horizon + Lags - 1:]
    
    if len(y_lasso_target) > 30:
        scaler = StandardScaler()
        X_sc = scaler.fit_transform(X_lasso_full)
        model_lasso = LassoCV(cv=5).fit(X_sc, y_lasso_target)
        
        # Predict using the most recent available lags
        x_latest = scaler.transform(y_train[-Lags:].reshape(1, -1))
        f_lasso = model_lasso.predict(x_latest)[0]
    else:
        f_lasso = y_train[-1] # Fallback
    rmse['LASSO'].append((y_target - f_lasso)**2)

# --- 3. EVALUATION ---
print("Model Performance (RMSE):")
for model, losses in rmse.items():
    print(f"{model:5}: {np.sqrt(np.mean(losses)):.4f}")

# Diebold-Mariano Test (BVAR vs AR1)
d = np.array(rmse['AR1']) - np.array(rmse['BVAR'])
n = len(d)
# Simple NW-style variance
def nw_var(resids, max_lag):
    gamma0 = np.var(resids)
    acc = 0
    for l in range(1, max_lag + 1):
        gamma_l = np.mean(resids[l:] * resids[:-l])
        acc += 2 * (1 - l/(max_lag+1)) * gamma_l
    return (gamma0 + acc) / len(resids)

dm_stat = np.mean(d) / np.sqrt(nw_var(d, H_horizon - 1))
p_val = 2 * (1 - norm.cdf(abs(dm_stat)))

print(f"\nDM Test (BVAR vs AR1): Stat={dm_stat:.3f}, p={p_val:.4f}")
Model Performance (RMSE):
AR1  : 1.1415
BVAR : 1.1473
LASSO: 1.1623

DM Test (BVAR vs AR1): Stat=-1.957, p=0.0503

Julia

using Statistics, GLMNet

# Julia: LASSO for macro forecasting
function lasso_forecast(y, T_train, H)
    # Direct H-step forecast: regress y_{t+H} on the nlags most recent values,
    # keeping target and features non-overlapping (no look-ahead leakage)
    nlags = 8
    X_lags = hcat([y[j:T_train-H-nlags+j] for j in 1:nlags]...)
    y_target = y[nlags+H:T_train]
    path = glmnet(X_lags, y_target)
    # Cross-validate to select lambda
    cv = glmnetcv(X_lags, y_target)
    lambda_opt = cv.lambda[argmin(cv.meanloss)]
    # Predict from the most recent nlags observations
    x_new = reshape(y[T_train-nlags+1:T_train], 1, nlags)
    pred = GLMNet.predict(path, x_new)
    idx = argmin(abs.(path.lambda .- lambda_opt))
    return pred[1, idx]
end

println("Macro forecasting in Julia: use GLMNet.jl for LASSO")
println("Key finding: LASSO often beats AR(1) at 4-8 quarter horizon due to variable selection")
println("BVAR beats LASSO at short horizons due to Minnesota prior regularization")
println("DSGE beats both when model is correctly specified but loses during crises")

39.9 Programming Exercises

Exercise 39.1 (APL — LASSO Coordinate Descent)

Implement the LASSO coordinate descent from Section 39.5 in APL: (a) `soft_threshold ← {(×⍺)×0⌈(|⍺)-⍵}` as a dyadic dfn taking $z$ on the left and $\lambda$ on the right; (b) one full coordinate descent pass over $K$ predictors; (c) iterate until convergence using `⍣≡`; (d) test on simulated data with $T = 100$ and $K = 50$ predictors, half of which are relevant. Verify that LASSO selects approximately the right 25 predictors.

Exercise 39.2 (Python — Forecasting Horse Race)

Run the full horse race on FRED-MD data (or simulated data with the same structure): (a) three models: AR(4), BVAR(2) with Minnesota prior, LASSO with 8 lags of 20 macro variables; (b) evaluation period: 1990Q1–2019Q4, rolling 40-quarter estimation windows; (c) target: 4-quarter GDP growth and 4-quarter inflation; (d) report RMSE ratios relative to AR(4) and DM test p-values.

Exercise 39.3 (Julia — DSGE Real-Time Forecasting)

# Real-time DSGE forecasting via Kalman filter
using LinearAlgebra

function dsge_forecast(A, D, F, H_obs, Q, R, y_obs, h)
    """h-step-ahead forecast from DSGE Kalman filter."""
    T, p = size(y_obs,1), size(H_obs,1)
    m = size(A, 1)
    
    # Filter (Chapter 20)
    alpha = zeros(m); P = 10*I(m)
    for t in 1:T
        # Predict
        alpha_pred = A * alpha
        P_pred = A * P * A' + D * Q * D'
        # Update
        v = y_obs[t,:] - H_obs * alpha_pred
        Fv = H_obs * P_pred * H_obs' + R
        K = P_pred * H_obs' * inv(Fv)
        alpha = alpha_pred + K * v
        P = (I(m) - K*H_obs) * P_pred
    end
    
    # h-step forecast: alpha_{T+h|T} = A^h * alpha_{T|T}
    alpha_fcast = A^h * alpha
    return H_obs * alpha_fcast
end

println("DSGE h-step forecast: H_obs * A^h * alpha_filtered")
println("Error grows with h due to uncertainty accumulation:")
for h in [1, 4, 8]
    uncertainty_factor = h^0.5  # rough guide: error std ~ sqrt(h) for near-unit-root states; it saturates for stable ones
    println("  h=$h: relative uncertainty ≈ $(round(uncertainty_factor, digits=2))x baseline")
end

Exercise 39.4 — Model Uncertainty ($\star$)

Implement Bayesian Model Averaging (BMA) for the three-model forecast combination: (a) compute the log marginal likelihood $\ln p(\mathbf{Y}|\mathcal{M}_m)$ for each model using the Laplace approximation; (b) compute BMA weights $w_m \propto p(\mathbf{Y}|\mathcal{M}_m)\,p(\mathcal{M}_m)$ with equal model priors; (c) compare the BMA weights to the ex-post optimal combination weights (OLS on the forecasts). Do the BMA weights select the model that actually forecast best?


39.10 Chapter Summary

Key results:

  • BVAR Minnesota prior: prior $\mathcal{N}(\delta_i,\sigma^2_{ii}/j^2)$ on own lags, $\mathcal{N}(0,\lambda^2\sigma^2_{ii}/(j^2\sigma^2_{ll}))$ on cross-lags; equivalent to OLS on augmented data (Theorem 39.1), computed as `B_MN ← (⌹ X_aug) +.× Y_aug`.

  • DSGE forecasting: $h$-step forecast $\hat{\mathbf{y}}_{T+h|T} = HF^h\hat{\boldsymbol\alpha}_{T|T}$ from the Kalman filter; reliable only if the model is correctly specified (it fails during structural breaks).

  • LASSO coordinate descent (Theorem 39.2): soft-threshold update $\hat\beta_j = \mathcal{S}(\mathbf{x}_j'\mathbf{r}_j/T, \lambda)$, cycled over all coordinates until convergence.

  • DM test (Definition 39.4): $DM = \bar{d}/\sqrt{\hat{V}(\bar{d})/N} \xrightarrow{d} \mathcal{N}(0,1)$; tests equal predictive accuracy; uses a Newey–West HAC variance estimator.

  • Findings from the literature: BVAR beats AR at short horizons (1–2 quarters); LASSO beats BVAR when many predictors carry signal (large-data settings); DSGE does well in normal times but fails during crises; forecast combination rarely hurts and usually helps.

Next: Chapter 40 — Policy Analysis with a New Keynesian Model