
MCMC Methods and the Metropolis–Hastings Algorithm

“Bayesian estimation takes the model seriously as a data-generating process, while acknowledging that we are uncertain about its parameters.” — Frank Smets and Raf Wouters

Cross-reference: Principles Appendix B (Bayesian estimation, Metropolis–Hastings); Ch. 27 (Smets–Wouters calibration and estimation) [P:AppB, P:Ch.27]


30.1 Why Bayesian Estimation?

Chapter 21 developed GMM and MLE for structural parameter estimation. For DSGE models, Bayesian estimation — combining the likelihood with prior distributions over parameters — has become the dominant approach for three reasons:

1. Identification. DSGE models typically have many parameters but limited observables. Many parameter combinations are nearly observationally equivalent — different parameter values produce similar time series. The prior provides regularization, guiding the posterior toward economically plausible regions of the parameter space.

2. Incorporating prior information. Microeconomic studies provide independent evidence about preference and technology parameters (e.g., labor supply elasticities from household data, price rigidity from firm surveys). The Bayesian prior formalizes this information.

3. Model comparison. The Bayesian framework provides a principled method for comparing non-nested models via the Bayes factor — the ratio of marginal likelihoods. This is particularly useful when comparing different DSGE specifications.

The price: Bayesian inference requires specifying priors and running MCMC chains that can take hours to converge. For large models (Smets–Wouters with 41 estimated parameters), each likelihood evaluation takes seconds, and the chain needs $10^5$–$10^6$ draws. In practice, this is done using Dynare’s optimized implementation.


30.2 Bayesian Inference: The Posterior Distribution

Bayes’ theorem for parameters:

$$p(\boldsymbol\theta \mid \mathbf{Y}_{1:T}) = \frac{p(\mathbf{Y}_{1:T} \mid \boldsymbol\theta)\, p(\boldsymbol\theta)}{p(\mathbf{Y}_{1:T})},$$

where:

  • $p(\boldsymbol\theta|\mathbf{Y})$, the posterior: our beliefs about $\boldsymbol\theta$ after seeing the data.

  • $p(\mathbf{Y}|\boldsymbol\theta) = \mathcal{L}(\boldsymbol\theta)$, the likelihood: probability of the data given parameters (from the Kalman filter, Chapter 20).

  • $p(\boldsymbol\theta)$, the prior: beliefs before seeing the data.

  • $p(\mathbf{Y}) = \int p(\mathbf{Y}|\boldsymbol\theta)\,p(\boldsymbol\theta)\,d\boldsymbol\theta$, the marginal likelihood: normalizing constant.

The posterior combines the prior with the likelihood. When the likelihood is very informative, the posterior is concentrated near the MLE regardless of the prior. When the data are sparse, the posterior is closer to the prior.

Definition 30.1 (Posterior Mode and Posterior Mean). The posterior mode $\hat{\boldsymbol\theta}^{MAP}$ (maximum a posteriori) maximizes $\ln p(\boldsymbol\theta|\mathbf{Y}) = \ell(\boldsymbol\theta) + \ln p(\boldsymbol\theta) + \text{const}$. The posterior mean $\bar{\boldsymbol\theta} = \mathbb{E}[\boldsymbol\theta|\mathbf{Y}] = \int \boldsymbol\theta\, p(\boldsymbol\theta|\mathbf{Y})\,d\boldsymbol\theta$ is the Bayes estimator under squared error loss.

Finding the posterior mode is an optimization problem (Chapter 24); computing the posterior mean requires integrating over the posterior distribution — which is analytically intractable for DSGE models. MCMC methods approximate the posterior by sampling.
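The distinction can be made concrete on a grid. The sketch below uses an arbitrary one-observation Gaussian likelihood and a Gamma prior, chosen only because the resulting posterior is skewed, so mode and mean differ:

```python
import numpy as np

# Grid approximation of a 1-D posterior (all numbers here are illustrative):
# likelihood  y ~ N(theta, 1) with a single observation y = 0.8,
# prior       theta ~ Gamma(2, 1), which is skewed, so mode and mean differ
grid = np.linspace(1e-6, 8.0, 100_000)
y = 0.8
log_lik   = -0.5 * (y - grid)**2            # Gaussian log-likelihood
log_prior = np.log(grid) - grid             # Gamma(2,1) log-kernel
log_post  = log_lik + log_prior

w = np.exp(log_post - log_post.max())       # unnormalized posterior, stabilized
w /= w.sum()                                # grid weights summing to 1

theta_map  = grid[np.argmax(w)]             # posterior mode
theta_mean = (grid * w).sum()               # posterior mean

print(f"mode = {theta_map:.3f}, mean = {theta_mean:.3f}")
```

The right-skewed prior pulls the mean above the mode; for a DSGE model the grid is replaced by MCMC draws, but the two estimators are computed in exactly the same spirit.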


30.3 Prior Distributions for DSGE Parameters

The choice of prior reflects both economic beliefs and practical considerations.

Standard choices:

| Parameter | Typical Prior | Support | Motivation |
|---|---|---|---|
| Calvo probability $\theta$ | Beta($\alpha,\beta$) | $[0,1]$ | Fraction, must be in $[0,1]$ |
| CRRA $\sigma$ | Gamma($\alpha,\beta$) | $(0,\infty)$ | Must be positive |
| AR persistence $\rho$ | Beta($\alpha,\beta$) | $[0,1]$ | Must be in $[0,1]$ for stationarity |
| Taylor rule $\phi_\pi$ | Normal (truncated) | $[1,\infty)$ | Determinacy requires $\phi_\pi > 1$ |
| Shock std dev $\sigma_i$ | Inverse-Gamma | $(0,\infty)$ | Scale parameter, must be positive |
| Investment adj. cost $\psi$ | Normal (positive) | $(0,\infty)$ | Must be positive |

Definition 30.2 (Conjugate Prior). A prior $p(\boldsymbol\theta)$ is conjugate to the likelihood $p(\mathbf{Y}|\boldsymbol\theta)$ if the posterior $p(\boldsymbol\theta|\mathbf{Y})$ is in the same distributional family as the prior. Conjugate priors allow analytical posterior updates.

For DSGE models, the full likelihood is nonlinear in parameters (through the Kalman filter), so no conjugate priors exist. MCMC is necessary.
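To see what Definition 30.2 buys when it does apply, here is a minimal sketch of the textbook Normal–Normal conjugate case (known variance; the data and prior values are illustrative):

```python
import numpy as np

# Conjugate update: y_i ~ N(mu, sigma^2) with sigma known, prior mu ~ N(mu0, tau0^2).
# The posterior is again Normal, with precision-weighted updating -- no MCMC needed.
rng = np.random.default_rng(0)
mu_true, sigma = 1.2, 2.0
y = rng.normal(mu_true, sigma, size=50)

mu0, tau0 = 0.0, 1.0                         # prior mean and std
n = len(y)
prec_post = 1/tau0**2 + n/sigma**2           # posterior precision = sum of precisions
mu_post   = (mu0/tau0**2 + y.sum()/sigma**2) / prec_post
tau_post  = prec_post**-0.5

print(f"posterior: N({mu_post:.3f}, {tau_post:.3f}^2)")
```

The posterior mean is a precision-weighted average of the prior mean and the sample mean: as $n$ grows the likelihood dominates, exactly the behavior described in Section 30.2. The DSGE likelihood destroys this analytical structure, which is why MH sampling takes over.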


30.4 The Metropolis–Hastings Algorithm

The Metropolis–Hastings (MH) algorithm generates samples from the posterior distribution $p(\boldsymbol\theta|\mathbf{Y})$ without computing the normalizing constant $p(\mathbf{Y})$.

Algorithm 30.1 (Metropolis–Hastings).

Initialize: $\boldsymbol\theta^{(0)}$ (e.g., the posterior mode from optimization) and the proposal covariance $\Sigma_q$ (typically tuned to give a 20–30% acceptance rate).

For $s = 1, 2, \ldots, N$:

  1. Propose: Draw $\boldsymbol\theta^* \sim q(\boldsymbol\theta \mid \boldsymbol\theta^{(s-1)}) = \mathcal{N}(\boldsymbol\theta^{(s-1)}, \Sigma_q)$ (symmetric random-walk proposal).

  2. Compute acceptance ratio:

    α(θ(s1),θ)=min ⁣{1,  p(θY)p(θ(s1)Y)}=min ⁣{1,  L(θ)p(θ)L(θ(s1))p(θ(s1))}.\alpha(\boldsymbol\theta^{(s-1)}, \boldsymbol\theta^*) = \min\!\left\{1,\; \frac{p(\boldsymbol\theta^*|\mathbf{Y})}{p(\boldsymbol\theta^{(s-1)}|\mathbf{Y})}\right\} = \min\!\left\{1,\; \frac{\mathcal{L}(\boldsymbol\theta^*)p(\boldsymbol\theta^*)}{\mathcal{L}(\boldsymbol\theta^{(s-1)})p(\boldsymbol\theta^{(s-1)})}\right\}.
  3. Accept/Reject: Draw $u \sim \text{Uniform}[0,1]$.

    • If $u \leq \alpha$: $\boldsymbol\theta^{(s)} = \boldsymbol\theta^*$ (accept).

    • If $u > \alpha$: $\boldsymbol\theta^{(s)} = \boldsymbol\theta^{(s-1)}$ (reject, stay).

Theorem 30.1 (MH Invariance). The MH chain with target $\pi(\boldsymbol\theta) \propto \mathcal{L}(\boldsymbol\theta)\,p(\boldsymbol\theta)$ has $\pi$ as its stationary distribution. Starting from any initial point $\boldsymbol\theta^{(0)}$, the chain converges in distribution to $\pi$ under mild regularity conditions (irreducibility, aperiodicity).

Proof sketch. The detailed balance condition ensures $\pi$ is stationary. For any two states $\boldsymbol\theta, \boldsymbol\theta^*$:

$$\pi(\boldsymbol\theta)\,q(\boldsymbol\theta^* \mid \boldsymbol\theta)\min\!\left\{1, \frac{\pi(\boldsymbol\theta^*)}{\pi(\boldsymbol\theta)}\right\} = \pi(\boldsymbol\theta^*)\,q(\boldsymbol\theta \mid \boldsymbol\theta^*)\min\!\left\{1, \frac{\pi(\boldsymbol\theta)}{\pi(\boldsymbol\theta^*)}\right\}.$$

This is verified by considering the two cases $\pi(\boldsymbol\theta^*) \geq \pi(\boldsymbol\theta)$ and $\pi(\boldsymbol\theta^*) < \pi(\boldsymbol\theta)$. Detailed balance $\Rightarrow$ stationarity; irreducibility and aperiodicity $\Rightarrow$ convergence. $\square$
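Theorem 30.1 can also be checked numerically on a toy discrete-state chain, where the MH transition matrix can be written out in full; the three-state target below is an arbitrary choice:

```python
import numpy as np

# Build the MH transition matrix for a 3-state target with a symmetric
# uniform proposal, then verify detailed balance and stationarity exactly.
pi = np.array([0.2, 0.5, 0.3])                        # target distribution
q = np.full((3, 3), 1/3)                              # symmetric proposal q(j|i)

P = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        if i != j:
            P[i, j] = q[i, j] * min(1.0, pi[j] / pi[i])   # propose then accept
    P[i, i] = 1.0 - P[i].sum()                            # rejection mass stays put

flow = pi[:, None] * P                # flow[i, j] = pi_i * P_ij
db   = np.allclose(flow, flow.T)      # detailed balance: pi_i P_ij = pi_j P_ji
stat = np.allclose(pi @ P, pi)        # stationarity: pi P = pi

print(db, stat)   # True True
```

With a symmetric proposal, $\pi_i P_{ij} = \frac{1}{3}\min(\pi_i, \pi_j)$, which is manifestly symmetric in $i$ and $j$: detailed balance holds term by term, and stationarity follows by summing over rows.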

In APL, the MH chain is built by iterating a single-step function over draws:

⍝ APL — Metropolis–Hastings chain
⎕IO←0 ⋄ ⎕ML←1

⍝ Scalar log-posterior (placeholder: a DSGE run would call the Kalman filter here)
⍝ Posterior ∝ L(theta) × prior = N(theta|1.5,1) × N(theta|1.0,1) → N(1.25, 0.5)
log_post ← {
    ll ← -0.5×(⍵-1.5)*2        ⍝ Gaussian log-likelihood
    lp ← -0.5×(⍵-1.0)*2        ⍝ Normal(1, 1) log-prior
    ll + lp}

⍝ Box–Muller standard normal, sine branch (1○ is sine; 2○ is cosine)
rnorm ← {(1○2×○?0) × (-2×⍟?0)*0.5}   ⍝ scalar standard normal draw

⍝ One MH step: ⍵ is a 2-element vector (theta, n_accepted)
⍝ Returns updated (theta_new, n_accepted_new)
mh_step ← {
    theta n_acc ← ⍵
    proposal  ← theta + scale × rnorm ⍬       ⍝ random walk proposal
    log_ratio ← (log_post proposal) - log_post theta
    accept    ← log_ratio > ⍟?0               ⍝ log(U[0,1]) < log_ratio
    new_theta ← accept⌷theta proposal          ⍝ pick proposal if accepted, else theta
    new_theta (n_acc+accept)}                  ⍝ accumulate acceptance count

⍝ Run chain: iterate mh_step with the power operator ⍣, accumulating the
⍝ history of states (⍣ alone would keep only the final state)
N_draws ← 10000
scale   ← 0.5        ⍝ proposal standard deviation (tune for 20–40% acceptance)
theta0  ← 0          ⍝ starting value

step   ← {⍵,⊂mh_step ⊃⌽⍵}            ⍝ append the next state to the history
states ← step⍣N_draws⊢,⊂theta0 0     ⍝ (N_draws+1) states of (theta, n_accepted)
chain  ← ⊃¨states                     ⍝ theta component of every state

⍝ Final acceptance count from the last state
n_accepted ← 1⊃⊃⌽states

⍝ Posterior statistics (post-burnin)
burnin    ← 2000
samples   ← burnin↓chain
post_mean ← (+/samples) ÷ ≢samples
post_std  ← ((+/(samples-post_mean)*2) ÷ ≢samples)*0.5
acc_rate  ← n_accepted ÷ N_draws         ⍝ overall acceptance rate

⍝ Fraction of consecutive draws that differ (alternative acceptance measure)
transitions   ← (1↓chain) ≠ ¯1↓chain
acc_rate_alt  ← (+/transitions) ÷ N_draws

post_mean post_std acc_rate               ⍝ target: mean≈1.25, std≈0.71, rate≈20–40%

30.5 MCMC Diagnostics

Running an MCMC chain is only half the task — the other half is verifying that the chain has converged to the target distribution and that the draws provide reliable posterior estimates.

30.5.1 Trace Plots and Mixing

The trace plot shows the parameter value at each iteration. A well-mixing chain moves freely around the parameter space rather than getting stuck in one region. Poor mixing (high autocorrelation in the chain) suggests the proposal covariance $\Sigma_q$ is poorly tuned: too small and the chain takes tiny steps; too large and most proposals are rejected, so the chain rarely moves.

Definition 30.3 (Effective Sample Size). The effective sample size accounts for autocorrelation in the chain:

$$N_{\text{eff}} = \frac{N}{1 + 2\sum_{j=1}^{\infty} \hat\rho_j},$$

where $\hat\rho_j$ is the sample autocorrelation at lag $j$. If the chain has high autocorrelation, $N_{\text{eff}} \ll N$ — 10,000 correlated draws may be worth only 100 independent draws.
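A sketch of the computation, using a synthetic AR(1) series in place of an actual MH chain so that the answer is known in closed form ($N_{\text{eff}} = N(1-\rho)/(1+\rho)$ for AR(1)); the 0.05 truncation threshold for the ACF sum is an arbitrary choice:

```python
import numpy as np

# Synthetic "chain": AR(1) with autocorrelation rho, so rho_j = rho^j exactly.
rng = np.random.default_rng(1)
N, rho = 100_000, 0.9
x = np.zeros(N)
for t in range(1, N):
    x[t] = rho * x[t-1] + rng.standard_normal()

# Sample autocorrelations at lags 1..199, truncated where they become negligible
xc = x - x.mean()
acf = np.array([(xc[:N-j] @ xc[j:]) / (xc @ xc) for j in range(1, 200)])
cutoff = np.argmax(acf < 0.05) or len(acf)      # first lag below 0.05
n_eff = N / (1 + 2 * acf[:cutoff].sum())

print(f"N = {N}, N_eff ~ {n_eff:.0f}  (theory: {N*(1-rho)/(1+rho):.0f})")
```

With $\rho = 0.9$ the theoretical value is about $N/19 \approx 5{,}263$: nineteen correlated draws carry the information of one independent draw.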

30.5.2 The Gelman–Rubin Statistic

Definition 30.4 (Gelman–Rubin $\hat{R}$ Statistic). Run $m \geq 2$ independent chains from different starting points, each of length $n$. The $\hat{R}$ statistic compares within-chain variance to between-chain variance:

$$\hat{R} = \sqrt{\frac{\hat{V}}{W}},$$

where $W = \frac{1}{m}\sum_j S_j^2$ (average within-chain variance), $B = \frac{n}{m-1}\sum_j(\bar\theta_j - \bar{\bar\theta})^2$ (between-chain variance), and $\hat{V} = (1 - 1/n)W + B/n$ (pooled variance estimate).

$\hat{R} \approx 1.0$ indicates convergence (within- and between-chain variances agree). The standard threshold: convergence when $\hat{R} < 1.1$ for all parameters.
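A minimal implementation of Definition 30.4, exercised on artificial chains (i.i.d. normal draws standing in for MCMC output, with one deliberately shifted chain to show a failure):

```python
import numpy as np

def gelman_rubin(chains):
    """R-hat for an (m, n) array of m chains of length n (Definition 30.4)."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()        # average within-chain variance
    B = n * chain_means.var(ddof=1)              # between-chain variance
    V_hat = (1 - 1/n) * W + B / n                # pooled variance estimate
    return np.sqrt(V_hat / W)

rng = np.random.default_rng(2)
converged = rng.normal(1.25, 0.71, size=(4, 5000))   # 4 chains, same target
stuck = converged.copy()
stuck[0] += 3.0                                      # one chain in another region

print(f"R-hat (converged): {gelman_rubin(converged):.3f}")   # close to 1
print(f"R-hat (divergent): {gelman_rubin(stuck):.3f}")       # well above 1.1
```

The shifted chain inflates the between-chain variance $B$ while leaving $W$ unchanged, which is exactly the disagreement $\hat{R}$ is designed to detect.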


30.6 The DSGE Likelihood via the Kalman Filter

The log-likelihood for the DSGE model with parameters $\boldsymbol\theta$ is evaluated using the Kalman filter (Chapter 20):

$$\ell(\boldsymbol\theta) = -\frac{Tp}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\left[\ln\lvert\mathbf{F}_t(\boldsymbol\theta)\rvert + \mathbf{v}_t(\boldsymbol\theta)'\,\mathbf{F}_t(\boldsymbol\theta)^{-1}\,\mathbf{v}_t(\boldsymbol\theta)\right],$$

where each evaluation proceeds in five steps:

  1. From $\boldsymbol\theta$: construct the DSGE model matrices $\Gamma_0(\boldsymbol\theta)$, $\Gamma_1(\boldsymbol\theta)$, $\Psi(\boldsymbol\theta)$.

  2. Solve using gensys (Chapter 28): get decision rules $C(\boldsymbol\theta)$, $D(\boldsymbol\theta)$.

  3. Map to state-space form: $F(\boldsymbol\theta)$, $H(\boldsymbol\theta)$, $Q(\boldsymbol\theta)$, $R(\boldsymbol\theta)$.

  4. Run the Kalman filter: get innovations $\mathbf{v}_t(\boldsymbol\theta)$ and innovation variances $\mathbf{F}_t(\boldsymbol\theta)$.

  5. Sum the log-likelihood contributions.

Each MH iteration requires one full Kalman filter evaluation — the bottleneck of the estimation.
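Steps 4–5 can be sketched as a generic Kalman-filter log-likelihood for any linear Gaussian state space; the scalar AR(1) example at the bottom is purely illustrative and stands in for the DSGE mapping of steps 1–3:

```python
import numpy as np

def kalman_loglik(Y, F, H, Q, R, a0, P0):
    """Gaussian log-likelihood of Y (T x p) under
    alpha_t = F alpha_{t-1} + eta_t,  y_t = H alpha_t + eps_t."""
    a, P = a0.copy(), P0.copy()
    p = H.shape[0]
    ll = 0.0
    for y in Y:
        a, P = F @ a, F @ P @ F.T + Q                    # predict
        v = y - H @ a                                    # innovation v_t
        Fv = H @ P @ H.T + R                             # innovation variance F_t
        ll += -0.5 * (p*np.log(2*np.pi)
                      + np.linalg.slogdet(Fv)[1]
                      + v @ np.linalg.solve(Fv, v))      # likelihood contribution
        K = P @ H.T @ np.linalg.inv(Fv)                  # Kalman gain
        a, P = a + K @ v, (np.eye(len(a)) - K @ H) @ P   # update
    return ll

# Illustrative model: scalar AR(1) state observed with noise
rng = np.random.default_rng(3)
T, rho = 200, 0.7
alpha = np.zeros(T)
for t in range(1, T):
    alpha[t] = rho*alpha[t-1] + rng.standard_normal()
Y = (alpha + 0.5*rng.standard_normal(T)).reshape(-1, 1)

F = np.array([[rho]]); H = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[0.25]])
ll = kalman_loglik(Y, F, H, Q, R, np.zeros(1), np.array([[1/(1-rho**2)]]))
print(f"log-likelihood: {ll:.2f}")
```

Every MH draw reruns this loop over all $T$ observations with freshly built system matrices, which is why the filter dominates the estimation's running time.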


30.7 Model Comparison: Marginal Likelihood and Bayes Factors

Definition 30.5 (Marginal Likelihood). The marginal likelihood of model $\mathcal{M}$ is:

$$p(\mathbf{Y} \mid \mathcal{M}) = \int p(\mathbf{Y} \mid \boldsymbol\theta, \mathcal{M})\, p(\boldsymbol\theta \mid \mathcal{M})\, d\boldsymbol\theta.$$

The Bayes factor for model $\mathcal{M}_1$ vs. $\mathcal{M}_2$:

$$BF_{12} = \frac{p(\mathbf{Y} \mid \mathcal{M}_1)}{p(\mathbf{Y} \mid \mathcal{M}_2)}.$$

If $BF_{12} > 1$, the data favor $\mathcal{M}_1$; $BF_{12} > 10$ is conventionally read as “strong” evidence.

The Laplace approximation for the marginal likelihood:

$$\ln p(\mathbf{Y} \mid \mathcal{M}) \approx \ell(\hat{\boldsymbol\theta}^{MAP}) + \ln p(\hat{\boldsymbol\theta}^{MAP} \mid \mathcal{M}) + \frac{k}{2}\ln(2\pi) - \frac{1}{2}\ln\lvert\hat{H}\rvert,$$

where $\hat{H}$ is the negative Hessian of the log-posterior at the mode and $k$ is the number of parameters. Dynare uses the Laplace approximation as its default marginal likelihood estimate.


30.8 Worked Example: MH for a 3-Parameter NK Model

Python

import numpy as np
from scipy.stats import norm, gamma, beta as beta_dist
from scipy.linalg import solve_discrete_lyapunov

# Simplified 3-parameter NK model
# Parameters: theta = [kappa, phi_pi, sigma_u]
# Prior: kappa ~ Beta(2,10), phi_pi ~ N(1.5, 0.5^2) truncated to [1,inf), sigma_u ~ inverse-gamma type

def log_prior(theta):
    kappa, phi_pi, sigma_u = theta
    if kappa <= 0 or kappa >= 1 or phi_pi <= 1 or sigma_u <= 0:
        return -np.inf
    lp = beta_dist.logpdf(kappa, 2, 10)
    lp += norm.logpdf(phi_pi, 1.5, 0.5)
    lp += -2*np.log(sigma_u) - 0.1/sigma_u**2  # inverse-gamma-type log-kernel (unnormalized)
    return lp

def dsge_kalman_loglik(theta, Y_obs):
    """Evaluate NK model log-likelihood via Kalman filter."""
    kappa, phi_pi, sigma_u = theta
    beta_NK, sigma_IS = 0.99, 1.0
    phi_y = 0.5
    
    # Build model matrices: G0 @ [pi_t, x_t] = G1 @ E_t[pi_{t+1}, x_{t+1}] + e1*u_t
    G0 = np.array([[1.0, -kappa], [sigma_IS*phi_pi, 1+sigma_IS*phi_y]])
    G1 = np.array([[beta_NK, 0.0], [-sigma_IS, 1.0]])
    
    A = np.linalg.solve(G0, G1)
    eigs = np.linalg.eigvals(A)
    if not np.all(np.abs(eigs) < 1.0):
        return -np.inf  # forward iteration diverges: no unique bounded solution
    
    # MSV solution: [pi; x]_t = Omega * u_t, with u_t = rho_u * u_{t-1} + eps_t
    rho_u = 0.7
    Psi = np.linalg.solve(G0, np.array([1.0, 0.0]))
    Omega = np.linalg.solve(np.eye(2) - rho_u*A, Psi)
    
    # State-space: scalar state u_t; observation pi_t = Omega[0]*u_t + meas. error
    F = np.array([[rho_u]])
    H_obs = np.array([[Omega[0]]])  # observe pi only; x remains latent
    Q = np.array([[sigma_u**2]])
    R = np.array([[0.003**2]])  # measurement error variance (matches the simulation)
    
    # Kalman filter
    T = len(Y_obs); m = F.shape[0]; p = H_obs.shape[0]
    a = np.zeros(m); P = np.eye(m) * 10.0  # diffuse initialization
    log_lik = 0.0
    
    for t in range(T):
        a_pred = F @ a
        P_pred = F @ P @ F.T + Q
        v = Y_obs[t:t+1] - H_obs @ a_pred
        Fv = H_obs @ P_pred @ H_obs.T + R
        K = P_pred @ H_obs.T @ np.linalg.inv(Fv)
        
        sign, ld = np.linalg.slogdet(Fv)
        log_lik -= 0.5*(ld + v @ np.linalg.solve(Fv, v) + p*np.log(2*np.pi))
        
        a = a_pred + K @ v
        P = (np.eye(m) - K @ H_obs) @ P_pred
    
    return float(log_lik)

def log_posterior(theta, Y_obs):
    lp = log_prior(theta)
    if not np.isfinite(lp): return -np.inf
    ll = dsge_kalman_loglik(theta, Y_obs)
    return lp + ll if np.isfinite(ll) else -np.inf

# Generate synthetic data
np.random.seed(42)
T_data = 200; kappa_t, phi_pi_t, sigma_u_t = 0.15, 1.5, 0.01
u = np.zeros(T_data)
for t in range(1,T_data): u[t] = 0.7*u[t-1] + sigma_u_t*np.random.randn()

# Simulate pi: approximate first-order solution
beta_NK, sigma_IS, phi_y = 0.99, 1.0, 0.5
G0_t = np.array([[1,-kappa_t],[sigma_IS*phi_pi_t,1+sigma_IS*phi_y]])
G1_t = np.array([[beta_NK,0],[-sigma_IS,1]]); A_t = np.linalg.inv(G0_t)@G1_t
Omega_t = np.linalg.solve(np.eye(2)-0.7*A_t, np.linalg.solve(G0_t,np.array([1,0])))
pi_data = Omega_t[0]*u + 0.003*np.random.randn(T_data)

# MH sampling
N_draws = 5000; burnin = 1000
theta_chain = np.zeros((N_draws, 3))
theta_chain[0] = [0.12, 1.6, 0.012]  # starting point
Sigma_prop = np.diag([0.01**2, 0.05**2, 0.002**2])  # proposal covariance
L_prop = np.linalg.cholesky(Sigma_prop)

lp_curr = log_posterior(theta_chain[0], pi_data)
n_accept = 0

for s in range(1, N_draws):
    proposal = theta_chain[s-1] + L_prop @ np.random.randn(3)
    lp_prop = log_posterior(proposal, pi_data)
    
    log_alpha = lp_prop - lp_curr
    if np.log(np.random.rand()) < log_alpha:
        theta_chain[s] = proposal
        lp_curr = lp_prop
        n_accept += 1
    else:
        theta_chain[s] = theta_chain[s-1]

accept_rate = n_accept / (N_draws - 1)  # N_draws - 1 proposals made
posterior = theta_chain[burnin:]
print(f"Acceptance rate: {accept_rate*100:.1f}% (target: 20-30%)")
print("\nPosterior estimates (true values in parentheses):")
params = ['kappa', 'phi_pi', 'sigma_u']
true_vals = [kappa_t, phi_pi_t, sigma_u_t]
for i, (p, tv) in enumerate(zip(params, true_vals)):
    pm = np.mean(posterior[:,i]); ps = np.std(posterior[:,i])
    ci = np.percentile(posterior[:,i], [5, 95])
    print(f"  {p}: mean={pm:.4f}, std={ps:.4f}, 90%CI=[{ci[0]:.4f},{ci[1]:.4f}]  (true={tv})")

30.9 Programming Exercises

Exercise 30.1 (APL — Gelman–Rubin Statistic)

Run two independent MH chains from different starting points ($\theta_0^{(1)} = 0.5$ and $\theta_0^{(2)} = 3.0$) for the simple Gaussian posterior. Compute the Gelman–Rubin $\hat{R}$ statistic: (a) within-chain variance $W = (1/m)\sum_j S_j^2$ using (+/S_j_sq)÷m; (b) between-chain variance $B = n/(m-1)\sum_j(\bar\theta_j - \bar{\bar\theta})^2$; (c) $\hat{R} = \sqrt{\hat{V}/W}$. Verify $\hat{R} \to 1$ as the chain length increases and the two chains converge.

Exercise 30.2 (Python — Adaptive MH)

Implement adaptive MH (Haario, Saksman, and Tamminen, 2001): after a warmup of $s_0$ draws, update the proposal covariance as $\Sigma_q^{(s)} = (2.38)^2/k \cdot \text{Cov}(\boldsymbol\theta^{(1)},\ldots,\boldsymbol\theta^{(s)}) + \varepsilon I$, where $k$ is the dimension. (a) Show that this targets the optimal proposal covariance for Gaussian posteriors. (b) Compare acceptance rates between fixed and adaptive proposals for the 3-parameter NK model. (c) Verify the adaptive chain converges to the same posterior as the fixed chain.

Exercise 30.3 (Julia — Marginal Likelihood via Laplace)

using LinearAlgebra, ForwardDiff, Optim

function laplace_marginal_lik(log_posterior, theta_mode)
    # Compute negative Hessian at mode
    H = ForwardDiff.hessian(theta -> -log_posterior(theta), theta_mode)
    k = length(theta_mode)
    
    # Laplace approximation
    lml = log_posterior(theta_mode) + (k/2)*log(2π) - 0.5*log(det(H))
    return lml
end

# Example: simple 2-parameter model
log_post(θ) = -0.5*((θ[1]-1.5)^2/0.25 + (θ[2]-0.15)^2/0.01)  # Gaussian posterior
θ_mode = [1.5, 0.15]
lml = laplace_marginal_lik(log_post, θ_mode)
println("Log marginal likelihood (Laplace): ", round(lml, digits=4))
println("Exact (Gaussian): ", log(2π) + 0.5*log(0.25*0.01))

Exercise 30.4 — Prior Sensitivity (⋆)

For the 3-parameter NK model, investigate prior sensitivity. (a) Run the MH estimator with three different priors for κ\kappa: informative Beta(5,20) (mean 0.20), weakly informative Beta(2,10) (mean 0.17), and very flat Beta(1,1). (b) Report the posterior means and standard deviations for all three priors. (c) Compute the Bayes factor between the informative and flat prior specifications. (d) Interpret: is the data informative enough to overcome the prior, or does the posterior depend strongly on the prior specification?


30.10 Chapter Summary

Key results:

  • The Bayesian posterior $p(\boldsymbol\theta|\mathbf{Y}) \propto \mathcal{L}(\boldsymbol\theta)\,p(\boldsymbol\theta)$ combines the Kalman filter likelihood with prior distributions over model parameters.

  • Standard priors for DSGE: Beta for fractions ($\theta$, $\rho$); Gamma/Inverse-Gamma for positive parameters ($\sigma$, $\psi$, shock std devs); Normal (truncated) for Taylor rule coefficients.

  • The Metropolis–Hastings algorithm generates posterior draws: propose $\boldsymbol\theta^* \sim q(\boldsymbol\theta|\boldsymbol\theta^{(s-1)})$; accept with probability $\min\{1, \pi(\boldsymbol\theta^*)/\pi(\boldsymbol\theta^{(s-1)})\}$. Invariance to the target distribution is proved via detailed balance (Theorem 30.1).

  • Convergence diagnostics: trace plots, effective sample size $N_{\text{eff}} = N/(1 + 2\sum_j \hat\rho_j)$, and Gelman–Rubin $\hat{R} < 1.1$.

  • The Laplace approximation to the marginal likelihood: $\ln p(\mathbf{Y}|\mathcal{M}) \approx \ell(\hat{\boldsymbol\theta}^{MAP}) + \ln p(\hat{\boldsymbol\theta}^{MAP}) + \frac{k}{2}\ln(2\pi) - \frac{1}{2}\ln|\hat{H}|$.

  • In APL: the chain is produced by iterating mh_step one draw at a time; the acceptance check is log_ratio > ⍟?0 (accept when $\log U < \log\alpha$).

Next: Chapter 31 — Implementing DSGE Models: Workflows in Dynare, APL, Python, Julia, and R