
%% Time Series Regression VIII: Lagged Variables and Estimator Bias
%
% This example shows how lagged predictors affect least-squares estimation
% of multiple linear regression models. It is the eighth in a series of
% examples on time series regression, following the presentation in
% previous examples.

% Copyright 2012-2014 The MathWorks, Inc.

%% Introduction
%
% Many econometric models are _dynamic_, using lagged variables to
% incorporate feedback over time. By contrast, _static_ time series models
% represent systems that respond exclusively to current events.
%
% Lagged variables come in several types:
%
% $\bullet$ *Distributed Lag (DL)* variables are lagged values $x_{t-k}$ of
% observed exogenous predictor variables $x_t$.
%
% $\bullet$ *Autoregressive (AR)* variables are lagged values $y_{t-k}$ of
% observed endogenous response variables $y_t$.
%
% $\bullet$ *Moving Average (MA)* variables are lagged values $e_{t-k}$ of
% unobserved stochastic innovations processes $e_t$.
%
% Dynamic models are often constructed using linear combinations of
% different types of lagged variables, to create ARMA, ARDL, and other
% hybrids. The modeling goal, in each case, is to reflect important
% interactions among relevant economic factors, accurately and concisely.
%
% Dynamic model specifications ask the question: _Which lags are
% important?_ Some models, such as seasonal models, use lags at distinct
% periods in the data. Other models base their lag structure on theoretical
% considerations of how, and when, economic agents react to changing
% conditions. In general, lag structures identify the time delay of the
% response to known leading indicators.
%
% However, lag structures must do more than just represent the available
% theory. Because dynamic specifications produce interactions among
% variables that can affect standard regression techniques, lag structures
% must also be designed with accurate model estimation in mind.
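%
% As a brief illustration of the DL and AR lag types described above (a
% sketch on simulated data, for illustration only), lagged predictor
% columns can be formed with the Econometrics Toolbox |lagmatrix| function:

xSim = randn(10,1);            % Simulated exogenous predictor x_t
ySim = cumsum(randn(10,1));    % Simulated endogenous response y_t
XDL = lagmatrix(xSim,1:2);     % DL terms x_{t-1}, x_{t-2} (NaN-padded)
YAR = lagmatrix(ySim,1);       % AR term y_{t-1}
ZSim = [XDL,YAR];              % Candidate columns of lagged predictors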

%% Specification Issues
%
% Consider the multiple linear regression (MLR) model:
%
% $$y_t = Z_t \beta + e_t,$$
%
% where $y_t$ is an observed response, $Z_t$ includes columns for each
% potentially relevant predictor variable, including lagged variables, and
% $e_t$ is a stochastic innovations process. The accuracy of estimation of
% the coefficients in $\beta$ depends on the constituent columns of $Z_t$,
% as well as the joint distribution of $e_t$. Selecting predictors for
% $Z_t$ that are both statistically and economically significant usually
% involves cycles of estimation, residual analysis, and respecification.
%
% Classical linear model (CLM) assumptions, discussed in the example on
% "Linear Models," allow ordinary least squares (OLS) to produce estimates
% of $\beta$ with desirable properties: unbiased, consistent, and efficient
% relative to other estimators. Lagged predictors in $Z_t$, however, can
% introduce violations of CLM assumptions. Specific violations depend on
% the types of lagged variables in the model, but the presence of dynamic
% feedback mechanisms, in general, tends to exaggerate the problems
% associated with static specifications.
%
% Model specification issues are usually discussed relative to a
% data-generating process (DGP) for the response variable $y_t$.
% Practically, however, the DGP is a theoretical construct, realized only
% in simulation. No model ever captures real-world dynamics entirely, and
% model coefficients in $\beta$ are always a subset of those in the true
% DGP. As a result, innovations in $e_t$ become a mix of the inherent
% stochasticity of the process and a potentially large number of omitted
% variables (OVs). Autocorrelations in $e_t$ are common in econometric
% models where OVs exhibit persistence over time. Rather than comparing a
% model to a theoretical DGP, it is more practical to evaluate whether, or
% to what degree, dynamics in the data have been distinguished from
% autocorrelations in the residuals.
%
% Initially, lag structures may include observations of economic factors at
% multiple, proximate times. However, observations at time _t_ are likely
% to be correlated with observations at times _t_ - 1, _t_ - 2, and so
% forth, through economic inertia. Thus, a lag structure may _overspecify_
% the dynamics of the response by including a sequence of lagged predictors
% with only marginal contributions to the DGP. The specification will
% overstate the effects of past history, and fail to impose relevant
% restrictions on the model. Extended lag structures also require extended
% presample data, reducing the sample size and decreasing the number of
% degrees of freedom in estimation procedures. Consequently, overspecified
% models may exhibit pronounced problems of collinearity and high estimator
% variance. The resulting estimates of $\beta$ have low precision, and it
% becomes difficult to separate individual effects.
%
% To reduce predictor dependencies, lag structures can be restricted. If
% the restrictions are too severe, however, other problems of estimation
% arise. A restricted lag structure may _underspecify_ the dynamics of the
% response by excluding predictors that are actually a significant part of
% the DGP. This leads to a model that underestimates the effects of past
% history, forcing significant predictors into the innovations process. If
% lagged predictors in $e_t$ are correlated with proximate lagged
% predictors in $Z_t$, the CLM assumption of strict exogeneity of the
% regressors is violated, and OLS estimates of $\beta$ become biased and
% inconsistent.
%
% Specific issues are associated with different types of lagged predictors.
%
% Lagged exogenous predictors $x_{t-k}$, by themselves, do not violate CLM
% assumptions. However, DL models are often described, at least initially,
% by a long sequence of _potentially_ relevant lags, and so suffer from the
% problems of overspecification mentioned above. Common, if somewhat ad
% hoc, methods for imposing restrictions on the lag weights (that is, the
% coefficients in $\beta$) are discussed in the example on "Lag Order
% Selection." In principle, however, the analysis of a DL model parallels
% that of a static model. Estimation issues related to collinearity,
% influential observations, spurious regression, autocorrelated or
% heteroscedastic innovations, etc. must still be examined.
%
% Lagged endogenous predictors $y_{t-k}$ are more problematic. AR models
% introduce violations of CLM assumptions that lead to biased OLS estimates
% of $\beta$. Absent any other CLM violations, the estimates are,
% nevertheless, consistent and relatively efficient. Consider a simple
% first-order autoregression of $y_t$ on $y_{t-1}$:
%
% $$y_t = \beta y_{t-1} + e_t.$$
%
% In this model, $y_t$ is determined by both $y_{t-1}$ and $e_t$. Shifting
% the equation backwards one step at a time, $y_{t-1}$ is determined by
% both $y_{t-2}$ and $e_{t-1}$, $y_{t-2}$ is determined by both $y_{t-3}$
% and $e_{t-2}$, and so forth. Transitively, the predictor $y_{t-1}$ is
% correlated with the entire previous history of the innovations process.
% Just as with underspecification, the CLM assumption of strict exogeneity
% is violated, and OLS estimates of $\beta$ become biased. Because $\beta$
% must absorb the effects of each $e_{t-k}$, the model residuals no longer
% represent true innovations [10].
%
% The problem is compounded when the innovations in an AR model are
% autocorrelated. As discussed in the example on "Residual Diagnostics,"
% autocorrelated innovations in the absence of other CLM violations produce
% unbiased, if potentially high variance, OLS estimates of model
% coefficients. The major complication, in that case, is that the usual
% estimator for the standard errors of the coefficients becomes biased.
% (The effects of heteroscedastic innovations are similar, though typically
% less pronounced.) If, however, autocorrelated innovations are combined
% with violations of strict exogeneity, like those produced by AR terms,
% estimates of $\beta$ become both biased and inconsistent.
%
% If lagged innovations $e_{t-k}$ are used as predictors, the nature of the
% estimation process is fundamentally changed, since the innovations cannot
% be directly observed. Estimation requires that the MA terms be inverted
% to form infinite AR representations, and then restricted to produce a
% model that can be estimated in practice. Since restrictions must be
% imposed during estimation, numerical optimization techniques other than
% OLS, such as maximum likelihood estimation (MLE), are required. Models
% with MA terms are considered in the example on "Lag Order Selection."
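%
% For example (a minimal sketch on simulated data, with coefficient values
% chosen for illustration only), an ARMA(1,1) model is fit by maximum
% likelihood using the Econometrics Toolbox |arima| and |estimate| functions:

MdlARMA = arima('AR',0.6,'MA',0.3,'Constant',0,'Variance',1);
yARMA = simulate(MdlARMA,250);  % Simulated ARMA(1,1) data
ToEst = arima(1,0,1);           % ARMA(1,1) template with unknown coefficients
EstMdl = estimate(ToEst,yARMA); % MLE of the AR and MA coefficients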

%% Simulating Estimator Bias
%
% To illustrate the estimator bias introduced by lagged endogenous
% predictors, consider the following DGP:
%
% $$y_t = \beta_0 y_{t-1} + e_t,$$
%
% $$e_t = \gamma_0 e_{t-1} + \delta_t,$$
%
% $$\delta_t \sim N(0,\sigma^2).$$
% 
% We run two sets of repeated Monte Carlo simulations of the model. The
% first set uses normally and independently distributed (NID) innovations
% with $\gamma_0 = 0$. The second set uses AR(1) innovations with
% $|\gamma_0| > 0$.

% Build the model components:
beta0 = 0.9;  % AR(1) parameter for y_t
gamma0 = 0.2; % AR(1) parameter for e_t
AR1 = arima('AR',beta0,'Constant',0,'Variance',1);
AR2 = arima('AR',gamma0,'Constant',0,'Variance',1);

% Simulation sample sizes:
T = [10,50,100,500,1000];
numSizes = length(T);

% Run the simulations:
numObs = max(T); % Length of simulation paths
numPaths = 1e4;  % Number of simulation paths
burnIn = 100;    % Initial transient period, to be discarded
sigma = 2.5;     % Standard deviation of the innovations
E0 = sigma*randn(burnIn+numObs,numPaths,2); % NID innovations
E1Full = E0(:,:,1);
Y1Full = filter(AR1,E1Full); % AR(1) process with NID innovations
E2Full = filter(AR2,E0(:,:,2));
Y2Full = filter(AR1,E2Full); % AR(1) process with AR(1) innovations
clear E0

% Extract simulation data, after transient period:
Y1 = Y1Full(burnIn+1:end,:);  % Y1(t)
LY1 = Y1Full(burnIn:end-1,:); % Y1(t-1)
Y2 = Y2Full(burnIn+1:end,:);  % Y2(t)
LY2 = Y2Full(burnIn:end-1,:); % Y2(t-1)
clear Y1Full Y2Full

% Compute OLS estimates of beta0:
BetaHat1 = zeros(numSizes,numPaths);
BetaHat2 = zeros(numSizes,numPaths);
for i = 1:numSizes
    
    n = T(i);
    
    for j = 1:numPaths
        BetaHat1(i,j) = LY1(1:n,j)\Y1(1:n,j);
        BetaHat2(i,j) = LY2(1:n,j)\Y2(1:n,j);
    end
    
end

% Set plot domains:
w1 = std(BetaHat1(:));
x1 = (beta0-w1):(w1/1e2):(beta0+w1);
w2 = std(BetaHat2(:));
x2 = (beta0-w2):(w2/1e2):(beta0+w2);

% Create figures and plot handles: 
hFig1 = figure;
hold on
hPlots1 = zeros(numSizes,1);
hFig2 = figure;
hold on
hPlots2 = zeros(numSizes,1);

% Plot estimator distributions:
colors = winter(numSizes);    
for i = 1:numSizes

    c = colors(i,:);
    
    figure(hFig1);
    f1 = ksdensity(BetaHat1(i,:),x1);    
    hPlots1(i) = plot(x1,f1,'Color',c,'LineWidth',2);
    
    figure(hFig2);
    f2 = ksdensity(BetaHat2(i,:),x2);    
    hPlots2(i) = plot(x2,f2,'Color',c,'LineWidth',2);
    
end

% Annotate plots:
figure(hFig1)
hBeta1 = line([beta0 beta0],[0 (1.1)*max(f1)],'Color','c','LineWidth',2);
xlabel('Estimate')
ylabel('Density')
title(['{\bf OLS Estimates of \beta_0 = ',num2str(beta0,2),', NID Innovations}'])
legend([hPlots1;hBeta1],[strcat({'T = '},num2str(T','%-d'));['\beta_0 = ',num2str(beta0,2)]])
axis tight
grid on
hold off

figure(hFig2)
hBeta2 = line([beta0 beta0],[0 (1.1)*max(f2)],'Color','c','LineWidth',2);
xlabel('Estimate')
ylabel('Density')
title(['{\bf OLS Estimates of \beta_0 = ',num2str(beta0,2),', AR(1) Innovations}'])
legend([hPlots2;hBeta2],[strcat({'T = '},num2str(T','%-d'));['\beta_0 = ',num2str(beta0,2)]])
axis tight
grid on
hold off

%%
% In all of the simulations above, $\beta_0 = 0.9$. The plots are the
% distributions of $\hat{\beta_0}$ across multiple simulations of each
% process, showing the bias and variance of the OLS estimator as a function
% of sample size.
%
% The skew of the distributions makes it difficult to visually assess their
% centers. Bias is defined as $E[\hat{\beta_0}] - \beta_0$, so we use the
% mean to measure the aggregate estimate. In the case of NID innovations, a
% relatively small negative bias disappears asymptotically as the aggregate
% estimates increase monotonically toward $\beta_0$:

AggBetaHat1 = mean(BetaHat1,2);
fprintf('%-6s%-6s\n','Size','Mean1')
for i = 1:numSizes
    fprintf('%-6u%-6.4f\n',T(i),AggBetaHat1(i))
end

%%
% In the case of AR(1) innovations, aggregate estimates with a negative
% bias in small samples increase monotonically toward $\beta_0$, as above,
% but then pass through the DGP value at moderate sample sizes, and become
% progressively more positively biased in large samples:

AggBetaHat2 = mean(BetaHat2,2);
fprintf('%-6s%-6s\n','Size','Mean2')
for i = 1:numSizes
    fprintf('%-6u%-6.4f\n',T(i),AggBetaHat2(i))
end

%%
% The inconsistency of the OLS estimator in the presence of autocorrelated
% innovations is widely known among econometricians. That it nevertheless
% gives accurate estimates for a range of sample sizes has practical
% consequences that are less widely appreciated. We describe this behavior
% further in the section on "Dynamic & Correlation Effects."
%
% The principal difference between the two sets of simulations above, in
% terms of OLS estimation, is whether or not there is a delay in the
% interaction between the innovations and the predictor. In the AR(1)
% process with NID innovations, the predictor $y_{t-1}$ is
% contemporaneously uncorrelated with $e_t$, but correlated with all of the
% previous innovations, as described earlier. In the AR(1) process with
% AR(1) innovations, the predictor $y_{t-1}$ becomes correlated with $e_t$
% as well, through the autocorrelation between $e_t$ and $e_{t-1}$.
%
% To see these relationships, we compute the correlation coefficients
% between $y_{t-1}$ and both $e_t$ and $e_{t-1}$, respectively, for each
% process:

% Extract innovations data, after transient period:
E1 = E1Full(burnIn+1:end,:);  % E1(t)
LE1 = E1Full(burnIn:end-1,:); % E1(t-1) 
E2 = E2Full(burnIn+1:end,:);  % E2(t)
LE2 = E2Full(burnIn:end-1,:); % E2(t-1)
clear E1Full E2Full

% Preallocate for correlation coefficients:
CorrE1 = zeros(numSizes,numPaths);
CorrLE1 = zeros(numSizes,numPaths);
CorrE2 = zeros(numSizes,numPaths);
CorrLE2 = zeros(numSizes,numPaths);

% Compute correlation coefficients:
for i = 1:numSizes
    
    n = T(i);
    
    for j = 1:numPaths
        
        % With NID innovations:
        CorrE1(i,j) = corr(LY1(1:n,j),E1(1:n,j));
        CorrLE1(i,j) = corr(LY1(1:n,j),LE1(1:n,j));
        
        % With AR(1) innovations
        CorrE2(i,j) = corr(LY2(1:n,j),E2(1:n,j));
        CorrLE2(i,j) = corr(LY2(1:n,j),LE2(1:n,j));
        
    end
end

% Set plot domains:
sigmaE1 = std(CorrE1(:));
muE1 = mean(CorrE1(:));
xE1 = (muE1-sigmaE1):(sigmaE1/1e2):(muE1+sigmaE1);
sigmaLE1 = std(CorrLE1(:));
muLE1 = mean(CorrLE1(:));
xLE1 = (muLE1-sigmaLE1/2):(sigmaLE1/1e3):muLE1;
sigmaE2 = std(CorrE2(:));
muE2 = mean(CorrE2(:));
xE2 = (muE2-sigmaE2):(sigmaE2/1e2):(muE2+sigmaE2);
sigmaLE2 = std(CorrLE2(:));
muLE2 = mean(CorrLE2(:));
xLE2 = (muLE2-sigmaLE2):(sigmaLE2/1e2):(muLE2+sigmaLE2);

% Create figures and plot handles:
hFigE1 = figure;
hold on
hPlotsE1 = zeros(numSizes,1);
hFigLE1 = figure;
hold on
hPlotsLE1 = zeros(numSizes,1);
hFigE2 = figure;
hold on
hPlotsE2 = zeros(numSizes,1);
hFigLE2 = figure;
hold on
hPlotsLE2 = zeros(numSizes,1);

% Plot correlation coefficient distributions:
colors = copper(numSizes);    
for i = 1:numSizes

    c = colors(i,:);
    
    figure(hFigE1)
    fE1 = ksdensity(CorrE1(i,:),xE1);    
    hPlotsE1(i) = plot(xE1,fE1,'Color',c,'LineWidth',2);
    
    figure(hFigLE1)
    fLE1 = ksdensity(CorrLE1(i,:),xLE1);    
    hPlotsLE1(i) = plot(xLE1,fLE1,'Color',c,'LineWidth',2);
        
    figure(hFigE2)
    fE2 = ksdensity(CorrE2(i,:),xE2);    
    hPlotsE2(i) = plot(xE2,fE2,'Color',c,'LineWidth',2);
    
    figure(hFigLE2)
    fLE2 = ksdensity(CorrLE2(i,:),xLE2);    
    hPlotsLE2(i) = plot(xLE2,fLE2,'Color',c,'LineWidth',2);
    
end

clear CorrE1 CorrLE1 CorrE2 CorrLE2

% Annotate plots:
figure(hFigE1)
xlabel('Correlation Coefficient')
ylabel('Density')
title('{\bf Sample Correlation of {\it y_{t-1}} and NID {\it e_t}}')
legend(hPlotsE1,strcat({'T = '},num2str(T','%-d')),'Location','NW')
axis tight
grid on
ylim([0 (1.1)*max(fE1)])
hold off

figure(hFigLE1)
xlabel('Correlation Coefficient')
ylabel('Density')
title('{\bf Sample Correlation of {\it y_{t-1}} and NID {\it e_{t-1}}}')
legend(hPlotsLE1,strcat({'T = '},num2str(T','%-d')),'Location','NW')
axis tight
grid on
ylim([0 (1.1)*max(fLE1)])
hold off

figure(hFigE2)
xlabel('Correlation Coefficient')
ylabel('Density')
title('{\bf Sample Correlation of {\it y_{t-1}} and AR(1) {\it e_t}}')
legend(hPlotsE2,strcat({'T = '},num2str(T','%-d')),'Location','NW')
axis tight
grid on
ylim([0 (1.1)*max(fE2)])
hold off

figure(hFigLE2)
xlabel('Correlation Coefficient')
ylabel('Density')
title('{\bf Sample Correlation of {\it y_{t-1}} and AR(1) {\it e_{t-1}}}')
legend(hPlotsLE2,strcat({'T = '},num2str(T','%-d')),'Location','NW')
axis tight
grid on
ylim([0 (1.1)*max(fLE2)])
hold off

%%
% The plots show correlation between $y_{t-1}$ and $e_{t-1}$ in both cases.
% Contemporaneous correlation between $y_{t-1}$ and $e_t$, however,
% persists asymptotically only in the case of AR(1) innovations.
%
% The correlation coefficient is the basis for standard measures of
% autocorrelation. The plots above highlight the bias and variance of the
% correlation coefficient in finite samples, which complicates the
% practical evaluation of autocorrelations in model residuals. Correlation
% measures were examined extensively by Fisher ([3],[4],[5]), who suggested
% a number of alternatives.
%
% Estimates of $\gamma_0$ computed from residuals based on biased estimates
% of $\beta_0$ are themselves biased [11]. As described earlier, OLS residuals
% in the case of AR(1) innovations do not accurately represent process
% innovations, because of the tendency for $\hat{\beta_0}$ to absorb the
% systematic impact produced by autocorrelated disturbances.
%
% To further complicate matters, the Durbin-Watson statistic, popularly
% reported as a measure of the degree of first-order autocorrelation, is
% biased against detecting any relationship between $\hat{e_t}$ and
% $\hat{e}_{t-1}$ in exactly the AR models where such a relationship is
% present. The bias is twice as large as the bias in $\hat{\beta_0}$
% [8].
%
% Thus, OLS can persistently overestimate $\beta_0$ while standard measures
% of residual autocorrelation underestimate the conditions that lead to
% inconsistency. This produces a distorted sense of goodness of fit, and a
% misrepresentation of the significance of dynamic terms. Durbin's $h$ test
% is similarly ineffective in this context [7]. Durbin's $m$ test, or
% the equivalent Breusch-Godfrey test, is often preferred [1].
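%
% As a brief sketch (applied to a single simulated path with AR(1)
% innovations, using a hypothetical residual lag order of one), the
% Breusch-Godfrey approach regresses the OLS residuals on the original
% predictor and the lagged residuals, then refers $TR^2$ to a chi-squared
% distribution:

yBG = Y2(:,1);               % One simulated path (AR(1) innovations)
lyBG = LY2(:,1);             % Lagged response used as predictor
bOLS = lyBG\yBG;             % Full-sample OLS estimate of beta0
resBG = yBG - lyBG*bOLS;     % OLS residuals
rAux = resBG(2:end);         % Residuals for t = 2,...,T
ZAux = [ones(size(rAux)),lyBG(2:end),resBG(1:end-1)]; % Constant, predictor, lagged residual
cAux = ZAux\rAux;            % Auxiliary regression
R2Aux = 1 - sum((rAux-ZAux*cAux).^2)/sum((rAux-mean(rAux)).^2);
BGStat = length(rAux)*R2Aux; % LM statistic for one residual lag
BGPValue = 1-chi2cdf(BGStat,1);
fprintf('Breusch-Godfrey statistic: %.2f (p = %.4f)\n',BGStat,BGPValue)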

%% Approximating Estimator Bias
%
% In practice, the process that produces a time series must be discovered
% from the available data, and this analysis is ultimately limited by the
% loss of confidence that comes with estimator bias and variance. Sample
% sizes for economic data are often at the lower end of those considered in
% the simulations above, so inaccuracies can be significant. Effects on the
% forecast performance of autoregressive models can be severe.
%
% For simple AR models with simple innovations structures, approximations
% of the OLS estimator bias can be obtained theoretically. These formulas are
% useful when assessing the reliability of AR model coefficients derived
% from a single data sample.
%
% In the case of NID innovations, we can compare the simulated bias with
% a widely used approximation ([11],[13]):
%
% $$E[\hat{\beta_0}] - \beta_0 \approx -2\beta_0/T.$$

m = min(T);
M = max(T);
eBias1 = AggBetaHat1-beta0; % Estimated bias
tBias1 = -2*beta0./T;       % Theoretical bias
eB1interp = interp1(T,eBias1,m:M,'pchip');
tB1interp = interp1(T,tBias1,m:M,'pchip');

figure
plot(T,eBias1,'ro','LineWidth',2)
hold on
he1 = plot(m:M,eB1interp,'r','LineWidth',2);
plot(T,tBias1,'bo')
ht1 = plot(m:M,tB1interp,'b');
hold off
legend([he1 ht1],'Simulated Bias','Approximate Theoretical Bias','Location','E')
xlabel('Sample Size')
ylabel('Bias')
title('{\bf Estimator Bias, NID Innovations}')
grid on

%%
% The approximation is reasonably reliable in even moderately sized
% samples, and generally improves as $\beta_0$ decreases in absolute value.
%
% In the case of AR(1) innovations, the bias depends on both $\beta_0$ and
% $\gamma_0$. Asymptotically, it is approximated by [6]:
%
% $$E[\hat{\beta_0}] - \beta_0 \approx \gamma_0(1 - \beta_0^2)/(1 + \gamma_0\beta_0).$$

eBias2 = AggBetaHat2-beta0; % Estimated bias
tBias2 = gamma0*(1-beta0^2)/(1+gamma0*beta0); % Asymptotic bias
eB2interp = interp1(T,eBias2,m:M,'pchip');

figure
plot(T,eBias2,'ro','LineWidth',2)
hold on
he2 = plot(m:M,eB2interp,'r','LineWidth',2);
ht2 = plot(0:M,repmat(tBias2,1,M+1),'b','LineWidth',2);
hold off
legend([he2 ht2],'Simulated Bias','Approximate Asymptotic Bias','Location','E')
xlabel('Sample Size')
ylabel('Bias')
title('{\bf Estimator Bias, AR(1) Innovations}')
grid on

%%
% Here we see the bias move from negative to positive values as the sample
% size increases, then eventually approach the asymptotic bound. There is a
% range of sample sizes, from about 25 to 100, where the absolute value of
% the bias is below 0.02. In such a "sweet spot," the OLS estimator may
% outperform alternative estimators designed to specifically account for
% the presence of autocorrelation. We describe this behavior further in the
% section on "Dynamic & Correlation Effects."
%
% It is useful to plot the approximate asymptotic bias in $\hat{\beta_0}$
% as a function of both $\beta_0$ and $\gamma_0$, to see the effect of
% varying the degree of autocorrelation in both $y_t$ and $e_t$:

figure
beta = -1:0.05:1;
gamma = -1:0.05:1;
[Beta,Gamma] = meshgrid(beta,gamma);
hold on
surf(Beta,Gamma,Gamma.*(1-Beta.^2)./(1+Gamma.*Beta))
fig = gcf;
CM = fig.Colormap;
numC = size(CM,1);
zL = zlim;
zScale = zL(2)-zL(1);
iSim = (tBias2-zL(1))*numC/zScale;    
cSim = interp1(1:numC,CM,iSim);
hSim = plot3(beta0,gamma0,tBias2,'ko','MarkerSize',8,'MarkerFaceColor',cSim);
view(-20,20)
ax = gca;
u = ax.XTick;
v = ax.YTick;
mesh(u,v,zeros(length(v),length(u)),'FaceAlpha',0.7,'EdgeColor','k','LineStyle',':')
hold off
legend(hSim,'Simulated Model','Location','Best')
xlabel('\beta_0')
ylabel('\gamma_0')
zlabel('Bias')
title('{\bf Approximate Asymptotic Bias}')
camlight
colorbar
grid on

%%
% The asymptotic bias becomes significant when $\beta_0$ and $\gamma_0$
% move in opposite directions away from zero autocorrelation. Of course,
% the bias may be considerably less in finite samples.
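%
% For example (coefficient pairs chosen here for illustration only),
% evaluating the approximation at a few points shows how the magnitude of
% the bias grows when $\beta_0$ and $\gamma_0$ take opposite signs:

asymBias = @(b,g) g.*(1-b.^2)./(1+g.*b);         % Approximate asymptotic bias
pairs = [0.9 0.2; 0.9 -0.2; -0.9 0.9; 0.2 -0.9]; % Illustrative (beta0,gamma0) pairs
for k = 1:size(pairs,1)
    fprintf('beta0 = %5.2f, gamma0 = %5.2f, asymptotic bias = %7.4f\n',...
        pairs(k,1),pairs(k,2),asymBias(pairs(k,1),pairs(k,2)))
end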

%% Dynamic and Correlation Effects
%
% As discussed, the challenges of using OLS for dynamic model estimation
% arise from violations of CLM assumptions. Two violations are critical,
% and we discuss their effects here in more detail.
%
% The first is the _dynamic effect_, caused by the correlation between the
% predictor $y_{t-1}$ and all of the previous innovations $e_{t-k}$. This
% occurs in any AR model, and results in biased OLS estimates from finite
% samples. In the absence of other violations, OLS nevertheless remains
% consistent, and the bias disappears in large samples.
%
% The second is the _correlation effect_, caused by the contemporaneous
% correlation between the predictor $y_{t-1}$ and the innovation $e_t$.
% This occurs when the innovations process is autocorrelated, and results
% in the OLS coefficient of the predictor receiving too much, or too
% little, credit for contemporaneous variations in the response, depending
% on the sign of the correlation. That is, it produces a persistent bias.
% 
% The first set of simulations above illustrates a situation in which
% $\beta_0$ is positive and $\gamma_0$ is zero. The second set of
% simulations illustrates a situation in which both $\beta_0$ and $\gamma_0$
% are positive. For positive $\beta_0$, the dynamic effect on
% $\hat{\beta_0}$ is negative. For positive $\gamma_0$, the correlation
% effect on $\hat{\beta_0}$ is positive. Thus, in the first set of
% simulations there is a negative bias across sample sizes. In the second
% set of simulations, however, there is a competition between the two
% effects, with the dynamic effect dominating in small samples, and the
% correlation effect dominating in large samples.
%
% Positive AR coefficients are common in econometric models, so it is
% typical for the two effects to offset each other, creating a range of
% sample sizes for which the OLS bias is significantly reduced. The width
% of this range depends on $\beta_0$ and $\gamma_0$, and determines the
% _OLS-superior range_ in which OLS outperforms alternative estimators
% designed to directly account for autocorrelations in the innovations.
%
% Some of the factors affecting the size of the dynamic and correlation
% effects are summarized in [9]. Among them:
%
% *Dynamic Effect*
%
% $\bullet$ Increases with decreasing sample size.
%
% $\bullet$ Decreases with increasing $\beta_0$ if the variance of the
% innovations is held fixed.
%
% $\bullet$ Decreases with increasing $\beta_0$ if the variance of the
% innovations is adjusted to maintain a constant $R^2$.
%
% $\bullet$ Increases with the variance of the innovations.
%
% *Correlation Effect*
%
% $\bullet$ Increases with increasing $\gamma_0$, at a decreasing rate.
%
% $\bullet$ Decreases with increasing $\beta_0$, at an increasing rate.
%
% The influence of these factors can be tested by changing the coefficients
% in the simulations above. In general, the larger the dynamic effect and
% the smaller the correlation effect, the wider the OLS-superior range.
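%
% As a brief sketch (with smaller, illustrative simulation settings and
% hypothetical coefficient values), the experiment above is easily re-run
% for other choices of $\beta_0$ and $\gamma_0$:

b0Alt = 0.5;                  % Hypothetical alternative beta0
g0Alt = 0.3;                  % Hypothetical alternative gamma0
nAlt = 50;                    % Illustrative sample size
numRep = 1000;                % Illustrative number of replications
MdlYAlt = arima('AR',b0Alt,'Constant',0,'Variance',1);
MdlEAlt = arima('AR',g0Alt,'Constant',0,'Variance',1);
bHatAlt = zeros(numRep,1);
for r = 1:numRep
    eAlt = filter(MdlEAlt,sigma*randn(burnIn+nAlt,1)); % AR(1) innovations
    yFullAlt = filter(MdlYAlt,eAlt);                   % AR(1) response
    bHatAlt(r) = yFullAlt(burnIn:end-1)\yFullAlt(burnIn+1:end); % OLS estimate
end
fprintf('beta0 = %.2f, gamma0 = %.2f, T = %u: mean OLS estimate = %.4f\n',...
    b0Alt,g0Alt,nAlt,mean(bHatAlt))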

%% Jackknife Bias Reduction
%
% The _jackknife_ procedure is a cross-validation technique commonly used
% to reduce the bias of sample statistics. Jackknife estimators of model
% coefficients are relatively easy to compute, without the need for
% large simulations or resampling.
%
% The basic idea is to compute an estimate from the full sample _and_ from
% a sequence of subsamples, then combine the estimates in a manner that
% eliminates some portion of the bias. In general, for a sample of size
% $T$, the bias of the OLS estimator $\hat{\beta_0}$ can be expressed as an
% expansion in powers of $T^{-1}$:
%
% $$E(\hat{\beta_0}) - \beta_0 = {w_1 \over T} + {w_2 \over {T^2}} + O(T^{-3}),$$
%
% where the weights $w_1$ and $w_2$ depend on the specific coefficient and
% model. If estimates $\hat{\beta_i}$ are made on a sequence $i = 1, ...,
% m$ of subsamples of length $l = O(T)$, then the jackknife estimator of
% $\beta_0$ is:
%
% $$\hat{\beta_J} = \left({T \over {T-l}}\right)\hat{\beta_0} - \left({l \over {T-l}}\right){1 \over m}\sum_{i=1}^m \hat{\beta_i}.$$
%
% It can be shown that the jackknife estimator satisfies:
%
% $$E(\hat{\beta_J}) - \beta_0 = O(T^{-2}),$$
%
% thus removing the $O(T^{-1})$ term from the bias expansion. Whether the
% bias is actually reduced depends on the size of the remaining terms in
% the expansion, but jackknife estimators have performed well in practice.
% In particular, the technique is robust with respect to nonnormal
% innovations, ARCH effects, and various model misspecifications
% [2].
%
% The Statistics and Machine Learning Toolbox(TM) function |jackknife|
% implements a jackknife procedure using a systematic sequence of
% "leave-one-out" subsamples. For time series, deleting observations alters
% the autocorrelation structure. To maintain the dependence structure in a
% time series, a jackknife procedure must use nonoverlapping subsamples,
% such as partitions or moving blocks.
%
% The following implements a simple jackknife estimate of $\hat{\beta_0}$
% using a partition of the data in each of the simulations to produce the
% subsample estimates $\hat{\beta_i}$. We compare the performance before
% and after jackknifing, on the simulated data with either NID or AR(1)
% innovations:

m = 5; % Number of subsamples

% Preallocate memory:
betaHat1 = zeros(m,1); % Subsample estimates, NID innovations
betaHat2 = zeros(m,1); % Subsample estimates, AR(1) innovations
BetaHat1J = zeros(numSizes,numPaths); % Jackknife estimates, NID innovations
BetaHat2J = zeros(numSizes,numPaths); % Jackknife estimates, AR(1) innovations

% Compute jackknife estimates:
for i = 1:numSizes
    
    n = T(i); % Sample size
    l = n/m;  % Length of partition subinterval
                
    for j = 1:numPaths
        
        % Subsample (partition) estimates:
        for s = 1:m
            betaHat1(s) = LY1((s-1)*l+1:s*l,j)\Y1((s-1)*l+1:s*l,j);
            betaHat2(s) = LY2((s-1)*l+1:s*l,j)\Y2((s-1)*l+1:s*l,j);
        end
        
        % Combine full-sample and subsample estimates:
        BetaHat1J(i,j) = (n/(n-l))*BetaHat1(i,j)-(l/((n-l)*m))*sum(betaHat1);
        BetaHat2J(i,j) = (n/(n-l))*BetaHat2(i,j)-(l/((n-l)*m))*sum(betaHat2);
        
    end
    
end

clear BetaHat1 BetaHat2

% Display mean estimates, before and after jackknifing:
AggBetaHat1J = mean(BetaHat1J,2);
clear BetaHat1J
fprintf('%-6s%-8s%-8s\n','Size','Mean1','Mean1J')
for i = 1:numSizes
    fprintf('%-6u%-8.4f%-8.4f\n',T(i),AggBetaHat1(i),AggBetaHat1J(i))
end

fprintf('\n')

AggBetaHat2J = mean(BetaHat2J,2);
clear BetaHat2J
fprintf('%-6s%-8s%-8s\n','Size','Mean2','Mean2J')
for i = 1:numSizes
    fprintf('%-6u%-8.4f%-8.4f\n',T(i),AggBetaHat2(i),AggBetaHat2J(i))
end

%%
% The number of subsamples, $m = 5$, is chosen with the smallest sample
% size, $n = 10$, in mind. Larger $m$ may improve performance in larger
% samples, but there is no accepted heuristic for choosing the subsample
% sizes, so some experimentation is necessary. The code is easily adapted
% to use alternative subsampling methods, such as moving blocks.
%
% The results show a uniform reduction in bias for the case of NID
% innovations. In the case of AR(1) innovations, the procedure seems to
% push the estimate more quickly through the OLS-superior range.
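%
% The following sketch (block length and sample size chosen here for
% illustration only) shows one way to index overlapping moving-block
% subsamples; row |s| of |blockIdx| could replace the partition indices
% (s-1)*l+1:s*l used above:

nMB = 50;                                        % Illustrative sample size
lMB = 10;                                        % Illustrative block length
numBlocks = nMB - lMB + 1;                       % Number of moving blocks
blockIdx = bsxfun(@plus,(0:numBlocks-1)',1:lMB); % Row s holds indices s:s+lMB-1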

%% Summary
%
% This example shows a simple AR model, together with a few simple
% innovations structures, as a way of illustrating some general issues
% related to the estimation of dynamic models. The code here is easily
% modified to observe the effects of changing parameter values, adjusting
% the innovations variance, using different lag structures, and so on.
% Explanatory DL terms can also be added to the models. DL terms have the
% ability to reduce estimator bias, though OLS tends to overestimate AR
% coefficients at the expense of the DL coefficients [11]. The general
% set-up here allows for a great deal of experimentation, as is often
% required when evaluating models in practice.
%
% When considering the trade-offs presented by the bias and variance of any
% estimator, it is important to remember that biased estimators with
% reduced variance may have superior mean-squared error characteristics
% when compared to higher-variance unbiased estimators. A strong point of
% the OLS estimator, beyond its simplicity in computation, is its relative
% efficiency in reducing its variance with increasing sample size. This is
% often enough to adopt OLS as the estimator of choice, even for dynamic
% models. Another strong point, as this example has shown, is the presence
% of an OLS-superior range, where OLS may outperform other estimators, even
% under what are generally regarded as adverse conditions. The weakest
% point of the OLS estimator is its performance in small samples, where the
% bias and variance may be unacceptable.
%
% The estimation issues raised in this example suggest the need for new
% indicators of autocorrelation, and more robust estimation methods to be
% used in its presence. Some of these methods are described in the example
% on "Generalized Least Squares & HAC Estimators." However, as we have
% seen, the inconsistency of the OLS estimator for AR models with
% autocorrelation is not enough to rule it out, in general, as a viable
% competitor to more complicated, consistent estimators such as maximum
% likelihood, feasible generalized least squares, and instrumental
% variables, which attempt to eliminate the correlation effect, but do not
% alter the dynamic effect. The best choice will depend on the sample size,
% the lag structure, the presence of exogenous variables, and so on, and
% often requires the kinds of simulations presented in this example.

%% References
%
% [1] Breusch, T.S., and L. G. Godfrey. "A Review of Recent Work on Testing
% for Autocorrelation in Dynamic Simultaneous Models." In Currie, D., R.
% Nobay, and D. Peel (Eds.), _Macroeconomic Analysis: Essays in
% Macroeconomics and Econometrics._ London: Croom Helm, 1981.
%
% [2] Chambers, M. J. "Jackknife Estimation of Stationary Autoregressive
% Models." University of Essex Discussion Paper No. 684, 2011.
% 
% [3] Fisher, R. A. "Frequency Distribution of the Values of the
% Correlation Coefficient in Samples from an Indefinitely Large
% Population." _Biometrika_. Vol 10, 1915, pp. 507-521.
% 
% [4] Fisher, R. A. "On the "Probable Error" of a Coefficient of
% Correlation Deduced from a Small Sample." _Metron_. Vol 1, 1921, pp. 3-32.
% 
% [5] Fisher, R. A. "The Distribution of the Partial Correlation
% Coefficient." _Metron_. Vol 3, 1924, pp. 329-332.
%
% [6] Hibbs, D. "Problems of Statistical Estimation and Causal Inference in
% Dynamic Time Series Models." In Costner, H. (Ed.) _Sociological
% Methodology_. San Francisco: Jossey-Bass, 1974.
%
% [7] Inder, B. A. "Finite Sample Power of Tests for Autocorrelation in
% Models Containing Lagged Dependent Variables." _Economics Letters_. Vol
% 14, 1984, pp. 179-185.
%
% [8] Johnston, J. _Econometric Methods_. New York: McGraw-Hill, 1972.
%
% [9] Maeshiro, A. "Teaching Regressions with a Lagged Dependent Variable
% and Autocorrelated Disturbances." _Journal of Economic Education_. Vol
% 27, 1996, pp. 72-84.
%
% [10] Maeshiro, A. "An Illustration of the Bias of OLS for $Y_t = \lambda
% Y_{t-1} + U_t$." _Journal of Economic Education_. Vol 31, 2000, pp.
% 76-80.
%
% [11] Malinvaud, E. _Statistical Methods of Econometrics_. Amsterdam:
% North-Holland, 1970.
%
% [12] Marriott, F. and J. Pope. "Bias in the Estimation of
% Autocorrelations." _Biometrika_. Vol 41, 1954, pp. 390-402.
%
% [13] White, J. S. "Asymptotic Expansions for the Mean and Variance of the
% Serial Correlation Coefficient." _Biometrika_. Vol 48, 1961, pp. 85-94.