
%% Time Series Regression IX: Lag Order Selection
%
% This example shows how to select statistically significant predictor
% histories for multiple linear regression models. It is the ninth in a
% series of examples on time series regression, following the presentation
% in previous examples.

% Copyright 2012-2014 The MathWorks, Inc.

%% Introduction
%
% Predictors in dynamic regression models may include lagged values of
% exogenous explanatory variables (distributed lag, or DL, terms), lagged
% values of endogenous response variables (autoregressive, or AR, terms),
% or both. Lagged values of an innovations process (moving average, or MA,
% terms) may have economic significance, representing the persistence of
% shocks, but they are most often included to offset the need for
% additional DL or AR terms. (See the example on "Lagged Variables and OLS
% Estimator Bias.")
%
% Ideally, economic theory suggests which lags to include in a model.
% Often, however, the delay between predictor changes and corresponding
% response changes must be discovered through data analysis. A common
% modeling approach is to include the history of a predictor at times _t_ -
% 1, _t_ - 2, _t_ - 3, ..., _t - p_, with the assumption that significant
% effects on the current response are only produced by recent changes in
% the predictor. Specification analysis then considers extending or
% restricting the lag structure, and finally selecting an appropriate _lag
% order p_.
%
% This example investigates strategies for lag order selection. Although
% details depend on the data and the modeling context, a common goal is to
% identify a concise, easy-to-interpret description of the data-generating
% process (DGP) that leads to accurate estimation and reliable forecasting.
%
% We begin by loading relevant data from previous examples in this series:

load Data_TSReg8

%% Basic Tests
%
% The classical, normal linear model (CNLM), introduced in the example on
% "Linear Models," filters data to generate white noise residuals.
% Econometric models do not always aspire to such a thorough statistical
% description of the DGP, especially when predictors are dictated by
% theory or policy, and modeling goals are focused on specific effects.
% Nevertheless, departures from the CNLM, and their degree, are common
% measures of model misspecification.
%
% At any point in the model specification process, residuals may display
% nonnormality, autocorrelation, heteroscedasticity, and other violations
% of CNLM assumptions. As predictors are added or removed, models can be
% evaluated by the relative improvement in the quality of the residuals.
% Tests for model fit via residual analysis are described in the example on
% "Residual Diagnostics."
%
% Model specification must also consider the statistical significance of
% predictors, to avoid over-fitting in the service of residual whitening,
% and to produce a parsimonius representation of the DGP. Basic tests
% include the _t_-test, which evaluates the significance of individual
% predictors, and the _F_-test, which is used to evaluate the joint
% significance of, say, an entire lag structure. These tests are usually
% used together, since a predictor with an insignificant individual effect
% may still contribute to a significant joint effect.
%
% Many lag order selection procedures use these basic tests to evaluate
% extensions and restrictions of an initial lag specification. Good
% econometric practice suggests careful evaluation of each step in the
% process. Econometricians must judge statistical decisions in the context
% of economic theory and model assumptions. Automated procedures are
% discussed in the example on "Predictor Selection," but it is often
% difficult to completely automate the identification of a useful lag
% structure. We take a more "manual" approach in this example. Of course,
% the reliability of any such procedure depends critically on the
% reliability of the underlying tests.
%
% Consider the basic model of credit defaults introduced in the example on
% "Linear Models":

M0

%%
% Based on the _p_-values of the _t_-statistics, |AGE| is the most
% significant individual risk factor (positive coefficient) for the default
% rates measured by the response |IGD|. |AGE| represents the percentage of
% investment-grade bond issuers first rated 3 years ago. Defaults often
% occur after this period, when capital from an initial issue is expended,
% but they may occur sooner or later. It seems reasonable to consider
% models that include lags or leads of |AGE|.
%
% The fit of |M0| is based on only 21 observations, and the 5 coefficients
% already estimated leave only 16 degrees of freedom for further fitting.
% Extended lag structures, and the corresponding reduction in sample size,
% would bring into question the validity of diagnostic statistics.
%
% For reference, we create tables and fit models with |AGE| lag orders of
% 1, 2, 3, 4, and 5:

% Lagged data:

AGE = DataTable.AGE;
maxLag = 5;
lags = 1:maxLag;
AGELags = lagmatrix(AGE,lags);
lagNames = strcat({'AGELag'},num2str(lags','%-d'));
AGELags = array2table(AGELags,'VariableNames',lagNames);

% Preallocate tables and models:

DTAL = cell(maxLag,1);
MAL = cell(maxLag,1);
MALName = cell(maxLag,1);

% Fit models:

AL = AGELags;
DT = DataTable;

for lagOrder = lags
    
    lagRange = 1:lagOrder;
    
    % Trim next presample row:    
    
    AL(1,:) = [];
    DT(1,:) = [];

    % Fit model:    
    
    DTAL{lagOrder} = [AL(:,lagRange),DT];
    MAL{lagOrder} = fitlm(DTAL{lagOrder});
    MALName{lagOrder} = strcat('MAL',num2str(lagRange,'%u'));

end

%%
% We begin by considering the model with |AGE| lag order 2, that is, with
% |AGE| data on issuers first rated 3 years ago, and lagged |AGE| data on
% issuers first rated 4 and 5 years ago:

MAL12 = MAL{2}

%%
% The lagged variables reduce the sample size by two. Together with two new
% estimated coefficients, the degrees of freedom are reduced by 4, to 12.
%
% The model fit, as measured by the root-mean-squared error and the
% adjusted $R^2$ statistic (which accounts for the additional predictors), is
% slightly improved relative to |M0|. The significance of the predictors
% already in |M0|, as measured by the _p_ values of their _t_-statistics,
% is reduced. This is typical when predictors are added, unless the new
% predictors are completely insignificant. The overall _F_-statistic shows
% that the extended model has a slightly reduced significance in describing
% the variation of the response.
%
% Of the two new predictors, |AGELag2| appears to be much more significant
% than |AGELag1|. This is difficult to interpret in economic terms, and it
% brings into question the accuracy of the significance measures. Are the
% _p_-values an artifact of the small sample size? Are they affected by
% violations of the CNLM assumptions that underlie ordinary least squares
% (OLS) estimation? In short, do they provide a legitimate reason to modify
% the lag structure? Obtaining reliable answers to these questions in
% realistically-sized economic data samples is often a challenge.
%
% The distribution of any diagnostic statistic depends on the distribution
% of process innovations, as exhibited in the model residuals. For _t_ and
% _F_ tests, normal innovations are sufficient to produce test statistics
% with _t_ and _F_ distributions in finite samples. If the innovations
% depart from normality, however, statistics may not follow these expected
% distributions. Tests suffer from size distortions, where nominal
% significance levels misrepresent the actual frequency of rejecting the
% null hypothesis. When this happens, inferences about predictor
% significance become unreliable.
%
% This is a fundamental problem in specification analysis, since at any
% stage in the process, candidate models are likely to be misspecified, and
% data incompletely filtered. Test results must be considered in the
% context of the residual series. A normal probability plot of the |MAL12|
% residuals shows some reason to suspect the reported _p_-values:

resAL12 = MAL12.Residuals.Raw;

normplot(resAL12)
xlabel('Residual')
title('{\bf MAL12 Residuals}')

%%
% A common belief is that _t_ and _F_ tests are robust to nonnormal
% innovations. To some degree, this is true. Innovations from _elliptically
% symmetric distributions_, such as _t_ distributions, which generalize the
% multivariate normal distribution, produce _t_ and _F_ statistics that
% follow _t_ and _F_ distributions in finite samples [12]. This result,
% however, assumes a diagonal covariance structure. When innovations
% exhibit heteroscedasticity and autocorrelation, standard _t_ and _F_
% tests become much less robust [5], [16]. Size distortions can be
% substantial in finite samples. Practically, however, the nature of the
% innovations distribution, and the degree of distortion, can be difficult
% to measure.

%% Robust Tests
%
% _t_ and _F_ statistics involve both coefficient estimates and their
% standard errors. In the presence of heteroscedasticity or
% autocorrelation, OLS coefficient estimates remain unbiased, provided the
% predictors are exogenous. Standard errors, however, when estimated with
% the usual CNLM formulas, are biased.
%
% One response is to form statistics using standard error estimates that
% are robust to nonspherical innovations [2], [3], as in the example on
% "Generalized Least Squares and HAC Estimators." We illustrate this
% strategy here.
%
% The _p_-value of a _t_ statistic is usually computed using Student's _t_
% distribution. For example, for |AGELag2| in |MAL12|:

AGELag2Idx = find(strcmp(MAL12.CoefficientNames,'AGELag2'));
coeff_AGELag2 = MAL12.Coefficients.Estimate(AGELag2Idx);
se_AGELag2 = MAL12.Coefficients.SE(AGELag2Idx);
t_AGELag2 = coeff_AGELag2/se_AGELag2;

dfeAL12 = MAL12.DFE;
p_AGELag2 = 2*(1-tcdf(t_AGELag2,dfeAL12))

%%
% This is the _p_-value reported in the previous display of |MAL12|.
%
% Using heteroscedasticity-consistent (HC), or more general
% heteroscedasticity-and-autocorrelation-consistent (HAC), estimates of the
% standard error, while continuing to assume a Student's _t_ distribution
% for the resulting statistics, leads to very different _p_-values:

% HC estimate:

[~,seHC] = hac(MAL12,'type','HC','weights','HC3','display','off');
se_AGELag2HC = seHC(AGELag2Idx);
t_AGELag2HC = coeff_AGELag2/se_AGELag2HC;

p_AGELag2HC = 2*(1-tcdf(t_AGELag2HC,dfeAL12))

%%

% HAC estimate:

[~,seHAC] = hac(MAL12,'type','HAC','weights','BT','display','off');
se_AGELag2HAC = seHAC(AGELag2Idx);
t_AGELag2HAC = coeff_AGELag2/se_AGELag2HAC;

p_AGELag2HAC = 2*(1-tcdf(t_AGELag2HAC,dfeAL12))

%%
% The HC _p_-value shows |AGELag2| to be relatively insignificant, while
% the HAC _p_-value shows it to be potentially quite significant. The CNLM
% _p_-value lies in between.
%
% There are a number of problems with drawing reliable inferences from
% these results. First, without a more thorough analysis of the residual
% series (as in the example on "Residual Diagnostics"), there is no reason
% to choose one standard error estimator over the other. Second, the
% standard error estimates are only consistent asymptotically, and the
% sample here is small, even by econometric standards. Third, the
% estimators require several, sometimes arbitrarily selected, _nuisance
% parameters_ (|'weights'|, |'bandwidth'|, |'whiten'|) that can
% significantly change the results, especially in small samples. Finally,
% the newly-constituted _t_ and _F_ statistics, formed with the robust
% standard errors, do _not_ follow _t_ and _F_ distributions in finite
% samples.
%
% In short, the significance estimates here may be no better than the
% traditional ones, based on CNLM assumptions. Modifications of HAC-based
% tests, such as _KVB tests_ [11], are effective in addressing issues with
% individual nuisance parameters, but they do not address the larger
% problem of applying asymptotic techniques to finite samples.
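
%%
% The sensitivity to these nuisance parameters is easy to examine directly.
% As a minimal illustration, with arbitrarily chosen bandwidths, we
% recompute the HAC _p_-value of |AGELag2| over a range of |'bandwidth'|
% values:

for bw = 2:2:8
    [~,seBW] = hac(MAL12,'type','HAC','weights','BT',...
                         'bandwidth',bw,'display','off');
    t_BW = coeff_AGELag2/seBW(AGELag2Idx); % Robust t statistic
    fprintf('Bandwidth %d: p = %.4f\n',bw,2*(1-tcdf(t_BW,dfeAL12)))
end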

%% Bootstrapped Tests
%
% Another response to size distortions in traditional specification tests
% is _bootstrapping_. The test statistic computed from the original data is
% maintained, but its distribution is re-evaluated through simulated
% resampling, with the goal of producing a more accurate _p_-value.
%
% Resampling data from a population is a standard technique for evaluating
% the distribution of a statistic. The nature of economic time series,
% however, usually makes this impractical. Economies have fixed histories.
% Generating realistic alternative paths, with statistical properties
% similar to empirical data, requires additional assumptions.
%
% In a bootstrapped test, a null model is fit to the available data, and
% the null distribution of residuals is used to approximate the population
% distribution of innovations. The residuals are then resampled, with
% replacement, to generate new residual series. Corresponding bootstrap
% responses are computed using the fixed predictor histories. Finally, the
% new response data is used to refit the alternative (original) model and
% recompute the test statistic. This process is repeated, to generate a
% bootstrapped distribution.
%
% For comparison, we bootstrap the _t_ statistic of |AGELag2| under the
% null hypothesis that its coefficient is zero. The null model is:

MAL1 = MAL{1}

%%
% |AGELag1|, highly insignificant in the model |MAL12| that includes both
% |AGELag1| and |AGELag2|, becomes more significant in the absence of
% |AGELag2|, but its role is still minor relative to the predictors in
% |M0|. Its coefficient becomes negative, contrary to our understanding of
% it as a predictor of default risk. The inference may be that |AGELag1| is
% irrelevant. Nevertheless, we maintain it to evaluate the specific
% restriction of |MAL12| to |MAL1|, reducing the lag order by 1:

DTAL1 = DTAL{1};  % Lag order 1 table
DTAL12 = DTAL{2}; % Lag order 2 table (one less observation)

numBoot = 1e3;                           % Number of statistics
res0 = MAL1.Residuals.Raw;               % Bootstrap "population"
[~,IdxBoot] = bootstrp(numBoot,[],res0); % Bootstrap indices
ResBoot = res0(IdxBoot);                 % Bootstrap residuals

IGD0 = DTAL1.IGD - res0; % Residual-free response
IGDB = zeros(size(DTAL12,1),numBoot); % Bootstrap responses

DTBoot = DTAL12;
tBoot = zeros(numBoot,1); % Bootstrap t statistics

for boot = 1:numBoot
    
    IGDBoot = IGD0 + ResBoot(:,boot);
    IGDBoot(1) = []; % Trim to size of DTBoot
    IGDBoot(IGDBoot < 0) = 0; % Set negative default rates to 0
    
    DTBoot.IGD = IGDBoot;
    MBoot = fitlm(DTBoot);
    tBoot(boot) = MBoot.Coefficients.tStat(AGELag2Idx);
    IGDB(:,boot) = IGDBoot;
    
end

%%
% The procedure generates |numBoot| bootstrap responses, which replace the
% original response to the fixed predictor data:

figure
hold on
bootDates = dates(3:end);
hIGD = plot(bootDates,IGDB,'b.');
hIGDEnd = plot(bootDates,IGDB(:,end),'b-');
hIGD0 = plot(bootDates,DTAL12.IGD,'ro-','LineWidth',2);
hold off
xlabel('Date')
ylabel('Default Rate')
title('{\bf Bootstrap Responses}')
legend([hIGD(end),hIGDEnd,hIGD0],'Bootstrap Responses',...
                                 'Typical Bootstrap Response',...
                                 'Empirical Response',...
                                 'Location','NW')

%%
% The bootstrapped _p_-value is not very different from the original
% _p_-value, |p_AGELag2|, found using Student's _t_ distribution:

p_AGELag2

p_AGELag2Boot = sum(tBoot > t_AGELag2)/length(tBoot)

%%
% However, a histogram shows that the distribution of bootstrapped _t_
% statistics has shifted:

figure
hold on

numBins = 50;
hHist = histogram(tBoot,numBins,'Normalization','Probability',...
                                'FaceColor',[.8 .8 1]);

x = hHist.BinLimits(1):0.001:hHist.BinLimits(end);
y = tpdf(x,dfeAL12);
hPDF = plot(x,y*hHist.BinWidth,'m','LineWidth',2);

hStat = plot(t_AGELag2,0,'ro','MarkerFaceColor','r');
line([t_AGELag2 t_AGELag2],1.2*[0 max(hHist.Values)],'Color','r')
axis tight

legend([hHist,hPDF,hStat],'Bootstrap {\it t} Distribution',...
                          'Student''s {\it t} Distribution',...
                          'Original {\it t} Statistic',...
                          'Location','NE')
xlabel('{\it t}')
title('{\bf Bootstrap {\it t} Statistics}')

hold off

%%
% The _t_ statistic is less significant in the bootstrap distribution,
% suggesting the possible influence of nonspherical innovations on the
% original test.
%
% The bootstrapped test has its own difficulties. To enforce nonnegativity
% of default rates, negative bootstrapped responses must be set to zero.
% The consequences for inference are unclear. Moreover,
% the bootstrapped test relies, fundamentally, on the assumption that the
% empirical distribution of residuals faithfully represents the relevant
% characteristics of the distribution of innovations in the DGP. In smaller
% samples, this is hard to justify.
%
% There are many variations of the bootstrap. For example, the _wild
% bootstrap_ [7], which combines robust estimation with residual
% resampling, seems to perform well with smaller samples in the presence of
% heteroscedasticity.
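
%%
% As a minimal sketch of the resampling step (using ordinary, rather than
% robust, _t_ statistics for brevity), and assuming Rademacher weights, in
% which each null residual is multiplied by +1 or -1 with equal
% probability, the bootstrap loop above can be adapted as follows:

tWild = zeros(numBoot,1); % Wild bootstrap t statistics

for boot = 1:numBoot

    w = 2*(rand(size(res0)) > 0.5) - 1; % Rademacher weights
    IGDWild = IGD0 + w.*res0;           % Wild bootstrap response
    IGDWild(1) = [];                    % Trim to size of DTBoot
    IGDWild(IGDWild < 0) = 0;           % Set negative default rates to 0

    DTBoot.IGD = IGDWild;
    MWild = fitlm(DTBoot);
    tWild(boot) = MWild.Coefficients.tStat(AGELag2Idx);

end

p_AGELag2Wild = sum(tWild > t_AGELag2)/numBoot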

%% Likelihood-Based Tests
%
% Versions of _t_ and _F_ tests, formulated using CNLM assumptions, can
% provide reliable inferences in a variety of situations where the
% innovations distribution departs from specification. Likelihood-based
% tests, by contrast, require a formal model of the innovations to operate
% at all. Data likelihoods are typically computed under the assumption of
% independent and normally distributed innovations with fixed variance.
% This underlying model of the DGP can be adjusted to accommodate different
% innovation patterns, including higher probabilities for extreme events,
% but the strong _distributional assumption_ remains.
%
% Like _F_ statistics, data likelihoods (or, in practice, loglikelihoods)
% measure the fit of an entire model, or lag structure, rather than the
% significance of individual model terms. Likelihoods are based on the
% joint probability of the data under the distributional assumption, rather
% than on residual sums of squares. Larger likelihoods indicate a better
% fit, but to evaluate the relative quality of models, the statistical
% significance of likelihood _differences_ must be evaluated.
%
% Consider the normal loglikelihoods of the OLS estimates of |MAL12| and
% its restrictions. We begin by constructing |MAL2|, with only |AGELag2|, to
% complete the set of restrictions considered:

DTAL2 = [AGELags(:,2),DataTable];
MAL2 = fitlm(DTAL2)

%%

% Unrestricted loglikelihood of MAL12:
uLLOLS = MAL12.LogLikelihood

%%

% Restricted loglikelihoods of M0, MAL1, and MAL2:
rLLOLS = [M0.LogLikelihood;MAL1.LogLikelihood;MAL2.LogLikelihood]

%%
% These are OLS loglikelihoods, based on the residual series. For example,
% the loglikelihood of the data with respect to |M0| is computed using:

resM0 = M0.Residuals.Raw;
MSEM0 = M0.MSE;
muM0 = mean(resM0);
LLM0 = sum(log(normpdf(resM0,muM0,sqrt(MSEM0))))

%%
% Since OLS does not necessarily maximize likelihoods unless CNLM
% assumptions are satisfied, it is possible for restrictions of the model
% parameter space to _increase_ data likelihoods. We see this for the
% restrictions |M0| and |MAL2|. This suggests, once again, a nonnormal
% innovations process.
%
% For comparison, consider measures based on _maximum_ likelihood estimates
% (MLEs) of model coefficients, using the |arima| function. We fit ARMAX
% models with zero-order AR and MA specifications (that is, pure regression
% models):

% Prepare data:

LLOLS = [uLLOLS;rLLOLS];

DataAL12 = table2array(DTAL12);
y = DataAL12(:,7);
X = DataAL12(:,1:6);
PredCols = {1:6,3:6,[1,3:6],2:6};
ModelNames = {'MAL12','M0','MAL1','MAL2'};

% Compute MLEs:

LLMLE = zeros(4,1);
Mdl = arima(0,0,0);
options = optimset('fmincon');
options = optimset(options,'Display','off','Diagnostics','off',...
                           'Algorithm','sqp','TolCon',1e-7);
for model = 1:4
    [~,~,LL] = estimate(Mdl,y,'X',X(:,PredCols{model}),...
                              'print',false,'options',options);
    LLMLE(model) = LL;
end

% Display results:

fprintf('\nLoglikelihoods\n')
fprintf('\n%-8s%-9s%-9s','Model |','OLSLL','MLELL')
fprintf(['\n',repmat('=',1,24)])
for model = 1:4    
    fprintf(['\n%-6s','| ','%-9.4f%-9.4f'],...
            ModelNames{model},LLOLS(model),LLMLE(model))    
end
fprintf('\n')

%%
% In the case of MLEs, all of the restricted models have a reduced
% likelihood of describing the data, as expected. The OLS and MLE measures
% disagree on the model of largest likelihood, with OLS choosing |M0|, and
% MLE choosing |MAL12|.
%
% Are the differences in likelihood significant? For MLEs, this question
% has traditionally been addressed by some version of the likelihood ratio
% test (implemented by |lratiotest|), the Wald test (implemented by
% |waldtest|), or the Lagrange multiplier test (implemented by |lmtest|).
% These are discussed in the example on "The Classical Model
% Misspecification Tests" (CMM tests). The geometry of comparison for CMM
% tests is based, fundamentally, on the optimality of the model
% coefficients. They should not be used with OLS likelihoods unless there
% is evidence that CNLM assumptions are satisfied.
%
% Like _F_ tests, CMM tests are only applicable to comparisons of _nested
% models_, which are restrictions or extensions of each other. This is the
% typical situation when evaluating lag structures. Unlike _F_ tests, CMM
% tests are applicable to comparisons involving nonlinear models, nonlinear
% restrictions, and nonnormal (but fully specified) innovation
% distributions. This is important in certain econometric settings, but
% rarely in lag order selection. A disadvantage of CMM tests is that they
% assign significance to model differences only asymptotically, and so must
% be used with caution in finite samples.
%
% For example, the likelihood ratio test, the most direct assessment of MLE
% likelihood differences, can be used to evaluate the adequacy of various
% restrictions:

% Restrictions of |MAL12| to |M0|, |MAL1|, and |MAL2|:
dof = [2;1;1]; % Number of restrictions
[h,pValue] = lratiotest(LLMLE(1),LLMLE(2:4),dof)

%%

% Restrictions of |MAL1| and |MAL2| to |M0|:
dof = [1;1]; % Number of restrictions
[h,pValue] = lratiotest(LLMLE(3:4),LLMLE(2),dof)

%%
% At the default 5% significance level, the test fails to reject the null,
% restricted model in favor of the alternative, unrestricted model in all
% cases. That is, the statistical case for including any lag structure is
% weak. The original model, |M0|, might be chosen solely for reasons of
% model parsimony.
%
% Alternatives to CMM tests are the various forms of _information
% criteria_ (IC). IC also consider goodness-of-fit, measured by
% likelihoods, but penalize for lack of parsimony, measured by the number
% of model coefficients. As with pure likelihoods, adjusted IC likelihoods
% provide a relative, but not an absolute, measure of model adequacy.
% However, there are no commonly-used hypothesis tests, corresponding to
% CMM tests, for evaluating the significance of IC differences. The main
% advantage, in practice, is that IC can be used to compare non-nested
% models, though this is often irrelevant when comparing lag structures.
%
% The following table compares two common IC, AIC and BIC, together with
% the OLS equivalent, the adjusted $R^2$:

AR2 = [MAL12.Rsquared.Adjusted; M0.Rsquared.Adjusted; ...
       MAL1.Rsquared.Adjusted; MAL2.Rsquared.Adjusted];
   
AIC = [MAL12.ModelCriterion.AIC; M0.ModelCriterion.AIC; ...
       MAL1.ModelCriterion.AIC; MAL2.ModelCriterion.AIC];
   
BIC = [MAL12.ModelCriterion.BIC; M0.ModelCriterion.BIC; ...
       MAL1.ModelCriterion.BIC; MAL2.ModelCriterion.BIC];

fprintf('\nSize-Adjusted Fit\n')
fprintf('\n%-8s%-7s%-9s%-9s','Model |','AR2','AIC','BIC')
fprintf(['\n',repmat('=',1,32)])
for model = 1:4    
    fprintf(['\n%-6s','| ','%-7.4f','%-9.4f','%-9.4f'],...
            ModelNames{model},AR2(model),AIC(model),BIC(model))    
end
fprintf('\n')

%%
% When comparing models, higher adjusted $R^2$, and lower IC, indicate a
% better trade-off between the fit and the reduced degrees of freedom. The
% results show a preference for including a lag structure when
% evaluated by adjusted $R^2$, but not when evaluated by IC. This kind of
% disagreement is not uncommon, especially with small samples, and further
% suggests the comparative use of multiple testing methods.
%
% BIC, with its typically heavier penalties on additional coefficients,
% tends to choose simpler models, though often not as simple as those
% selected by sequential _t_ and _F_ testing with standard settings. BIC
% has some superior large-sample properties, such as being asymptotically
% consistent, but Monte Carlo studies have shown AIC can outperform BIC in
% correctly identifying the DGP in small data samples [6]. An alternative
% version of AIC, AICc, corrects for small samples, and is especially
% useful in these situations.
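
%%
% For reference, and assuming the |AICc| field of the |ModelCriterion|
% structure used above, the small-sample corrected values for the four
% models are:

AICc = [MAL12.ModelCriterion.AICc; M0.ModelCriterion.AICc; ...
        MAL1.ModelCriterion.AICc; MAL2.ModelCriterion.AICc];

fprintf('\nAICc\n')
fprintf('\n%-8s%-9s','Model |','AICc')
fprintf(['\n',repmat('=',1,17)])
for model = 1:4
    fprintf(['\n%-6s','| ','%-9.4f'],ModelNames{model},AICc(model))
end
fprintf('\n')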

%% Testing Up, Testing Down
%
% When attempting to specify significant but parsimonious lag structures in
% econometric models, two general strategies are often used. The first is
% to start with a small model, then test additional lags until their
% individual significance, or the joint significance of the entire lag
% structure, drops below a set level. This is called _testing up_.
% Alternatively, a generous initial lag structure is systematically trimmed
% until the largest lag, or the entire lag structure, becomes significant.
% This is called _testing down_.
% 
% Testing up starts with a parsimonious description of the data, such as a
% static model with contemporaneous values of relevant predictors but no
% dynamic terms. It then proceeds from the specific to the general. Each
% step in the process evaluates the effect of adding a new lag, usually
% using some combination of _t_ tests, _F_ tests, CMM tests, or IC. It
% stops when adding a new lag becomes insignificant at some predetermined
% level. It guarantees, in this way, that the initial model parsimony is
% maintained to some degree.
%
% Acknowledging Occam's razor and principles of scientific method, testing
% up offers a number of advantages. Simple models are less costly to
% compute, easier to interpret, easier to check for misspecification, work
% better with small samples, and are more amenable to generalization. Moreover,
% they often produce better forecasts [10].
%
% However, testing up is often discouraged for lag order selection, and
% economic modeling in general. There is a common scenario where
% significant lags lie beyond the first insignificant one, such as with
% seasonal lags. Automatic testing up will not detect these. Moreover,
% sequential testing in the presence of omitted variables, which have not
% yet been added to the model, creates estimator bias and distortions of
% test sizes and power, ultimately leading to incorrect inferences.
% Omitted variable bias is discussed in the examples on "Spurious
% Regression" and "Lagged Variables and Estimator Bias."
%
% As a consequence, testing down is often recommended [9]. This strategy
% begins with a model that includes all potential explanatory variables.
% That is, it includes a mix of predictors with more or less significance
% in explaining the variation of the response. It then proceeds from the
% general to the specific (sometimes called GETS). Each step in the process
% evaluates the effect of removing a predictor, using the same sorts of
% tests used for testing up. It stops when the restricted model achieves
% some predetermined level of significance.
%
% There are several advantages to this approach. If the initial model and
% lag structure is comprehensive enough, then all of the testing is done,
% at least in principle, in the absence of omitted variable bias.
% Localized tests, such as tests on the largest lag, can lead to models
% that continue to contain a mix of significant and insignificant lags, but
% since they are all present in the model, they can be examined for joint
% significance. A disadvantage of this approach is the lack of theoretical
% guidance, or even good heuristic tips, when choosing the initial lag
% order in different modeling situations.
%
% The following table shows _p_ values of _t_ statistics on coefficients of
% lagged |AGE| in lag structures of order 1 to 5:

fprintf('\nt Statistic p Values\n')
fprintf('\n%-11s%-5s%-5s%-5s%-5s%-5s','Model    |',...
        'AL1','AL2','AL3','AL4','AL5')
fprintf(['\n',repmat('=',1,35)])
for lag = 1:5
    pVals = MAL{lag}.Coefficients.pValue(2:lag+1);
    fprintf(['\n%-9s','| ',repmat('%-5.2f',1,lag)],...
            MALName{lag},pVals(1:lag))    
end
fprintf('\n')

%%
% At a 15% significance level, testing up from |M0| adds no lags to the
% model, since it fails to reject a zero coefficient for |AGELag1| in the
% first model tested, |MAL1|. At the same level, testing down from the
% largest model, sequentially evaluating the significance of the largest
% lag, |MAL12| is selected, adding two lags to |M0|. The relative
% significance of specific lags across different lag structures highlights
% the risk of automating these localized evaluations.
%
% _F_ statistics add useful information about joint significance. _F_ tests
% on additional lags, relative to the model with all previous lags, are
% equivalent to _t_ tests, with the same _p_ values. However, _F_ tests of
% an entire lag structure, relative to a static specification, can provide
% hints of significant lags prior to the largest lag. _F_ ratios are
% computed by the |coefTest| method of the |LinearModel| class:

fprintf('\nF Statistic p Values\n')
fprintf('\n%-11s%-5s%-5s','Model    |','Last','All')
fprintf(['\n',repmat('=',1,20)])
for lag = 1:5
    
    % Sequential F test (last lag = 0):
    HSq = [zeros(1,lag),1,zeros(1,4)];
    pSq = coefTest(MAL{lag},HSq);
        
    % Static F test (all lags = 0):
    HSt = [zeros(lag,1),eye(lag),zeros(lag,4)];
    pSt = coefTest(MAL{lag},HSt);

    fprintf(['\n%-9s','| ','%-5.2f','%-5.2f'],MALName{lag},pSq,pSt)
    
end
fprintf('\n')

%%
% The _F_ statistics cannot jump-start a stalled testing-up strategy, but
% when testing down, they do give reason to reconsider the largest model,
% with its relatively lower _p_ value. The increased significance of
% |AGELag3| and |AGELag4| in |MAL12345|, indicated by the _t_ statistic _p_
% values, raises the joint significance of that lag structure. Still, the
% most significant overall lag structure is in |MAL12|, consistent with
% testing-down via _t_ statistics.
%
% Predictably, Monte Carlo studies show that automated testing up
% strategies will frequently under-fit when the DGP is a supermodel, and
% testing down will likewise over-fit when the DGP is a submodel [14].
% Performance, in either case, is improved by systematically adjusting
% significance levels to account for varying degrees of freedom. In
% general, however, the statistical consequences of excluding relevant lags
% are usually considered more serious than those of including irrelevant
% lags, and rejection tolerances must be set accordingly.
%
% In practice, hybrid strategies are usually preferred, moving predictors
% in and out of the model until some measure of fit is optimized, and an
% economically sensible model is obtained. Stepwise regression (described
% in the example on "Predictor Selection") is one way to automate this
% approach. With modern computing power, there is also the possibility, in
% some cases, to exhaustively evaluate all models of relevance. As this
% example illustrates, however, automation of model selection procedures
% must be viewed with some skepticism. The process is necessarily dynamic,
% tests have varying degrees of relevance, and decisions ultimately require
% some consideration of economic theory and modeling goals.
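
%%
% As a minimal sketch of such a hybrid search, and assuming the |stepwiselm|
% function from Statistics and Machine Learning Toolbox, the following
% starts from the static specification of |M0| and allows either lag to
% enter or leave the model at (arbitrarily chosen) significance levels:

MStep = stepwiselm(DTAL12,'IGD ~ AGE + BBB + CPF + SPR',...
                   'Upper','linear',...
                   'PEnter',0.15,'PRemove',0.20,...
                   'Verbose',0)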

%% Special Cases
%
% Because of the difficulties of using standard testing procedures in
% modeling contexts where CNLM assumptions are violated, a number of
% specialized procedures have been developed for use with specific model
% types. In some cases, appropriate lag orders can be determined solely
% through data analysis. Other cases require sequential estimation and
% evaluation of a range of candidate models.
%
% $\bullet$ *ARMA Models*. Stationary time series are often represented,
% theoretically, by infinite-order MA processes [18]. ARMA models translate
% these representations into finite, rational form. Lag orders for the AR
% and MA components of a model must be chosen together, to achieve a
% balance between accuracy and model parsimony. The standard identification
% method [4], described in the example "Box-Jenkins Model Selection,"
% examines patterns in sample autocorrelation functions to determine the
% relative significance of candidate lag structures.
%
% $\bullet$ *ARDL Models*. Many economic variables are influenced by
% exogenous driving processes that can impart persistent shocks to a DGP.
% Theoretically, these are represented by infinite-order DL models, but as
% with ARMA models, finite, rational forms are required for practical
% estimation. Standard techniques, such as those proposed by Almon [1] and
% Koyck [13], assign weights to the lag structure in such a way that the
% model can be transformed into AR, ARMA, or ARMAX form. The methods are
% more ad hoc than data-driven, and subject to the problems of collinearity
% that come from working with many lags of a predictor at proximate times.
% (See the example on "Collinearity and Estimator Variance".)
%
% $\bullet$ *GARCH Models*. GARCH models are commonly used to model
% patterns of heteroscedasticity in an innovations process, especially in
% financial applications. Like ARMA and ARDL models, they combine two types
% of lags, with orders that must be balanced appropriately. There are
% methods for transforming GARCH models to ARMA form [8], where Box-Jenkins
% methods can be applied, but this is rarely done in practice. For most
% economic and financial series, lag orders of 1 and 1 seem to serve well.
%
% $\bullet$ *Unit Root Tests*. Unit root and stationarity tests, such as
% |adftest| and |lmctest|, use dynamic models of the test process, and
% require users to choose a lag order. This nuisance parameter can have a
% significant effect on the test results (see the sketch after this list).
% The potential presence of nonstationary data in this setting calls into
% question the use of standard _t_ and _F_ tests. However, Sims, Stock, and
% Watson [17] have shown that they are justified when the regression
% includes all deterministic components of the DGP.
%
% $\bullet$ *VAR Models*. VAR models are a generic, widely-used form for
% representing systems of interacting economic variables. They require a
% lag order that captures the relevant past history of _all_ variables in
% the model. Since the models are multivariate, estimation costs grow
% quickly with increasing lag order, so a parsimonious selection procedure
% is essential. Lütkepohl [15] discusses a variety of strategies, most of
% which are multivariate generalizations of the techniques presented in
% this example.
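
%%
% As a minimal illustration of the lag order sensitivity of unit root
% tests, noted above, |adftest| can be run on the default rate series over
% a range of (arbitrarily chosen) lag orders:

IGD = DataTable.IGD;
[hADF,pADF] = adftest(IGD,'model','ARD','lags',0:3)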

%% Summary
%
% This example surveys common strategies for lag order selection, and makes
% the case for adapting strategies to individual data sets and models. The
% data considered here are few, and far from the idealizations of
% asymptotic analysis. It is also quite possible the model under study may
% be misspecified, confounding its own evaluation. However, these obstacles
% are fairly typical in econometric practice. Without considered,
% "hands-on" application, guided by some sense of economic reality, lag
% order selection presents many opportunities for distorted inference that
% can lead to poorly performing models. Familiarity with the common
% difficulties, however, can help guide the path to sharper specifications.
%
% Of course, it is not always necessary to choose a "best" model or lag
% order. Often, given the statistical uncertainty, it is enough to
% eliminate a large subset of highly unlikely candidates, leaving a smaller
% subset for further analysis and data collection. The strategies of this
% example serve that purpose well.
%
%% References
%
% [1] Almon, S. "The Distributed Lag Between Capital Appropriations and
% Expenditures." _Econometrica._ Vol. 33, 1965, pp. 178-196.
% 
% [2] Andrews, D. W. K. "Heteroskedasticity and Autocorrelation Consistent
% Covariance Matrix Estimation." _Econometrica._ Vol. 59, 1991, pp.
% 817-858.
%
% [3] Andrews, D. W. K., and J. C. Monahan. "An Improved Heteroskedasticity
% and Autocorrelation Consistent Covariance Matrix Estimator."
% _Econometrica._ Vol. 60, 1992, pp. 953-966.
%
% [4] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. _Time Series
% Analysis: Forecasting and Control._ Upper Saddle River, NJ:
% Prentice-Hall, 1994.
%
% [5] Banerjee, A. N., and J. R. Magnus. "On the Sensitivity of the Usual
% _t_- and _F_-Tests to Covariance Misspecification." _Journal of
% Econometrics._ Vol. 95, 2000, pp. 157-176.
%
% [6] Burnham, K. P., and D. R. Anderson. _Model Selection and Multi-Model
% Inference: A Practical Information-Theoretic Approach_. New York:
% Springer, 2010.
%
% [7] Davidson, R., and E. Flachaire. "The Wild Bootstrap, Tamed at Last."
% _Journal of Econometrics._ Vol. 146, 2008, pp. 162-169.
%
% [8] Hamilton, J.D. _Time Series Analysis._ Princeton, NJ: Princeton
% University Press, 1994.
%
% [9] Hendry, D. F. _Econometrics: Alchemy or Science?_ Oxford: Oxford
% University Press, 2001.
%
% [10] Keuzenkamp, H. A., and M. McAleer. "Simplicity, Scientific Inference
% and Economic Modeling." _Economic Journal._ Vol. 105, 1995, pp. 1-21.
%
% [11] Kiefer, N. M., T. J. Vogelsang, and H. Bunzel. "Simple Robust
% Testing of Regression Hypotheses." _Econometrica._ Vol. 68, 2000, pp.
% 695-714.
%
% [12] King, M. L. "Robust Tests for Spherical Symmetry and Their
% Application to Least Squares Regression." _Annals of Statistics._ Vol. 8,
% 1980, pp. 1265-1271.
%
% [13] Koyck, L. M. _Distributed Lags and Investment Analysis._
% Amsterdam: North-Holland, 1954.
%
% [14] Krolzig, H.-M., and D. F. Hendry. "Computer Automation of
% General-To-Specific Model Selection Procedures." _Journal of Economic
% Dynamics & Control_. Vol. 25, 2001, pp. 831-866.
%
% [15] Lütkepohl, H. _New Introduction to Multiple Time Series Analysis._
% New York: Springer, 2007.
%
% [16] Qin, H., and A. T. K. Wan. "On the Properties of the _t_- and
% _F_-Ratios in Linear Regressions with Nonnormal Errors." _Econometric
% Theory._ Vol. 20, No. 4, 2004, pp. 690-700.
%
% [17] Sims, C. A., J. H. Stock, and M. W. Watson. "Inference in Linear
% Time Series Models with some Unit Roots." _Econometrica._ Vol. 58, 1990,
% pp. 113-144.
%
% [18] Wold, H. _A Study in the Analysis of Stationary Time Series._
% Uppsala, Sweden: Almqvist & Wiksell, 1938.