%% Time Series Regression II: Collinearity and Estimator Variance
%
% This example shows how to detect correlation among predictors and
% accommodate problems of large estimator variance. It is the second in a
% series of examples on time series regression, following the presentation
% in the previous example.

% Copyright 2012 The MathWorks, Inc.

%% Introduction
%
% Economic models are always underspecified with respect to the true
% data-generating process (DGP). Model predictors never fully represent the
% totality of causal factors producing an economic response. Omitted
% variables, however, continue to exert their influence through the
% innovations process, forcing model coefficients to account for variations
% in the response that they do not truly explain. Coefficient estimates
% that are too large (type I errors) or too small (type II errors) distort
% the marginal contribution of each predictor. In some cases, coefficients
% even end up with the wrong sign.
%
% Economic models can also be overspecified, by including a theory-blind
% mix of predictors with the hope of capturing some significant part of the
% DGP. Often, "general-to-specific" (GETS) estimation methods are applied
% with a misplaced trust that standard diagnostic statistics will sort out
% the good predictors. However, the very presence of causally insignificant
% predictors tends to increase estimator variance, raising the possibility
% that standard inferences will be unreliable.
%
% The reality of working with misspecified models is addressed in this, and
% subsequent, examples in this series. Underspecified models often
% introduce correlation between predictors and omitted variables in the
% innovations process. Overspecified models often introduce correlation
% among predictors. Each presents its own problems for model estimation. In
% this example, we look specifically at problems arising from correlated
% predictors. The somewhat more complicated issues related to correlation
% between predictors and innovations (exogeneity violations) are addressed
% in the example on "Lagged Variables and Estimator Bias."
%
% We begin by loading relevant data from the previous example on "Linear
% Models," and continue the analysis of the credit default model presented
% there:

load Data_TSReg1

%% Correlation and Condition Numbers
%
% As a first step toward model specification, it is useful to identify any
% possible dependencies among the predictors. The correlation matrix is a
% standard measure of the strength of pairwise linear relationships:

R0 = corrcoef(X0)
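%%
% The second output of |corrcoef| provides $p$-values for testing whether each
% pairwise correlation is zero (a minimal sketch, not part of the original
% analysis):

[~,P0] = corrcoef(X0);
P0    % Small p-values indicate significant pairwise correlations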

%%
% The utility function |corrplot| helps to visualize the results in the
% correlation matrix by plotting a matrix of pairwise scatters. Slopes of
% the displayed least-squares lines are equal to the displayed correlation
% coefficients. It is convenient to work with the tabular array version of
% the data, |X0Tbl|, which contains the predictor names for the plots:

corrplot(X0Tbl,'testR','on') 

%%
% Correlation coefficients highlighted in red have a significant
% $t$-statistic. The predictor |BBB| again distinguishes itself by its
% relatively high correlations with the other predictors, though the
% strength of the relationships is moderate. Here the visualization is
% particularly helpful, as |BBB| displays fairly disorganized scatters,
% with the possibility of a number of small, potentially influential data
% subsets. The plots are a reminder of the limitations of the linear
% correlation coefficient as a summary statistic.
%
% Both the scale and correlations of |BBB| have the potential to inflate
% the _condition number_ $\kappa$ of $X_t$. The condition number is often
% used to characterize the overall sensitivity of OLS estimates to changes
% in the data. For an MLR model with intercept:

kappa0I = cond(X0I)

%%
% The condition number is well above the "well-conditioned" benchmark of 1,
% which is achieved when $X_t$ has orthonormal columns. As a rule of thumb,
% a 1% relative error in the data $X_t$ can produce up to a $\kappa$%
% relative error in the coefficient estimates $\beta$ [4]:
%
% $${\|\delta \beta\| \over \|\beta\|} \le \kappa{\|\delta X_t\| \over \|X_t\|}$$
%
% As shown in the previous example on "Linear Models," coefficient
% estimates for this data are on the order of $10^{-2}$, so a $\kappa$ on
% the order of $10^2$ leads to absolute estimation errors $\|\delta
% \beta\|$ that are approximated by the relative errors in the data.
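%%
% As a cross-check (a minimal sketch), the condition number equals the ratio
% of the largest to the smallest singular value of the design matrix:

mu0I = svd(X0I);                   % Singular values of the design matrix
kappa0ICheck = max(mu0I)/min(mu0I) % Agrees with cond(X0I) above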

%% Estimator Variance
%
% Correlations and condition numbers are widely used to flag potential data
% problems, but their diagnostic value is limited. Correlations consider
% only pairwise dependencies between predictors, while condition numbers
% consider only $X_t$ in aggregate. Relationships among arbitrary predictor
% subsets (_multicollinearities_) can fall somewhere in between. CLM
% assumptions forbid exact relationships, but identifying the strength and
% source of any near relationships, and their specific effect on
% coefficient estimation, is an essential part of specification analysis.
%
% Many methods for detecting near collinearities focus on the coefficient
% estimates in $\hat{\beta}$, rather than the data in $X_t$. Each of the
% following has been suggested as a telltale sign of predictor
% dependencies:
%
% $\bullet$ Statistically insignificant coefficients on theoretically
% important predictors
%
% $\bullet$ Coefficients with signs or magnitudes that do not make
% theoretical sense
%
% $\bullet$ Extreme sensitivity of a coefficient to insertion or deletion
% of other predictors
%
% The qualitative nature of these criteria is apparent, and unfortunately
% none of them is necessary or sufficient for detecting collinearity.
%
% To illustrate, we again display OLS fit statistics of the credit default
% model:

M0

%%
% The signs of the coefficient estimates are consistent with theoretical
% expectations: |AGE|, |BBB|, and |SPR| add risk; |CPF| reduces it. The
% _t_-statistics, which scale the coefficient estimates by their standard
% errors (computed under the assumption of normal innovations), show that
% all predictors are significantly different from 0 at the 20% level. |CPF|
% appears especially significant here. The significance of a predictor,
% however, is relative to the other predictors in the model.
%
% There is nothing in the standard regression results to raise substantial
% concern about collinearity. To put the results in perspective, however,
% it is necessary to consider other sources of estimator variance. Under
% CLM assumptions, the variance of the $i^{th}$ component of $\hat{\beta}$,
% $\hat{\beta_i}$, can be decomposed as follows [6]:
%
% $$Var(\hat{\beta_i}) = {\sigma^2 \over SST_i(1-R_i^2)},$$
%
% where $\sigma^2$ is the variance of the innovations process (assumed
% constant), $SST_i$ is the total sample variation of predictor $i$, and
% $R_i^2$ is the coefficient of determination from a regression of
% predictor $i$ on the remaining predictors (and intercept, if present).
%
% The term
%
% $$VIF_i = {1 \over 1-R_i^2}$$
%
% is called the _variance inflation factor_ (VIF), and is another common
% collinearity diagnostic. When the variation of predictor $i$ is largely
% explained by a linear combination of the other predictors, $R_i^2$ is
% close to $1$, and the VIF for that predictor is correspondingly large.
% The inflation is measured relative to an $R_i^2$ of 0 (no collinearity),
% and a VIF of 1.
%
% VIFs are also the diagonal elements of the inverse of the correlation
% matrix [1], a convenient result that eliminates the need to set up the
% various regressions:

VIF = diag(inv(R0))'
predNames0
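%%
% As a cross-check (a minimal sketch, not part of the original analysis), the
% auxiliary regressions can be run explicitly with |fitlm| to recover each
% $R_i^2$, and the variance decomposition above can be used to reconstruct the
% coefficient standard errors reported for |M0|:

numPreds = size(X0,2);
VIFCheck = zeros(1,numPreds);
for i = 1:numPreds
    auxFit = fitlm(X0(:,[1:i-1,i+1:end]),X0(:,i)); % Predictor i on the others
    VIFCheck(i) = 1/(1-auxFit.Rsquared.Ordinary);
end
VIFCheck                                % Agrees with diag(inv(R0))'

SST = sum((X0-mean(X0)).^2);            % Total sample variation SST_i
seFromDecomp = sqrt(M0.MSE*VIF./SST)    % sqrt(sigma^2*VIF_i/SST_i)
seReported = M0.Coefficients.SE(2:end)' % Standard errors reported for M0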

%%
% How large a VIF is cause for concern? As with significance levels for
% standard hypothesis tests, experience with certain types of data may
% suggest useful tolerances. Common ad hoc values, in the range of 5 to 10,
% are of little use in general. In this case, |BBB| has the highest VIF,
% but it does not jump out from the rest of the predictors.
%
% More importantly, VIF is only one factor in the variance decomposition
% given above. A large VIF can be balanced by either a small innovations
% variance $\sigma^2$ (good model fit) or a large sample variation $SST_i$
% (sufficient data). As such, Goldberger [2] ironically compares the
% "problem" of multicollinearity, viewed in isolation, to the problem of
% data "micronumerosity." Evaluating the combined effect of the different
% sources of estimator variance requires a wider view.
%
% Econometricians have developed a number of rules of thumb for deciding
% when to worry about collinearity. Perhaps the most common says that it is
% acceptable to ignore evidence of collinearity if the resulting
% _t_-statistics are all greater than 2 in absolute value. This ensures
% that 0 is outside of the approximate 95% confidence interval of each
% estimate (assuming normal innovations or a large sample). Because
% _t_-statistics are already adjusted for estimator variance, the
% presumption is that they adequately account for collinearity in the
% context of other, balancing effects. The regression results above show
% that three of the potential predictors in |X0| fail this test.
%
% Another rule of thumb is based on an estimate of $Var(\hat{\beta_i})$
% [5]:
%
% $$\widehat{Var}(\hat{\beta_i}) = {1 \over {T-n}}{\hat\sigma_y^2 \over \hat\sigma_i^2}{1-R^2 \over 1-R_i^2},$$
%
% where $T$ is the sample size, $n$ is the number of predictors,
% $\hat\sigma_y^2$ is the estimated variance of $y_t$, $\hat\sigma_i^2$ is
% the estimated variance of predictor $i$, $R^2$ is the coefficient of
% determination for the regression of $y_t$ on $X_t$, and $R_i^2$ is as
% above. The rule says that concerns about collinearity can be ignored if
% $R^2$ exceeds $R_i^2$ for each predictor, since each VIF will be balanced
% by $1-R^2$. All of the potential predictors in |X0| pass this test:

RSquared = M0.Rsquared
RSquared_i = 1-(1./VIF)
predNames0
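%%
% Both rules of thumb can be checked compactly (illustrative only; |M0| is the
% fitted model displayed above):

absT = abs(M0.Coefficients.tStat(2:end))'            % First rule: |t| > 2
passesStoneRule = M0.Rsquared.Ordinary > RSquared_i  % Second rule: R^2 > R_i^2
predNames0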

%%
% These rules attempt to identify the _consequences_ of collinearity, as
% expressed in the regression results. As we have seen, they can offer
% conflicting advice on when, and how much, to worry about the integrity of
% the coefficient estimates. They do not provide any accounting of the
% nature of the multiple dependencies within the data, nor do they provide
% any reliable measure of the extent to which these dependencies degrade
% the regression.

%% Collinearity Diagnostics
%
% A more detailed analytic approach is provided by Belsley [1]. Instability
% of OLS estimates can be traced to small eigenvalues in the cross-product
% matrix $X_t^T X_t$ appearing in the normal equations for $\hat{\beta}$:
%
% $$\hat{\beta} = (X_t^T X_t)^{\rm-1} X_t^T y_t.$$
%
% Belsley reformulates the eigensystem of $X_t^T X_t$ in terms of the
% singular values of the matrix $X_t$, which can then be analyzed directly,
% with greater numerical accuracy. If the singular values of $X_t$ are
% $\mu_1, ..., \mu_n$, where $n$ is the number of predictors, then the
% condition number of $X_t$ is $\kappa = \mu_{max}/\mu_{min}$. Belsley
% defines a spectrum of _condition indices_ $\eta_j = \mu_{max}/\mu_j$ for
% each $j = 1, ..., n$, and shows that high indices indicate separate near
% dependencies in the data.
%
% Belsley goes further by describing a method for identifying the specific
% predictors involved in each near dependency, and provides a measure of
% how important those dependencies are in affecting coefficient estimates.
% This is achieved with yet another decomposition of $Var(\hat{\beta_i})$,
% this time in terms of the singular values. If $X_t$ has a singular-value
% decomposition $USV^T$, with $V = (v_{ij})$, then:
%
% $$Var(\hat{\beta_i}) = \sigma^2 \sum_{j=1}^n v_{ij}^2 / \mu_j^2,$$
%
% where $\sigma^2$ is the innovations variance. The _variance decomposition
% proportions_ $\pi_{ji}$ are defined by:
%
% $$\phi_{ij} = v_{ij}^2 / \mu_j^2,$$
%
% $$\phi_i = \sum_{j=1}^n \phi_{ij},$$
%
% $$\pi_{ji} = \phi_{ij} / \phi_i.$$
%
% The $\pi_{ji}$ give the proportion of $Var(\hat{\beta_i})$ associated
% with singular value $\mu_j$.
%
% Indices and proportions are interpreted as follows:
%
% $\bullet$ The number of high condition indices identifies the number of
% near dependencies.
%
% $\bullet$ The size of the condition indices identifies the tightness of
% each dependency.
%
% $\bullet$ The location of high proportions in a high index row identifies
% the dependent predictors.
%
% $\bullet$ The size of the proportions identifies the degree of
% degradation to regression estimates.
%
% Again, a tolerance for "high" must be determined. Belsley's simulation
% experiments suggest that condition indices in the range of 5 to 10
% reflect weak dependencies, and those in the range 30 to 100 reflect
% moderate to high dependencies. He suggests a tolerance of 0.5 for
% variance decomposition proportions identifying individual predictors.
% Simulation experiments, however, are necessarily based on specific models
% of mutual dependence, so tolerances need to be reevaluated in each
% empirical setting.
%
% The function |collintest| implements Belsley's procedure. Outputs are
% displayed in tabular form:

collintest(X0ITbl);
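%%
% For reference, a minimal sketch of the quantities behind the table, computed
% directly from the singular value decomposition (assuming the design-matrix
% columns are first scaled to unit length, as Belsley recommends; the exact
% values depend on the scaling convention, so they are for illustration only):

Xs = X0I./sqrt(sum(X0I.^2,1));  % Scale columns to unit length
[~,S,V] = svd(Xs,'econ');
mu = diag(S);                   % Singular values
condIdx = max(mu)./mu           % Condition indices eta_j
phi = (V.^2)./(mu'.^2);         % phi_ij = v_ij^2/mu_j^2
varDecomp = (phi./sum(phi,2))'  % Proportions pi_ji (row j, column i)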

%%
% If we lower the index tolerance to 10 and maintain a proportion tolerance
% of 0.5, the analysis identifies one weak dependency between |AGE| and
% |SPR| in the final row. It can be visualized by setting the |'tolIdx'|
% and |'tolProp'| parameters in |collintest| and turning on the |'plot'|
% flag:

collintest(X0ITbl,'tolIdx',10,'tolProp',0.5,'display','off','plot','on');

%%
% The plot shows critical rows in the variance decomposition table, above
% the index tolerance. The row associated with condition index 12 has only
% one predictor, |BBB|, with a proportion above the tolerance, not the two
% or more predictors required for a dependency. The row associated with
% condition index 15.3 shows the weak dependence involving |AGE|, |SPR|,
% and the intercept. This relationship was not apparent in the initial plot
% of the correlation matrix.
%
% In summary, the results of the various collinearity diagnostics are
% consistent with data in which no degrading near relationships exist.
% Indeed, a review of the economic meaning of the potential predictors
% (easily lost in a purely statistical analysis) does not suggest any
% theoretical reason for strong relationships. Regardless of weak
% dependencies, OLS estimates remain BLUE, and the standard errors in the
% regression results show an accuracy that is probably acceptable for most
% modeling purposes.

%% Ridge Regression
%
% To conclude, we briefly examine the technique of _ridge regression_,
% which is often suggested as a remedy for estimator variance in MLR models
% of data with some degree of collinearity. The technique can also be used
% as a collinearity diagnostic.
%
% To address the problem of near singularity in $X_t^T X_t$, ridge
% regression estimates $\hat{\beta}$ using a _regularization_ of the normal
% equations:
%
% $$\hat{\beta}_{ridge} = (X_t^T X_t + kI)^{\rm-1} X_t^T y_t,$$
%
% where $k$ is a positive _ridge parameter_ and $I$ is the identity matrix.
% The perturbation to the diagonal of $X_t^T X_t$ is intended to improve
% the conditioning of the eigenvalue problem and reduce the variance of the
% coefficient estimates. As $k$ increases, ridge estimates become biased
% toward zero, but a reduced variance can result in a smaller mean-squared
% error (MSE) relative to comparable OLS estimates, especially in the
% presence of collinearity.
%
% Ridge regression is carried out by the function |ridge|. To examine the
% results for a range of ridge parameters $k$, a _ridge trace_ [3] is
% produced:

Mu0I = mean(diag(X0I'*X0I));   % Scale of cross-product diagonal

k = 0:Mu0I/10;                 % Range of ridge parameters
ridgeBetas = ridge(y0,X0,k,0); % Coefficients for MLR model with intercept

figure
plot(k,ridgeBetas(2:end,:),'LineWidth',2)
xlim([0 Mu0I/10])
legend(predNames0)
xlabel('Ridge Parameter') 
ylabel('Ridge Coefficient Estimate') 
title('{\bf Ridge Trace}')
axis tight
grid on 
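%%
% For reference, the regularized normal equations can also be evaluated
% directly for a single value of $k$ (a sketch of the formula only; |ridge|
% centers and scales the predictors internally before solving, so its output
% is not directly comparable to this raw-data computation):

kSingle = Mu0I/20;    % Illustrative ridge parameter
betaRidgeDirect = (X0I'*X0I+kSingle*eye(size(X0I,2)))\(X0I'*y0)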

%%
% The OLS estimates, with $k = 0$, appear on the left. The important
% question is whether any of the ridge estimates reduce the MSE:

[numRidgeParams,numRidgeBetas] = size(ridgeBetas); % Coefficients per fit, number of fits
y0Hat = X0I*ridgeBetas;                            % Fitted responses, one column per ridge parameter
RidgeRes = repmat(y0,1,numRidgeBetas)-y0Hat;       % Residuals for each ridge parameter
RidgeSSE = RidgeRes'*RidgeRes;                     % Residual cross-products
RidgeDFE = T0-numRidgeParams;                      % Degrees of freedom for error
RidgeMSE = diag(RidgeSSE/RidgeDFE);                % MSE for each ridge parameter

figure
plot(k,RidgeMSE,'m','LineWidth',2)
xlim([0 Mu0I/10])
xlabel('Ridge Parameter') 
ylabel('MSE') 
title('{\bf Ridge MSE}')
axis tight
grid on

%%
% The plot shows exactly the opposite of what one would hope for when
% applying ridge regression. The MSE actually increases over the entire
% range of ridge parameters, suggesting again that there is no significant
% collinearity in the data for ridge regression to correct.
%
% A technique related to ridge regression, the _lasso_, is described in
% the example on "Predictor Selection."

%% Summary
%
% This example has focused on properties of predictor data that can lead to
% high OLS estimator variance, and so unreliable coefficient estimates. The
% techniques of Belsley are useful for identifying specific data
% relationships that contribute to the problem, and for evaluating the
% degree of the effects on estimation. One method for accommodating
% estimator variance is ridge regression. Methods for selectively removing
% problematic predictors are addressed in the examples on "Influential
% Observations" and "Predictor Selection."

%% References
%
% [1] Belsley, D. A., E. Kuh, and R. E. Welsch. _Regression Diagnostics_.
% Hoboken, NJ: John Wiley & Sons, 1980.
%
% [2] Goldberger, A. S. _A Course in Econometrics_. Cambridge, MA: Harvard
% University Press, 1991.
%
% [3] Hoerl, A. E., and R. W. Kennard. "Ridge Regression: Applications to
% Nonorthogonal Problems." _Technometrics_. Vol. 12, No. 1, 1970, pp. 69-82.
%
% [4] Moler, C. _Numerical Computing with MATLAB_. Philadelphia, PA:
% Society for Industrial and Applied Mathematics, 2004.
%
% [5] Stone, R. "The Analysis of Market Demand." _Journal of the Royal
% Statistical Society_. Vol. 108, 1945, pp. 1-98.
%
% [6] Wooldridge, J. M. _Introductory Econometrics_. Cincinnati, OH:
% South-Western, 2009.