%% Time Series Regression II: Collinearity and Estimator Variance
%
% This example shows how to detect correlation among predictors and
% accommodate problems of large estimator variance. It is the second in a
% series of examples on time series regression, following the presentation
% in the previous example.

% Copyright 2012 The MathWorks, Inc.

%% Introduction
%
% Economic models are always underspecified with respect to the true
% data-generating process (DGP). Model predictors never fully represent
% the totality of causal factors producing an economic response. Omitted
% variables, however, continue to exert their influence through the
% innovations process, forcing model coefficients to account for
% variations in the response that they do not truly explain. Coefficient
% estimates that are too large (type I errors) or too small (type II
% errors) distort the marginal contribution of each predictor. In some
% cases, coefficients even end up with the wrong sign.
%
% Economic models can also be overspecified, by including a theory-blind
% mix of predictors with the hope of capturing some significant part of
% the DGP. Often, "general-to-specific" (GETS) estimation methods are
% applied with a misplaced trust that standard diagnostic statistics will
% sort out the good predictors. However, the very presence of causally
% insignificant predictors tends to increase estimator variance, raising
% the possibility that standard inferences will be unreliable.
%
% The reality of working with misspecified models is addressed in this,
% and subsequent, examples in this series. Underspecified models often
% introduce correlation between predictors and omitted variables in the
% innovations process. Overspecified models often introduce correlation
% among predictors. Each presents its own problems for model estimation.
% In this example, we look specifically at problems arising from
% correlated predictors. The somewhat more complicated issues related to
% correlation between predictors and innovations (exogeneity violations)
% are addressed in the example on "Lagged Variables and Estimator Bias."
%
% We begin by loading relevant data from the previous example on "Linear
% Models," and continue the analysis of the credit default model presented
% there:

load Data_TSReg1

%% Correlation and Condition Numbers
%
% As a first step toward model specification, it is useful to identify any
% possible dependencies among the predictors. The correlation matrix is a
% standard measure of the strength of pairwise linear relationships:

R0 = corrcoef(X0)

%%
% The utility function |corrplot| helps to visualize the results in the
% correlation matrix by plotting a matrix of pairwise scatters. Slopes of
% the displayed least-squares lines are equal to the displayed correlation
% coefficients. It is convenient to work with the tabular array version of
% the data, |X0Tbl|, which contains the predictor names for the plots:

corrplot(X0Tbl,'testR','on')

%%
% Correlation coefficients highlighted in red have a significant
% $t$-statistic. The predictor |BBB| again distinguishes itself by its
% relatively high correlations with the other predictors, though the
% strength of the relationships is moderate. Here the visualization is
% particularly helpful, as |BBB| displays fairly disorganized scatters,
% with the possibility of a number of small, potentially influential data
% subsets. The plots are a reminder of the limitations of the linear
% correlation coefficient as a summary statistic.
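%%
% As a cross-check on the red highlighting in |corrplot|, the pairwise
% p-values can be obtained directly from the second output of |corrcoef|.
% The following is a minimal sketch, reusing |X0| and |predNames0| loaded
% above; the 5% cutoff is an illustrative choice, not part of the original
% analysis:

[R0,P0] = corrcoef(X0);                    % Correlations and p-values
sigPairs = (P0 < 0.05) & ~logical(eye(size(P0,1))) % Significant off-diagonal pairs
predNames0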
%%
% Both the scale and correlations of |BBB| have the potential to inflate
% the _condition number_ $\kappa$ of $X_t$. The condition number is often
% used to characterize the overall sensitivity of OLS estimates to changes
% in the data. For an MLR model with intercept:

kappa0I = cond(X0I)

%%
% The condition number is well above the "well-conditioned" benchmark of
% 1, which is achieved when $X_t$ has orthonormal columns. As a rule of
% thumb, a 1% relative error in the data $X_t$ can produce up to a
% $\kappa$% relative error in the coefficient estimates $\beta$ [4]:
%
% $${\|\delta \beta\| \over \|\beta\|} \le \kappa{\|\delta X_t\| \over \|X_t\|}$$
%
% As shown in the previous example on "Linear Models," coefficient
% estimates for this data are on the order of $10^{-2}$, so a $\kappa$ on
% the order of $10^2$ leads to absolute estimation errors $\|\delta
% \beta\|$ that are approximated by the relative errors in the data.

%% Estimator Variance
%
% Correlations and condition numbers are widely used to flag potential
% data problems, but their diagnostic value is limited. Correlations
% consider only pairwise dependencies between predictors, while condition
% numbers consider only $X_t$ in aggregate. Relationships among arbitrary
% predictor subsets (_multicollinearities_) can fall somewhere in between.
% CLM assumptions forbid exact relationships, but identifying the strength
% and source of any near relationships, and their specific effect on
% coefficient estimation, is an essential part of specification analysis.
%
% Many methods for detecting near collinearities focus on the coefficient
% estimates in $\hat{\beta}$, rather than the data in $X_t$. Each of the
% following has been suggested as a telltale sign of predictor
% dependencies:
%
% $\bullet$ Statistically insignificant coefficients on theoretically
% important predictors
%
% $\bullet$ Coefficients with signs or magnitudes that do not make
% theoretical sense
%
% $\bullet$ Extreme sensitivity of a coefficient to insertion or deletion
% of other predictors
%
% The qualitative nature of these criteria is apparent, and unfortunately
% none of them is necessary or sufficient for detecting collinearity.
%
% To illustrate, we again display OLS fit statistics of the credit default
% model:

M0

%%
% The signs of the coefficient estimates are consistent with theoretical
% expectations: |AGE|, |BBB|, and |SPR| add risk; |CPF| reduces it. The
% _t_-statistics, which scale the coefficient estimates by their standard
% errors (computed under the assumption of normal innovations), show that
% all predictors are significantly different from 0 at the 20% level.
% |CPF| appears especially significant here. The significance of a
% predictor, however, is relative to the other predictors in the model.
%
% There is nothing in the standard regression results to raise substantial
% concern about collinearity. To put the results in perspective, however,
% it is necessary to consider other sources of estimator variance. Under
% CLM assumptions, the variance of the $i^{th}$ component of
% $\hat{\beta}$, $\hat{\beta_i}$, can be decomposed as follows [6]:
%
% $$Var(\hat{\beta_i}) = {\sigma^2 \over SST_i(1-R_i^2)},$$
%
% where $\sigma^2$ is the variance of the innovations process (assumed
% constant), $SST_i$ is the total sample variation of predictor $i$, and
% $R_i^2$ is the coefficient of determination from a regression of
% predictor $i$ on the remaining predictors (and intercept, if present).
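%%
% The decomposition can be checked numerically against the standard errors
% reported by |fitlm|. The following is a minimal sketch, assuming |M0| is
% the fitted model of |y0| on |X0| (with intercept) loaded from the
% previous example; the auxiliary regressions of each predictor on the
% others are set up here only for illustration:

numPreds = size(X0,2);
seImplied = zeros(1,numPreds);
for i = 1:numPreds
    others = setdiff(1:numPreds,i);
    MdlI = fitlm(X0(:,others),X0(:,i));         % Predictor i on the rest
    RI2 = MdlI.Rsquared.Ordinary;                % R_i^2
    SSTI = sum((X0(:,i)-mean(X0(:,i))).^2);      % Total sample variation
    seImplied(i) = sqrt(M0.MSE/(SSTI*(1-RI2)));  % Implied standard error
end
seImplied
seReported = M0.Coefficients.SE(2:end)'          % Standard errors reported by M0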
%%
% The term
%
% $$VIF_i = {1 \over 1-R_i^2}$$
%
% is called the _variance inflation factor_ (VIF), and is another common
% collinearity diagnostic. When the variation of predictor $i$ is largely
% explained by a linear combination of the other predictors, $R_i^2$ is
% close to $1$, and the VIF for that predictor is correspondingly large.
% The inflation is measured relative to an $R_i^2$ of 0 (no collinearity),
% and a VIF of 1.
%
% VIFs are also the diagonal elements of the inverse of the correlation
% matrix [1], a convenient result that eliminates the need to set up the
% various regressions:

VIF = diag(inv(R0))'
predNames0

%%
% How large a VIF is cause for concern? As with significance levels for
% standard hypothesis tests, experience with certain types of data may
% suggest useful tolerances. Common ad hoc values, in the range of 5 to
% 10, are of little use in general. In this case, |BBB| has the highest
% VIF, but it does not jump out from the rest of the predictors.
%
% More importantly, VIF is only one factor in the variance decomposition
% given above. A large VIF can be balanced by either a small innovations
% variance $\sigma^2$ (good model fit) or a large sample variation $SST_i$
% (sufficient data). As such, Goldberger [2] ironically compares the
% "problem" of multicollinearity, viewed in isolation, to the problem of
% data "micronumerosity." Evaluating the combined effect of the different
% sources of estimator variance requires a wider view.
%
% Econometricians have developed a number of rules of thumb for deciding
% when to worry about collinearity. Perhaps the most common says that it
% is acceptable to ignore evidence of collinearity if the resulting
% _t_-statistics are all greater than 2 in absolute value. This ensures
% that 0 is outside of the approximate 95% confidence interval of each
% estimate (assuming normal innovations or a large sample). Because
% _t_-statistics are already adjusted for estimator variance, the
% presumption is that they adequately account for collinearity in the
% context of other, balancing effects. The regression results above show
% that three of the potential predictors in |X0| fail this test.
%
% Another rule of thumb is based on an estimate of $Var(\hat{\beta_i})$
% [5]:
%
% $$\widehat{Var}(\hat{\beta_i}) = {1 \over {T-n}}{\hat\sigma_y^2 \over \hat\sigma_i^2}{1-R^2 \over 1-R_i^2},$$
%
% where $T$ is the sample size, $n$ is the number of predictors,
% $\hat\sigma_y^2$ is the estimated variance of $y_t$, $\hat\sigma_i^2$ is
% the estimated variance of predictor $i$, $R^2$ is the coefficient of
% determination for the regression of $y_t$ on $X_t$, and $R_i^2$ is as
% above. The rule says that concerns about collinearity can be ignored if
% $R^2$ exceeds $R_i^2$ for each predictor, since each VIF will be
% balanced by $1-R^2$. All of the potential predictors in |X0| pass this
% test:

RSquared = M0.Rsquared
RSquared_i = 1-(1./VIF)
predNames0

%%
% These rules attempt to identify the _consequences_ of collinearity, as
% expressed in the regression results. As we have seen, they can offer
% conflicting advice on when, and how much, to worry about the integrity
% of the coefficient estimates. They do not provide any accounting of the
% nature of the multiple dependencies within the data, nor do they provide
% any reliable measure of the extent to which these dependencies degrade
% the regression.
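%%
% Both rules of thumb can also be applied programmatically. This is a
% small sketch, reusing |M0| and |VIF| from above; it simply restates the
% comparisons described in the text, predictor by predictor:

tRulePass = abs(M0.Coefficients.tStat(2:end))' > 2  % |t| > 2 rule
stoneRulePass = M0.Rsquared.Ordinary > (1-1./VIF)   % R^2 > R_i^2 rule
predNames0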
%% Collinearity Diagnostics
%
% A more detailed analytic approach is provided by Belsley [1].
% Instability of OLS estimates can be traced to small eigenvalues in the
% cross-product matrix $X_t^T X_t$ appearing in the normal equations for
% $\hat{\beta}$:
%
% $$\hat{\beta} = (X_t^T X_t)^{-1} X_t^T y_t.$$
%
% Belsley reformulates the eigensystem of $X_t^T X_t$ in terms of the
% singular values of the matrix $X_t$, which can then be analyzed
% directly, with greater numerical accuracy. If the singular values of
% $X_t$ are $\mu_1, ..., \mu_n$, where $n$ is the number of predictors,
% then the condition number of $X_t$ is $\kappa = \mu_{max}/\mu_{min}$.
% Belsley defines a spectrum of _condition indices_ $\eta_j =
% \mu_{max}/\mu_j$ for each $j = 1, ..., n$, and shows that high indices
% indicate separate near dependencies in the data.
%
% Belsley goes further by describing a method for identifying the specific
% predictors involved in each near dependency, and provides a measure of
% how important those dependencies are in affecting coefficient estimates.
% This is achieved with yet another decomposition of $Var(\hat{\beta_i})$,
% this time in terms of the singular values. If $X_t$ has a singular-value
% decomposition $USV^T$, with $V = (v_{ij})$, then:
%
% $$Var(\hat{\beta_i}) = \sigma^2 \sum_{j=1}^n v_{ij}^2 / \mu_j^2,$$
%
% where $\sigma^2$ is the innovations variance. The _variance
% decomposition proportions_ $\pi_{ji}$ are defined by:
%
% $$\phi_{ij} = v_{ij}^2 / \mu_j^2,$$
%
% $$\phi_i = \sum_{j=1}^n \phi_{ij},$$
%
% $$\pi_{ji} = \phi_{ij} / \phi_i.$$
%
% The $\pi_{ji}$ give the proportion of $Var(\hat{\beta_i})$ associated
% with singular value $\mu_j$.
%
% Indices and proportions are interpreted as follows:
%
% $\bullet$ The number of high condition indices identifies the number of
% near dependencies.
%
% $\bullet$ The size of the condition indices identifies the tightness of
% each dependency.
%
% $\bullet$ The location of high proportions in a high index row
% identifies the dependent predictors.
%
% $\bullet$ The size of the proportions identifies the degree of
% degradation to regression estimates.
%
% Again, a tolerance for "high" must be determined. Belsley's simulation
% experiments suggest that condition indices in the range of 5 to 10
% reflect weak dependencies, and those in the range 30 to 100 reflect
% moderate to high dependencies. He suggests a tolerance of 0.5 for
% variance decomposition proportions identifying individual predictors.
% Simulation experiments, however, are necessarily based on specific
% models of mutual dependence, so tolerances need to be reevaluated in
% each empirical setting.
%
% The function |collintest| implements Belsley's procedure. Outputs are
% displayed in tabular form:

collintest(X0ITbl);

%%
% If we lower the index tolerance to 10 and maintain a proportion
% tolerance of 0.5, the analysis identifies one weak dependency between
% |AGE| and |SPR| in the final row. It can be visualized by setting the
% |'tolIdx'| and |'tolProp'| parameters in |collintest| and turning on the
% |'plot'| flag:

collintest(X0ITbl,'tolIdx',10,'tolProp',0.5,'display','off','plot','on');

%%
% The plot shows critical rows in the variance decomposition table, above
% the index tolerance. The row associated with condition index 12 has only
% one predictor, |BBB|, with a proportion above the tolerance, not the two
% or more predictors required for a dependency. The row associated with
% condition index 15.3 shows the weak dependence involving |AGE|, |SPR|,
% and the intercept. This relationship was not apparent in the initial
% plot of the correlation matrix.
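%%
% The condition indices and variance decomposition proportions can also be
% computed directly from the singular-value decomposition, following the
% formulas above. This is a rough sketch, assuming Belsley's convention of
% scaling the columns of |X0I| to unit length; |collintest| may differ in
% details of scaling and display:

colNorms = sqrt(sum(X0I.^2,1));
XS = X0I./repmat(colNorms,T0,1);                   % Unit-length columns
[~,S,V] = svd(XS,0);
mu = diag(S);                                      % Singular values
condIdx = max(mu)./mu                              % Condition indices eta_j
phi = (V.^2)./repmat((mu.^2)',size(V,1),1);        % phi(i,j) = v_ij^2/mu_j^2
varProps = (phi./repmat(sum(phi,2),1,length(mu)))' % pi(j,i): rows indexed by mu_j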
%%
% In summary, the results of the various collinearity diagnostics are
% consistent with data in which no degrading near relationships exist.
% Indeed, a review of the economic meaning of the potential predictors
% (easily lost in a purely statistical analysis) does not suggest any
% theoretical reason for strong relationships. Regardless of weak
% dependencies, OLS estimates remain BLUE, and the standard errors in the
% regression results show an accuracy that is probably acceptable for most
% modeling purposes.

%% Ridge Regression
%
% To conclude, we briefly examine the technique of _ridge regression_,
% which is often suggested as a remedy for estimator variance in MLR
% models of data with some degree of collinearity. The technique can also
% be used as a collinearity diagnostic.
%
% To address the problem of near singularity in $X_t^T X_t$, ridge
% regression estimates $\hat{\beta}$ using a _regularization_ of the
% normal equations:
%
% $$\hat{\beta}_{ridge} = (X_t^T X_t + kI)^{-1} X_t^T y_t,$$
%
% where $k$ is a positive _ridge parameter_ and $I$ is the identity
% matrix. The perturbation to the diagonal of $X_t^T X_t$ is intended to
% improve the conditioning of the eigenvalue problem and reduce the
% variance of the coefficient estimates. As $k$ increases, ridge estimates
% become biased toward zero, but a reduced variance can result in a
% smaller mean-squared error (MSE) relative to comparable OLS estimates,
% especially in the presence of collinearity.
%
% Ridge regression is carried out by the function |ridge|. To examine the
% results for a range of ridge parameters $k$, a _ridge trace_ [3] is
% produced:

Mu0I = mean(diag(X0I'*X0I));   % Scale of cross-product diagonal
k = 0:Mu0I/10;                 % Range of ridge parameters
ridgeBetas = ridge(y0,X0,k,0); % Coefficients for MLR model with intercept

figure
plot(k,ridgeBetas(2:end,:),'LineWidth',2)
xlim([0 Mu0I/10])
legend(predNames0)
xlabel('Ridge Parameter')
ylabel('Ridge Coefficient Estimate')
title('{\bf Ridge Trace}')
axis tight
grid on

%%
% The OLS estimates, with $k = 0$, appear on the left. The important
% question is whether any of the ridge estimates reduce the MSE:

[numRidgeParams,numRidgeBetas] = size(ridgeBetas);
y0Hat = X0I*ridgeBetas;
RidgeRes = repmat(y0,1,numRidgeBetas)-y0Hat;
RidgeSSE = RidgeRes'*RidgeRes;
RidgeDFE = T0-numRidgeParams;
RidgeMSE = diag(RidgeSSE/RidgeDFE);

figure
plot(k,RidgeMSE,'m','LineWidth',2)
xlim([0 Mu0I/10])
xlabel('Ridge Parameter')
ylabel('MSE')
title('{\bf Ridge MSE}')
axis tight
grid on

%%
% The plot shows exactly the opposite of what one would hope for when
% applying ridge regression. The MSE actually increases over the entire
% range of ridge parameters, suggesting again that there is no significant
% collinearity in the data for ridge regression to correct.
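%%
% The conditioning effect of the ridge perturbation can be seen directly
% in the regularized cross-product matrix of the formula above. This is a
% minimal sketch, not part of the original analysis, reusing the range of
% ridge parameters |k| defined earlier:

ridgeCond = zeros(size(k));
for j = 1:length(k)
    ridgeCond(j) = cond(X0I'*X0I + k(j)*eye(size(X0I,2))); % Condition number of regularized matrix
end

figure
plot(k,ridgeCond,'LineWidth',2)
xlabel('Ridge Parameter')
ylabel('Condition Number')
title('{\bf Conditioning of the Regularized Normal Equations}')
grid on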
%%
% A technique related to ridge regression, the _lasso_, is described in
% the example on "Predictor Selection."

%% Summary
%
% This example has focused on properties of predictor data that can lead
% to high OLS estimator variance, and so unreliable coefficient estimates.
% The techniques of Belsley are useful for identifying specific data
% relationships that contribute to the problem, and for evaluating the
% degree of the effects on estimation. One method for accommodating
% estimator variance is ridge regression. Methods for selectively removing
% problematic predictors are addressed in the examples on "Influential
% Observations" and "Predictor Selection."

%% References
%
% [1] Belsley, D. A., E. Kuh, and R. E. Welsch. _Regression Diagnostics_.
% Hoboken, NJ: John Wiley & Sons, 1980.
%
% [2] Goldberger, A. S. _A Course in Econometrics_. Cambridge, MA: Harvard
% University Press, 1991.
%
% [3] Hoerl, A. E., and R. W. Kennard. "Ridge Regression: Applications to
% Nonorthogonal Problems." _Technometrics_. Vol. 12, No. 1, 1970, pp.
% 69-82.
%
% [4] Moler, C. _Numerical Computing with MATLAB_. Philadelphia, PA:
% Society for Industrial and Applied Mathematics, 2004.
%
% [5] Stone, R. "The Analysis of Market Demand." _Journal of the Royal
% Statistical Society_. Vol. 108, 1945, pp. 1-98.
%
% [6] Wooldridge, J. M. _Introductory Econometrics_. Cincinnati, OH:
% South-Western, 2009.