%% Find Good Lasso Penalty Using _k_-fold Classification Loss
% To determine a good lasso-penalty strength for a linear classification
% model that uses a logistic regression learner, compare test-sample
% classification error rates.
%%
% Load the NLP data set. Preprocess the data as in
% <docid:stats_ug.bu7kw58-1>.

load nlpdata
Ystats = Y == 'stats';
X = X';
%%
% Create a set of 11 logarithmically spaced regularization strengths from
% $10^{-6}$ through $10^{-0.5}$.

Lambda = logspace(-6,-0.5,11);
%%
% Cross-validate binary, linear classification models that use each of the
% regularization strengths, using 5-fold cross-validation. Solve the
% objective function using SpaRSA. Lower the tolerance on the gradient of
% the objective function to |1e-8|.

rng(10); % For reproducibility
CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'KFold',5,'Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8)
%%
% Extract a trained linear classification model.

Mdl1 = CVMdl.Trained{1}
%%
% |Mdl1| is a |ClassificationLinear| model object. Because |Lambda| is a
% sequence of regularization strengths, you can think of |Mdl1| as 11
% models, one for each regularization strength in |Lambda|.
%%
% Estimate the cross-validated classification error.

ce = kfoldLoss(CVMdl);
%%
% Because there are 11 regularization strengths, |ce| is a 1-by-11 vector
% of classification error rates.
%%
% Higher values of |Lambda| lead to predictor variable sparsity, which is
% a good quality of a classifier. For each regularization strength, train
% a linear classification model using the entire data set and the same
% options as when you cross-validated the models. Determine the number of
% nonzero coefficients per model.

Mdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',Lambda,'GradientTolerance',1e-8);
numNZCoeff = sum(Mdl.Beta~=0);
%%
% In the same figure, plot the cross-validated classification error rates
% and frequency of nonzero coefficients for each regularization strength.
% Plot all variables on the log scale.

figure;
[h,hL1,hL2] = plotyy(log10(Lambda),log10(ce),...
    log10(Lambda),log10(numNZCoeff));
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} classification error')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
title('Test-Sample Statistics')
hold off
%%
% Choose the index of the regularization strength that balances predictor
% variable sparsity and low classification error. In this case, a value
% between $10^{-4}$ and $10^{-1}$ should suffice.

idxFinal = 7;
%%
% Select the model from |Mdl| with the chosen regularization strength.

MdlFinal = selectModels(Mdl,idxFinal);
%%
% |MdlFinal| is a |ClassificationLinear| model containing one
% regularization strength. To estimate labels for new observations, pass
% |MdlFinal| and the new data to |predict|.
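%%
% As a minimal sketch of that last step, predict labels for a few columns
% of |X| reused as stand-in "new" observations. |XNew|, |labels|, and
% |scores| are illustrative names introduced here, not part of the
% original example; in practice |XNew| would hold genuinely new data,
% oriented with one observation per column to match
% |'ObservationsIn','columns'|.

XNew = X(:,1:5); % stand-in for new observations, one per column
[labels,scores] = predict(MdlFinal,XNew,'ObservationsIn','columns');
% labels holds the predicted class for each observation, and scores holds
% the corresponding classification scores for each class.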