%% Find Good Lasso Penalty Using _k_-fold Margins
% To determine a good lasso-penalty strength for a linear classification
% model that uses a logistic regression learner, compare distributions of
% _k_-fold margins.
%%
% Load the NLP data set. Preprocess the data as in
% <docid:stats_ug.bu622gg>.
load nlpdata
Ystats = Y == 'stats';
X = X';
%%
% Create a set of 11 logarithmically spaced regularization strengths from
% $10^{-8}$ through $10^{1}$.
Lambda = logspace(-8,1,11);
%%
% Cross-validate binary linear classification models using 5-fold
% cross-validation, one model for each regularization strength. Solve the
% objective function using SpaRSA, and lower the tolerance on the gradient
% of the objective function to |1e-8|.
%
rng(10); % For reproducibility
CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns','KFold',5,...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',Lambda,'GradientTolerance',1e-8)
%%
% |CVMdl| is a |ClassificationPartitionedLinear| model. Because |fitclinear|
% implements 5-fold cross-validation, |CVMdl| contains 5
% |ClassificationLinear| models, one per fold, each trained on the
% observations in the other four folds.
%%
% Estimate the _k_-fold margins for each regularization strength.
m = kfoldMargin(CVMdl);
size(m)
%%
% |m| is a 31572-by-11 matrix of cross-validated margins for each
% observation. The columns correspond to the regularization strengths.
%%
% Plot the _k_-fold margins for each regularization strength. Because
% logistic regression scores are in [0,1], margins are in [-1,1].
% Exponentiating the margins with base 10000 maps [-1,1] onto
% $[10^{-4},10^{4}]$ and spreads out the margins near 1, which helps
% identify the regularization strength that maximizes the margins over
% the grid.
figure;
boxplot(10000.^m)
ylabel('Exponentiated test-sample margins')
xlabel('Lambda indices')
%%
% Several values of |Lambda| yield _k_-fold margin distributions that are
% concentrated near $10000^1$, that is, margins near 1. Higher values of
% |Lambda| also lead to sparser predictor coefficients, which is a
% desirable quality in a classifier.
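%%
% A minimal sketch, on hypothetical synthetic data rather than the NLP data
% set, of how to check the sparsity that larger lasso penalties induce.
% The data sizes, the names |Xs|, |ys|, |lam|, |MdlSparse|, and |numNZ|,
% and the choice of five informative predictors are illustrative
% assumptions. When |Lambda| is a vector, the |Beta| property of the
% trained model holds one column of coefficients per regularization
% strength, so counting the nonzero entries in each column shows how
% sparsity changes along the grid.
rng(1); % For reproducibility of the synthetic data
Xs = randn(500,50);                               % 500 observations, 50 predictors (hypothetical)
ys = Xs(:,1:5)*ones(5,1) + 0.5*randn(500,1) > 0;  % Only predictors 1 to 5 are informative
lam = logspace(-4,0,5);                           % Ascending grid of lasso penalties
MdlSparse = fitclinear(Xs,ys,'Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',lam);
numNZ = sum(MdlSparse.Beta ~= 0)                  % Nonzero coefficients per penalty (one per column)
%%
% Applied to this example, a similar count for one fold's model could be
% taken from |sum(CVMdl.Trained{1}.Beta ~= 0)|.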
%%
% Choose the regularization strength that occurs just before the centers
% of the _k_-fold margin distributions start decreasing.
LambdaFinal = Lambda(5);
%%
% Train a linear classification model using the entire data set and specify
% the desired regularization strength.
MdlFinal = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',LambdaFinal);
%%
% To estimate labels for new observations, pass |MdlFinal| and the new data
% to |predict|.
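%%
% A minimal sketch of the prediction step, on hypothetical synthetic data
% so that it runs on its own. Because |MdlFinal| is trained with
% observations in columns, the new data here is oriented the same way and
% passed to |predict| with |'ObservationsIn','columns'|. The names |Xtr|,
% |ytr|, |MdlToy|, and |Xnew| are illustrative assumptions. For a logistic
% learner, the second output of |predict| contains posterior probabilities,
% one column per class, so each row sums to 1.
rng(2); % For reproducibility of the synthetic data
Xtr = randn(10,200);          % 10 predictors (rows) by 200 observations (columns)
ytr = Xtr(1,:)' > 0;          % Labels depend only on the first predictor
MdlToy = fitclinear(Xtr,ytr,'ObservationsIn','columns',...
    'Learner','logistic','Solver','sparsa','Regularization','lasso');
Xnew = randn(10,5);           % 5 new observations, also stored in columns
[label,score] = predict(MdlToy,Xnew,'ObservationsIn','columns')
%%
% With the model from this example, the analogous call is
% |predict(MdlFinal,XNew,'ObservationsIn','columns')|, where |XNew| is
% hypothetical new data with predictors in rows.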