
    %% Estimate Generalization Error of Boosting Ensemble
% Estimate the generalization error of an ensemble of boosted
% classification trees.
%%
% Load the |ionosphere| data set.
load ionosphere
%%
% Cross-validate an ensemble of classification trees using AdaBoostM1 and
% 10-fold cross-validation.  Using a decision tree template, specify that
% each tree should be split a maximum of five times.
rng(5); % For reproducibility
t = templateTree('MaxNumSplits',5);
Mdl = fitcensemble(X,Y,'Method','AdaBoostM1','Learners',t,'CrossVal','on');
%%
% |Mdl| is a |ClassificationPartitionedEnsemble| model.
%%
% Plot the cumulative, 10-fold cross-validated, misclassification rate.
% Display the estimated generalization error of the ensemble.
kflc = kfoldLoss(Mdl,'Mode','cumulative');
figure;
plot(kflc);
ylabel('10-fold Misclassification rate');
xlabel('Learning cycle');

estGenError = kflc(end)
%%
% |kfoldLoss| returns the generalization error by default.  However,
% plotting the cumulative loss allows you to monitor how the loss changes
% as weak learners accumulate in the ensemble.
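%%
% As a check, calling |kfoldLoss| without the |'Mode'| name-value argument
% returns a scalar that matches the last element of the cumulative loss
% vector.  (This check is a sketch added for illustration; |scalarGenError|
% is a name introduced here, not part of the original example.)
scalarGenError = kfoldLoss(Mdl);
isequal(scalarGenError,kflc(end))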
%%
% The ensemble achieves a misclassification rate of around 0.06 after
% accumulating about 50 weak learners.  Then, the misclassification rate
% increases slightly as more weak learners enter the ensemble.
%%
% If you are satisfied with the generalization error of the ensemble, then,
% to create a predictive model, train the ensemble again using all of the
% settings except cross-validation. However, it is good practice to tune
% hyperparameters, such as the maximum number of decision splits per tree
% and the number of learning cycles.
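%%
% For example, you can refit the ensemble on all of the data by reusing the
% same template and method but omitting |'CrossVal'|.  (This is a sketch;
% |finalMdl| is a variable name introduced here.)
finalMdl = fitcensemble(X,Y,'Method','AdaBoostM1','Learners',t);
%%
% To tune hyperparameters such as the maximum number of splits and the
% number of learning cycles automatically, |fitcensemble| also accepts the
% |'OptimizeHyperparameters'| name-value argument.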