%% Train Classification Ensemble
% Create a predictive classification ensemble using all available predictor
% variables in the data.  Then, train another ensemble using fewer
% predictors.  Compare the in-sample predictive accuracies of the
% ensembles.
%%
% Load the |census1994| data set.
load census1994
%%
% Train an ensemble of classification models using the entire data set and
% default options.
Mdl1 = fitcensemble(adultdata,'salary')
%%
% |Mdl1| is a |ClassificationEnsemble| model.  Some notable
% characteristics of |Mdl1| are:
%
% * Because the response contains two classes, |fitcensemble| uses
% LogitBoost as the default ensemble-aggregation algorithm.
% * Because the ensemble-aggregation method is a boosting algorithm, the
% ensemble is composed of classification trees that allow a maximum of 10
% splits each.
% * The ensemble contains one hundred trees.
%
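%%
% The characteristics listed above can be checked against the trained model
% object.  The following is a minimal sketch, assuming |Mdl1| from the
% previous step is in the workspace and that the weak learners stored in
% |Mdl1.Trained| expose the |IsBranchNode| property, as compact tree models
% do.  The name |maxSplits| is introduced here for illustration only.
Mdl1.Method      % Name of the ensemble-aggregation algorithm
Mdl1.NumTrained  % Number of weak learners that were trained
% Count the branch nodes (splits) in each trained tree and take the largest.
maxSplits = max(cellfun(@(t) nnz(t.IsBranchNode),Mdl1.Trained))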
%%
% Use the classification ensemble to predict the labels of a random set of
% five observations from the data.  Compare the predicted labels with their
% true values.
rng(1) % For reproducibility
[pX,pIdx] = datasample(adultdata,5);
label = predict(Mdl1,pX);
table(label,adultdata.salary(pIdx),'VariableNames',{'Predicted','Truth'})
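%%
% To see how strongly the ensemble supports each class for these
% observations, request the second output of |predict|.  This is a minimal
% sketch that reuses |Mdl1|, |pX|, and |pIdx| from above; the names |score|
% and |scoreTbl| are illustrative.  The columns of |score| follow the class
% order in |Mdl1.ClassNames|, and a larger score indicates stronger support
% for that class.
[~,score] = predict(Mdl1,pX);
scoreTbl = table(adultdata.salary(pIdx),score,'VariableNames',{'Truth','Score'})
Mdl1.ClassNames  % Class order of the score columns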
%%
% Train a new ensemble using |age| and |education| only.
Mdl2 = fitcensemble(adultdata,'salary ~ age + education');
%%
% Compare the resubstitution losses between |Mdl1| and |Mdl2|.
rsLoss1 = resubLoss(Mdl1)
rsLoss2 = resubLoss(Mdl2)
%%
% The in-sample misclassification rate is lower for |Mdl1|, the ensemble
% that uses all predictors.
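%%
% By default, |resubLoss| reports the classification error, that is, the
% weighted fraction of misclassified training observations.  As a sketch,
% assuming the default (equal) observation weights, a comparable value can
% be computed by comparing the resubstitution predictions with the training
% labels stored in |Mdl1.Y|.  The name |manualLoss1| is illustrative, and
% the result should closely match |resubLoss(Mdl1)| under that assumption.
manualLoss1 = mean(resubPredict(Mdl1) ~= Mdl1.Y)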