www.gusucode.com > stats 源码程序 matlab案例代码 > stats/GeneralizedLinearModelWorkflowExample.m

    %% Generalized Linear Model Workflow  
% This example shows how to fit a generalized linear model and analyze the
% results. A typical workflow involves the following: import data, fit a
% generalized linear model, test its quality, modify it to improve the quality,
% and make predictions based on the model. It computes the probability that
% a flower is in one of two classes, based on the Fisher iris data.   

%% Step 1. Load the data. 
% Load the Fisher iris data. Extract the rows that have classification versicolor
% or virginica. These are rows 51 to 150. Create logical response variables
% that are |true| for versicolor flowers. 
load fisheriris
X = meas(51:end,:); % versicolor and virginica
y = strcmp('versicolor',species(51:end));  

%% Step 2. Fit a generalized linear model. 
% Fit a binomial generalized linear model to the data. 
mdl = fitglm(X,y,'linear',...
    'distr','binomial')  

%% Step 3. Examine the result, consider alternative models. 
% Some $p$-values in the |pValue| column are not very small. Perhaps the
% model can be simplified. 
%
% See if some 95% confidence intervals for the coefficients include 0. If
% so, perhaps these model terms could be removed. 
confint = coefCI(mdl) 

%%
% Only two of the predictors have coefficients whose confidence intervals
% do not include 0.  

%% 
% The coefficients of |'x1'| and |'x2'| have the largest $p$-values. Test
% whether both coefficients could be zero. 
M = [0 1 0 0 0     % picks out coefficient for column 1
     0 0 1 0 0];   % picks out coefficient for column 2
p = coefTest(mdl,M)  

%% 
% The $p$-value of about 0.14 is not very small. Drop those terms from the
% model. 
mdl1 = removeTerms(mdl,'x1 + x2')  

%% 
% Perhaps it would have been better to have |stepwiseglm| identify the model
% initially. 
mdl2 = stepwiseglm(X,y,...
    'constant','Distribution','binomial','upper','linear') 

%%
% |stepwiseglm| included |'x2'| in the model, because it neither adds nor
% removes terms with $p$-values between 0.05 and 0.10.  

%% Step 4. Look for outliers and exclude them. 
% Examine a leverage plot to look for influential outliers. 
plotDiagnostics(mdl2,'leverage')    

%%
% There is one observation with a leverage close to one. Using the Data
% Cursor, click the point, and find it has index 69.     

%% 
% See if the model coefficients change when you fit a model excluding this
% point. 
oldCoeffs = mdl2.Coefficients.Estimate;
mdl3 = fitglm(X,y,'linear',...
    'distr','binomial','pred',2:4,'exclude',69);
newCoeffs = mdl3.Coefficients.Estimate;
disp([oldCoeffs newCoeffs]) 

%%
% The model coefficients do not change, suggesting that the response at
% the high-leverage point is consistent with the predicted value from the
% reduced model.  

%% Step 5. Predict the probability that a new flower is versicolor. 
% Use |mdl2| to predict the probability that a flower with average measurements
% is versicolor. Generate confidence intervals for your prediction. 
[newf,newc] = predict(mdl2,mean(X)) 

%%
% The model gives almost a 50% probability that the average flower is versicolor,
% with a wide confidence interval about this estimate.