%% Determine Outliers Using Cook's Distance
% This example shows how to use Cook's distance to identify outliers
% in the data.

% Copyright 2015 The MathWorks, Inc.


%% 
% Load the sample data and define the independent and response variables. 
load hospital
X = double(hospital(:,2:5));
y = hospital.BloodPressure(:,1);  

%% 
% Fit the linear regression model. 
mdl = fitlm(X,y);  

%% 
% Plot the Cook's distance values. 
plotDiagnostics(mdl,'cookd')    

%%
% The dashed line in the figure corresponds to the recommended threshold
% value, |3*mean(mdl.Diagnostics.CooksDistance)|, which for this example is
% 3*(0.0108) = 0.0324. Several observations have Cook's distance values
% above this threshold. In particular, two values are noticeably larger than
% the rest. You might want to find and omit these observations from your
% data and refit the model.
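
%%
% As a minimal sketch (not part of the original example), the threshold can
% be computed explicitly, and the Cook's distance values can be reproduced
% from the commonly used formula D_i = (r_i^2/(p*MSE))*(h_ii/(1-h_ii)^2),
% where r_i is the raw residual, h_ii is the leverage, p is the number of
% model coefficients, and MSE is the mean squared error. The variable names
% |cooksD|, |threshold|, and |cooksManual| are illustrative, and the sketch
% assumes the data contain no missing values.
cooksD = mdl.Diagnostics.CooksDistance;   % Cook's distance per observation
threshold = 3*mean(cooksD)                % value marked by the dashed line
r = mdl.Residuals.Raw;                    % raw residuals
h = mdl.Diagnostics.Leverage;             % diagonal of the hat matrix
p = mdl.NumCoefficients;                  % coefficients, including intercept
cooksManual = (r.^2./(p*mdl.MSE)).*(h./(1-h).^2);
% Largest absolute difference between the manual and built-in values.
max(abs(cooksManual - cooksD))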

%% 
% Find the observations with Cook's distance values that exceed the threshold
% value. 
find((mdl.Diagnostics.CooksDistance)>3*mean(mdl.Diagnostics.CooksDistance))  

%% 
% Among the observations that exceed the threshold, find those with markedly
% larger Cook's distance values by using a stricter cutoff of five times
% the mean.
find((mdl.Diagnostics.CooksDistance)>5*mean(mdl.Diagnostics.CooksDistance))
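
%%
% The earlier suggestion to omit the most influential observations and
% refit the model can be sketched as follows. This is one possible approach,
% not part of the original example: it uses the |'Exclude'| name-value
% argument of |fitlm|, and the names |outlierIdx| and |mdlNoOutliers| are
% illustrative. Whether removing these observations is appropriate depends
% on the data and should be checked case by case.
outlierIdx = find(mdl.Diagnostics.CooksDistance > 5*mean(mdl.Diagnostics.CooksDistance));
mdlNoOutliers = fitlm(X,y,'Exclude',outlierIdx);   % refit without those rows
% Compare the ordinary R-squared values before and after the exclusion.
[mdl.Rsquared.Ordinary, mdlNoOutliers.Rsquared.Ordinary]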