www.gusucode.com > stats 源码程序 matlab案例代码 > stats/ExploratoryAnalysisOfDataExample.m

    %% Exploratory Analysis of Data
% This example shows how to explore the distribution of data using
% descriptive statistics.

% Copyright 2015 The MathWorks, Inc.


%% Generate sample data.
% Generate a vector containing randomly-generated sample data.
rng default  % For reproducibility
x = [normrnd(4,1,1,100),normrnd(6,0.5,1,200)];

%% Plot a histogram.
% Plot a histogram of the sample data with a normal density fit. This 
% provides a visual comparison of the sample data and a normal
% distribution fitted to the data.
histfit(x)

%%
% The distribution of the data appears to be left skewed. A normal distribution
% does not look like a good fit for this sample data.

%% Obtain a normal probability plot.
% Obtain a normal probability plot. This plot provides another way to
% visually compare the sample data to a normal distribution fitted to the
% data.
probplot('normal',x)

%%
% The probability plot also shows the deviation of data from normality.

%% Conpute the quantiles.
% Compute the quantiles of the sample data.
p = 0:0.25:1;
y = quantile(x,p);
z = [p;y]

%%
% Create a box plot to visualize the statistics.
boxplot(x)

%%
% The box plot shows the 0.25, 0.5, and 0.75 quantiles. The
% long lower tail and plus signs show the lack of symmetry in the
% sample data values.

%% Compute descriptive statistics.
% Compute the mean and median of the data.
y = [mean(x),median(x)]

%%
% The mean and median values seem close to each other, but a mean smaller
% than the median usually indicates that the data is left skewed.

%%
% Compute the skewness and kurtosis of the data.
y = [skewness(x),kurtosis(x)]

%%
% A negative skewness value means the data is left skewed. The data has a
% larger peakedness than a normal distribution because the kurtosis value
% is greater than 3.

%% Compute z-scores.
% Identify possible outliers by computing the z-scores and finding the
% values that are greater than 3 or less than -3.
Z = zscore(x);
find(abs(Z)>3);

%%
% Based on the z-scores, the 3rd and 35th observations might be outliers.