%% Bootstrap Resampling
%

% Copyright 2015 The MathWorks, Inc.


%%
% The bootstrap procedure involves drawing random samples with replacement
% from a data set and analyzing each sample the same way. Sampling with
% replacement means that each observation is selected separately at random
% from the original data set, so a particular data point can appear multiple
% times in a given bootstrap sample. The number of elements in each
% bootstrap sample equals the number of elements in the original data set.
% The spread of the estimates you obtain across bootstrap samples indicates
% the uncertainty of the quantity you are estimating.
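
%%
% As a minimal sketch of sampling with replacement, the following draws one
% bootstrap sample from a small made-up vector. The names |x|, |idx|, and
% |xboot| are illustrative and are not part of the example data. Because the
% indices are drawn independently, some values typically repeat and others
% are left out.
x = (1:10)';                        % toy data set with 10 observations
idx = randi(numel(x),numel(x),1);   % 10 indices drawn with replacement
xboot = x(idx)                      % bootstrap sample, same size as x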

%%
% This example from Efron and Tibshirani compares Law School Admission Test
% (LSAT) scores and subsequent law school grade point average (GPA) for a
% sample of 15 law schools.
load lawdata
plot(lsat,gpa,'+')
lsline

%%
% The least-squares fit line indicates that higher LSAT scores go with
% higher law school GPAs. But how certain is this conclusion? The plot
% provides some intuition, but nothing quantitative.

%%
% You can calculate the correlation coefficient of the variables using the
% |corr| function.
rhohat = corr(lsat,gpa)
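
%%
% For reference, the same sample (Pearson) correlation can be sketched
% directly from its definition: the sum of products of the centered
% variables divided by the square root of the product of their sums of
% squares. The names |dl|, |dg|, and |rhoManual| are illustrative; the
% result should agree with |rhohat| up to rounding error.
dl = lsat - mean(lsat);             % centered LSAT scores
dg = gpa - mean(gpa);               % centered GPAs
rhoManual = sum(dl.*dg)/sqrt(sum(dl.^2)*sum(dg.^2))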

%%
% Now you have a number describing the positive association between LSAT and
% GPA. Though it may seem large, you still do not know whether it is
% statistically significant.

%%
% Using the |bootstrp| function you can resample the |lsat| and |gpa|
% vectors as many times as you like and consider the variation in the
% resulting correlation coefficients.
rng default  % For reproducibility
rhos1000 = bootstrp(1000,@corr,lsat,gpa);

%%
% This resamples the |lsat| and |gpa| vectors 1000 times and computes the
% |corr| function on each sample. You can then plot the result in a
% histogram.
histogram(rhos1000,30,'FaceColor',[.8 .8 1])
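
%%
% As a rough sketch of what this resampling amounts to (not the internal
% implementation of |bootstrp|), the loop below draws row indices with
% replacement using |randi| and recomputes |corr| on each resample. The
% names |savedState|, |n|, |nBoot|, |idx|, and |rhosManual| are
% illustrative. The random number generator state is saved and restored so
% that later results in this example are unaffected by the extra draws.
savedState = rng;                   % remember the generator state
n = numel(lsat);                    % number of observations
nBoot = 1000;                       % number of bootstrap resamples
rhosManual = zeros(nBoot,1);
for k = 1:nBoot
    idx = randi(n,n,1);             % n row indices drawn with replacement
    rhosManual(k) = corr(lsat(idx),gpa(idx));   % pairs are resampled together
end
rng(savedState)                     % restore the generator state

%%
% Resampling the same row indices from both vectors keeps each school's LSAT
% and GPA values paired, which is what a correlation estimate requires.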

%%
% Nearly all the estimates lie in the interval [0.4 1.0].
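
%%
% A few simple summaries of the bootstrap estimates can back up this visual
% impression. The sketch below is one possible set of summaries, and the
% names |seBoot|, |fracInRange|, and |ciPct| are illustrative. The
% percentile range shown here is a basic summary of the resampled values
% and is not an adjusted confidence interval.
seBoot = std(rhos1000)                                % bootstrap standard error
fracInRange = mean(rhos1000 >= 0.4 & rhos1000 <= 1)   % share of estimates in [0.4 1.0]
ciPct = prctile(rhos1000,[2.5 97.5])                  % middle 95% of the estimates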

%%
% It is often desirable to construct a confidence interval for a parameter
% estimate in statistical inference. Using the |bootci| function, you can
% use bootstrapping to obtain a confidence interval for the correlation
% coefficient between the |lsat| and |gpa| data.
ci = bootci(5000,@corr,lsat,gpa)

%%
% Therefore, a 95% confidence interval for the correlation coefficient
% between LSAT and GPA is [0.33 0.94]. Because this interval lies well above
% zero, it is strong quantitative evidence that LSAT and subsequent GPA are
% positively correlated. Moreover, this evidence does not require any strong
% assumptions about the probability distribution of the correlation
% coefficient.

%%
% Although the |bootci| function computes the bias-corrected and accelerated
% (BCa) interval by default, it can also compute other types of bootstrap
% confidence intervals, such as the studentized bootstrap confidence
% interval.
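
%%
% As a sketch of how other interval types can be requested, the calls below
% pass the |'Type'| option, which requires the cell-array form of the
% |bootci| inputs. The names |ciPer| and |ciStud| are illustrative. The
% studentized call uses 1000 resamples, an arbitrary choice made here only
% to shorten the run time, because that interval type also runs an inner
% bootstrap to estimate standard errors.
ciPer = bootci(5000,{@corr,lsat,gpa},'Type','per')        % percentile interval
ciStud = bootci(1000,{@corr,lsat,gpa},'Type','student')   % studentized interval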