www.gusucode.com > stats_featured 源码程序 matlab案例代码 > stats_featured/survivaldemo.m

    %% Analyzing Survival or Reliability Data
% This example shows how to analyze lifetime data with censoring.
% In biological or
% medical applications, this is known as survival analysis, and the times
% may represent the survival time of an organism or the time until a
% disease is cured.  In engineering applications, this is known as
% reliability analysis, and the times may represent the time to failure of
% a piece of equipment.
%
% Our example models the time to failure of a throttle from an automobile
% fuel injection system.

%   Copyright 2004-2014 The MathWorks, Inc.


%% Special Properties of Lifetime Data
% Some features of lifetime data distinguish them other types of data.
% First, the lifetimes are always positive values, usually representing
% time.  Second, some lifetimes may not be observed exactly, so that they
% are known only to be larger than some value.  Third, the distributions and
% analysis techniques that are commonly used are fairly specific to
% lifetime data
%
% Let's simulate the results of testing 100 throttles until failure.  We'll
% generate data that might be observed if most throttles had a fairly long
% lifetime, but a small percentage tended to fail very early.
rng(2,'twister');
lifetime = [wblrnd(15000,3,90,1); wblrnd(1500,3,10,1)];

%%
% In this example, assume that we are testing the throttles under stressful
% conditions, so that each hour of testing is equivalent to 100 hours of
% actual use in the field.  For pragmatic reasons, it's often the case that
% reliability tests are stopped after a fixed amount of time.  For this
% example, we will use 140 hours, equivalent to a total of 14,000 hours of
% real service.  Some items fail during the test, while others survive the
% entire 140 hours.  In a real test, the times for the latter would be
% recorded as 14,000, and we mimic this in the simulated data.  It is also
% common practice to sort the failure times.
T = 14000;
obstime = sort(min(T, lifetime));

%%
% We know that any throttles that survive the test will fail eventually,
% but the test is not long enough to observe their actual time to failure.
% Their lifetimes are only known to be greater than 14,000 hours.  These
% values are said to be censored.  This plot shows that about 40% of our
% data are censored at 14,000.
failed = obstime(obstime<T); nfailed = length(failed);
survived = obstime(obstime==T); nsurvived = length(survived);
censored = (obstime >= T);
plot([zeros(size(obstime)),obstime]', repmat(1:length(obstime),2,1), ...
     'Color','b','LineStyle','-')
line([T;3e4], repmat(nfailed+(1:nsurvived), 2, 1), 'Color','b','LineStyle',':');
line([T;T], [0;nfailed+nsurvived],'Color','k','LineStyle','-')
text(T,30,'<--Unknown survival time past here')
xlabel('Survival time'); ylabel('Observation number')

%% Ways of Looking at Distributions
% Before we examine the distribution of the data, let's consider different
% ways of looking at a probability distribution.
%
% * A probability density function (PDF) indicates the relative probability
%   of failure at different times.
% * A survivor function gives the probability of survival as a
%   function of time, and is simply one minus the cumulative distribution
%   function (1-CDF).
% * The hazard rate gives the instantaneous probability of failure
%   given survival to a given time.  It is the PDF divided by the survivor
%   function.  In this example the hazard rates turn out to be increasing,
%   meaning the items are more susceptible to failure as time passes
%   (aging).
% * A probability plot is a re-scaled CDF, and is used to compare data
%   to a fitted distribution.
%
% Here are examples of those four plot types, using the Weibull
% distribution to illustrate.  The Weibull is a common distribution for
% modeling lifetime data.
x = linspace(1,30000);
subplot(2,2,1);
plot(x,wblpdf(x,14000,2),x,wblpdf(x,18000,2),x,wblpdf(x,14000,1.1))
title('Prob. Density Fcn')
subplot(2,2,2);
plot(x,1-wblcdf(x,14000,2),x,1-wblcdf(x,18000,2),x,1-wblcdf(x,14000,1.1))
title('Survivor Fcn')
subplot(2,2,3);
wblhaz = @(x,a,b) (wblpdf(x,a,b) ./ (1-wblcdf(x,a,b)));
plot(x,wblhaz(x,14000,2),x,wblhaz(x,18000,2),x,wblhaz(x,14000,1.1))
title('Hazard Rate Fcn')
subplot(2,2,4);
probplot('weibull',wblrnd(14000,2,40,1))
title('Probability Plot')

%% Fitting a Weibull Distribution
% The Weibull distribution is a generalization of the exponential
% distribution.  If lifetimes follow an exponential distribution, then they
% have a constant hazard rate.  This means that they do not age, in the
% sense that the probability of observing a failure in an interval, given
% survival to the start of that interval, doesn't depend on where the
% interval starts.  A Weibull distribution has a hazard rate that may
% increase or decrease.
%
% Other distributions used for modeling lifetime data include the
% lognormal, gamma, and Birnbaum-Saunders distributions.
%
% We will plot the empirical cumulative distribution function of our data,
% showing the proportion failing up to each possible survival time.  The
% dotted curves give 95% confidence intervals for these probabilities.
subplot(1,1,1);
[empF,x,empFlo,empFup] = ecdf(obstime,'censoring',censored);
stairs(x,empF);
hold on;
stairs(x,empFlo,':'); stairs(x,empFup,':');
hold off
xlabel('Time'); ylabel('Proportion failed'); title('Empirical CDF')

%%
% This plot shows, for instance, that the proportion failing by time 4,000
% is about 12%, and a 95% confidence bound for the probability of failure
% by this time is from 6% to 18%.  Notice that because our test only ran
% 14,000 hours, the empirical CDF only allows us to compute failure
% probabilities out to that limit.  Almost half of the data were censored at
% 14,000, and so the empirical CDF only rises to about 0.53, instead of
% 1.0.
%
% The Weibull distribution is often a good model for equipment failure. The
% function |wblfit| fits the Weibull distribution to data, including data
% with censoring.  After computing parameter estimates, we'll evaluate the
% CDF for the fitted Weibull model, using those estimates.  Because the CDF
% values are based on estimated parameters, we'll compute confidence bounds
% for them.
paramEsts = wblfit(obstime,'censoring',censored);
[nlogl,paramCov] = wbllike(paramEsts,obstime,censored);
xx = linspace(1,2*T,500);
[wblF,wblFlo,wblFup] = wblcdf(xx,paramEsts(1),paramEsts(2),paramCov);

%%
% We can superimpose plots of the empirical CDF and the fitted CDF, to
% judge how well the Weibull distribution models the throttle reliability
% data.
stairs(x,empF);
hold on
handles = plot(xx,wblF,'r-',xx,wblFlo,'r:',xx,wblFup,'r:');
hold off
xlabel('Time'); ylabel('Fitted failure probability'); title('Weibull Model vs. Empirical')

%%
% Notice that the Weibull model allows us to project out and compute
% failure probabilities for times beyond the end of the test.  However, it
% appears the fitted curve does not match our data well.  We have too many
% early failures before time 2,000 compared with what the Weibull model
% would predict, and as a result, too few for times between about 7,000 and
% about 13,000.  This is not surprising -- recall that we generated data
% with just this sort of behavior.

%% Adding a Smooth Nonparametric Estimate
% The pre-defined functions provided with the
% Statistics and Machine Learning Toolbox(TM) don't
% include any distributions that have an excess of early failures like
% this.  Instead, we might want to draw a smooth, nonparametric curve
% through the empirical CDF, using the function |ksdensity|.  We'll remove
% the confidence bands for the Weibull CDF, and add two curves, one with
% the default smoothing parameter, and one with a smoothing parameter 1/3
% the default value.  The smaller smoothing parameter makes the curve
% follow the data more closely.
delete(handles(2:end))
[npF,ignore,u] = ksdensity(obstime,xx,'cens',censored,'function','cdf');
line(xx,npF,'Color','g');
npF3 = ksdensity(obstime,xx,'cens',censored,'function','cdf','width',u/3);
line(xx,npF3,'Color','m');
xlim([0 1.3*T])
title('Weibull and Nonparametric Models vs. Empirical')
legend('Empirical','Fitted Weibull','Nonparametric, default','Nonparametric, 1/3 default', ...
       'location','northwest');

%%
% The nonparametric estimate with the smaller smoothing parameter matches the
% data well.  However, just as for the empirical CDF, it is not possible
% to extrapolate the nonparametric model beyond the end of the test -- the
% estimated CDF levels off above the last observation.
%
% Let's compute the hazard rate for this nonparametric fit and plot it over
% the range of the data.
hazrate = ksdensity(obstime,xx,'cens',censored,'width',u/3) ./ (1-npF3);
plot(xx,hazrate)
title('Hazard Rate for Nonparametric Model')
xlim([0 T])

%%
% This curve has a bit of a "bathtub" shape, with a hazard rate that is
% high near 2,000, drops to lower values, then rises again.  This is typical
% of the hazard rate for a component that is more susceptible to failure
% early in its life (infant mortality), and again later in its life
% (aging).
%
% Also notice that the hazard rate cannot be estimated above the largest
% uncensored observation for the nonparametric model, and the graph drops
% to zero.

%% Alternative Models
% For the simulated data we've used for this example, we found that a
% Weibull distribution was not a suitable fit.  We were able to fit the
% data well with a nonparametric fit, but that model was only useful within
% the range of the data.
%
% One alternative would be to use a different parametric distribution.  The
% Statistics and Machine Learning Toolbox includes functions for other common lifetime
% distributions such as the lognormal, gamma, and Birnbaum-Saunders, as well
% as many other distributions that are not commonly used in lifetime
% models.  You can also define and fit custom parametric models to lifetime
% data, as described in the <docid:stats_examples.example-ex99562812> example.
%
% Another alternative would be to use a mixture of two parametric
% distributions -- one representing early failure and the other
% representing the rest of the distribution.  Fitting mixtures of
% distributions is described in the <docid:stats_examples.example-ex48933917> example.