
%% Weighted Nonlinear Regression
% This example shows how to fit a nonlinear regression model for data with
% nonconstant error variance.
%
% Regular nonlinear least squares algorithms are appropriate when
% measurement errors all have the same variance.  When that assumption is
% not true, it is appropriate to use a weighted fit. This example shows
% how to use weights with the |fitnlm| function.

%   Copyright 2005-2012 The MathWorks, Inc.


%% Data and Model for the Fit
% We'll use data collected to study water pollution caused by industrial
% and domestic waste.  These data are described in detail in Box, G.E.P.,
% W.G. Hunter, and J.S. Hunter, Statistics for Experimenters (Wiley, 1978,
% pp. 483-487).  The response variable is biochemical oxygen demand in
% mg/l, and the predictor variable is incubation time in days.
x = [1 2 3 5 7 10]';
y = [109 149 149 191 213 224]';

plot(x,y,'ko');
xlabel('Incubation (days), x'); ylabel('Biochemical oxygen demand (mg/l), y');

%%
% We'll assume that it is known that the first two observations were made
% with less precision than the remaining observations.  They might, for
% example, have been made with a different instrument.  Another common
% reason to weight data is that each recorded observation is actually
% the mean of several measurements taken at the same value of x.  In the
% data here, suppose the first two values represent a single raw
% measurement, while the remaining four are each the mean of 5 raw
% measurements.  Then it would be appropriate to weight by the number of
% measurements that went into each observation.
w = [1 1 5 5 5 5]';

%%
% The model we'll fit to these data is a scaled exponential curve that
% becomes level as x becomes large.
modelFun = @(b,x) b(1).*(1-exp(-b(2).*x));

%%
% Just based on a rough visual fit, it appears that a curve drawn through
% the points might level out at a value of around 240 somewhere in the
% neighborhood of x = 15.  So we'll use 240 as the starting value for b1,
% and since e^(-.5*15) is small compared to 1, we'll use .5 as the starting
% value for b2.
start = [240; .5];
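%%
% As a quick sanity check on that reasoning, e^(-0.5*15) is indeed
% negligible compared to 1:
exp(-0.5*15)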

%% Fit the Model without Weights
% The danger in ignoring measurement error is that the fit may be overly
% influenced by imprecise measurements, and may therefore not provide a
% good fit to measurements that are known precisely. Let's fit the data
% without weights and compare it to the points.
nlm = fitnlm(x,y,modelFun,start);
xx = linspace(0,12)';
line(xx,predict(nlm,xx),'linestyle','--','color','k')

%% Fit the Model with Weights
% Notice that the fitted curve is pulled toward the first two points, but
% seems to miss the trend of the other points. Let's try repeating the fit
% using weights.
wnlm = fitnlm(x,y,modelFun,start,'Weight',w)
line(xx,predict(wnlm,xx),'color','b')

%%
% The estimated population standard deviation in this case describes the
% average variation for a "standard" observation with a weight, or
% measurement precision, of 1.
wnlm.RMSE
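%%
% As a rough cross-check, we can reproduce that value from the raw
% residuals. This sketch assumes |fitnlm| computes the RMSE as the square
% root of the weighted sum of squared residuals divided by the error
% degrees of freedom:
sqrt(sum(w .* wnlm.Residuals.Raw.^2) / wnlm.DFE)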

%% 
% An important part of any analysis is an estimate of the precision of the
% model fit.  The coefficient display shows standard errors for the
% parameters, but we can also compute confidence intervals for them.
coefCI(wnlm)
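%%
% Those intervals follow the usual t-based formula: each estimate plus or
% minus a t critical value times its standard error. A sketch of that
% calculation, assuming the default 95% level and |wnlm.DFE| error degrees
% of freedom:
est   = wnlm.Coefficients.Estimate;
se    = wnlm.Coefficients.SE;
tcrit = tinv(0.975, wnlm.DFE);       % two-sided 95% critical value
[est - tcrit*se, est + tcrit*se]     % should roughly match coefCI(wnlm)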

%% Estimate the Response Curve
% Next, we'll compute the fitted response values and confidence intervals
% for them. By default, those intervals are pointwise confidence
% bounds for the predicted value, but here we will request simultaneous
% intervals for the entire curve.
[ypred,ypredci] = predict(wnlm,xx,'Simultaneous',true);
plot(x,y,'ko', xx,ypred,'b-', xx,ypredci,'b:');
xlabel('x'); ylabel('y');
legend({'Data', 'Weighted fit', '95% Confidence Limits'},'location','SouthEast');
%%
% Notice that the two downweighted points are not fit as well by the curve
% as the remaining points.  That's as you would expect for a weighted fit.
%
% It's also possible to estimate prediction intervals
% for future observations at specified values of x.  Those intervals
% will in effect assume a weight, or measurement precision, of 1.
[ypred,ypredci] = predict(wnlm,xx,'Simultaneous',true,'Prediction','observation');
plot(x,y,'ko', xx,ypred,'b-', xx,ypredci,'b:');
xlabel('x'); ylabel('y');
legend({'Data', 'Weighted fit', '95% Prediction Limits'},'location','SouthEast');

%%
% The absolute scale of the weights actually doesn't affect the parameter
% estimates. Rescaling the weights by any constant would have given us the
% same estimates. But the scale does affect the confidence bounds, since the
% bounds represent an observation with weight 1. Here you can see that the
% points with higher weight seem too close to the fitted line, compared
% with the confidence limits.
%
% While the |predict| method doesn't allow us to change the weights, it is
% possible for us to do some post-processing and investigate how the curve
% would look for a more precise estimate. Suppose we are interested in a
% new observation that is based on the average of five measurements, just
% like the last four points in this plot. We could reduce the width of the
% intervals by a factor of sqrt(5).

halfwidth = ypredci(:,2)-ypred;
newwidth = halfwidth/sqrt(5);
newci = [ypred-newwidth, ypred+newwidth];
plot(x,y,'ko', xx,ypred,'b-', xx,newci,'r:');
xlabel('x'); ylabel('y');
legend({'Data', 'Weighted fit', 'Limits for weight=5'},'location','SouthEast');
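%%
% As a check on the earlier claim that rescaling the weights leaves the
% parameter estimates unchanged, we can refit with the weights multiplied
% by an arbitrary constant and compare coefficients. (Only quantities
% referenced to an observation of weight 1, such as the RMSE and the
% prediction bounds, would change.)
wnlm10 = fitnlm(x,y,modelFun,start,'Weight',10*w);
[wnlm.Coefficients.Estimate, wnlm10.Coefficients.Estimate]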


%% Residual Analysis
% In addition to plotting the data and the fit, we'll plot residuals from a
% fit against the predictors, to diagnose any problems with the model.
% The residuals should appear independent and identically distributed but
% with a variance proportional to the inverse of the weights. We can
% standardize this variance to make the plot easier to interpret.
r = wnlm.Residuals.Raw;
plot(x,r.*sqrt(w),'b^');
xlabel('x'); ylabel('Weighted residuals, sqrt(w)*(y - yFit)');
%%
% There is some evidence of systematic patterns in this residual plot.  Notice
% how the last four residuals have a linear trend, suggesting that the model
% might not increase fast enough as x increases.  Also, the magnitude of the
% residuals tends to decrease as x increases, suggesting that measurement
% error may depend on x.  These deserve investigation; however, there are so
% few data points that it's hard to attach significance to these apparent
% patterns.