
%% Linear Regression
% Linear regression models the relation between a dependent, or response,
% variable $y$ and one or more independent, or predictor, variables $x_1, ..., x_n$.
% Simple linear regression considers only one independent variable using the
% relation
% 
% $$y = \beta_0 + \beta_1x + \epsilon,$$
% 
% where $\beta_0$ is the y-intercept, $\beta_1$ is the slope (or regression
% coefficient), and $\epsilon$ is the error term.
% 
% Multiple (or general) linear regression considers multiple independent variables, where
% 
% $$y = \beta_0 + \beta_1x_1 + ... + \beta_nx_n + \epsilon.$$
% 
% You can model a curve by using higher powers of $x$ in place of $x_1,
% ..., x_n$. The relation is now 
% 
% $$y = \beta_0 + \beta_1x + \beta_2x^2 + ... + \beta_nx^n + \epsilon.$$
% 
% Linear regression is defined as being linear in the coefficients
% $\beta_0, ..., \beta_n$. The independent variables $x_1, ..., x_n$ can
% enter the model nonlinearly.
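%
% For example, $y = \beta_0 + \beta_1\log(x) + \epsilon$ is a linear
% regression model because it is linear in the coefficients $\beta_0$ and
% $\beta_1$, even though it is nonlinear in $x$.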
%% Preparing Data
% Before running a linear regression, you might need to import your data
% into MATLAB(R).
% 
% To import data from a *text file*, see <docid:import_export.f5-35378>.
% 
% To import data from an *Excel(R)* spreadsheet, see
% <docid:import_export.bs5bkj_>.
%
%% Simple Linear Regression Using the |\| Operator
% You can find the linear regression relation between dataset $X$ for the
% independent variable and dataset $Y$ for 
% the dependent variable using the <docid:matlab_ref.btg5qam> operator.
%
% The dataset |accidents| contains data for fatal traffic accidents in U.S.
% states. Load accident data in |y| and state population data in |x|. 
% Find the linear regression relation $y = \beta_1x$ between the accidents in a state and
% a state's population using the |\| operator. The |\| operator performs
% a least-squares regression. 
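% 
% For an overdetermined system such as this one, the |\| operator returns
% the least-squares solution. For the one-parameter model $y = \beta_1x$,
% that solution is
% 
% $$\hat{\beta_1} = \frac{x^Ty}{x^Tx}.$$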

% Copyright 2015 The MathWorks, Inc.

load accidents
x = hwydata(:,14); % Population of states
y = hwydata(:,4);  % Accidents per state
b1 = x\y
%% 
% |b1| is the slope, or regression coefficient. The linear relation is |y = b1*x = 1.3720e-04*x|.
% 
% Calculate the accidents per state |yCalc| from |x| using
% the relation. Visualize the regression by plotting the actual values
% |y| and the calculated values |yCalc|.
yCalc = b1*x;
h1 = figure(1);
plot(x,y,x,yCalc)
xlabel('Population of state')
ylabel('Fatal traffic accidents per state')
title('Linear Regression Relation Between Accidents & Population')
grid on
%% 
% You can improve the fit by including a y-intercept $\beta_0$ in your model as $y =
% \beta_0 + \beta_1x$. Calculate the y-intercept by padding |x| with a
% column of ones and using the |\| operator. 
X = [ones(length(x),1) x];
b = X\y
%%
% This result represents the relation |y = b0 + b1*x = 142.7120 +
% 1.2564e-04*x|.
% 
% Visualize the relation by plotting it on the same figure. The plot shows that including a
% y-intercept in the relation improves the fit.
yCalc = X*b;
hold on
h = plot(x,yCalc,'--');
legend('Data','Slope','Slope & Intercept','Location','best');
%% Evaluate Model Using Residuals, $R^2$ (R-Squared), and Adjusted $R^2$
% Residuals are the differences between the predicted values and the actual
% values. A pattern in the residuals means your model can be improved.
% 
% Calculate and plot the residuals for the simple linear regression.
yResi = yCalc - y;
h2 = figure(2);
scatter(x,yResi,'+');
%%
% The residuals show that the model is a good fit at the extremes but loses
% accuracy between them. Thus, the model can be improved.
% 
% The coefficient of determination, $R^2$, is another measure of how useful the
% model is in predicting the data. $R^2$ falls between $0$ and $1$. The
% higher the value, the better the model is at predicting the data.
% 
% For simple linear regression, R-squared is defined as the square of the correlation
% coefficient $R$ between $Y$ and $X$. Repeat the simple linear regression example, 
% calculate |R| using the <docid:matlab_ref.f80-999628> function, and find R-squared.
b1 = x\y;           % slope of the no-intercept model
yCalc = b1*x;       % calculated values
R = corrcoef(x,y);  % 2-by-2 correlation matrix
Rsq = R(1,2)^2      % square the off-diagonal correlation coefficient
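%%
% For a model that includes an intercept term, squaring the correlation
% coefficient is equivalent to computing $1 - SS_{resid}/SS_{total}$. For
% the no-intercept model used here, the two definitions of $R^2$ can
% differ.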
%%
% Adjusted $R^2$ is defined as
% 
% $$ R^2_{adj} = 1 -
% \frac{SS_{resid}}{SS_{total}}\biggl(\frac{n_{obs}-1}{n_{obs} - d -
% 1}\biggr),$$
% 
% where $SS_{resid}$ is the sum of the squared residuals, $SS_{total}$ is
% the sum of the squared differences of the data from its mean, $n_{obs}$
% is the number of observations, and $d$ is the number of independent
% variables in the model. The residuals are the differences between the
% calculated values and the data.
% 
% Find the adjusted $R^2$.
SSresid = sum((yCalc - y).^2);   % sum of squared residuals
SStotal = (length(y)-1)*var(y);  % equals sum((y - mean(y)).^2)
deg = length(b1);                % d = 1: one independent variable
Radj = 1 - SSresid/SStotal*((length(y)-1)/(length(y)-deg-1))
%% Linear Regression by Fitting a Curve
% To improve your fit, you can use a higher-degree polynomial. This section
% shows how to fit polynomial models using the |\| operator. If you have
% only one independent variable, you can use |polyfit| instead, as sketched
% at the end of this section. For details, see <docid:matlab_math.f1-8450>. 
% 
% A polynomial regression model is linear because it is linear in the
% coefficients, even though its terms are nonlinear in $x$. Choose a
% fourth-degree polynomial model
% 
% $$y = \beta_0 + \beta_1x + \beta_2x^2 + \beta_3x^3 + \beta_4x^4.$$
% 
% Scale down |x| to avoid a rank-deficient matrix. If you do not scale |x|,
% the |\| operator warns that your matrix is rank deficient.
xSc = x/10000;
%% 
% Form the design matrix |X|.
X = [ones(length(xSc),1) xSc xSc.^2 xSc.^3 xSc.^4];
%% 
% Perform linear regression using the |\| operator.
B = X\y;
%%
% Plot the result on the first figure.
yCalc = X*B;
figure(h1)
plot(x,yCalc,'-.')
legend('Data','Slope','Slope & Intercept','Polynomial fit','Location','best')
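%%
% If you have a single independent variable, |polyfit| performs the same
% kind of least-squares polynomial fit. A minimal sketch using its
% three-output form, which centers and scales the data internally (much
% like the manual scaling above):
[p,S,mu] = polyfit(x,y,4);   % coefficients of the centered and scaled fit
yFit = polyval(p,x,[],mu);   % evaluate with the same centering and scaling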
%% Linear Regression Using a Custom Equation
% You can also fit the data using a custom equation. Use the equation
% 
% $$y = \beta_0 + \beta_1x\sin(x) + \beta_2\log(x).$$
% 
% Construct the design matrix |X|.
X = [ones(length(x),1) x.*sin(x) log(x)];
%% 
% Perform linear regression using the |\| operator and plot the result.
B = X\y;
yCalc = X*B;
plot(x,yCalc,':')
legend('Data','Slope','Slope & Intercept','Polynomial fit','Custom equation','Location','best')
%% Multiple Linear Regression Using the |\| Operator
% Multiple linear regression relates a dependent variable $y$ to
% multiple independent variables $x_1, ..., x_n$ as
% 
% $$y = \beta_0 + \beta_1x_1 + ... + \beta_nx_n + \epsilon.$$
% 
% Model accidents per state using the independent variables of population
% per state and vehicles per state. First, load data from |hwydata|.
x = hwydata(:,14); % population
y = hwydata(:,6); % vehicles
z = hwydata(:,4); % accidents
%%
% Concatenate |x| and |y| into a matrix. Pad the matrix with ones to
% include a y-intercept in the model. Calculate the coefficients of the linear model using
% the |\| operator. Use |format long| to view the result.
M = [ones(length(x),1) x y];
format long
v = M\z
%% 
% Reset |format| to the default.
format short
%% 
% Calculate accident values from the model. Compare the actual values and
% calculated values by plotting them on the same figure.
zCalc = M*v;
clf
plot3(x,y,z,'LineWidth',1)
hold on
plot3(x,y,zCalc,'--','LineWidth',1)
view(-11,38)
grid on
xlabel('Population of State')
ylabel('Vehicles in State')
zlabel('Accidents')
title('Linear Regression of Accidents in States from Population and Vehicles')
%% Evaluate Model Using Residuals, $R^2$ (R-Squared), and Adjusted $R^2$
% Calculate and plot the residuals for the multiple linear regression.
zResi = zCalc - z;
figure(h2)
subplot(2,1,1)
scatter(x,zResi)
subplot(2,1,2)
scatter(y,zResi)
%% 
% The residuals show that the model is a good fit at the extremes but loses
% accuracy between them. Thus, the model can be improved.
% 
% Calculate $R^2$ for multiple linear regression by squaring the
% correlation coefficient between the actual values and the calculated values.
% 
% Find the correlation coefficient using |corrcoef|. Calculate $R^2$
% for the multiple linear regression previously performed. 
R = corrcoef(z,zCalc);
Rsq = R(1,2)^2
%% 
% As previously defined, adjusted $R^2$ is
% 
% $$ R^2_{adj} = 1 -
% \frac{SS_{resid}}{SS_{total}}\biggl(\frac{n_{obs}-1}{n_{obs} - d -
% 1}\biggr).$$
% 
% Calculate adjusted $R^2$ for multivariate regression.
SSresid = sum((zCalc - z).^2);    % sum of squared residuals
SStotal = (length(z) - 1)*var(z); % equals sum((z - mean(z)).^2)
deg = length(v) - 1;              % d = 2: two independent variables (exclude the intercept)
Radj = 1 - SSresid/SStotal*((length(z)-1)/(length(z)-deg-1))
%% 
% This value of adjusted $R^2$ is higher than the adjusted $R^2$ of
% |0.8186| with one independent variable. The higher value of
% adjusted $R^2$ indicates that the two-variable model is a better
% predictor than the one-variable model. However, you should use
% adjusted $R^2$ in conjunction with other measures to determine how good
% your model is.
%% Linear Regression Using the Basic Fitting Tool
% The MATLAB Basic Fitting GUI allows you to interactively:
%% 
% * Model data using a spline interpolant, a shape-preserving interpolant, or a polynomial up to the tenth degree
% * Plot one or more fits together with data
% * Plot the residuals of the fits
% * Compute model coefficients
% * Compute the norm of the residuals (a statistic you can use to analyze how well a model fits your data)
% * Use the model to interpolate within the data range or extrapolate beyond it
% * Save coefficients and computed values to the MATLAB workspace for use outside of the GUI
% * Generate MATLAB code to recompute fits and reproduce plots with new data
% 
% For details, see <docid:data_analysis.f1-15377>.
%% Advanced Functionality Using Statistics or Curve Fitting Toolboxes
% The Statistics and Machine Learning Toolbox(TM) and Curve Fitting Toolbox(TM)
% provide advanced regression functionality.
%
% The Curve Fitting Toolbox supports over 100 regression
% models. All of these standard regression models include optimized solver 
% parameters and starting conditions to improve fit quality. Supported models include:
%% 
% * Lines and planes
% * High-order polynomials (up to ninth degree for curves and fifth degree for surfaces)
% * Fourier and power series
% * Gaussians
% * Weibull functions
% * Exponentials
% * Rational functions
% * Sum of sines
% * Custom Equation option to specify your own regression model
% 
% For information on linear regression in the Curve Fitting Toolbox, see
% <docid:curvefit_doc_center.linear-and-nonlinear-regression-103> and 
% <docid:curvefit_ug.bszh0l1>.
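% 
% For example, a minimal sketch (assuming the Curve Fitting Toolbox is
% installed) that fits a line to the accident data:
% 
%   f = fit(x,z,'poly1')   % least-squares line with 95% confidence bounds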
% 
% The Statistics and Machine Learning Toolbox offers several types of
% linear regression models and fitting methods, including: 
%% 
% * Simple: model with only one predictor
% * Multiple: model with multiple predictors
% * Multivariate: model with multiple response variables
% * Robust: model in the presence of outliers
% * Stepwise: model with automatic variable selection
% * Regularized: model that can deal with redundant predictors and prevent overfitting using ridge, lasso, and elastic net algorithms 
% 
% For information on linear regression in the Statistics and Machine Learning
% Toolbox, see <docid:stats_ug.bs9kc51>.
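% 
% For example, a minimal sketch (assuming the Statistics and Machine Learning
% Toolbox is installed) that repeats the multiple linear regression with |fitlm|:
% 
%   mdl = fitlm([x y],z)   % adds an intercept term and reports coefficient
%                          % statistics, R-squared, and adjusted R-squared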