www.gusucode.com > stats 源码程序 matlab案例代码 > stats/LinearRegressionWorkflowExample.m
%% Linear Regression Workflow % This example shows how to fit a linear regression model. A typical workflow % involves the following: import data, fit a regression, test its quality, % modify it to improve the quality, and share it. % Copyright 2015 The MathWorks, Inc. %% Step 1. Import the data into a dataset array. % |hospital.xls| is an Excel(R) spreadsheet containing patient names, sex, % age, weight, blood pressure, and dates of treatment in an experimental % protocol. First read the data into a table. patients = readtable('hospital.xls',... 'ReadRowNames',true); %% % Examine the first row of data. patients(1,:) %% % The |sex| and |smoke| fields seem to have two choices each. So change % these fields to nominal. patients.smoke = nominal(patients.smoke,{'No','Yes'}); patients.sex = nominal(patients.sex); %% Step 2. Create a fitted model. % Your goal is to model the systolic pressure as a function of a patient's % age, weight, sex, and smoking status. Create a linear formula for |'sys'| % as a function of |'age'|, |'wgt'|, |'sex'|, and |'smoke'| . modelspec = 'sys ~ age + wgt + sex + smoke'; mdl = fitlm(patients,modelspec) %% % The sex, age, and weight predictors have rather high $p$-values, indicating % that some of these predictors might be unnecessary. %% Step 3. Locate and remove outliers. % See if there are outliers in the data that should be excluded from the % fit. Plot the residuals. plotResiduals(mdl) %% % There is one possible outlier, with a value greater than 12. This is probably % not truly an outlier. For demonstration, here is how to find and remove it. %% % Find the outlier. outlier = mdl.Residuals.Raw > 12; find(outlier) %% % Remove the outlier. mdl = fitlm(patients,modelspec,... 'Exclude',84); mdl.ObservationInfo(84,:) %% % Observation 84 is no longer in the model. %% Step 4. Simplify the model. % Try to obtain a simpler model, one with fewer predictors but the same % predictive accuracy. |step| looks for a better model by adding or removing % one term at a time. Allow |step| take up to 10 steps. mdl1 = step(mdl,'NSteps',10) %% % |step| took two steps. This means it could not improve the model further % by adding or subtracting a single term. %% % Plot the effectiveness of the simpler model on the training data. plotResiduals(mdl1) %% % The residuals look about as small as those of the original model. %% Step 5. Predict responses to new data. % Suppose you have four new people, aged 25, 30, 40, and 65, and the first % and third smoke. Predict their systolic pressure using |mdl1|. ages = [25;30;40;65]; smoker = {'Yes';'No';'Yes';'No'}; systolicnew = feval(mdl1,ages,smoker) %% % To make predictions, you need only the variables that |mdl1| uses. %% Step 6. Share the model. % You might want others to be able to use your model for prediction. Access % the terms in the linear model. coefnames = mdl1.CoefficientNames %% % View the model formula. mdl1.Formula %% % Access the coefficients of the terms. coefvals = mdl1.Coefficients(:,1); % table coefvals = table2array(coefvals) %% % The model is |sys = 115.1066 + 0.1078*age + 10.0540*smoke|, where |smoke| % is |1| for a smoker, and |0| otherwise.