www.gusucode.com > demos工具箱matlab源码程序 > demos/LogitMapReduceExample.m

    %% Using MapReduce to Fit a Logistic Regression Model
% This example shows how to use |mapreduce| to carry out simple logistic
% regression using a single predictor. It demonstrates chaining multiple
% |mapreduce| calls to carry out an iterative algorithm. Since each
% iteration requires a separate pass through the data, an anonymous
% function passes information from one iteration to the next to supply
% information directly to the mapper.

% Copyright 1984-2014 The MathWorks, Inc.

%% Prepare Data
% Create a datastore using the |airlinesmall.csv| data set. This 12
% megabyte data set contains 29 columns of flight information for several
% airline carriers, including arrival and departure times. In this example,
% the variables of interest are |ArrDelay| (flight arrival delay) and
% |Distance| (total flight distance).
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay', 'Distance'}

%%
% |tabularTextDatastore| returns a |TabularTextDatastore| object for the data. This
% datastore treats |'NA'| strings as missing, and replaces the missing
% values with |NaN| values by default. Additionally, the
% |SelectedVariableNames| property allows you to work with only the
% specified variables of interest, which you can verify using |preview|.
preview(ds)

%% Perform Logistic Regression
% Logistic regression is a way to model the probability of an event as a
% function of another variable. In this example, logistic regression models
% the probability of a flight being more than 20 minutes late as a function
% of the flight distance, in thousands of miles.
%
% To accomplish this logistic regression, the mapper and reducer functions
% must collectively perform a weighted least-squares regression based on
% the current coefficient values. The mapper function computes a weighted
% sum of squares and cross product for each chunk of input data. 

%%
% Display the mapper function file.
type logitMapper

%%
% The reducer function computes the regression coefficient estimates from
% the sums of squares and cross products.

%%
% Display the reducer function file.
type logitReducer

%% Run MapReduce
% Run |mapreduce| iteratively by enclosing the calls to |mapreduce| in a
% loop. The loop runs until the convergence criteria are met, with a
% maximum of five iterations.

% Define the coefficient vector, starting as empty for the first iteration.
b = []; 

for iteration = 1:5
    b_old = b;
    iteration
    
    % Here we will use an anonymous function as our mapper. This function
    % definition includes the value of b computed in the previous
    % iteration.
    mapper = @(t,ignore,intermKVStore) logitMapper(b,t,ignore,intermKVStore);
    result = mapreduce(ds, mapper, @logitReducer, 'Display', 'off');
    
    tbl = readall(result);
    b = tbl.Value{1}
    
    % Stop iterating if we have converged.
    if ~isempty(b_old) && ...
       ~any(abs(b-b_old) > 1e-6 * abs(b_old))
       break
    end
end

%% View Results
% Use the resulting regression coefficient estimates to plot a probability
% curve. This curve shows the probability of a flight being more than 20
% minutes late as a function of the flight distance.
xx = linspace(0,4000);
yy = 1./(1+exp(-b(1)-b(2)*(xx/1000)));
plot(xx,yy); 
xlabel('Distance');
ylabel('Prob[Delay>20]')