%% Compute Mean Value with MapReduce
% This example shows how to compute the mean of a single variable in a
% data set using |mapreduce|. It demonstrates a simple use of |mapreduce|
% with one key, minimal computation, and an intermediate state
% (accumulating intermediate sum and count).

% Copyright 1984-2014 The MathWorks, Inc.
%% Prepare Data
% Create a datastore using the |airlinesmall.csv| data set. This
% 12-megabyte data set contains 29 columns of flight information for
% several airline carriers, including arrival and departure times. In this
% example, select |ArrDelay| (flight arrival delay) as the variable of
% interest.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'ArrDelay';

%%
% The datastore treats |'NA'| values as missing, and replaces the missing
% values with |NaN| values by default. Additionally, the
% |SelectedVariableNames| property allows you to work with only the
% selected variable of interest, which you can verify using |preview|.
preview(ds)

%% Run MapReduce
% The |mapreduce| function requires a map function and a reduce function as
% inputs. The mapper receives chunks of data and outputs intermediate
% results. The reducer reads the intermediate results and produces a final
% result.
%% 
% In this example, the mapper finds the count and sum of the arrival delays
% in each chunk of data. The mapper then stores these values as the
% intermediate values associated with the key |'PartialCountSumDelay'|.
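%%
% As a rough illustration of this step, the following sketch computes the
% count and sum for one hypothetical chunk. The table |sampleChunk| and its
% values are made up for illustration, and dropping |NaN| entries before
% counting is an assumption about how missing delays are handled.
sampleChunk = table([10; NaN; -4; 6], 'VariableNames', {'ArrDelay'});
sampleDelays = sampleChunk.ArrDelay(~isnan(sampleChunk.ArrDelay)); % drop missing values
partCountSum = [numel(sampleDelays), sum(sampleDelays)]            % [count, sum] for this chunk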

%%
% Display the map function file.
%
% <include>meanArrivalDelayMapper.m</include>
%

%%
% The reducer accepts the count and sum for each chunk stored by the
% mapper. It sums up the values to obtain the total count and total sum.
% The overall mean arrival delay is the total sum divided by the total
% count. |mapreduce| calls this reducer only once, because the mapper adds
% only a single unique key. The reducer uses |add| to add a single
% key-value pair to the output.
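%%
% The accumulation described above can be sketched on in-memory data. The
% cell array |samplePartials| is a hypothetical stand-in for the
% intermediate values, each stored as |[count, sum]|; the real reducer
% receives them through a value iterator instead.
samplePartials = {[3 12], [2 -5], [4 20]};
totalCount = 0;
totalSum = 0;
for k = 1:numel(samplePartials)
    totalCount = totalCount + samplePartials{k}(1); % accumulate chunk counts
    totalSum = totalSum + samplePartials{k}(2);     % accumulate chunk sums
end
sampleMean = totalSum / totalCount                  % overall mean of the sample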

%%
% Display the reduce function file.
%
% <include>meanArrivalDelayReducer.m</include>
%

%%
% Use |mapreduce| to apply the map and reduce functions to the datastore,
% |ds|.
meanDelay = mapreduce(ds, @meanArrivalDelayMapper, @meanArrivalDelayReducer);

%%
% |mapreduce| returns an output datastore, |meanDelay|, and writes its
% files to the current folder.

%%
% Read the final result from the output datastore, |meanDelay|.
readall(meanDelay)
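
%%
% Because |airlinesmall.csv| is small enough to fit in memory, one way to
% sanity-check the result is to compute the mean directly. This sketch
% assumes the mapper excludes |NaN| delays, so it removes them here as well.
reset(ds);                                          % start reading from the beginning
allDelays = readall(ds);                            % table with the ArrDelay column only
knownDelays = allDelays.ArrDelay(~isnan(allDelays.ArrDelay));
directMean = mean(knownDelays)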