%% Compute Mean by Group Using MapReduce
% This example shows how to compute the mean by group in a data set using
% |mapreduce|. It demonstrates how to do computations on subgroups of data.

% Copyright 1984-2014 The MathWorks, Inc.

%% Prepare Data
% Create a datastore using the |airlinesmall.csv| data set. This
% 12-megabyte data set contains 29 columns of flight information for
% several airline carriers, including arrival and departure times. In this
% example, select |DayOfWeek| and |ArrDelay| (flight arrival delay) as the
% variables of interest.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay', 'DayOfWeek'};

%%
% The datastore treats |'NA'| values as missing, and replaces the missing
% values with |NaN| values by default. Additionally, the
% |SelectedVariableNames| property allows you to work with only the
% selected variables of interest, which you can verify using |preview|.
preview(ds)

%% Run MapReduce
% The |mapreduce| function requires a map function and a reduce function as
% inputs. The mapper receives chunks of data and outputs intermediate
% results. The reducer reads the intermediate results and produces a final
% result.

%%
% In this example, the mapper computes the count and sum of delays
% by the day of week in each chunk of data, and then stores the results as
% intermediate key-value pairs. The keys are integers (1 to 7) representing
% the days of the week and the values are two-element vectors representing
% the count and sum of the delay of each day.

%%
% Display the map function file.
%
% <include>meanArrivalDelayByDayMapper.m</include>
%

%%
% After the Map phase, |mapreduce| groups the intermediate key-value pairs
% by unique key (in this case, day of the week). Thus, each call to the
% reducer works on the values associated with one day of the week.
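
%%
% As a rough illustration of the per-chunk work described for the mapper,
% the following sketch builds the [count, sum] pair for each day of the
% week found in one small, made-up chunk. The table |demoChunk| and every
% other |demo*| name are hypothetical and exist only for this illustration.
% This is not the contents of |meanArrivalDelayByDayMapper.m|, just one way
% such a computation could be written.
demoChunk = table([10; NaN; -5; 20; 7], [1; 1; 2; 1; 2], ...
    'VariableNames', {'ArrDelay', 'DayOfWeek'});        % made-up data
demoChunk = demoChunk(~isnan(demoChunk.ArrDelay), :);   % drop missing delays
[demoDays, ~, demoIdx] = unique(demoChunk.DayOfWeek);   % days in this chunk
demoCount = accumarray(demoIdx, 1);                     % flights per day
demoSum   = accumarray(demoIdx, demoChunk.ArrDelay);    % total delay per day
demoPairs = table(demoDays, [demoCount, demoSum], ...
    'VariableNames', {'Key', 'CountAndSum'})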
%%
% The reducer receives a list of the intermediate counts and sums of delays
% for the day specified by the input key (|intermKey|) and adds them up
% into the total count, |n|, and the total sum, |s|. Then, the reducer
% calculates the overall mean and adds one final key-value pair to the
% output. This key-value pair represents the mean flight arrival delay for
% one day of the week.

%%
% Display the reduce function file.
%
% <include>meanArrivalDelayByDayReducer.m</include>
%

%%
% Use |mapreduce| to apply the map and reduce functions to the datastore,
% |ds|.
meanDelayByDay = mapreduce(ds, @meanArrivalDelayByDayMapper, ...
                               @meanArrivalDelayByDayReducer);

%%
% |mapreduce| returns a datastore, |meanDelayByDay|, with files in the
% current folder.

%%
% Read the final result from the output datastore, |meanDelayByDay|.
result = readall(meanDelayByDay)

%% Organize Results
% The integer keys (1 to 7) represent the days of the week. To make the
% results easier to read, convert the keys to a categorical array, extract
% the numeric values from the single-element cells, and rename the
% variables of the resulting table.
result.Key = categorical(result.Key, 1:7, ...
    {'Mon','Tue','Wed','Thu','Fri','Sat','Sun'});
result.Value = cell2mat(result.Value);
result.Properties.VariableNames = {'DayOfWeek', 'MeanArrDelay'}

%%
% Sort the rows of the table by mean flight arrival delay. This reveals
% that Saturday has the lowest mean arrival delay, making it the best day
% of the week to travel, whereas Friday has the highest and is the worst.
result = sortrows(result,'MeanArrDelay')
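
%% Sketch of the Map and Reduce Functions
% The files |meanArrivalDelayByDayMapper.m| and
% |meanArrivalDelayByDayReducer.m| are not reproduced in this script. The
% local functions below are a minimal sketch of what they might look like,
% based only on the descriptions in this example. The names
% |sketchMeanDelayMapper| and |sketchMeanDelayReducer| and all variables
% inside them are hypothetical. Local functions in a script are assumed to
% be available (R2016b or later); in earlier releases, save each function
% in its own file. To try the sketch, pass |@sketchMeanDelayMapper| and
% |@sketchMeanDelayReducer| to |mapreduce| in place of the original
% function handles.

function sketchMeanDelayMapper(data, ~, intermKVStore)
% Emit one [count, sum] pair for each day of the week present in the chunk.
valid = ~isnan(data.ArrDelay);                    % skip missing delays
delays = data.ArrDelay(valid);
[days, ~, idx] = unique(data.DayOfWeek(valid));   % days seen in this chunk
for k = 1:numel(days)
    inDay = (idx == k);                           % rows belonging to days(k)
    add(intermKVStore, days(k), [nnz(inDay), sum(delays(inDay))]);
end
end

function sketchMeanDelayReducer(intermKey, intermValIter, outKVStore)
% Combine all [count, sum] pairs for one day and emit the overall mean.
n = 0;   % total count
s = 0;   % total sum
while hasnext(intermValIter)
    countAndSum = getnext(intermValIter);
    n = n + countAndSum(1);
    s = s + countAndSum(2);
end
add(outKVStore, intermKey, s / n);
end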