%% Tall Skinny QR (TSQR) Matrix Factorization Using MapReduce
% This example shows how to compute a tall skinny QR (TSQR) factorization
% using |mapreduce|. It demonstrates how to chain |mapreduce| calls to
% perform multiple iterations of factorizations, and uses the |info|
% argument of the map function to compute numeric keys.

% Copyright 1984-2014 The MathWorks, Inc.

%% Prepare Data
% Create a datastore using the |airlinesmall.csv| data set. This
% 12-megabyte data set contains 29 columns of flight information for
% several airline carriers, including arrival and departure times. In this
% example, the variables of interest are |ArrDelay| (flight arrival
% delay), |DepDelay| (flight departure delay) and |Distance| (total flight
% distance).
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.ReadSize = 1000;
ds.SelectedVariableNames = {'ArrDelay', 'DepDelay', 'Distance'};

%%
% The datastore treats |'NA'| values as missing and replaces the missing
% values with |NaN| values by default. The |ReadSize| property lets you
% specify how to partition the data into chunks. Additionally, the
% |SelectedVariableNames| property allows you to work with only the
% specified variables of interest, which you can verify using |preview|.
preview(ds)

%% Chain MapReduce Calls
% The implementation of the multi-iteration TSQR algorithm needs to chain
% consecutive |mapreduce| calls. To demonstrate the general chaining design
% pattern, this example uses two |mapreduce| iterations. The output from
% the map function calls is passed into a large set of reducers, and then
% the output of these reducers becomes the input for the next |mapreduce|
% iteration.

%% First MapReduce Iteration
% In the first iteration, the map function, |tsqrMapper|, receives one
% chunk (the ith) of data, which is a table of size $N_i\times 3$. The
% mapper computes the $R$ matrix of this chunk of data and stores it as an
% intermediate result.
% Then, |mapreduce| aggregates the intermediate
% results by unique key before sending them to the reduce function. Thus,
% |mapreduce| sends all intermediate $R$ matrices with the same key to the
% same reducer.
%
% Since the reducer uses |qr|, which is an in-memory MATLAB function, it's
% best to first make sure that the $R$ matrices fit in memory. This example
% divides the dataset into eight partitions. The |mapreduce| function reads
% the data in chunks and passes the data along with some meta information
% to the map function. The |info| input argument is the second input to the
% map function and it contains the read offset and file size information
% that are necessary to generate the key,
%
%   key = ceil(offset/(fileSize/numPartitions))
%
% where |fileSize/numPartitions| is the size in bytes of one partition, so
% every chunk read from the same partition receives the same key.
%

%%
% Display the map function file.
%
% <include>tsqrMapper.m</include>
%

%%
% The reduce function receives a list of the intermediate $R$ matrices,
% vertically concatenates them, and computes the $R$ matrix of the
% concatenated matrix.

%%
% Display the reduce function file.
%
% <include>tsqrReducer.m</include>
%

%%
% Use |mapreduce| to apply the map and reduce functions to the datastore,
% |ds|.
outds1 = mapreduce(ds, @tsqrMapper, @tsqrReducer);

%%
% |mapreduce| returns an output datastore, |outds1|, with files in
% the current folder.

%% Second MapReduce Iteration
% The second iteration uses the output of the first iteration, |outds1|, as
% its input. This iteration uses an identity mapper, |identityMapper|,
% which simply copies over the data using a single key, |'Identity'|.

%%
% Display the identity mapper file.
%
% <include>identityMapper.m</include>
%

%%
% The reducer function is the same in both iterations. The use of a single
% key by the map function means that |mapreduce| only calls the reduce
% function once in the second iteration.

%%
% Display the reduce function file.
%
% <include>tsqrReducer.m</include>
%

%%
% Use |mapreduce| to apply the identity mapper and the same reducer to the
% output from the first |mapreduce| call.
outds2 = mapreduce(outds1, @identityMapper, @tsqrReducer);

%% View Results
% Read the final results from the output datastore.
r = readall(outds2);
r.Value{:}

%% Reference
%
% # Paul G. Constantine and David F. Gleich. 2011. Tall and skinny QR
% factorizations in MapReduce architectures. In Proceedings of the Second
% International Workshop on MapReduce and Its Applications (MapReduce '11).
% ACM, New York, NY, USA, 43-50. DOI=10.1145/1996092.1996103
% <http://doi.acm.org/10.1145/1996092.1996103>
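
%% Sketch: Check the Stacked R Identity In Memory
% Both |mapreduce| iterations rely on one property: the $R$ factor of
% vertically stacked $R$ matrices agrees with the $R$ factor of the full
% data, up to the sign of each row. The code below is a minimal in-memory
% sketch of that property and is not part of the shipped example. The
% random matrix |A|, the two-block split, and all variable names are
% assumptions chosen only for illustration.
rng(0);                           % fix the seed so the run is repeatable
A = randn(1000, 3);               % hypothetical tall skinny matrix

% "Map" step: factor each block of rows independently.
[~, R1] = qr(A(1:400, :), 0);
[~, R2] = qr(A(401:end, :), 0);

% "Reduce" step: stack the small R factors and factor once more.
[~, Rtsqr] = qr([R1; R2], 0);

% Direct economy-size factorization of the whole matrix, for comparison.
[~, Rdirect] = qr(A, 0);

% Compare magnitudes, because the rows of R are unique only up to sign.
relErr = norm(abs(Rtsqr) - abs(Rdirect)) / norm(Rdirect)

%%
% A relative error close to machine precision suggests that the two routes
% agree. The same comparison could be tried on the airline data when the
% three selected columns fit in memory, after removing rows that contain
% |NaN| values.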