%% Analyze Large Data in Database Using MapReduce
% This example shows how to analyze large data sets that are stored in a
% database. You can access large data sets using a
% |<docid:database_ug.bufomil DatabaseDatastore>| object with Database
% Toolbox(TM). After creating a |DatabaseDatastore| object, you can run
% algorithms on large data sets using a tall array. For an example of using
% a |DatabaseDatastore| object with tall arrays, see
% <docid:database_examples.example-ex44807149>. Alternatively, you can
% write a MapReduce algorithm that defines the chunking and reduction of
% the data.
%
% This example uses MapReduce to determine the mean arrival delay of a
% large set of flight data that is stored in a database. This example
% modifies the <docid:import_export.bujibs2> example to use a
% |DatabaseDatastore| instead of a <docid:matlab_ref.budsjo2-1>. You can
% use MapReduce to modify other MATLAB(R) examples that analyze data, as
% described in <docid:import_export.buhnu4_>.
%
% The |DatabaseDatastore| object does not support running on a parallel
% pool. If Parallel Computing Toolbox(TM) is installed, set the global
% execution environment to the local MATLAB(R) session before you analyze
% data using tall arrays or run MapReduce algorithms.

%% Create |DatabaseDatastore| Object
% Set the global execution environment to be the local MATLAB(R) session.
mapreducer(0);

%% 
% The file |airlinesmall.csv| contains the large set of flight data. Load
% this file into a Microsoft(R) SQL Server(R) database table
% |airlinesmall|. This table contains 123,523 records.
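%
% A minimal sketch of one way to do this load, assuming an open connection
% |conn| such as the one created in the next step and a release that
% provides |sqlwrite|. The variable name |flights| is hypothetical, and the
% column types that |sqlwrite| creates depend on the database, so treat
% this as a starting point rather than the procedure used for this example.
%
%   % Read the CSV file; 'NA' marks missing values in this file
%   flights = readtable('airlinesmall.csv','TreatAsEmpty','NA');
%   % Write the rows to the table airlinesmall, creating it if needed
%   sqlwrite(conn,'airlinesmall',flights)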

%%
% Using a JDBC driver, create a database connection |conn| to a
% Microsoft(R) SQL Server(R) database with Windows(R) authentication.
% Specify a blank user name and password. Here, the code assumes that you
% are connecting to a database |toy_store|, a database server |dbtb04|, and
% port number |54317|.

conn = database('toy_store','','','Vendor','Microsoft SQL Server', ...
    'Server','dbtb04','PortNumber',54317,'AuthType','Windows');
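
% As an optional defensive check, confirm that the connection opened
% before continuing. This sketch relies on the |Message| property of the
% connection object, which is empty when the connection succeeds.
if ~isempty(conn.Message)
    error('Database connection failed: %s',conn.Message);
end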

%% 
% Create a |DatabaseDatastore| object |dbds| using the database connection
% |conn| and SQL query |sqlquery|. This SQL query retrieves arrival-delay
% data from the table |airlinesmall|.

sqlquery = 'select ArrDelay from airlinesmall';

dbds = databaseDatastore(conn,sqlquery);
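
%%
% Optionally, inspect the first few rows that the query returns before
% running the full analysis. This is a small sketch using |preview|, which
% is assumed to be available for the |DatabaseDatastore| object in your
% release; the variable name |firstRows| is illustrative.
firstRows = preview(dbds)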

%% Define Mapper and Reducer Functions
% To process large data sets in chunks, you can write your own mapper
% function. This example uses the mapper function
% |meanArrivalDelayMapper.m|. This function reads arrival-delay data from
% the |DatabaseDatastore| object, determines the number of delays and the
% total delay in the chunk, and stores both values in
% <docid:matlab_ref.buikx8i-1>. Display the mapper function file.
%
% <include>meanArrivalDelayMapper.m</include>
%
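% A minimal sketch of what such a mapper might look like, written from the
% description above rather than copied from |meanArrivalDelayMapper.m|. The
% function name, the key name |'PartialCountSumDelay'|, and the removal of
% |NaN| values are assumptions.
%
%   function sketchMeanArrivalDelayMapper(data,~,intermKVStore)
%       % data is a table holding one chunk of the ArrDelay column
%       delays = data.ArrDelay;
%       % Assumption: missing delays arrive as NaN and are excluded
%       delays = delays(~isnan(delays));
%       % Emit [count, total] for this chunk under a single key
%       add(intermKVStore,'PartialCountSumDelay',[numel(delays),sum(delays)]);
%   end
%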
%%
% To combine the intermediate results from all chunks, you can write your
% own reducer function. This example uses the reducer function
% |meanArrivalDelayReducer.m|. This reducer function reads the intermediate
% values for the number of delays and the total arrival delay, and then
% determines the overall mean arrival delay. |mapreduce| calls this reducer
% function only once because the mapper function adds only one key to
% |KeyValueStore|. Display the reducer function file.
%
% <include>meanArrivalDelayReducer.m</include>
%
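% A minimal sketch of what such a reducer might look like, written from the
% description above rather than copied from |meanArrivalDelayReducer.m|.
% The function name and the output key |'MeanArrivalDelay'| are
% assumptions, and each intermediate value is assumed to be a
% |[count, total]| row vector.
%
%   function sketchMeanArrivalDelayReducer(~,intermValIter,outKVStore)
%       count = 0;
%       total = 0;
%       % Accumulate the per-chunk counts and totals
%       while hasnext(intermValIter)
%           countSum = getnext(intermValIter);
%           count = count + countSum(1);
%           total = total + countSum(2);
%       end
%       % Store the overall mean as the single output key-value pair
%       add(outKVStore,'MeanArrivalDelay',total/count);
%   end
%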
%% Run MapReduce Using Mapper and Reducer Functions
% To determine the mean arrival delay in the flight data, run MapReduce
% with the |DatabaseDatastore| object |dbds|, mapper function
% |meanArrivalDelayMapper|, and reducer function |meanArrivalDelayReducer|.
outds = mapreduce(dbds,@meanArrivalDelayMapper,@meanArrivalDelayReducer);

%% Display Output from MapReduce
% Read the table |outtab| from the output datastore |outds| using
% |readall|.
outtab = readall(outds)
%%
% The table has only one row containing one key-value pair.
%%
% Display the mean arrival delay |meanArrDelay| from the table |outtab|.
meanArrDelay = outtab.Value{1}
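%%
% The mean is computed from per-chunk counts and totals because a mean of
% chunk means would be wrong when chunks differ in size. The following
% self-contained sketch checks that idea on made-up data; the values and
% the |demo| variable names are illustrative and are not taken from
% |airlinesmall|, and skipping |NaN| values is an assumption.
demoDelays = [5 -3 NaN 12 0 7 NaN 20];
demoChunks = {demoDelays(1:3),demoDelays(4:6),demoDelays(7:8)};
demoPartials = zeros(numel(demoChunks),2);
for k = 1:numel(demoChunks)
    chunk = demoChunks{k};
    chunk = chunk(~isnan(chunk));                  % drop missing delays
    demoPartials(k,:) = [numel(chunk),sum(chunk)]; % "map" step
end
demoTotals = sum(demoPartials,1);                  % "reduce" step
demoMean = demoTotals(2)/demoTotals(1);
% The chunked result should agree with the mean of all values at once
assert(abs(demoMean - mean(demoDelays,'omitnan')) < 1e-12)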
%% Close |DatabaseDatastore| Object and Database Connection
% Closing the |DatabaseDatastore| object |dbds| also closes the database
% connection |conn|.
close(dbds)