
%% Improve Performance of Element-wise MATLAB(R) Functions on the GPU using ARRAYFUN
% This example shows how |arrayfun| can be used to run a MATLAB(R) function
% natively on the GPU. When the MATLAB function contains many element-wise
% operations, |arrayfun| can provide improved performance when compared to
% simply executing the MATLAB function directly on the GPU with gpuArray
% input data. The MATLAB function can be in its own file or can be a nested
% or anonymous function. It must contain only scalar operations and
% arithmetic.

% Copyright 2010-2013 The MathWorks, Inc.

%%
% We put the example into a function to allow nested functions:
function paralleldemo_gpu_arrayfun


%% Using Horner's Rule to Calculate Exponentials
% Horner's rule allows the efficient evaluation of power series expansions.
% We will use it to calculate the first 10 terms of the power series
% expansion for the exponential function |exp|. We can implement this as a
% MATLAB function.

function y = horner(x)
%HORNER - series expansion for exp(x) using Horner's rule
y = 1 + x.*(1 + x.*((1 + x.*((1 + ...
        x.*((1 + x.*((1 + x.*((1 + x.*((1 + ...
        x.*((1 + x./9)./8))./7))./6))./5))./4))./3))./2));
end
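
%%
% As a quick sanity check (an addition to the original example), we can
% compare |horner| with the built-in |exp| on a few CPU values. The name
% |xcheck| and the interval [0, 1] are illustrative choices. With 10 terms,
% the truncation error at x = 1 should be on the order of 1e-7.

xcheck = linspace( 0, 1, 5 );
fprintf( 'Maximum deviation from exp on [0, 1]: %g\n', ...
    max( abs( horner( xcheck ) - exp( xcheck ) ) ) );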

%% Preparing |horner| for the GPU
% To run this function on the GPU with minimal code changes, we could pass
% a |gpuArray| object as input to the |horner| function. However, |horner|
% is a long chain of individual element-wise operations, and evaluating
% them one at a time on the GPU might not give very good performance. We
% can improve the performance by executing all of the element-wise
% operations in the |horner| function at one time using |arrayfun|.
%
% To run this function on the GPU using |arrayfun|, we use a handle to the
% |horner| function. |horner| automatically adapts to inputs of different
% sizes and types. To check the GPU results, whether computed with
% |gpuArray| objects alone or with |arrayfun|, we can compare them with
% standard MATLAB CPU execution, obtained simply by evaluating the function
% directly.

hornerFcn = @horner;

%% Create the Input Data
% We create some inputs of different types and sizes, and use |gpuArray| to
% send them to the GPU.

data1  = rand( 2000, 'single' );
data2  = rand( 1000, 'double' );
gdata1 = gpuArray( data1 );
gdata2 = gpuArray( data2 );
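
%%
% A minimal check, added here for illustration, that the inputs now live on
% the GPU and have kept their element types. |classUnderlying| is assumed
% to be available in the installed release of Parallel Computing Toolbox.

fprintf( 'gdata1: %s holding %s elements\n', ...
    class( gdata1 ), classUnderlying( gdata1 ) );
fprintf( 'gdata2: %s holding %s elements\n', ...
    class( gdata2 ), classUnderlying( gdata2 ) );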

%% Evaluate |horner| on the GPU 
% To evaluate the |horner| function on the GPU, we have two choices.  With
% minimal code changes we can evaluate the original function on the GPU by
% providing a |gpuArray| object as input. However, to improve performance
% on the GPU, we call |arrayfun| instead, using the same calling convention
% as the original MATLAB function.
%
% We can compare the accuracy of the results by evaluating the original
% function directly in MATLAB on the CPU. We expect some slight numerical
% differences because the floating-point arithmetic on the GPU does not
% precisely match the arithmetic performed on the CPU.

gresult1 = arrayfun( hornerFcn, gdata1 );
gresult2 = arrayfun( hornerFcn, gdata2 );

comparesingle = max( max( abs( gresult1 - horner( data1 ) ) ) );
comparedouble = max( max( abs( gresult2 - horner( data2 ) ) ) );
%%
fprintf( 'Maximum discrepancy for single precision: %g\n', comparesingle );
fprintf( 'Maximum discrepancy for double precision: %g\n', comparedouble );
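
%%
% The accuracy check above covers only the |arrayfun| route. As a sketch of
% the first choice, we can also pass the |gpuArray| straight to |horner|
% and compare the two GPU results. The names |gdirect1| and |comparedirect|
% are introduced here for illustration; the discrepancy should be small but
% need not be exactly zero.

gdirect1 = horner( gdata1 );
comparedirect = max( max( abs( gdirect1 - gresult1 ) ) );
fprintf( 'Maximum discrepancy, direct gpuArray vs. arrayfun: %g\n', ...
    gather( comparedirect ) );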

%% Comparing Performance between GPU and CPU
% We can compare the performance of the GPU versions to the native MATLAB
% CPU version. Most current GPUs are considerably faster in single
% precision than in double precision, so we time the single-precision data.

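% CPU execution, timed with timeit so that the measurement is repeated over
% several runs, as gputimeit does for the GPU versions
tcpu = timeit(@() hornerFcn(data1));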

% GPU execution using only gpuArray objects 
tgpuObject = gputimeit(@() hornerFcn(gdata1));

% GPU execution using gpuArray objects with arrayfun
tgpuArrayfun = gputimeit(@() arrayfun(hornerFcn, gdata1));


fprintf( 'Speed-up achieved using gpuArray objects only: %g\n',...
    tcpu / tgpuObject );
fprintf( 'Speed-up achieved using gpuArray objects with arrayfun: %g\n',...
    tcpu / tgpuArrayfun );
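
%%
% For reference, the same measurement can be sketched for the
% double-precision data. This is an addition to the original example, and
% the names |tcpuDouble| and |tgpuArrayfunDouble| are our own. Because
% |data2| is smaller than |data1|, the figure is not directly comparable
% with the single-precision speed-ups above.

tcpuDouble         = timeit( @() hornerFcn( data2 ) );
tgpuArrayfunDouble = gputimeit( @() arrayfun( hornerFcn, gdata2 ) );
fprintf( 'Speed-up in double precision using arrayfun: %g\n', ...
    tcpuDouble / tgpuArrayfunDouble );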


end