www.gusucode.com > distcomp 案例源码程序 matlab代码 > distcomp/paralleldemo_gpu_rng.m

    %% Generating Random Numbers on a GPU
% This example shows how to switch between the different random number
% generators that are supported on the GPU and examines the performance of
% each of them.
%
% Random numbers form a key part of many simulation or estimation
% algorithms.  Typically these numbers are generated using the
% functions 
% <matlab:doc('rand') |rand|>,
% <matlab:doc('randi') |randi|>, and
% <matlab:doc('randn') |randn|>.  Parallel Computing Toolbox(TM) provides
% three corresponding functions for generating random numbers directly on a
% GPU:
% <matlab:doc('gpuArray/rand') |gpuArray.rand|>,
% <matlab:doc('gpuArray/randi') |gpuArray.randi|>, and
% <matlab:doc('gpuArray/randn') |gpuArray.randn|>. 
% These functions can use one of several different number generation
% algorithms.  The generators available on the GPU are listed, described,
% and their performance measured below.

% Copyright 2013 The MathWorks, Inc.

%% Discovering the GPU random number generators
% The <matlab:doc('parallel.gpu.RandStream/list') |list|> method of
% <matlab:doc('parallel.gpu.RandStream') |parallel.gpu.RandStream|>
% provides a short description of the available generators:
parallel.gpu.RandStream.list

%%
% Each of these generators has been designed with parallel use in mind,
% providing multiple independent streams of random numbers.  However, they
% each have some advantages and disadvantages:
%
% * *CombRecursive*  (also known as |MRG32k3a|): Matches the MATLAB(R)
% generator of the same name and produces identical results given the same
% initial state.  This generator was introduced in 1999 and has been widely
% tested and used.
% * *Threefry4x64-20* : New generator introduced in 2011 but based on the
% existing cryptographic ThreeFish algorithm, which is widely tested and
% used.  This generator was designed to give good performance in highly
% parallel systems such as GPUs.
% * *Philox4x32-10* : New generator introduced in 2011, specifically
% designed for high performance in highly parallel systems such as GPUs.
%
% All of these generators pass the standard TestU01 test suite (see
% "TestU01: A C library for empirical testing of random number generators"
% by Pierre L'Ecuyer and Richard Simard, _ACM Transactions on
% Mathematical Software_, Vol. 33, No. 4, August 2007.)


%% Changing the default random number generator
% The function <matlab:doc('parallel.gpu.rng') |parallel.gpu.rng|> can
% store and reset the generator state.  It can also switch between the
% different generators that are provided.  Before changing the generator,
% we keep the existing state so that it can be restored at the end of these
% tests:
oldState = parallel.gpu.rng;

parallel.gpu.rng(0, 'Philox4x32-10');
disp(parallel.gpu.rng)


%% Generating uniformly distributed random numbers
% Uniformly distributed random numbers are generated on the GPU using
% either
% <matlab:doc('gpuArray/rand') |gpuArray.rand|> or 
% <matlab:doc('gpuArray/randi') |gpuArray.randi|>.  In performance terms,
% these two functions behave very similarly and only |rand| is measured
% here.  <matlab:doc('gputimeit') |gputimeit|> is used to measure the 
% performance to ensure accurate timing results, automatically calling the
% function many times and correctly dealing with synchronization and other
% timing issues.
generators = {'CombRecursive', 'Threefry4x64-20', 'Philox4x32-10'};
sizes = ([1, 100000:100000:2000000])';
samplesPerSec = nan(numel(sizes), numel(generators));
for g=1:numel(generators)
    % Set the generator
    parallel.gpu.rng(0, generators{g});
    % Loop over sizes, timing the generator
    for s=1:numel(sizes)
        fcn = @() gpuArray.rand(sizes(s),1);
        time = gputimeit(fcn);
        samplesPerSec(s,g) = sizes(s)/time;
    end
end

% Plot the result
plot(sizes, samplesPerSec, '.-')
legend(generators, 'Location', 'NorthWest')
grid on
xlabel('Number of elements')
ylabel('Samples generated per second')
title('Generating samples in U(0,1)')

%%
% The results clearly highlight some major performance differences.
% Threefry4x64-20 is much faster than CombRecursive, and Philox4x32-10 is
% much faster than Threefry4x64-20.


%% Generating normally distributed random numbers
% Many simulations rely on perturbations sampled from a normal
% distribution.  This is done using 
% <matlab:doc('gpuArray/randn') |gpuArray.randn|>.
samplesPerSec = nan(numel(sizes), numel(generators));
for g=1:numel(generators)
    % Set the generator
    parallel.gpu.rng(0, generators{g});
    % Loop over sizes, timing the generator
    for s=1:numel(sizes)
        fcn = @() gpuArray.randn(sizes(s),1);
        time = gputimeit(fcn);
        samplesPerSec(s,g) = sizes(s)/time;
    end
end

% Plot the result
plot(sizes, samplesPerSec, '.-')
legend(generators, 'Location', 'SouthEast')
grid on
xlabel('Number of elements')
ylabel('Samples generated per second')
title('Generating samples in N(0,1)')

%%
% As with the uniform test, Threefry4x64-20 is faster than CombRecursive,
% and Philox4x32-10 is faster than Threefry4x64-20.  The extra work
% required to produce normally distributed values reduces the rate at which
% values are produced by each of the generators and also reduces the
% performance differences between them.


%%
% Before finishing, restore the original generator state:
parallel.gpu.rng(oldState);


%% Conclusion
% In this example, the three GPU random number generators have been
% compared.  Each provides some advantages (|+|) and has some caveats
% (|-|).
%
% *CombRecursive*
%
% * (|+|) CPU version also available for comparison
% * (|+|) Long track record in real-world usage
% * (|-|) Slowest
%
% *Threefry4x64-20*
%
% * (|+|) Fast
% * (|+|) Based on well known and well tested Threefish algorithm
% * (|-|) Relatively new in real-world usage
% * (|-|) No CPU version available for comparison
%
% *Philox4x32-10*
%
% * (|+|) Fastest
% * (|-|) Relatively new in real-world usage
% * (|-|) No CPU version available for comparison