www.gusucode.com > distcomp 案例源码程序 matlab代码 > distcomp/paralleldemo_communic_prof.m
%% Profiling Explicit Parallel Communication % This example shows how to profile explicit communication to the nearest % neighbor lab. It illustrates the use of |labSend|, |labReceive|, and % |labSendReceive|, showing both the slow (incorrect) and the fast % (optimal) way of implementing this algorithm. The problem is explored % using the parallel profiler. % Copyright 2007 The MathWorks, Inc. %% % *Prerequisites*: % % * Interactive Parallel Mode in Parallel Computing Toolbox(TM) (See % |pmode| in the User's Guide.) % * <docid:distcomp_examples.example-ex31793179 Using the Parallel Profiler in Pmode> % % The figures in this example are produced from a 12-node % cluster. %% % The example code involves explicit communication. In MATLAB(R) % explicit communication is synonymous with directly using Parallel % Computing Toolbox communication primitives (e.g. |labSend|, |labReceive|, % |labSendReceive|, |labBarrier|). Performance problems involving this type % of communication, if not related to the underlying hardware, can be % difficult to trace. % With the parallel profiler many of these problems can be interactively % identified. It is important to remember you can separate the various % parts of your program into separate functions. This can help when % profiling, because some data is collected only for each function. % %% The Algorithm % The algorithm we are profiling is a nearest neighbor % communication pattern. Each MATLAB worker needs data only from itself and % one neighboring lab. This type of data parallel pattern lends itself well % to many matrix problems, but when done incorrectly, can be needlessly % slow. In other words, each lab depends on data that is _already_ % available on an adjacent lab. For example, on a four-lab cluster, lab 1 % wants to send some data to lab 2 and needs some data from lab 4 so each % lab depends on only one other lab: % % 1 depends on -> 4 % % 2 depends on -> 1 % % 3 depends on -> 2 % % 4 depends on -> 3 % % It is possible to implement any given communication algorithm % using |labSend| and |labReceive|. % |labReceive| always blocks your program until the communication is % complete, while |labSend| might not if the data is small. Using % |labSend| first, though, doesn't help in most cases. % % One way to accomplish this algorithm is to have every lab % wait for a receive, and only one lab start the communication chain by % completing a send and then a receive. % Alternatively, we can use |labSendReceive|, and at first glance it may % not be apparent that there should be a major difference in performance. % % You can view the code for <matlab:edit(fullfile(matlabroot, 'examples', 'distcomp', 'pctdemo_aux_profbadcomm.m')) % pctdemo_aux_profbadcomm> and <matlab:edit(fullfile(matlabroot, 'examples', 'distcomp', 'pctdemo_aux_profcomm.m')) % pctdemo_aux_profcomm> to see the complete implementations of this % algorithm. Look at the first file and notice that it uses % |labSend| and |labReceive| for communication. % % It is a common mistake to % start thinking in terms of |labSend| and |labReceive| when it is not % necessary. Looking at how this |pctdemo_aux_profbadcomm| % implementation performs will give us a better idea of what to expect. %% Profiling the labSend Implementation labBarrier;% to ensures the labs all start at the same time mpiprofile reset; mpiprofile on; pctdemo_aux_profbadcomm; mpiprofile viewer; %% % The *Function Summary Report* is displayed. On this page, you can see % time spent waiting in communications as an orange bar under the *Total % Time Plot* heading. The data below shows that considerable amount of time % was spent waiting. Let's see how the parallel profiler helps to identify % the causes of these waits. %% % <<../paralleldemo_communic_profsp.png>> %% Quickstart Steps % # In the profiler Function Summary Report, look at the % <matlab:edit(fullfile(matlabroot, 'examples', 'distcomp', 'pctdemo_aux_profbadcomm.m')) pctdemo_aux_profbadcomm> % entry and click *Compare max vs. min TotalTime*. Observe the large % orange waiting time indicated under the function |iRecFromPrevLab|. This % is an early indication that there is something wrong with a corresponding % send, either because of network problems or algorithm problems. % # Use the top toolbar table to click |Plot All Per Lab Communication|. % The first figure in this view shows all the data received by each lab. In % this example each lab is receiving the same amount of data from the % previous lab, so it doesn't seem to be a data distribution problem. The % second figure shows the various communication times including the time % spent waiting for communication. % In the third figure, the *Receive Comm Waiting Time* plot shows a stepwise % increase in waiting time. % An example *Receive Comm Waiting Time* plot can be seen below using a % 12-node cluster. It is good to go back and check what is happening on the % source lab. % # Browse what's happening on lab 1. % a.) In the Profiler click *Home*. b.) Click the top-level % |pctdemo_aux_profbadcomm| function to go to the Function Detail % Report. c.) Be sure that *Show function listing* is selected. d.) % Scroll down and see where lab 1 spends time and which lines are % covered. e.) For comparison, look at the *Busy Line* table and % select the last lab using the *Goto lab* listbox. %% % To see all the profiled lines of code, % scroll down to the last item in the page. An example of this % annotated code listing can be seen below. %% % <<../paralleldemo_communic_profbadcom.png>> %% Communication Plots Using a Larger Non-local Cluster % To clearly see the problem with our usage of |labSend| and % |labReceive|, look at the following *Receive Comm Waiting Time* plot % from a 12-node cluster. %% % <<../paralleldemo_communic_profperlab.png>> %% % <<../paralleldemo_communic_profcwimage.png>> %% % In the plot above, you can see the unnecessary waiting using *Plot All % PerLab Communication*. This waiting time increases because |labReceive| % blocks until the corresponding paired |labSend| has completed. Hence, you % get sequential communication even though subsequent labs only need the % data that is originating in the immediate neighbor |labindex|. %% Using labSendReceive to Implement this Algorithm % You can use |labSendReceive| to send and % receive data simultaneously from the lab that you depend on to get % minimal waiting time. See this in the corrected version of the % communication pattern implemented in |pctdemo_aux_profcomm|. Clearly, % using |labSendReceive| is not possible if you need to receive data % before you can send it. In such cases, use |labSend| and |labReceive| to % ensure chronological order. However, in cases like this example, when % there is no need to receive data before sending, use |labSendReceive|. % Let's profile this version without resetting the data collected on the % previous version (use |mpiprofile resume|). labBarrier; mpiprofile resume; pctdemo_aux_profcomm; mpiprofile viewer; %% % This corrected version reduces the waiting time to effectively zero. % To see this, click *Plot All PerLab Communication* _after_ selecting % |pctdemo_aux_profcomm|. % The same communication pattern, described above, now spends nearly no % time waiting using |labSendReceive| (see the *Receive Comm Waiting Time* % plot below). %% % <<../paralleldemo_communic_profcwcorr.png>> %% The Plot Color Scheme % For each 2-D image plot, the coloring scheme is normalized to the task at % hand. % Therefore, do not use the coloring scheme in the plot shown above to % compare with other plots, since colors are normalized and are dependent % on the maximum value (seen in the top right in brown). % For this example, using the max value is the best way to compare the % huge difference in waiting times when we use % |pctdemo_aux_profcomm| instead of |pctdemo_aux_profbadcomm|.