%% HDL Serial Architectures for FIR Filters
% This example illustrates how to generate HDL code for a symmetric FIR
% lowpass filter, intended for an audio filtering application, using fully
% parallel, fully serial, partly serial, and cascade-serial architectures.

% Copyright 2004-2016 The MathWorks, Inc.

%% Design the Filter
% Use an audio sampling rate of 44.1 kHz, a passband edge frequency of
% 8.0 kHz, and a stopband edge frequency of 8.8 kHz. Set the allowable
% peak-to-peak passband ripple to 1 dB and the stopband attenuation to
% 90 dB. Then specify the filter with fdesign.lowpass and design it as an
% FIR filter System object using the 'equiripple' method and the
% direct-form symmetric ('dfsymfir') structure.

Fs           = 44.1e3;         % Sampling Frequency in Hz
Fpass        = 8e3;            % Passband Frequency in Hz
Fstop        = 8.8e3;          % Stopband Frequency in Hz
Apass        = 1;              % Passband Ripple in dB
Astop        = 90;             % Stopband Attenuation in dB

fdes = fdesign.lowpass('Fp,Fst,Ap,Ast',...
    Fpass, Fstop, Apass, Astop, Fs);
lpFilter = design(fdes,'equiripple', 'FilterStructure', 'dfsymfir', ...
    'SystemObject', true);
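%%
% As an aside, the dB specifications above can be converted to linear
% deviations, which makes it easier to compare them with the coefficient
% quantization step chosen in the next section. This is a minimal sketch
% using the standard conversions for a peak-to-peak passband ripple and a
% stopband attenuation. The variables ripple_dB and atten_dB are local
% copies of Apass and Astop, and all names here are illustrative only.
ripple_dB = 1;                            % same value as Apass
atten_dB  = 90;                           % same value as Astop
Ap_lin    = 10^(ripple_dB/20);            % peak-to-peak ripple as a ratio
dev_pass  = (Ap_lin - 1)/(Ap_lin + 1);    % passband deviation
dev_stop  = 10^(-atten_dB/20);            % stopband deviation
fprintf('Passband deviation: %.4f, stopband deviation: %.2e\n', ...
        dev_pass, dev_stop);
%
% For comparison, the LSB of the 16-bit coefficient type used below is
% 2^-16, about 1.5e-5, which is of the same order as dev_stop. This is one
% reason to check the quantized response with fvtool.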

%% Quantize the Filter
% Assume that the filter input comes from a 12-bit ADC and that the filter
% output drives a 12-bit DAC. Quantize the coefficients to 16 bits.

nt_in = numerictype(1,12,11);
nt_out = nt_in;
lpFilter.FullPrecisionOverride = false;
lpFilter.CoefficientsDataType = 'Custom';
lpFilter.CustomCoefficientsDataType = numerictype(1,16,16);
lpFilter.OutputDataType = 'Custom';
lpFilter.CustomOutputDataType = nt_out;

% Check the response with fvtool.
fvtool(lpFilter,'Fs',Fs, 'Arithmetic', 'fixed');
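%%
% To see what numerictype(1,12,11) and numerictype(1,16,16) can represent:
% for a signed type with word length W and fraction length F, the range is
% [-2^(W-F-1), 2^(W-F-1) - 2^-F] in steps of 2^-F. This sketch only
% evaluates that formula. fxRange is a hypothetical helper, not a
% Fixed-Point Designer function.
fxRange   = @(W, F) [-2^(W-F-1), 2^(W-F-1) - 2^-F];
inRange   = fxRange(12, 11);   % ADC/DAC samples: [-1, 1 - 2^-11]
coefRange = fxRange(16, 16);   % coefficients:    [-0.5, 0.5 - 2^-16]
fprintf('12-bit I/O range: [%g, %g], LSB %g\n', inRange, 2^-11);
fprintf('16-bit coefficient range: [%g, %g], LSB %g\n', coefRange, 2^-16);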

%% Generate Fully Parallel HDL Code from the Quantized Filter
% Starting with the correctly quantized filter, generate VHDL or Verilog
% code. Create a temporary working directory. After generating the HDL
% code (VHDL in this case), open the generated VHDL file in the editor by
% clicking the hyperlink displayed in the command-line messages.
%
% This is the default case, which generates a fully parallel architecture.
% A direct-form FIR structure has a dedicated multiplier for each filter
% tap, and a symmetric FIR structure has one multiplier for every pair of
% symmetric taps. This results in a lot of chip area (78 multipliers in
% this example). To obtain the desired speed/area trade-off, you can
% implement the filter in a variety of serial architectures, which are
% illustrated in the following sections of this example.

workingdir = tempname;
% fully parallel (default)
generatehdl(lpFilter, 'Name', 'fullyparallel', ...
            'TargetLanguage', 'VHDL', ...
            'TargetDirectory', workingdir, ...
            'InputDataType', nt_in);
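%%
% A symmetric structure needs only one multiplier per pair of taps because
% symmetric taps share a coefficient: their input samples can be added
% first and multiplied once. The sketch below checks this in plain MATLAB
% against the built-in filter function. The coefficients hSym and the
% input xDemo are hypothetical toy data, not the audio filter designed
% above.
hSym  = [1 3 5 7 5 3 1]/25;               % odd-length symmetric taps (toy)
xDemo = sin(0.3*(1:40));                  % arbitrary test input
nTap  = numel(hSym);
nPair = floor(nTap/2);
xPad  = [zeros(1, nTap-1), xDemo];        % zero history before first sample
yPre  = zeros(size(xDemo));
for n = 1:numel(xDemo)
    w   = xPad(n+nTap-1:-1:n);            % w(k) = x(n-k+1), newest first
    pre = w(1:nPair) + w(nTap:-1:nTap-nPair+1);  % pre-add symmetric pairs
    acc = sum(hSym(1:nPair) .* pre);      % one multiply per pair
    if mod(nTap, 2) == 1
        acc = acc + hSym(nPair+1)*w(nPair+1);    % unpaired center tap
    end
    yPre(n) = acc;
end
yRef = filter(hSym, 1, xDemo);            % direct form: one multiply per tap
fprintf('Max difference: %g, multipliers: %d instead of %d\n', ...
        max(abs(yPre - yRef)), ceil(nTap/2), nTap);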

%% Generate a Test Bench from the Quantized Filter
% Generate a VHDL test bench to verify that the HDL results match the
% response you see in MATLAB(R) exactly. You can compile and simulate the
% generated VHDL code and test bench in an HDL simulator.
%
% Generate dual-tone multi-frequency (DTMF) tones to use as the test
% stimulus for the filter. A DTMF signal is the sum of two sinusoids (or
% tones) whose frequencies are taken from two mutually exclusive groups.
% Each pair of tones contains one frequency from the low group (697 Hz,
% 770 Hz, 852 Hz, 941 Hz) and one from the high group (1209 Hz, 1336 Hz,
% 1477 Hz), and represents a unique symbol. Generate all the DTMF signals,
% but use only one of them (digit 1 here) as the test stimulus, to keep the
% length of the stimulus reasonable.

symbol = {'1','2','3','4','5','6','7','8','9','*','0','#'};

lfg = [697 770 852 941]; % Low frequency group
hfg = [1209 1336 1477];  % High frequency group

% Generate a matrix containing all possible combinations of high and low 
% frequencies, where each column represents one combination.
f  = zeros(2,12);
for c=1:4
    for r=1:3
        f(:,3*(c-1)+r) = [lfg(c); hfg(r)];
    end
end
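%%
% The same frequency table can be built without loops, and a symbol's pair
% of frequencies can be looked up by name. This is an optional sketch:
% dtmfLow, dtmfHigh, dtmfKeys, and fVec are hypothetical names chosen so
% that lfg, hfg, symbol, and f above are left untouched.
dtmfLow  = [697 770 852 941];
dtmfHigh = [1209 1336 1477];
dtmfKeys = {'1','2','3','4','5','6','7','8','9','*','0','#'};
[hiGrid, loGrid] = meshgrid(dtmfHigh, dtmfLow);   % 4x3 keypad layout
% Read the keypad row by row, matching the column order of the loop above.
fVec  = [reshape(loGrid.', 1, []); reshape(hiGrid.', 1, [])];
kHash = find(strcmp(dtmfKeys, '#'));             % column index for '#'
fprintf('Symbol # uses %d Hz and %d Hz\n', fVec(:, kHash));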

%%
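% Next, generate the DTMF tones. They use their own 8 kHz sampling rate,
% stored in a separate variable so that the filter's rate Fs is not
% overwritten.
Fs_dtmf = 8000;            % DTMF sampling frequency: 8 kHz
N       = 800;             % 100 ms tones
t       = (0:N-1)/Fs_dtmf; % 800 sample times at Fs_dtmf
pit     = 2*pi*t;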

tones = zeros(N,size(f,2));
for toneChoice=1:12
    % Generate tone
    tones(:,toneChoice) = sum(sin(f(:,toneChoice)*pit))';
end

% Use the tone for digit '1' as the test stimulus.
userstim = tones(:,1);

generatehdl(lpFilter, 'Name', 'fullyparallel',...
            'GenerateHDLTestbench','on', ...
            'TestBenchUserStimulus', userstim,...
            'TargetLanguage', 'VHDL',...
            'TargetDirectory', workingdir, ...
            'InputDataType', nt_in);
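%%
% Note that the sum of two unit-amplitude sinusoids can approach a peak
% magnitude of 2, while numerictype(1,12,11) represents only values in
% [-1, 1 - 2^-11]. The sketch below rebuilds the digit '1' tone with local
% variable names and reports its peak. Scaling the stimulus by 1/2
% (stimScaled) is one possible adjustment, not part of the original
% example.
fsTone     = 8000;                        % DTMF sampling rate, as above
tTone      = (0:799)/fsTone;              % 100 ms of samples
tone1      = sin(2*pi*697*tTone) + sin(2*pi*1209*tTone);  % digit '1'
peak1      = max(abs(tone1));
stimScaled = tone1(:)/2;                  % hypothetical rescaled stimulus
fprintf('Peak of digit-1 tone: %.3f (input range limit is %.4f)\n', ...
        peak1, 1 - 2^-11);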

%% Information Regarding Serial Architectures
% Serial architectures offer a variety of ways to share hardware
% resources, at the expense of a higher clock rate relative to the sample
% rate. In FIR filters, the multipliers are shared among the inputs of each
% serial partition. This increases the clock rate by a factor known as the
% folding factor.
%
% Use the hdlfilterserialinfo function to get information about the
% filter lengths, such as the effective filter length, which depend on the
% coefficient values. The function also displays an exhaustive table of
% possible values for the SerialPartition property, with the corresponding
% folding factor and number of multipliers.

hdlfilterserialinfo(lpFilter, 'InputDataType', nt_in);
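%%
% The relationship between a SerialPartition vector, the folding factor,
% the number of multipliers, and the required clock rate can be written
% down directly. This sketch assumes the non-cascade case, in which the
% folding factor equals the largest partition. serialFF, serialMults, and
% serialClk are hypothetical helper names, not toolbox functions.
fsAudio     = 44.1e3;                         % filter sample rate in Hz
serialFF    = @(p) max(p);                    % clock cycles per sample
serialMults = @(p) numel(p);                  % one MAC per partition
serialClk   = @(p) serialFF(p) * fsAudio;     % required clock in Hz
parts = {78, [20 20 20 18], [45 33]};         % partitions used in this example
for iPart = 1:numel(parts)
    p = parts{iPart};
    fprintf('[%s]: folding factor %d, %d multiplier(s), %.4f MHz\n', ...
            num2str(p), serialFF(p), serialMults(p), serialClk(p)/1e6);
end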

%%
% Use the optional 'Multipliers' or 'FoldingFactor' property to display
% the information for a specific design point.

hdlfilterserialinfo(lpFilter, 'Multipliers', 4, ...
                    'InputDataType', nt_in);
                
%%
hdlfilterserialinfo(lpFilter, 'Foldingfactor', 6, ...
                    'InputDataType', nt_in);

%% Fully Serial Architecture
% In a fully serial architecture, instead of having a dedicated multiplier
% for each tap, the input sample for each tap is selected serially and
% multiplied by the corresponding coefficient. For symmetric and
% antisymmetric structures, the input samples for each pair of symmetric
% taps are pre-added (symmetric) or pre-subtracted (antisymmetric) before
% multiplication by the corresponding coefficient. The products are
% accumulated sequentially in a register, and the final result is stored
% in a register before the next set of input samples arrives. The clock
% rate must therefore exceed the input sample rate by a factor equal to the
% number of products to be computed. In exchange, the required chip area
% is small, because the implementation uses just one multiplier plus a few
% additional logic elements such as multiplexers and registers. In this
% example, the clock rate is 78 times the input sample rate (a folding
% factor of 78), or 3.4398 MHz.
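%%
% To see why the fully serial clock must tick once per product, the sketch
% below models a single multiply-accumulate (MAC) unit that processes one
% pre-added sample per clock cycle. The coefficient and pre-added sample
% vectors are hypothetical deterministic data of length 78, the effective
% length quoted above; only the cycle count and the final sum matter.
nProd  = 78;                           % products per output sample
coef   = linspace(-0.1, 0.3, nProd);   % hypothetical coefficients
preAdd = cos(0.1*(1:nProd));           % hypothetical pre-added samples
accReg = 0;                            % accumulator register
cycles = 0;
for k = 1:nProd                        % one multiplexer selection per clock
    accReg = accReg + coef(k)*preAdd(k);   % the single shared multiplier
    cycles = cycles + 1;
end
fprintf('Cycles per output: %d, MAC error vs. dot product: %g\n', ...
        cycles, abs(accReg - dot(coef, preAdd)));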
%%
% To implement a fully serial architecture, call hdlfilterserialinfo with
% its 'Multipliers' property set to 1 and pass the returned serial
% partition to generatehdl. Alternatively, set the 'SerialPartition'
% property directly to the effective filter length, which is 78 in this
% case. hdlfilterserialinfo also returns the folding factor and the number
% of multipliers for that serial partition setting.

[spart, foldingfact, nMults] = hdlfilterserialinfo(lpFilter, 'Multipliers', 1, ...
                                    'InputDataType', nt_in); %#ok<ASGLU>
                      
generatehdl(lpFilter,'Name', 'fullyserial', ...
           'SerialPartition', spart, ...
           'TargetLanguage', 'VHDL', ...
           'TargetDirectory', workingdir, ...
           'InputDataType', nt_in);
%%
% Generate the test bench the same way as in the fully parallel case. It
% is important to regenerate the test bench for each architecture, because
% each test bench is generated for a specific filter implementation.

%% Partly Serial Architecture
% Fully parallel and fully serial represent the two extremes of
% implementation. Fully serial uses very little area but inherently needs
% a faster clock rate. Fully parallel takes a lot of chip area but offers
% very good performance. Partly serial architectures cover all the cases
% between these two extremes.
%
% The input taps are divided into sets, and each set is processed by its
% own serial partition, which consists of a multiply-accumulate (MAC) unit
% and a multiplexer. The serial partitions operate in parallel with one
% another, but each one processes its taps sequentially to accumulate the
% result for the taps it serves. Finally, adders sum the results of all the
% serial partitions.
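%%
% A partly serial architecture can be modeled behaviorally: each serial
% partition accumulates its own share of the products, all partitions run
% concurrently, and a final adder combines them. The coefficient and sample
% vectors are hypothetical, and the partition [20 20 20 18] is the one
% this example uses for the four-multiplier case.
partVec = [20 20 20 18];
coefP   = linspace(-0.1, 0.3, sum(partVec));  % hypothetical coefficients
dataP   = cos(0.1*(1:sum(partVec)));         % hypothetical pre-added samples
tapEdges = [0, cumsum(partVec)];             % tap index boundaries
partAcc = zeros(1, numel(partVec));          % one accumulator per partition
for p = 1:numel(partVec)
    idx = tapEdges(p)+1 : tapEdges(p+1);     % taps served by partition p
    for k = idx                              % sequential MACs in partition p
        partAcc(p) = partAcc(p) + coefP(k)*dataP(k);
    end
end
yPart      = sum(partAcc);                   % final adders
cyclesPart = max(partVec);                   % partitions run concurrently
fprintf('Cycles per output: %d, error vs. dot product: %g\n', ...
        cyclesPart, abs(yPart - dot(coefP, dataP)));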

%% Partly Serial Architecture for Resource Constraint
% Assume that you want to implement this filter on an FPGA that has only
% four multipliers available for the filter. You can implement the filter
% using four serial partitions, each using one multiply-accumulate circuit.

hdlfilterserialinfo(lpFilter, 'Multipliers', 4, ...
                    'InputDataType', nt_in);

%%
% The serial partitions process [20 20 20 18] input taps, respectively.
% Specify the SerialPartition property with this vector, which gives the
% decomposition of taps among the serial partitions. The largest element of
% the vector determines the clock rate: in this case, 20 times the input
% sample rate, or 0.882 MHz.

[spart, foldingfact, nMults] = hdlfilterserialinfo(lpFilter, 'Multipliers', 4, ...
                                    'InputDataType', nt_in);

generatehdl(lpFilter,'Name', 'partlyserial1',...
            'SerialPartition', spart,...
            'TargetLanguage', 'VHDL', ...
            'TargetDirectory', workingdir, ...
            'InputDataType', nt_in);
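%%
% One simple way to arrive at [20 20 20 18] by hand: with M multipliers
% and an effective length of 78, every partition except the last gets
% ceil(78/M) taps and the last one gets the remainder. This sketch
% reproduces the vector quoted above for M = 4, but it is an illustration
% only. hdlfilterserialinfo remains the authoritative source, and this
% simple rule does not work for every M (it can leave the last partition
% with zero or fewer taps).
effLen = 78;                          % effective filter length (from above)
nMult  = 4;                           % available multipliers
ffM    = ceil(effLen/nMult);          % taps per full partition
partM  = [repmat(ffM, 1, nMult-1), effLen - ffM*(nMult-1)];
assert(partM(end) > 0, 'Rule does not apply for this multiplier count.');
fprintf('SerialPartition for %d multipliers: [%s]\n', nMult, num2str(partM));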
    
%% Partly Serial Architecture for Speed Constraint
% Assume that the filter implementation is constrained to a maximum clock
% frequency of 2 MHz. This means that the clock rate cannot be more than
% 45 times the input sample rate (2 MHz / 44.1 kHz is about 45.35). For
% this design constraint, specify 'SerialPartition' as [45 33]. Note that
% this requires an additional serial partition, that is, additional
% circuitry to multiply-accumulate the remaining 33 taps. You can obtain
% the SerialPartition value from hdlfilterserialinfo by setting its
% 'Foldingfactor' property, as follows.

spart = hdlfilterserialinfo(lpFilter, 'Foldingfactor', 45, ...
                            'InputDataType', nt_in);

generatehdl(lpFilter,'Name', 'partlyserial2', ...
            'SerialPartition', spart,...
            'TargetLanguage', 'VHDL',...
            'TargetDirectory', workingdir, ...
            'InputDataType', nt_in);
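%%
% The partition for a folding-factor constraint follows a similar hand
% calculation: fill partitions of size ff until the remaining taps fit in
% a final, smaller partition. With ff = 45 this rule gives [45 33], the
% vector quoted above; with ff = 6 the rule gives thirteen partitions of 6.
% partFromFF is a hypothetical helper written for illustration, not a
% toolbox function.
effLen2    = 78;
partFromFF = @(L, ff) [repmat(ff, 1, floor(L/ff)), ...
                       rem(L, ff)*ones(1, double(rem(L, ff) > 0))];
part45 = partFromFF(effLen2, 45);     % leftover 33 taps form a 2nd partition
part6  = partFromFF(effLen2, 6);      % divides evenly, no leftover partition
fprintf('ff = 45: [%s]; ff = 6: %d partitions\n', num2str(part45), numel(part6));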
%%
% In general, you can specify any decomposition of taps into serial
% partitions, depending on other constraints. The only requirement is that
% the elements of the vector sum to the effective filter length.
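%%
% A quick sanity check before passing a hand-made SerialPartition vector
% to generatehdl is to confirm that its elements are positive integers that
% sum to the effective filter length (78 here). isValidPart is a
% hypothetical helper, not part of the toolbox.
isValidPart = @(p, L) all(p > 0) && all(p == round(p)) && sum(p) == L;
okA = isValidPart([45 33], 78);          % valid
okB = isValidPart([30 30 18], 78);       % valid, arbitrary decomposition
okC = isValidPart([40 30], 78);          % invalid: sums to 70
fprintf('[45 33]: %d, [30 30 18]: %d, [40 30]: %d\n', okA, okB, okC);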

%% Cascade-Serial Architecture
% The accumulators in the serial partitions can be reused to add the
% result of the next serial partition. This is possible only if each
% serial partition processes at least one more tap than the serial
% partition after it (the last two partitions may be the same size, as
% noted below). The advantage of this technique is that it removes the set
% of adders otherwise required to add the results of all the serial
% partitions. However, it increases the folding factor (the ratio of clock
% rate to sample rate) by 1, because an additional clock cycle is required
% to complete the additional accumulation step.
%
% You can specify a cascade-serial architecture with the 'ReuseAccum'
% property, in one of two ways.
%
% First, pass 'ReuseAccum' to generatehdl with the value 'on', together
% with a 'SerialPartition' value for which accumulator reuse is feasible:
% the elements of the vector must be in descending order, except that the
% last two can be equal.
%
% Second, if you set 'ReuseAccum' to 'on' without specifying
% 'SerialPartition', the decomposition of taps into serial partitions is
% determined internally, so as to minimize the clock rate while reusing the
% accumulators. For this audio filter, the decomposition is
% [12 11 10 9 8 7 6 5 4 3 3]. It uses 11 serial partitions, which means 11
% multiply-accumulate circuits. The clock rate is 13 times the input
% sample rate, or 573.3 kHz.

generatehdl(lpFilter,'Name', 'cascadeserial1',...
            'SerialPartition', [45 33],...
            'ReuseAccum', 'on', ...
            'TargetLanguage', 'VHDL', ...
            'TargetDirectory', workingdir, ...
            'InputDataType', nt_in);
%%
% Let generatehdl determine the decomposition: it uses as many serial
% partitions as required to reach the minimum clock rate possible with
% accumulator reuse.

generatehdl(lpFilter,'Name', 'cascadeserial2', ...
            'ReuseAccum', 'on',...
            'TargetLanguage', 'VHDL',...
            'TargetDirectory', workingdir, ...
            'InputDataType', nt_in);
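%%
% The accumulator-reuse rule and the extra clock cycle can be checked with
% a few lines of arithmetic. This sketch encodes the rule stated above
% (strictly descending, except that the last two elements may be equal)
% and assumes that the cascade folding factor is the largest partition
% plus one. canReuse is a hypothetical helper, not a toolbox function.
canReuse = @(p) numel(p) < 2 || ...
           (all(diff(p(1:end-1)) < 0) && p(end) <= p(end-1));
fsCasc   = 44.1e3;                        % filter sample rate in Hz
cascPart = [12 11 10 9 8 7 6 5 4 3 3];    % decomposition quoted above
ffCasc   = max(cascPart) + 1;             % one extra accumulation cycle
fprintf('Feasible: %d, sum: %d, partitions: %d, clock: %.1f kHz\n', ...
        canReuse(cascPart), sum(cascPart), numel(cascPart), ...
        ffCasc*fsCasc/1e3);
%
% The same check rejects [20 20 20 18], which is why the four-multiplier
% partition cannot be used with 'ReuseAccum' set to 'on'.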

%% Conclusion
% You designed a lowpass direct form symmetric FIR filter
% to meet the given specification. You then quantized and checked your
% design. You generated VHDL code for fully parallel, fully serial, partly
% serial, and cascade-serial architectures. You generated a VHDL test bench
% using a DTMF tone for one of the architectures.
%
% You can use an HDL simulator to verify the generated HDL code for the
% different serial architectures, and a synthesis tool to compare their
% area and speed. You can also experiment with generating Verilog code and
% test benches.