%% HDL Serial Architectures for FIR Filters
% This example illustrates how to generate HDL code for a symmetrical
% FIR filter with fully parallel, fully serial, partly serial and
% cascade-serial architectures for a lowpass filter for an audio filtering
% application.

% Copyright 2004-2016 The MathWorks, Inc.

%% Design the Filter
% Use an audio sampling rate of 44.1 kHz and a passband edge frequency of
% 8.0 kHz. Set the allowable peak-to-peak passband ripple to 1 dB and
% the stopband attenuation to 90 dB. Then, design the filter using
% fdesign.lowpass, and create the FIR filter System object using the
% 'equiripple' method with the 'Direct form symmetric' structure.

Fs    = 44.1e3;  % Sampling Frequency in Hz
Fpass = 8e3;     % Passband Frequency in Hz
Fstop = 8.8e3;   % Stopband Frequency in Hz
Apass = 1;       % Passband Ripple in dB
Astop = 90;      % Stopband Attenuation in dB

fdes = fdesign.lowpass('Fp,Fst,Ap,Ast',...
    Fpass, Fstop, Apass, Astop, Fs);
lpFilter = design(fdes,'equiripple', 'FilterStructure', 'dfsymfir', ...
    'SystemObject', true);

%% Quantize the Filter
% Assume that the input for the audio filter comes from a 12-bit ADC and
% the output drives a 12-bit DAC.

nt_in = numerictype(1,12,11);
nt_out = nt_in;
lpFilter.FullPrecisionOverride = false;
lpFilter.CoefficientsDataType = 'Custom';
lpFilter.CustomCoefficientsDataType = numerictype(1,16,16);
lpFilter.OutputDataType = 'Custom';
lpFilter.CustomOutputDataType = nt_out;

% Check the response with fvtool.
fvtool(lpFilter,'Fs',Fs, 'Arithmetic', 'fixed');

%% Generate Fully Parallel HDL Code from the Quantized Filter
% Starting with the correctly quantized filter, generate VHDL or
% Verilog code. Create a temporary work directory. After generating
% the HDL code (selecting VHDL in this case), open the generated VHDL
% file in the editor by clicking the hyperlink displayed in the command
% line messages.
%
% This is the default case and generates a fully parallel architecture.
% There is a dedicated multiplier for each filter tap in the direct form FIR
% filter structure and one for every two symmetric taps in the symmetric FIR
% structure. This results in a lot of chip area (78 multipliers, in this
% example). You can implement the filter in a variety of serial
% architectures to obtain the desired speed/area trade-off. These are
% illustrated in further sections of this example.

workingdir = tempname;

% fully parallel (default)
generatehdl(lpFilter, 'Name', 'fullyparallel', ...
    'TargetLanguage', 'VHDL', ...
    'TargetDirectory', workingdir, ...
    'InputDataType', nt_in);

%% Generate a Test Bench from the Quantized Filter
% Generate a VHDL test bench to make sure that the result matches the
% response you see in MATLAB(R) exactly. The generated VHDL code and VHDL
% test bench can be compiled and simulated using a simulator.
%
% Generate DTMF tones to be used as test stimulus for the filter.
% A DTMF signal consists of the sum of two sinusoids - or tones - with
% frequencies taken from two mutually exclusive groups. Each pair of
% tones contains one frequency of the low group (697 Hz, 770 Hz, 852 Hz,
% 941 Hz) and one frequency of the high group (1209 Hz, 1336 Hz, 1477 Hz)
% and represents a unique symbol. You will generate all the DTMF signals but
% use one of them (digit 1 here) for test stimulus. This keeps the
% length of the test stimulus to a reasonable limit.

symbol = {'1','2','3','4','5','6','7','8','9','*','0','#'};

lfg = [697 770 852 941];  % Low frequency group
hfg = [1209 1336 1477];   % High frequency group

% Generate a matrix containing all possible combinations of high and low
% frequencies, where each column represents one combination.
f = zeros(2,12);
for c=1:4
    for r=1:3
        f(:,3*(c-1)+r) = [lfg(c); hfg(r)];
    end
end

%%
% Next, let's generate the DTMF tones

Fs  = 8000;       % Sampling frequency 8 kHz
N   = 800;        % Tones of 100 ms
t   = (0:N-1)/Fs; % 800 samples at Fs
pit = 2*pi*t;

tones = zeros(N,size(f,2));
for toneChoice=1:12
    % Generate tone
    tones(:,toneChoice) = sum(sin(f(:,toneChoice)*pit))';
end

% Taking the tone for digit '1' for test stimulus.
userstim = tones(:,1);

generatehdl(lpFilter, 'Name', 'fullyparallel',...
    'GenerateHDLTestbench','on', ...
    'TestBenchUserStimulus', userstim,...
    'TargetLanguage', 'VHDL',...
    'TargetDirectory', workingdir, ...
    'InputDataType', nt_in);

%% Information Regarding Serial Architectures
% Serial architectures present a variety of ways to share the hardware
% resources at the expense of increasing the clock rate with respect to the
% sample rate. In FIR filters, the multipliers are shared between the
% inputs of each serial partition. This increases the clock rate by a
% factor known as the folding factor.
%
% You can use the hdlfilterserialinfo function to get information regarding
% various filter lengths based on the value of coefficients. This function
% also displays an exhaustive table of possible options to specify the
% SerialPartition property with corresponding values of folding factor and
% number of multipliers.

hdlfilterserialinfo(lpFilter, 'InputDataType', nt_in);

%%
% You can use the optional properties 'Multipliers' and 'FoldingFactor' to
% display the specific information.

hdlfilterserialinfo(lpFilter, 'Multipliers', 4, ...
    'InputDataType', nt_in);

%%
hdlfilterserialinfo(lpFilter, 'Foldingfactor', 6, ...
    'InputDataType', nt_in);

%% Fully Serial Architecture
% In the fully serial architecture, instead of having a dedicated multiplier
% for each tap, the input sample for each tap is selected serially and is
% multiplied with the corresponding coefficient.
% For symmetric (and antisymmetric) structures the input samples
% corresponding to each set of symmetric taps are pre-added (for symmetric)
% or pre-subtracted (for antisymmetric) before multiplication with the
% corresponding coefficients. The product is accumulated sequentially using
% a register and the final result is stored in a register before the next
% set of input samples arrives. The clock rate must exceed the input sample
% rate by a factor equal to the number of products to be computed. This
% reduces the required chip area, because the implementation needs just one
% multiplier plus a few additional logic elements such as multiplexers and
% registers. The clock rate will be 78 times the input sample rate (a
% folding factor of 78), which equals 3.4398 MHz for this example.

%%
% To implement the fully serial architecture, use the hdlfilterserialinfo
% function and set its 'Multipliers' property to 1. You can also set the
% 'SerialPartition' property with its value equal to the effective filter
% length, which in this case is 78. The function also returns the folding
% factor and number of multipliers used for that serial partition setting.

[spart, foldingfact, nMults] = hdlfilterserialinfo(lpFilter, 'Multipliers', 1, ...
    'InputDataType', nt_in); %#ok<ASGLU>

generatehdl(lpFilter,'Name', 'fullyserial', ...
    'SerialPartition', spart, ...
    'TargetLanguage', 'VHDL', ...
    'TargetDirectory', workingdir, ...
    'InputDataType', nt_in);

%%
% Generate the test bench the same way as in the fully parallel case. It is
% important to generate a test bench again for each architecture
% implementation.

%% Partly Serial Architecture
% Fully parallel and fully serial represent two extremes of
% implementations. While fully serial uses very little area, it inherently
% needs a faster clock rate to operate. Fully parallel takes a lot of chip
% area but has very good performance. The partly serial architecture covers
% all the cases that lie between these two extremes.
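%%
% As a rough illustration of these two extremes, the sketch below tabulates
% multiplier count against clock rate using plain arithmetic. It is not
% part of the generated design. The variable names are illustrative
% assumptions; the effective filter length (78) and the 44.1 kHz sample
% rate are the values quoted in this example. FsAudio is used instead of Fs
% because Fs was reassigned to 8 kHz for the DTMF stimulus. The folding
% factor of 1 for the fully parallel case assumes one sample per clock.

FsAudio     = 44.1e3;                    % audio sample rate in Hz
effLen      = 78;                        % effective filter length quoted above
archNames   = {'fully parallel'; 'fully serial'};
numMults    = [effLen; 1];               % multipliers used by each extreme
foldFactors = [1; effLen];               % clock cycles per input sample
clockMHz    = foldFactors * FsAudio / 1e6;

disp(table(numMults, foldFactors, clockMHz, 'RowNames', archNames));

%%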
%
% The input taps are divided into sets. Each set is processed by a serial
% partition consisting of a multiply-accumulate unit and a multiplexer.
% These serial partitions operate in parallel with respect to each other
% but each processes its taps sequentially to accumulate the result
% corresponding to the taps served. Finally, the results of all serial
% partitions are added together using adders.

%% Partly Serial Architecture for Resource Constraint
% Let us assume that you want to implement this filter on an FPGA which has
% only 4 multipliers available for the filter. You can implement the filter
% using 4 serial partitions, each using one multiply-accumulate circuit.

hdlfilterserialinfo(lpFilter, 'Multipliers', 4, ...
    'InputDataType', nt_in);

%%
% The input taps that are processed by these serial partitions will be
% [20 20 20 18]. Specify SerialPartition with this vector to indicate
% the decomposition of taps for serial partitions. The clock rate is
% determined by the largest element of this vector. In this case the
% clock rate will be 20 times the input sample rate, 0.882 MHz.

[spart, foldingfact, nMults] = hdlfilterserialinfo(lpFilter, 'Multipliers', 4, ...
    'InputDataType', nt_in);

generatehdl(lpFilter,'Name', 'partlyserial1',...
    'SerialPartition', spart,...
    'TargetLanguage', 'VHDL', ...
    'TargetDirectory', workingdir, ...
    'InputDataType', nt_in);

%% Partly Serial Architecture for Speed Constraint
% Assume that you have a constraint on the clock rate for the filter
% implementation and the maximum clock frequency is 2 MHz. Since
% 2 MHz / 44.1 kHz is about 45.35, the clock rate can't be more
% than 45 times the input sample rate. For such a design constraint,
% the 'SerialPartition' should be specified with [45 33]. Note that this
% results in an additional serial partition in hardware, implying additional
% circuitry to multiply-accumulate 33 taps.
% You can specify SerialPartition using hdlfilterserialinfo and its
% 'Foldingfactor' property as follows.

spart = hdlfilterserialinfo(lpFilter, 'Foldingfactor', 45, ...
    'InputDataType', nt_in);

generatehdl(lpFilter,'Name', 'partlyserial2', ...
    'SerialPartition', spart,...
    'TargetLanguage', 'VHDL',...
    'TargetDirectory', workingdir, ...
    'InputDataType', nt_in);

%%
% In general, you can specify any arbitrary decomposition of taps for
% serial partitions depending on other constraints. The only requirement is
% that the sum of the elements of the vector must equal the effective
% filter length.

%% Cascade-Serial Architecture
% The accumulators in serial partitions can be reused to add the result of
% the next serial partition. This is possible if the number of taps
% processed by one serial partition exceeds the number processed by the
% serial partition next to it by at least 1. The advantage of this technique
% is that the set of adders required to add the results of all serial
% partitions is removed. However, this increases the folding factor by 1, as
% an additional clock cycle is required to complete the additional
% accumulation step.
%
% The cascade-serial architecture can be specified using the property
% 'ReuseAccum'. This can be done in two ways.
%
% Add 'ReuseAccum' to the generatehdl call and specify it as 'on'. Note that
% the value specified for the 'SerialPartition' property has to be such that
% the accumulator reuse is feasible. The elements of the vector must be in
% descending order except for the last two, which can be the same.
%
% If the property 'SerialPartition' is not specified and 'ReuseAccum' is
% specified as 'on', the decomposition of taps for serial partitions is
% determined internally. This is done to minimize the clock rate and to
% reuse the accumulator. For this audio filter, it is
% [12 11 10 9 8 7 6 5 4 3 3]. Note that it uses 11 serial partitions,
% implying 11 multiply-accumulate circuits.
% The clock rate will be 13 times the input sample rate, 573.3 kHz.

generatehdl(lpFilter,'Name', 'cascadeserial1',...
    'SerialPartition', [45 33],...
    'ReuseAccum', 'on', ...
    'TargetLanguage', 'VHDL', ...
    'TargetDirectory', workingdir, ...
    'InputDataType', nt_in);

%%
% Generate the optimal decomposition: as many serial partitions as are
% required to reach the minimum clock rate at which the accumulator can be
% reused.

generatehdl(lpFilter,'Name', 'cascadeserial2', ...
    'ReuseAccum', 'on',...
    'TargetLanguage', 'VHDL',...
    'TargetDirectory', workingdir, ...
    'InputDataType', nt_in);

%% Conclusion
% You designed a lowpass direct form symmetric FIR filter
% to meet the given specification. You then quantized and checked your
% design. You generated VHDL code for fully parallel, fully serial, partly
% serial and cascade-serial architectures. You generated a VHDL test bench
% using a DTMF tone for one of the architectures.
%
% You can use an HDL Simulator to verify the generated HDL code for
% different serial architectures. You can use a synthesis tool to compare
% the area and speed of these architectures. You can also experiment with
% generating Verilog code and test benches.
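
%% Checking the Partition Arithmetic (Illustrative Sketch)
% The sketch below is an addition to the example, not generated output. It
% checks the two rules stated above using only base MATLAB: the elements of
% a serial partition vector must sum to the effective filter length, and
% the folding factor is the largest element, plus one extra cycle when the
% accumulator is reused. The partition vectors are the ones quoted in the
% text. The names FsAudio, effLen, partitions, reuseAccum and labels are
% illustrative assumptions; FsAudio is used because Fs was reassigned to
% 8 kHz for the DTMF stimulus.

FsAudio    = 44.1e3;   % audio sample rate in Hz
effLen     = 78;       % effective filter length quoted in this example
partitions = {78, [20 20 20 18], [45 33], [12 11 10 9 8 7 6 5 4 3 3]};
reuseAccum = [false false false true];
labels     = {'fullyserial', 'partlyserial1', 'partlyserial2', 'cascadeserial2'};

for k = 1:numel(partitions)
    p = partitions{k};
    % Every tap must be assigned to exactly one serial partition.
    assert(sum(p) == effLen, 'Partition %d does not cover all taps.', k);
    % One extra clock cycle per sample when the accumulator is reused.
    ff = max(p) + reuseAccum(k);
    fprintf('%-15s multipliers = %2d, folding factor = %2d, clock = %.4f MHz\n', ...
        labels{k}, numel(p), ff, ff*FsAudio/1e6);
end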