Name: dblclockfft
Created: Feb 21, 2015
Updated: Jun 2, 2015
SVN Updated: Jun 2, 2015

Other project properties

Category: DSP core
Language: Verilog
Development status: Alpha
Additional info: Design done, FPGA proven, Specification done
WishBone Compliant: No
License: GPL


The goal of this project is to create an IP core for an FFT that runs, in a pipelined fashion, at two samples per clock. A C++ program will generate the Verilog files, allowing the FFT to be of an arbitrary length--subject only to the capability of the FPGA used to implement the FFT.
One of my goals is to create an FFT core that can be used with open-source and third-party Verilog simulation tools, such as Verilator. This would be difficult with a proprietary IP core.
For those who might be wondering why I would need an FFT that runs at two samples per clock: FFTs tend to use their multiplies more efficiently than other filtering implementations, but to get that benefit you need some form of overlap-and-add filtering structure. An overlap-and-add structure zero-pads each block of incoming data before transforming it, so it immediately requires an FFT that runs at twice the rate of the incoming data.
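To make the factor of two concrete, here is a minimal C++ sketch of overlap-and-add filtering. It is not taken from this project: the names `dft`, `overlap_add`, and `OlaResult` are hypothetical, a naive DFT stands in for the FFT core, and the block layout (L new samples zero-padded to a 2L-point transform) is one common arrangement, assumed here for illustration. The sketch counts how many samples pass through the forward transform for a given number of input samples.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(N^2) DFT standing in for the FFT core (illustration only).
// With inverse=true it computes the inverse transform, scaled by 1/N.
std::vector<cplx> dft(const std::vector<cplx> &x, bool inverse) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    std::vector<cplx> y(n);
    for (std::size_t k = 0; k < n; k++) {
        cplx acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; t++) {
            const double ang = sign * 2.0 * pi * double((k * t) % n) / double(n);
            acc += x[t] * cplx(std::cos(ang), std::sin(ang));
        }
        y[k] = inverse ? acc / double(n) : acc;
    }
    return y;
}

// Hypothetical result type: the filtered output plus sample counts.
struct OlaResult {
    std::vector<double> y;      // linear convolution of x with h
    std::size_t input_samples;  // samples that arrived
    std::size_t fft_samples;    // samples pushed through the forward transform
};

// Overlap-and-add: each block of L new samples is zero-padded to N = 2L,
// transformed, multiplied by the filter's spectrum, inverse transformed,
// and added into the output. Assumes h.size() <= L + 1 so that each
// block's result fits in N points without wrapping around.
OlaResult overlap_add(const std::vector<double> &x,
                      const std::vector<double> &h, std::size_t L) {
    assert(L > 0 && h.size() >= 1 && h.size() <= L + 1);
    const std::size_t N = 2 * L;

    std::vector<cplx> hp(N, cplx(0.0, 0.0));
    for (std::size_t i = 0; i < h.size(); i++) hp[i] = h[i];
    const std::vector<cplx> H = dft(hp, false);

    OlaResult r;
    r.y.assign(x.size() + h.size() - 1, 0.0);
    r.input_samples = x.size();
    r.fft_samples = 0;

    for (std::size_t start = 0; start < x.size(); start += L) {
        std::vector<cplx> blk(N, cplx(0.0, 0.0));
        for (std::size_t i = 0; i < L && start + i < x.size(); i++)
            blk[i] = x[start + i];

        std::vector<cplx> X = dft(blk, false);
        r.fft_samples += N;  // 2L transform points for L new input samples
        for (std::size_t k = 0; k < N; k++) X[k] *= H[k];

        const std::vector<cplx> yb = dft(X, true);
        for (std::size_t i = 0; i < N && start + i < r.y.size(); i++)
            r.y[start + i] += yb[i].real();
    }
    return r;
}
```

Under these assumptions, filtering 8 input samples with L = 4 pushes 16 samples through the forward transform, so the transform has to accept two samples for every one that arrives.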

Usage Statistics

The following statistics come from a Basys-3 development board implementation using Vivado as the development tool:
FFT Size                               32     64    128    256    512   1024
Bit Width                              16     16     16     16     16     16
Twiddle Factor Bits                    17     17     17     17     17     17
Extra Internal Bits                     1      1      1      1      1      1
Stages with Optimized Multiplies        3      4      5      6      7      7
Slice LUTs                           2411   3136   3811   4517   5325   8885
Slice Registers                      4130   5401   6536   7593   8682  14687
Memory LUTs                           352    470    524    622    808   1280
Flip Flop Pairs                      3622   4625   5469   6389   7577  12223
Block RAMs                              2      2      4      6      8     13
DSP48s                                 18     26     30     36     42     42
I should also note that the last two stages of any of these FFT implementations don't use multiplies, just adds and subtracts. A 512 point FFT has nine stages, so seven stages of hardware multiplies is the maximum it can have. The 1024 point FFT has eight stages that need multiplies; it does one of them in logic, and the result is ... expensive.
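The stage arithmetic behind the "Stages with Optimized Multiplies" row can be sketched in a few lines of C++. This is an illustration, not the generator's code: `StagePlan` and `plan_stages` are hypothetical names, the sketch assumes one stage per factor of two in the FFT size, and the cap of seven hardware-multiply stages is simply read off the table rather than derived.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of how an FFT's stages split up.
struct StagePlan {
    unsigned total;     // log2(fft_size) stages overall (assumed)
    unsigned multiply;  // stages that need multiplies (all but the last two)
    unsigned hardware;  // multiply stages mapped to hardware multipliers
    unsigned logic;     // multiply stages that must be built in logic
};

// fft_size must be a power of two and at least 4.
// max_hardware_stages is an assumed cap (7 in the table above).
StagePlan plan_stages(std::size_t fft_size, unsigned max_hardware_stages) {
    assert(fft_size >= 4 && (fft_size & (fft_size - 1)) == 0);

    unsigned total = 0;
    for (std::size_t n = fft_size; n > 1; n >>= 1) total++;

    // The last two stages use only adds and subtracts.
    const unsigned multiply = total - 2;
    const unsigned hardware =
        (multiply < max_hardware_stages) ? multiply : max_hardware_stages;

    return StagePlan{total, multiply, hardware, multiply - hardware};
}
```

With a cap of seven, `plan_stages(512, 7)` gives nine stages with seven hardware multiply stages and none in logic, while `plan_stages(1024, 7)` gives ten stages with seven in hardware and one in logic, matching the table.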

Future Upgrades

If I can muster the time to keep working on this, I'd like to add ...
* A capability to do FFTs on real samples, rather than just complex ones
* A capability to operate at one sample per clock, or even one sample every two clocks
Please feel free to contact me at dgisselq at opencores.org if you would like further features, or to have this core tailored to your application or device.