#Recursion in MyHDL#
Today, June 15th 2015, I was going through a heap of papers as we recently moved offices
from Leuven (home of Stella Artois) to Hoegaarden ( home of the famous 'witbier')
as I stumbled on a printout of this Design Advantage from Comit Systems Inc.
titled: Recursive VHDL structures in FPGA synthesis by Rangarajan Sundaravaradan.
So I indulged myself in my favourite pastime : Applying recursive programming in hardware.
Recursion is easy: I learned the rules from a book that I, some 30 years ago, recovered from the garbage bin at a previous employer.
The book is LISP, A Gentle Introduction to Symbolic Computation by David S. Touretzky (ISBN 0-06-046657-X).
It has Chapter 8 devoted to Recursion, pages 191 to 213.
On page 200 the dragon (every decent book on programming has one) enumerates the three rules of recursion:
- Find out how to take just one step
- Break each journey down into one step plus a smaller journey
- Know when to stop
Let's call this Rule Set A
I rephrased those rules, except the last one:
- Find out how to divide the task into two smaller tasks
- Keep on dividing those tasks into smaller tasks
- But know when to stop
Let's call this Rule Set B
As I said above, recursion is easy; writing a recursive VHDL function to e.g., count the bits in a std_logic_vector is a relatively easy task especially if you stick to Rule Set A.
function count_bits(s : std_logic_vector) return natural is
begin
if (s'length = 1) then
return to_integer(unsigned(s));
else
return to_integer(s(s'low)) + count_bits(s(s'high downto s'low + 1));
end if;
end function;
But the result doesn't differ much from the standard For-Loop approach
function count_bits(s : std_logic_vector) return natural is
variable sum : integer ;
begin
sum := 0;
for i in 0 to(s'high - s'low) loop
sum := sum + to_integer(s(s'low+i));
end loop;
return sum ;
end function;
When synthesised both will end up in a staircase: one might be going up, the other might be going down, but either eventually delivers the sum.
Now using Rule Set B we get something totally different
function count_bits(s : std_logic_vector) return natural is
begin
if (s'length = 1) then
return to_integer(unsigned(s));
else
return count_bits(s(s'high downto s'low + s'length / 2))
+ count_bits(s(s'low + s'length / 2 - 1 downto s'low));
end if;
end function;
We get a bottoms-up binary tree!
You can notice one important advantage of the last approach: we have a lower number of levels of logic; in this case 3 instead of 7.
When the clockfrequency in the FPGA gets high (and when doesn't it?) the last approach will be a lot faster. Imagine you are not adding up the one bits in a vectors but an array of std_logic_vectors.
Now exactly such a task like adding up the 9 elements in a 3 by 3 convolution, say to do Gauss filtering, calls for a pipelined version.
Recursive functions are inherently non-pipeline-able as you can not infer registers inside a VHDL function (to my knowledge).
And that is why, at the time, I printed that document, and kept it: it talks about using recursive components. I refer to the pdf but it essentially comes down to instantiate the entity itself in its own architecture.
I tried a copy-paste of the code inside the document but that only succeeded partially with quite some editing to do.
But as I have switched to MyHDL, several months back, I decided to do this in MyHDL instead of doing it in VHDL and consequently abandoned the editing.
Here is the MyHDL code:
import myhdl
import hdlutils
def sumbits(Clk, Reset, D, Q):
''' a recursive pipelined implementation'''
LWIDTH_D = len(D)
if LWIDTH_D > 2:
# recurse by splitting things up
LWIDTH_L = LWIDTH_D - LWIDTH_D / 2
dupper, dlower = [myhdl.Signal(myhdl.intbv(0)[LWIDTH_L:]) for _ in range(2)]
lql, lqu = [myhdl.Signal(myhdl.intbv(0)[hdlutils.widthr(LWIDTH_L):]) for _ in range(2)]
supper = sumbits(Clk, Reset, dupper, lqu)
slower = sumbits(Clk, Reset, dlower, lql)
@myhdl.always_comb
def split():
''' this will expand on the left in case the input data-size is uneven '''
dupper.next = D[: LWIDTH_L]
dlower.next = D[LWIDTH_L:]
@myhdl.always_seq(Clk.posedge, Reset)
def rtlr():
''' the result is the sum of the previous branches '''
Q.next = lqu + lql
return supper, slower, split, rtlr
else:
# know when to stop
@myhdl.always_seq(Clk.posedge, Reset)
def rtl2():
''' the result is the sum of the two (terminal) leaves '''
Q.next = D[1] + D[0]
return rtl2
It looks more complicated than the simple VHDL functions we defined earlier, but it matches up to the construct in the paper.
The VHDL (or Verilog, you choose) conversion results in this schematic:
Let's inspect the source:
import myhdl
import hdlutils
Obviously we import the MyHDL library. Notice that we refrained from using the wild from myhdl import *
Hdlutils is my library with utility functions. Unfortunately there is no official library, so everybody brings his own.
Here we only see the usage of widthr() but in the code I call on some simulation helper functions too.
widthr( number ) returns the number bits to represent the given number( e.g. 7 requires 3 bits and 8 needs 4).
def sumbits( Clk, Reset, D, Q):
''' a recursive pipelined implementation'''
WIDTH_D = len(D)
The first thing we do on entrance is find out how many elements we have in our input data. The strategy in recursion (using Rule Set B) is to cut the task up into smaller tasks and thus the size of the input data evolves with each recursion.
if WIDTH_D > 2 :
# recurse by splitting things up
LWIDTH_L = LWIDTH_D - LWIDTH_D / 2
dupper, dlower = [myhdl.Signal(myhdl.intbv(0)[LWIDTH_L:]) for _ in range(2)]
lql, lqu = [myhdl.Signal(myhdl.intbv(0)[hdlutils.widthr(LWIDTH_L):]) for _ in range(2)]
supper = sumbits(Clk, Reset, dupper, lqu)
slower = sumbits(Clk, Reset, dlower, lql)
@myhdl.always_comb
def split():
''' this will expand on the left in case the input data-size is uneven '''
dupper.next = D[: LWIDTH_L]
dlower.next = D[LWIDTH_L:]
@myhdl.always_seq(Clk.posedge, Reset)
def rtlr():
''' the result is the sum of the previous branches '''
Q.next = lqu + lql
return supper, slower, split, rtlr
This is Rule 1. and 2. of our Rule Set B.
- We simply split each input into two sets. This is achieved by
dupper, dlower = [myhdl.Signal(myhdl.intbv(0)[LWIDTH_L:]) for _ in range(2)]
@myhdl.always_comb
def split():
''' this will expand on the left in case the input data-size is uneven '''
dupper.next = D[: LWIDTH_L]
dlower.next = D[LWIDTH_L:]
If the actual input set has an uneven number of element we end up with an even and an uneven set. We could pass these on to the next recursion but this will create an imbalance in the number of registers in the path and thus deliver incorrect results. So we expand the uneven set with one extra element on the msb side. In the example we are treating here it is a single bit and happens automatically in MyHDL when assigning an intbv to a larger one. Now if you are adding up intbv's you have to pad with a zero value intbv (which is a bit more work, and which I will show in some not too distant future). 2. We recurse for each of the two sets
lql, lqu = [myhdl.Signal(myhdl.intbv(0)[hdlutils.widthr(LWIDTH_L):]) for _ in range(2)]
supper = sumbits(Clk, Reset, dupper, lqu)
slower = sumbits(Clk, Reset, dlower, lql)
@myhdl.always_seq(Clk.posedge, Reset)
def rtlr():
''' the result is the sum of the previous branches '''
Q.next = lqu + lql
return supper, slower, split, rtlr
We recurse for each newly split-off set and finally add the two returned results in an @always_seq process that will register the result.
And finally Rule 3.
else:
# know when to stop
@myhdl.always_seq(Clk.posedge, Reset)
def rtl2():
''' the result is the sum of the two (terminal) leaves '''
Q.next = D[1] + D[0]
return rtl2
When the input data only contains two elements we return the registered sum of them.
That's all.
Note: I learned today that Christopher Felton has another fine example
using recursion in MyHDL. Check it out too!
It is a fully worked-out module so you have to dig into def fft_core():
to find the little gem.