Bhargav and I had a short discussion about Fibonacci number generating functions over winter break, and I recently answered an interview question in the same vein, so I guess it's a question worth looking at:

The recursive formula for the generation of the nth Fibonacci number F_{n} is F_{n} = F_{n-1} + F_{n-2}, where F_{1} = F_{2} = 1. How, then, do we write a function that generates the nth Fibonacci number? The simplest approach is to convert this formula directly into code (all sample code is given in Python):

def F(n):
    if n == 1 or n == 2:
        return 1
    else:
        return F(n-1) + F(n-2)

This works, but there's a problem. For small values of n it seems fine, but it slows down rapidly as n grows. On my computer (which, admittedly, isn't very fast, but it's still a reasonably nice computer), F_{30} takes about half a second, F_{35} takes about 6 seconds, and I don't even want to know how long F_{50} would take (by my rough calculations, about 6 days). This is clearly unacceptable. I could calculate F_{50} by hand in less than 6 days (more like 20 minutes). So what's wrong?

If we look at the function, we see that a call to F(n) makes two recursive calls, to F(n-1) and F(n-2); each of those makes two more recursive calls, which each make two more, and so on. Furthermore, each call only reduces the problem size by one or two. This leads to exponential growth, where a call to F(n) takes almost twice as long as a call to F(n-1). This is bad! The culprit is that we're doing redundant work. Here is the tree of recursive calls for F(6):
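
Sketched in text form (reconstructed directly from the recurrence), the tree looks like this:

    F(6)
    ├─ F(5)
    │  ├─ F(4)
    │  │  ├─ F(3)
    │  │  │  ├─ F(2)
    │  │  │  └─ F(1)
    │  │  └─ F(2)
    │  └─ F(3)
    │     ├─ F(2)
    │     └─ F(1)
    └─ F(4)
       ├─ F(3)
       │  ├─ F(2)
       │  └─ F(1)
       └─ F(2)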

As you can see, F(4) is being calculated in both subtrees of F(6), and F(3) is calculated in three different subtrees. As n gets larger, more and more work is being redone in multiple subtrees.
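
One way to see this duplication concretely is to instrument the naive function with a call counter (the counter, and the choice of F(10) as an example, are my additions, just for illustration):

```python
from collections import Counter

call_counts = Counter()

def F(n):
    # Naive recursive Fibonacci, instrumented to count
    # how many times each subproblem is solved
    call_counts[n] += 1
    if n == 1 or n == 2:
        return 1
    else:
        return F(n-1) + F(n-2)

F(10)
print(call_counts[2])  # F(2) alone is computed 34 times
```

Even for an input as small as 10, the same tiny subproblems are solved dozens of times over.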

Bhargav’s solution was to use a hash table to save each solution once it’s been calculated, so that future calls can check whether the solution for some n has already been found before calculating it again:

fib_results = {1: 1, 2: 1}

def F(n):
    if n in fib_results:
        return fib_results[n]
    else:
        fib_results[n] = F(n-1) + F(n-2)
        return fib_results[n]

This is in fact much faster, running in linear time instead of exponential time. Unfortunately, it takes linear space in the form of the hash table, and it is unclear how, or whether, we can avoid using that space, though it seems like it might be possible, since the naive solution didn't need an explicit table. This approach is called memoization. Memoization is a top-down alternative to the bottom-up approach, called dynamic programming. Its advantage is that it is much easier to convert a naive recursive algorithm into a memoized one. The disadvantage is that dynamic programming is generally somewhat faster in practice and more flexible, making it clearer how to apply optimizations such as getting rid of that linear-space hash table.
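
Incidentally, Python's standard library can do this bookkeeping for us: functools.lru_cache wraps a function with exactly this kind of memoization table. A sketch (not part of the original discussion):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every result, like the hash table above
def F(n):
    if n == 1 or n == 2:
        return 1
    else:
        return F(n-1) + F(n-2)

print(F(50))  # 12586269025, returned almost instantly
```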

The idea behind dynamic programming is, once we've defined a problem in terms of its subproblems, to solve it by starting from the base cases and solving successively larger problems, usually in tabular form. To find F(n), we create an array of n cells, filling cells 1 and 2 with the value 1 (the base cases). To fill each subsequent cell, we look at the two cells before it and store their sum. Conveniently, this means that the ith cell contains the value of F(i). Hence, by the time we fill the nth cell we will have found the value of F(n).

This algorithm would look like this:

def F(n):
    # Guard the base cases so the table is never too small
    if n <= 2:
        return 1
    # Create an array of n cells
    fib_table = [None] * n
    # Initialize base cases
    fib_table[0] = 1
    fib_table[1] = 1
    # Fill the rest of the table
    for i in range(2, n):
        fib_table[i] = fib_table[i-1] + fib_table[i-2]
    return fib_table[n-1]

Unfortunately, this still takes linear space! It will probably be a little faster than the memoized algorithm, since arrays are a little faster than dictionaries, but we still haven't gotten rid of the lookup table. However, if we look at this algorithm, it becomes clear that to fill each cell we only need to look at the two cells before it. In fact, we can forget the values in every earlier cell, since we'll never need them again. To save that space, we can just reuse the same three cells over and over again. And since we only need three cells, we can use variables instead of creating a table:

def F(n):
    a = 1
    b = 1
    c = 1
    for i in range(2, n):
        c = a + b
        a = b
        b = c
    return c
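
As a quick sanity check (my addition, with hypothetical names fib_iter and fib_rec), the constant-space version agrees with the recursive definition for small inputs:

```python
def fib_iter(n):
    # Constant-space version, same loop as above
    a = 1
    b = 1
    c = 1
    for i in range(2, n):
        c = a + b
        a = b
        b = c
    return c

def fib_rec(n):
    # Direct translation of the recurrence
    if n == 1 or n == 2:
        return 1
    else:
        return fib_rec(n-1) + fib_rec(n-2)

print(all(fib_iter(n) == fib_rec(n) for n in range(1, 20)))  # True
```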

Mission accomplished. We have a Fibonacci number function that runs in linear time and constant space. Dynamic programming is clearly awesome. A Fibonacci number function is one of the simplest applications of dynamic programming, and one could easily imagine arriving at this solution without going through this whole thought process. However, there are many more complex problems that dynamic programming can solve, for which the best solutions would otherwise be incredibly difficult to find. Some applications of dynamic programming include RNA folding and seam carving (crazy content-aware image resizing).