Posts Tagged ‘comparison’

Problem 25 – More Fibonacci Numbers

September 19, 2009

In problem 25 we are asked to find the first Fibonacci number with a given number of digits. Actually, we are only asked to find which Fibonacci number it is (its index in the sequence of Fibonacci numbers), instead of the number itself. We’ve already visited Fibonacci numbers in problem 2, and had useful work there. For this problem, it will be helpful to note that the number of digits of n can be found as \lfloor \log_{10} n\rfloor + 1, where \lfloor x\rfloor denotes the “floor” function, equal to the greatest integer less than x (on a not-particularly-related note, this isn’t the formula I would have used a week ago, glad I got that cleared up).

I decided to do some performance comparisons with this problem, with a few different versions of code. [GRR. WordPress seems to not be doing the sourcecode “tag” like I thought it should below. Or I’m messing something up…]

In version 1, I’ll compute each Fibonacci number successively, using the recurrence relation, then calculate the number of digits until I’ve found a big enough number. My function looks like:

def solve(n):
    ret = 1
    prev, cur = 0, 1
    while(cur < 10**(n-1)):
        prev, cur = cur, prev + cur
        ret += 1
    return ret

In the second version, I’ll compute the n-th Fibonacci number using the formula

F_n = \frac{1}{\sqrt{5}}\left(\left(\frac{1+\sqrt{5}}{2}\right)^2-\left(\frac{1-\sqrt{5}}{2}\right)^2\right)

that was used last time we saw Fibonacci numbers. In fact, I’ll wrap that up into its own function, “digs_fib” that takes in an “n” and gives the number of digits of the appropriate Fibonacci number. Now my code is basically:

def fib(n):
    """ The nth fibonacci number """
    alpha = (1 + 5**.5) / 2
    beta = (1 - 5**.5) / 2
    return (1/5**.5) * (alpha**n - beta**n)
def digs_fib(n):
    """ Number of digits of fib(n) """
    return int(math.log(fib(n), 10)) + 1
def solve(n):
    ret = 1
    while(digs_fib(ret) < n):
        ret += 1
    return ret

Of course, that second term in the formula is smallish, and raised to big powers it gets smaller, so we could maybe ignore it. And then we’re taking the log of products and powers, so we can use log rules to re-write the expression for the number of digits, without directly calculating the Fibonacci number. My “digs_fib” function changes to the following:

def digs_fib(n):
    """ The approximate number of digits of the nth fibonum """
    alpha = (1+5**.5)/2
    return int(n*math.log(alpha,10)-.5*math.log(5, 10))+1

Finally, given a target number of digits d, we could just about solve for n in

d \approx \lfloor n\log_{10}((1+\sqrt{5})/2) - \frac{1}{2}\log_{10}(5)\rfloor + 1

and obtain

n\approx \dfrac{d-1+\frac{1}{2}\log_{10}(5)}{\log_{10}((1+\sqrt{5})/2)}.

This means that given d we can get an approximation to which Fibonacci number gives us that many digits in basically constant time. Now my “solve” is just

def solve(n):
    num = n - 1 + .5*math.log(5, 10)
    num /= math.log((1+5**.5)/2, 10)
    return int(num+1) # round up, approximately

I ran each of these through my little performance evaluation script and came up with a reasonable graph. I could only go up to asking for 300 digits, because after that the second version fails, since the numbers are too big. The other versions have no problem at all going up to 1000 digits (and well beyond). But anyway, here’s my graph:

You can barely see that constant time algorithm there along the x-axis. The scale of the y-axis is some linear change in actual time taken, but the shape of these graphs is what’s important anyway. Right?


Problem 14 – 3n+1

June 25, 2009

The “3n+1 problem” is one I mention to my calculus class every time we get to sequences. It’s just so fun.

I’ve coded up a few solutions. My first is a basic one where I have a function that takes a number and generates the sequence starting at that number. Then I just loop up to 1000000 looking at the lengths of these sequences.

def seq(num):
    """ Returns the 3n+1 sequence for num """
    ret = [num]
    while(not ret[-1] == 1):
        ret.append(ret[-1]%2 == 0 and ret[-1]//2 or 3*ret[-1]+1)
    return ret

def solve(bound):
    """ Find the number less than bound with largest 3n+1 sequence

    Returns the number, the length of its sequence, and the computation time
    t = time.time()
    retn,retl = 0,0
    for n in xrange(1,bound):
        s = seq(n)
        if(len(s) > retl):
            retn,retl = n,len(s)
    t = time.time() - t
    return ((retn,retl), t)

if __name__ == "__main__":
    print solve(int(sys.argv[1]))

This can be moderately improved by not passing around the entire sequence. Instead, just code up the helper function (“seq”, above) to return the length, and use it accordingly in the main loop (“solve”, above). I’ll not bother copying that code here.

The only real improvement I’ve made on this is to store values as we find them, so we can look them up again later. Here’s the way I coded that up:

def lenseq(n, mem):
    """ Recursively calculate the length of n using a memory dictionary

    This function will modify mem, adding new values to it
    if(n in mem):
        return mem[n]
    suc = n%2 and 3*n+1 or n//2
    lenseq(suc, mem)
    mem[n] = mem[suc] + 1
    return mem[n]

def solve(bound):
    """ Find the number less than bound with longest 3n+1 sequence

    Return ((the number, the length of its sequence), computation time)
    t = time.time()
    retn,retl = 0,0
    mem = {1:1}
    for n in xrange(1,bound):
        l = lenseq(n, mem)
        if(l > retl):
            retn,retl = n,l
    t = time.time() - t
    return ((retn,retl),t)

if __name__ == "__main__":
    print solve(int(sys.argv[1]))

And I did a little performance comparison, because it was easy. In the chart below, “Basic” is the first set of code above, “BasicPassLength” is the modification that doesn’t pass sequences around, just their lengths, and “StoreLengths” is the second set of code above.

The scale is log in both axes. The tickmark, n, on the x-axis refers to an upper bound of 10^n (the last tickmark, then, being the stated problem). The y-axis is less precise, but it’s along the lines of the log of the time taken, in seconds. The “Basic” code took about 2.5 minutes on the stated problem (bound of 1000000), “BasicPassLength” about 1.5 minutes, and the “StoreLengths” took about 20 seconds.

Just out of curiosity, I checked how many values ended up getting stored. There were 2168611 values stored, ranging up to 56991483520 (which, in case you are curious, has length 174). I haven’t thought up an obvious way to trim down which values get stored (maybe don’t do those above the bound?), or other ways to improve on this solution.

Problem 12 – Triangular Numbers

June 19, 2009

I went about counting divisors of a number t by looking at the prime factorization of t. If t=\prod p_i^{e_i}, then t has \prod 1+e_i divisors, because any divisor has a sort of “sub”-factorization, which is to say the same primes just to smaller (or possibly the same) powers (or 0, that’s why the “1+” is in each term in the product).

So, to determine the number of factors, we first find the prime factorization, and then do a quick product. Suppose we have a method “factor” which takes an integer n, and returns a list of tuples of the form (prime, power). So, for example, “factor(12)” would return “[ (2,2), (3,1) ]”. We can then pass such an object to the following function to calculate the number of divisors:

def numberDivisors(f):
    """ take a factorization and return the number of divisors """
    return reduce(lambda x,y:x*y, map(lambda t:t[1]+1,f), 1)

Now, I want to use this to find the number of divisors for triangular numbers. We could just loop over the triangular numbers, factoring each in turn. However, that’s pretty redundant. The n-th triangular number (let’s denote that by T_n) is \frac{1}{2}n(n+1). Exactly one of n or n+1 is even, so T_n=\frac{e}{2}\cdot o, where e is even and o is odd. If we factor \frac{e}{2} and o separately, we can multiply the factorizations together by simply adding powers. While we’re at it, if we store the factorization of \frac{e}{2} and o, and T_n, we’ll be able to use those factorizations to factor later triangular numbers. Probably if we thought about it for a while, we could get away with only storing the factorization of o and T_n, because the factorization of \frac{e}{2} won’t be used again (I think).

Here’s my implementation of these ideas:

def solve(goal):
    """ Find the first triangle number with more than 'goal' divisors

    Return the number and the time taken to find it
    Assumption: goal is a positive integer
    t = time.time()

    # lets store some factorizations to kick things off
    factorization = { 2:[[2,1]], 3:[[3,1]], 4:[[2,2]], \
                      5:[[5,1]], 6:[[2,1],[3,1]] }

    # since goal is a positive integer, by assumption, we are looking
    # for a triangle number with >=2 divisors

    divs, n, tri, epsilon = 2, 2, 3, 0

    # loop invariant: T_n=tri has divs divisors, epsilon = 0 or 1
    # and n+epsilon is even
    while(divs <= goal):
        n, epsilon = n+1, 1-epsilon
        tri = n * (n+1) // 2
        lesser, greater = (n+epsilon)//2, n+1-epsilon
        if not (lesser in factorization):
            factorization&#91;lesser&#93; = factor(lesser)
        if not (greater in factorization):
            factorization&#91;greater&#93; = factor(greater)

        # concatenate the lists of factors
        combfact = factorization&#91;lesser&#93; + factorization&#91;greater&#93;

        # we know how to factor the product
        factproddict = {}
        for p,e in combfact:
            if not (p in factproddict):
                factproddict&#91;p&#93; = e
                factproddict&#91;p&#93; += e

        factorization&#91;tri&#93; = factproddict.items() # convert dictionary => list
        divs = divisors(factorization[tri])

    t = time.time() - t
    return (tri,t)

I guess things could possibly be simplified a little if the “factor” function that I use there returned a dictionary instead of a list, but currently I’ve got it set up to return a list. By the way, the factorization loop I’m using is the “factorAndSieve” code from Numerical Recipes, slightly modified to return a list of tuples, instead of just a (possibly repeating) list of primes. If I’d been paying slightly more attention, he’s got a lengthy post about computing the number of divisors for an integer as well.

I decided to compare the two methods of solving this problem, as I am one to do (the graphs are just so much fun). The first method just factors every triangle number in turn until you have enough divisors (called “Basic” in the following graph). The second is the one with the code above, where we store factorizations (called “Store”). In the following graph, the x-axis indicates the number of divisors that you are hoping for (tickmark labelled n corresponding to 10n divisors), and the y-axis is some scaled “ln(time)”.

Just for some sort of definite idea about timing, the last tickmark, corresponding to 500 divisors (the stated problem) takes my computer around 10 seconds for the Basic script, and less than 1 second for the Store script.

Of course, the Store script uses more storage space. I think we could remove factors as we went to save some space. At the n-th triangular number, I think we never need factorizations of anything less than n/2, so we could remove those as we went. Just as a side note, when you look for the first triangular number with more than 500 divisors, the Store script above has 21524 stored factorizations, with numbers ranging from 2 to the answer (which I got to be up in the 76 million range). If anybody can point me in the right direction to learn about how to test the space usage of my scripts, I’d love to hear about it. Something similar to the built in time module, but for space, would be awesome.

The function taking n to the number of divisors of T_n is probably a vaguely interesting function. I plotted it with inputs up to 100:

and with inputs up to 300:

That first big spike in the second graph is at input 224, indicating T_{224}=25200 has 90 divisors. Apparently.

Problem 4

May 28, 2009

The code Chris posted to solve problem 4 loops over the larger factor as its outer loop, and then the smaller factor as its inner loop. I’d like to share a solution that flips this order, and another solution coming at this problem from a different angle.

I haven’t run across a string.reverse in python, so all the solutions below use the following ispali function:

def ispali(n):
    """ determine if input is palindromic, iteratively """
    while(idx <= len(s)//2):
        if(s&#91;idx&#93; != s&#91;-1-idx&#93;):
            return False
        idx += 1
    return True

Not that it really matters I guess, but I expect this to be twice as quick as comparing a string with its reverse anyway. You only need to go half way through a string to know if its a palindrome or not. But whatever. While I was thinking about how to write an ispali function, I also came up with the following (which needs an "import types" line)

&#91;sourcecode language="python"&#93;
def ispali(n):
    """ determine if input is palindromic, recursively """
    return type(n)==types.IntType and ispali_rec(str(n)) \
         or type(n)==types.StringType and \
            (len(n) < 2 or n&#91;0&#93;==n&#91;-1&#93; and ispali_rec(n&#91;1:-1&#93;))

Recursive functions, FTW! I guess there's not much point in getting it all in one statement (but not one line, that's what the '\'s are for), besides some sort of fun. I was reading <a href="">Chapter 4 of Dive Into Python</a>, and learning about the <a href="">type</a> function, and <a href="">how 'and' and 'or' work</a> in python (which is interesting, you should check it out) when I started coding this function. (This chapter also inspired me to re-write my performance testing script in python, and make each of my scripts modules... I can say more about this another time if anybody is interested).

So, anyway, on to new solutions for the actual problem. First, flip our nested loops, looping over the smaller factor for the outer loop.

def max_for_pali(smaller, bound):
    """ determine largest n in [smaller,bound] with smaller*n a palindrome

    return a tuple with 1 element, n, if such an n exists
    return an empty tuple if no such n exists
    ret = bound
    while(ret >= smaller):
        if ispali(ret * smaller):
            return (ret,) # tuple of size 1, needs trailing ','
        ret -= 1
    return ()

def solve(digits):
    """ the main solver function """

    # the bounds
    lower_b = 10**(digits - 1) + 1
    upper_b = 10**(digits) - 1

    # the biggest product, and its two factors
    max_pali, sm_fact, lg_fact = 0, 0, 0

    smaller = upper_b
    while(smaller >= lower_b):
        if(smaller * upper_b < max_pali):
        for m in max_for_pali(smaller, upper_b):
            p = m*smaller
            if(p > max_pali):
                max_pali, sm_fact, lg_fact = p, smaller, m
        smaller -= 1

    print "%d*%d=%d" % (sm_fact, lg_fact, max_pali)

I’m amused with the return value of the max_for_pali function, and its use in the main solve function. First of all, you might notice that to define a tuple of length one, you have to include a trailing comma. Otherwise it gets treated as a single object, not a tuple-worth of a objects. Then, in the main solve function, I loop over all solutions returned by max_for_pali – but there’s only ever 0 or 1 solution! I thought this was a kinda fun way to avoid an if(len) sort of statement. Of course, there’s no real call for defining the function max_for_pali anyway, but I like it.

This solution also demonstrates a way to make strings that, if memory serves, has not been used in any post here yet. The ‘%’ operator works on strings, taking a formatting string (on the left) and a tuple (on the right). The motto seems to be that this is like printf in c.

The next solution is a little different, in that its outer loop is looping over the palindromes themselves, and the inner loop is to find the factors. My first thought about this approach was that it wouldn’t be very efficient, so I wasn’t even going to bother with it. But I thought about it some more and decided to code it up, and I’m glad I did.

def make_pali(n):
    """ make a palindrome with leading digits n """
    s = str(n)
    l = len(s)
    for i in xrange(0,l):
        s = s + s[l-1-i]
    return int(s)

def factors(n, d):
    """ try to find a nice factorization of n with d digits per factor """
    i = int(math.sqrt(n))
    j = n//i
    while(j < 10**d and i >= 10**(d-1)): # expect the first test to fail first
        if(i*j == n):
            return [i,j]
        i -= 1
        j = n//i
    return []

def solve(digits):
    """ main function """

    max_pali, prod = 0, []

    pali_head = 10**digits - 1

    while(pali_head >= 10**(digits - 1)):
        max_pali = make_pali(pali_head)
        prod = factors(max_pali, digits)
        pali_head -= 1

    print "%d*%d=%d" % (prod[0], prod[1], max_pali)

This solution has the advantage of stopping as soon as the answer is found, instead of worrying about if there is another combination of factors that’ll give a larger palindrome than one already found.

While testing these scripts, I noticed some interesting patterns. Here are what the above scripts have given me as the largest palindomic products, based on the number of digits of the two factors:

 1: 3 * 3 = 9
 2: 91 * 99 = 9009
 3: 913 * 993 = 906609
 4: 9901 * 9999 = 99000099
 5: 99681 * 99979 = 9966006699
 6: 999001 * 999999 = 999000000999
 7: 9997647 * 9998017 = 99956644665999
 8: 99990001 * 99999999 = 9999000000009999

This points out a few things. First of all, my second solution, as posted above, actually fails on the first input. That’s because there’s a built-in assumption there that the maximum palindromic product of two n digit numbers will be a 2n digit number. There also seems to be a pattern to the factors, at least for even inputs – the larger factor is a string of 9s. Does that continue? Apparently NOT! Here’s some more values (these took a while – more on performance later):

 9: 999920317 * 999980347 = 999900665566009999
10: 9999986701 * 9999996699 = 99999834000043899999
11: 99999943851 * 99999996349 = 9999994020000204999999

With these values determined, one can go back and make another improvement or two (which we could probably have guessed at the beginning), if we allow ourselves some assumptions. Specifically, we now expect that the biggest palindrome will begin with a 9, and therefore end with a 9. Since we’re looking at its factors, we can rule out any factors that are even, or multiples of 5, since neither of these will ever give a product ending in 9.

Ok, you all know by now that I can’t resist performance comparisons, so here’s a graph:

The tickmarks on the x-axis are for the number of digits of the factors, and the y-axis is actually the logs of the times (in some reasonable scale). “LoopS” refers to the solution whose outer loop is over the smaller factor (my first solution above), and “LoopL” loops over the larger factor in its outer loop (my own implementation of Chris/Eric’s solution). “Factor” is my second solution above, and “FewFactor” has a couple of lines added to Factor to ignore multiples of 2 and 5.

What I find interesting here is that LoopS and LoopL switch back and forth between which is faster. Looking back at how the factors are coming out, this probably seems reasonable.

I couldn’t resist going out to a few more digits (I took it to 11 above), but only with the FewFactor solution, which seemed to be winning out, timewise. Here are the times it took to run for those higher digits:

 7: 7 seconds
 8: 37 seconds
 9: 3671 seconds ~ 1 hour
10: 298 seconds ~ 5 minutes (lots quicker than 9 digits!)
11: 5402 seconds ~ 90 minutes
12: ?

I finally gave up on 12 digits after running for nearly 30 hours.

Problem 2 – various approaches

May 20, 2009

Allow me to begin with a minor improvement of the standard while-loop solution to problem 2. Noticing that every third Fibonacci number is even, we might as well only calculate F(3n), if possible. Of course, Fibonacci numbers naturally come in adjacent pairs, so perhaps it’d be easy enough to calculate b(n)=F(3n) and c(n)=F(3n+1). With starting values b(0)=0,c(0)=1, it’s easy to determine the recurrence b(n+1)=b(n)+2\cdot c(n) and c(n+1)=2\cdot b(n)+3\cdot c(n). Now we loop over the b(n), until they’re past the upper bound (4000000 in the problem as written). Coded up, I’ve got it as follows (I trimmed some import and timing and print lines out):

b,c = 0,1
sum = 0
bound = int(sys.argv[1])

while(b <= bound):
    sum += b
    b,c = b+2*c, 2*b+3*c

In my mind, I expect this to be marginally faster than the while loop as used in other solutions, because we're essentially doing all the same arithmetic, but only 1/3 as many tests (and fewer multiple-assignments). I'll return to performance comparisons shortly.

The other solution I'd like to give code for is the generalization of Tim's <a href="">mathematical solution</a> to any bound. Again trimming out some import (you need to import math for this to work in pure Python - no Sage), timing, and print statements, I've got:

def geom(b,p):
    """ sum (b^3)^n from n=1 to n=p """
    return (1-(b**3)**(p+1)) / (1-b**3)

bound = int(sys.argv[1])

alpha = (1 + sqrt(5)) / 2
beta = (1 - sqrt(5)) / 2

# what index is the largest even fibonum less than our bound?
# the factor of sqrt(5) on the bound comes from the idea that
# fib(n) ~ alpha^n / sqrt(5)
# and the factor of 3 on alpha is because we want even fibonums
maxn = floor(log(bound * sqrt(5)) / log(alpha**3))

# formula for sum of the even fibonums up to the bound
sum = int((1/sqrt(5)) * (geom(alpha,maxn) - geom(beta,maxn)))

No loops at all. Nice. Of course, there’s a lot of floating point operations going on there. How does that affect performance?

The following chart compares my solution (called “evens” in the chart), with the solution calculating all Fibonacci numbers (called “all”), and Tim’s (“noloop”). The tickmark n along the x-axis indicates that the upper bound (maximum even Fibonacci number to sum to) is 4\cdot 10^n (why the 4? the original problem had one, is all). Times are measure in milliseconds on the y-axis.

We see that, indeed, the “noloop” solution looks approximately constant-time, and that the “even” loop slightly beats the “all” loop. What strikes me as more interesting, at first anyway, is that both looping solutions beat the noloop solution at the bound given in the problem (the tickmark 6 on the x-axis). After some thought, it almost makes sense, because there are only 11 even Fibonacci numbers less than 4000000, so it’s not like the loops are taking long.

I’m also intrigued by the sudden jump in both looping solutions between x=8 and x=9. These correspond to a bound of 4\cdot 10^8 and 4\cdot 10^9, respectively. Taking log base 2 of these, we get \approx 28.57 and \approx 31.89, right around 32, a noteable bound for my computer. In fact, the next even Fibonacci number after 4\cdot 10^9 is F(48)=4,807,526,976, which happens to be bigger than 2^{32}=4,294,967,296. And this is the number we’d make before ending the while loop.

I’m not quite done. Tim commented that for large numbers, \alpha was the dominating term in the expression for F(n). I wondered if the solution I coded up above, based on his math, would give different answers from the looping solutions for some particular bounds. Running a few loops, I noticed that the solutions disagree, unsurprisingly enough, at (some of the) bounds that are even Fibonacci numbers (which, generalizing the original problem, we should include). Here’s a chart using even Fibonacci numbers as bounds:

       Bound        Loop         NoLoop
           2           2           2
           8          10           2
          34          44          44
         144         188          44
         610         798         798
        2584        3382         798
       10946       14328       14328
       46368       60696       14328
      196418      257114      257114
      832040     1089154      257114
     3524578     4613732     4613732
    14930352    19544084    19544084
    63245986    82790070    82790070
   267914296   350704366   350704366
  1134903170  1485607536  1485607536
  4807526976  6293134512  6293134512

For the first several, the non-looping solution (as coded above) alternates between correct, and incorrect. It seems this comes from the \beta term of F(n) being negative, so that even or odd powers have some affect. It looks like the last incorrect value is at the bound 832040.

At this point, I think trying to make the non-looping solution work correctly for all even bounds is most easily patched up by just storing a table of the first several, and then computing it for any higher bounds. Or switching between a loop and non-loop solution based on the bound being big enough.

Or, Sage does symbolic manipulations, let’s take the above non-looping solution, and drop it into Sage with an extra if-switch to see if the upper bound is an even Fibonacci number. I coded that as

if(bound == ((1/sqrt(5))*(alpha**(3*(maxn + 1)) - beta**(3*(maxn + 1))))):
     maxn += 1

and inserted that just after the “maxn = ” line above (so, at line 15). Running this, with the bound an even Fibonacci number, now always seems to agree with the looping solutions.

Ok, I think that’s all I have to say about problem 2.