Archive for June 5th, 2009

Problem 7 – The nth prime

June 5, 2009

I decided to try my hand at problem 7, which is to determine the n-th prime, for some chosen n. I’m sure there’s more math out there on this subject than I could ever read and understand, so I’ll take a pretty simple view on things.

It doesn’t get much simpler, in terms of primes algorithms, than the Sieve of Eratosthenes. I’ll be entirely naive about things, and not even toss out evens from my list, and start my list at 0, even though I know the first prime is 2. This keeps the array indexing trivial (let’s focus on one problem at a time)

def sieveto(n):
“”” Determine all the primes up to n

The basic sieve of Eratosthenes, with some memory of the
implementation from
isprime = [True for n in xrange(0,n+1)]
isprime[0],isprime[1] = False, False
idx = 2
while(idx <= int(math.sqrt(n))): for fi in xrange(idx*idx,n+1,idx): isprime[fi] = False idx += 1 while(not isprime[idx]): idx += 1 return [i for i in xrange(2,n+1) if isprime[i]] [/sourcecode] The difficulty in this problem is that we don't know the bound. In fact, you could take the problem to be "find the bound that you need to run the sieve to." So, I decided to build some initial string of primes, up to some chosen bound, and then keep extending my list until I had enough primes. If I already have a list of all the primes up to some prime $latex p$, then it's easy to get a list of all the primes up to $latex p^2$. When I'd run the sieve up to $latex p^2$, the primes, whose multiples I'd cross out, are all those up to $latex p$. So I just need to go through the primes I already know, cross out all their multiples, and I'll get up to $latex p^2$. [sourcecode language="python"] def extend(smpr): """ Take a list of primes, find primes up to the square of the largest Assumes the input list is in increasing order, and contains all of the primes up to the largest prime it contains (smpr[-1]). Return a list of the new primes """ origmax = smpr[-1] # lets make an isprime array, so that in the end # isprime[n] iff n+origmax+1 is prime # so basically, isprime indexes 0..origmax^2 with 0..origmax removed rngbound = origmax * (origmax - 1) isprime = [True for n in xrange(0,rngbound)] for p in smpr: st = 0 # starting index to start cross things out at if(not ((origmax + 1) % p == 0)): st = p*(((origmax+1)//p)+1) - (origmax+1) for idx in xrange(st,rngbound, p): isprime[idx] = False # remember to shift things to their appropriate value, before returning return map(lambda x:x+origmax+1, [n for n in xrange(0,rngbound) if isprime[n]]) [/sourcecode] So, I throw these two functions together, by first running "sieveto" on some chosen bound, and then continually "extend"ing the result until I've got a list that's big enough to tell me the $latex n$-th prime. [sourcecode language="python"] goal,smallb = 100, 50 primes = sieveto(smallb) while(len(primes) < goal): primes += extend(primes) print primes[goal-1] [/sourcecode] The real trick with this idea is picking the correct value for "smallb". The best value would be something just a little bit bigger than the prime we're looking for. Alternatively, something around the square root of the prime, or the fourth root, etc. The difference can be drastic. If our goal is to find the 100th prime, starting with a "smallb" of 28 takes about 350 times as long as starting with 29 (admittedly, we're talking less than half a second on my computer, even in the worst case). The reason is that starting with 28, we run "extend" twice, ending up with a list of the first 23925 primes (when we only needed the first 100). Running, instead, with "smallb" as 29, we only extend once, generating a list of 146 primes. Of course, starting with "smallb" at 541, we don't need to extend at all, since 541 is the 100th prime.

Problem 8 – Some Improvement

June 5, 2009

I was able to make an improvement in the performance of my script for problem 8. I’m still not particularly sure how to extend my method to taking products of other length substrings of the string of digits (5, in the statement of the problem).

The idea is that we’d like to trim down the number of multiplications we do, remembering some values along the way for future use, instead of doing them over again. The trick seems to be storing the right combinations of products. Here’s the function, which I’ll try to explain below.

def bigprodfive(s):
    """ Largest product of n consectuve digits of s

    We assume that len(s) >= 5, and that it contains only digits

    t = time.time()

    # split the string into a bunch of substrings that have no 0s
    # throw out any substrings that have fewer than 5 characters
    blocks = filter(lambda l: len(l) >= 5, s.split("0"))

    # if there are no substrings, any product would be 0
    if not len(blocks):
        t = time.time() - t
        return (0,t)

    ret = 0
    for b in blocks:
        d = map(int, b) # convert to list of integers

        # store some products
        pair = d[2]*d[3]
        quad = d[0]*d[1]*pair

        # some of the indices we might use are out of bounds
        # and will throw an exception. but that's a good sign
        # we can just stop with this substring (b), and move on
            idx = 0
            while(idx <= len(b)-5):
                # see if the string starting at idx gives a bigger product
                ret = max(ret, quad * d&#91;idx+4&#93;)

                # update our stored products
                quad,pair = pair, d&#91;idx+4&#93;*d&#91;idx+5&#93; # may throw up
                quad *= pair

                # see if the string starting at idx+1 gives a bigger product
                ret = max(ret,d&#91;idx+1&#93;*quad)

                idx += 2
            # don't do anything, "catch" the exception so it
            # doesn't interrupt our calculations

    t = time.time() - t
    return (ret,t)

Hopefully, with the somewhat excessive comments, the main thing I should describe is the stored products. Here's the picture that gave me the idea, d_i is d&#91;i&#93;:
<pre>0: d_0*d_1  d_2*d_3  d_4
1:     d_1  d_2*d_3  d_4*d_5
2:          d_2*d_3  d_4*d_5  d_6
3:              d_3  d_4*d_5  d_6*d_7
4:                   d_4*d_5  d_6*d_7  d_8</pre>
On the left we have the variable idx, increasing from 0, and we show the terms that get multiplied together to determine the product for the substring starting at idx. Notice that, e.g., from idx=1 to idx=2, four of the terms are unchanged. And then from idx=3 to idx=4, same thing. My variable "quad" in the code above stores these products. Notice, furthermore, that to get from the quad for idx=2 to the quad for idx=3, we keep a pair (d_4*d_5, for this particular idx change). Instead of re-calculating that product yet again, we store that product (as "pair"), and then to get the next quad, we can multiply this pair by the next one (which would be d_6*d_7, continuing on).

If we keep track of quads and pairs this way, they only change when we get to an odd value for "idx". My original code didn't wrap things in the try block, and within the while loop just tested the parity of "idx" to decide about updating values or not. But that's a lot of testing, so I thought I'd just take care of idx and idx+1 at the same time, starting with an even idx. But then you might run into trouble at the very end, because you try indexing the idx+5 position in the string, which might not be there (it might be the length of the string, one too far). Well, I could put test code back in my while loop to avoid this issue, but it'll pass every time except possibly once. So I realized I could just wrap the entire while loop in the try block (obtaining the code above), and if we index out of range, we knew we were done anyway, so there's no real harm.

I tried keeping track of how many multiplications this does, based on the input string, and comparing to some of my other solutions. The original solution I posted, which doesn't remember any saved values, does approximately 4 times the string length multiplications. This newer code does approximately 2 times the string length multiplications.

I don't have any sweet performance comparison graphs for you this time, but my little bit of testing indicates that this new solution is approximately 5-6 times faster than my first solution.

I'd still love to get a good generalization of this, taking products of substrings of length other than 5...

In the mean time, I'll post yet another solution. I've been thinking very hard about what products to store, because I want to avoid doing too many multiplications. An unstated goal was that I also wanted to do no divisions, for some reason deciding that this is a slow operation. Of course, if you allow divisions, the process is much simpler, and easier to generalize. To get from the product starting at idx, to the product starting at idx+1, you just divide by the digit at idx, and multiply by an end digit (idx+4 in the problem, but idx+len-1 for length len substrings). I realized, while I was writing this post for the solution above, that I'd never bothered coding this solution up, and that I really should, to see how it compared. In the code above, we just replace the for loop, over blocks, with the following code:

for b in blocks:
    d = map(int, b)

    p = d[0]*d[1]*d[2]*d[3]
    f = 1 # the "first digit" of the previous product

    for idx in xrange(0,len(b)-5+1):
        p = p // f # pull off old first digit
        p *= d[idx+4] # tack on new last digit
        f = d[idx] # update first digit
        ret = max(ret, p)

That’s not only quicker to write, easier to understand, and easier to generalize, it’s faster as well! Not a lot faster, but seems to be perhaps about 10% faster than the first solution above. Silly me.