Archive for June, 2009

Problem 13 – Too Easy?

June 26, 2009

I’m a little confused about problem 13. I wanted to come up with some nice way to find the first (leftmost) 10 digits of the sum. The only way I came up with was to actually do the sum though. I thought maybe that coding this up would point out something I hadn’t thought about that made the problem more difficult. Like maybe the 50 digit numbers were too big to store as integer values, or the sum was, or something. So I just coded up the naive solution to see what would happen:

from __future__ import with_statement

import sys, time

def readin(filename):
    """ Input a file, reading into a list of integers """
    ret = []
    with open(filename) as file:
        for line in file:
            ret.append(int(line))
    return ret

def solve(nums):
    """ The leading 10 digits of the sum of the numbers in the list nums """
    return str(sum(nums))[0:10] # loving this syntax

if __name__ == "__main__":
    t = time.time()
    ans = solve(readin(sys.argv[1]))
    t = time.time() - t
    print ans, t

I ran that, and it spit out an answer in about 0.0004 seconds.

So, am I missing something from this exercise, or is it that easy? Is there a more clever way to do it anyway?
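
One possibly more clever approach, just as a sketch (the function name and the "keep 15 digits" cutoff are mine, and I haven't verified it against the actual problem file): since we only want the leading 10 digits, we could sum just the first 15 or so digits of each number. Truncating each 50-digit number to its leading 15 digits changes the total by less than 100, in units of the dropped low-order digits, which can't reach up into the top 10 digits of the sum unless that sum sits absurdly close to a rollover.

from __future__ import with_statement

def solve_leading(filename, keep = 15):
    """ Leading 10 digits of the sum, using only the first 'keep' digits

    Assumes each line of the file holds one 50-digit number
    """
    total = 0
    with open(filename) as file:
        for line in file:
            total += int(line.strip()[0:keep]) # drop the low-order digits
    return str(total)[0:10]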


Problem 14 – A Class

June 26, 2009

I was thinking a little bit more about my solutions to problem 14, and it occurred to me that passing around the “mem” dictionary wasn’t particularly efficient. I didn’t want to make it a global variable, because I have a sort of stigma against those. I think, maybe, in Python it’s a stigma I should/could get over, because the file prob14_v3.py that contains my script is actually a module (Python does that bit for me), and so it gets its own namespace without any extra work on my part. But I could be wrong about that.

So, anyway, I thought maybe I’d try wrapping up my code into a class, and making the dictionary an attribute of that class. Here’s what I coded up:

import sys

class ProblemSolver:
    def __init__(self, bound = 1000000):
        self.mem = {1:1} # mem[n] is the length of the 3n+1 sequence starting at n
        self.bound = bound

    def lenseq(self, n):
        if(n in self.mem):
            return self.mem[n]
        suc = n%2 and 3*n+1 or n//2
        self.lenseq(suc)
        self.mem[n] = self.mem[suc] + 1
        return self.mem[n]

    def solve(self):
        retn,retl = 0,0
        for n in xrange(1, self.bound):
            l = self.lenseq(n)
            if(l > retl):
                retn,retl = n,l
        return (retn,retl)

if __name__ == "__main__":
    solver = ProblemSolver(int(sys.argv[1]))
    print solver.solve()

I’ve not done much in the way of rigorous performance testing, comparing this script with my other solution from yesterday, or with one that just uses a global variable to store the dictionary. A few quick tests indicate that they take approximately the same time to run.

Problem 14 – 3n+1

June 25, 2009

The “3n+1 problem” is one I mention to my calculus class every time we get to sequences. It’s just so fun.

I’ve coded up a few solutions. My first is a basic one where I have a function that takes a number and generates the sequence starting at that number. Then I just loop up to 1000000 looking at the lengths of these sequences.

def seq(num):
    """ Returns the 3n+1 sequence for num """
    ret = [num]
    while(not ret[-1] == 1):
        ret.append(ret[-1]%2 == 0 and ret[-1]//2 or 3*ret[-1]+1)
    return ret

def solve(bound):
    """ Find the number less than bound with largest 3n+1 sequence

    Returns the number, the length of its sequence, and the computation time
    """
    t = time.time()
    retn,retl = 0,0
    for n in xrange(1,bound):
        s = seq(n)
        if(len(s) > retl):
            retn,retl = n,len(s)
    t = time.time() - t
    return ((retn,retl), t)

if __name__ == "__main__":
    print solve(int(sys.argv[1]))

This can be moderately improved by not passing around the entire sequence. Instead, just code up the helper function (“seq”, above) to return the length, and use it accordingly in the main loop (“solve”, above). I’ll not bother copying the whole script again here.
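
The length-only helper ends up being something along these lines (a sketch from memory, not the exact file I ran):

def lenseq(num):
    """ Length of the 3n+1 sequence starting at num, without storing it """
    length = 1
    while num != 1:
        num = num % 2 == 0 and num // 2 or 3*num + 1
        length += 1
    return length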

The only real improvement I’ve made on this is to store values as we find them, so we can look them up again later. Here’s the way I coded that up:

def lenseq(n, mem):
    """ Recursively calculate the length of n using a memory dictionary

    This function will modify mem, adding new values to it
    """
    if(n in mem):
        return mem[n]
    suc = n%2 and 3*n+1 or n//2
    lenseq(suc, mem)
    mem[n] = mem[suc] + 1
    return mem[n]

def solve(bound):
    """ Find the number less than bound with longest 3n+1 sequence

    Return ((the number, the length of its sequence), computation time)
    """
    t = time.time()
    retn,retl = 0,0
    mem = {1:1}
    for n in xrange(1,bound):
        l = lenseq(n, mem)
        if(l > retl):
            retn,retl = n,l
    t = time.time() - t
    return ((retn,retl),t)

if __name__ == "__main__":
    print solve(int(sys.argv[1]))

And I did a little performance comparison, because it was easy. In the chart below, “Basic” is the first set of code above, “BasicPassLength” is the modification that doesn’t pass sequences around, just their lengths, and “StoreLengths” is the second set of code above.

The scale is log in both axes. The tickmark, n, on the x-axis refers to an upper bound of 10^n (the last tickmark, then, being the stated problem). The y-axis is less precise, but it’s along the lines of the log of the time taken, in seconds. The “Basic” code took about 2.5 minutes on the stated problem (bound of 1000000), “BasicPassLength” about 1.5 minutes, and the “StoreLengths” took about 20 seconds.

Just out of curiosity, I checked how many values ended up getting stored. There were 2168611 values stored, ranging up to 56991483520 (which, in case you are curious, has length 174). I haven’t thought up an obvious way to trim down which values get stored (maybe don’t do those above the bound?), or other ways to improve on this solution.
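
For instance, the "don't store values above the bound" idea might be as simple as the following variant of lenseq (just a sketch, with a hypothetical name, that I haven't actually timed):

def lenseq_trimmed(n, mem, bound):
    """ Like lenseq, but only remember lengths for values below bound

    This function will modify mem, adding new values to it
    """
    if(n in mem):
        return mem[n]
    suc = n%2 and 3*n+1 or n//2
    l = lenseq_trimmed(suc, mem, bound) + 1
    if(n < bound): # trade a little recomputation for a smaller dictionary
        mem[n] = l
    return l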

Problem 11 – Products in a Grid

June 19, 2009

Problem 11 seems somewhat straightforward, once you’ve got the grid read in from a file or so. I should probably feel pretty guilty about not doing these more generally, to accept products of a chosen number (instead of 4) of adjacent terms. Right now, though, I just feel tired.

Here’s what I came up with, as a first go:

def solve(grid):
    ret = 0
    for i in xrange(0,len(grid)):
        for j in xrange(0,len(grid[0]) - 4 + 1):
            p = grid[i][j] * grid[i][j+1] * grid[i][j+2] * grid[i][j+3]
            ret = max(ret, p)
            p = grid[j][i] * grid[j+1][i] * grid[j+2][i] * grid[j+3][i]
            ret = max(ret, p)
            p = grid[j][j] * grid[j+1][j+1] * grid[j+2][j+2] * grid[j+3][j+3]
            ret = max(ret, p)
    return ret

In the inner loop, I compute the products in each direction, starting at the index [i][j]. I realized you can improve this inner loop slightly, by not multiplying each product by grid[i][j] initially. Since all of them are being multiplied by that factor, you might as well find the biggest of the products without that factor, then multiply by it, and then take the max with the previous best result. And if grid[i][j] is 0, there’s no point in doing any products starting there. So you can replace the inner loop above with the following:

            if(grid[i][j]):
                p = max(grid[i][j+1] * grid[i][j+2] * grid[i][j+3],
                        grid[j+1][i] * grid[j+2][i] * grid[j+3][i],
                        grid[j+1][j+1] * grid[j+2][j+2] * grid[j+3][j+3])
                ret = max(ret, p*grid[i][j])

I then realized that this problem could be reduced to problem 8. In that problem, you were just looking for the biggest product among adjacent things in a list. The code I used today to solve that is basically the code from the end of my post from the last time I talked about problem 8:

def bigprod(list, num):
    """ Largest product of num consecutive elements of list

    Assumptions: len(list) >= num, list contains only positive integers
    """
    p = reduce(lambda x,y:x*y, list[0:num-1], 1)
    f = 1 # a fake (for now) first digit of previous product
    ret = 0

    for idx in xrange(0, len(list) - num + 1):
        p = p // f
        p *= list[idx+num-1]
        f = list[idx]
        ret = max(ret, p)
    return ret

So it remains to see how to convert the grid to a list. We can get sort of “horizontally adjacent” terms by just working across rows. If we stick a 0 between rows, then products of adjacent terms in the list are only non-zero when they came from terms that were adjacent, horizontally, in the grid. We can do essentially the same thing for vertically adjacent, and diagonally adjacent, though the indexing takes slightly more thinking about for the diagonal bits. Here’s my code implementing this idea:

def solve(grid):
    """ The biggest product of 4 adjacent numbers in the grid

    Method: re-write grid as a bunch of lists, have bigprod do the work
    Assumption: the grid is square
    """
    adjH, adjV, adjMD, adjDU, adjDL = [], [], [], [], []
    for i in xrange(0, len(grid)):
        for j in xrange(0, len(grid)):
            adjH.append(grid[i][j])
            adjV.append(grid[j][i])
            if(i and i < len(grid) - 4 + 1 and j < len(grid) - i):
                adjDU.append(grid[j][j+i])
                adjDL.append(grid[j+i][j])
        adjH.append(0)
        adjV.append(0)
        adjDU.append(0)
        adjDL.append(0)
        adjMD.append(grid[i][i])

    # stick all of the adjacents together, they already end in 0 (besides MD)
    list = adjH + adjV + adjDU + adjDL + adjMD

    ret = 0

    while(len(list) >= 4):
        endidx = len(list)
        try:
            endidx = list.index(0)
        except:
            pass

        if(endidx >= 4):
            ret = max(ret, bigprod(list[0:endidx], 4))

        list = list[endidx+1:]

    return ret

I’ve not done a particularly formal performance comparison of these, but an informal look indicates that the first solution is much faster. It completes the problem on the assigned input (on my machine) in about .002 seconds, while the second solution takes .003 seconds. Big difference, no? I also fed them random grids of larger sizes. Looking at grids with size 200, the first solution takes along the order of .2 seconds, while the second is more like 1.5 seconds.

Problem 12 – Triangular Numbers

June 19, 2009

I went about counting divisors of a number t by looking at the prime factorization of t. If t=\prod p_i^{e_i}, then t has \prod (1+e_i) divisors, because any divisor has a sort of “sub”-factorization, which is to say the same primes just to smaller (or possibly the same) powers (or 0, that’s why the “1+” is in each term in the product).

So, to determine the number of factors, we first find the prime factorization, and then do a quick product. Suppose we have a method “factor” which takes an integer n, and returns a list of tuples of the form (prime, power). So, for example, “factor(12)” would return “[ (2,2), (3,1) ]”. We can then pass such an object to the following function to calculate the number of divisors:

def numberDivisors(f):
    """ take a factorization and return the number of divisors """
    return reduce(lambda x,y:x*y, map(lambda t:t[1]+1,f), 1)
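
So, with the factorization of 12 from the example above:

>>> numberDivisors([ (2,2), (3,1) ])
6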

Now, I want to use this to find the number of divisors for triangular numbers. We could just loop over the triangular numbers, factoring each in turn. However, that’s pretty redundant. The n-th triangular number (let’s denote that by T_n) is \frac{1}{2}n(n+1). Exactly one of n or n+1 is even, so T_n=\frac{e}{2}\cdot o, where e is even and o is odd. If we factor \frac{e}{2} and o separately, we can multiply the factorizations together by simply adding powers. While we’re at it, if we store the factorization of \frac{e}{2} and o, and T_n, we’ll be able to use those factorizations to factor later triangular numbers. Probably if we thought about it for a while, we could get away with only storing the factorization of o and T_n, because the factorization of \frac{e}{2} won’t be used again (I think).
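
As a small worked example, T_7=\frac{8}{2}\cdot 7=4\cdot 7; combining the factorizations 2^2 and 7^1 gives 28=2^2\cdot 7^1, which has (2+1)(1+1)=6 divisors.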

Here’s my implementation of these ideas:

def solve(goal):
    """ Find the first triangle number with more than 'goal' divisors

    Return the number and the time taken to find it
    Assumption: goal is a positive integer
    """
    t = time.time()

    # lets store some factorizations to kick things off
    factorization = { 2:[[2,1]], 3:[[3,1]], 4:[[2,2]], \
                      5:[[5,1]], 6:[[2,1],[3,1]] }

    # since goal is a positive integer, by assumption, we are looking
    # for a triangle number with >=2 divisors

    divs, n, tri, epsilon = 2, 2, 3, 0

    # loop invariant: T_n=tri has divs divisors, epsilon = 0 or 1
    # and n+epsilon is even
    while(divs <= goal):
        n, epsilon = n+1, 1-epsilon
        tri = n * (n+1) // 2
        lesser, greater = (n+epsilon)//2, n+1-epsilon
        if not (lesser in factorization):
            factorization[lesser] = factor(lesser)
        if not (greater in factorization):
            factorization[greater] = factor(greater)

        # concatenate the lists of factors
        combfact = factorization[lesser] + factorization[greater]

        # we know how to factor the product
        factproddict = {}
        for p,e in combfact:
            if not (p in factproddict):
                factproddict[p] = e
            else:
                factproddict[p] += e

        factorization[tri] = factproddict.items() # convert dictionary => list
        divs = numberDivisors(factorization[tri])

    t = time.time() - t
    return (tri,t)

I guess things could possibly be simplified a little if the “factor” function that I use there returned a dictionary instead of a list, but currently I’ve got it set up to return a list. By the way, the factorization loop I’m using is the “factorAndSieve” code from Numerical Recipes, slightly modified to return a list of tuples, instead of just a (possibly repeating) list of primes. If I’d been paying slightly more attention, I’d have noticed that he’s also got a lengthy post about computing the number of divisors of an integer.

I decided to compare the two methods of solving this problem, as I am one to do (the graphs are just so much fun). The first method just factors every triangle number in turn until you have enough divisors (called “Basic” in the following graph). The second is the one with the code above, where we store factorizations (called “Store”). In the following graph, the x-axis indicates the number of divisors that you are hoping for (tickmark labelled n corresponding to 10n divisors), and the y-axis is some scaled “ln(time)”.

Just for some sort of definite idea about timing, the last tickmark, corresponding to 500 divisors (the stated problem) takes my computer around 10 seconds for the Basic script, and less than 1 second for the Store script.

Of course, the Store script uses more storage space. I think we could remove factors as we went to save some space. At the n-th triangular number, I think we never need factorizations of anything less than n/2, so we could remove those as we went. Just as a side note, when you look for the first triangular number with more than 500 divisors, the Store script above has 21524 stored factorizations, with numbers ranging from 2 to the answer (which I got to be up in the 76 million range). If anybody can point me in the right direction to learn about how to test the space usage of my scripts, I’d love to hear about it. Something similar to the built in time module, but for space, would be awesome.
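
One thing I might poke at, though I have no idea whether it is the "right" way to measure this, is sys.getsizeof (new in Python 2.6), which at least reports the size in bytes of the dictionary object itself and could be summed over the keys and values too:

import sys

mem = {1: 1, 2: 2, 3: 8} # a stand-in for the stored factorization dictionary
print sys.getsizeof(mem) # bytes for the dict structure itself
print sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in mem.items())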

The function taking n to the number of divisors of T_n is probably a vaguely interesting function. I plotted it with inputs up to 100:

and with inputs up to 300:

That first big spike in the second graph is at input 224, indicating T_{224}=25200 has 90 divisors. Apparently.

A Question About Input

June 18, 2009

Has anyone figured out how to import data from text files?  My initial approach to problem 11 was to convert the entire table to a string by hand.  This was slow and tedious, but it worked.  After getting the correct answer, I looked at the posts on Project Euler.net and learned some more efficient methods.  But these still require me to type a few characters for each line of input and I don’t think they’d be practical for something like problem 67.
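
For concreteness, something along these lines is what I'm imagining (a sketch, not something I've actually gotten working against the problem inputs yet), with the grid saved in a plain text file, one row of whitespace-separated numbers per line:

from __future__ import with_statement

def readgrid(filename):
    """ Read a grid of integers, one whitespace-separated row per line """
    grid = []
    with open(filename) as file:
        for line in file:
            grid.append([int(tok) for tok in line.split()])
    return grid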

Divisors

June 18, 2009

I haven’t posted in a while either.  So to complement Chris’ post, I thought I’d put up the code I used to count the number of divisors in problem 12.  It isn’t very pretty, but it gets the job done.

>>> import math
>>> def div(x):
          v=2
          y=2
          s=math.sqrt(x)
          if s==int(s):
                    v=3
          while y<s:
                    if x%y==0:
                              v+=2
                    y+=1
          return v
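
As a quick check, 28 has six divisors (1, 2, 4, 7, 14, 28):

>>> div(28)
6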

Triangle numbers

June 17, 2009

I feel bad for not having worked on any problems since Problem 6. So here is my Mathematica solution to Problem 12.

Triangle[n_] := n (n + 1)/2;
n = 1;
While[Length[Divisors[Triangle[n]]] < 500, n++];
Print[Triangle[n]]

As usual, I cheated by not programming in Python or Sage. But I figure this is better than nothing. I suspect I would get a lot more out of this programming project if I thought about how to write my own Divisors algorithm. Oh well.

Problem 9

June 12, 2009

I didn’t feel right letting the week go by with no solutions posted, so I rigged one up for problem 9 (find the unique Pythagorean triple which sums to 1000). To generate Pythagorean triples, I used what is identified as Euclid’s Formula on the Wikipedia page for Pythagorean triples. After a little while, I decided the appropriate generalization of this problem was to take in any number, and find all of the Pythagorean triples that sum to the given number. Here’s what I came up with:

import sys
goal = len(sys.argv) >= 2 and int(sys.argv[1]) or 1000

answers = []

for n in xrange(1,goal):
    for m in xrange(n+1,goal,2):
        a,b = 2*m*n,m*m-n*n
        trip = (min(a,b),max(a,b),m*m+n*n)
        s = sum(trip)
        if(goal % s == 0):
            k = goal // s
            trip = map(lambda x:k*x,trip)
            if not (trip in answers):
                answers.append(trip)

print answers

I’m fairly certain that the loops could be cut down in size, but I’m a bit too tired right now to see exactly how far. I increment m by 2 because it keeps the parity of m and n different, which Wikipedia claims is a requirement to generate a “primitive” triple.

I keep hoping to find a function that will take a list, and return just the unique objects from that list, but have yet to run across it. It’d just about work to make a set of the objects, but apparently “list objects are unhashable”, and so doing sets of tuples doesn’t go like I expect it to. Probably because I haven’t read enough, or thought quite enough about what is going on. Maybe one of these days I’ll get it.
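
If I'm reading the error right, the trouble is actually the lists (map() hands back a list), not the tuples; a set of tuples works fine. So a little helper along these lines might be all I'm after (a sketch; it doesn't preserve order):

def unique(items):
    """ The distinct elements of items (they must be hashable, e.g. tuples) """
    return list(set(items))

triples = [(3,4,5), (6,8,10), (3,4,5)]
print unique(triples) # tuples work; lists like [3,4,5] would raise TypeError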

Problem 7 – The nth prime

June 5, 2009

I decided to try my hand at problem 7, which is to determine the n-th prime, for some chosen n. I’m sure there’s more math out there on this subject than I could ever read and understand, so I’ll take a pretty simple view on things.

It doesn’t get much simpler, in terms of prime algorithms, than the Sieve of Eratosthenes. I’ll be entirely naive about it: I won’t toss out evens from my list, and I’ll start the list at 0, even though I know the first prime is 2. This keeps the array indexing trivial (let’s focus on one problem at a time).

import math

def sieveto(n):
    """ Determine all the primes up to n

    The basic sieve of Eratosthenes, with some memory of the
    implementation from numericalrecipes.wordpress.com
    """
    isprime = [True for n in xrange(0,n+1)]
    isprime[0],isprime[1] = False, False
    idx = 2
    while(idx <= int(math.sqrt(n))):
        for fi in xrange(idx*idx,n+1,idx):
            isprime[fi] = False
        idx += 1
        while(not isprime[idx]):
            idx += 1
    return [i for i in xrange(2,n+1) if isprime[i]]

The difficulty in this problem is that we don't know the bound. In fact, you could take the problem to be "find the bound that you need to run the sieve to." So, I decided to build some initial list of primes, up to some chosen bound, and then keep extending my list until I had enough primes. If I already have a list of all the primes up to some prime p, then it's easy to get a list of all the primes up to p^2: the primes whose multiples need crossing out, when sieving up to p^2, are exactly those up to p. So I just need to go through the primes I already know, cross out all their multiples, and I'll have everything up to p^2.

def extend(smpr):
    """ Take a list of primes, find primes up to the square of the largest

    Assumes the input list is in increasing order, and contains all of the
    primes up to the largest prime it contains (smpr[-1]).
    Return a list of the new primes
    """
    origmax = smpr[-1]

    # lets make an isprime array, so that in the end
    #   isprime[n] iff n+origmax+1 is prime
    # so basically, isprime indexes 0..origmax^2 with 0..origmax removed
    rngbound = origmax * (origmax - 1)
    isprime = [True for n in xrange(0,rngbound)]

    for p in smpr:
        st = 0 # starting index to start crossing things out at
        if(not ((origmax + 1) % p == 0)):
            st = p*(((origmax+1)//p)+1) - (origmax+1)
        for idx in xrange(st,rngbound,p):
            isprime[idx] = False

    # remember to shift things to their appropriate value, before returning
    return map(lambda x:x+origmax+1, [n for n in xrange(0,rngbound) if isprime[n]])

So, I throw these two functions together by first running "sieveto" on some chosen bound, and then continually "extend"ing the result until I've got a list that's big enough to tell me the n-th prime.

goal,smallb = 100, 50
primes = sieveto(smallb)
while(len(primes) < goal):
    primes += extend(primes)
print primes[goal-1]

The real trick with this idea is picking the correct value for "smallb". The best value would be something just a little bit bigger than the prime we're looking for. Alternatively, something around the square root of the prime, or the fourth root, etc. The difference can be drastic. If our goal is to find the 100th prime, starting with a "smallb" of 28 takes about 350 times as long as starting with 29 (admittedly, we're talking less than half a second on my computer, even in the worst case). The reason is that starting with 28, we run "extend" twice, ending up with a list of the first 23925 primes (when we only needed the first 100). Running, instead, with "smallb" as 29, we only extend once, generating a list of 146 primes. Of course, starting with "smallb" at 541, we don't need to extend at all, since 541 is the 100th prime.