Posts Tagged ‘nah’

Problem 23 – Sums of Abundant Numbers

September 6, 2009

In problem 23, we are asked to find the sum of all of the numbers that are not the sum of two abundant numbers. We are given an upper bound on the numbers we’ll be summing, which is pretty convenient.

To solve this problem, I’ll first make a list of all the abundant numbers up to the bound, using a function I’ll call listOfAbundants. Then, I’ll tease out the numbers that aren’t sums of two abundant numbers as follows:

def solve():
    bunds = listOfAbundants(28123) # hardcoded upper bound
    bsums = set([i+j for i in bunds for j in bunds]) # numbers that are sums
    nonsums = set([k for k in xrange(1,28123) if not k in bsums]) # everybody else
    largest = reduce(lambda x,y:x>y and x or y, nonsums) # better upper bound?
    return sum(nonsums)

List comprehension is awesome. I think, possibly in a newer version of Python or something, one can avoid the “set([])” notation and use “{}”, known as “set builder notation”. But it didn’t work the first time I tried it on my computer, so I moved on (how’s that for persistence?). I have that line for the variable “largest” to see what the actual upper bound is, since the problem hints that 28123 isn’t a strict bound. It seems to work out that largest is 20161.

Now, that leaves the listOfAbundants function. I have a factor method I’ve used before, and I figure with a factorization it should be easy to sum divisors. As a reminder, the “factor” method I have returns an array of pairs (p,e), and the product of all of the p^e is the number. If I’ve got a method to sum divisors, call it sumDivs, then I can do:

def listOfAbundants(bound):
   """ Return a list of all abundant numbers less than bound """
   # we can start at 12, since that's the smallest abundant number
   return [k for k in xrange(12,bound) if sumDivs(k) > 2*k]

My first way to code how to find the sum of the divisors of a number is not very efficient, but I love how it uses list comprehensions, in concert with map/reduce. My idea was to first convert every pair (p,e) to the list [1,p,p^2,...,p^e]. Then, if a=[a_1,a_2,\ldots] and b=[b_1,b_2,\ldots] are lists of numbers, then sort of take the cartesian product a\times b, the map a\times b to integers by multiplication. So, e.g., [1,2,4] and [1,5] combine to [1,2,4,5,10,20]. Doing this iteratively (via reduce) and then suming the resultant list gives the sum of the divisors of the starting number:

def sumDivs(n):
    """ Returns the sum of the divisors of n """
    f = factor(n)
    # find all of the prime power divisors
    ppowdivs = map(lambda pe:[pe[0]**i for i in xrange(0,pe[1]+1))
    # combine them
    return sum(reduce(lambda x,y:[i*j for i in x for j in y], ppowdivs))

Of course, I’m using the property that says that the sum of the divisors of n, denoted \sigma(n), is a multiplicative function – meaning that if m and n are relatively prime then \sigma(m,n)=\sigma(m)\cdot \sigma(n). So if I know how to quickly compute \sigma(p^e), I can simplify the above code. I guess the idea has perhaps come up in a previous problem, but I also ran across it in my reading of Hardy and Wright’s “Introduction to the Theory of Numbers” book. Wherever you get it, \sigma(p^e)=\frac{p^{e+1}-1}{p-1}, the sum of the geometric series 1+p+p^2+\cdots +p^e. So, sumDivs simplifies a bit:

def sumDivs(n):
    """ Returns the sum of the divisors of n """
    f = factor(n)
    return reduce(lambda x,y:x*y,
                  map(lambda pe:(pe[0]**(pe[1]+1)-1)//(pe[0]-1), f))

So that’ll pretty much do it. But it feels wrong (not incorrect… just immoral or something). We’re factoring every single number up to the bound. Instead, we should be able to use some recursion, or some sieving, or something, to more quickly factor the whole interval of numbers we’re interested in.

And we don’t really need the whole factorization. If we know n=p^eq, where p is prime and p^e is the largest power of p dividing n, then \sigma(n)=\sigma(p^e)\cdot\sigma(q) (by multiplicativity). We have a formula (used above) for \sigma(p^e), and q<n so inductively we’ll have computed \sigma(q).

I’ve put together the following code to try to use this idea:

def sumDivsTo(bound):
    """ Returns an array, a, with a[i] = sumDivs(i), for i in [0,bound) """
    ret = [0 for i in xrange(0,bound)]
    ret[1] = 1
    for idx in xrange(2,bound):
        if(ret[idx] == 0): # idx is prime
            ret[idx] = idx+1 # 1 and idx divide idx
            m = idx # the smallest multiple we need to consider
            while(idx * m < bound):
                # tell our future selves (when idx=idx*m) that idx divides idx*m
                ret&#91;idx * m&#93; = idx
                m += 1
        else: # ret&#91;idx&#93; is the smallest prime dividing idx
            p = ret&#91;idx&#93;
            e = 1
            while(idx % p**e == 0):
                e += 1
            e -= 1 # our loop went one too far
            q = idx // p**e
            # now idx = p^e * q, (p,q)=1, p prime
            ret&#91;idx&#93; = (p**(e+1)-1)//(p-1) * ret&#91;q&#93;
    return ret
&#91;/sourcecode&#93;

I think there is still room for improvement here. When we run across a prime, instead of just noting that it divides all of it's multiples, I feel like there should be some clever way to note what power of that prime divides those multiples. Then we wouldn't have to find that power in the loop of the else clause. I guess this is mostly just shifting around when the loop happens, so the improvement might not be vast.

I do want to point out that this same idea can be used to produce all of the factorizations for numbers in some interval [1,b] without factoring each individually. Which is probably helpful at some point.

Now that we've got a more efficient way to find the sum of divisors for all numbers up to our bound, we can re-factor the solve method. The listOfAbundants method never really did much, we can wrap it up into solve in a line. What I'm envisioning now is to get rid of some of that ending code of solve, where we build all of the sums of two abundant numbers, and then basically take the set complement. Instead of doing all that, perhaps there is some improvement to be gained in looping through the numbers in our range, subtracting abundant numbers and seeing if the difference is still abundant. My solve method becomes:

&#91;sourcecode language="python"&#93;
def solve():
    bound = 28123 # the given upper bound
    sds = sumDivsTo(bound) # record all of the sums of divisors
    bunds = &#91;k for k in xrange(12, bound) if sds&#91;k&#93; > 2*k] # the abundants
    ret = 0
    for n in xrange(1,28123):
        i = 0
        isSum = False
        while(bunds[i] < n and not isSum):
            d = n-bunds&#91;i&#93;
            if(sds&#91;d&#93; > 2*d): # is n-bunds[i] abundant?
                isSum = True
            i += 1
        if not isSum:
            ret += n
    return [ret,t]

We could, of course, start our loop at 25, with a starting ret value 1+2+\cdots+23, since 24 is the first number that is the sum of two abundants (since 12 is the first abundant).

This code runs much faster than my first version, taking roughly one-fifth the time. With this faster solve, avoiding the set nonsense, the first version seems to take about 133% as much time as this new version. That is, the big improvement in this problem came from the better solve method, not the better sumDivsTo method.

Whoops! 11 Redux

September 4, 2009

For some reason, this week I decided to actually sign up for an account on projecteuler.net, our source of programming exercises. Then I went back and entered the answers I’d gotten for my past solutions, and did pretty well. However, I messed up number 11 (in 4 versions!). I thought perhaps I should perhaps some corrected code.

I had two things incorrect. First, my inner loop wasn’t going far enough (shame!). Second, I was only accounting for one type of diagonal (shame!). Here’s my corrected (hopefully!) code:

def solve(grid):
    t = time.time()
    print grid
    ret = 0
    diffs = ( (1,0), (0,1), (1,1), (1,-1) )
    for i in xrange(0,len(grid)):
        for j in xrange(0,len(grid[0])):
            for dt in diffs:
            dx,dy = dt
            try:
                p = 1
                for k in xrange(0,4):
                    p *= grid[i+dx*k][j+dy*k]
                    ret = max(ret, p)
            except:
                pass
    t = time.time() - t
    return (ret, t)

I feel like I’m cheating a little when I use try-catch blocks like that. I feel like I should just index into the array to places I know exist, instead of relying on try to sort it out otherwise. But I also like not worrying about indexing (it was the source of one of my errors, after all). I also like looping over the “diffs”, to compute the product across, or down, or diagonally (either diagonal).

Now, on to new problems! I hope to post a solution to at least one this evening, before the (well-extended) deadline.

Problem 22

July 19, 2009

Just got back from dinner, depressed because a friend wanted to have a conversation about politics and things. Finally decided maybe some programming would pull me out of it. I’m not convinced it did, but whatever. It had the best shot of doing so.

In Problem 22 we’re supposed to read in a file, pull out the words between quotes, sort them, do some computation on each string to convert it to an int, and then take the sum of all of those words times their index in the list. I thought I’d see how little actual work I could do, which seems to translate to using lots of maps and lambda functions and such.

After a “from __future__ import with_statement”, it’s pretty easy to read in files:

def readfile(filename):
    ret = []
    with open(filename) as file:
        for line in file:
            ret += line.split(",")
    return ret

The given file has all of the names on one line, so the loop doesn’t last long. Next up, convert that list to a bunch of numbers, and sum them up, with an appropriate product:

def process(names):
    names.sort()
    scores = map(lambda s:sum(map(lambda x:string.ascii_uppercase.index(x)+1,s[1:-1].upper())), names)
    return sum([(i+1)*n for i,n in enumerate(scores)])

Perhaps some explanation is in order here. The first line, sort(), is pretty clear, I hope. In the next line, s[1:-1] strips the quote marks off of the name, and then the call to upper() converts all the characters to uppercase. This is probably superfluous, because the names all seem to be in uppercase to begin with, but better safe than sorry. Then I use the ascii_uppercase string provided by the string module, as documented here. This lets me easily convert letters to their position in the alphabet (remembering to add one because indices in strings start at 0). So my “scores” list above contains what the problem instructions call the “alphanumeric value” of each of the names. In the last line, I use the enumerate function (some documentation) to give me pairs: (index, value) in the list of scores (remembering, again, to add one to the index).

So that was fun.

Problem 20 – One More Line

July 17, 2009

I love one-liners. The sum of the digits of n factorial:

print sum(map(int, str(reduce(lambda x,y:x*y,xrange(1,n+1),1))))

This problem is pretty similar to Problem 16, which I also wrote about. There are probably improvements to be made. Like, you might as well start the xrange at 2, instead of 1. I can’t think of a solution that would be faster to code though.

Problem 18 (and 67) – Bottoms Up

July 10, 2009

My first idea in simplifying problem 18 (finding a path with maximum sum through a given triangle) was to note that going left then right (LR), you’d end at the same place as going right then left (RL). So if you want maximum sums from some point, you should process the sums LL, LR=RL, and RR separately. I wrote up my first solution pretty quickly, and it gave the right answer for the small sample input. Then I wanted to test its answer on the larger input, so I wrote the brute-force solution that checks all paths. It turned out I made an error in my initial solution (it would never try LLR), and thinking about it a little more turned up what I believe is the correct (and quick) solution.

Instead of working from the top down, work from the bottom up, inductively. At any point of the bottom row, you know the best path to the bottom (since you’re already there). Moving up a row, at any index of the second row from the bottom, you should pick the max of going left or going right, and that’ll be the best path to the bottom. Inductively, if you know the best path starting at any point of some row, then you can easily compute the best path for each index in the next row up. Here’s my code:

from __future__ import with_statement

import sys, time

def readfile(filename):
    """ Read in a file, return a list of (lists of) integers """
    ret = []
    with open(filename) as file:
        for line in file:
            ret.append(map(int,line.split()))
    return ret

def solve(grid):
    """ Find the path with the maximum sum
    Returns: the sum, the path string of L and R, time taken
    """
    t = time.time()

    lowerrow = map(lambda n:(n,""), grid[-1])
    rowidx = len(grid) - 2

    while(rowidx >= 0):
        currow = []
        for col in xrange(0, len(grid[rowidx])):
            if(lowerrow[col][0] >= lowerrow[col + 1][0]):
                lowsum, lowstr = lowerrow[col]
                currow.append((lowsum + grid[rowidx][col], "L" + lowstr))
            else:
                lowsum, lowstr = lowerrow[col + 1]
                currow.append((lowsum + grid[rowidx][col], "R" + lowstr))
        lowerrow = currow
        rowidx -= 1

    rsum, rpath = lowerrow[0]

    t = time.time() - t
    return (rsum, rpath, t)

if __name__ == "__main__":
    print solve(readfile(sys.argv[1]))

Problem 15 – Counting

July 3, 2009

When I first read problem 15, about counting the number of paths from one corner of a grid to the opposite corner, I knew there was some clever counting way to do it. I couldn’t quite remember what the clever way was, but I thought about it for a while, and remembered it.

Any path from the upper left to lower right corner consists of a sequence of moves right or down. We can denote any path as a sequence of Rs and Ds. In an n\times m grid, to start in one corner and go to the other, there must be exactly n Rs, and m Ds. So we can think of a path as a string of Rs and Ds, whose length is n+m, and so that exactly m of the letters are D (equivalently, exactly n are Rs). So we have n+m “spots” in which to put m Ds, meaning there are \binom{n+m}{m}=\frac{(n+m)!}{n!m!}=\frac{(m+1)\cdots(m+n)}{n!} paths. This gives us a constant-time (essentially) algorithm for computing the number of paths.

I didn’t give this problem too much more thought after that. Moving and finishing teaching a summer class occupied my time. But then Friday rolled around, and nobody else has written about this problem, so I thought I should. As I started coding, I realized there were some interesting things to say. Well, interesting when you’ve been grading Calculus exams for a few hours.

Here’s the code I wrote that computes the number of paths using the second formula above (which has fewer multiplications than the first):

import sys,time

def solve(n, m):
    """ Number of paths in through a grid 

    Return (number of paths, time to compute)
    """
    t = time.time()
    prod = lambda x,y: x*y
    prange = lambda x,y: reduce(prod,xrange(x,y+1),1)
    fact = lambda y: prange(1,y)
    ret = prange(m+1,m+n) // fact(n)
    t = time.time() - t
    return (ret, t)

if __name__ == "__main__":
    args = map(int, sys.argv[1:3])
    print solve(*args)

There’s a few things to note here. The first is apparent in the code, the “*args” in the last line. What this does is “unpack” a tuple. The line above it makes a list (of length two), and I want to pass this to my solver function. But my solver function takes two arguments, not a list of length two. So I use the “*” to unpack my list. Pretty nice syntactic sugar. For (slightly) more on this topic, see the tutorial.

The other thing to notice only becomes apparent when you run the code. The value that gets printed as the answer (not the time part, but just the number of paths part), ends with an extra “L”. This indicates that the answer is a long integer. This happens, with this code, even if the final answer isn’t really that big. The in-between steps are big enough to cause the long integer to show up.

Next, I thought I’d see how easy it was to code up a solution that didn’t use this counting argument, but just counted the number of paths recursively. If you have an n\times m block, and are in the upper left, you can move either right (making an (n-1)\times m block), or down (making an n\times (m-1) block). So the total number of paths is the sum of the number of paths of these smaller blocks. There’s a base case when one of the parameters is 1. Then you have a 1\times m block, say, and there are m+1 routes through it (m+1 choices for when to move down). Here’s the essential bits of the code I wrote:

def worker(n,m):
    if n == 1 or m == 1:
        return max(n,m) + 1
    return worker(n-1,m) + worker(n,m-1)

But this take a loong time to run. The solution using the counting argument takes less than 0.0001 seconds to run on my computer (on the 20×20 input of the stated problem). This second solution takes… minutes (enough time to write, or at least edit, a post). Thinking about why, while the program continues to churn, we realize some improvements can be made. For starters, \text{worker(n,m)}=\text{worker(m,n)}. To use this, I figure I’ll store results as I compute them, so that I can just use them (instead of re-calculating them) if I need them again later (rather like my solution for problem 14). Here’s what I rigged up:

class Worker:
    def __init__(self):
        self.mem = {}
    def __getitem__(self,t):
        sm,lg = min(t), max(t)
        if (sm,lg) in self.mem:
            return self.mem[(sm,lg)]
        if sm == 1:
            self.mem[(1,lg)] = lg + 1
            return lg + 1
        ret = self[(sm-1,lg)] + self[(sm,lg-1)]
        self.mem[(sm,lg)] = ret
        return ret

def solve(n, m):
    w = Worker()
    return w[(n,m)]

This is still slower than the counting-trick solution, by more than a factor of 10 for the 20×20 input. Still, that’s fractions of a second, a big improvement on the code without any memory. Plus it lets me play with defining classes, and operator overloading. The function “__getitem__” in my class allows me to use the [] notation on an object of my class. I think I kinda like the way c++ does this better, where you define a function “operator[]”. That way you don’t have to remember the name of the function. But whatever, this works. Now that I’ve done it once, I imagine I’ll be able to remember the name.

Problem 16 – One Liner

July 1, 2009

I spent a little while trying to come up with some inductive (or other clever) way to compute the sum of the digits of 2^n, without just implementing multiplication by 2. Mostly I did this because I figure 2^{1000} is pretty big, and maybe having a script compute that number would cause some sort of issue. Not coming up with anything particularly clever, I thought I’d try just a naive approach: compute 2^{n} directly and then sum the digits. Here’s the one-liner (not counting “import sys”) I came up with:

print sum(map(int,str(int(sys.argv[1]))))

Delightful. Even better – my initial concern that something would go wrong (take a while, use up too much memory) was apparently unfounded. This script ran without any noticeable difficulty in just a fraction of a second. It also did fine with a power of 10,000 and 100,000 (taking a little more than a second here). For 1,000,000, it took close to 2 minutes. This still makes me wonder if there’s a clever way to solve this problem quickly for big powers.

I thought I’d see what the function f(n)= the sum of the digits of 2^n looked like. Here’s a sampler:

Kinda fun, I think. Then I thought: how about just g(n)= sum of the digits of n. I encourage you to spend up to 10 seconds thinking what this graph should look like.

Problem 13 – Too Easy?

June 26, 2009

I’m a little confused about problem 13. I wanted to come up with some nice way to find the first (leftmost) 10 digits of the sum. The only way I came up with was to actually do the sum though. I thought maybe that coding this up would point out something I hadn’t thought about that made the problem more difficult. Like maybe the 50 digit numbers were too big to store as integer values, or the sum was, or something. So I just coded up the naive solution to see what would happen:

from __future__ import with_statement

import sys, time

def readin(filename):
    """ Input a file, reading into a list of integers """
    ret = []
    with open(filename) as file:
        for line in file:
            ret.append(int(line))
    return ret

def solve(nums):
    """ The leading 10 digits of the sum of the numbers in the list nums """
    return str(sum(nums))[0:10] # loving this syntax

if __name__ == "__main__":
    t = time.time()
    ans = solve(readin(sys.argv[1]))
    t = time.time() - t
    print solve(readin(sys.argv[1])), t

I ran that, and it spit out an answer in about 0.0004 seconds.

So, am I missing something from this exercise, or is it that easy? Is there a more clever way to do it anyway?

Problem 14 – A Class

June 26, 2009

I was thinking a little bit more about my solutions to problem 14, and thinking that passing around the “mem” dictionary wasn’t particularly efficient. I didn’t want to make it a global variable, because I have a sort of stigma against those. I think, maybe, in Python it’s a stigma I should/could get over, because the file named prob14_v3.py that contains my script is actually a module (Python does that bit for me, no extra work on my part) that gets its own namespace without any extra work from me. But I could be wrong about that.

So, anyway, I thought maybe I’d try wrapping up my code into a class, and making the dictionary an attribute of that class. Here’s what I coded up:

class ProblemSolver:
    def __init__(self, bound = 1000000):
        self.mem = {1:1}
        self.bound = bound

    def lenseq(self, n):
        if(n in self.mem):
            return self.mem[n]
        suc = n%2 and 3*n+1 or n//2
        self.lenseq(suc)
        self.mem[n] = self.mem[suc] + 1
        return self.mem[n]

    def solve(self):
        retn,retl = 0,0
        for n in xrange(1, self.bound):
            l = self.lenseq(n)
            if(l > retl):
                retn,retl = n,l
        return (retn,retl)

if __name__ == "__main__":
    solver = ProblemSolver(int(sys.argv[1]))
    print solver.solve()

I’ve not done much in the way of rigorous performance testing, comparing this script with my other solution from yesterday, or with one that just uses a global variable to store the dictionary. A few quick tests indicate that they take approximately the same time to run.

Problem 14 – 3n+1

June 25, 2009

The “3n+1 problem” is one I mention to my calculus class every time we get to sequences. It’s just so fun.

I’ve coded up a few solutions. My first is a basic one where I have a function that takes a number and generates the sequence starting at that number. Then I just loop up to 1000000 looking at the lengths of these sequences.

def seq(num):
    """ Returns the 3n+1 sequence for num """
    ret = [num]
    while(not ret[-1] == 1):
        ret.append(ret[-1]%2 == 0 and ret[-1]//2 or 3*ret[-1]+1)
    return ret

def solve(bound):
    """ Find the number less than bound with largest 3n+1 sequence

    Returns the number, the length of its sequence, and the computation time
    """
    t = time.time()
    retn,retl = 0,0
    for n in xrange(1,bound):
        s = seq(n)
        if(len(s) > retl):
            retn,retl = n,len(s)
    t = time.time() - t
    return ((retn,retl), t)

if __name__ == "__main__":
    print solve(int(sys.argv[1]))

This can be moderately improved by not passing around the entire sequence. Instead, just code up the helper function (“seq”, above) to return the length, and use it accordingly in the main loop (“solve”, above). I’ll not bother copying that code here.

The only real improvement I’ve made on this is to store values as we find them, so we can look them up again later. Here’s the way I coded that up:

def lenseq(n, mem):
    """ Recursively calculate the length of n using a memory dictionary

    This function will modify mem, adding new values to it
    """
    if(n in mem):
        return mem[n]
    suc = n%2 and 3*n+1 or n//2
    lenseq(suc, mem)
    mem[n] = mem[suc] + 1
    return mem[n]

def solve(bound):
    """ Find the number less than bound with longest 3n+1 sequence

    Return ((the number, the length of its sequence), computation time)
    """
    t = time.time()
    retn,retl = 0,0
    mem = {1:1}
    for n in xrange(1,bound):
        l = lenseq(n, mem)
        if(l > retl):
            retn,retl = n,l
    t = time.time() - t
    return ((retn,retl),t)

if __name__ == "__main__":
    print solve(int(sys.argv[1]))

And I did a little performance comparison, because it was easy. In the chart below, “Basic” is the first set of code above, “BasicPassLength” is the modification that doesn’t pass sequences around, just their lengths, and “StoreLengths” is the second set of code above.

The scale is log in both axes. The tickmark, n, on the x-axis refers to an upper bound of 10^n (the last tickmark, then, being the stated problem). The y-axis is less precise, but it’s along the lines of the log of the time taken, in seconds. The “Basic” code took about 2.5 minutes on the stated problem (bound of 1000000), “BasicPassLength” about 1.5 minutes, and the “StoreLengths” took about 20 seconds.

Just out of curiosity, I checked how many values ended up getting stored. There were 2168611 values stored, ranging up to 56991483520 (which, in case you are curious, has length 174). I haven’t thought up an obvious way to trim down which values get stored (maybe don’t do those above the bound?), or other ways to improve on this solution.