Archive for May 17th, 2009

Problem 1 and Sage

May 17, 2009

In my previous post I mentioned that much of the work I did by hand in the second solution for problem 1, taking all subset of divisors and computing the appropriate sums without constructing sets of multiples, could be done behind the scenes in Sage. I’d like to share that solution now.

# 1 + 2 + ... + n
def sumton(n):
    return n * (n+1) / 2

# sum of multiples of n up to and including b
def sumults(b,n):
    return n*sumton(b//n)

bound = 1000 - 1 # subtract 1 to not include the bound
divs = Set([3,5]) # add as many divisors as you want here
sum = 0

for ss in powerset(divs):
    if(len(ss) > 0):
        sign = (-1)**(len(ss) - 1)
        sum += (sign * sumults(bound,lcm(ss)))

print sum

Notice that lcm is built-in in Sage, so you can save some writing with that. The main improvement comes from the powerset function, so you don’t need to mess with binary strings. According to the documentation, the subsets show up in no particular order, but when I run it the empty set is first. It seems there should be some cute way to not include the empty set at all, and thus remove the if-switch in the for-loop. I can’t quite see what that method would be though, and I guess we wouldn’t technically be allowed to assume the empty set shows up first, based on the documentation. I expect some playing around with iterators, and assuming the empty set comes first, could do it.

I thought I’d also share a helpful website: sagenb.org. There, you can sign up for a free account and have access to sage (and any saved work you want) online. Pretty handy. Also saves you installing it.

Advertisements

Problem 1 – nah

May 17, 2009

Ok, I know problem 1 has been solved several times. But I’ve continued messing with it, and various versions of my programs, and comparing run-times, and so I thought I’d share.

To run my python scripts from the command line, and do performance testing, it’s helpful to have the scripts take arguments. The natural arguments for problem 1 are the upper bound and the list of divisors. So I’d call my program as “python script.py 1000 3 5”. This lets me quickly change the upper bound, or the list of divisors, outside of the code. So, how do I get at those variables in python? I found out from this page at diveintopython.org (which I should probably poke around more).

The first solution I’d like to test uses the idea of my first solution: make a set representing all the multiples, and then simply sum the set. My code is:

import sys, time
from math import ceil

t = time.time()

# the first argument is the upper bound (excluded from the sum)
# python treats all the arguments as strings, so we have to convert to ints
bound = int(sys.argv[1]) - 1

# the remaining arguments are the divisors, convert them all to ints using map
# i like the "slice notation" python uses for sub-lists
divs = map(int, sys.argv[2:])

multiples = set()

for n in divs:
     s = set(n*i for i in xrange(1, bound//n + 1))
     multiples |= s # faster than = multiples | s

t = time.time() - t

print sum(multiples), "in", t, "sec"

I should mention that Chris’ recent code pointed out to me that you can just subtract one from the upper bound, and then do no any extra checking about including it or not.

Next up, we use the idea Chris has explained about directly computing the sum, instead of messing about making lists of multiples. We should expect this to be an essentially constant-time algorithm (in the size of the upper bound, anyway, not the number of divisors) since there is no looping involving the bound. Since I want to allow any number of divisors, I’ll have to iterate over subsets of the divisors, which are conveniently in 1-1 correspondence with with binary strings with length number-of-divisors. Anyway, here’s what I came up with:

import time,sys
t = time.time()

# reduce(gcd,list) to get gcd of more than 2 numbers
def gcd(a,b):
    if(b == 0):
        return a
    return gcd(b,a % b)

# reduce(lcm,list) to get lcm of more than 2 numbers
def lcm(a,b):
    return a*b / gcd(a,b)

# 1 + 2 + … + n
def sumton(n):
    return n * (n + 1) // 2

# sum all multiples of n up to and including b
def summultiples(b,n):
    return n * sumton(b // n)

# convert int to binary string, from
# http://code.activestate.com/recipes/219300/
# sweet lambda function
bstr_ = lambda n: n>0 and bstr_(n>>1)+str(n&1) or ”
# convert n to binary string, with width at least w
# padded on the left with 0s
def tobin(n, w):
     base = bstr_(n)
     spaces = w – len(base)
     if(spaces < 0):          spaces = 0      return spaces*"0" + base # nice syntax... spaces copies of "0" # first argument is the upper bound (excluded) bound = int(sys.argv[1]) - 1 # the remaining arguments are the list of divisors divs = map(int,sys.argv[2:]) sum = 0 # for every non-empty subset of divisors for n in xrange(1,2**len(divs)):      bstr = tobin(n, len(divs))      height = bstr.count('1')      sign = (-1)**(height - 1) # appropriate minus signs      thesedivs = [] # represents this subset of divisors      for p in xrange(1,len(divs) + 1): # let's play with negative string indexing # making the 2^p digit correspond to the p-th divisor in divs if(bstr[-p] == '1'): thesedivs.append(divs[p - 1]) sum += (sign * summultiples(bound,reduce(lcm,thesedivs))) t = time.time() - t print sum, "in", t, "sec" [/sourcecode] This entire solution, by the way, can be done in very few lines with sage, since it has more built-in commands. More on that another day though, I guess. Anyway, I then did some performance testing, for grins. For bounds between 1000 and 10000, in multiples of 500 (so, 1000, 1500, 2000, ... , 10000), I ran each of the above scripts 50 times, and calculated the average runtime. I'd post my shell script here, but it's nasty. If I continue trying performance testing, and clean it up a bit, perhaps I'll write a post. In the mean time... So I took all the numbers out, and played with the Google Charts API for a bit, and came up with a decent graph:

Performance Test Comparison

Performance Test Comparison

I’m tired of messing with it, so I’ll just tell you here that the scale on the x-axis is thousands, and the scale on the y-axis is milliseconds. Now that I’ve done this once, it might go quicker in the future, but don’t count on seeing too many more of these from me.

One final word – a wordpress tip I just stumbled upon accidentally. After doing the “sourcecode” tag, and copying your program in, if you click the “HTML” button toward the top of the editor box, and then click back on “Visual”, things work out more nicely, in terms of editing the sourcecode. At least, they did for me.

Update 20090518: Corrected the lack of links to Chris’ post.