## Archive for May, 2009

### Typical – I’m late

May 30, 2009

Here are my solutions from the last set that I never got around to posting:

Problem 3 – I guess I just printed the list and then took the biggest here.


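import math

def divisors(n):
    # collect n itself plus every nontrivial divisor pair (i, n // i)
    divs = set()
    divs.add(n)
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            divs.add(i)
            divs.add(n // i)
    return divs

def isPrime(n):
    # a prime's only divisor in this set is itself
    return len(divisors(n)) == 1

divs = list(divisors(600851475143))
divs.sort()
divs.reverse()

for div in divs:
    if isPrime(div):
        print div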



Problem 4 –

def isPalidrome(num):
    numstr = str(num)
    for i in range(len(numstr)):
        if numstr[i] != numstr[-i-1]:
            return False
    return True

i = 999
j = 999
prod = i * j
biggest = 0
big_i = 0
big_j = 0

while prod > biggest:
    while prod > biggest:
        if isPalidrome(prod):
            biggest = prod
            big_i = i
            big_j = j
            break
        j -= 1
        prod = i * j
    i -= 1
    j = i
    prod = i * j

print "Biggest is: ", big_i, " x ", big_j, " = ", biggest


Problem 5 – Okay, this one is cheap, but Carolyn and I figured out that it was super easy to just count which divisors you needed, so there wasn’t really much coding to do. 🙂

print 20*19*9*17*4*7*13*11
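
The hand count can be double-checked mechanically. Here is a minimal sketch in Python 3 syntax (unlike the Python 2 listings in this post); the helper name `lcm_upto` is made up for illustration, and it builds the least common multiple one number at a time instead of counting prime powers by hand:

```python
from math import gcd

def lcm_upto(n):
    """Least common multiple of 1..n, folded in one number at a time."""
    result = 1
    for k in range(2, n + 1):
        # lcm(result, k) = result * k / gcd(result, k)
        result = result * k // gcd(result, k)
    return result

by_hand = 20 * 19 * 9 * 17 * 4 * 7 * 13 * 11
print(by_hand, lcm_upto(20))
```

If the two printed numbers agree, the hand-picked factors cover exactly the prime powers needed for 1 through 20.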


Problem 6 – Hmm. Don’t see where I put the source to this one, so maybe I skipped it. I’ll post it with the solutions to next (i.e. this) week’s problems.

Cheers.

### Problem 4

May 28, 2009

The code Chris posted to solve problem 4 loops over the larger factor as its outer loop, and then the smaller factor as its inner loop. I’d like to share a solution that flips this order, and another solution coming at this problem from a different angle.

I haven’t run across a string.reverse in python, so all the solutions below use the following ispali function:

def ispali(n):
    """ determine if input is palindromic, iteratively """
    s=str(n)
    idx=0
    while(idx <= len(s)//2):
        if(s[idx] != s[-1-idx]):
            return False
        idx += 1
    return True

Not that it really matters, I guess, but I expect this to be twice as quick as comparing a string with its reverse anyway. You only need to go halfway through a string to know if it's a palindrome or not. But whatever. While I was thinking about how to write an ispali function, I also came up with the following (which needs an "import types" line):

import types

def ispali(n):
    """ determine if input is palindromic, recursively """
    return type(n)==types.IntType and ispali(str(n)) \
        or type(n)==types.StringType and \
        (len(n) < 2 or n[0]==n[-1] and ispali(n[1:-1]))

Recursive functions, FTW! I guess there's not much point in getting it all in one statement (but not one line, that's what the '\'s are for), besides some sort of fun. I was reading <a href="http://diveintopython.org/power_of_introspection/index.html">Chapter 4 of Dive Into Python</a>, and learning about the <a href="http://diveintopython.org/power_of_introspection/built_in_functions.html#d0e8510">type</a> function, and <a href="http://diveintopython.org/power_of_introspection/and_or.html">how 'and' and 'or' work</a> in python (which is interesting, you should check it out) when I started coding this function. (This chapter also inspired me to re-write my performance testing script in python, and make each of my scripts modules... I can say more about this another time if anybody is interested).
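
For anyone reading this on Python 3, where `types.IntType` and `types.StringType` no longer exist, here is a rough sketch of the same two ideas. The names `ispali_slice` and `ispali_rec` are chosen for illustration only, and `isinstance` stands in for the `type(n)==...` comparisons:

```python
def ispali_slice(n):
    """Palindrome test by comparing the string with its reverse."""
    s = str(n)
    return s == s[::-1]  # a slice with step -1 walks the string backwards

def ispali_rec(s):
    """Recursive test on a string: peel off matching end characters."""
    return len(s) < 2 or (s[0] == s[-1] and ispali_rec(s[1:-1]))

def ispali(n):
    """Accept an int or a string, in the spirit of the one-statement version."""
    return ispali_rec(str(n)) if isinstance(n, int) else ispali_rec(n)

print(ispali(906609), ispali("abcba"), ispali_slice(12345))
```

The slice form is probably what most people reach for in place of a missing string reverse; the recursive form is here mostly for fun, as above.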

So, anyway, on to new solutions for the actual problem. First, flip our nested loops, looping over the smaller factor for the outer loop.

def max_for_pali(smaller, bound):
    """ determine largest n in [smaller,bound] with smaller*n a palindrome

    return a tuple with 1 element, n, if such an n exists
    return an empty tuple if no such n exists
    """
    ret = bound
    while(ret >= smaller):
        if ispali(ret * smaller):
            return (ret,) # tuple of size 1, needs trailing ','
        ret -= 1
    return ()

def solve(digits):
    """ the main solver function """

    # the bounds
    lower_b = 10**(digits - 1) + 1
    upper_b = 10**(digits) - 1

    # the biggest product, and its two factors
    max_pali, sm_fact, lg_fact = 0, 0, 0

    smaller = upper_b
    while(smaller >= lower_b):
        if(smaller * upper_b < max_pali):
            break
        for m in max_for_pali(smaller, upper_b):
            p = m*smaller
            if(p > max_pali):
                max_pali, sm_fact, lg_fact = p, smaller, m
        smaller -= 1

    print "%d*%d=%d" % (sm_fact, lg_fact, max_pali)


I’m amused with the return value of the max_for_pali function, and its use in the main solve function. First of all, you might notice that to define a tuple of length one, you have to include a trailing comma. Otherwise it gets treated as a single object, not a tuple's worth of objects. Then, in the main solve function, I loop over all solutions returned by max_for_pali – but there’s only ever 0 or 1 solution! I thought this was a kinda fun way to avoid an if(len) sort of statement. Of course, there’s no real call for defining the function max_for_pali anyway, but I like it.

This solution also demonstrates a way to make strings that, if memory serves, has not been used in any post here yet. The ‘%’ operator works on strings, taking a formatting string (on the left) and a tuple (on the right). The motto seems to be that this is like printf in c.
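
Both tricks are easy to poke at in isolation. A small sketch in Python 3 syntax, with throwaway variable names that are not part of the solver:

```python
one = (7,)        # trailing comma: a tuple holding one element
not_tuple = (7)   # without the comma this is just the integer 7

for m in ():      # an empty tuple: the loop body never runs
    raise AssertionError("unreachable")

hits = [m * 2 for m in one]  # a 1-tuple: the body runs exactly once

# '%' takes a format string on the left and a tuple of values on the right
line = "%d*%d=%d" % (913, 993, 906609)
print(type(one).__name__, type(not_tuple).__name__, hits, line)
```

Looping over a 0- or 1-element tuple is the whole "avoid the if" trick: the loop body doubles as the "found something" branch.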

The next solution is a little different, in that its outer loop is looping over the palindromes themselves, and the inner loop is to find the factors. My first thought about this approach was that it wouldn’t be very efficient, so I wasn’t even going to bother with it. But I thought about it some more and decided to code it up, and I’m glad I did.

import math

def make_pali(n):
    """ make a palindrome with leading digits n """
    s = str(n)
    l = len(s)
    for i in xrange(0,l):
        s = s + s[l-1-i]
    return int(s)

def factors(n, d):
    """ try to find a nice factorization of n with d digits per factor """
    i = int(math.sqrt(n))
    j = n//i
    while(j < 10**d and i >= 10**(d-1)): # expect the first test to fail first
        if(i*j == n):
            return [i,j]
        i -= 1
        j = n//i
    return []

def solve(digits):
    """ main function """

    max_pali, prod = 0, []

    # walk the 2*digits-digit palindromes from largest to smallest
    for n in xrange(10**digits - 1, 10**(digits-1) - 1, -1):
        max_pali = make_pali(n)
        prod = factors(max_pali, digits)
        if(len(prod)):
            break

    print "%d*%d=%d" % (prod[0], prod[1], max_pali)


This solution has the advantage of stopping as soon as the answer is found, instead of worrying about if there is another combination of factors that’ll give a larger palindrome than one already found.
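
To make the "palindromes first, factors second" idea easy to try, here is a compact sketch of it in Python 3 syntax. It is a re-expression under my own assumptions, not the exact script that was timed: it returns a tuple instead of printing, uses `math.isqrt` (Python 3.8+), and returns `None` when no palindrome with twice as many digits as the factors can be split.

```python
from math import isqrt

def make_pali(n):
    """Mirror the digits of n, e.g. 906 -> 906609."""
    s = str(n)
    return int(s + s[::-1])

def factors(n, d):
    """Return [i, j] with i*j == n and both factors d digits long, else []."""
    lo, hi = 10 ** (d - 1), 10 ** d
    i = isqrt(n)
    # walk i down from sqrt(n); the cofactor n // i grows as i shrinks
    while i >= lo and n // i < hi:
        if n % i == 0:
            return [i, n // i]
        i -= 1
    return []

def solve(digits):
    """Walk 2*digits-digit palindromes downward; stop at the first that factors."""
    for head in range(10 ** digits - 1, 10 ** (digits - 1) - 1, -1):
        pali = make_pali(head)
        pair = factors(pali, digits)
        if pair:
            return pair[0], pair[1], pali
    return None

print(solve(2), solve(3))
```

Because the palindromes are visited in decreasing order, the first one that factors is the answer, which is where the early exit comes from.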

While testing these scripts, I noticed some interesting patterns. Here are what the above scripts have given me as the largest palindromic products, based on the number of digits of the two factors:

 1: 3 * 3 = 9
2: 91 * 99 = 9009
3: 913 * 993 = 906609
4: 9901 * 9999 = 99000099
5: 99681 * 99979 = 9966006699
6: 999001 * 999999 = 999000000999
7: 9997647 * 9998017 = 99956644665999
8: 99990001 * 99999999 = 9999000000009999

This points out a few things. First of all, my second solution, as posted above, actually fails on the first input. That’s because there’s a built-in assumption there that the maximum palindromic product of two $n$-digit numbers will be a $2n$-digit number. There also seems to be a pattern to the factors, at least for even inputs – the larger factor is a string of 9s. Does that continue? Apparently NOT! Here are some more values (these took a while – more on performance later):

 9: 999920317 * 999980347 = 999900665566009999
10: 9999986701 * 9999996699 = 99999834000043899999
11: 99999943851 * 99999996349 = 9999994020000204999999

With these values determined, one can go back and make another improvement or two (which we could probably have guessed at the beginning), if we allow ourselves some assumptions. Specifically, we now expect that the biggest palindrome will begin with a 9, and therefore end with a 9. Since we’re looking at its factors, we can rule out any factors that are even, or multiples of 5, since neither of these will ever give a product ending in 9.
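
The last-digit claim is small enough to brute-force. A quick sketch in Python 3 syntax; the filter name `worth_trying` is made up here and is not taken from the FewFactor script:

```python
# Which last digits can a factor have if the product ends in 9?
allowed = sorted({a for a in range(10) for b in range(10) if (a * b) % 10 == 9})
print(allowed)

def worth_trying(i):
    """True when i is odd and not a multiple of 5."""
    return i % 2 != 0 and i % 5 != 0

print(worth_trying(913), worth_trying(993), worth_trying(990))
```

Only factors ending in 1, 3, 7 or 9 survive, so the filter throws away six of every ten candidates, provided the assumption that the answer starts (and so ends) with a 9 holds.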

Ok, you all know by now that I can’t resist performance comparisons, so here’s a graph: The tickmarks on the x-axis are for the number of digits of the factors, and the y-axis is actually the logs of the times (in some reasonable scale). “LoopS” refers to the solution whose outer loop is over the smaller factor (my first solution above), and “LoopL” loops over the larger factor in its outer loop (my own implementation of Chris/Eric’s solution). “Factor” is my second solution above, and “FewFactor” has a couple of lines added to Factor to ignore multiples of 2 and 5.

What I find interesting here is that LoopS and LoopL switch back and forth between which is faster. Looking back at how the factors are coming out, this probably seems reasonable.

I couldn’t resist going out to a few more digits (I took it to 11 above), but only with the FewFactor solution, which seemed to be winning out, timewise. Here are the times it took to run for those higher digits:

 7: 7 seconds
8: 37 seconds
9: 3671 seconds ~ 1 hour
10: 298 seconds ~ 5 minutes (lots quicker than 9 digits!)
11: 5402 seconds ~ 90 minutes
12: ?

I finally gave up on 12 digits after running for nearly 30 hours.

### Problem 3 – Factoring

May 28, 2009

Problem 3 asks for the largest prime factor of a given integer. We could spend as much time as we wanted looking up factoring algorithms. I don’t really want to (I’d start at Numerical Recipes, since I already read those first few factoring algorithms). I noticed that sage has a factor command, so you can solve this problem in one line: print factor(whatever).

Well, not quite, because that prints out the whole factorization. For example, “print factor(18)” spits out “2 * 3^2”. We are only looking for the ‘3’.

Ok, so let’s split the string. There’s a handy function, called split, that you can use on any string to break it up into a list. If no argument is given, the string is split on spaces. If an argument is given, the string is split on that argument. So, for example


print "Hello  World".split()
print "Hello  World".split("or")



Will print out:

['Hello', 'World']
['Hello  W', 'ld']

Notice that the first line pulled out both spaces, and didn’t assume there was an empty string between them. If you want that behavior, you just need to do split(" ").
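
Here are the three variants side by side, as a sketch in Python 3 syntax (so `print` is a function); the variable name `text` is just for illustration:

```python
text = "Hello  World"  # two spaces between the words

print(text.split())      # no argument: a run of whitespace counts as one separator
print(text.split("or"))  # split on the substring "or"
print(text.split(" "))   # explicit single space: an empty string appears between the two spaces
```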

So, back to the problem. It might be tempting to try


print factor(18).split(" * ")[-1].split("^")[0]



to split the factorization into a list, grab the last term (presumably the largest), then split that on “^” (in case it’s a power, like for factor(18)), and just take the base of exponentiation. That’d be pretty sweet.

The problem, as you might have guessed by now, is that factor(18) doesn’t return a string object (though you could use str() to convert it to a string, and then the line above would work). It only looks that way when we print it. What really gets returned? According to the documentation, we get a Factorization object out. Looking at that documentation, we can index a Factorization exactly as if it were a list of tuples (base, power). So factor(whatever)[-1][0] will solve problem 3 (assuming the [-1] index actually does pull out the biggest base, which I think is the case; there’s a sort method on Factorization objects, if not).
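
The string-splitting route can be tried without Sage at all by starting from the printed text. A sketch in Python 3 syntax, assuming the printed form is always terms joined by " * " with an optional "^power"; the helper name `largest_base` is hypothetical:

```python
shown = "2 * 3^2"  # what gets printed for factor(18), held here as a plain string

def largest_base(factorization_text):
    """Take the last term of 'p^a * q^b * ...' and drop any exponent."""
    last_term = factorization_text.split(" * ")[-1]
    return int(last_term.split("^")[0])

print(largest_base(shown))
```

This leans on the terms being printed in increasing order of base, the same assumption as the [-1] index on the Factorization object.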

If you get tired of the ring of integers, there will be Factorization objects waiting for you when you move to another ring.

### Problem 5 and 6

May 25, 2009

Hey guys, sorry I’m late to the party. Figured I’d point out the obvious… In problem 5, we just need to calculate the product, over all primes p up to n, of the highest power of p that is at most n. (Since we’re coding, we should generalize the problem to finding the least integer divisible by all the numbers 1 to n.) I’ve got C code…

In problem 6, we can just use the formulas for sums of integers, sums of squares, and do arithmetic… Unless I’m missing something…
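
A sketch of that arithmetic in Python 3 syntax, assuming problem 6 asks for the difference between the square of the sum and the sum of the squares of 1 through n (the problem statement isn't repeated here, so treat that as my reading). The function names are made up for illustration:

```python
def square_of_sum(n):
    """(1 + 2 + ... + n)^2, using 1 + ... + n = n(n+1)/2."""
    return (n * (n + 1) // 2) ** 2

def sum_of_squares(n):
    """1^2 + 2^2 + ... + n^2 = n(n+1)(2n+1)/6."""
    return n * (n + 1) * (2 * n + 1) // 6

def difference(n):
    """Square of the sum minus sum of the squares, no loop needed."""
    return square_of_sum(n) - sum_of_squares(n)

print(difference(10), difference(100))
```

Both closed forms are exact in integer arithmetic, so there is no loop and no rounding to worry about.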

Jimbo

(Source updated after Nick (sumidiot) pointed out that I was too quick with my inequalities…)

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int isPrime(int);

int main(int argc, char **argv)
{
    int i, n;
    int product = 1;
    int i_power;

    n = atoi(argv[1]);

    for(i=2; i<=n; i++) {
        // For each prime, find the largest power of that prime that is
        // at most n and multiply it into the answer.
        if(isPrime(i)) {
            i_power = i;
            while(i_power*i <= n) {
                i_power = i_power*i;
            }
            product = product * i_power;
            printf("i_power: %d product: %d\n", i_power, product);
        }
    }
    printf("The smallest number divisible by 1 to %d is %d\n", n, product);
    return(0);
}

int isPrime(int a)
{
    int i;
    if(a==2) return(1);
    if(!(a%2)) return(0);
    for(i = 3; i <= sqrt(a); i=i+2) {
        if(!(a%i)) return(0);
    }
    return(1);
}

### Followup to BiggestPalindrome

May 23, 2009

I found a way to shave my running time down from 5.75 seconds to about 0.1 seconds. Basically, I just eliminated a lot of unnecessary multiplications and palindrome checks.


NotPalindrome[x_] := (y = ToString[x]; y != StringReverse[y]);

BiggestPalindrome[n_] :=
(x = 1;
For[i = 10^n - 1, i > 10^(n - 1) - 1, --i,
For[j = i; m = i*j, j > 10^(n - 1) - 1 && NotPalindrome[m] && m >= x, --j, m = i*j];
If[m >= x, x = m]];
x)


It takes about 0.21 seconds to compute the largest palindrome that is the product of two four digit numbers, and about 19.5 seconds to compute the largest palindrome that is the product of two five digit numbers.

### BiggestPalindrome

May 22, 2009

Once again, I cheated by using Mathematica instead of one of the two sanctioned computing environments (Python or Sage). To test whether or not a number was a palindrome, I used Mathematica’s built-in abilities to convert an expression into a string, and to reverse the characters in any given string. Here is my code (it is based on an algorithm I remember Eric describing last week):

Palindrome[x_] := (y = ToString[x]; y == StringReverse[y]);

BiggestPalindrome[n_] := (x = 1;
For[i = 10^n - 1, i > 10^(n - 1) - 1, --i,
For[j = i, j > 10^(n - 1) - 1, --j, m = i*j;
If[Palindrome[m] && m >= x, x = m]]]; x)


Typing BiggestPalindrome[n] will give the largest palindrome that is the product of two n-digit numbers. It took Mathematica about 5.75 seconds to compute the solution when n=3. I asked Mathematica to give me the answer for n=4, but I got tired of waiting for it to finish.

I’m sure there must be a way to cut down on the number of loops to go through. If there is, I’m sure somebody else will produce it.

### Moving On

May 22, 2009

So, I was updating the sidebar links for next week’s problems, and looked ahead at the two after that as well, and thought, “We could easily do all of those in a week”. In my mind, one of them is trivial (lcm), and one is… extensive (factoring). That leaves two that seem pretty reasonable for a week (and, I hope, neither particularly difficult).

So let’s shoot for problems 3-6 next week. If I misjudged interest in these problems, we can always leave some on the bill for the following week as well.

Also, if you want to see messages like this in the future, about when the assignment gets updated, leave a comment below. Unless there’s some noteworthy change to the schedule, I’ll just keep updating the sidebar every Friday afternoon/evening (Saturday if it works out that way) without any more word on it.

### Problem 2 – various approaches

May 20, 2009

Allow me to begin with a minor improvement of the standard while-loop solution to problem 2. Noticing that every third Fibonacci number is even, we might as well only calculate $F(3n)$, if possible. Of course, Fibonacci numbers naturally come in adjacent pairs, so perhaps it’d be easy enough to calculate $b(n)=F(3n)$ and $c(n)=F(3n+1)$. With starting values $b(0)=0,c(0)=1$, it’s easy to determine the recurrence $b(n+1)=b(n)+2\cdot c(n)$ and $c(n+1)=2\cdot b(n)+3\cdot c(n)$. Now we loop over the $b(n)$, until they’re past the upper bound (4000000 in the problem as written). Coded up, I’ve got it as follows (I trimmed some import and timing and print lines out):

import sys

b,c = 0,1
sum = 0
bound = int(sys.argv[1])

while(b <= bound):
    sum += b
    b,c = b+2*c, 2*b+3*c

In my mind, I expect this to be marginally faster than the while loop as used in other solutions, because we're essentially doing all the same arithmetic, but only 1/3 as many tests (and fewer multiple-assignments). I'll return to performance comparisons shortly.
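
To see the two loops agree, here is a sketch of both in Python 3 syntax, wrapped in functions so they can be compared directly. The function names are mine, and the timing code is left out:

```python
def sum_even_all(bound):
    """Walk every Fibonacci number, keeping the even ones."""
    a, b, total = 0, 1, 0
    while a <= bound:
        if a % 2 == 0:
            total += a
        a, b = b, a + b
    return total

def sum_even_thirds(bound):
    """Step straight from F(3n) to F(3n+3) using the recurrence for b and c."""
    b, c, total = 0, 1, 0
    while b <= bound:
        total += b
        b, c = b + 2 * c, 2 * b + 3 * c
    return total

print(sum_even_all(4000000), sum_even_thirds(4000000))
```

The second loop never looks at an odd Fibonacci number, which is where the "one third as many tests" comes from.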

The other solution I'd like to give code for is the generalization of Tim's <a href="https://eulerscircus.wordpress.com/2009/05/15/math-of-problem-2/">mathematical solution</a> to any bound. Again trimming out some import (you need to import math for this to work in pure Python - no Sage), timing, and print statements, I've got:

import sys
from math import sqrt, log, floor

def geom(b,p):
    """ sum (b^3)^n from n=0 to n=p (the n=0 terms cancel in the difference below) """
    return (1-(b**3)**(p+1)) / (1-b**3)

bound = int(sys.argv[1])

alpha = (1 + sqrt(5)) / 2
beta = (1 - sqrt(5)) / 2

# what index is the largest even fibonum less than our bound?
# the factor of sqrt(5) on the bound comes from the idea that
# fib(n) ~ alpha^n / sqrt(5)
# and the factor of 3 on alpha is because we want even fibonums
maxn = floor(log(bound * sqrt(5)) / log(alpha**3))

# formula for sum of the even fibonums up to the bound
sum = int((1/sqrt(5)) * (geom(alpha,maxn) - geom(beta,maxn)))


No loops at all. Nice. Of course, there’s a lot of floating point operations going on there. How does that affect performance?

The following chart compares my solution (called “evens” in the chart), with the solution calculating all Fibonacci numbers (called “all”), and Tim’s (“noloop”). The tickmark $n$ along the $x$-axis indicates that the upper bound (maximum even Fibonacci number to sum to) is $4\cdot 10^n$ (why the 4? the original problem had one, is all). Times are measured in milliseconds on the $y$-axis. We see that, indeed, the “noloop” solution looks approximately constant-time, and that the “evens” loop slightly beats the “all” loop. What strikes me as more interesting, at first anyway, is that both looping solutions beat the noloop solution at the bound given in the problem (the tickmark 6 on the $x$-axis). After some thought, it almost makes sense, because there are only 11 even Fibonacci numbers less than 4000000, so it’s not like the loops are taking long.

I’m also intrigued by the sudden jump in both looping solutions between $x=8$ and $x=9$. These correspond to a bound of $4\cdot 10^8$ and $4\cdot 10^9$, respectively. Taking log base 2 of these, we get $\approx 28.57$ and $\approx 31.89$, right around 32, a notable bound for my computer. In fact, the next even Fibonacci number after $4\cdot 10^9$ is $F(48)=4,807,526,976$, which happens to be bigger than $2^{32}=4,294,967,296$. And this is the number we’d make before ending the while loop.

I’m not quite done. Tim commented that for large numbers, $\alpha$ was the dominating term in the expression for $F(n)$. I wondered if the solution I coded up above, based on his math, would give different answers from the looping solutions for some particular bounds. Running a few loops, I noticed that the solutions disagree, unsurprisingly enough, at (some of the) bounds that are even Fibonacci numbers (which, generalizing the original problem, we should include). Here’s a chart using even Fibonacci numbers as bounds:

| Bound | Loop | NoLoop |
|---:|---:|---:|
| 2 | 2 | 2 |
| 8 | 10 | 2 |
| 34 | 44 | 44 |
| 144 | 188 | 44 |
| 610 | 798 | 798 |
| 2584 | 3382 | 798 |
| 10946 | 14328 | 14328 |
| 46368 | 60696 | 14328 |
| 196418 | 257114 | 257114 |
| 832040 | 1089154 | 257114 |
| 3524578 | 4613732 | 4613732 |
| 14930352 | 19544084 | 19544084 |
| 63245986 | 82790070 | 82790070 |
| 267914296 | 350704366 | 350704366 |
| 1134903170 | 1485607536 | 1485607536 |
| 4807526976 | 6293134512 | 6293134512 |

For the first several, the non-looping solution (as coded above) alternates between correct and incorrect. It seems this comes from the $\beta$ term of $F(n)$ being negative, so that even or odd powers have some effect. It looks like the last incorrect value is at the bound 832040.

At this point, I think trying to make the non-looping solution work correctly for all even bounds is most easily patched up by just storing a table of the first several, and then computing it for any higher bounds. Or switching between a loop and non-loop solution based on the bound being big enough.

Or, since Sage does symbolic manipulation, let’s take the above non-looping solution and drop it into Sage with an extra if-switch to see if the upper bound is an even Fibonacci number. I coded that as

if(bound == ((1/sqrt(5))*(alpha**(3*(maxn + 1)) - beta**(3*(maxn + 1))))):
    maxn += 1


and inserted that just after the “maxn = ” line above (so, at line 15). Running this, with the bound an even Fibonacci number, now always seems to agree with the looping solutions.

Ok, I think that’s all I have to say about problem 2.

### Still on Problems 1 and 2

May 19, 2009

Yes, I know that I’ve been saying 2 problems per week, and we started early last week on problems 1 and 2. However, I thought some extra time on the first two problems would give everybody some time to get going. So I set the “Current Problems” to be problems 1 and 2, due this coming Friday (May 22). I, at least, still have things to say about problem 2 :). There’s a little widget in the right-hand sidebar of the blog with the “Current Problems”. I’ll change them after this Friday.

Please leave comments below answering the following question: Do you want me to write a post each time I change the “Current Problems”, or are you happy to just move to the next problems after the “Current Problems” “due date” has passed?

If you’re already done with the “Current Problems” any given week, feel free to go on ahead working on whatever. But if you do so, please refrain from posting solutions. Let’s try to stay, online anyway, at least a little bit together. If you’re ahead, I’d say it’s a good time to spend reading some documentation or something, like the various tutorials or whatever other references you find. If you find any nice references, please share. This reading might show you other ways to approach a problem you’ve already done, which would be helpful to see.

In the same spirit, though, after we get going, if you have something new to say about a previously “Current Problem”, feel free to write a post.

If you seriously disagree with this (or any other) “policy”, let’s talk it out in the comments below.

### Performance Testing

May 18, 2009

For no particularly good reason, I decided to clean up the script I used recently to get my performance testing data, and thought I’d share. The script is a bash script, to be run on linux machines. The way I’ve been setting things up (inspired by what Eric said he was doing), I make a directory (folder) for each problem number, and in that directory I have a bunch of .py files, one for each different version of the program. I call my scripts, e.g., prob1-v1.py, prob1-v2.py, and so on. Each of the scripts (that I want to test anyway) is set up to take command line arguments (see my other post for how to do that). The script below assumes that the argument that is changing comes first, and any other arguments are fixed, and come after that. Also, it only works for pure python scripts currently, no sage.

The output of my script is (when things work) a URL that uses the Google Charts API to draw a graph. Visiting that output URL using your favorite (open source) browser should turn up a sweet graph (hopefully). To include the graph in a post here, click the button next to ‘Upload/Insert’ above the editor box (when you mouseover the button, it says ‘Add an Image’), then click ‘From URL’ in the top of the “popup” and go from there.

There’s as much flexibility built in to this script as I could handle this evening. You’ll likely have to tinker with some of the lines to get them working with whatever scheme you’ve got going at home. You’ll also have to tinker with them for different problems, because (at least) the command line arguments will change. Sections you are most likely to have to change have been noted. As things are currently shown below, I am comparing scripts prob1-v3.py and prob1-v6.py (designed to solve problem 1) with inputs ranging from 1000 to 10000 in multiples of 1000. Both of these scripts also take the list of numbers you want to sum multiples of, in the base case of the problem this is “3 5”, which is used below (variable E).

#!/bin/bash

###################################
### initial setup block. mess with these values
###################################

# assume scripts have names "probP-vN.py". e.g., "prob1-v3.py"
P=1 # the problem number
V="3 6" # the versions to compare

T=50 # number of times to run each program with each input for averaging

# I should be the string of values to use as inputs
# this will be the changing argument that performance is tested against,
# which scripts should be expecting as the first argument
I="$(seq 1000 1000 10000)"

# the extra arguments that should get passed each time, not varying
E="3 5"

##################################
### url output block. may want to mess with it
##################################
# documentation available at:
# http://code.google.com/apis/chart/
##################################
# set things up here, or, after completion,
# mess with just these parts of the output
##################################
echo -n "http://chart.apis.google.com/chart?"
echo -n "cht=lxy&" # type
echo -n "chs=300x200&" # size
echo -n "chxt=x,y&" # place tickmarks on x and y axis
echo -n "chxr=0,1,10,1|1,0,5,1&" # the tickmark numbers on the axes
echo -n "chco=FF0000,0000FF&" # color for each graph
echo -n "chdl=Sets|Sums&" # legend
echo -n "chds=1000,10000,0,5,1000,10000,0,5&" # pairs of min/max for each data set
echo -n "chd=t:" # data, to be output below

##################################
### generating data block. probably mostly safe to not mess with
##################################

# generate the string of x values, from I (just put , between values)
X="$(for J in ${I}; do echo -n "${J},"; done; echo -n "|")"

( # wrap up all the plot data output, for some post-processing

# start testing
for W in ${V}; do
    # print the string of "x" values for the Google Chart data chd=t:
    echo -n ${X}

    # loop through each input to be tested
    for J in ${I}; do
        C="scale=10;(" # string we'll pass to bc to calculate average

        # run the program ${T} times, getting runtime each time
        for A in $(seq 1 ${T}); do
            O="$(python prob${P}-v${W}.py ${J} ${E})" # program's output
            # pull out just the runtime of the output
            R="$(echo ${O} | awk '{print $3}')"
            # bc doesn't like scientific notation
            R="$(echo ${R} | sed -s 's/e/*10^/')"
            # scale by 1000 for no particularly good reason
            # except if runtimes are along the lines of milliseconds...
            R="$(echo "scale=10;1000*${R}" | bc)"

            # concatenate R to C, and a plus sign
            C="${C}${R}+"
        done

        # cap off C, compute the average, truncate to 4 characters
        echo -n "$(echo "${C}0)/${T}" | bc | head -c 4),"
    done
    echo -n "|"
done

# all done, pull out extraneous symbols
) | sed -s 's/,|/|/g' | sed -s 's/|$//'

echo


Anyway, if you feel like using it, go right ahead. If not, no worries. It remains to be seen how much I’ll use it myself. Please share success/failure stories. If you’d like to rearrange the script to accommodate your home setup, I’m happy to try to help. I’m sure there’s all sorts of room for improvement, but I hope it’s good enough for now.

Now, what was I supposed to be doing today?

Update: Just after posting, I realized that I failed to mention that this script makes an assumption about the output of the python scripts it is running. It expects the output to look like “1234 in 0.00456587 sec”, where 1234 is the answer calculated, and 0.00456587 is the time as calculated in, e.g., the scripts of my original comparison of solutions for problem 1.
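
For concreteness, here is a sketch of a script body that produces output in that shape, in Python 3 syntax. Everything named here is illustrative: `timed_report` is a hypothetical helper, and `sum_multiples` is only a stand-in workload that sums the numbers below a limit divisible by any of the given values, not the actual prob1 scripts.

```python
import time

def timed_report(func, *args):
    """Run func(*args) once and format the result as 'ANSWER in SECONDS sec'."""
    start = time.perf_counter()
    answer = func(*args)
    elapsed = time.perf_counter() - start
    # fixed-point formatting keeps the time out of scientific notation
    return "%s in %.8f sec" % (answer, elapsed)

def sum_multiples(limit, *divisors):
    """Stand-in workload: sum of numbers below limit divisible by any divisor."""
    return sum(n for n in range(limit) if any(n % d == 0 for d in divisors))

report = timed_report(sum_multiples, 1000, 3, 5)
print(report)
print(report.split()[2])  # the third whitespace-separated field is the runtime
```

Formatting the time with `%.8f` also sidesteps the scientific-notation workaround in the bash script, at the cost of very short runs printing as 0.00000000.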