понедельник, 18 августа 2008 г.

Boost perfomance with psyco

After I spent some time at sphere I realized that python is not than fast to solve some of those problems. Of course if you want your code run faster you have to use proper algorithm. But sometimes it's not enougth. Here is some of my test results. Hope it will be helpful.
I made my tests with this tester module:

from time import clock
import gc
from cStringIO import StringIO
import psyco

def getTime(f, it):
start = clock()
for i in it:
f()
end = clock()
return end - start;

def testIt(f, name, reps = 10):
gcState = gc.isenabled
gc.enable()
it = xrange(reps)
psyco.bind(f)
time1 = getTime(f, it) / reps
psyco.unbind(f)
time2 = getTime(f, it) / reps
gc.disable()
psyco.bind(f)
time3 = getTime(f, it) / reps
psyco.unbind(f)
time4 = getTime(f, it) / reps
if gcState:
gc.enable()
return '%s\t%.5f\t%.5f\t%.5f\t%.5f\tsec/repeat in %d repeats' % (name, time1, time2, time3, time4, reps)

def compareFuncs(funcList, reps = 10):
result = StringIO()
result.write('\tpsy&gc\tgc\t\tpsyco\tclean\n')
for f in funcList:
result.write(testIt(f, str(f).split()[1], reps))
result.write('\n')
result.seek(0)
return result.read()

  • psy&gc - psyco and garbage collector(gc) enabled
  • gc - psyco disabled and gc enabled
  • psyco - psyco enabled and gc disabled
  • clean - psyco disabled and gc disabled

1. List creation:

from tester import compareFuncs
n = 1000000

def f1():
return [x+2 for x in xrange(n)]

def f2():
result = []
for x in xrange(n):
result.append(x+2)
return result

def f2_1():
result = []
for x in xrange(n):
result.extend([x+2,])
return result

def f3():
result = []
for x in xrange(n):
result += [x+2]
return result

def f4():
result = [0]*n
for x in xrange(n):
result[x] = x+2
return result

print compareFuncs([f1, f2, f2_1, f3, f4])

Results:
psy&gc gc psyco clean
f1 0.11433 0.16656 0.10566 0.17885 sec/repeat in 10 repeats
f2 0.09519 0.30929 0.10910 0.30642 sec/repeat in 10 repeats
f3 0.38955 0.56423 0.38655 0.56900 sec/repeat in 10 repeats
f4 0.04510 0.17216 0.04500 0.17194 sec/repeat in 10 repeats


2. Find maximum in list:

from tester import compareFuncs

n = 10000000

def formList():
result = [0]*n
for x in xrange(n):
result[x] = x+2
return result

l = formList()

def fmax():
return max(l)

def f1():
result = None
for x in l:
result = x if x > result else result
return result

def f2():
result = None
for x in l:
result = x > result and x or result
return result

print compareFuncs([fmax, f1, f2])

Results:

psy&gc gc psyco clean
fmax 0.53332 0.52626 0.53145 0.53093 sec/repeat in 10 repeats
f1 0.14818 0.96928 0.15452 0.97569 sec/repeat in 10 repeats
f2 0.14416 1.05048 0.13946 1.05867 sec/repeat in 10 repeats

Interesting thing about builtin max: I tested it on ascending list as in last example and after this I tested it on descending list. Here is results:

psy&gc gc psyco clean
ascending 0.53332 0.52626 0.53145 0.53093 sec/repeat in 10 repeats
descending 0.51684 0.51587 0.51563 0.51735 sec/repeat in 10 repeats

I think this difference is in the cost of value storing.

3. Finding sum of list:

from tester import compareFuncs

n = 1000000

def formList():
result = [0]*n
for x in xrange(n):
result[x] = x+2
return result

l = formList()

def fsum():
return sum(l)

def f1():
result = 0
for x in l:
result += x
return result

print compareFuncs([fsum, f1])

Results:

psy&gc gc psyco clean
sum 0.16274 0.15875 0.16075 0.15821 sec/repeat in 10 repeats
f1 0.11790 0.20511 0.11828 0.20757 sec/repeat in 10 repeats


4. Filtering list:

from tester import compareFuncs

n = 1000000

def formList():
result = [0]*n
for x in xrange(n):
result[x] = x+2
return result

l = formList()

def fc(x):
return x > m

def f1():
return filter(fc, l)

def f2():
return filter(lambda x: x > m, l)

def f3():
result = []
for x in l:
if x > m:
result.append(x)
return result

def f4():
result = [0] * len(l)
i = 0
for x in l:
if x > m:
result[i] = x
i += 1
return result[:i]

def f4_1():
result = [0] * len(l)
i = 0
for j in xrange(len(l)):
if l[j] > m:
result[i] = l[j]
i += 1
return result[:i]

def f5():
result = [0] * len(l)
for i in xrange(len(l)):
if l[i] > m:
result[i] = l[i]
return filter(None, result)

def f6():
result = l[:]
for i in reversed(xrange(len(result))):
if result[i] <= m:
del result[i]
return result

def f8():
result = l[:]
j = 0
for i in xrange(len(result)):
if result[i] > m:
result[j] = result[i]
j += 1
return result[:j]

def f7():
return [x for x in l if x > m]

print compareFuncs([f1, f2, f3, f4, f4_1, f5, f6, f7, f8])

Results:

with m = 100
psy&gc gc psyco clean
f1 0.27793 0.24135 0.28133 0.24088 sec/repeat in 10 repeats
f2 0.24230 0.24540 0.25347 0.25791 sec/repeat in 10 repeats
f3 0.09478 0.32952 0.09434 0.31757 sec/repeat in 10 repeats
f4 0.07822 0.31062 0.07671 0.30546 sec/repeat in 10 repeats
f4_1 0.08865 0.43612 0.08682 0.43763 sec/repeat in 10 repeats
f5 0.09735 0.39230 0.09663 0.39629 sec/repeat in 10 repeats
f6 0.42679 0.50859 0.41716 0.51017 sec/repeat in 10 repeats
f7 0.10520 0.19076 0.09891 0.18730 sec/repeat in 10 repeats

with m = 999900
psy&gc gc psyco clean
f1 0.23904 0.23723 0.23719 0.23357 sec/repeat in 10 repeats
f2 0.24410 0.23535 0.24561 0.23649 sec/repeat in 10 repeats
f3 0.01541 0.09355 0.01616 0.09356 sec/repeat in 10 repeats
f4 0.03093 0.10816 0.02954 0.10896 sec/repeat in 10 repeats
f4_1 0.03045 0.18474 0.03259 0.18506 sec/repeat in 10 repeats
f5 0.06754 0.22423 0.07007 0.22154 sec/repeat in 10 repeats
f6 0.19033 0.37000 0.19214 0.37743 sec/repeat in 10 repeats
f7 0.01548 0.09655 0.01528 0.09187 sec/repeat in 10 repeats

I bound with psyco only certain functions in this case with filter and map.
But if you use psyco.full() with map or filter you gain poorer perfomance.
Also you can read about this on official psyco site.

5. Strings concatenating and formatting:
Amazing results or it's just some optimization so here is my test code

from tester import compareFuncs

n = 100000

def f1():
result = [None] * n
for i in xrange(n):
result[i] = ' '.join(['test1', str(i), 'test3'])
return result

def f2():
result = [None] * n
for i in xrange(n):
result[i] = '%s %i %s' % ('test1', i, 'test3')
return result

def f3():
result = [None] * n
for i in xrange(n):
result[i] = 'test1' + str(i) + 'test3'
return result

def f4():
result = [None] * n
l = ['test1', 'test2', 'test3']
for i in xrange(n):
result[i] = ' '.join(l)
return result

def f5():
result = [None] * n
for i in xrange(n):
result[i] = '%s %s %s' % ('test1', 'test2', 'test3')
return result

def f6():
result = [None] * n
for i in xrange(n):
result[i] = 'test1' + 'test2' + 'test3'
return result

def f7():
result = [None] * n
s1 = 'test1'
s2 = 'test2'
s3 = 'test3'
for i in xrange(n):
result[i] = s1 + s2 + s3
return result

print compareFuncs([f1, f2, f3, f4, f5, f6, f7], 10)

Results:

psy&gc gc psyco clean
f1 0.13617 0.25503 0.13334 0.20871 sec/repeat in 10 repeats
f2 0.16368 0.20218 0.16317 0.20082 sec/repeat in 10 repeats
f3 0.10835 0.14519 0.10975 0.14607 sec/repeat in 10 repeats
f4 0.01919 0.06375 0.01946 0.06390 sec/repeat in 10 repeats
f5 0.05227 0.08242 0.05179 0.08256 sec/repeat in 10 repeats
f6 0.00172 0.02026 0.00166 0.01999 sec/repeat in 10 repeats
f7 0.00166 0.04547 0.00169 0.04534 sec/repeat in 10 repeats


Some hints for psyco (this info also available at official site):
- enable psyco after all import instructions - psyco slows import
- put all code in functions - psyco doesn't speed up code in module
- never use re with psyco - they become extremely slow
- range = xrange with psyco, so no need to replace range with xrange in "for in" constructions

At last even without psyco you can disable garbage collector with:

import gc
gc.disable()

This can help to increase perfomance with gc-sensible algorithms.

Комментариев нет: