
tl;dr: I used our Scalene profiler and some math to make an example program run 5000x faster.

pip install scalene
I am quite interested in the performance of Python, so naturally I read this article — https://martinheinz.dev/blog/64 — about profiling and analyzing the performance of Python programs. It presents an example program (from https://docs.python.org/3/library/decimal.html) and shows how to run it with several Python profilers. Unfortunately, the profilers don't yield much actionable information beyond, more or less, "try PyPy", which speeds the code up by about 2x. I wondered if I'd be able to get more useful information from Scalene, a profiler I co-wrote.
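For context, here is that example program, the exp() recipe reproduced from the decimal docs linked above (comments are mine):

```python
from decimal import Decimal, getcontext

def exp(x):
    """Return e raised to the power of x, summing the Taylor series
    x**i / i! (recipe from the Python decimal documentation)."""
    getcontext().prec += 2          # extra digits for intermediate steps
    i, lasts, s, fact, num = 0, 0, 1, 1, 1
    while s != lasts:               # loop until the sum stops changing
        lasts = s
        i += 1
        fact *= i                   # i!
        num *= x                    # x**i
        s += num / fact             # divides two ever-larger Decimals
    getcontext().prec -= 2          # restore precision; +s rounds the result
    return +s

print(exp(Decimal(150)))
```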
We developed Scalene to be a lot more useful than existing Python profilers: it provides line-level information, separates Python time from native time, and profiles memory usage, GPU usage, and even copying costs, all at line granularity.
Anyway, here's the result of running Scalene (with just CPU profiling) on the example code. It really cuts to the chase.
% scalene --cpu-only --cli --reduced-profile test/test-martinheinz.py

You can see that practically all of the execution time is spent computing the ratio between num and fact, so that's the only place to focus any optimization effort. The fact that a lot of time is spent running native code means that this line is executing some C library under the covers.

It turns out it's dividing two Decimals (a.k.a. bignums). The underlying bignum library is written in C and is very fast, but factorials get really huge really fast. For one of the example inputs, the last value of fact is over 11,000 digits long! No wonder: doing math on such large numbers is expensive. Let's see what we can do to make those numbers smaller.
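To get a feel for why, here is a quick back-of-the-envelope check (my own illustration, not from the original article) of how fast the factorial's digit count grows:

```python
import math

# Count the decimal digits of n! for a few values of n.
for n in (100, 1000, 3000):
    digits = len(str(math.factorial(n)))
    print(f"{n}! has {digits} digits")
# 100! already has 158 digits; by n = 3000 we're past 9000 digits,
# so every num / fact division is a division of enormous numbers.
```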
I saw that we can calculate num / fact not from scratch but incrementally: on each loop iteration, update a variable via a calculation on a very small number. To do this, I add a new variable nf, which will always be equal to the ratio num / fact. Then, on each loop iteration, the program updates nf by multiplying it by x / i. You can verify that this maintains the invariant nf == num / fact by observing the following (where _new means the updated value of a variable in each iteration):
nf == num / fact # true by induction
nf_new == nf * (x / i) # we multiply by x/i each time
nf_new == (num / fact) * (x / i) # definition of nf
nf_new == (num * x) / (fact * i) # re-arranging
nf_new == num_new / fact_new # simplifying
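The derivation above can also be checked numerically; here is a small sanity check (my own, not part of the original post) that runs a few iterations and confirms nf stays equal to num / fact up to rounding:

```python
from decimal import Decimal

x = Decimal(2)                      # arbitrary test input
i, num, fact = 0, Decimal(1), Decimal(1)
nf = Decimal(1)                     # invariant: nf == num / fact
for _ in range(10):
    i += 1
    fact *= i                       # fact_new = fact * i
    num *= x                        # num_new = num * x
    nf *= x / i                     # nf_new = nf * (x / i)
    # nf and num / fact agree to within accumulated rounding error
    assert abs(nf - num / fact) < Decimal("1e-20")
print("invariant holds:", nf, "==", num / fact)
```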
To incorporate this into the original program, only three lines of code need to be changed; each is marked with a ### comment.
def exp_opt(x):
    getcontext().prec += 2
    i, lasts, s, fact, num = 0, 0, 1, 1, 1
    nf = Decimal(1)        ### was: = num / fact
    while s != lasts:
        lasts = s
        i += 1
        fact *= i
        num *= x
        nf *= (x / i)      ### update nf to be num / fact
        s += nf            ### was: s += num / fact
    getcontext().prec -= 2
    return +s
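For completeness, here is a minimal, self-contained timing harness (my own sketch; the input values 150, 400, and 3000 are inferred from the printed results below):

```python
import time
from decimal import Decimal, getcontext

def exp_opt(x):
    """Optimized exp: nf incrementally tracks num / fact."""
    getcontext().prec += 2
    i, lasts, s, fact, num = 0, 0, 1, 1, 1
    nf = Decimal(1)        # invariant: nf == num / fact
    while s != lasts:
        lasts = s
        i += 1
        fact *= i
        num *= x
        nf *= (x / i)      # cheap update on a small number
        s += nf            # no more giant bignum division
    getcontext().prec -= 2
    return +s

start = time.perf_counter()
for n in (150, 400, 3000):
    print(exp_opt(Decimal(n)))
print("Elapsed time, optimized (s):", time.perf_counter() - start)
```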
The result of this change is, um, dramatic. On an Apple M1 Mac Mini, the original version:
Original:
1.39370958066637969731834193711E+65
5.22146968976414395058876300668E+173
7.64620098905470488931072765993E+1302
Elapsed time, original (s): 33.231053829193115
And the optimized version:
Optimized:
1.39370958066637969731834193706E+65
5.22146968976414395058876300659E+173
7.64620098905470488931072766048E+1302
Elapsed time, optimized (s): 0.006501913070678711
That's more than a 5000x speedup (5096x, to be exact).
The moral of the story is that using a more detailed profiler like Scalene can make optimization efforts far more effective by pinpointing inefficiencies in an actionable way.