Commit 1105d722 authored by gabrieldemarmiesse

Some improvements on the tutorial page.

parent 831ce22f
Before::

    def primes(int kmax):
        cdef int n, k, i
        cdef int p[1000]
        result = []
        if kmax > 1000:
            kmax = 1000
        k = 0
        n = 2
        while k < kmax:
            i = 0
            while i < k and n % p[i] != 0:
                i = i + 1
            if i == k:
                p[k] = n
                k = k + 1
                result.append(n)
            n = n + 1
        return result

After::

    def primes(int nb_primes):
        cdef int n, i, len_p
        cdef int p[1000]
        if nb_primes > 1000:
            nb_primes = 1000

        len_p = 0  # The number of elements in p
        n = 2
        while len_p < nb_primes:
            # Is n prime?
            for i in p[:len_p]:
                if n % i == 0:
                    break

            # If no break occurred in the loop
            else:
                p[len_p] = n
                len_p += 1
            n += 1

        # Let's put the result in a python list:
        result_as_list = [prime for prime in p[:len_p]]
        return result_as_list

@@ -132,27 +132,83 @@ them as a Python list.

   :linenos:

You'll see that it starts out just like a normal Python function definition,
except that the parameter ``nb_primes`` is declared to be of type ``int``. This
means that the object passed will be converted to a C integer (or a
``TypeError`` will be raised if it can't be).

Now, let's dig into the core of the function::

    cdef int n, i, len_p
    cdef int p[1000]

Lines 2 and 3 use the ``cdef`` statement to define some local C variables.
The result is stored in ``p`` and will be converted to a Python list at the end
of the function (line 22). ::

    if nb_primes > 1000:
        nb_primes = 1000

As in C, declaring a static array requires knowing its size at compile time.
We make sure the user doesn't set a value above 1000 (otherwise we could get a
nice segmentation fault, just like in C). ::

    len_p = 0  # The number of elements in p
    n = 2
    while len_p < nb_primes:

Lines 7-9 set up a loop that tests candidate numbers for primality
until the required number of primes has been found. ::

    # Is n prime?
    for i in p[:len_p]:
        if n % i == 0:
            break

Lines 11-12, which try dividing a candidate by all the primes found so far,
are of particular interest. Because no Python objects are referred to,
the loop is translated entirely into C code, and thus runs very fast.
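
To experiment with the logic of this loop without compiling anything, here is a minimal plain-Python sketch of the same trial division. The helper name ``is_prime_given`` is hypothetical (it is not part of the tutorial code), and it assumes ``known_primes`` already holds every prime smaller than ``n``. Like the Cython version, it relies on a ``for ... else``: the ``else`` block runs only when the loop finishes without hitting ``break``.

```python
def is_prime_given(n, known_primes):
    """Return True if no prime in known_primes divides n.

    Hypothetical helper mirroring the inner loop of primes(); it assumes
    known_primes contains every prime smaller than n.
    """
    for i in known_primes:
        if n % i == 0:
            # Found a divisor: n is not prime.
            break
    else:
        # The loop ran to completion without a break: n is prime.
        return True
    return False


# Build the first ten primes the same way primes() does.
found = []
n = 2
while len(found) < 10:
    if is_prime_given(n, found):
        found.append(n)
    n += 1
print(found)
```

This runs at Python speed, of course; it only shows what the C loop computes.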

You will notice the way we iterate over the ``p`` C array. ::

    for i in p[:len_p]:

The loop gets translated into C code transparently. No more ugly C for loops!
Well, don't forget how to write C-style loops with integers just yet; you might need them someday.
If you don't use ``:len_p``, Cython will loop over all 1000 elements of
the array (it won't go out of bounds and cause a segmentation fault). ::

    # If no break occurred in the loop
    else:
        p[len_p] = n
        len_p += 1
    n += 1

If no break occurred, it means that we found a prime, and the block of code
after the ``else`` (line 16) will be executed. We add the prime found to ``p``.
If you find an ``else`` after a for loop strange, just know that it's a
lesser-known feature of the Python syntax, and that it doesn't exist in C!
But since Cython uses the Python syntax, it works just as it would in
Python code, but at C speed in this case.
If the for...else syntax still confuses you, see this excellent
`blog post <https://shahriar.svbtle.com/pythons-else-clause-in-loops>`_. ::

    # Let's put the result in a python list:
    result_as_list = [prime for prime in p[:len_p]]
    return result_as_list

At line 22, before returning the result, we need to convert our C array into a
Python list, because Python can't read C arrays. Note that Cython handles the
conversion of quite a few types between C and Python for you (you can
see exactly which ones :ref:`here<type-conversion>`), but not C arrays. We can trick
Cython into doing it because Cython knows how to convert a C int to a Python int.
By doing a list comprehension, we "cast" each C int prime from ``p`` into a Python int.
You could also have iterated manually over the C array and used
``result_as_list.append(prime)``; the result would have been the same.

You'll notice we declare a Python list exactly the same way it would be in Python.
Because the variable ``result_as_list`` hasn't been given a type, it is assumed to
hold a Python object.
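
The bookkeeping described above (a fixed-size buffer ``p``, a counter ``len_p``, and a final slice converted into a list) can be mirrored in plain Python with the standard ``array`` module, whose ``'i'`` type code stores C ``int`` values. This is only a sketch for experimentation, not the tutorial's code: the names ``primes_with_buffer`` and ``CAPACITY`` are hypothetical, and it runs at Python speed.

```python
from array import array

CAPACITY = 1000  # mirrors the fixed size of `cdef int p[1000]`


def primes_with_buffer(nb_primes):
    # Clamp the request, as the Cython version does for its static array.
    nb_primes = min(nb_primes, CAPACITY)
    p = array("i", [0] * CAPACITY)  # fixed-size buffer of C ints, zero-filled
    len_p = 0  # number of slots of p that actually hold primes
    n = 2
    while len_p < nb_primes:
        # Only look at the filled part of the buffer.
        for i in p[:len_p]:
            if n % i == 0:
                break
        else:
            p[len_p] = n
            len_p += 1
        n += 1
    # Each C int read from the array becomes a Python int in the new list.
    return [prime for prime in p[:len_p]]


print(primes_with_buffer(10))
```

Dropping the ``[:len_p]`` slice in this sketch would make the loop read the zero-filled slots, so ``n % 0`` would raise ``ZeroDivisionError`` on the very first candidate.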

Finally, on the last line, a normal Python return statement returns the result list.

Compiling primes.pyx with the Cython compiler produces an extension module

@@ -165,6 +221,72 @@ which we can try out in the interactive interpreter as follows::

See, it works! And if you're curious about how much work Cython has saved you,
take a look at the C code generated for this module.

It is always a good idea to check where the code interacts with Python by passing
the ``annotate=True`` parameter to ``cythonize()``. Let's see:

.. figure:: htmlreport.png

The function declaration and return use the Python interpreter, so it makes
sense for those lines to be yellow. The same goes for the list comprehension, because
it involves creating Python objects. But why is the line ``if n % i == 0:`` yellow?
We can examine the generated C code to understand:

.. figure:: python_division.png

We can see that some checks happen. Because Cython defaults to Python
semantics, it performs division checks at runtime, just like Python does:
it adjusts the sign of the remainder when the operands have opposite signs,
and raises ``ZeroDivisionError`` for a zero divisor. You can deactivate those
checks with the ``cdivision`` :ref:`compiler directive<compiler-directives>`.
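
To see why these checks exist, it helps to compare the two remainder conventions. Python's ``%`` returns a result with the sign of the divisor and raises ``ZeroDivisionError`` for a zero divisor, while C's ``%`` truncates toward zero (the result takes the sign of the dividend) and division by zero is undefined behavior. The plain-Python sketch below emulates the C rule with a hypothetical helper, ``c_style_mod``; ``math.fmod`` follows the same sign convention for floats. In ``primes`` both operands are always positive, so the two conventions agree there.

```python
import math


def c_style_mod(a, b):
    """Integer remainder following C99 rules (truncating division).

    Hypothetical helper for illustration only: the result has the sign
    of the dividend `a`, as with C's % operator.
    """
    if b == 0:
        # In C this is undefined behavior; raise here to keep the sketch safe.
        raise ZeroDivisionError("integer modulo by zero")
    r = abs(a) % abs(b)
    return -r if a < 0 else r


for a, b in [(7, 3), (-7, 3), (7, -3), (-7, -3)]:
    print(f"{a:>3} % {b:>2}: Python {a % b:>2}, C {c_style_mod(a, b):>2}, "
          f"fmod {math.fmod(a, b):>4}")
```

The two conventions only disagree when the operands have opposite signs, which is exactly the case Cython's default checks handle.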

Now let's see whether, even with these division checks, we obtained a speed boost.
Let's write the same program, but Python-style::

    def primes_python(nb_primes):
        p = []
        n = 2
        while len(p) < nb_primes:
            # Is n prime?
            for i in p:
                if n % i == 0:
                    break

            # If no break occurred in the loop
            else:
                p.append(n)
            n += 1
        return p

Now we can check that those two programs output the same values::

    >>> primes_python(500) == primes(500)
    True

It's possible to compare the speed now::

    >>> %timeit primes_python(500)
    5.8 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    >>> %timeit primes(500)
    502 µs ± 2.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
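
``%timeit`` is an IPython magic command. Outside IPython, a comparable measurement can be sketched with the standard ``timeit`` module. The sketch below times only the pure-Python ``primes_python``, so it runs without a compiled extension; to time the Cython ``primes`` as well, you would import it from your compiled module and pass it to the same helper. The helper name ``best_time_per_call`` is hypothetical.

```python
import timeit


def primes_python(nb_primes):
    p = []
    n = 2
    while len(p) < nb_primes:
        for i in p:
            if n % i == 0:
                break
        else:
            p.append(n)
        n += 1
    return p


def best_time_per_call(func, arg, number=10, repeat=5):
    """Return the fastest average time per call, in seconds.

    Like %timeit, it runs `number` calls per measurement and repeats the
    measurement `repeat` times. Unlike %timeit, which reports a mean, it
    keeps the minimum, which the timeit docs describe as a lower bound
    for how fast the machine can run the code.
    """
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(timings) / number


t = best_time_per_call(primes_python, 500)
print(f"primes_python(500): {t * 1e3:.2f} ms per call")
```

Absolute numbers will differ from the ones above depending on the machine and the Python version; only the ratio between the two implementations is meaningful.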

The Cython version is about 11.5 times faster than the Python version (5.8 ms versus 502 µs)!
What could explain this? Several things:

* In this program, very little computation happens at each line,
  so the overhead of the Python interpreter is very important. It would be
  very different if you were doing a lot of computation at each line, using NumPy for
  example.
* Data locality. It's likely that a lot more can fit in the CPU cache when using C than
  when using Python. Because everything in Python is an object (a Python list of ints,
  for instance, holds pointers to separately allocated int objects), this is not very
  cache friendly.

Speedups vary a lot from one program to another; they are usually between 2x and 1000x,
depending on how much time is spent in the Python interpreter. As always, remember to
profile before adding types everywhere.
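
The data-locality point can be made a little more concrete from Python itself. The sketch below compares the memory taken by 1000 integers stored in a Python list with the same values stored in an ``array.array`` of C ``int``\ s, which are laid out contiguously like ``cdef int p[1000]``. Exact numbers depend on the platform and the Python version, and the sketch only measures memory, not cache behavior, so treat the output as an illustration rather than a benchmark.

```python
import sys
from array import array

values = list(range(1000))

# A list stores pointers; each element is a separate int object elsewhere
# in memory. This is an upper-bound estimate: CPython shares small int
# objects (roughly -5..256), so some of them are counted more than once.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An array('i') stores the raw C ints next to each other in one buffer.
c_ints = array("i", values)
array_bytes = sys.getsizeof(c_ints)

print(f"list of int objects: ~{list_bytes} bytes")
print(f"array of C ints:     ~{array_bytes} bytes "
      f"({c_ints.itemsize} bytes per item)")
```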

Language Details
================

@@ -301,6 +301,10 @@ return value and raise it yourself, for example,::

        raise SpamError("Couldn't open the spam file")

.. _type-conversion:

Automatic type conversions
==========================