docs: Emphasized the speedups of Cython vs NumPy in both the notebook and the docs.

Commit c03b0bca authored Jul 05, 2018 by gabrieldemarmiesse
Showing with 135 additions and 138 deletions
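
The comparison helper this commit adds to the notebook depends on the IPython `%timeit -o` magic, so it cannot run in a plain interpreter. As a rough sketch only, the same ratio logic can be reproduced with the standard-library `timeit` module. The names `speed_ratio`, `format_comparison`, and `average_time` are hypothetical and do not appear in the commit. The sample timings are the rounded values from the notebook output.

```python
import timeit


def speed_ratio(current, reference):
    """Compare two timings in seconds; return (ratio, "faster" or "slower")."""
    ratio = reference / current
    if ratio > 1:
        return ratio, "faster"
    # Invert so the reported ratio is always >= 1, as in the notebook helper.
    return 1 / ratio, "slower"


def format_comparison(current, reference, name):
    """Build the same sentence the notebook prints for one reference timing."""
    ratio, word = speed_ratio(current, reference)
    return "We are {0:.1f} times {1} than the {2} version.".format(ratio, word, name)


def average_time(func, repeat=3, number=10):
    """Mean seconds per call over `repeat` runs of `number` calls each.

    Hypothetical stand-in for `%timeit -o ...` followed by `.average`.
    """
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return sum(runs) / (repeat * number)


# Memoryview version (8.83 ms) against NumPy (8.11 ms), values from the diff.
print(format_comparison(8.83e-3, 8.11e-3, "NumPy"))
# prints: We are 1.1 times slower than the NumPy version.
```

The notebook computes its ratios from unrounded averages, so feeding the rounded timings shown in the output into this sketch may give slightly different figures for the very large pure Python ratios.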

docs/examples/userguide/numpy_tutorial/numpy_and_cython.ipynb +118 -125

docs/src/userguide/numpy_tutorial.rst +17 -13

--- a/docs/examples/userguide/numpy_tutorial/numpy_and_cython.ipynb
+++ b/docs/examples/userguide/numpy_tutorial/numpy_and_cython.ipynb
@@ -20,9 +20,20 @@
   "metadata": {
    "scrolled": true
   },
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "0.29a0\n"
+     ]
+    }
+   ],
   "source": [
-    "%load_ext cython"
+    "from __future__ import print_function\n",
+    "%load_ext cython\n",
+    "import Cython\n",
+    "print(Cython.__version__)"
   ]
  },
  {
@@ -72,12 +83,13 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "8.69 ms ± 297 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
+      "8.11 ms ± 25.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
     ]
    }
   ],
   "source": [
-    "%timeit compute_np(array_1, array_2, a, b, c)"
+    "timeit_result = %timeit -o compute_np(array_1, array_2, a, b, c)\n",
+    "np_time = timeit_result.average"
   ]
  },
  {
@@ -86,7 +98,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "result = compute_np(array_1, array_2, a, b, c)"
+    "np_result = compute_np(array_1, array_2, a, b, c)"
   ]
  },
  {
@@ -136,7 +148,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "assert np.all(compute(array_1, array_2, a, b, c) == result)"
+    "assert np.all(compute(array_1, array_2, a, b, c) == np_result)"
   ]
  },
  {
@@ -148,24 +160,56 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "25.6 s ± 225 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+      "27.9 s ± 1.75 s per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
-    "%timeit compute(array_1, array_2, a, b, c)"
+    "timeit_result = %timeit -o compute(array_1, array_2, a, b, c)\n",
+    "py_time = timeit_result.average"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "##### Pure Python version compiled with Cython:"
+    "#### We make a function to be able to easily compare timings with the NumPy version and the pure Python version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def compare_time(current, reference, name):\n",
+    "    ratio = reference/current\n",
+    "    if ratio > 1:\n",
+    "        word = \"faster\"\n",
+    "    else:\n",
+    "        ratio = 1 / ratio \n",
+    "        word = \"slower\"\n",
+    "        \n",
+    "    print(\"We are\", \"{0:.1f}\".format(ratio), \"times\", word, \"than the\", name, \"version.\")\n",
+    "\n",
+    "def print_report(compute_function):\n",
+    "    assert np.all(compute_function(array_1, array_2, a, b, c) == np_result)\n",
+    "    timeit_result = %timeit -o compute_function(array_1, array_2, a, b, c)\n",
+    "    run_time = timeit_result.average\n",
+    "    compare_time(run_time, py_time, \"pure Python\")\n",
+    "    compare_time(run_time, np_time, \"NumPy\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "##### Pure Python version compiled with Cython:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
   "metadata": {
    "scrolled": false
   },
@@ -1115,7 +1159,7 @@
       "<IPython.core.display.HTML object>"
      ]
     },
-     "execution_count": 9,
+     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
@@ -1153,31 +1197,25 @@
    "    return result"
   ]
  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "assert np.all(compute(array_1, array_2, a, b, c) == result)"
-   ]
-  },
  {
   "cell_type": "code",
   "execution_count": 11,
-   "metadata": {},
+   "metadata": {
+    "scrolled": true
+   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "The history saving thread hit an unexpected error (OperationalError('disk I/O error',)).History will not be written to the database.\n",
-      "21.9 s ± 398 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+      "22.1 s ± 142 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
+      "We are 1.3 times faster than the pure Python version.\n",
+      "We are 2724.1 times slower than the NumPy version.\n"
     ]
    }
   ],
   "source": [
-    "%timeit compute(array_1, array_2, a, b, c)"
+    "print_report(compute)"
   ]
  },
  {
@@ -2021,27 +2059,22 @@
  {
   "cell_type": "code",
   "execution_count": 13,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "assert np.all(compute(array_1, array_2, a, b, c) == result)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
+   "metadata": {
+    "scrolled": true
+   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "10.5 s ± 301 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+      "10.1 s ± 50.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
+      "We are 2.8 times faster than the pure Python version.\n",
+      "We are 1250.0 times slower than the NumPy version.\n"
     ]
    }
   ],
   "source": [
-    "%timeit compute(array_1, array_2, a, b, c)"
+    "print_report(compute)"
   ]
  },
  {
@@ -2053,7 +2086,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
@@ -2763,7 +2796,7 @@
       "<IPython.core.display.HTML object>"
      ]
     },
-     "execution_count": 15,
+     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
@@ -2804,28 +2837,21 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 16,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "assert np.all(compute(array_1, array_2, a, b, c) == result)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "9.56 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
+      "8.83 ms ± 42.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
+      "We are 3161.3 times faster than the pure Python version.\n",
+      "We are 1.1 times slower than the NumPy version.\n"
     ]
    }
   ],
   "source": [
-    "%timeit compute(array_1, array_2, a, b, c)"
+    "print_report(compute)"
   ]
  },
  {
@@ -2837,8 +2863,10 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 46,
-   "metadata": {},
+   "execution_count": 16,
+   "metadata": {
+    "scrolled": true
+   },
   "outputs": [
    {
     "data": {
@@ -3510,7 +3538,7 @@
       "<IPython.core.display.HTML object>"
      ]
     },
-     "execution_count": 46,
+     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
@@ -3553,28 +3581,21 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 47,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "assert np.all(compute(array_1, array_2, a, b, c) == result)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 48,
+   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "6.06 ms ± 26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
+      "6.04 ms ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
+      "We are 4623.4 times faster than the pure Python version.\n",
+      "We are 1.3 times faster than the NumPy version.\n"
     ]
    }
   ],
   "source": [
-    "%timeit compute(array_1, array_2, a, b, c)"
+    "print_report(compute)"
   ]
  },
  {
@@ -3586,7 +3607,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 62,
+   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
@@ -3628,28 +3649,21 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 50,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "assert np.all(compute(array_1, array_2, a, b, c) == result)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 51,
+   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "4.13 ms ± 87.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
+      "4.18 ms ± 34 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
+      "We are 6673.5 times faster than the pure Python version.\n",
+      "We are 1.9 times faster than the NumPy version.\n"
     ]
    }
   ],
   "source": [
-    "%timeit compute(array_1, array_2, a, b, c)"
+    "print_report(compute)"
   ]
  },
  {
@@ -3661,7 +3675,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 63,
+   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
@@ -3704,28 +3718,21 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 53,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "assert np.all(compute(array_1, array_2, a, b, c) == result)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 54,
+   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "4.1 ms ± 54.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
+      "4.25 ms ± 52.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
+      "We are 6562.5 times faster than the pure Python version.\n",
+      "We are 1.9 times faster than the NumPy version.\n"
     ]
    }
   ],
   "source": [
-    "%timeit compute(array_1, array_2, a, b, c)"
+    "print_report(compute)"
   ]
  },
  {
@@ -3737,7 +3744,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 64,
+   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
@@ -3790,16 +3797,22 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 56,
+   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
-    "assert np.all(compute(array_1, array_2, a, b, c) == result)"
+    "arr_1_float = array_1.astype(np.float64)\n",
+    "arr_2_float = array_2.astype(np.float64)\n",
+    "\n",
+    "float_cython_result = compute(arr_1_float, arr_2_float, a, b, c)\n",
+    "float_numpy_result = compute_np(arr_1_float, arr_2_float, a, b, c)\n",
+    "\n",
+    "assert np.all(float_cython_result == float_numpy_result)"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 57,
+   "execution_count": 24,
   "metadata": {
    "scrolled": true
   },
@@ -3808,27 +3821,14 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "6 ms ± 70.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
+      "6.17 ms ± 164 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
+      "We are 4525.9 times faster than the pure Python version.\n",
+      "We are 1.3 times faster than the NumPy version.\n"
     ]
    }
   ],
   "source": [
-    "%timeit compute(array_1, array_2, a, b, c)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 58,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "arr_1 = np.random.uniform(0, 1000, size=(100, 100)).astype(np.float64)\n",
-    "arr_2 = np.random.uniform(0, 1000, size=(100, 100)).astype(np.float64)\n",
-    "\n",
-    "float_cython_result = compute(arr_1, arr_2, a, b, c)\n",
-    "float_numpy_result = compute_np(arr_1, arr_2, a, b, c)\n",
-    "\n",
-    "assert np.all(float_cython_result == float_numpy_result)"
+    "print_report(compute)"
   ]
  },
  {
@@ -3840,7 +3840,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 65,
+   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
@@ -3895,30 +3895,23 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 60,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "assert np.all(compute(array_1, array_2, a, b, c) == result)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 61,
+   "execution_count": 26,
   "metadata": {
-    "scrolled": true
+    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "3.41 ms ± 93.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
+      "3.55 ms ± 80.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n",
+      "We are 7858.9 times faster than the pure Python version.\n",
+      "We are 2.3 times faster than the NumPy version.\n"
     ]
    }
   ],
   "source": [
-    "%timeit compute(array_1, array_2, a, b, c)"
+    "print_report(compute)"
   ]
  }
 ],

--- a/docs/src/userguide/numpy_tutorial.rst
+++ b/docs/src/userguide/numpy_tutorial.rst
@@ -175,15 +175,15 @@ run a Python session to test both the Python version (imported from
    In [7]: def compute_np(array_1, array_2, a, b, c):
       ...:     return np.clip(array_1, 2, 10) * a + array_2 * b + c
    In [8]: %timeit compute_np(array_1, array_2, a, b, c)
-    8.69 ms ± 297 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+    8.11 ms ± 25.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [9]: import compute_py
    In [10]: compute_py.compute(array_1, array_2, a, b, c)
-    25.6 s ± 225 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+    27.9 s ± 1.75 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [11]: import compute_cy
    In [12]: compute_cy.compute(array_1, array_2, a, b, c)
-    21.9 s ± 398 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+    22.1 s ± 142 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

 There's not such a huge difference yet; because the C code still does exactly
 what the Python interpreter does (meaning, for instance, that a new object is
@@ -218,7 +218,7 @@ After building this and continuing my (very informal) benchmarks, I get:
 .. sourcecode:: ipython

    In [13]: %timeit compute_typed.compute(array_1, array_2, a, b, c)
-    10.5 s ± 301 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+    10.1 s ± 50.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

 So adding types does make the code faster, but nowhere
 near the speed of NumPy?
@@ -287,10 +287,10 @@ Let's see how much faster accessing is now.
 .. sourcecode:: ipython

    In [22]: %timeit compute_memview.compute(array_1, array_2, a, b, c)
-    9.56 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+    8.83 ms ± 42.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

 Note the importance of this change.
-We're now 2700 times faster than an interpreted version of Python and close
+We're now 3161 times faster than an interpreted version of Python and close
 to NumPy speed.

 Memoryviews can be used with slices too, or even
@@ -326,9 +326,9 @@ information.
 .. sourcecode:: ipython

    In [23]: %timeit compute_index.compute(array_1, array_2, a, b, c)
-    6.1 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+    6.04 ms ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

-We're now faster than the NumPy version. NumPy is really well written,
+We're now faster than the NumPy version, though not by much (1.3x). NumPy is really well written,
 but does not performs operation lazily, meaning a lot
 of back and forth in memory. Our version is very memory efficient and
 cache friendly because we know the operations in advance.
@@ -375,9 +375,10 @@ get by declaring the memoryviews as contiguous:
 .. sourcecode:: ipython

    In [23]: %timeit compute_contiguous.compute(array_1, array_2, a, b, c)
-    4.13 ms ± 87.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+    4.18 ms ± 34 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

-We're now around two times faster than the NumPy version.
+We're now around two times faster than the NumPy version, and 6600 times
+faster than the pure Python version!

 Making the function cleaner
 ===========================
@@ -403,7 +404,7 @@ We now do a speed test:
 .. sourcecode:: ipython

    In [24]: %timeit compute_infer_types.compute(array_1, array_2, a, b, c)
-    4.1 ms ± 54.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+    4.25 ms ± 52.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

 Lo and behold, the speed has not changed.

@@ -444,7 +445,7 @@ We now do a speed test:
 .. sourcecode:: ipython

    In [25]: %timeit compute_fused_types.compute(array_1, array_2, a, b, c)
-    6 ms ± 70.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+    6.17 ms ± 164 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

 We're a bit slower than before, because of the right call to the clip function
 must be found at runtime and adds a bit of overhead.
@@ -471,7 +472,10 @@ We can have substantial speed gains for minimal effort:
 .. sourcecode:: ipython

    In [25]: %timeit compute_prange.compute(array_1, array_2, a, b, c)
-    3.41 ms ± 93.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+    3.55 ms ± 80.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+
+We're now 7858 times faster than the pure Python version and 2.3 times faster
+than NumPy!

 Where to go from here?
 ======================