1. 29 Mar, 2018 1 commit
  2. 27 Mar, 2018 3 commits
    • Working on some more socket benchmarks · 840692b0
      Jason Madden authored
      The good news is that current master is about 10-15% faster for
      sendall than 1.2.2 was (e.g., 301ms vs 256ms in Python 3.6). UDP
      sendto is roughly unaffected (within the margins, based on the native
      performance). Moving the chunking implementation of sendall to Cython
      doesn't show any improvements (so that's not a bottleneck, at least in
      these benchmarks).
      
      The "bad" news is that both UDP and (especially) sendall perform much
      worse than native (native does about 47ms for sendall). This is
      probably related to the fact that we're doing everything in one
      process and one thread, and it is CPU bound; the native process can
      use 150% CPU or so, but the gevent version cannot. So the comparison
      is not directly meaningful.
      
      [skip ci]
      840692b0
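For context on what a "chunking implementation of sendall" does, here is a minimal sketch of the general technique using only the standard library. This is not gevent's code: the name `sendall_chunked`, the `chunk_size` default, and the `demo` helper are assumptions made for illustration.

```python
import socket
import threading


def sendall_chunked(sock, data, chunk_size=65536):
    """Send all of `data` by calling send() repeatedly on bounded slices.

    Hypothetical helper: illustrates the chunking idea only.
    """
    view = memoryview(data)  # slicing a memoryview avoids copying the payload
    sent_total = 0
    while sent_total < len(view):
        # send() may accept fewer bytes than offered, so advance by its result
        sent = sock.send(view[sent_total:sent_total + chunk_size])
        sent_total += sent
    return sent_total


def demo(payload_size=1000000):
    """Push a payload through a local socket pair and count what arrives."""
    a, b = socket.socketpair()
    received = bytearray()

    def reader():
        # Drain the other end so the sender never blocks forever
        while len(received) < payload_size:
            chunk = b.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    t = threading.Thread(target=reader)
    t.start()
    sent = sendall_chunked(a, b"x" * payload_size)
    t.join()
    a.close()
    b.close()
    return sent, len(received)
```

The loop is dominated by the cost of each `send()` call rather than by the slicing around it, which is consistent with the observation above that moving the chunking to Cython made no measurable difference.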
    • Merge pull request #1156 from gevent/cython-waiter · 62802671
      Jason Madden authored
      Compile the important hub operations that use Waiters with Cython
      62802671
    • Attempt to fix test__backdoor.py · a4fbd046
      Jason Madden authored
      a4fbd046
  3. 26 Mar, 2018 4 commits
    • Compile the hub operations that use Waiters with Cython · 92b1a6b6
      Jason Madden authored
      Since we've come this far, might as well keep taking advantage of the
      effort...
      
      There are substantial improvements on the micro benchmarks for things
      that wait and switch:
      
      | Benchmark           | 27_hub_master2 | 27_hub_cython5               |
      |---------------------|----------------|------------------------------|
      | multiple wait ready | 1.96 us        | 1.10 us: 1.77x faster (-44%) |
      | wait ready          | 1.47 us        | 897 ns: 1.64x faster (-39%)  |
      | cancel wait         | 2.93 us        | 1.81 us: 1.61x faster (-38%) |
      | switch              | 2.33 us        | 1.94 us: 1.20x faster (-17%) |
      
      | Benchmark           | 36_hub_master2 | 36_hub_cython6               |
      |---------------------|----------------|------------------------------|
      | multiple wait ready | 1.28 us        | 820 ns: 1.56x faster (-36%)  |
      | wait ready          | 939 ns         | 722 ns: 1.30x faster (-23%)  |
      | cancel wait         | 1.76 us        | 1.37 us: 1.29x faster (-23%) |
      | switch              | 1.60 us        | 1.35 us: 1.18x faster (-16%) |
      92b1a6b6
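The "N.NNx faster (-P%)" cells in these tables combine two views of the same pair of timings: the ratio old/new and the relative change (new - old)/old. A small sketch of that arithmetic follows; the function names are made up for illustration and are not part of any benchmarking tool.

```python
def speedup(old, new):
    """Ratio of the old timing to the new one (values above 1 mean faster)."""
    return old / new


def percent_change(old, new):
    """Relative change in the timing, as a percentage of the old value."""
    return (new - old) / old * 100.0


def describe(old, new):
    """Format a pair of timings the way the table cells read."""
    factor = speedup(old, new)
    pct = percent_change(old, new)
    if factor >= 1:
        return "%.2fx faster (%+.0f%%)" % (factor, pct)
    return "%.2fx slower (%+.0f%%)" % (1 / factor, pct)


# "switch" row of the first table: 2.33 us -> 1.94 us
print(describe(2.33, 1.94))
# "wait ready" row of the first table: 1.47 us -> 897 ns, both in ns
print(describe(1470, 897))
```

Because the tables show rounded timings, recomputing from the displayed values can differ in the last digit: 1.96 us / 1.10 us gives 1.78, not the reported 1.77.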
    • Merge pull request #1155 from gevent/cython-waiter · 821e7fc8
      Jason Madden authored
      Compile gevent.queue and gevent.hub.waiter with Cython
      821e7fc8
    • Greenlet can cimport waiter. · 870e8e13
      Jason Madden authored
      870e8e13
  4. 25 Mar, 2018 5 commits
    • cdef _NONE, it shows up in some hot paths. · cdac6316
      Jason Madden authored
      cdac6316
    • whoops, not internal · 18492009
      Jason Madden authored
      18492009
    • Compile gevent.queue and gevent.hub.waiter with Cython · 99541fd6
      Jason Madden authored
      This gives massive performance benefits to queues:
      
      | Benchmark                              | 27_queue_master | 27_queue_cython2             |
      |----------------------------------------|-----------------|------------------------------|
      | bench_unbounded_queue_noblock          | 2.09 us         | 622 ns: 3.37x faster (-70%)  |
      | bench_bounded_queue_noblock            | 2.55 us         | 634 ns: 4.02x faster (-75%)  |
      | bench_bounded_queue_block              | 36.1 us         | 7.29 us: 4.95x faster (-80%) |
      | bench_channel                          | 15.4 us         | 6.40 us: 2.40x faster (-58%) |
      | bench_bounded_queue_block_hub          | 13.6 us         | 3.89 us: 3.48x faster (-71%) |
      | bench_channel_hub                      | 7.55 us         | 3.38 us: 2.24x faster (-55%) |
      | bench_unbounded_priority_queue_noblock | 5.02 us         | 3.18 us: 1.58x faster (-37%) |
      | bench_bounded_priority_queue_noblock   | 5.48 us         | 3.22 us: 1.70x faster (-41%) |
      
      In a "real" use case (pool.imap) it shows up as a 10-20% improvement:
      
      | Benchmark          | 36_pool_event5 | 36_pool_ubq_cython          |
      |--------------------|----------------|-----------------------------|
      | imap_unordered_seq | 553 us         | 461 us: 1.20x faster (-17%) |
      | imap_unordered_par | 301 us         | 265 us: 1.14x faster (-12%) |
      | imap_seq           | 587 us         | 497 us: 1.18x faster (-15%) |
      | imap_par           | 326 us         | 275 us: 1.19x faster (-16%) |
      | spawn              | 310 us         | 284 us: 1.09x faster (-8%)  |
      
      Not significant (3): map_seq; map_par; apply
      99541fd6
    • Add basic benchmarks for gevent.queue · b61f9e91
      Jason Madden authored
      Timing as of this commit (macOS 10.13.3, MacBook Pro retina 15-inch,
      mid 2015, default loop impls):
      
      | Benchmark                              | 27_queue_master | 27pypy_queue_master             | 36_queue_master              | 37_queue_master              |
      |----------------------------------------|-----------------|---------------------------------|------------------------------|------------------------------|
      | bench_unbounded_queue_noblock          | 2.09 us         | 10.8 ns: 193.75x faster (-99%)  | 1.34 us: 1.56x faster (-36%) | 1.24 us: 1.69x faster (-41%) |
      | bench_bounded_queue_noblock            | 2.55 us         | 10.9 ns: 234.91x faster (-100%) | 1.67 us: 1.53x faster (-35%) | 1.55 us: 1.65x faster (-39%) |
      | bench_bounded_queue_block              | 36.1 us         | 2.28 us: 15.81x faster (-94%)   | not significant              | 12.9 us: 2.80x faster (-64%) |
      | bench_channel                          | 15.4 us         | 1.91 us: 8.03x faster (-88%)    | 9.96 us: 1.54x faster (-35%) | 8.17 us: 1.88x faster (-47%) |
      | bench_bounded_queue_block_hub          | 13.6 us         | 1.07 us: 12.64x faster (-92%)   | 8.61 us: 1.57x faster (-36%) | 7.66 us: 1.77x faster (-44%) |
      | bench_channel_hub                      | 7.55 us         | 760 ns: 9.94x faster (-90%)     | 5.11 us: 1.48x faster (-32%) | 4.33 us: 1.75x faster (-43%) |
      | bench_unbounded_priority_queue_noblock | 5.02 us         | 186 ns: 26.97x faster (-96%)    | 1.63 us: 3.08x faster (-68%) | 1.60 us: 3.14x faster (-68%) |
      | bench_bounded_priority_queue_noblock   | 5.48 us         | 183 ns: 29.91x faster (-97%)    | 1.98 us: 2.77x faster (-64%) | 1.79 us: 3.07x faster (-67%) |
      
      [skip ci]
      b61f9e91
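To give a sense of what a non-blocking queue micro-benchmark measures, here is a rough sketch using `timeit` and the standard library `queue.Queue` as a stand-in. It is not the benchmark added in this commit: the function name, the `queue_factory` parameter, and the loop sizes are assumptions, and the timings above came from a different harness.

```python
import timeit
from queue import Queue  # stand-in; a gevent queue class could be passed instead


def time_put_get_nowait(queue_factory=Queue, items=1000, repeat=5):
    """Return the best observed time, in seconds, per put_nowait/get_nowait pair.

    Hypothetical helper for illustration only.
    """
    def run():
        q = queue_factory()
        for i in range(items):
            q.put_nowait(i)   # never blocks on an unbounded queue
        for _ in range(items):
            q.get_nowait()    # never blocks because the queue is non-empty

    # Take the minimum of several runs to reduce noise from other processes
    best = min(timeit.repeat(run, number=1, repeat=repeat))
    return best / items


if __name__ == "__main__":
    print("%.0f ns per put/get pair" % (time_put_get_nowait() * 1e9))
```

Passing a different `queue_factory` is one way to compare queue implementations under the same loop, though a dedicated benchmarking tool will give more stable numbers.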
    • Merge pull request #1154 from gevent/threadpool-opts · 8916cda1
      Jason Madden authored
      Compile IMap[Unordered] with Cython
      8916cda1
  5. 24 Mar, 2018 8 commits
    • Compile IMap[Unordered] with Cython · 4bac7f17
      Jason Madden authored
      This gets us another 20-30% faster:
      
      | Benchmark          | 27_pool_opts | 27_pool_cython2             |
      |--------------------|--------------|-----------------------------|
      | imap_unordered_seq | 897 us       | 694 us: 1.29x faster (-23%) |
      | imap_unordered_par | 539 us       | 363 us: 1.49x faster (-33%) |
      | imap_seq           | 1.00 ms      | 714 us: 1.41x faster (-29%) |
      | imap_par           | 612 us       | 404 us: 1.52x faster (-34%) |
      | map_seq            | 382 us       | 349 us: 1.09x faster (-9%)  |
      | map_par            | 267 us       | 252 us: 1.06x faster (-6%)  |
      | apply              | 427 us       | 406 us: 1.05x faster (-5%)  |
      | spawn              | 397 us       | 360 us: 1.10x faster (-9%)  |
      4bac7f17
    • Merge pull request #1153 from gevent/threadpool-opts · b4db40b8
      Jason Madden authored
      Optimizations for threadpool
      b4db40b8
    • Add change note. · c21db37f
      Jason Madden authored
      Here's the improvement for the greenlet pools:
      
      | Benchmark          | 36_pool_master | 36_pool_opts                |
      |--------------------|----------------|-----------------------------|
      | imap_unordered_seq | 803 us         | 686 us: 1.17x faster (-15%) |
      | imap_unordered_par | 445 us         | 389 us: 1.14x faster (-13%) |
      | imap_seq           | 793 us         | 729 us: 1.09x faster (-8%)  |
      | imap_par           | 407 us         | 398 us: 1.02x faster (-2%)  |
      | map_seq            | 715 us         | 293 us: 2.44x faster (-59%) |
      | map_par            | 388 us         | 199 us: 1.96x faster (-49%) |
      
      Not significant (2): apply; spawn
      c21db37f
    • Add benchmark for plain greenlet pools. · c2e65dbc
      Jason Madden authored
      c2e65dbc
    • More optimizations and clarifying comments · c61eb0a3
      Jason Madden authored
      Compared to the previous commit:
      
      | Benchmark          | 36_threadpool_opt_PR | 36_threadpool_opt_cond10    |
      |--------------------|----------------------|-----------------------------|
      | imap_unordered_seq | 1.06 ms              | 1.02 ms: 1.04x faster (-4%) |
      | imap_unordered_par | 965 us               | 928 us: 1.04x faster (-4%)  |
      | imap_seq           | 1.08 ms              | 1.03 ms: 1.04x faster (-4%) |
      | map_seq            | 785 us               | 870 us: 1.11x slower (+11%) |
      | map_par            | 656 us               | 675 us: 1.03x slower (+3%)  |
      | apply              | 1.14 ms              | 1.12 ms: 1.02x faster (-2%) |
      c61eb0a3
    • Optimizations for threadpool · 50a3130b
      Jason Madden authored
      Especially for map. None of the pools really need map to go through
      imap, since they have to wait for everything anyway and they return
      results in order.
      
      | Benchmark          | 36_threadpool_master | 36_threadpool_opt_cond5     |
      |--------------------|----------------------|-----------------------------|
      | imap_unordered_seq | 1.15 ms              | 1.07 ms: 1.08x faster (-7%) |
      | imap_unordered_par | 1.02 ms              | 950 us: 1.08x faster (-7%)  |
      | imap_seq           | 1.17 ms              | 1.10 ms: 1.06x faster (-6%) |
      | imap_par           | 1.07 ms              | 1000 us: 1.07x faster (-7%) |
      | map_seq            | 1.16 ms              | 724 us: 1.60x faster (-37%) |
      | map_par            | 1.07 ms              | 646 us: 1.66x faster (-40%) |
      | apply              | 1.22 ms              | 1.14 ms: 1.07x faster (-7%) |
      | spawn              | 1.21 ms              | 1.13 ms: 1.07x faster (-7%) |
      50a3130b
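The idea that map does not need to go through imap can be sketched with the standard library's `ThreadPoolExecutor` as a stand-in. This is not gevent's pool code: `map_direct`, `imap_lazy`, and `map_via_imap` are hypothetical names chosen to contrast the two shapes.

```python
from concurrent.futures import ThreadPoolExecutor


def imap_lazy(pool, func, iterable):
    """Generator-style map: yields each result as the caller asks for it."""
    futures = [pool.submit(func, item) for item in iterable]
    for future in futures:
        yield future.result()


def map_via_imap(pool, func, iterable):
    """map built on the lazy iterator: pays the generator overhead per item."""
    return list(imap_lazy(pool, func, iterable))


def map_direct(pool, func, iterable):
    """map that submits everything, waits, and collects results in order.

    Since the caller needs the complete ordered list anyway, there is no
    benefit to producing results one at a time.
    """
    futures = [pool.submit(func, item) for item in iterable]
    return [future.result() for future in futures]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(map_direct(pool, lambda x: x * x, range(5)))
```

Both variants return the same ordered list; the direct version simply skips the intermediate iterator machinery, which is where the map_seq and map_par gains in the table would come from under this reading.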
  6. 23 Mar, 2018 3 commits
  7. 22 Mar, 2018 5 commits
  8. 21 Mar, 2018 4 commits
  9. 20 Mar, 2018 2 commits
  10. 19 Mar, 2018 3 commits
  11. 17 Mar, 2018 2 commits
    • Merge pull request #1142 from gevent/opt-greenlet · c7491589
      Jason Madden authored
      Introduce GEVENT_TRACK_GREENLET_TREE to disable greenlet tree features
      c7491589
    • Introduce GEVENT_TRACK_GREENLET_TREE to disable greenlet tree features · 25ff8d4a
      Jason Madden authored
      As a performance optimization for applications where spawning
      greenlets is critical. Plus some other optimizations to speed up
      spawning in the general case.
      
      CPython 3.6 with 1.2.2 vs these changes with tracking disabled:
      
      | Benchmark              | 36_122_bench_spawn | 36config_bench_spawn_tree_off |
      |------------------------|--------------------|-------------------------------|
      | eventlet spawn         | 12.6 us            | 12.2 us: 1.04x faster (-4%)   |
      | eventlet sleep         | 5.22 us            | 4.97 us: 1.05x faster (-5%)   |
      | gevent spawn           | 4.27 us            | 5.06 us: 1.19x slower (+19%)  |
      | gevent sleep           | 2.63 us            | 1.25 us: 2.11x faster (-53%)  |
      | geventpool spawn       | 9.00 us            | 8.31 us: 1.08x faster (-8%)   |
      | geventpool sleep       | 4.82 us            | 2.83 us: 1.70x faster (-41%)  |
      | geventraw spawn        | 2.51 us            | 2.81 us: 1.12x slower (+12%)  |
      | geventraw sleep        | 649 ns             | 679 ns: 1.05x slower (+5%)    |
      | geventpool join        | 3.47 us            | 1.42 us: 2.44x faster (-59%)  |
      | geventpool spawn kwarg | 11.0 us            | 8.95 us: 1.23x faster (-19%)  |
      | geventraw spawn kwarg  | 3.87 us            | 4.20 us: 1.08x slower (+8%)   |
      
      The differences compared to master are hard to quantify because the
      standard deviation ends up being more than 10% of the mean in many
      cases---and about a 10% improvement is what we typically see, so it
      goes back and forth.
      25ff8d4a
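A sketch of how an application might turn the tracking off through the environment variable named in this commit. Two assumptions are made here and should be checked against the gevent configuration documentation: that the value "0" is read as false, and that the variable must be set before gevent is first imported. The `spawn_and_join` helper is made up for illustration, and the import is guarded so the snippet still runs where gevent is not installed.

```python
import os

# Assumption: "0" disables greenlet tree tracking, and the setting is read
# when gevent is first imported, so set it before the import below.
os.environ["GEVENT_TRACK_GREENLET_TREE"] = "0"

try:
    import gevent
except ImportError:  # gevent is optional for this sketch
    gevent = None


def spawn_and_join(n=100):
    """Spawn n trivial greenlets and return their values in spawn order.

    Hypothetical helper; returns None when gevent is unavailable.
    """
    if gevent is None:
        return None
    greenlets = [gevent.spawn(lambda i=i: i) for i in range(n)]
    gevent.joinall(greenlets)
    return [g.value for g in greenlets]


if __name__ == "__main__":
    print(spawn_and_join(5))
```

The same effect can be had by exporting the variable in the shell that launches the process, which avoids any dependence on import order.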