1. 25 Apr, 2018 3 commits
  2. 23 Apr, 2018 6 commits
    • Merge pull request #1190 from gevent/cython-tracer · ab5272c5
      Jason Madden authored
      Compile the monitor greenlet tracer with Cython
    • Compile the monitor greenlet tracer with Cython. · aa911ed0
      Jason Madden authored
      This makes the monitor tracer 54% faster. In fact, it is now faster than a trivial tracer implemented in Python (settrace(lambda e, a: None)).
      
      +-------------------+-----------------+-----------------------------+
      | Benchmark         | 37_bench_tracer | 37_bench_tracer_cython_opt3 |
      +===================+=================+=============================+
      | monitor tracer    | 1.62 us         | 739 ns: 2.20x faster (-54%) |
      +-------------------+-----------------+-----------------------------+
      | max switch tracer | 3.06 us         | 874 ns: 3.50x faster (-71%) |
      +-------------------+-----------------+-----------------------------+
      | hub switch tracer | 2.16 us         | 815 ns: 2.66x faster (-62%) |
      +-------------------+-----------------+-----------------------------+
      
      Not significant (2): no tracer; trivial tracer
    • Move the greenlet tracers to their own file and compile with cython. · 4231079a
      Jason Madden authored
      Even without optimization, this makes them roughly 22-38% faster:
      
      +-------------------+-----------------+------------------------------+
      | Benchmark         | 37_bench_tracer | 37_bench_tracer_first_cython |
      +===================+=================+==============================+
      | trivial tracer    | 792 ns          | 786 ns: 1.01x faster (-1%)   |
      +-------------------+-----------------+------------------------------+
      | monitor tracer    | 1.62 us         | 1.24 us: 1.31x faster (-24%) |
      +-------------------+-----------------+------------------------------+
      | max switch tracer | 3.06 us         | 1.89 us: 1.62x faster (-38%) |
      +-------------------+-----------------+------------------------------+
      | hub switch tracer | 2.16 us         | 1.68 us: 1.29x faster (-22%) |
      +-------------------+-----------------+------------------------------+
    • Add benchmark for greenlet tracers. [skip ci] · 596773a1
      Jason Madden authored
      Current numbers on 3.7b3:
      
      no tracer: Mean +- std dev: 414 ns +- 10 ns
      trivial tracer: Mean +- std dev: 792 ns +- 16 ns
      monitor tracer: Mean +- std dev: 1.62 us +- 0.12 us
      max switch tracer: Mean +- std dev: 3.06 us +- 0.12 us
      hub switch tracer: Mean +- std dev: 2.16 us +- 0.04 us
    • Merge pull request #1189 from felixonmars/patch-2 · 58754ba8
      Jason Madden authored
      Fix a typo [skip ci]
    • Fix a typo · 9633ecc9
      Felix Yan authored
  3. 20 Apr, 2018 3 commits
  4. 19 Apr, 2018 7 commits
  5. 18 Apr, 2018 5 commits
    • Fill in whatsnew_1_3.rst [skip ci] · 72c428f3
      Jason Madden authored
    • 643ee9fc
    • Merge pull request #1181 from gevent/libuv120 · 43848b60
      Jason Madden authored
      Update libuv to 1.20
    • Update libuv to 1.20 · 3a599958
      Jason Madden authored
    • More socket benchmarks [skip ci] · 7a0a9585
      Jason Madden authored
      We're more competitive using a forked process:
      
      .....................
      gevent socketpair sendall greenlet: Mean +- std dev: 256 ms +- 6 ms
      .....................
      native socketpair sendall thread: Mean +- std dev: 44.4 ms +- 1.5 ms
      .....................
      WARNING: the benchmark result may be unstable
      * the standard deviation (7.31 ms) is 13% of the mean (57.9 ms)
      * the maximum (89.0 ms) is 54% greater than the mean (57.9 ms)
      
      Try to rerun the benchmark with more runs, values and/or loops.
      Run 'python -m perf system tune' command to reduce the system jitter.
      Use perf stats, perf dump and perf hist to analyze results.
      Use --quiet option to hide these warnings.
      
      gevent socketpair sendall fork: Mean +- std dev: 57.9 ms +- 7.3 ms
      .....................
      native socketpair sendall fork: Mean +- std dev: 44.6 ms +- 2.0 ms
      .....................
      native udp sendto: Mean +- std dev: 30.6 ms +- 1.1 ms
      .....................
      gevent udp sendto: Mean +- std dev: 37.5 ms +- 2.7 ms
      
      (python 3.7)
  6. 16 Apr, 2018 4 commits
  7. 15 Apr, 2018 7 commits
    • Use /dev/fd|/proc/self/fd to get open FDs to close in Popen · 461d6966
      Jason Madden authored
      If those aren't available, use the old brute-force approach. This is
      closer to what CPython does in its C implementation, and is much
      faster.
      
      We don't have to worry about the async signal safe stuff the C code
      does because, guess what, we're running Python code here already
      anyway, so much of it could wind up doing something that's not
      actually safe anyway. Oh well.
      
      Since we depend on Python 3.4 and above now, we can rely on the
      CLOEXEC flag being set by default and not have to manually check
      everything.
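
A minimal sketch of the directory-listing approach described above, not gevent's actual code. The names `list_open_fds` and `close_open_fds` are hypothetical, and the fallback limit of 256 is an assumed placeholder for the real MAXFD:

```python
import os


def list_open_fds(min_fd=3, fallback_max=256):
    """Return candidate open fds >= min_fd (hypothetical helper).

    Tries the per-process fd directories first; if neither can be
    listed, falls back to the brute-force range of every possible fd.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        try:
            names = os.listdir(fd_dir)
        except OSError:
            continue  # directory not available on this platform
        fds = []
        for name in names:
            try:
                fd = int(name)
            except ValueError:
                continue  # skip anything that is not a plain number
            if fd >= min_fd:
                fds.append(fd)
        # The listing may include the fd that listdir itself used to
        # read the directory; it is already closed by now.
        return sorted(fds)
    # Brute force: assume (placeholder) that fds stay below fallback_max.
    return list(range(min_fd, fallback_max))


def close_open_fds(keep=()):
    """Close every listed fd except those in `keep`."""
    keep = set(keep)
    for fd in list_open_fds():
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            pass  # stale or never-open fd; nothing to close
```

Ignoring `OSError` on close matters in both paths: the directory listing can contain an already-closed descriptor, and the brute-force range mostly contains descriptors that were never open.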
      
      This speeds up 2.7 (close_fds defaults to *false* there, so the
      default case doesn't change):
      
      | Benchmark                 | 27_bench_subprocess | 27_bench_subprocess_dirfd     |
      +---------------------------+---------------------+-------------------------------+
      | spawn native no close_fds | 1.81 ms             | 1.79 ms: 1.01x faster (-1%)   |
      | spawn gevent no close_fds | 2.11 ms             | 2.20 ms: 1.04x slower (+4%)   |
      | spawn native close_fds    | 31.0 ms             | 30.2 ms: 1.03x faster (-3%)   |
      | spawn gevent close_fds    | 31.6 ms             | 2.56 ms: 12.31x faster (-92%) |
      
      And it really speeds up 3.7 (close_fds defaults to *true* there, so
      the default case is much faster, and the non-default case is even better):
      
      | Benchmark                 | 37_bench_subprocess | 37_bench_subprocess_dirfd     |
      +---------------------------+---------------------+-------------------------------+
      | spawn native no close_fds | 1.34 ms             | 1.27 ms: 1.06x faster (-6%)   |
      | spawn gevent no close_fds | 117 ms              | 3.05 ms: 38.27x faster (-97%) |
      | spawn native close_fds    | 1.36 ms             | 1.30 ms: 1.04x faster (-4%)   |
      | spawn gevent close_fds    | 32.5 ms             | 3.34 ms: 9.75x faster (-90%)  |
      
      Fixes #1172
    • Add benchmark for using subprocess.Popen to create /usr/bin/true [skip ci] · b166aaf8
      Jason Madden authored
      In response to #1172
      
      The following numbers are for my machine on macOS 10.13.3 with MAXFD
      of 50000.
      
      Python 2.7:
      
      .....................
      spawn native no close_fds: Mean +- std dev: 1.81 ms +- 0.04 ms
      .....................
      spawn gevent no close_fds: Mean +- std dev: 2.11 ms +- 0.08 ms
      .....................
      spawn native close_fds: Mean +- std dev: 31.0 ms +- 0.7 ms
      .....................
      spawn gevent close_fds: Mean +- std dev: 31.6 ms +- 0.6 ms
      
      Notice that the times when close_fds=True (not the default on 2.7) are
      about the same. 2.7 uses the same Python loop we do to close all the
      fds.
      
      Now 3.7:
      
      .....................
      spawn native no close_fds: Mean +- std dev: 1.34 ms +- 0.04 ms
      .....................
      spawn gevent no close_fds: Mean +- std dev: 117 ms +- 2 ms
      .....................
      spawn native close_fds: Mean +- std dev: 1.36 ms +- 0.03 ms
      .....................
      spawn gevent close_fds: Mean +- std dev: 32.5 ms +- 0.4 ms
      
      Notice that gevent is *much* slower when we *don't* close the fds.
      This is because, starting in Python 3.4, close_fds defaults to true,
      and when it's false we have to check os.get_inheritable() for each fd
      before we close it. gevent performs the same as it did on Python 2.7
      when closing fds, but the native implementation is much faster due to
      the C optimizations outlined in #1172---it turns out they apply to BSD
      and Apple platforms in addition to Linux, although they're not
      async-signal safe.
      
      Now, the C code does the opposite for inheritable handles: it
      explicitly calls make_inheritable() for the ones it wants to keep and
      lets the OS close the others with CLOEXEC. We could probably do that
      too; the slowdown for this case counts as a regression, I think.
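
To illustrate the inheritable-flag behaviour discussed above, here is a small sketch using only the standard `os` module. `mark_inheritable` and `demo` are hypothetical names, not part of gevent or CPython; the idea is to flip the flag only on the descriptors the child should keep, rather than checking every descriptor:

```python
import os


def mark_inheritable(keep_fds):
    """Mark only the descriptors a child process should inherit."""
    for fd in keep_fds:
        os.set_inheritable(fd, True)


def demo():
    """Show the inheritable flag on a fresh pipe before and after marking."""
    r, w = os.pipe()  # created non-inheritable on Python 3.4+
    try:
        before = (os.get_inheritable(r), os.get_inheritable(w))
        mark_inheritable([r])
        after = (os.get_inheritable(r), os.get_inheritable(w))
        return before, after
    finally:
        os.close(r)
        os.close(w)
```

With this scheme the cost is proportional to the number of descriptors being kept, not to the number of possible descriptors.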
    • Reorganize docs. · b7c9a1df
      Jason Madden authored
      Move the API reference to its own (sectioned) page to clean up the main page. Break up the massive 'gevent' module page into more digestible parts and add more xrefs.
    • Merge pull request #1174 from gevent/cffi-win · e2aef582
      Jason Madden authored
      Use environment markers to install CFFI on Windows so the libuv backend can really be default
  8. 14 Apr, 2018 5 commits