- 25 Apr, 2018 3 commits
-
-
Jason Madden authored
-
Jason Madden authored
-
Jason Madden authored
This should eliminate the need to patch and the need to avoid 0 duration timers. This works more like libev. We could theoretically implement priorities using this system.
-
- 23 Apr, 2018 6 commits
-
-
Jason Madden authored
Compile the monitor greenlet tracer with Cython
-
Jason Madden authored
This makes things 54% faster. In fact, the monitor tracer is now faster than a trivial tracer implemented in python (settrace(lambda e, a: None)).

+-------------------+-----------------+-----------------------------+
| Benchmark         | 37_bench_tracer | 37_bench_tracer_cython_opt3 |
+===================+=================+=============================+
| monitor tracer    | 1.62 us         | 739 ns: 2.20x faster (-54%) |
+-------------------+-----------------+-----------------------------+
| max switch tracer | 3.06 us         | 874 ns: 3.50x faster (-71%) |
+-------------------+-----------------+-----------------------------+
| hub switch tracer | 2.16 us         | 815 ns: 2.66x faster (-62%) |
+-------------------+-----------------+-----------------------------+

Not significant (2): no tracer; trivial tracer
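The two figures in each cell of the table above are related by simple arithmetic: the factor is the old mean divided by the new mean, and the percentage is the relative change in mean time. A minimal sketch of that arithmetic follows, using the rounded means from the table. The helper name `compare` is made up for illustration and is not part of the benchmark tool. Because the inputs are already rounded, a recomputed factor can differ from the table in the last digit.

```python
def compare(old_ns, new_ns):
    """Return (speedup factor, percent change in time) for two mean timings."""
    speedup = old_ns / new_ns
    percent = (new_ns - old_ns) / old_ns * 100.0
    return speedup, percent


# Means copied from the table above, converted to nanoseconds.
timings = {
    "monitor tracer": (1620.0, 739.0),
    "max switch tracer": (3060.0, 874.0),
    "hub switch tracer": (2160.0, 815.0),
}

for name, (old_ns, new_ns) in timings.items():
    speedup, percent = compare(old_ns, new_ns)
    print("%-18s %.2fx faster (%+.0f%%)" % (name, speedup, percent))
```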
-
Jason Madden authored
Unoptimized still makes them 25% faster:

+-------------------+-----------------+------------------------------+
| Benchmark         | 37_bench_tracer | 37_bench_tracer_first_cython |
+===================+=================+==============================+
| trivial tracer    | 792 ns          | 786 ns: 1.01x faster (-1%)   |
+-------------------+-----------------+------------------------------+
| monitor tracer    | 1.62 us         | 1.24 us: 1.31x faster (-24%) |
+-------------------+-----------------+------------------------------+
| max switch tracer | 3.06 us         | 1.89 us: 1.62x faster (-38%) |
+-------------------+-----------------+------------------------------+
| hub switch tracer | 2.16 us         | 1.68 us: 1.29x faster (-22%) |
+-------------------+-----------------+------------------------------+
-
Jason Madden authored
Current numbers on 3.7b3:

no tracer: Mean +- std dev: 414 ns +- 10 ns
trivial tracer: Mean +- std dev: 792 ns +- 16 ns
monitor tracer: Mean +- std dev: 1.62 us +- 0.12 us
max switch tracer: Mean +- std dev: 3.06 us +- 0.12 us
hub switch tracer: Mean +- std dev: 2.16 us +- 0.04 us
-
Jason Madden authored
Fix a typo [skip ci]
-
Felix Yan authored
-
- 20 Apr, 2018 3 commits
-
-
Jason Madden authored
Make the libuv run QUEUE part of the loop.
-
Jason Madden authored
Benchmarking (link in the email) showed that malloc/free had substantial and widely varying overhead. I didn't really see much of a difference in the gevent benchmarks, but I didn't run them all. However, if any patch gets upstreamed, it will probably be something like this. The link referenced in the email contains the discussion on the libuv mailing list.
-
Jason Madden authored
* Make the monitor thread survive a fork.
* Add coverage for the new get_process.
* psutil is not on windows, move the test to a protected location.
* No, really.
-
- 19 Apr, 2018 7 commits
-
-
Jason Madden authored
Add gevent.util.assert_switches
-
-
Jason Madden authored
Based on refactoring existing code in _monitor.py. Flexibly allows checking for any switching, switches that take longer than a set amount of time, or switches that exceed a time limit between getting back to the hub. Fixes #1182.
-
Jason Madden authored
Update to libuv 1.20.1
-
Jason Madden authored
-
Jason Madden authored
-
Jason Madden authored
-
- 18 Apr, 2018 5 commits
-
-
Jason Madden authored
-
Jason Madden authored
-
Jason Madden authored
Update libuv to 1.20
-
Jason Madden authored
-
Jason Madden authored
We're more competitive using a forked process:

gevent socketpair sendall greenlet: Mean +- std dev: 256 ms +- 6 ms
native socketpair sendall thread: Mean +- std dev: 44.4 ms +- 1.5 ms
WARNING: the benchmark result may be unstable
* the standard deviation (7.31 ms) is 13% of the mean (57.9 ms)
* the maximum (89.0 ms) is 54% greater than the mean (57.9 ms)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.
gevent socketpair sendall fork: Mean +- std dev: 57.9 ms +- 7.3 ms
native socketpair sendall fork: Mean +- std dev: 44.6 ms +- 2.0 ms
native udp sendto: Mean +- std dev: 30.6 ms +- 1.1 ms
gevent udp sendto: Mean +- std dev: 37.5 ms +- 2.7 ms

(python 3.7)
-
- 16 Apr, 2018 4 commits
-
-
Jason Madden authored
This addresses the first few points of #1060. We now have basic documentation for what a loop looks like, and we have documented watcher.close(). Hub is better documented (only the parts that should be documented are documented). get/setswitchinterval is documented. Several xrefs are fixed up.
-
Jason Madden authored
Sigh. Pip changed the semantics of -U in pip 10. Go back to the old way so we're actually testing changing deps; if it breaks we need to know.
-
Jason Madden authored
Use /dev/fd|/proc/self/fd to get open FDs to close in Popen
-
Jason Madden authored
Should get them to complete coverage. Also move our test dependencies into the standard 'test' extra and install mock (and futures) only on Python 2.7.
-
- 15 Apr, 2018 7 commits
-
-
Jason Madden authored
If those aren't available, use the old brute-force approach.

This is closer to what CPython does in its C implementation, and is much faster. We don't have to worry about the async signal safe stuff the C code does because, guess what, we're running Python code here already anyway, so much of it could wind up doing something that's not actually safe anyway. Oh well.

Since we depend on Python 3.4 and above now, we can rely on the CLOEXEC flag being set by default and not have to manually check everything.

This speeds up 2.7 (close_fds defaults to *false* there, so the default case doesn't change):

| Benchmark                 | 27_bench_subprocess | 27_bench_subprocess_dirfd     |
+---------------------------+---------------------+-------------------------------+
| spawn native no close_fds | 1.81 ms             | 1.79 ms: 1.01x faster (-1%)   |
| spawn gevent no close_fds | 2.11 ms             | 2.20 ms: 1.04x slower (+4%)   |
| spawn native close_fds    | 31.0 ms             | 30.2 ms: 1.03x faster (-3%)   |
| spawn gevent close_fds    | 31.6 ms             | 2.56 ms: 12.31x faster (-92%) |

And it really speeds up 3.7 (close_fds defaults to *true* there, so the default case is much faster, and the non-default case is even better):

| Benchmark                 | 37_bench_subprocess | 37_bench_subprocess_dirfd     |
+---------------------------+---------------------+-------------------------------+
| spawn native no close_fds | 1.34 ms             | 1.27 ms: 1.06x faster (-6%)   |
| spawn gevent no close_fds | 117 ms              | 3.05 ms: 38.27x faster (-97%) |
| spawn native close_fds    | 1.36 ms             | 1.30 ms: 1.04x faster (-4%)   |
| spawn gevent close_fds    | 32.5 ms             | 3.34 ms: 9.75x faster (-90%)  |

Fixes #1172
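The directory-listing approach with a brute-force fallback can be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: the function name `list_open_fds` and the `max_fd` default are assumptions, and it only lists descriptors rather than closing them. Note that on some platforms /dev/fd is a static directory that does not reflect every open descriptor.

```python
import os


def list_open_fds(max_fd=256):
    """Return a sorted list of descriptors currently open in this process."""
    for path in ("/proc/self/fd", "/dev/fd"):
        try:
            names = os.listdir(path)
        except OSError:
            continue  # directory not available on this platform
        fds = []
        for name in names:
            try:
                fd = int(name)
            except ValueError:
                continue
            # listdir() briefly opens a descriptor for the directory itself.
            # It is closed again by now, so confirm each candidate is valid.
            try:
                os.fstat(fd)
            except OSError:
                continue
            fds.append(fd)
        return sorted(fds)

    # Brute-force fallback: probe every descriptor number up to max_fd.
    fds = []
    for fd in range(max_fd):
        try:
            os.fstat(fd)
        except OSError:
            continue
        fds.append(fd)
    return fds
```

The listing costs time proportional to the number of descriptors actually open, while the fallback costs time proportional to the descriptor limit, which is consistent with the close_fds rows in the tables above.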
-
Jason Madden authored
In response to #1172

The following numbers are for my machine on macOS 10.13.3 with MAXFD of 50000.

Python 2.7:

spawn native no close_fds: Mean +- std dev: 1.81 ms +- 0.04 ms
spawn gevent no close_fds: Mean +- std dev: 2.11 ms +- 0.08 ms
spawn native close_fds: Mean +- std dev: 31.0 ms +- 0.7 ms
spawn gevent close_fds: Mean +- std dev: 31.6 ms +- 0.6 ms

Notice that the times when close_fds=True (not the default on 2.7) are about the same. 2.7 uses the same Python loop we do to close all the fds.

Now 3.7:

spawn native no close_fds: Mean +- std dev: 1.34 ms +- 0.04 ms
spawn gevent no close_fds: Mean +- std dev: 117 ms +- 2 ms
spawn native close_fds: Mean +- std dev: 1.36 ms +- 0.03 ms
spawn gevent close_fds: Mean +- std dev: 32.5 ms +- 0.4 ms

Notice that gevent is *much* slower when we *don't* close the fds. This is because, starting in Python 3.4, close_fds defaults to true, and when it's false we have to check os.get_inheritable() for each fd before we close it.

gevent performs the same as it did on Python 2.7 when closing fds, but the native implementation is much faster due to the C optimizations outlined in #1172---it turns out they apply to BSD and Apple platforms in addition to Linux, although they're not async safe.

Now, the C code does the opposite for inheritable handles: it explicitly calls make_inheritable() for the ones it wants to keep and lets the OS close the others with CLOEXEC. We could probably do that too; the slow down for this case counts as a regression, I think.
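The per-descriptor check mentioned above relies on the inheritable flag that Python 3.4 introduced (PEP 446), under which new descriptors are non-inheritable by default. A minimal sketch of such a check is shown here; the helper name `inheritable_fds` is hypothetical and this is not gevent's implementation.

```python
import os


def inheritable_fds(fds):
    """Return the subset of fds whose inheritable flag is set."""
    keep = []
    for fd in fds:
        try:
            if os.get_inheritable(fd):
                keep.append(fd)
        except OSError:
            # Not an open descriptor, so there is nothing to keep.
            pass
    return keep


r, w = os.pipe()             # both ends start out non-inheritable
os.set_inheritable(w, True)  # explicitly opt the write end back in
kept = inheritable_fds([r, w])
os.close(r)
os.close(w)
```

Calling os.get_inheritable() once per candidate descriptor is what makes the "no close_fds" case expensive when the descriptor limit is large.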
-
Jason Madden authored
Move the API reference to its own (sectioned) page to clean up the main page. Break up the massive 'gevent' module page into more digestible parts and add more xrefs.
-
Jason Madden authored
-
Jason Madden authored
[skip ci]
-
Jason Madden authored
-
Jason Madden authored
Use environment markers to install CFFI on windows so the libuv backend can really be default
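For readers unfamiliar with environment markers (PEP 508), a requirement can carry a condition that the installer evaluates on the target machine. The fragment below is a hypothetical illustration of the idea, not the requirement string used in this commit; it implies no version pin.

```python
# Hypothetical requirement list of the kind passed to setup(install_requires=...).
install_requires = [
    # The text after ";" is a PEP 508 environment marker: this dependency is
    # only installed when sys_platform evaluates to "win32".
    'cffi; sys_platform == "win32"',
]
```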
-
- 14 Apr, 2018 5 commits
-
-
Jason Madden authored
-
Jason Madden authored
-
Jason Madden authored
-
Jason Madden authored
-
Jason Madden authored
-