Commit ce532784 authored by Jason Madden

PyPy: Substantial performance improvements for sending large amounts of data with socket.sendall.

This masks an underlying PyPy bug referenced in the changelog, so it's still not as efficient as it
could be, but it's the best we can do on currently released PyPy.

Impact for CPython2 appears negligible (and none at all for CPython 3, since we don't support PyPy3).
parent af93e655
@@ -15,8 +15,14 @@ Unreleased
   initialized. Now, the module will be automatically compiled when
   gevent is imported. Reported in :issue:`619` by Thinh Nguyen with
   contributions by Jay Oster and Matt Dupre.
+- PyPy: Improve the performance of ``gevent.socket.socket:sendall``
+  with large inputs. `bench_sendall.py`_ now performs about as well on
+  PyPy as it does on CPython, an improvement of 10x (from ~60MB/s to
+  ~630MB/s). See this `pypy bug`_ for details.

 .. _future: http://python-future.org
+.. _bench_sendall.py: https://raw.githubusercontent.com/gevent/gevent/master/greentest/bench_sendall.py
+.. _pypy bug: https://bitbucket.org/pypy/pypy/issues/2091/non-blocking-socketsend-slow-gevent

 1.1b2 (Aug 5, 2015)
 ===================
......
@@ -301,27 +301,80 @@ class socket(object):
                     return 0
                 raise

+    def __send_chunk(self, data_memory, flags, timeleft, end):
+        """
+        Send the complete contents of ``data_memory`` before returning.
+        This is the core loop around :meth:`send`.
+        :param timeleft: Either ``None`` if there is no timeout involved,
+           or a float indicating the timeout to use.
+        :param end: Either ``None`` if there is no timeout involved, or
+           a float giving the absolute end time.
+        :return: An updated value for ``timeleft`` (or None)
+        :raises timeout: If ``timeleft`` was given and elapsed while
+           sending this chunk.
+        """
+        data_sent = 0
+        len_data_memory = len(data_memory)
+        while data_sent < len_data_memory:
+            chunk = data_memory[data_sent:]
+            if timeleft is None:
+                data_sent += self.send(chunk, flags)
+            else:
+                data_sent += self.send(chunk, flags, timeout=timeleft)
+                if data_sent >= len_data_memory:
+                    # Hand the current value back so the caller keeps
+                    # enforcing the deadline on later chunks; a bare
+                    # ``return`` would reset its ``timeleft`` to None.
+                    return timeleft
+                timeleft = end - time.time()
+                if timeleft <= 0:
+                    raise timeout('timed out')
+        return timeleft
+
     def sendall(self, data, flags=0):
         if isinstance(data, unicode):
             data = data.encode()
         # this sendall is also reused by gevent.ssl.SSLSocket subclass,
         # so it should not call self._sock methods directly
         data_memory = _get_memory(data)
-        if self.timeout is None:
-            data_sent = 0
-            while data_sent < len(data_memory):
-                data_sent += self.send(data_memory[data_sent:], flags)
-        else:
-            timeleft = self.timeout
-            end = time.time() + timeleft
-            data_sent = 0
-            while True:
-                data_sent += self.send(data_memory[data_sent:], flags, timeout=timeleft)
-                if data_sent >= len(data_memory):
-                    return
-                timeleft = end - time.time()
-                if timeleft <= 0:
-                    raise timeout('timed out')
+        len_data_memory = len(data_memory)
+        # On PyPy up through 2.6.0, subviews of a memoryview() object
+        # copy the underlying bytes the first time the builtin
+        # socket.send() method is called. On a non-blocking socket
+        # (that thus calls socket.send() many times) with a large
+        # input, this results in many repeated copies of an ever
+        # smaller string, depending on the networking buffering. For
+        # example, if each send() can process 1MB of a 50MB input, and
+        # we naively pass the entire remaining subview each time, we'd
+        # copy 49MB, 48MB, 47MB, etc, thus completely killing
+        # performance. To work around this problem, we work in
+        # reasonable, fixed-size chunks. This results in a 10x
+        # improvement to bench_sendall.py, while having no measurable impact on
+        # CPython (since it doesn't copy at all, the only extra overhead is
+        # a few Python function calls, which is negligible for large inputs).
+        # See https://bitbucket.org/pypy/pypy/issues/2091/non-blocking-socketsend-slow-gevent
+        # Too small a chunk (the socket's buf size is usually too
+        # small) results in reduced perf due to *too many* calls to send and too many
+        # small copies. With a buffer of 143K (the default on my system), for
+        # example, bench_sendall.py yields ~264MB/s, while using 1MB yields
+        # ~653MB/s (matching CPython). 1MB is arbitrary and might be better
+        # chosen, say, to match a page size?
+        chunk_size = max(self.getsockopt(SOL_SOCKET, SO_SNDBUF), 1024 * 1024)
+        data_sent = 0
+        end = None
+        timeleft = None
+        if self.timeout is not None:
+            timeleft = self.timeout
+            end = time.time() + timeleft
+        while data_sent < len_data_memory:
+            chunk_end = min(data_sent + chunk_size, len_data_memory)
+            chunk = data_memory[data_sent:chunk_end]
+            timeleft = self.__send_chunk(chunk, flags, timeleft, end)
+            data_sent += len(chunk)  # Guaranteed it sent the whole thing

     def sendto(self, *args):
         sock = self._sock
......
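The copy-cost argument in the `sendall` comment can be checked with a little arithmetic. The sketch below is a toy model, not gevent or PyPy code: it assumes every `send()` call copies the whole buffer it is handed (the PyPy behaviour described in the comment) and accepts at most `per_send` units. The function names `copied_naive` and `copied_chunked` are hypothetical, and the model also counts the first full-size pass, so the naive total for a 50MB input sent 1MB at a time is 50 + 49 + ... + 1.

```python
def copied_naive(total, per_send):
    """Units copied when each send() is handed the entire remaining data.

    Assumption (toy model): every call copies its whole argument and then
    accepts at most ``per_send`` units of it.
    """
    copied = 0
    remaining = total
    while remaining > 0:
        copied += remaining                    # the whole remainder is copied
        remaining -= min(per_send, remaining)  # but only part of it is sent
    return copied


def copied_chunked(total, per_send, chunk_size):
    """Units copied when the data is first cut into fixed-size chunks.

    Each chunk is sent with the same "pass the remainder" loop, but the
    remainder can never exceed ``chunk_size``.
    """
    copied = 0
    offset = 0
    while offset < total:
        chunk = min(chunk_size, total - offset)
        copied += copied_naive(chunk, per_send)
        offset += chunk
    return copied


if __name__ == "__main__":
    # 50MB input, 1MB accepted per send(), units are megabytes.
    print("naive:  ", copied_naive(50, 1))       # 1275
    print("chunked:", copied_chunked(50, 1, 1))  # 50
```

Under these assumptions the naive loop copies 1275MB to send 50MB, while 1MB chunks copy each byte once. Larger chunks sit in between (5MB chunks copy 150MB in this model), which is consistent with the comment's point that the chunk size is a trade-off rather than a single right answer.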
@@ -327,6 +327,7 @@ class socket(object):
                 raise

     def sendall(self, data, flags=0):
+        # XXX When we run on PyPy3, see the notes in _socket2.py's sendall()
         data_memory = _get_memory(data)
         if self.timeout is None:
             data_sent = 0
......
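For readers who want to try the chunking idea outside gevent, here is a minimal sketch against the standard-library `socket` module. It is an illustration under stated assumptions, not the commit's implementation: `pick_chunk_size`, `send_chunk` and `sendall_chunked` are hypothetical names, the socket is an ordinary blocking one, and rather than threading an updated `timeleft` back to the caller this variant recomputes the remaining time from the absolute deadline before every `send()`.

```python
import socket
import time

ONE_MB = 1024 * 1024  # the same arbitrary 1MB floor used in the diff


def pick_chunk_size(sock, floor=ONE_MB):
    """Send-buffer size of ``sock``, but never smaller than ``floor``."""
    return max(sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF), floor)


def send_chunk(sock, chunk, end):
    """Send every byte of ``chunk``; ``end`` is an absolute deadline or None."""
    sent = 0
    while sent < len(chunk):
        if end is not None:
            timeleft = end - time.time()
            if timeleft <= 0:
                raise socket.timeout('timed out')
            sock.settimeout(timeleft)  # each send() gets only the time left
        sent += sock.send(chunk[sent:])


def sendall_chunked(sock, data, timeout=None, chunk_size=None):
    """Send ``data`` in fixed-size memoryview slices under one overall deadline."""
    view = memoryview(data)
    if chunk_size is None:
        chunk_size = pick_chunk_size(sock)
    end = None if timeout is None else time.time() + timeout
    saved_timeout = sock.gettimeout()
    offset = 0
    try:
        while offset < len(view):
            chunk = view[offset:offset + chunk_size]
            send_chunk(sock, chunk, end)
            offset += len(chunk)  # send_chunk returns only after the full chunk
    finally:
        sock.settimeout(saved_timeout)  # restore the caller's timeout setting
```

A call such as `sendall_chunked(sock, payload, timeout=5.0)` bounds the whole transfer by five seconds, however many chunks it takes. Keeping the deadline as a single absolute time means there is no per-chunk state to hand back between calls.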