Commit 8cba5055 authored by Guido van Rossum's avatar Guido van Rossum

In wait(), when there's no asyncore main loop, we called

asyncore.poll() with a timeout of 10 seconds.  Change this to a
variable timeout starting at 1 msec and doubling until 1 second.

While debugging Win2k crashes in the check4ExtStorageThread test from
ZODB/tests/MTStorage.py, Tim noticed that there were frequent 10
second gaps in the log file where *nothing* happens.  These were caused
by the following scenario.

Suppose a ZEO client process has two threads using the same connection
to the ZEO server, and there's no asyncore loop active.  T1 makes a
synchronous call, and enters the wait() function.  Then T2 makes
another synchronous call, and enters the wait() function.  At this
point, both are blocked in the select() call in asyncore.poll(), with
a timeout of 10 seconds (in the old version).  Now the replies for
both calls arrive.  Say T1 wakes up.  The handle_read() method in
smac.py calls self.recv(8096), so it gets both replies in its buffer,
decodes both, and calls self.message_input() for both, which sticks
both replies in the self.replies dict.  Now T1 finds its response, its
wait() call returns with it.  But T2 is still stuck in
asyncore.poll(): its select() call never woke up, and has to "sit out"
the whole timeout of 10 seconds.  (Good thing I added timeouts to
everything!  Or perhaps not, since it masked the problem.)

One other condition must be satisfied before this becomes a disaster:
T2 must have started a transaction, and all other threads must be
waiting to start another transaction.  This is what I saw in the log.
(Hmm, maybe a message should be logged when a thread is waiting to
start a transaction this way.)

In a real Zope application, this won't happen, because there's a
centralized asyncore loop in a separate thread (probably the client's
main thread) and the various threads would be waiting on the condition
variable; whenever a reply is inserted in the replies dict, all
threads are notified.  But in the test suite there's no asyncore loop,
and I don't feel like adding one.  So the exponential backoff seems
the easiest "solution".
parent 4a981c73
......@@ -351,10 +351,15 @@ class Connection(smac.SizedMessageAsyncConnection):
def wait(self, msgid):
"""Invoke asyncore mainloop and wait for reply."""
if __debug__:
log("wait() async=%d" % self.is_async(), level=zLOG.TRACE)
log("wait(%d), async=%d" % (msgid, self.is_async()),
level=zLOG.TRACE)
if self.is_async():
self.trigger.pull_trigger()
# Delay used when we call asyncore.poll() directly.
# Start with a 1 msec delay, double until 1 sec.
delay = 0.001
self.replies_cond.acquire()
try:
while 1:
......@@ -363,6 +368,9 @@ class Connection(smac.SizedMessageAsyncConnection):
reply = self.replies.get(msgid)
if reply is not None:
del self.replies[msgid]
if __debug__:
log("wait(%d): reply=%s" % (msgid, short_repr(reply)),
level=zLOG.DEBUG)
return reply
if self.is_async():
self.replies_cond.wait(10.0)
......@@ -370,7 +378,12 @@ class Connection(smac.SizedMessageAsyncConnection):
self.replies_cond.release()
try:
try:
asyncore.poll(10.0, self._map)
if __debug__:
log("wait(%d): asyncore.poll(%s)" %
(msgid, delay), level=zLOG.TRACE)
asyncore.poll(delay, self._map)
if delay < 1.0:
delay += delay
except select.error, err:
log("Closing. asyncore.poll() raised %s." % err,
level=zLOG.BLATHER)
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment