    X Switch Zope to use poll() instead of select() · c6addb05
    Kirill Smelkov authored
    Recently we started getting the following error on a production
    instance:
    
          File ".../bin/runzope", line 211, in <module>
            sys.exit(Zope2.Startup.run.run())
          File ".../eggs/Zope2-2.13.24-py2.7.egg/Zope2/Startup/run.py", line 26, in run
            starter.run()
          File ".../eggs/Zope2-2.13.24-py2.7.egg/Zope2/Startup/__init__.py", line 111, in run
            Lifetime.loop()
          File ".../eggs/Zope2-2.13.24-py2.7.egg/Lifetime/__init__.py", line 43, in loop
            lifetime_loop()
          File ".../eggs/Zope2-2.13.24-py2.7.egg/Lifetime/__init__.py", line 53, in lifetime_loop
            asyncore.poll(timeout, map)
          File ".../parts/python2.7/lib/python2.7/asyncore.py", line 145, in poll
            r, w, e = select.select(r, w, e, timeout)
        ValueError: filedescriptor out of range in select()
    
    Initially we thought the reason was that the number of file descriptors this
    process uses exceeded the allowed limit (65K open files on that instance),
    because wendelin.core uses 1 fd per array view (opening
    /dev/shm/ramh.XXXXX), but that turned out not to be quite the case.
    
    The actual reason is that select() limits how many file descriptors it can
    monitor at all. The limit is system-specific, but on Linux it is usually
    1024 - http://man7.org/linux/man-pages/man2/select.2.html#BUGS :
    
        $ cat s.c
        #include <sys/select.h>
        #include <stdio.h>
    
        int main() {
                printf("%d\n", FD_SETSIZE);
                return 0;
        }
    
        $ tcc -run s.c
        1024
    
    Also, select() can monitor only file descriptors whose numbers are themselves
    "< FD_SETSIZE", e.g. it cannot monitor fd=1025 even if there are only 10
    open file descriptors, because 1025 is not < 1024.
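
    The same range check can be observed from Python. The following is a
    minimal sketch, assuming a Linux build where FD_SETSIZE is 1024;
    HIGH_FD, select_accepts and poll_accepts are illustrative names, and
    fd 1025 is deliberately not opened, since only its number matters here:

```python
import select

# Assumed to be >= FD_SETSIZE (1024 on typical Linux). The descriptor is
# not opened on purpose: select() rejects it by number alone.
HIGH_FD = 1025


def select_accepts(fd):
    """Return False if select() rejects fd as out of range."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # "filedescriptor out of range in select()"
        return False
    except OSError:
        # fd number is in range, the descriptor is just not open
        return True
    return True


def poll_accepts(fd):
    """Return True if poll() takes fd without raising."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # An unopened fd is reported through the returned event flags
    # rather than raised as an error.
    poller.poll(0)
    return True


print(select_accepts(HIGH_FD), poll_accepts(HIGH_FD))
```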
    
    As said above, in wendelin.core every array view uses 1 file
    descriptor, so if we use more than a small number of arrays, then even
    though #fd stays within the process-specific ulimit, select() usage
    effectively caps the total number of open file descriptors at 1024.
    
    So let's switch from select() to poll(), which does not have this 1024
    #fd limit. Asyncore already supports poll() out of the box -
    either by passing use_poll=True to asyncore.loop()
    
    https://docs.python.org/2/library/asyncore.html#asyncore.loop
    
    or by using asyncore.poll2() instead of asyncore.poll()
    
    https://github.com/python/cpython/blob/2.7/Lib/asyncore.py#L170
    https://github.com/python/cpython/blob/2.7/Lib/asyncore.py#L125
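
    To illustrate what one poll()-based pass looks like, here is a
    simplified sketch in the spirit of asyncore.poll2(). It is not the
    stdlib code: poll_once and its fd -> callback map are hypothetical
    names, and only readability is watched:

```python
import select
import socket


def poll_once(fd_map, timeout=0.0):
    """Run one poll() pass over fd_map (fd -> callback(fd, flags))."""
    pollster = select.poll()
    # Like select(), this re-registers every fd on each pass, but there is
    # no FD_SETSIZE bound on the fd numbers themselves.
    for fd in fd_map:
        pollster.register(fd, select.POLLIN | select.POLLPRI)
    # poll() takes its timeout in milliseconds.
    ready = pollster.poll(int(timeout * 1000))
    for fd, flags in ready:
        fd_map[fd](fd, flags)
    return ready


# Usage: one end of a socket pair becomes readable after the other writes.
a, b = socket.socketpair()
received = []
b.send(b"x")
poll_once({a.fileno(): lambda fd, flags: received.append(a.recv(1))},
          timeout=1.0)
print(received)
```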
    
    --------
    
    This patch switches asyncore.poll() -> asyncore.poll2() in only 1 place
    in Zope. There are, however, many such places in Zope and in other software.
    
    For this reason, what makes sense to me is not to patch all such
    places one by one, but instead to runtime-patch asyncore.poll to be
    asyncore.poll2 - this way all software automatically benefits from
    poll() instead of select().
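
    Such a runtime patch could look roughly like the sketch below.
    patch_poll is a hypothetical helper, not something Zope or asyncore
    provides, and the import is guarded because asyncore is no longer in
    the standard library as of Python 3.12:

```python
def patch_poll(mod):
    """Rebind mod.poll to mod.poll2 and return the original poll."""
    original = mod.poll
    mod.poll = mod.poll2
    return original


try:
    import asyncore  # present in Python 2.7; removed from stdlib in 3.12
except ImportError:
    asyncore = None

if asyncore is not None:
    # Keep the select()-based version around in case it must be restored.
    _select_based_poll = patch_poll(asyncore)
```

    The patch only affects callers that look the function up as
    asyncore.poll at call time, as Lifetime does in the traceback above.
    Code that ran "from asyncore import poll" before the patch keeps its
    reference to the select()-based version, so the patch should be
    applied as early as possible during startup.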
    
    P.S.
    
    What might also make sense in the future is to let
    asyncore.poll use epoll(), as both select() and poll() redo all
    fd registration on _every_ call, so as #fd grows this preparation
    time grows too. With epoll(), file descriptors are registered only once.
    For this to work asyncore.socket_map also has to be patched, since there
    are places in code which modify this fd registry directly (e.g. remove
    an fd from it).
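
    A rough sketch of the register-once idea, assuming Linux (select.epoll
    is Linux-only). EpollRegistry is a hypothetical class, not part of
    asyncore; it only shows why every add/remove on the fd registry would
    have to go through a wrapper that keeps epoll in sync:

```python
import select
import socket


class EpollRegistry(object):
    """Hypothetical fd registry: register once, poll many times."""

    def __init__(self):
        self._epoll = select.epoll()
        self._handlers = {}

    def add(self, fd, handler):
        # Registration happens once here, not on every poll() call.
        self._epoll.register(fd, select.EPOLLIN)
        self._handlers[fd] = handler

    def remove(self, fd):
        # Direct removal from the map must also unregister from epoll.
        self._epoll.unregister(fd)
        del self._handlers[fd]

    def poll(self, timeout=0.0):
        # epoll.poll() takes its timeout in seconds.
        for fd, events in self._epoll.poll(timeout):
            self._handlers[fd](fd, events)

    def close(self):
        self._epoll.close()


# Usage: the fd is registered once and polled twice.
a, b = socket.socketpair()
got = []
registry = EpollRegistry()
registry.add(a.fileno(), lambda fd, events: got.append(a.recv(1)))
b.send(b"1")
registry.poll(1.0)
b.send(b"2")
registry.poll(1.0)
print(got)
```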