    libceph: handle connection reopen race with callbacks · 0da5d703
    Sage Weil authored
    If a connection is closed and/or reopened (ceph_con_close, ceph_con_open),
    it can race with a callback.  con_work does various state checks for
    closed or reopened sockets at the beginning, but drops con->mutex before
    making callbacks.  We need to check for state bit changes after retaking
    the lock so that we restart con_work and re-run those CLOSED/OPENING
    tests; otherwise we may end up operating under stale assumptions.
    
    In Jim's case, this was causing 'bad tag' errors.
    
    There are four cases where we re-take the con->mutex inside con_work: catch
    them all and return EAGAIN from try_{read,write} so that we can restart
    con_work.
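    A minimal sketch of the restart idea, assuming a simplified model: a
    try_read stand-in drops and retakes the mutex around a callback and
    returns -EAGAIN if the state changed, and the con_work stand-in jumps
    back to its CLOSED/OPENING checks on -EAGAIN.  All sk_* names, fields and
    the injected "pending_reopens" race are hypothetical; only one of the
    four relock sites is modeled, and try_write would follow the same shape.

```c
#include <errno.h>
#include <pthread.h>

#define SK_CLOSED  (1u << 0)   /* hypothetical: connection was closed */
#define SK_OPENING (1u << 1)   /* hypothetical: connection was (re)opened */

struct sk_con {
    pthread_mutex_t mutex;
    unsigned int state;        /* protected by mutex */
    int resets;                /* times the OPENING branch ran */
    int reads;                 /* completed read passes */
    int pending_reopens;       /* demo only: reopens to inject */
};

/* Hypothetical callback, called without con->mutex held; simulates a
 * ceph_con_open() racing in while the lock is dropped. */
static void sk_dispatch(struct sk_con *con)
{
    pthread_mutex_lock(&con->mutex);
    if (con->pending_reopens > 0) {
        con->pending_reopens--;
        con->state |= SK_OPENING;
    }
    pthread_mutex_unlock(&con->mutex);
}

/* Stand-in for try_read(): drops and retakes con->mutex around the
 * callback and reports -EAGAIN if the state changed meanwhile. */
static int sk_try_read(struct sk_con *con)
{
    pthread_mutex_unlock(&con->mutex);
    sk_dispatch(con);
    pthread_mutex_lock(&con->mutex);
    if (con->state & (SK_CLOSED | SK_OPENING))
        return -EAGAIN;
    con->reads++;
    return 0;
}

/* Stand-in for con_work(): on -EAGAIN, redo the CLOSED/OPENING tests
 * instead of continuing under stale assumptions. */
static void sk_con_work(struct sk_con *con)
{
    pthread_mutex_lock(&con->mutex);
restart:
    if (con->state & SK_CLOSED)
        goto done;                     /* closed: nothing to do */
    if (con->state & SK_OPENING) {
        con->state &= ~SK_OPENING;
        con->resets++;                 /* socket/state reset goes here */
    }
    if (sk_try_read(con) == -EAGAIN)
        goto restart;
done:
    pthread_mutex_unlock(&con->mutex);
}
```

    With two injected reopens, sk_con_work takes the OPENING branch twice
    before a read pass completes; with SK_CLOSED set it returns without
    reading.  Build with -pthread on toolchains that need it.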
    
    Reported-by: Jim Schutt <jaschut@sandia.gov>
    Tested-by: Jim Schutt <jaschut@sandia.gov>
    Signed-off-by: Sage Weil <sage@newdream.net>
messenger.c 60.9 KB