    fuse: in fuse_flush only wait if someone wants the return code · 5a8bee63
    Eric W. Biederman authored
    
    
    If a fuse filesystem is mounted inside a container, there is a problem
    during pid namespace destruction. The scenario is:
    
    1. task (a thread in the fuse server, with a fuse file open) starts
       exiting, does exit_signals(), goes into fuse_flush() -> wait
    2. fuse daemon gets killed, tries to wake everyone up
    3. the task from step 1 stays stuck because complete_signal() doesn't wake
       it up, since it has PF_EXITING set.
    
    The result is that the thread will never be woken up, and pid namespace
    destruction will block indefinitely.
    
    To add insult to injury, nobody is waiting for these return codes, since
    the pid namespace is being destroyed.
    
    To fix this, let's not block on flush operations when the current task has
    PF_EXITING.
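
    The decision described above can be sketched in plain userspace C. This
    is only an illustrative model, not the kernel patch: the names
    sketch_task, SKETCH_PF_EXITING and sketch_flush_mode_for() are
    hypothetical stand-ins for the task structure, the PF_EXITING flag and
    the check made on the flush path.

```c
/*
 * Illustrative userspace model only; none of these names are kernel APIs.
 * SKETCH_PF_EXITING stands in for the kernel's PF_EXITING task flag, and
 * its value here is arbitrary.
 */
#define SKETCH_PF_EXITING 0x1u

/* Hypothetical stand-in for the part of the task state the check needs. */
struct sketch_task {
    unsigned int flags;
};

enum sketch_flush_mode {
    SKETCH_FLUSH_WAIT,    /* send the flush and block until the server replies */
    SKETCH_FLUSH_NO_WAIT  /* send the flush, but do not block on the reply */
};

/*
 * An exiting task has nobody left to consume the flush return code, and it
 * can no longer be woken by a signal, so it must not block on the reply.
 */
static enum sketch_flush_mode
sketch_flush_mode_for(const struct sketch_task *task)
{
    if (task->flags & SKETCH_PF_EXITING)
        return SKETCH_FLUSH_NO_WAIT;
    return SKETCH_FLUSH_WAIT;
}
```

    In this model a normally running task still waits for the flush reply,
    so its close() path can report errors, while an exiting task never
    blocks on the fuse server.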
    
    This does change the semantics slightly: the wait here is for POSIX locks
    to be released, so an exiting task will now exit before they are released.
    To quote Miklos:
    
      "remote" posix locks are almost never used due to problems like this, so
      I think it's safe to do this.
    
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
    Link: https://lore.kernel.org/all/YrShFXRLtRt6T%2Fj+@risky/
    
    Tested-by: Tycho Andersen <tycho@tycho.pizza>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>