• Pavel Emelyanov's avatar
    [NETNS]: Fix race between put_net() and netlink_kernel_create(). · 23fe1866
    Pavel Emelyanov authored
    The comment about "race free view of the set of network
    namespaces" was a bit hasty. Look (there even can be only
    one CPU, as discovered by Alexey Dobriyan and Denis Lunev):
    
    put_net()
      if (atomic_dec_and_test(&net->refcnt))
        /* true */
          __put_net(net);
            queue_work(...);
    
    /*
     * note: the net now has refcnt 0, but still in
     * the global list of net namespaces
     */
    
    == re-schedule ==
    
    register_pernet_subsys(&some_ops);
      register_pernet_operations(&some_ops);
        (*some_ops)->init(net);
          /*
           * we call netlink_kernel_create() here
           * in some places
           */
          netlink_kernel_create();
             sk_alloc();
                get_net(net); /* refcnt = 1 */
             /*
              * now we drop the net refcount not to
              * block the net namespace exit in the
              * future (or this can be done on the
              * error path)
              */
             put_net(sk->sk_net);
                 if (atomic_dec_and_test(&...))
                       /*
                        * true. BOOOM! The net is
                        * scheduled for release twice
                        */
    
    When thinking on this problem, I decided, that getting and
    putting the net in init callback is wrong. If some init
    callback needs to have a refcount-less reference on the struct
    net, _it_ has to be careful himself, rather than relying on
    the infrastructure to handle this correctly.
    
    In case of netlink_kernel_create(), the problem is that the
    sk_alloc() gets the given namespace, but passing the info
    that we don't want to get it inside this call is too heavy.
    
    Instead, I propose to crate the socket inside an init_net
    namespace and then re-attach it to the desired one right
    after the socket is created.
    
    After doing this, we also have to be careful on error paths
    not to drop the reference on the namespace, we didn't get
    the one on.
    Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
    Acked-by: default avatarDenis Lunev <den@openvz.org>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    23fe1866
af_netlink.c 43.7 KB