• Nikolay Aleksandrov's avatar
    net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group · 1005f19b
    Nikolay Aleksandrov authored
    When replacing a nexthop group, we must release the IPv6 per-cpu dsts of
    the removed nexthop entries after an RCU grace period because they
    contain references to the nexthop's net device and to the fib6 info.
    With specific series of events[1] we can reach net device refcount
    imbalance which is unrecoverable. IPv4 is not affected because dsts
    don't take a refcount on the route.
    
    [1]
     $ ip nexthop list
      id 200 via 2002:db8::2 dev bridge.10 scope link onlink
      id 201 via 2002:db8::3 dev bridge scope link onlink
      id 203 group 201/200
     $ ip -6 route
      2001:db8::10 nhid 203 metric 1024 pref medium
         nexthop via 2002:db8::3 dev bridge weight 1 onlink
         nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink
    
    Create rt6_info through one of the multipath legs, e.g.:
     $ taskset -a -c 1  ./pkt_inj 24 bridge.10 2001:db8::10
     (pkt_inj is just a custom packet generator, nothing special)
    
    Then remove that leg from the group by replace (let's assume it is id
    200 in this case):
     $ ip nexthop replace id 203 group 201
    
    Now remove the IPv6 route:
     $ ip -6 route del 2001:db8::10/128
    
    The route won't be really deleted due to the stale rt6_info holding 1
    refcnt in nexthop id 200.
    At this point we have the following reference count dependency:
     (deleted) IPv6 route holds 1 reference over nhid 203
     nh 203 holds 1 ref over id 201
     nh 200 holds 1 ref over the net device and the route due to the stale
     rt6_info
    
    Now to create circular dependency between nh 200 and the IPv6 route, and
    also to get a reference over nh 200, restore nhid 200 in the group:
     $ ip nexthop replace id 203 group 201/200
    
    And now we have a permanent circular dependncy because nhid 203 holds a
    reference over nh 200 and 201, but the route holds a ref over nh 203 and
    is deleted.
    
    To trigger the bug just delete the group (nhid 203):
     $ ip nexthop del id 203
    
    It won't really be deleted due to the IPv6 route dependency, and now we
    have 2 unlinked and deleted objects that reference each other: the group
    and the IPv6 route. Since the group drops the reference it holds over its
    entries at free time (i.e. its own refcount needs to drop to 0) that will
    never happen and we get a permanent ref on them, since one of the entries
    holds a reference over the IPv6 route it will also never be released.
    
    At this point the dependencies are:
     (deleted, only unlinked) IPv6 route holds reference over group nh 203
     (deleted, only unlinked) group nh 203 holds reference over nh 201 and 200
     nh 200 holds 1 ref over the net device and the route due to the stale
     rt6_info
    
    This is the last point where it can be fixed by running traffic through
    nh 200, and specifically through the same CPU so the rt6_info (dst) will
    get released due to the IPv6 genid, that in turn will free the IPv6
    route, which in turn will free the ref count over the group nh 203.
    
    If nh 200 is deleted at this point, it will never be released due to the
    ref from the unlinked group 203, it will only be unlinked:
     $ ip nexthop del id 200
     $ ip nexthop
     $
    
    Now we can never release that stale rt6_info, we have IPv6 route with ref
    over group nh 203, group nh 203 with ref over nh 200 and 201, nh 200 with
    rt6_info (dst) with ref over the net device and the IPv6 route. All of
    these objects are only unlinked, and cannot be released, thus they can't
    release their ref counts.
    
     Message from syslogd@dev at Nov 19 14:04:10 ...
      kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
     Message from syslogd@dev at Nov 19 14:04:20 ...
      kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
    
    Fixes: 7bf4796d ("nexthops: add support for replace")
    Signed-off-by: default avatarNikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    1005f19b
nexthop.c 90.7 KB