1. 14 Dec, 2015 5 commits
      packet: race condition in packet_bind · f236cda1
      Francesco Ruggeri authored
      [ Upstream commit 30f7ea1c ]
      
      There is a race condition between packet_notifier and packet_bind{_spkt}.
      
      It happens if packet_notifier(NETDEV_UNREGISTER) executes between the
      time packet_bind{_spkt} takes a reference on the new netdevice and the
      time packet_do_bind sets po->ifindex.
      In this case the notification can be missed.
      If this happens during a dev_change_net_namespace this can result in the
      netdevice being moved to the new namespace while the packet_sock in the
      old namespace still holds a reference on it. When the netdevice is later
      deleted in the new namespace the deletion hangs since the packet_sock
      is not found in the new namespace's &net->packet.sklist.
      It can be reproduced with the script below.
      
      This patch makes packet_do_bind check again for the presence of the
      netdevice in the packet_sock's namespace after the synchronize_net
      in unregister_prot_hook.
      More generally, it also holds the rcu lock for the duration of the bind
      to stop dev_change_net_namespace/rollback_registered_many from
      going past the synchronize_net following unlist_netdevice, so that
      no NETDEV_UNREGISTER notifications can happen on the new netdevice
      while the bind is executing. In order to do this some code from
      packet_bind{_spkt} is consolidated into packet_do_bind.
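
The "check again after the blocking step" idea can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel patch and not the reproducer: `Namespace`, `PacketSock`, `do_bind` and the `during_wait` hook are hypothetical names, and the hook merely stands in for the window opened by synchronize_net.

```python
class Namespace:
    """Toy network namespace: maps device name -> ifindex (hypothetical model)."""
    def __init__(self):
        self.devices = {}


class PacketSock:
    """Toy packet socket that remembers its namespace and bound ifindex."""
    def __init__(self, net):
        self.net = net
        self.ifindex = None


def do_bind(sock, name, during_wait=lambda: None):
    """Bind sock to a device, re-validating the device after the blocking step."""
    ifindex = sock.net.devices.get(name)      # first lookup in the socket's namespace
    if ifindex is None:
        return False
    during_wait()                             # stands in for the synchronize_net() window
    # Re-check: the device may have left this namespace while we waited.
    if sock.net.devices.get(name) != ifindex:
        return False
    sock.ifindex = ifindex                    # only now record the binding
    return True
```

Passing a `during_wait` callback that moves the device to another `Namespace` makes `do_bind` return `False` and leaves `sock.ifindex` unset, which is the outcome the second check is meant to guarantee.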
      
      import socket, os, time, sys
      proto=7
      realDev='em1'
      vlanId=400
      if len(sys.argv) > 1:
         vlanId=int(sys.argv[1])
      dev='vlan%d' % vlanId
      
      os.system('taskset -p 0x10 %d' % os.getpid())
      
      s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto)
      os.system('ip link add link %s name %s type vlan id %d' %
                (realDev, dev, vlanId))
      os.system('ip netns add dummy')
      
      pid=os.fork()
      
      if pid == 0:
         # dev should be moved while packet_do_bind is in synchronize_net
         os.system('taskset -p 0x20000 %d' % os.getpid())
         os.system('ip link set %s netns dummy' % dev)
         os.system('ip netns exec dummy ip link del %s' % dev)
         s.close()
         sys.exit(0)
      
      time.sleep(.004)
      try:
         s.bind(('%s' % dev, proto+1))
      except socket.error:
         print('Could not bind socket')
         s.close()
         os.system('ip netns del dummy')
         sys.exit(0)
      
      os.waitpid(pid, 0)
      s.close()
      os.system('ip netns del dummy')
      sys.exit(0)

      Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      ipv4: disable BH when changing ip local port range · 59fed94a
      WANG Cong authored
      [ Upstream commit 4ee3bd4a ]
      
      This fixes the following lockdep warning:
      
       [ INFO: inconsistent lock state ]
       4.3.0-rc7+ #1197 Not tainted
       ---------------------------------
       inconsistent {IN-SOFTIRQ-R} -> {SOFTIRQ-ON-W} usage.
       sysctl/1019 [HC0[0]:SC0[0]:HE1:SE1] takes:
        (&(&net->ipv4.ip_local_ports.lock)->seqcount){+.+-..}, at: [<ffffffff81921de7>] ipv4_local_port_range+0xb4/0x12a
       {IN-SOFTIRQ-R} state was registered at:
         [<ffffffff810bd682>] __lock_acquire+0x2f6/0xdf0
         [<ffffffff810be6d5>] lock_acquire+0x11c/0x1a4
         [<ffffffff818e599c>] inet_get_local_port_range+0x4e/0xae
         [<ffffffff8166e8e3>] udp_flow_src_port.constprop.40+0x23/0x116
         [<ffffffff81671cb9>] vxlan_xmit_one+0x219/0xa6a
         [<ffffffff81672f75>] vxlan_xmit+0xa6b/0xaa5
         [<ffffffff817f2deb>] dev_hard_start_xmit+0x2ae/0x465
         [<ffffffff817f35ed>] __dev_queue_xmit+0x531/0x633
         [<ffffffff817f3702>] dev_queue_xmit_sk+0x13/0x15
         [<ffffffff818004a5>] neigh_resolve_output+0x12f/0x14d
         [<ffffffff81959cfa>] ip6_finish_output2+0x344/0x39f
         [<ffffffff8195bf58>] ip6_finish_output+0x88/0x8e
         [<ffffffff8195bfef>] ip6_output+0x91/0xe5
         [<ffffffff819792ae>] dst_output_sk+0x47/0x4c
         [<ffffffff81979392>] NF_HOOK_THRESH.constprop.30+0x38/0x82
         [<ffffffff8197981e>] mld_sendpack+0x189/0x266
         [<ffffffff8197b28b>] mld_ifc_timer_expire+0x1ef/0x223
         [<ffffffff810de581>] call_timer_fn+0xfb/0x28c
         [<ffffffff810ded1e>] run_timer_softirq+0x1c7/0x1f1
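
The {IN-SOFTIRQ-R} -> {SOFTIRQ-ON-W} report means the sequence lock is read from softirq context but written with softirqs enabled. A toy model of a sequence counter shows why that matters: if a softirq reader runs on the same CPU while the writer is mid-update, the reader can never see a stable (even) sequence. The `SeqCount` class and the `max_spins` bound below are assumptions made for illustration; a real reader would simply keep spinning.

```python
class SeqCount:
    """Toy sequence counter guarding a (low, high) port range (hypothetical model)."""
    def __init__(self, low, high):
        self.seq = 0
        self.ports = (low, high)

    def write_begin(self):
        self.seq += 1            # odd value: a write is in progress

    def write_end(self, low, high):
        self.ports = (low, high)
        self.seq += 1            # even value: data is stable again

    def read(self, max_spins=1000):
        """Return the range, or None if no stable snapshot was seen in max_spins tries."""
        for _ in range(max_spins):
            start = self.seq
            if start % 2 == 0:
                value = self.ports
                if self.seq == start:
                    return value
        return None


sc = SeqCount(32768, 60999)
sc.write_begin()                     # writer interrupted here by a softirq
stuck = sc.read(max_spins=10)        # reader on the same CPU cannot make progress
sc.write_end(1024, 65535)            # only reachable if the reader gives up
```

Disabling bottom halves around the write side removes the window between `write_begin` and `write_end` in which such a reader could run on the same CPU.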
      
      Fixes: b8f1a556 ("udp: Add function to make source port for UDP tunnels")
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      ipv6: clean up dev_snmp6 proc entry when we fail to initialize inet6_dev · 8d135fed
      Sabrina Dubroca authored
      [ Upstream commit 2a189f9e ]
      
      In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up
      the dev_snmp6 entry that we have already registered for this device.
      Call snmp6_unregister_dev in this case.
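
The general shape of this fix is error unwinding: when a later setup step fails, every earlier step that succeeded must be undone. The sketch below models that pattern in user space; `add_dev` and the step tuples are hypothetical and do not mirror the actual ipv6_add_dev code.

```python
def add_dev(steps):
    """Run (name, setup, teardown) steps in order; undo completed ones on failure."""
    done = []
    for name, setup, teardown in steps:
        try:
            setup()
        except Exception:
            # Unwind in reverse order so nothing registered earlier is leaked.
            for _, undo in reversed(done):
                undo()
            return False
        done.append((name, teardown))
    return True
```

With a first step that registers a proc entry and a second step that raises, `add_dev` returns `False` and the proc entry is removed again, which is the behaviour the patch restores for the dev_snmp6 entry.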
      
      Fixes: a317a2f1 ("ipv6: fail early when creating netdev named all or default")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      sfc: push partner queue for skb->xmit_more · 725242ca
      Martin Habets authored
      [ Upstream commit b2663a4f ]
      
      When the IP stack passes SKBs, the sfc driver puts them in 2 different TX
      queues (called partners), one for checksummed and one for not checksummed.
      If the SKB has xmit_more set, the driver will delay pushing the work to the
      NIC.
      
      When it later does decide to push the buffers, this patch ensures it also
      pushes the partner queue, if that also has any delayed work. Before this
      fix the work in the partner queue would be left for a long time and
      trigger the netdev watchdog.
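
The partner-queue behaviour can be modelled with two linked queues. This is a minimal sketch assuming a simplified driver: `TxQueue`, `enqueue`, `make_pair` and the `pending`/`pushed` lists are hypothetical names, not the sfc driver's data structures.

```python
class TxQueue:
    """Toy TX queue with deferred pushes (hypothetical model of one partner)."""
    def __init__(self):
        self.pending = []    # queued but not yet pushed to the NIC
        self.pushed = []     # handed to the NIC
        self.partner = None

    def _push(self):
        self.pushed.extend(self.pending)
        self.pending.clear()

    def enqueue(self, skb, xmit_more):
        self.pending.append(skb)
        if xmit_more:
            return           # delay the push; the stack promised more packets
        self._push()
        # Also flush the partner queue if it has delayed work.
        if self.partner is not None and self.partner.pending:
            self.partner._push()


def make_pair():
    """Create the checksummed and non-checksummed queues and link them."""
    csum, no_csum = TxQueue(), TxQueue()
    csum.partner, no_csum.partner = no_csum, csum
    return csum, no_csum
```

If a packet is queued on one partner with `xmit_more=True` and the next packet goes to the other partner with `xmit_more=False`, both queues end up pushed, so no work is stranded on the first queue.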
      
      Fixes: 70b33fb0 ("sfc: add support for skb->xmit_more")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Martin Habets <mhabets@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      sit: fix sit0 percpu double allocations · 624fe175
      Eric Dumazet authored
      [ Upstream commit 4ece9009 ]
      
      sit0 device allocates its percpu storage twice:
      - One time in ipip6_tunnel_init()
      - One time in ipip6_fb_tunnel_init()
      
      Thus we leak 48 bytes per possible cpu per network namespace dismantle.
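
To get a feel for the size of the leak, the figure of 48 bytes per possible CPU per dismantle can be turned into a quick estimate. The helper name `leaked_bytes` and the CPU and dismantle counts in the note below are illustrative assumptions, not measurements from the commit.

```python
def leaked_bytes(possible_cpus, dismantles, per_cpu_bytes=48):
    """Estimate total leaked bytes: per-CPU leak times CPUs times namespace dismantles."""
    return per_cpu_bytes * possible_cpus * dismantles
```

For example, a hypothetical host with 64 possible CPUs that tears down 1000 network namespaces would leak `leaked_bytes(64, 1000)` = 3,072,000 bytes, roughly 2.9 MiB.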
      
      ipip6_fb_tunnel_init() can be much simpler and does not
      return an error, and should be called after register_netdev()
      
      Note that ipip6_tunnel_clone_6rd() also needs to be called
      after register_netdev() (calling ipip6_tunnel_init())
      
      Fixes: ebe084aa ("sit: Use ipip6_tunnel_init as the ndo_init function.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
  2. 03 Dec, 2015 35 commits