1. 15 Dec, 2014 8 commits
    • net: mvneta: fix Tx interrupt delay · 67bcd9ed
      willy tarreau authored
      [ Upstream commit aebea2ba ]
      
      The mvneta driver sets the amount of Tx coalesce packets to 16 by
      default. Normally that does not cause any trouble since the driver
      uses a much larger Tx ring size (532 packets). But some sockets
      might run with very small buffers, much smaller than the equivalent
      of 16 packets. This is what ping is doing for example, by setting
      SNDBUF to 324 bytes rounded up to 2kB by the kernel.
      
      The problem is that there is no documented method to force a specific
      packet to emit an interrupt (eg: the last of the ring) nor is it
      possible to make the NIC emit an interrupt after a given delay.
      
      In this case, it causes trouble, because when ping sends packets over
      its raw socket, the first few packets leave the system, but the first
      15 packets are emitted without an IRQ being generated, and so without
      the skbs being freed. And since the socket's buffer is small, there's
      no way to reach that amount of packets, and the ping ends up with
      "send: no buffer available" after sending 6 packets. Running with 3
      instances of ping in parallel is enough to hide the problem, because
      with 6 packets per instance, that's 18 packets total, which is enough
      to grant a Tx interrupt before all are sent.
      
      The original driver in the LSP kernel worked around this design flaw
      by using a software timer to clean up the Tx descriptors. This timer
      was slow and caused terrible network performance on some Tx-bound
      workloads (such as routing) but was enough to make tools like ping
      work correctly.
      
      Here, instead, we simply set the packet count before interrupt to 1.
      This ensures that each packet sent will produce an interrupt. NAPI
      takes care of coalescing interrupts since the interrupt is disabled
      once generated.
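
      The stall described above can be illustrated with a toy model. This is
      only a sketch, not driver code: the function name and its parameters
      are hypothetical, and it assumes every completed packet is freed as
      soon as a Tx-done interrupt fires.

```c
/*
 * Toy model (not mvneta code): a socket may have at most `sock_limit`
 * packets charged to it; transmitted packets are only freed when the
 * NIC raises a Tx-done interrupt, which happens once `coalesce` packets
 * have completed since the last interrupt.
 * Returns how many of `total` packets are sent before the socket stalls.
 */
unsigned int packets_sent_before_stall(unsigned int total,
                                       unsigned int sock_limit,
                                       unsigned int coalesce)
{
    unsigned int sent = 0;
    unsigned int unfreed = 0;   /* skbs still charged to the socket */
    unsigned int pending = 0;   /* completions since the last IRQ */

    while (sent < total) {
        if (unfreed >= sock_limit)
            break;              /* "no buffer available" */
        sent++;
        unfreed++;
        pending++;
        if (pending >= coalesce) {
            unfreed = 0;        /* IRQ: completed skbs are freed */
            pending = 0;
        }
    }
    return sent;
}
```

      With a limit of 6 packets and a coalesce count of 16 the model stalls
      after 6 packets; with a coalesce count of 1 it never stalls.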
      
      No measurable impact on performance or CPU usage was observed with
      small or large packets, including when saturating the link on Tx, and
      this fixes tools like ping which rely on too small a send buffer. If one
      wants to increase this value for certain workloads where it is safe
      to do so, "ethtool -C $dev tx-frames" will override this default
      setting.
      
      This fix needs to be applied to stable kernels starting with 3.10.
      Tested-By: Maggie Mae Roxas <maggie.mae.roxas@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • rtnetlink: release net refcnt on error in do_setlink() · 48e9bbd6
      Nicolas Dichtel authored
      [ Upstream commit e0ebde0e ]
      
      rtnl_link_get_net() takes a reference on the 'struct net'; we need to
      release it in case of error.
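
      The general shape of such a fix can be sketched in userspace C. All
      names below (fake_net, fake_get_net, fake_put_net, fake_setlink) are
      hypothetical stand-ins, not the kernel API; the `permitted` flag
      models a check that can fail after the reference has been taken.

```c
#include <errno.h>

/* Hypothetical stand-in for 'struct net' with a plain reference count. */
struct fake_net {
    int refcnt;
};

static struct fake_net *fake_get_net(struct fake_net *net)
{
    net->refcnt++;
    return net;
}

static void fake_put_net(struct fake_net *net)
{
    net->refcnt--;
}

/* Returns 0 on success or a negative errno value. */
int fake_setlink(struct fake_net *target, int permitted)
{
    struct fake_net *net = fake_get_net(target);   /* reference taken */
    int err = 0;

    if (!permitted) {
        err = -EPERM;
        goto out_put;       /* the error path must drop the reference too */
    }

    /* ... use 'net' to move the device (omitted in this sketch) ... */

out_put:
    fake_put_net(net);
    return err;
}
```

      In this sketch the count returns to its starting value on both the
      success path and the error path.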
      
      CC: Eric W. Biederman <ebiederm@xmission.com>
      Fixes: b51642f6 ("net: Enable a userns root rtnl calls that are safe for unprivilged users")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • net/mlx4_core: Limit count field to 24 bits in qp_alloc_res · 53772173
      Jack Morgenstein authored
      [ Upstream commit 2d5c57d7 ]
      
      Some VF drivers use the upper byte of "param1" (the qp count field)
      in mlx4_qp_reserve_range() to pass flags which are used to optimize
      the range allocation.
      
      Under the current code, if any of these flags are set, the 32-bit
      count field yields a count greater than 2^24, which is out of range,
      and this VF fails.
      
      As these flags represent a "best-effort" allocation hint anyway, they may
      safely be ignored. Therefore, the PF driver may simply mask out the bits.
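
      Masking out the upper byte amounts to the following. This is a minimal
      sketch: the macro and helper names are hypothetical, and only the
      24-bit width of the count comes from the description above.

```c
#include <stdint.h>

/* Low 24 bits carry the QP count; the upper byte carries hint flags. */
#define QP_COUNT_MASK 0x00ffffffu

/* Hypothetical helper: extract the count and ignore any flag bits. */
static inline uint32_t qp_count_from_param(uint32_t param1)
{
    return param1 & QP_COUNT_MASK;
}
```

      A value such as 0x80000008 (one flag bit set, count 8) then yields a
      count of 8 rather than an out-of-range number.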
      
      Fixes: c82e9aa0 "mlx4_core: resource tracking for HCA resources used by guests"
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • tg3: fix ring init when there are more TX than RX channels · f9a72cb1
      Thadeu Lima de Souza Cascardo authored
      [ Upstream commit a620a6bc ]
      
      If TX channels are set to 4 and RX channels are set to less than 4,
      using ethtool -L, the driver will try to initialize more RX channels
      than it has allocated, causing an oops.
      
      This fix only initializes the RX ring if it has been allocated.
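
      The idea of skipping unallocated rings can be sketched as follows. The
      structure and function names are hypothetical and do not reflect the
      tg3 driver's data layout; a NULL pointer is assumed to mean "no RX
      ring was allocated for this channel".

```c
#include <stddef.h>

/* Hypothetical per-channel state. */
struct fake_channel {
    int *rx_ring;     /* NULL when no RX ring was allocated */
    size_t rx_len;    /* number of entries in rx_ring */
};

/* Initialise only the RX rings that exist; returns how many were set up. */
size_t fake_init_rings(struct fake_channel *ch, size_t nchan)
{
    size_t done = 0;

    for (size_t i = 0; i < nchan; i++) {
        if (!ch[i].rx_ring)
            continue;               /* TX-only channel: nothing to do */
        for (size_t j = 0; j < ch[i].rx_len; j++)
            ch[i].rx_ring[j] = 0;
        done++;
    }
    return done;
}
```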
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Fix race condition between vxlan_sock_add and vxlan_sock_release · 6f1ff303
      Marcelo Leitner authored
      [ Upstream commit 00c83b01 ]
      
      Currently, when trying to reuse a socket, vxlan_sock_add will grab
      vn->sock_lock, locate a reusable socket, inc refcount and release
      vn->sock_lock.
      
      But vxlan_sock_release() will first decrement refcount, and then grab
      that lock. refcnt operations are atomic, but since we currently have
      deferred works which each hold a reference on vs->refcnt, the following
      can happen, leading to a use-after-free (especially after vxlan_igmp_leave):
      
        CPU 1                            CPU 2
      
      deferred work                    vxlan_sock_add
        ...                              ...
                                         spin_lock(&vn->sock_lock)
                                         vs = vxlan_find_sock();
        vxlan_sock_release
          dec vs->refcnt, reaches 0
          spin_lock(&vn->sock_lock)
                                         vxlan_sock_hold(vs), refcnt=1
                                         spin_unlock(&vn->sock_lock)
          hlist_del_rcu(&vs->hlist);
          vxlan_notify_del_rx_port(vs)
          spin_unlock(&vn->sock_lock)
      
      So when we look for a reusable socket, we check if it wasn't freed
      already before reusing it.
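
      One common way to express "take a reference only if the object is not
      already on its way out" is an increment-unless-zero helper. The sketch
      below uses C11 atomics in userspace; the type and function names are
      hypothetical, and it is not claimed to be the exact change made in the
      vxlan code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object with an atomic reference count. */
struct sock_ref {
    atomic_int refcnt;
};

/*
 * Take a reference only if the count has not already dropped to zero.
 * Returns true when a reference was taken, false when the object is
 * being released and must not be reused.
 */
bool sock_ref_get_unless_zero(struct sock_ref *s)
{
    int old = atomic_load(&s->refcnt);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&s->refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

      A lookup that finds a socket whose count is already zero would then
      treat it as unusable instead of resurrecting it.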
      Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Fixes: 7c47cedf ("vxlan: move IGMP join/leave to work queue")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • ipv6: gre: fix wrong skb->protocol in WCCP · dafa5cc8
      Yuri Chislov authored
      [ Upstream commit be6572fd ]
      
      When using GRE redirection in WCCP, it sets the wrong skb->protocol,
      that is, ETH_P_IP instead of ETH_P_IPV6 for the encapsulated traffic.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Cc: Dmitry Kozlov <xeb@mail.ru>
      Signed-off-by: Yuri Chislov <yuri.chislov@gmail.com>
      Tested-by: Yuri Chislov <yuri.chislov@gmail.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • ip_tunnel: the lack of vti_link_ops' dellink() cause kernel panic · ce77432d
      lucien authored
      [ Upstream commit 20ea60ca ]
      
      Currently vti_link_ops does not set .dellink. For the fb tunnel device
      (ip_vti0), the net_device is therefore removed by the default .dellink,
      unregister_netdevice_queue, but the tunnel is still in the tunnel list.
      If we then add a new vti tunnel, in ip_tunnel_find():
      
              hlist_for_each_entry_rcu(t, head, hash_node) {
                      if (local == t->parms.iph.saddr &&
                          remote == t->parms.iph.daddr &&
                          link == t->parms.link &&
      ==>                 type == t->dev->type &&
                          ip_tunnel_key_match(&t->parms, flags, key))
                              break;
              }
      
      a panic occurs, because the dev of ip_tunnel *t is NULL:
      [ 3835.072977] IP: [<ffffffffa04103fd>] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
      [ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
      [ 3835.073008] Oops: 0000 [#1] SMP
      .....
      [ 3835.073008] Stack:
      [ 3835.073008]  ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
      [ 3835.073008]  ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
      [ 3835.073008]  ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
      [ 3835.073008] Call Trace:
      [ 3835.073008]  [<ffffffffa0411924>] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
      [ 3835.073008]  [<ffffffffa040b2e3>] vti_newlink+0x43/0x70 [ip_vti]
      [ 3835.073008]  [<ffffffff8150d4da>] rtnl_newlink+0x4fa/0x5f0
      [ 3835.073008]  [<ffffffff812f68bb>] ? nla_strlcpy+0x5b/0x70
      [ 3835.073008]  [<ffffffff81508fb0>] ? rtnl_link_ops_get+0x40/0x60
      [ 3835.073008]  [<ffffffff8150d11f>] ? rtnl_newlink+0x13f/0x5f0
      [ 3835.073008]  [<ffffffff81509cf4>] rtnetlink_rcv_msg+0xa4/0x270
      [ 3835.073008]  [<ffffffff8126adf5>] ? sock_has_perm+0x75/0x90
      [ 3835.073008]  [<ffffffff81509c50>] ? rtnetlink_rcv+0x30/0x30
      [ 3835.073008]  [<ffffffff81529e39>] netlink_rcv_skb+0xa9/0xc0
      [ 3835.073008]  [<ffffffff81509c48>] rtnetlink_rcv+0x28/0x30
      ....
      
      modprobe ip_vti
      ip link del ip_vti0 type vti
      ip link add ip_vti0 type vti
      rmmod ip_vti
      
      Run the commands above one or more times and the kernel will panic.
      
      Fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, which
      skips unregistering the fb tunnel device. Do the same for ip6_vti.
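
      The behaviour described for the dellink handler can be modelled in a
      few lines. This is a userspace sketch with hypothetical types and
      names; it only mirrors the "leave the fallback device alone" rule
      stated above and is not the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical device state for the model. */
struct fake_dev {
    bool registered;        /* net_device still registered */
    bool in_tunnel_list;    /* tunnel still linked in the lookup list */
};

/* Hypothetical per-namespace tunnel state. */
struct fake_tunnel_net {
    struct fake_dev *fb_tunnel_dev;   /* the fallback (fb) device */
};

/* Delete a tunnel device unless it is the fallback device. */
void fake_tunnel_dellink(struct fake_tunnel_net *itn, struct fake_dev *dev)
{
    if (itn->fb_tunnel_dev == dev)
        return;                       /* fb device stays registered and listed */

    dev->in_tunnel_list = false;      /* unlink first ... */
    dev->registered = false;          /* ... then unregister */
}
```

      Keeping the list entry and the device in step is what prevents a later
      lookup from finding a tunnel whose dev pointer is no longer valid.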
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • drm/radeon: initialize sadb to NULL in the audio code · ec608097
      Alex Deucher authored
      commit 83d04c39 upstream.
      
      Fixes kfree of the sadb buffer when it's NULL.
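
      Why the initialisation matters can be shown with a userspace analogue,
      where free() plays the role of kfree(). Every name below is
      hypothetical; the sketch assumes a lookup that may fail without ever
      writing to its output pointer.

```c
#include <stdlib.h>

/* Hypothetical audio descriptor. */
struct fake_sad {
    int channels;
};

/* Hypothetical lookup: on failure it leaves *out untouched. */
static int fake_get_sadb(int fail, struct fake_sad **out)
{
    if (fail)
        return -1;
    *out = calloc(1, sizeof(**out));
    return *out ? 1 : -1;
}

/* Returns 0 on success, -1 on failure; never frees a garbage pointer. */
int fake_write_sad_regs(int fail)
{
    struct fake_sad *sadb = NULL;   /* initialised so cleanup is always safe */
    int ret = 0;

    if (fake_get_sadb(fail, &sadb) < 0)
        ret = -1;

    /* ... program registers from sadb on success (omitted) ... */

    free(sadb);                     /* free(NULL) is a no-op */
    return ret;
}
```

      Without the NULL initialiser, the failure path would hand an
      indeterminate pointer to the deallocator.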
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      BugLink: http://bugs.launchpad.net/bugs/1402714
      (backported from commit 83d04c39)
      Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
  2. 09 Dec, 2014 32 commits