1. 15 Jan, 2014 11 commits
    • Jason Wang's avatar
      netvsc: don't flush peers notifying work during setting mtu · 0248d4b4
      Jason Wang authored
      [ Upstream commit 50dc875f ]
      
      There's a possible deadlock if we flush the peers notifying work during setting
      mtu:
      
      [   22.991149] ======================================================
      [   22.991173] [ INFO: possible circular locking dependency detected ]
      [   22.991198] 3.10.0-54.0.1.el7.x86_64.debug #1 Not tainted
      [   22.991219] -------------------------------------------------------
      [   22.991243] ip/974 is trying to acquire lock:
      [   22.991261]  ((&(&net_device_ctx->dwork)->work)){+.+.+.}, at: [<ffffffff8108af95>] flush_work+0x5/0x2e0
      [   22.991307]
      but task is already holding lock:
      [   22.991330]  (rtnl_mutex){+.+.+.}, at: [<ffffffff81539deb>] rtnetlink_rcv+0x1b/0x40
      [   22.991367]
      which lock already depends on the new lock.
      
      [   22.991398]
      the existing dependency chain (in reverse order) is:
      [   22.991426]
      -> #1 (rtnl_mutex){+.+.+.}:
      [   22.991449]        [<ffffffff810dfdd9>] __lock_acquire+0xb19/0x1260
      [   22.991477]        [<ffffffff810e0d12>] lock_acquire+0xa2/0x1f0
      [   22.991501]        [<ffffffff81673659>] mutex_lock_nested+0x89/0x4f0
      [   22.991529]        [<ffffffff815392b7>] rtnl_lock+0x17/0x20
      [   22.991552]        [<ffffffff815230b2>] netdev_notify_peers+0x12/0x30
      [   22.991579]        [<ffffffffa0340212>] netvsc_send_garp+0x22/0x30 [hv_netvsc]
      [   22.991610]        [<ffffffff8108d251>] process_one_work+0x211/0x6e0
      [   22.991637]        [<ffffffff8108d83b>] worker_thread+0x11b/0x3a0
      [   22.991663]        [<ffffffff81095e5d>] kthread+0xed/0x100
      [   22.991686]        [<ffffffff81681c6c>] ret_from_fork+0x7c/0xb0
      [   22.991715]
      -> #0 ((&(&net_device_ctx->dwork)->work)){+.+.+.}:
      [   22.991715]        [<ffffffff810de817>] check_prevs_add+0x967/0x970
      [   22.991715]        [<ffffffff810dfdd9>] __lock_acquire+0xb19/0x1260
      [   22.991715]        [<ffffffff810e0d12>] lock_acquire+0xa2/0x1f0
      [   22.991715]        [<ffffffff8108afde>] flush_work+0x4e/0x2e0
      [   22.991715]        [<ffffffff8108e1b5>] __cancel_work_timer+0x95/0x130
      [   22.991715]        [<ffffffff8108e303>] cancel_delayed_work_sync+0x13/0x20
      [   22.991715]        [<ffffffffa03404e4>] netvsc_change_mtu+0x84/0x200 [hv_netvsc]
      [   22.991715]        [<ffffffff815233d4>] dev_set_mtu+0x34/0x80
      [   22.991715]        [<ffffffff8153bc2a>] do_setlink+0x23a/0xa00
      [   22.991715]        [<ffffffff8153d054>] rtnl_newlink+0x394/0x5e0
      [   22.991715]        [<ffffffff81539eac>] rtnetlink_rcv_msg+0x9c/0x260
      [   22.991715]        [<ffffffff8155cdd9>] netlink_rcv_skb+0xa9/0xc0
      [   22.991715]        [<ffffffff81539dfa>] rtnetlink_rcv+0x2a/0x40
      [   22.991715]        [<ffffffff8155c41d>] netlink_unicast+0xdd/0x190
      [   22.991715]        [<ffffffff8155c807>] netlink_sendmsg+0x337/0x750
      [   22.991715]        [<ffffffff8150d219>] sock_sendmsg+0x99/0xd0
      [   22.991715]        [<ffffffff8150d63e>] ___sys_sendmsg+0x39e/0x3b0
      [   22.991715]        [<ffffffff8150eba2>] __sys_sendmsg+0x42/0x80
      [   22.991715]        [<ffffffff8150ebf2>] SyS_sendmsg+0x12/0x20
      [   22.991715]        [<ffffffff81681d19>] system_call_fastpath+0x16/0x1b
      
      This is because we hold the rtnl_lock() before ndo_change_mtu() and try to flush
      the work in netvsc_change_mtu(), in the mean time, netdev_notify_peers() may be
      called from worker and also trying to hold the rtnl_lock. This will lead the
      flush won't succeed forever. Solve this by not canceling and flushing the work,
      this is safe because the transmission done by NETDEV_NOTIFY_PEERS was
      synchronized with the netif_tx_disable() called by netvsc_change_mtu().
      Reported-by: default avatarYaju Cao <yacao@redhat.com>
      Tested-by: default avatarYaju Cao <yacao@redhat.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0248d4b4
    • Nat Gurumoorthy's avatar
      tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0 · e5893b25
      Nat Gurumoorthy authored
      [ Upstream commit 388d3335 ]
      
      The new tg3 driver leaves REG_BASE_ADDR (PCI config offset 120)
      uninitialized. From power on reset this register may have garbage in it. The
      Register Base Address register defines the device local address of a
      register. The data pointed to by this location is read or written using
      the Register Data register (PCI config offset 128). When REG_BASE_ADDR has
      garbage any read or write of Register Data Register (PCI 128) will cause the
      PCI bus to lock up. The TCO watchdog will fire and bring down the system.
      Signed-off-by: default avatarNat Gurumoorthy <natg@google.com>
      Acked-by: default avatarMichael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e5893b25
    • Sasha Levin's avatar
      net: unix: allow set_peek_off to fail · e387172f
      Sasha Levin authored
      [ Upstream commit 12663bfc ]
      
      unix_dgram_recvmsg() will hold the readlock of the socket until recv
      is complete.
      
      In the same time, we may try to setsockopt(SO_PEEK_OFF) which will hang until
      unix_dgram_recvmsg() will complete (which can take a while) without allowing
      us to break out of it, triggering a hung task spew.
      
      Instead, allow set_peek_off to fail, this way userspace will not hang.
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e387172f
    • Changli Gao's avatar
      net: drop_monitor: fix the value of maxattr · fcbb1132
      Changli Gao authored
      [ Upstream commit d323e92c ]
      
      maxattr in genl_family should be used to save the max attribute
      type, but not the max command type. Drop monitor doesn't support
      any attributes, so we should leave it as zero.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fcbb1132
    • Hannes Frederic Sowa's avatar
      ipv6: don't count addrconf generated routes against gc limit · 30e65cd5
      Hannes Frederic Sowa authored
      [ Upstream commit a3300ef4 ]
      
      Brett Ciphery reported that new ipv6 addresses failed to get installed
      because the addrconf generated dsts where counted against the dst gc
      limit. We don't need to count those routes like we currently don't count
      administratively added routes.
      
      Because the max_addresses check enforces a limit on unbounded address
      generation first in case someone plays with router advertisments, we
      are still safe here.
      Reported-by: default avatarBrett Ciphery <brett.ciphery@windriver.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      30e65cd5
    • Jason Wang's avatar
      macvtap: signal truncated packets · ae8d56de
      Jason Wang authored
      [ Upstream commit ce232ce0 ]
      
      macvtap_put_user() never return a value grater than iov length, this in fact
      bypasses the truncated checking in macvtap_recvmsg(). Fix this by always
      returning the size of packet plus the possible vlan header to let the trunca
      checking work.
      
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae8d56de
    • Zhi Yong Wu's avatar
      tun: update file current position · 693d5d27
      Zhi Yong Wu authored
      [ Upstream commit d0b7da8a ]
      Signed-off-by: default avatarZhi Yong Wu <wuzhy@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      693d5d27
    • Zhi Yong Wu's avatar
      macvtap: update file current position · d717ca46
      Zhi Yong Wu authored
      [ Upstream commit e6ebc7f1 ]
      Signed-off-by: default avatarZhi Yong Wu <wuzhy@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d717ca46
    • Vlad Yasevich's avatar
      macvtap: Do not double-count received packets · b4c3eac4
      Vlad Yasevich authored
      [ Upstream commit 006da7b0 ]
      
      Currently macvlan will count received packets after calling each
      vlans receive handler.   Macvtap attempts to count the packet
      yet again when the user reads the packet from the tap socket.
      This code doesn't do this consistently either.  Remove the
      counting from macvtap and let only macvlan count received
      packets.
      Signed-off-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4c3eac4
    • Venkat Venkatsubra's avatar
      rds: prevent BUG_ON triggered on congestion update to loopback · 468c82ca
      Venkat Venkatsubra authored
      [ Upstream commit 18fc25c9 ]
      
      After congestion update on a local connection, when rds_ib_xmit returns
      less bytes than that are there in the message, rds_send_xmit calls
      back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE)
      to trigger.
      
      For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually
      the message contains 8240 bytes. rds_send_xmit thinks there is more to send
      and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header)
      =4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].
      
      The commit 6094628b
      "rds: prevent BUG_ON triggering on congestion map updates" introduced
      this regression. That change was addressing the triggering of a different
      BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE:
       	BUG_ON(ret != 0 &&
          		 conn->c_xmit_sg == rm->data.op_nents);
      This was the sequence it was going through:
      (rds_ib_xmit)
      /* Do not send cong updates to IB loopback */
      if (conn->c_loopback
         && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
        	rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
          	return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
      }
      rds_ib_xmit returns 8240
      rds_send_xmit:
        c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
         		 = 8192
        c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
      rds_ib_xmit returns 8240
      rds_send_xmit:
        c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
        and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
      rds_ib_xmit returns 8240
      On this iteration this sequence causes the BUG_ON in rds_send_xmit:
          while (ret) {
          	tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
          	[tmp = 65536 - 57632 = 7904]
          	conn->c_xmit_data_off += tmp;
          	[c_xmit_data_off = 57632 + 7904 = 65536]
          	ret -= tmp;
          	[ret = 8240 - 7904 = 336]
          	if (conn->c_xmit_data_off == sg->length) {
          		conn->c_xmit_data_off = 0;
          		sg++;
          		conn->c_xmit_sg++;
          		BUG_ON(ret != 0 &&
          			conn->c_xmit_sg == rm->data.op_nents);
          		[c_xmit_sg = 1, rm->data.op_nents = 1]
      
      What the current fix does:
      Since the congestion update over loopback is not actually transmitted
      as a message, all that rds_ib_xmit needs to do is let the caller think
      the full message has been transmitted and not return partial bytes.
      It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb.
      And 64Kb+48 when page size is 64Kb.
      Reported-by: default avatarJosh Hunt <joshhunt00@gmail.com>
      Tested-by: default avatarHonggang Li <honli@redhat.com>
      Acked-by: default avatarBang Nguyen <bang.nguyen@oracle.com>
      Signed-off-by: default avatarVenkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      468c82ca
    • Eric Dumazet's avatar
      net: do not pretend FRAGLIST support · acd4b2b8
      Eric Dumazet authored
      [ Upstream commit 28e24c62 ]
      
      Few network drivers really supports frag_list : virtual drivers.
      
      Some drivers wrongly advertise NETIF_F_FRAGLIST feature.
      
      If skb with a frag_list is given to them, packet on the wire will be
      corrupt.
      
      Remove this flag, as core networking stack will make sure to
      provide packets that can be sent without corruption.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Cc: Anirudha Sarangi <anirudh@xilinx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      acd4b2b8
  2. 08 Jan, 2014 29 commits