1. 05 Oct, 2018 18 commits
    • Wei Wang's avatar
      ipv6: take rcu lock in rawv6_send_hdrinc() · a688caa3
      Wei Wang authored
      In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we
      directly assign the dst to skb and set passed in dst to NULL to avoid
      double free.
      However, in error case, we free skb and then do stats update with the
      dst pointer passed in. This causes use-after-free on the dst.
      Fix it by taking rcu read lock right before dst could get released to
      make sure dst does not get freed until the stats update is done.
      Note: we don't have this issue in ipv4 cause dst is not used for stats
      update in v4.
      
      Syzkaller reported following crash:
      BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
      BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
      Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088
      
      CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
       print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
       rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:631
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
       __sys_sendmsg+0x11d/0x280 net/socket.c:2152
       __do_sys_sendmsg net/socket.c:2161 [inline]
       __se_sys_sendmsg net/socket.c:2159 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457099
      Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099
      RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004
      RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000
      
      Allocated by task 32088:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554
       dst_alloc+0xbb/0x1d0 net/core/dst.c:105
       ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353
       ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186
       ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895
       ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093
       fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122
       ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121
       ip6_route_output include/net/ip6_route.h:88 [inline]
       ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951
       ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
       rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:631
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
       __sys_sendmsg+0x11d/0x280 net/socket.c:2152
       __do_sys_sendmsg net/socket.c:2161 [inline]
       __se_sys_sendmsg net/socket.c:2159 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 5356:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x83/0x290 mm/slab.c:3756
       dst_destroy+0x267/0x3c0 net/core/dst.c:141
       dst_destroy_rcu+0x16/0x19 net/core/dst.c:154
       __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
       rcu_do_batch kernel/rcu/tree.c:2576 [inline]
       invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
       __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
       rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
       __do_softirq+0x30b/0xad8 kernel/softirq.c:292
      
      Fixes: 1789a640 ("raw: avoid two atomics in xmit")
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a688caa3
    • David Ahern's avatar
      net: sched: Add policy validation for tc attributes · 8b4c3cdd
      David Ahern authored
      A number of TC attributes are processed without proper validation
      (e.g., length checks). Add a tca policy for all input attributes and use
      when invoking nlmsg_parse.
      
      The 2 Fixes tags below cover the latest additions. The other attributes
      are a string (KIND), nested attribute (OPTIONS which does seem to have
      validation in most cases), for dumps only or a flag.
      
      Fixes: 5bc17018 ("net: sched: introduce multichain support for filters")
      Fixes: d47a6b0e ("net: sched: introduce ingress/egress block index attributes for qdisc")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8b4c3cdd
    • Mauricio Faria de Oliveira's avatar
      rtnetlink: fix rtnl_fdb_dump() for ndmsg header · bd961c9b
      Mauricio Faria de Oliveira authored
      Currently, rtnl_fdb_dump() assumes the family header is 'struct ifinfomsg',
      which is not always true -- 'struct ndmsg' is used by iproute2 ('ip neigh').
      
      The problem is, the function bails out early if nlmsg_parse() fails, which
      does occur for iproute2 usage of 'struct ndmsg' because the payload length
      is shorter than the family header alone (as 'struct ifinfomsg' is assumed).
      
      This breaks backward compatibility with userspace -- nothing is sent back.
      
      Some examples with iproute2 and netlink library for go [1]:
      
       1) $ bridge fdb show
          33:33:00:00:00:01 dev ens3 self permanent
          01:00:5e:00:00:01 dev ens3 self permanent
          33:33:ff:15:98:30 dev ens3 self permanent
      
            This one works, as it uses 'struct ifinfomsg'.
      
            fdb_show() @ iproute2/bridge/fdb.c
              """
              .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
              ...
              if (rtnl_dump_request(&rth, RTM_GETNEIGH, [...]
              """
      
       2) $ ip --family bridge neigh
          RTNETLINK answers: Invalid argument
          Dump terminated
      
            This one fails, as it uses 'struct ndmsg'.
      
            do_show_or_flush() @ iproute2/ip/ipneigh.c
              """
              .n.nlmsg_type = RTM_GETNEIGH,
              .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
              """
      
       3) $ ./neighlist
          < no output >
      
            This one fails, as it uses 'struct ndmsg'-based.
      
            neighList() @ netlink/neigh_linux.go
              """
              req := h.newNetlinkRequest(unix.RTM_GETNEIGH, [...]
              msg := Ndmsg{
              """
      
      The actual breakage was introduced by commit 0ff50e83 ("net: rtnetlink:
      bail out from rtnl_fdb_dump() on parse error"), because nlmsg_parse() fails
      if the payload length (with the _actual_ family header) is less than the
      family header length alone (which is assumed, in parameter 'hdrlen').
      This is true in the examples above with struct ndmsg, with size and payload
      length shorter than struct ifinfomsg.
      
      However, that commit just intends to fix something under the assumption the
      family header is indeed an 'struct ifinfomsg' - by preventing access to the
      payload as such (via 'ifm' pointer) if the payload length is not sufficient
      to actually contain it.
      
      The assumption was introduced by commit 5e6d2435 ("bridge: netlink dump
      interface at par with brctl"), to support iproute2's 'bridge fdb' command
      (not 'ip neigh') which indeed uses 'struct ifinfomsg', thus is not broken.
      
      So, in order to unbreak the 'struct ndmsg' family headers and still allow
      'struct ifinfomsg' to continue to work, check for the known message sizes
      used with 'struct ndmsg' in iproute2 (with zero or one attribute which is
      not used in this function anyway) then do not parse the data as ifinfomsg.
      
      Same examples with this patch applied (or revert/before the original fix):
      
          $ bridge fdb show
          33:33:00:00:00:01 dev ens3 self permanent
          01:00:5e:00:00:01 dev ens3 self permanent
          33:33:ff:15:98:30 dev ens3 self permanent
      
          $ ip --family bridge neigh
          dev ens3 lladdr 33:33:00:00:00:01 PERMANENT
          dev ens3 lladdr 01:00:5e:00:00:01 PERMANENT
          dev ens3 lladdr 33:33:ff:15:98:30 PERMANENT
      
          $ ./neighlist
          netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0x0, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
          netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x1, 0x0, 0x5e, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
          netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0xff, 0x15, 0x98, 0x30}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
      
      Tested on mainline (v4.19-rc6) and net-next (3bd09b05).
      
      References:
      
      [1] netlink library for go (test-case)
          https://github.com/vishvananda/netlink
      
          $ cat ~/go/src/neighlist/main.go
          package main
          import ("fmt"; "syscall"; "github.com/vishvananda/netlink")
          func main() {
              neighs, _ := netlink.NeighList(0, syscall.AF_BRIDGE)
              for _, neigh := range neighs { fmt.Printf("%#v\n", neigh) }
          }
      
          $ export GOPATH=~/go
          $ go get github.com/vishvananda/netlink
          $ go build neighlist
          $ ~/go/src/neighlist/neighlist
      
      Thanks to David Ahern for suggestions to improve this patch.
      
      Fixes: 0ff50e83 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error")
      Fixes: 5e6d2435 ("bridge: netlink dump interface at par with brctl")
      Reported-by: default avatarAidan Obley <aobley@pivotal.io>
      Signed-off-by: default avatarMauricio Faria de Oliveira <mfo@canonical.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bd961c9b
    • Wenwen Wang's avatar
      yam: fix a missing-check bug · 0781168e
      Wenwen Wang authored
      In yam_ioctl(), the concrete ioctl command is firstly copied from the
      user-space buffer 'ifr->ifr_data' to 'ioctl_cmd' and checked through the
      following switch statement. If the command is not as expected, an error
      code EINVAL is returned. In the following execution the buffer
      'ifr->ifr_data' is copied again in the cases of the switch statement to
      specific data structures according to what kind of ioctl command is
      requested. However, after the second copy, no re-check is enforced on the
      newly-copied command. Given that the buffer 'ifr->ifr_data' is in the user
      space, a malicious user can race to change the command between the two
      copies. This way, the attacker can inject inconsistent data and cause
      undefined behavior.
      
      This patch adds a re-check in each case of the switch statement if there is
      a second copy in that case, to re-check whether the command obtained in the
      second copy is the same as the one in the first copy. If not, an error code
      EINVAL will be returned.
      Signed-off-by: default avatarWenwen Wang <wang6495@umn.edu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0781168e
    • Shanthosh RK's avatar
      net: bpfilter: Fix type cast and pointer warnings · 33aa8da1
      Shanthosh RK authored
      Fixes the following Sparse warnings:
      
      net/bpfilter/bpfilter_kern.c:62:21: warning: cast removes address space
      of expression
      net/bpfilter/bpfilter_kern.c:101:49: warning: Using plain integer as
      NULL pointer
      Signed-off-by: default avatarShanthosh RK <shanthosh.rk@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      33aa8da1
    • Wenwen Wang's avatar
      net: cxgb3_main: fix a missing-check bug · 2c05d888
      Wenwen Wang authored
      In cxgb_extension_ioctl(), the command of the ioctl is firstly copied from
      the user-space buffer 'useraddr' to 'cmd' and checked through the
      switch statement. If the command is not as expected, an error code
      EOPNOTSUPP is returned. In the following execution, i.e., the cases of the
      switch statement, the whole buffer of 'useraddr' is copied again to a
      specific data structure, according to what kind of command is requested.
      However, after the second copy, there is no re-check on the newly-copied
      command. Given that the buffer 'useraddr' is in the user space, a malicious
      user can race to change the command between the two copies. By doing so,
      the attacker can supply malicious data to the kernel and cause undefined
      behavior.
      
      This patch adds a re-check in each case of the switch statement if there is
      a second copy in that case, to re-check whether the command obtained in the
      second copy is the same as the one in the first copy. If not, an error code
      EINVAL is returned.
      Signed-off-by: default avatarWenwen Wang <wang6495@umn.edu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2c05d888
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · b8d5b7ce
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-10-05
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix to truncate input on ALU operations in 32 bit mode, from Jann.
      
      2) Fixes for cgroup local storage to reject reserved flags on element
         update and rejection of map allocation with zero-sized value, from Roman.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b8d5b7ce
    • Jann Horn's avatar
      bpf: 32-bit RSH verification must truncate input before the ALU op · b799207e
      Jann Horn authored
      When I wrote commit 468f6eaf ("bpf: fix 32-bit ALU op verification"), I
      assumed that, in order to emulate 64-bit arithmetic with 32-bit logic, it
      is sufficient to just truncate the output to 32 bits; and so I just moved
      the register size coercion that used to be at the start of the function to
      the end of the function.
      
      That assumption is true for almost every op, but not for 32-bit right
      shifts, because those can propagate information towards the least
      significant bit. Fix it by always truncating inputs for 32-bit ops to 32
      bits.
      
      Also get rid of the coerce_reg_to_size() after the ALU op, since that has
      no effect.
      
      Fixes: 468f6eaf ("bpf: fix 32-bit ALU op verification")
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      b799207e
    • Baruch Siach's avatar
      net: phy: phylink: fix SFP interface autodetection · 7e418375
      Baruch Siach authored
      When connecting SFP PHY to phylink use the detected interface.
      Otherwise, the link fails to come up when the configured 'phy-mode'
      differs from the SFP detected mode.
      
      Move most of phylink_connect_phy() into __phylink_connect_phy(), and
      leave phylink_connect_phy() as a wrapper. phylink_sfp_connect_phy() can
      now pass the SFP detected PHY interface to __phylink_connect_phy().
      
      This fixes 1GB SFP module link up on eth3 of the Macchiatobin board that
      is configured in the DT to "2500base-x" phy-mode.
      
      Fixes: 9525ae83 ("phylink: add phylink infrastructure")
      Suggested-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarBaruch Siach <baruch@tkos.co.il>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7e418375
    • Davide Caratti's avatar
      be2net: don't flip hw_features when VXLANs are added/deleted · 2d52527e
      Davide Caratti authored
      the be2net implementation of .ndo_tunnel_{add,del}() changes the value of
      NETIF_F_GSO_UDP_TUNNEL bit in 'features' and 'hw_features', but it forgets
      to call netdev_features_change(). Moreover, ethtool setting for that bit
      can potentially be reverted after a tunnel is added or removed.
      
      GSO already does software segmentation when 'hw_enc_features' is 0, even
      if VXLAN offload is turned on. In addition, commit 096de2f8 ("benet:
      stricter vxlan offloading check in be_features_check") avoids hardware
      segmentation of non-VXLAN tunneled packets, or VXLAN packets having wrong
      destination port. So, it's safe to avoid flipping the above feature on
      addition/deletion of VXLAN tunnels.
      
      Fixes: 630f4b70 ("be2net: Export tunnel offloads only when a VxLAN tunnel is created")
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2d52527e
    • Jianfeng Tan's avatar
      net/packet: fix packet drop as of virtio gso · 9d2f67e4
      Jianfeng Tan authored
      When we use raw socket as the vhost backend, a packet from virito with
      gso offloading information, cannot be sent out in later validaton at
      xmit path, as we did not set correct skb->protocol which is further used
      for looking up the gso function.
      
      To fix this, we set this field according to virito hdr information.
      
      Fixes: e858fae2 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
      Signed-off-by: default avatarJianfeng Tan <jianfeng.tan@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9d2f67e4
    • Florian Fainelli's avatar
      net: dsa: b53: Keep CPU port as tagged in all VLANs · ca893194
      Florian Fainelli authored
      Commit c499696e ("net: dsa: b53: Stop using dev->cpu_port
      incorrectly") was a bit too trigger happy in removing the CPU port from
      the VLAN membership because we rely on DSA to program the CPU port VLAN,
      which it does, except it does not bother itself with tagged/untagged and
      just usese untagged.
      
      Having the CPU port "follow" the user ports tagged/untagged is not great
      and does not allow for properly differentiating, so keep the CPU port
      tagged in all VLANs.
      Reported-by: default avatarGerhard Wiesinger <lists@wiesinger.com>
      Fixes: c499696e ("net: dsa: b53: Stop using dev->cpu_port incorrectly")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ca893194
    • Flavio Leitner's avatar
      openvswitch: load NAT helper · 17c357ef
      Flavio Leitner authored
      Load the respective NAT helper module if the flow uses it.
      Signed-off-by: default avatarFlavio Leitner <fbl@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      17c357ef
    • David S. Miller's avatar
      Merge branch 'bnxt_en-fixes' · 508646aa
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Misc. bug fixes.
      
      4 small bug fixes related to setting firmware message enables bits, possible
      memory leak when probe fails, and ring accouting when RDMA driver is loaded.
      
      Please queue these for -stable as well.  Thanks.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      508646aa
    • Vasundhara Volam's avatar
      bnxt_en: get the reduced max_irqs by the ones used by RDMA · c78fe058
      Vasundhara Volam authored
      When getting the max rings supported, get the reduced max_irqs
      by the ones used by RDMA.
      
      If the number MSIX is the limiting factor, this bug may cause the
      max ring count to be higher than it should be when RDMA driver is
      loaded and may result in ring allocation failures.
      
      Fixes: 30f52947 ("bnxt_en: Do not modify max IRQ count after RDMA driver requests/frees IRQs.")
      Signed-off-by: default avatarVasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c78fe058
    • Venkat Duvvuru's avatar
      bnxt_en: free hwrm resources, if driver probe fails. · a2bf74f4
      Venkat Duvvuru authored
      When the driver probe fails, all the resources that were allocated prior
      to the failure must be freed. However, hwrm dma response memory is not
      getting freed.
      
      This patch fixes the problem described above.
      
      Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
      Signed-off-by: default avatarVenkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a2bf74f4
    • Vasundhara Volam's avatar
      bnxt_en: Fix enables field in HWRM_QUEUE_COS2BW_CFG request · 5db0e096
      Vasundhara Volam authored
      In HWRM_QUEUE_COS2BW_CFG request, enables field should have the bits
      set only for the queue ids which are having the valid parameters.
      
      This causes firmware to return error when the TC to hardware CoS queue
      mapping is not 1:1 during DCBNL ETS setup.
      
      Fixes: 2e8ef77e ("bnxt_en: Add TC to hardware QoS queue mapping logic.")
      Signed-off-by: default avatarVasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5db0e096
    • Michael Chan's avatar
      bnxt_en: Fix VNIC reservations on the PF. · dbe80d44
      Michael Chan authored
      The enables bit for VNIC was set wrong when calling the HWRM_FUNC_CFG
      firmware call to reserve VNICs.  This has the effect that the firmware
      will keep a large number of VNICs for the PF, and having very few for
      VFs.  DPDK driver running on the VFs, which requires more VNICs, may not
      work properly as a result.
      
      Fixes: 674f50a5 ("bnxt_en: Implement new method to reserve rings.")
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dbe80d44
  2. 04 Oct, 2018 6 commits
    • Ido Schimmel's avatar
      team: Forbid enslaving team device to itself · 471b83bd
      Ido Schimmel authored
      team's ndo_add_slave() acquires 'team->lock' and later tries to open the
      newly enslaved device via dev_open(). This emits a 'NETDEV_UP' event
      that causes the VLAN driver to add VLAN 0 on the team device. team's
      ndo_vlan_rx_add_vid() will also try to acquire 'team->lock' and
      deadlock.
      
      Fix this by checking early at the enslavement function that a team
      device is not being enslaved to itself.
      
      A similar check was added to the bond driver in commit 09a89c21
      ("bonding: disallow enslaving a bond to itself").
      
      WARNING: possible recursive locking detected
      4.18.0-rc7+ #176 Not tainted
      --------------------------------------------
      syz-executor4/6391 is trying to acquire lock:
      (____ptrval____) (&team->lock){+.+.}, at: team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868
      
      but task is already holding lock:
      (____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&team->lock);
        lock(&team->lock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by syz-executor4/6391:
       #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
       #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4662
       #1: (____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947
      
      stack backtrace:
      CPU: 1 PID: 6391 Comm: syz-executor4 Not tainted 4.18.0-rc7+ #176
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_deadlock_bug kernel/locking/lockdep.c:1765 [inline]
       check_deadlock kernel/locking/lockdep.c:1809 [inline]
       validate_chain kernel/locking/lockdep.c:2405 [inline]
       __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435
       lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
       __mutex_lock_common kernel/locking/mutex.c:757 [inline]
       __mutex_lock+0x176/0x1820 kernel/locking/mutex.c:894
       mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
       team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868
       vlan_add_rx_filter_info+0x14a/0x1d0 net/8021q/vlan_core.c:210
       __vlan_vid_add net/8021q/vlan_core.c:278 [inline]
       vlan_vid_add+0x63e/0x9d0 net/8021q/vlan_core.c:308
       vlan_device_event.cold.12+0x2a/0x2f net/8021q/vlan.c:381
       notifier_call_chain+0x180/0x390 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735
       call_netdevice_notifiers net/core/dev.c:1753 [inline]
       dev_open+0x173/0x1b0 net/core/dev.c:1433
       team_port_add drivers/net/team/team.c:1219 [inline]
       team_add_slave+0xa8b/0x1c30 drivers/net/team/team.c:1948
       do_set_master+0x1c9/0x220 net/core/rtnetlink.c:2248
       do_setlink+0xba4/0x3e10 net/core/rtnetlink.c:2382
       rtnl_setlink+0x2a9/0x400 net/core/rtnetlink.c:2636
       rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4665
       netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2455
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4683
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0xa18/0xfd0 net/netlink/af_netlink.c:1908
       sock_sendmsg_nosec net/socket.c:642 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:652
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2126
       __sys_sendmsg+0x11d/0x290 net/socket.c:2164
       __do_sys_sendmsg net/socket.c:2173 [inline]
       __se_sys_sendmsg net/socket.c:2171 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2171
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x456b29
      Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f9706bf8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f9706bf96d4 RCX: 0000000000456b29
      RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000004
      RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000004d3548 R14: 00000000004c8227 R15: 0000000000000000
      
      Fixes: 87002b03 ("net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reported-and-tested-by: syzbot+bd051aba086537515cdb@syzkaller.appspotmail.com
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      471b83bd
    • Yu Zhao's avatar
      net/usb: cancel pending work when unbinding smsc75xx · f7b2a56e
      Yu Zhao authored
      Cancel pending work before freeing smsc75xx private data structure
      during binding. This fixes the following crash in the driver:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      IP: mutex_lock+0x2b/0x3f
      <snipped>
      Workqueue: events smsc75xx_deferred_multicast_write [smsc75xx]
      task: ffff8caa83e85700 task.stack: ffff948b80518000
      RIP: 0010:mutex_lock+0x2b/0x3f
      <snipped>
      Call Trace:
       smsc75xx_deferred_multicast_write+0x40/0x1af [smsc75xx]
       process_one_work+0x18d/0x2fc
       worker_thread+0x1a2/0x269
       ? pr_cont_work+0x58/0x58
       kthread+0xfa/0x10a
       ? pr_cont_work+0x58/0x58
       ? rcu_read_unlock_sched_notrace+0x48/0x48
       ret_from_fork+0x22/0x40
      Signed-off-by: default avatarYu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7b2a56e
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2018-10-04' of... · 9e15ff7b
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2018-10-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Just three small fixes:
       * fix use-after-free in regulatory code
       * fix rx-mgmt key flag in AP mode (mac80211)
       * fix wireless extensions compat code memory leak
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9e15ff7b
    • David S. Miller's avatar
      Merge branch 'mlxsw-fixes' · b576eddb
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Couple of fixes
      
      First patch works around an hardware issue in Spectrum-2 where a field
      indicating the event type is always set to the same value. Since there
      are only two event types and they are reported using different queues,
      we can use the queue number to derive the event type.
      
      Second patch prevents a router interface (RIF) leakage when a VLAN
      device is deleted from on top a bridge device.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b576eddb
    • Ido Schimmel's avatar
      mlxsw: spectrum: Delete RIF when VLAN device is removed · c360867e
      Ido Schimmel authored
      In commit 602b74ed ("mlxsw: spectrum_switchdev: Do not leak RIFs
      when removing bridge") I handled the case where RIFs created for VLAN
      devices were not properly cleaned up when their real device (a bridge)
      was removed.
      
      However, I forgot to handle the case of the VLAN device itself being
      removed. Do so now when the VLAN device is being unlinked from its real
      device.
      
      Fixes: 99f44bb3 ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Reported-by: default avatarArtem Shvorin <art@qrator.net>
      Tested-by: default avatarArtem Shvorin <art@qrator.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c360867e
    • Nir Dotan's avatar
      mlxsw: pci: Derive event type from event queue number · f3c84a8e
      Nir Dotan authored
      Due to a hardware issue in Spectrum-2, the field event_type of the event
      queue element (EQE) has become reserved. It was used to distinguish between
      command interface completion events and completion events.
      
      Use queue number to determine event type, as command interface completion
      events are always received on EQ0 and mlxsw driver maps completion events
      to EQ1.
      
      Fixes: c3ab4354 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
      Signed-off-by: default avatarNir Dotan <nird@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f3c84a8e
  3. 03 Oct, 2018 16 commits