1. 22 Mar, 2017 18 commits
    • Eric Dumazet's avatar
      tcp/dccp: block BH for SYN processing · da2da823
      Eric Dumazet authored
      [ Upstream commit 449809a6 ]
      
      SYN processing really was meant to be handled from BH.
      
      When I got rid of BH blocking while processing socket backlog
      in commit 5413d1ba ("net: do not block BH while processing socket
      backlog"), I forgot that a malicious user could transition to TCP_LISTEN
      from a state that allowed (SYN) packets to be parked in the socket
      backlog while socket is owned by the thread doing the listen() call.
      
      Sure enough syzkaller found this and reported the bug ;)
      
      =================================
      [ INFO: inconsistent lock state ]
      4.10.0+ #60 Not tainted
      ---------------------------------
      inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
      syz-executor0/5090 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (&(&hashinfo->ehash_locks[i])->rlock){+.?...}, at:
      [<ffffffff83a6a370>] spin_lock include/linux/spinlock.h:299 [inline]
       (&(&hashinfo->ehash_locks[i])->rlock){+.?...}, at:
      [<ffffffff83a6a370>] inet_ehash_insert+0x240/0xad0
      net/ipv4/inet_hashtables.c:407
      {IN-SOFTIRQ-W} state was registered at:
        mark_irqflags kernel/locking/lockdep.c:2923 [inline]
        __lock_acquire+0xbcf/0x3270 kernel/locking/lockdep.c:3295
        lock_acquire+0x241/0x580 kernel/locking/lockdep.c:3753
        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
        _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
        spin_lock include/linux/spinlock.h:299 [inline]
        inet_ehash_insert+0x240/0xad0 net/ipv4/inet_hashtables.c:407
        reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:753 [inline]
        inet_csk_reqsk_queue_hash_add+0x1b7/0x2a0 net/ipv4/inet_connection_sock.c:764
        tcp_conn_request+0x25cc/0x3310 net/ipv4/tcp_input.c:6399
        tcp_v4_conn_request+0x157/0x220 net/ipv4/tcp_ipv4.c:1262
        tcp_rcv_state_process+0x802/0x4130 net/ipv4/tcp_input.c:5889
        tcp_v4_do_rcv+0x56b/0x940 net/ipv4/tcp_ipv4.c:1433
        tcp_v4_rcv+0x2e12/0x3210 net/ipv4/tcp_ipv4.c:1711
        ip_local_deliver_finish+0x4ce/0xc40 net/ipv4/ip_input.c:216
        NF_HOOK include/linux/netfilter.h:257 [inline]
        ip_local_deliver+0x1ce/0x710 net/ipv4/ip_input.c:257
        dst_input include/net/dst.h:492 [inline]
        ip_rcv_finish+0xb1d/0x2110 net/ipv4/ip_input.c:396
        NF_HOOK include/linux/netfilter.h:257 [inline]
        ip_rcv+0xd90/0x19c0 net/ipv4/ip_input.c:487
        __netif_receive_skb_core+0x1ad1/0x3400 net/core/dev.c:4179
        __netif_receive_skb+0x2a/0x170 net/core/dev.c:4217
        netif_receive_skb_internal+0x1d6/0x430 net/core/dev.c:4245
        napi_skb_finish net/core/dev.c:4602 [inline]
        napi_gro_receive+0x4e6/0x680 net/core/dev.c:4636
        e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4033 [inline]
        e1000_clean_rx_irq+0x5e0/0x1490
      drivers/net/ethernet/intel/e1000/e1000_main.c:4489
        e1000_clean+0xb9a/0x2910 drivers/net/ethernet/intel/e1000/e1000_main.c:3834
        napi_poll net/core/dev.c:5171 [inline]
        net_rx_action+0xe70/0x1900 net/core/dev.c:5236
        __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
        invoke_softirq kernel/softirq.c:364 [inline]
        irq_exit+0x19e/0x1d0 kernel/softirq.c:405
        exiting_irq arch/x86/include/asm/apic.h:658 [inline]
        do_IRQ+0x81/0x1a0 arch/x86/kernel/irq.c:250
        ret_from_intr+0x0/0x20
        native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
        arch_safe_halt arch/x86/include/asm/paravirt.h:98 [inline]
        default_idle+0x8f/0x410 arch/x86/kernel/process.c:271
        arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:262
        default_idle_call+0x36/0x60 kernel/sched/idle.c:96
        cpuidle_idle_call kernel/sched/idle.c:154 [inline]
        do_idle+0x348/0x440 kernel/sched/idle.c:243
        cpu_startup_entry+0x18/0x20 kernel/sched/idle.c:345
        start_secondary+0x344/0x440 arch/x86/kernel/smpboot.c:272
        verify_cpu+0x0/0xfc
      irq event stamp: 1741
      hardirqs last  enabled at (1741): [<ffffffff84d49d77>]
      __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160
      [inline]
      hardirqs last  enabled at (1741): [<ffffffff84d49d77>]
      _raw_spin_unlock_irqrestore+0xf7/0x1a0 kernel/locking/spinlock.c:191
      hardirqs last disabled at (1740): [<ffffffff84d4a732>]
      __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
      hardirqs last disabled at (1740): [<ffffffff84d4a732>]
      _raw_spin_lock_irqsave+0xa2/0x110 kernel/locking/spinlock.c:159
      softirqs last  enabled at (1738): [<ffffffff84d4deff>]
      __do_softirq+0x7cf/0xb7d kernel/softirq.c:310
      softirqs last disabled at (1571): [<ffffffff84d4b92c>]
      do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&hashinfo->ehash_locks[i])->rlock);
        <Interrupt>
          lock(&(&hashinfo->ehash_locks[i])->rlock);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor0/5090:
       #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83406b43>] lock_sock
      include/net/sock.h:1460 [inline]
       #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83406b43>]
      sock_setsockopt+0x233/0x1e40 net/core/sock.c:683
      
      stack backtrace:
      CPU: 1 PID: 5090 Comm: syz-executor0 Not tainted 4.10.0+ #60
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x292/0x398 lib/dump_stack.c:51
       print_usage_bug+0x3ef/0x450 kernel/locking/lockdep.c:2387
       valid_state kernel/locking/lockdep.c:2400 [inline]
       mark_lock_irq kernel/locking/lockdep.c:2602 [inline]
       mark_lock+0xf30/0x1410 kernel/locking/lockdep.c:3065
       mark_irqflags kernel/locking/lockdep.c:2941 [inline]
       __lock_acquire+0x6dc/0x3270 kernel/locking/lockdep.c:3295
       lock_acquire+0x241/0x580 kernel/locking/lockdep.c:3753
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:299 [inline]
       inet_ehash_insert+0x240/0xad0 net/ipv4/inet_hashtables.c:407
       reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:753 [inline]
       inet_csk_reqsk_queue_hash_add+0x1b7/0x2a0 net/ipv4/inet_connection_sock.c:764
       dccp_v6_conn_request+0xada/0x11b0 net/dccp/ipv6.c:380
       dccp_rcv_state_process+0x51e/0x1660 net/dccp/input.c:606
       dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
       sk_backlog_rcv include/net/sock.h:896 [inline]
       __release_sock+0x127/0x3a0 net/core/sock.c:2052
       release_sock+0xa5/0x2b0 net/core/sock.c:2539
       sock_setsockopt+0x60f/0x1e40 net/core/sock.c:1016
       SYSC_setsockopt net/socket.c:1782 [inline]
       SyS_setsockopt+0x2fb/0x3a0 net/socket.c:1765
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x4458b9
      RSP: 002b:00007fe8b26c2b58 EFLAGS: 00000292 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00000000004458b9
      RDX: 000000000000001a RSI: 0000000000000001 RDI: 0000000000000006
      RBP: 00000000006e2110 R08: 0000000000000010 R09: 0000000000000000
      R10: 00000000208c3000 R11: 0000000000000292 R12: 0000000000708000
      R13: 0000000020000000 R14: 0000000000001000 R15: 0000000000000000
      
      Fixes: 5413d1ba ("net: do not block BH while processing socket backlog")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da2da823
    • Ido Schimmel's avatar
      mlxsw: spectrum_router: Avoid potential packets loss · 8f05976c
      Ido Schimmel authored
      [ Upstream commit f7df4923 ]
      
      When the structure of the LPM tree changes (f.e., due to the addition of
      a new prefix), we unbind the old tree and then bind the new one. This
      may result in temporary packet loss.
      
      Instead, overwrite the old binding with the new one.
      
      Fixes: 6b75c480 ("mlxsw: spectrum_router: Add virtual router management")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f05976c
    • Jakub Kicinski's avatar
      geneve: lock RCU on TX path · 40f9f783
      Jakub Kicinski authored
      [ Upstream commit a717e3f7 ]
      
      There is no guarantees that callers of the TX path will hold
      the RCU lock.  Grab it explicitly.
      
      Fixes: fceb9c3e ("geneve: avoid using stale geneve socket.")
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      40f9f783
    • Jakub Kicinski's avatar
      vxlan: lock RCU on TX path · d6705c8c
      Jakub Kicinski authored
      [ Upstream commit 56de859e ]
      
      There is no guarantees that callers of the TX path will hold
      the RCU lock.  Grab it explicitly.
      
      Fixes: c6fcc4fc ("vxlan: avoid using stale vxlan socket.")
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6705c8c
    • Paul Hüber's avatar
      l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv · 4c94beba
      Paul Hüber authored
      [ Upstream commit 51fb60eb ]
      
      l2tp_ip_backlog_recv may not return -1 if the packet gets dropped.
      The return value is passed up to ip_local_deliver_finish, which treats
      negative values as an IP protocol number for resubmission.
      Signed-off-by: default avatarPaul Hüber <phueber@kernsp.in>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c94beba
    • Roman Mashak's avatar
      net sched actions: decrement module reference count after table flush. · 639fdd96
      Roman Mashak authored
      [ Upstream commit edb9d1bf ]
      
      When tc actions are loaded as a module and no actions have been installed,
      flushing them would result in actions removed from the memory, but modules
      reference count not being decremented, so that the modules would not be
      unloaded.
      
      Following is example with GACT action:
      
      % sudo modprobe act_gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  0
      %
      % sudo tc actions ls action gact
      %
      % sudo tc actions flush action gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  1
      % sudo tc actions flush action gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  2
      % sudo rmmod act_gact
      rmmod: ERROR: Module act_gact is in use
      ....
      
      After the fix:
      % lsmod
      Module                  Size  Used by
      act_gact               16384  0
      %
      % sudo tc actions add action pass index 1
      % sudo tc actions add action pass index 2
      % sudo tc actions add action pass index 3
      % lsmod
      Module                  Size  Used by
      act_gact               16384  3
      %
      % sudo tc actions flush action gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  0
      %
      % sudo tc actions flush action gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  0
      % sudo rmmod act_gact
      % lsmod
      Module                  Size  Used by
      %
      
      Fixes: f97017cd ("net-sched: Fix actions flushing")
      Signed-off-by: default avatarRoman Mashak <mrv@mojatatu.com>
      Signed-off-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      639fdd96
    • Xin Long's avatar
      sctp: set sin_port for addr param when checking duplicate address · 467bec36
      Xin Long authored
      [ Upstream commit 2e3ce5bc ]
      
      Commit b8607805 ("sctp: not copying duplicate addrs to the assoc's
      bind address list") tried to check for duplicate address before copying
      to asoc's bind_addr list from global addr list.
      
      But all the addrs' sin_ports in global addr list are 0 while the addrs'
      sin_ports are bp->port in asoc's bind_addr list. It means even if it's
      a duplicate address, af->cmp_addr will still return 0 as the their
      sin_ports are different.
      
      This patch is to fix it by setting the sin_port for addr param with
      bp->port before comparing the addrs.
      
      Fixes: b8607805 ("sctp: not copying duplicate addrs to the assoc's bind address list")
      Reported-by: default avatarWei Chen <weichen@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      467bec36
    • Julian Anastasov's avatar
      ipv4: mask tos for input route · 91f4f5bf
      Julian Anastasov authored
      [ Upstream commit 6e28099d ]
      
      Restore the lost masking of TOS in input route code to
      allow ip rules to match it properly.
      
      Problem [1] noticed by Shmulik Ladkani <shmulik.ladkani@gmail.com>
      
      [1] http://marc.info/?t=137331755300040&r=1&w=2
      
      Fixes: 89aef892 ("ipv4: Delete routing cache.")
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      91f4f5bf
    • Julian Anastasov's avatar
      ipv4: add missing initialization for flowi4_uid · 0a33d62a
      Julian Anastasov authored
      [ Upstream commit 8bcfd092 ]
      
      Avoid matching of random stack value for uid when rules
      are looked up on input route or when RP filter is used.
      Problem should affect only setups that use ip rules with
      uid range.
      
      Fixes: 622ec2c9 ("net: core: add UID to flows, rules, and routes")
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a33d62a
    • Brian Russell's avatar
      vxlan: don't allow overwrite of config src addr · 2b5a48d6
      Brian Russell authored
      [ Upstream commit 1158632b ]
      
      When using IPv6 transport and a default dst, a pointer to the configured
      source address is passed into the route lookup. If no source address is
      configured, then the value is overwritten.
      
      IPv6 route lookup ignores egress ifindex match if the source address is set,
      so if egress ifindex match is desired, the source address must be passed
      as any. The overwrite breaks this for subsequent lookups.
      
      Avoid this by copying the configured address to an existing stack variable
      and pass a pointer to that instead.
      
      Fixes: 272d96a5 ("net: vxlan: lwt: Use source ip address during route lookup.")
      Signed-off-by: default avatarBrian Russell <brussell@brocade.com>
      Acked-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b5a48d6
    • David Forster's avatar
      vti6: return GRE_KEY for vti6 · fef3f97a
      David Forster authored
      [ Upstream commit 7dcdf941 ]
      
      Align vti6 with vti by returning GRE_KEY flag. This enables iproute2
      to display tunnel keys on "ip -6 tunnel show"
      Signed-off-by: default avatarDavid Forster <dforster@brocade.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fef3f97a
    • Matthias Schiffer's avatar
      vxlan: correctly validate VXLAN ID against VXLAN_N_VID · 36ec2150
      Matthias Schiffer authored
      [ Upstream commit 4e37d691 ]
      
      The incorrect check caused an off-by-one error: the maximum VID 0xffffff
      was unusable.
      
      Fixes: d342894c ("vxlan: virtual extensible lan")
      Signed-off-by: default avatarMatthias Schiffer <mschiffer@universe-factory.net>
      Acked-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36ec2150
    • Marcelo Ricardo Leitner's avatar
      sctp: deny peeloff operation on asocs with threads sleeping on it · f4487753
      Marcelo Ricardo Leitner authored
      [ Upstream commit dfcb9f4f ]
      
      commit 2dcab598 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
      attempted to avoid a BUG_ON call when the association being used for a
      sendmsg() is blocked waiting for more sndbuf and another thread did a
      peeloff operation on such asoc, moving it to another socket.
      
      As Ben Hutchings noticed, then in such case it would return without
      locking back the socket and would cause two unlocks in a row.
      
      Further analysis also revealed that it could allow a double free if the
      application managed to peeloff the asoc that is created during the
      sendmsg call, because then sctp_sendmsg() would try to free the asoc
      that was created only for that call.
      
      This patch takes another approach. It will deny the peeloff operation
      if there is a thread sleeping on the asoc, so this situation doesn't
      exist anymore. This avoids the issues described above and also honors
      the syscalls that are already being handled (it can be multiple sendmsg
      calls).
      
      Joint work with Xin Long.
      
      Fixes: 2dcab598 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4487753
    • Tariq Toukan's avatar
      net/mlx5e: Fix wrong CQE decompression · 55bb0dd0
      Tariq Toukan authored
      [ Upstream commit 36154be4 ]
      
      In cqe compression with striding RQ, the decompression of the CQE field
      wqe_counter was done with a wrong wraparound value.
      This caused handling cqes with a wrong pointer to wqe (rx descriptor)
      and creating SKBs with wrong data, pointing to wrong (and already consumed)
      strides/pages.
      
      The meaning of the CQE field wqe_counter in striding RQ holds the
      stride index instead of the WQE index. Hence, when decompressing
      a CQE, wqe_counter should have wrapped-around the number of strides
      in a single multi-packet WQE.
      
      We dropped this wrap-around mask at all in CQE decompression of striding
      RQ. It is not needed as in such cases the CQE compression session would
      break because of different value of wqe_id field, starting a new
      compression session.
      
      Tested:
       ethtool -K ethxx lro off/on
       ethtool --set-priv-flags ethxx rx_cqe_compress on
       super_netperf 16 {ipv4,ipv6} -t TCP_STREAM -m 50 -D
       verified no csum errors and no page refcount issues.
      
      Fixes: 7219ab34 ("net/mlx5e: CQE compression")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Reported-by: default avatarTom Herbert <tom@herbertland.com>
      Cc: kernel-team@fb.com
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55bb0dd0
    • Saeed Mahameed's avatar
      net/mlx5e: Update MPWQE stride size when modifying CQE compress state · c0dc4855
      Saeed Mahameed authored
      [ Upstream commit 6dc4b54e ]
      
      When the admin enables/disables cqe compression, updating
      mpwqe stride size is required:
          CQE compress ON  ==> stride size = 256B
          CQE compress OFF ==> stride size = 64B
      
      This is already done on driver load via mlx5e_set_rq_type_params, all we
      need is just to call it on arbitrary admin changes of cqe compression
      state via priv flags or when changing timestamping state
      (as it is mutually exclusive with cqe compression).
      
      This bug introduces no functional damage, it only makes cqe compression
      occur less often, since in ConnectX4-LX CQE compression is performed
      only on packets smaller than stride size.
      
      Tested:
       ethtool --set-priv-flags ethxx rx_cqe_compress on
       pktgen with  64 < pkt size < 256 and netperf TCP_STREAM (IPv4/IPv6)
       verify `ethtool -S ethxx | grep compress` are advancing more often
       (rapidly)
      
      Fixes: 7219ab34 ("net/mlx5e: CQE compression")
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0dc4855
    • Tariq Toukan's avatar
      net/mlx5e: Fix broken CQE compression initialization · c34c1786
      Tariq Toukan authored
      [ Upstream commit b0d4660b ]
      
      Some of RQ type parameters are derived from CQE compression state flag,
      CQE compression flag was initialized only after RQ type parameters
      setup. This leads to load RQ with stride size smaller than what we
      want for when CQE compression is on.
      
      This bug introduces no functional damage, it only makes CQE compression
      occur less often, since in ConnectX4-LX CQE compression is performed
      only on packets smaller than stride size.
      
      Fix this by marking default status of CQE compression in PFLAG prior to
      calling mlx5e_set_rq_priv_params(), as it inits some fields based on it.
      
      Tested:
       load driver on systems where rx CQE compress will be on (MH)
       pktgen with  64 < pkt size < 256 and netperf TCP_STREAM (IPv4/IPv6)
       verify `ethtool -S ethxx | grep compress` are advancing more often
       (rapidly)
      
      Fixes: 2fc4bfb7 ("net/mlx5e: Dynamic RQ type infrastructure")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c34c1786
    • Tariq Toukan's avatar
      net/mlx5e: Do not reduce LRO WQE size when not using build_skb · 850a1bfb
      Tariq Toukan authored
      [ Upstream commit 4078e637 ]
      
      When rq_type is Striding RQ, no room of SKB_RESERVE is needed
      as SKB allocation is not done via build_skb.
      
      Fixes: e4b85508 ("net/mlx5e: Slightly reduce hardware LRO size")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      850a1bfb
    • Saeed Mahameed's avatar
      net/mlx5e: Register/unregister vport representors on interface attach/detach · 96b457b8
      Saeed Mahameed authored
      [ Upstream commit 6f08a22c ]
      
      Currently vport representors are added only on driver load and removed on
      driver unload.  Apparently we forgot to handle them when we added the
      seamless reset flow feature.  This caused to leave the representors
      netdevs alive and active with open HW resources on pci shutdown and on
      error reset flows.
      
      To overcome this we move their handling to interface attach/detach, so
      they would be cleaned up on shutdown and recreated on reset flows.
      
      Fixes: 26e59d80 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: default avatarHadar Hen Zion <hadarh@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96b457b8
  2. 18 Mar, 2017 22 commits