1. 07 Jun, 2017 40 commits
    • Ard Biesheuvel's avatar
      drivers/tty: 8250: only call fintek_8250_probe when doing port I/O · e4bab31c
      Ard Biesheuvel authored
      commit 4c4fc909 upstream.
      
      Commit fa01e2ca ("serial: 8250: Integrate Fintek into 8250_base")
      modified the probing logic for PNP0501 devices, to remove a collision
      between the generic 16550A driver and the Fintek driver, which reused
      the same ACPI _HID.
      
      The Fintek device probe is now incorporated into the common 8250 probe
      path, and gets called for all discovered 16550A compatible devices,
      including ones that are MMIO mapped rather than IO mapped. However,
      the Fintek driver assumes the port base is a I/O address, and proceeds
      to probe some arbitrary offsets above it.
      
      This is generally a wrong thing to do, but on ARM systems (having no
      native port I/O), this may result in faulting accesses of completely
      unrelated MMIO regions in the PCI I/O space. Given that this is at
      serial probe time, this results in hard to diagnose crashes at boot.
      
      So let's restrict the Fintek probe to devices that we know are using
      port I/O in the first place.
      
      Fixes: fa01e2ca ("serial: 8250: Integrate Fintek into 8250_base")
      Suggested-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarRicardo Ribalda <ricardo.ribalda@gmail.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4bab31c
    • Johan Hovold's avatar
      serdev: fix tty-port client deregistration · 84ac7693
      Johan Hovold authored
      commit aee5da78 upstream.
      
      The port client data must be set when registering the serdev controller
      or client deregistration will fail (and the serdev devices are left
      registered and allocated) if the port was never opened in between.
      
      Make sure to clear the port client data on any probe errors to avoid a
      use-after-free when the client is later deregistered unconditionally
      (e.g. in a tty-port deregistration helper).
      
      Also move port client operation initialisation to registration. Note
      that the client ops must be restored on failed probe.
      
      Fixes: bed35c6d ("serdev: add a tty port controller driver")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Reviewed-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84ac7693
    • Johan Hovold's avatar
      Revert "tty_port: register tty ports with serdev bus" · 427fa8e3
      Johan Hovold authored
      commit d3ba126a upstream.
      
      This reverts commit 8ee3fde0.
      
      The new serdev bus hooked into the tty layer in
      tty_port_register_device() by registering a serdev controller instead of
      a tty device whenever a serdev client is present, and by deregistering
      the controller in the tty-port destructor. This is broken in several
      ways:
      
      Firstly, it leads to a NULL-pointer dereference whenever a tty driver
      later deregisters its devices as no corresponding character device will
      exist.
      
      Secondly, far from every tty driver uses tty-port refcounting (e.g.
      serial core) so the serdev devices might never be deregistered or
      deallocated.
      
      Thirdly, deregistering at tty-port destruction is too late as the
      underlying device and structures may be long gone by then. A port is not
      released before an open tty device is closed, something which a
      registered serdev client can prevent from ever happening. A driver
      callback while the device is gone typically also leads to crashes.
      
      Many tty drivers even keep their ports around until the driver is
      unloaded (e.g. serial core), something which even if a late callback
      never happens, leads to leaks if a device is unbound from its driver and
      is later rebound.
      
      The right solution here is to add a new tty_port_unregister_device()
      helper and to never call tty_device_unregister() whenever the port has
      been claimed by serdev, but since this requires modifying just about
      every tty driver (and multiple subsystems) it will need to be done
      incrementally.
      
      Reverting the offending patch is the first step in fixing the broken
      lifetime assumptions. A follow-up patch will add a new pair of
      tty-device registration helpers, which a vetted tty driver can use to
      support serdev (initially serial core). When every tty driver uses the
      serdev helpers (at least for deregistration), we can add serdev
      registration to tty_port_register_device() again.
      
      Note that this also fixes another issue with serdev, which currently
      allocates and registers a serdev controller for every tty device
      registered using tty_port_device_register() only to immediately
      deregister and deallocate it when the corresponding OF node or serdev
      child node is missing. This should be addressed before enabling serdev
      for hot-pluggable buses.
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Reviewed-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      427fa8e3
    • Jeremy Kerr's avatar
      powerpc/spufs: Fix hash faults for kernel regions · baa4d411
      Jeremy Kerr authored
      commit d75e4919 upstream.
      
      Commit ac29c640 ("powerpc/mm: Replace _PAGE_USER with
      _PAGE_PRIVILEGED") swapped _PAGE_USER for _PAGE_PRIVILEGED, and
      introduced check_pte_access() which denied kernel access to
      non-_PAGE_PRIVILEGED pages.
      
      However, it didn't add _PAGE_PRIVILEGED to the hash fault handler
      for spufs' kernel accesses, so the DMAs required to establish SPE
      memory no longer work.
      
      This change adds _PAGE_PRIVILEGED to the hash fault handler for
      kernel accesses.
      
      Fixes: ac29c640 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED")
      Signed-off-by: default avatarJeremy Kerr <jk@ozlabs.org>
      Reported-by: default avatarSombat Tragolgosol <sombat3960@gmail.com>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      baa4d411
    • Michael Neuling's avatar
      powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N · 919c7173
      Michael Neuling authored
      commit d957fb4d upstream.
      
      Currently if you disable CONFIG_PPC_RADIX_MMU you'll crash on boot on
      a P9. This is because we still set MMU_FTR_TYPE_RADIX via
      ibm,pa-features and MMU_FTR_TYPE_RADIX is what's used for code patching
      in much of the asm code (ie. slb_miss_realmode)
      
      This patch fixes the problem by stopping MMU_FTR_TYPE_RADIX from being
      set from ibm.pa-features.
      
      We may eventually end up removing the CONFIG_PPC_RADIX_MMU option
      completely but until then this fixes the issue.
      
      Fixes: 17a3dd2f ("powerpc/mm/radix: Use firmware feature to enable Radix MMU")
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      919c7173
    • Richard Narron's avatar
      fs/ufs: Set UFS default maximum bytes per file · 72351ac5
      Richard Narron authored
      commit 239e250e upstream.
      
      This fixes a problem with reading files larger than 2GB from a UFS-2
      file system:
      
          https://bugzilla.kernel.org/show_bug.cgi?id=195721
      
      The incorrect UFS s_maxsize limit became a problem as of commit
      c2a9737f ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
      which started using s_maxbytes to avoid a page index overflow in
      do_generic_file_read().
      
      That caused files to be truncated on UFS-2 file systems because the
      default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it.
      
      Here I simply increase the default to a common value used by other file
      systems.
      Signed-off-by: default avatarRichard Narron <comet.berkeley@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will B <will.brokenbourgh2877@gmail.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72351ac5
    • Liam R. Howlett's avatar
      sparc/ftrace: Fix ftrace graph time measurement · f351b122
      Liam R. Howlett authored
      
      [ Upstream commit 48078d2d ]
      
      The ftrace function_graph time measurements of a given function is not
      accurate according to those recorded by ftrace using the function
      filters.  This change pulls the x86_64 fix from 'commit 722b3c74
      ("ftrace/graph: Trace function entry before updating index")' into the
      sparc specific prepare_ftrace_return which stops ftrace from
      counting interrupted tasks in the time measurement.
      
      Example measurements for select_task_rq_fair running "hackbench 100
      process 1000":
      
                    |  tracing/trace_stat/function0  |  function_graph
       Before patch |  2.802 us                      |  4.255 us
       After patch  |  2.749 us                      |  3.094 us
      Signed-off-by: default avatarLiam R. Howlett <Liam.Howlett@Oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f351b122
    • Orlando Arias's avatar
      sparc: Fix -Wstringop-overflow warning · 76037bf9
      Orlando Arias authored
      
      [ Upstream commit deba804c ]
      
      Greetings,
      
      GCC 7 introduced the -Wstringop-overflow flag to detect buffer overflows
      in calls to string handling functions [1][2]. Due to the way
      ``empty_zero_page'' is declared in arch/sparc/include/setup.h, this
      causes a warning to trigger at compile time in the function mem_init(),
      which is subsequently converted to an error. The ensuing patch fixes
      this issue and aligns the declaration of empty_zero_page to that of
      other architectures. Thank you.
      
      Cheers,
      Orlando.
      
      [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02308.html
      [2] https://gcc.gnu.org/gcc-7/changes.htmlSigned-off-by: default avatarOrlando Arias <oarias@knights.ucf.edu>
      
      --------------------------------------------------------------------------------
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76037bf9
    • Nitin Gupta's avatar
      sparc64: Fix mapping of 64k pages with MAP_FIXED · e346489f
      Nitin Gupta authored
      
      [ Upstream commit b6c41cb0 ]
      
      An incorrect huge page alignment check caused
      mmap failure for 64K pages when MAP_FIXED is used
      with address not aligned to HPAGE_SIZE.
      
      Orabug: 25885991
      
      Fixes: dcd1912d ("sparc64: Add 64K page size support")
      Signed-off-by: default avatarNitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e346489f
    • Daniel Borkmann's avatar
      bpf: adjust verifier heuristics · 21dccb0f
      Daniel Borkmann authored
      
      [ Upstream commit 3c2ce60b ]
      
      Current limits with regards to processing program paths do not
      really reflect today's needs anymore due to programs becoming
      more complex and verifier smarter, keeping track of more data
      such as const ALU operations, alignment tracking, spilling of
      PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
      smarter matching of what LLVM generates.
      
      This also comes with the side-effect that we result in fewer
      opportunities to prune search states and thus often need to do
      more work to prove safety than in the past due to different
      register states and stack layout where we mismatch. Generally,
      it's quite hard to determine what caused a sudden increase in
      complexity, it could be caused by something as trivial as a
      single branch somewhere at the beginning of the program where
      LLVM assigned a stack slot that is marked differently throughout
      other branches and thus causing a mismatch, where verifier
      then needs to prove safety for the whole rest of the program.
      Subsequently, programs with even less than half the insn size
      limit can get rejected. We noticed that while some programs
      load fine under pre 4.11, they get rejected due to hitting
      limits on more recent kernels. We saw that in the vast majority
      of cases (90+%) pruning failed due to register mismatches. In
      case of stack mismatches, majority of cases failed due to
      different stack slot types (invalid, spill, misc) rather than
      differences in spilled registers.
      
      This patch makes pruning more aggressive by also adding markers
      that sit at conditional jumps as well. Currently, we only mark
      jump targets for pruning. For example in direct packet access,
      these are usually error paths where we bail out. We found that
      adding these markers, it can reduce number of processed insns
      by up to 30%. Another option is to ignore reg->id in probing
      PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
      slightly as well by up to 7% observed complexity reduction as
      stand-alone. Meaning, if a previous path with register type
      PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
      in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
      the same map X must be safe as well. Last but not least the
      patch also adds a scheduling point and bumps the current limit
      for instructions to be processed to a more adequate value.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21dccb0f
    • Daniel Borkmann's avatar
      bpf: fix wrong exposure of map_flags into fdinfo for lpm · 87cebd0f
      Daniel Borkmann authored
      
      [ Upstream commit a316338c ]
      
      trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
      attr->map_flags, since it does not support preallocation yet. We
      check the flag, but we never copy the flag into trie->map.map_flags,
      which is later on exposed into fdinfo and used by loaders such as
      iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
      whether a pinned map has the same spec as the one from the BPF obj
      file and if not, bails out, which is currently the case for lpm
      since it exposes always 0 as flags.
      
      Also copy over flags in array_map_alloc() and stack_map_alloc().
      They always have to be 0 right now, but we should make sure to not
      miss to copy them over at a later point in time when we add actual
      flags for them to use.
      
      Fixes: b95a5c4d ("bpf: add a longest prefix match trie map implementation")
      Reported-by: default avatarJarno Rajahalme <jarno@covalent.io>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87cebd0f
    • Daniel Borkmann's avatar
      bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data · d6d2860e
      Daniel Borkmann authored
      
      [ Upstream commit 41703a73 ]
      
      The bpf_clone_redirect() still needs to be listed in
      bpf_helper_changes_pkt_data() since we call into
      bpf_try_make_head_writable() from there, thus we need
      to invalidate prior pkt regs as well.
      
      Fixes: 36bbef52 ("bpf: direct packet write and access for helpers for clsact progs")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6d2860e
    • Eric Dumazet's avatar
      ipv4: add reference counting to metrics · 3b69d651
      Eric Dumazet authored
      
      [ Upstream commit 3fb07daf ]
      
      Andrey Konovalov reported crashes in ipv4_mtu()
      
      I could reproduce the issue with KASAN kernels, between
      10.246.7.151 and 10.246.7.152 :
      
      1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &
      
      2) At the same time run following loop :
      while :
      do
       ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
       ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
      done
      
      Cong Wang attempted to add back rt->fi in commit
      82486aa6 ("ipv4: restore rt->fi for reference counting")
      but this proved to add some issues that were complex to solve.
      
      Instead, I suggested to add a refcount to the metrics themselves,
      being a standalone object (in particular, no reference to other objects)
      
      I tried to make this patch as small as possible to ease its backport,
      instead of being super clean. Note that we believe that only ipv4 dst
      need to take care of the metric refcount. But if this is wrong,
      this patch adds the basic infrastructure to extend this to other
      families.
      
      Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
      for his efforts on this problem.
      
      Fixes: 2860583f ("ipv4: Kill rt->fi")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Reviewed-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b69d651
    • Peter Dawson's avatar
      ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets · d3edf403
      Peter Dawson authored
      
      [ Upstream commit 0e9a7095 ]
      
      This fix addresses two problems in the way the DSCP field is formulated
       on the encapsulating header of IPv6 tunnels.
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661
      
      1) The IPv6 tunneling code was manipulating the DSCP field of the
       encapsulating packet using the 32b flowlabel. Since the flowlabel is
       only the lower 20b it was incorrect to assume that the upper 12b
       containing the DSCP and ECN fields would remain intact when formulating
       the encapsulating header. This fix handles the 'inherit' and
       'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.
      
      2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
       incorrect and resulted in the DSCP value always being set to 0.
      
      Commit 90427ef5 ("ipv6: fix flow labels when the traffic class
       is non-0") caused the regression by masking out the flowlabel
       which exposed the incorrect handling of the DSCP portion of the
       flowlabel in ip6_tunnel and ip6_gre.
      
      Fixes: 90427ef5 ("ipv6: fix flow labels when the traffic class is non-0")
      Signed-off-by: default avatarPeter Dawson <peter.a.dawson@boeing.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d3edf403
    • Davide Caratti's avatar
      sctp: fix ICMP processing if skb is non-linear · 90e7c332
      Davide Caratti authored
      
      [ Upstream commit 804ec7eb ]
      
      sometimes ICMP replies to INIT chunks are ignored by the client, even if
      the encapsulated SCTP headers match an open socket. This happens when the
      ICMP packet is carried by a paged skb: use skb_header_pointer() to read
      packet contents beyond the SCTP header, so that chunk header and initiate
      tag are validated correctly.
      
      v2:
      - don't use skb_header_pointer() to read the transport header, since
        icmp_socket_deliver() already puts these 8 bytes in the linear area.
      - change commit message to make specific reference to INIT chunks.
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90e7c332
    • Wei Wang's avatar
      tcp: avoid fastopen API to be used on AF_UNSPEC · 0236d8c4
      Wei Wang authored
      
      [ Upstream commit ba615f67 ]
      
      Fastopen API should be used to perform fastopen operations on the TCP
      socket. It does not make sense to use fastopen API to perform disconnect
      by calling it with AF_UNSPEC. The fastopen data path is also prone to
      race conditions and bugs when using with AF_UNSPEC.
      
      One issue reported and analyzed by Vegard Nossum is as follows:
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      Thread A:                            Thread B:
      ------------------------------------------------------------------------
      sendto()
       - tcp_sendmsg()
           - sk_stream_memory_free() = 0
               - goto wait_for_sndbuf
      	     - sk_stream_wait_memory()
      	        - sk_wait_event() // sleep
                |                          sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
      	  |                           - tcp_sendmsg()
      	  |                              - tcp_sendmsg_fastopen()
      	  |                                 - __inet_stream_connect()
      	  |                                    - tcp_disconnect() //because of AF_UNSPEC
      	  |                                       - tcp_transmit_skb()// send RST
      	  |                                    - return 0; // no reconnect!
      	  |                           - sk_stream_wait_connect()
      	  |                                 - sock_error()
      	  |                                    - xchg(&sk->sk_err, 0)
      	  |                                    - return -ECONNRESET
      	- ... // wake up, see sk->sk_err == 0
          - skb_entail() on TCP_CLOSE socket
      
      If the connection is reopened then we will send a brand new SYN packet
      after thread A has already queued a buffer. At this point I think the
      socket internal state (sequence numbers etc.) becomes messed up.
      
      When the new connection is closed, the FIN-ACK is rejected because the
      sequence number is outside the window. The other side tries to
      retransmit,
      but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
      corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      
      Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
      and return EOPNOTSUPP to user if such case happens.
      
      Fixes: cf60af03 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0236d8c4
    • Eric Garver's avatar
      geneve: fix fill_info when using collect_metadata · 1642394f
      Eric Garver authored
      
      [ Upstream commit 11387fe4 ]
      
      Since 9b4437a5 ("geneve: Unify LWT and netdev handling.") fill_info
      does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
      because it uses ip_tunnel_info_af() with the device level info, which is
      not valid for COLLECT_METADATA.
      
      Fix by checking for the presence of the actual sockets.
      
      Fixes: 9b4437a5 ("geneve: Unify LWT and netdev handling.")
      Signed-off-by: default avatarEric Garver <e@erig.me>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1642394f
    • Vlad Yasevich's avatar
      virtio-net: enable TSO/checksum offloads for Q-in-Q vlans · 4dbbbaad
      Vlad Yasevich authored
      
      [ Upstream commit 2836b4f2 ]
      
      Since virtio does not provide it's own ndo_features_check handler,
      TSO, and now checksum offload, are disabled for stacked vlans.
      Re-enable the support and let the host take care of it.  This
      restores/improves Guest-to-Guest performance over Q-in-Q vlans.
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4dbbbaad
    • Vlad Yasevich's avatar
      be2net: Fix offload features for Q-in-Q packets · acc866e9
      Vlad Yasevich authored
      
      [ Upstream commit cc6e9de6 ]
      
      At least some of the be2net cards do not seem to be capabled
      of performing checksum offload computions on Q-in-Q packets.
      In these case, the recevied checksum on the remote is invalid
      and TCP syn packets are dropped.
      
      This patch adds a call to check disbled acceleration features
      on Q-in-Q tagged traffic.
      
      CC: Sathya Perla <sathya.perla@broadcom.com>
      CC: Ajit Khaparde <ajit.khaparde@broadcom.com>
      CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      CC: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      acc866e9
    • Vlad Yasevich's avatar
      vlan: Fix tcp checksum offloads in Q-in-Q vlans · 423c1b43
      Vlad Yasevich authored
      
      [ Upstream commit 35d2f80b ]
      
      It appears that TCP checksum offloading has been broken for
      Q-in-Q vlans.  The behavior was execerbated by the
      series
          commit afb0bc97 ("Merge branch 'stacked_vlan_tso'")
      that that enabled accleleration features on stacked vlans.
      
      However, event without that series, it is possible to trigger
      this issue.  It just requires a lot more specialized configuration.
      
      The root cause is the interaction between how
      netdev_intersect_features() works, the features actually set on
      the vlan devices and HW having the ability to run checksum with
      longer headers.
      
      The issue starts when netdev_interesect_features() replaces
      NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
      if the HW advertises IP|IPV6 specific checksums.  This happens
      for tagged and multi-tagged packets.   However, HW that enables
      IP|IPV6 checksum offloading doesn't gurantee that packets with
      arbitrarily long headers can be checksummed.
      
      This patch disables IP|IPV6 checksums on the packet for multi-tagged
      packets.
      
      CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      CC: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      423c1b43
    • Andrew Lunn's avatar
      net: phy: marvell: Limit errata to 88m1101 · f1cd4c63
      Andrew Lunn authored
      
      [ Upstream commit f2899788 ]
      
      The 88m1101 has an errata when configuring autoneg. However, it was
      being applied to many other Marvell PHYs as well. Limit its scope to
      just the 88m1101.
      
      Fixes: 76884679 ("phylib: Add support for Marvell 88e1111S and 88e1145")
      Reported-by: default avatarDaniel Walker <danielwa@cisco.com>
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Acked-by: default avatarHarini Katakam <harinik@xilinx.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f1cd4c63
    • Mohamad Haj Yahia's avatar
      net/mlx5: Avoid using pending command interface slots · bea27802
      Mohamad Haj Yahia authored
      
      [ Upstream commit 73dd3a48 ]
      
      Currently when firmware command gets stuck or it takes long time to
      complete, the driver command will get timeout and the command slot is
      freed and can be used for new commands, and if the firmware receive new
      command on the old busy slot its behavior is unexpected and this could
      be harmful.
      To fix this when the driver command gets timeout we return failure,
      but we don't free the command slot and we wait for the firmware to
      explicitly respond to that command.
      Once all the entries are busy we will stop processing new firmware
      commands.
      
      Fixes: 9cba4ebc ('net/mlx5: Fix potential deadlock in command mode change')
      Signed-off-by: default avatarMohamad Haj Yahia <mohamad@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bea27802
    • Jarod Wilson's avatar
      bonding: fix accounting of active ports in 3ad · 1cdcbe7c
      Jarod Wilson authored
      
      [ Upstream commit 751da2a6 ]
      
      As of 7bb11dc9 and 0622cab0, bond slaves in a 3ad bond are not
      removed from the aggregator when they are down, and the active slave count
      is NOT equal to number of ports in the aggregator, but rather the number
      of ports in the aggregator that are still enabled. The sysfs spew for
      bonding_show_ad_num_ports() has a comment that says "Show number of active
      802.3ad ports.", but it's currently showing total number of ports, both
      active and inactive. Remedy it by using the same logic introduced in
      0622cab0 in __bond_3ad_get_active_agg_info(), so sysfs, procfs and
      netlink all report the number of active ports. Note that this means that
      IFLA_BOND_AD_INFO_NUM_PORTS really means NUM_ACTIVE_PORTS instead of
      NUM_PORTS, and thus perhaps should be renamed for clarity.
      
      Lightly tested on a dual i40e lacp bond, simulating link downs with an ip
      link set dev <slave2> down, was able to produce the state where I could
      see both in the same aggregator, but a number of ports count of 1.
      
      MII Status: up
      Active Aggregator Info:
              Aggregator ID: 1
              Number of ports: 2 <---
      Slave Interface: ens10
      MII Status: up <---
      Aggregator ID: 1
      Slave Interface: ens11
      MII Status: up
      Aggregator ID: 1
      
      MII Status: up
      Active Aggregator Info:
              Aggregator ID: 1
              Number of ports: 1 <---
      Slave Interface: ens10
      MII Status: down <---
      Aggregator ID: 1
      Slave Interface: ens11
      MII Status: up
      Aggregator ID: 1
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: default avatarJarod Wilson <jarod@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1cdcbe7c
    • Eric Dumazet's avatar
      ipv6: fix out of bound writes in __ip6_append_data() · 827624c3
      Eric Dumazet authored
      
      [ Upstream commit 232cd35d ]
      
      Andrey Konovalov and idaifish@gmail.com reported crashes caused by
      one skb shared_info being overwritten from __ip6_append_data()
      
      Andrey program lead to following state :
      
      copy -4200 datalen 2000 fraglen 2040
      maxfraglen 2040 alloclen 2048 transhdrlen 0 offset 0 fraggap 6200
      
      The skb_copy_and_csum_bits(skb_prev, maxfraglen, data + transhdrlen,
      fraggap, 0); is overwriting skb->head and skb_shared_info
      
      Since we apparently detect this rare condition too late, move the
      code earlier to even avoid allocating skb and risking crashes.
      
      Once again, many thanks to Andrey and syzkaller team.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Reported-by: <idaifish@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      827624c3
    • Xin Long's avatar
      bridge: start hello_timer when enabling KERNEL_STP in br_stp_start · 99c971a5
      Xin Long authored
      
      [ Upstream commit 6d18c732 ]
      
      Since commit 76b91c32 ("bridge: stp: when using userspace stp stop
      kernel hello and hold timers"), bridge would not start hello_timer if
      stp_enabled is not KERNEL_STP when br_dev_open.
      
      The problem is even if users set stp_enabled with KERNEL_STP later,
      the timer will still not be started. It causes that KERNEL_STP can
      not really work. Users have to re-ifup the bridge to avoid this.
      
      This patch is to fix it by starting br->hello_timer when enabling
      KERNEL_STP in br_stp_start.
      
      As an improvement, it's also to start hello_timer again only when
      br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
      no reason to start the timer again when it's NO_STP.
      
      Fixes: 76b91c32 ("bridge: stp: when using userspace stp stop kernel hello and hold timers")
      Reported-by: default avatarHaidong Li <haili@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: default avatarIvan Vecera <cera@cera.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99c971a5
    • Bjørn Mork's avatar
      qmi_wwan: add another Lenovo EM74xx device ID · bf97c6bf
      Bjørn Mork authored
      
      [ Upstream commit 486181bc ]
      
      In their infinite wisdom, and never ending quest for end user frustration,
      Lenovo has decided to use a new USB device ID for the wwan modules in
      their 2017 laptops.  The actual hardware is still the Sierra Wireless
      EM7455 or EM7430, depending on region.
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf97c6bf
    • Tobias Jungel's avatar
      bridge: netlink: check vlan_default_pvid range · 7b570175
      Tobias Jungel authored
      
      [ Upstream commit a2858602 ]
      
      Currently it is allowed to set the default pvid of a bridge to a value
      above VLAN_VID_MASK (0xfff). This patch adds a check to br_validate and
      returns -EINVAL in case the pvid is out of bounds.
      
      Reproduce by calling:
      
      [root@test ~]# ip l a type bridge
      [root@test ~]# ip l a type dummy
      [root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
      [root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999
      [root@test ~]# ip l s dummy0 master bridge0
      [root@test ~]# bridge vlan
      port	vlan ids
      bridge0	 9999 PVID Egress Untagged
      
      dummy0	 9999 PVID Egress Untagged
      
      Fixes: 0f963b75 ("bridge: netlink: add support for default_pvid")
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarTobias Jungel <tobias.jungel@bisdn.de>
      Acked-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b570175
    • David S. Miller's avatar
      ipv6: Check ip6_find_1stfragopt() return value properly. · 4b3a6fa3
      David S. Miller authored
      
      [ Upstream commit 7dd7eb95 ]
      
      Do not use unsigned variables to see if it returns a negative
      error or not.
      
      Fixes: 2423496a ("ipv6: Prevent overrun when parsing v6 header options")
      Reported-by: default avatarJulia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b3a6fa3
    • Craig Gallek's avatar
      ipv6: Prevent overrun when parsing v6 header options · 9909e4e4
      Craig Gallek authored
      
      [ Upstream commit 2423496a ]
      
      The KASAN warning repoted below was discovered with a syzkaller
      program.  The reproducer is basically:
        int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
        send(s, &one_byte_of_data, 1, MSG_MORE);
        send(s, &more_than_mtu_bytes_data, 2000, 0);
      
      The socket() call sets the nexthdr field of the v6 header to
      NEXTHDR_HOP, the first send call primes the payload with a non zero
      byte of data, and the second send call triggers the fragmentation path.
      
      The fragmentation code tries to parse the header options in order
      to figure out where to insert the fragment option.  Since nexthdr points
      to an invalid option, the calculation of the size of the network header
      can made to be much larger than the linear section of the skb and data
      is read outside of it.
      
      This fix makes ip6_find_1stfrag return an error if it detects
      running out-of-bounds.
      
      [   42.361487] ==================================================================
      [   42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730
      [   42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789
      [   42.366469]
      [   42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41
      [   42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
      [   42.368824] Call Trace:
      [   42.369183]  dump_stack+0xb3/0x10b
      [   42.369664]  print_address_description+0x73/0x290
      [   42.370325]  kasan_report+0x252/0x370
      [   42.370839]  ? ip6_fragment+0x11c8/0x3730
      [   42.371396]  check_memory_region+0x13c/0x1a0
      [   42.371978]  memcpy+0x23/0x50
      [   42.372395]  ip6_fragment+0x11c8/0x3730
      [   42.372920]  ? nf_ct_expect_unregister_notifier+0x110/0x110
      [   42.373681]  ? ip6_copy_metadata+0x7f0/0x7f0
      [   42.374263]  ? ip6_forward+0x2e30/0x2e30
      [   42.374803]  ip6_finish_output+0x584/0x990
      [   42.375350]  ip6_output+0x1b7/0x690
      [   42.375836]  ? ip6_finish_output+0x990/0x990
      [   42.376411]  ? ip6_fragment+0x3730/0x3730
      [   42.376968]  ip6_local_out+0x95/0x160
      [   42.377471]  ip6_send_skb+0xa1/0x330
      [   42.377969]  ip6_push_pending_frames+0xb3/0xe0
      [   42.378589]  rawv6_sendmsg+0x2051/0x2db0
      [   42.379129]  ? rawv6_bind+0x8b0/0x8b0
      [   42.379633]  ? _copy_from_user+0x84/0xe0
      [   42.380193]  ? debug_check_no_locks_freed+0x290/0x290
      [   42.380878]  ? ___sys_sendmsg+0x162/0x930
      [   42.381427]  ? rcu_read_lock_sched_held+0xa3/0x120
      [   42.382074]  ? sock_has_perm+0x1f6/0x290
      [   42.382614]  ? ___sys_sendmsg+0x167/0x930
      [   42.383173]  ? lock_downgrade+0x660/0x660
      [   42.383727]  inet_sendmsg+0x123/0x500
      [   42.384226]  ? inet_sendmsg+0x123/0x500
      [   42.384748]  ? inet_recvmsg+0x540/0x540
      [   42.385263]  sock_sendmsg+0xca/0x110
      [   42.385758]  SYSC_sendto+0x217/0x380
      [   42.386249]  ? SYSC_connect+0x310/0x310
      [   42.386783]  ? __might_fault+0x110/0x1d0
      [   42.387324]  ? lock_downgrade+0x660/0x660
      [   42.387880]  ? __fget_light+0xa1/0x1f0
      [   42.388403]  ? __fdget+0x18/0x20
      [   42.388851]  ? sock_common_setsockopt+0x95/0xd0
      [   42.389472]  ? SyS_setsockopt+0x17f/0x260
      [   42.390021]  ? entry_SYSCALL_64_fastpath+0x5/0xbe
      [   42.390650]  SyS_sendto+0x40/0x50
      [   42.391103]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.391731] RIP: 0033:0x7fbbb711e383
      [   42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383
      [   42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003
      [   42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018
      [   42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad
      [   42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00
      [   42.397257]
      [   42.397411] Allocated by task 3789:
      [   42.397702]  save_stack_trace+0x16/0x20
      [   42.398005]  save_stack+0x46/0xd0
      [   42.398267]  kasan_kmalloc+0xad/0xe0
      [   42.398548]  kasan_slab_alloc+0x12/0x20
      [   42.398848]  __kmalloc_node_track_caller+0xcb/0x380
      [   42.399224]  __kmalloc_reserve.isra.32+0x41/0xe0
      [   42.399654]  __alloc_skb+0xf8/0x580
      [   42.400003]  sock_wmalloc+0xab/0xf0
      [   42.400346]  __ip6_append_data.isra.41+0x2472/0x33d0
      [   42.400813]  ip6_append_data+0x1a8/0x2f0
      [   42.401122]  rawv6_sendmsg+0x11ee/0x2db0
      [   42.401505]  inet_sendmsg+0x123/0x500
      [   42.401860]  sock_sendmsg+0xca/0x110
      [   42.402209]  ___sys_sendmsg+0x7cb/0x930
      [   42.402582]  __sys_sendmsg+0xd9/0x190
      [   42.402941]  SyS_sendmsg+0x2d/0x50
      [   42.403273]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.403718]
      [   42.403871] Freed by task 1794:
      [   42.404146]  save_stack_trace+0x16/0x20
      [   42.404515]  save_stack+0x46/0xd0
      [   42.404827]  kasan_slab_free+0x72/0xc0
      [   42.405167]  kfree+0xe8/0x2b0
      [   42.405462]  skb_free_head+0x74/0xb0
      [   42.405806]  skb_release_data+0x30e/0x3a0
      [   42.406198]  skb_release_all+0x4a/0x60
      [   42.406563]  consume_skb+0x113/0x2e0
      [   42.406910]  skb_free_datagram+0x1a/0xe0
      [   42.407288]  netlink_recvmsg+0x60d/0xe40
      [   42.407667]  sock_recvmsg+0xd7/0x110
      [   42.408022]  ___sys_recvmsg+0x25c/0x580
      [   42.408395]  __sys_recvmsg+0xd6/0x190
      [   42.408753]  SyS_recvmsg+0x2d/0x50
      [   42.409086]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.409513]
      [   42.409665] The buggy address belongs to the object at ffff88000969e780
      [   42.409665]  which belongs to the cache kmalloc-512 of size 512
      [   42.410846] The buggy address is located 24 bytes inside of
      [   42.410846]  512-byte region [ffff88000969e780, ffff88000969e980)
      [   42.411941] The buggy address belongs to the page:
      [   42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
      [   42.413298] flags: 0x100000000008100(slab|head)
      [   42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c
      [   42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000
      [   42.415074] page dumped because: kasan: bad access detected
      [   42.415604]
      [   42.415757] Memory state around the buggy address:
      [   42.416222]  ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   42.416904]  ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   42.418273]                    ^
      [   42.418588]  ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   42.419273]  ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   42.419882] ==================================================================
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarCraig Gallek <kraig@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9909e4e4
    • David Ahern's avatar
      net: Improve handling of failures on link and route dumps · df6342be
      David Ahern authored
      
      [ Upstream commit f6c5775f ]
      
      In general, rtnetlink dumps do not anticipate failure to dump a single
      object (e.g., link or route) on a single pass. As both route and link
      objects have grown via more attributes, that is no longer a given.
      
      netlink dumps can handle a failure if the dump function returns an
      error; specifically, netlink_dump adds the return code to the response
      if it is <= 0 so userspace is notified of the failure. The missing
      piece is the rtnetlink dump functions returning the error.
      
      Fix route and link dump functions to return the errors if no object is
      added to an skb (detected by skb->len != 0). IPv6 route dumps
      (rt6_dump_route) already return the error; this patch updates IPv4 and
      link dumps. Other dump functions may need to be ajusted as well.
      Reported-by: default avatarJan Moskyto Matejka <mq@ucw.cz>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df6342be
    • Christoph Hellwig's avatar
      net/smc: Add warning about remote memory exposure · a19f55f9
      Christoph Hellwig authored
      
      [ Upstream commit 19a0f7e3 ]
      
      The driver explicitly bypasses APIs to register all memory once a
      connection is made, and thus allows remote access to memory.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Acked-by: default avatarUrsula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a19f55f9
    • Ursula Braun's avatar
      smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY · 516b3ed9
      Ursula Braun authored
      
      [ Upstream commit 263eec9b ]
      
      Currently, SMC enables remote access to physical memory when a user
      has successfully configured and established an SMC-connection until ten
      minutes after the last SMC connection is closed. Because this is considered
      a security risk, drivers are supposed to use IB_PD_UNSAFE_GLOBAL_RKEY in
      such a case.
      
      This patch changes the current SMC code to use IB_PD_UNSAFE_GLOBAL_RKEY.
      This improves user awareness, but does not remove the security risk itself.
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      516b3ed9
    • Soheil Hassas Yeganeh's avatar
      tcp: eliminate negative reordering in tcp_clean_rtx_queue · d6cb41cf
      Soheil Hassas Yeganeh authored
      
      [ Upstream commit bafbb9c7 ]
      
      tcp_ack() can call tcp_fragment() which may dededuct the
      value tp->fackets_out when MSS changes. When prior_fackets
      is larger than tp->fackets_out, tcp_clean_rtx_queue() can
      invoke tcp_update_reordering() with negative values. This
      results in absurd tp->reodering values higher than
      sysctl_tcp_max_reordering.
      
      Note that tcp_update_reordering indeeds sets tp->reordering
      to min(sysctl_tcp_max_reordering, metric), but because
      the comparison is signed, a negative metric always wins.
      
      Fixes: c7caf8d3 ("[TCP]: Fix reord detection due to snd_una covered holes")
      Reported-by: default avatarRebecca Isaacs <risaacs@google.com>
      Signed-off-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6cb41cf
    • Gal Pressman's avatar
      net/mlx5e: Fix ethtool pause support and advertise reporting · cc1d2a62
      Gal Pressman authored
      
      [ Upstream commit e3c19503 ]
      
      Pause bit should set when RX pause is on, not TX pause.
      Also, setting Asym_Pause is incorrect, and should be turned off.
      
      Fixes: 665bc539 ("net/mlx5e: Use new ethtool get/set link ksettings API")
      Signed-off-by: default avatarGal Pressman <galp@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc1d2a62
    • Gal Pressman's avatar
      net/mlx5e: Use the correct pause values for ethtool advertising · c8a52aa7
      Gal Pressman authored
      
      [ Upstream commit b383b544 ]
      
      Query the operational pause from firmware (PFCC register) instead of
      always passing zeros.
      
      Fixes: 665bc539 ("net/mlx5e: Use new ethtool get/set link ksettings API")
      Signed-off-by: default avatarGal Pressman <galp@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8a52aa7
    • Douglas Caetano dos Santos's avatar
      net/packet: fix missing net_device reference release · 03c10a87
      Douglas Caetano dos Santos authored
      
      [ Upstream commit d19b183c ]
      
      When using a TX ring buffer, if an error occurs processing a control
      message (e.g. invalid message), the net_device reference is not
      released.
      
      Fixes c14ac945 ("sock: enable timestamping using control messages")
      Signed-off-by: default avatarDouglas Caetano dos Santos <douglascs@taghos.com.br>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03c10a87
    • Eric Dumazet's avatar
      sctp: do not inherit ipv6_{mc|ac|fl}_list from parent · 703a2082
      Eric Dumazet authored
      
      [ Upstream commit fdcee2cb ]
      
      SCTP needs fixes similar to 83eaddab ("ipv6/dccp: do not inherit
      ipv6_mc_list from parent"), otherwise bad things can happen.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      703a2082
    • Xin Long's avatar
      sctp: fix src address selection if using secondary addresses for ipv6 · dc9bf551
      Xin Long authored
      
      [ Upstream commit dbc2b5e9 ]
      
      Commit 0ca50d12 ("sctp: fix src address selection if using secondary
      addresses") has fixed a src address selection issue when using secondary
      addresses for ipv4.
      
      Now sctp ipv6 also has the similar issue. When using a secondary address,
      sctp_v6_get_dst tries to choose the saddr which has the most same bits
      with the daddr by sctp_v6_addr_match_len. It may make some cases not work
      as expected.
      
      hostA:
        [1] fd21:356b:459a:cf10::11 (eth1)
        [2] fd21:356b:459a:cf20::11 (eth2)
      
      hostB:
        [a] fd21:356b:459a:cf30::2  (eth1)
        [b] fd21:356b:459a:cf40::2  (eth2)
      
      route from hostA to hostB:
        fd21:356b:459a:cf30::/64 dev eth1  metric 1024  mtu 1500
      
      The expected path should be:
        fd21:356b:459a:cf10::11 <-> fd21:356b:459a:cf30::2
      But addr[2] matches addr[a] more bits than addr[1] does, according to
      sctp_v6_addr_match_len. It causes the path to be:
        fd21:356b:459a:cf20::11 <-> fd21:356b:459a:cf30::2
      
      This patch is to fix it with the same way as Marcelo's fix for sctp ipv4.
      As no ip_dev_find for ipv6, this patch is to use ipv6_chk_addr to check
      if the saddr is in a dev instead.
      
      Note that for backwards compatibility, it will still do the addr_match_len
      check here when no optimal is found.
      Reported-by: default avatarPatrick Talbert <ptalbert@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dc9bf551
    • Jon Paul Maloy's avatar
      tipc: make macro tipc_wait_for_cond() smp safe · d9cb26d6
      Jon Paul Maloy authored
      
      [ Upstream commit 844cf763 ]
      
      The macro tipc_wait_for_cond() is embedding the macro sk_wait_event()
      to fulfil its task. The latter, in turn, is evaluating the stated
      condition outside the socket lock context. This is problematic if
      the condition is accessing non-trivial data structures which may be
      altered by incoming interrupts, as is the case with the cong_links()
      linked list, used by socket to keep track of the current set of
      congested links. We sometimes see crashes when this list is accessed
      by a condition function at the same time as a SOCK_WAKEUP interrupt
      is removing an element from the list.
      
      We fix this by expanding selected parts of sk_wait_event() into the
      outer macro, while ensuring that all evaluations of a given condition
      are performed under socket lock protection.
      
      Fixes: commit 365ad353 ("tipc: reduce risk of user starvation during link congestion")
      Reviewed-by: default avatarParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9cb26d6
    • Yuchung Cheng's avatar
      tcp: avoid fragmenting peculiar skbs in SACK · c91cd74b
      Yuchung Cheng authored
      
      [ Upstream commit b451e5d2 ]
      
      This patch fixes a bug in splitting an SKB during SACK
      processing. Specifically if an skb contains multiple
      packets and is only partially sacked in the higher sequences,
      tcp_match_sack_to_skb() splits the skb and marks the second fragment
      as SACKed.
      
      The current code further attempts rounding up the first fragment
      to MSS boundaries. But it misses a boundary condition when the
      rounded-up fragment size (pkt_len) is exactly skb size.  Spliting
      such an skb is pointless and causses a kernel warning and aborts
      the SACK processing. This patch universally checks such over-split
      before calling tcp_fragment to prevent these unnecessary warnings.
      
      Fixes: adb92db8 ("tcp: Make SACK code to split only at mss boundaries")
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c91cd74b