1. 12 Oct, 2017 40 commits
    • Luca Coelho's avatar
      iwlwifi: mvm: use IWL_HCMD_NOCOPY for MCAST_FILTER_CMD · f8895642
      Luca Coelho authored
      commit 97bce57b upstream.
      
      The MCAST_FILTER_CMD can get quite large when we have many mcast
      addresses to set (we support up to 255).  So the command should be
      send as NOCOPY to prevent a warning caused by too-long commands:
      
      WARNING: CPU: 0 PID: 9700 at /root/iwlwifi/stack-dev/drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1550 iwl_pcie_enqueue_hcmd+0x8c7/0xb40 [iwlwifi]
      Command MCAST_FILTER_CMD (0x1d0) is too large (328 bytes)
      
      This fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196743Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f8895642
    • Arnd Bergmann's avatar
      netlink: fix nla_put_{u8,u16,u32} for KASAN · 9a19bc44
      Arnd Bergmann authored
      commit b4391db4 upstream.
      
      When CONFIG_KASAN is enabled, the "--param asan-stack=1" causes rather large
      stack frames in some functions. This goes unnoticed normally because
      CONFIG_FRAME_WARN is disabled with CONFIG_KASAN by default as of commit
      3f181b4d ("lib/Kconfig.debug: disable -Wframe-larger-than warnings with
      KASAN=y").
      
      The kernelci.org build bot however has the warning enabled and that led
      me to investigate it a little further, as every build produces these warnings:
      
      net/wireless/nl80211.c:4389:1: warning: the frame size of 2240 bytes is larger than 2048 bytes [-Wframe-larger-than=]
      net/wireless/nl80211.c:1895:1: warning: the frame size of 3776 bytes is larger than 2048 bytes [-Wframe-larger-than=]
      net/wireless/nl80211.c:1410:1: warning: the frame size of 2208 bytes is larger than 2048 bytes [-Wframe-larger-than=]
      net/bridge/br_netlink.c:1282:1: warning: the frame size of 2544 bytes is larger than 2048 bytes [-Wframe-larger-than=]
      
      Most of this problem is now solved in gcc-8, which can consolidate
      the stack slots for the inline function arguments. On older compilers
      we can add a workaround by declaring a local variable in each function
      to pass the inline function argument.
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a19bc44
    • Arnd Bergmann's avatar
      rocker: fix rocker_tlv_put_* functions for KASAN · 57a77fff
      Arnd Bergmann authored
      commit 6098d7dd upstream.
      
      Inlining these functions creates lots of stack variables that each take
      64 bytes when KASAN is enabled, leading to this warning about potential
      stack overflow:
      
      drivers/net/ethernet/rocker/rocker_ofdpa.c: In function 'ofdpa_cmd_flow_tbl_add':
      drivers/net/ethernet/rocker/rocker_ofdpa.c:621:1: error: the frame size of 2752 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]
      
      gcc-8 can now consolidate the stack slots itself, but on older versions
      we get the same behavior by using a temporary variable that holds a
      copy of the inline function argument.
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57a77fff
    • Ping Cheng's avatar
      HID: wacom: bits shifted too much for 9th and 10th buttons · 50b27486
      Ping Cheng authored
      commit ce06760b upstream.
      
      Cintiq 12 has 10 expresskey buttons. The bit shift for the last
      two buttons were off by 5.
      
      Fixes: c7f0522a ("HID: wacom: Slim down wacom_intuos_pad processing")
      Signed-off-by: default avatarPing Cheng <ping.cheng@wacom.com>
      Tested-by: default avatarMatthieu Robin <matthieu@macolu.org>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Cc: Jason Gerecke <killertofu@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50b27486
    • Jason Gerecke's avatar
      HID: wacom: Always increment hdev refcount within wacom_get_hdev_data · 953f5e7c
      Jason Gerecke authored
      commit 2a5e597c upstream.
      
      The wacom_get_hdev_data function is used to find and return a reference to
      the "other half" of a Wacom device (i.e., the touch device associated with
      a pen, or vice-versa). To ensure these references are properly accounted
      for, the function is supposed to automatically increment the refcount before
      returning. This was not done, however, for devices which have pen & touch
      on different interfaces of the same USB device. This can lead to a WARNING
      ("refcount_t: underflow; use-after-free") when removing the module or device
      as we call kref_put() more times than kref_get(). Triggering an "actual" use-
      after-free would be difficult since both devices will disappear nearly-
      simultaneously. To silence this warning and prevent the potential error, we
      need to increment the refcount for all cases within wacom_get_hdev_data.
      
      Fixes: 41372d5d ("HID: wacom: Augment 'oVid' and 'oPid' with heuristics for HID_GENERIC")
      Signed-off-by: default avatarJason Gerecke <jason.gerecke@wacom.com>
      Reviewed-by: default avatarPing Cheng <ping.cheng@wacom.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      953f5e7c
    • Aaron Armstrong Skomra's avatar
      HID: wacom: leds: Don't try to control the EKR's read-only LEDs · 04b54e8f
      Aaron Armstrong Skomra authored
      commit 74aebed6 upstream.
      
      Commit a50aac71 introduces 'led.groups' and adds EKR support
      for these groups. However, unlike the other devices with LEDs,
      the EKR's LEDs are read-only and we shouldn't attempt to control
      them in wacom_led_control().
      
      See bug: https://sourceforge.net/p/linuxwacom/bugs/342/
      
      Fixes: a50aac71 ("HID: wacom: leds: dynamically allocate LED groups")
      Signed-off-by: default avatarAaron Armstrong Skomra <aaron.skomra@wacom.com>
      Reviewed-by: default avatarJason Gerecke <jason.gerecke@wacom.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      04b54e8f
    • Adrian Salido's avatar
      HID: i2c-hid: allocate hid buffers for real worst case · 5abb9cd4
      Adrian Salido authored
      commit 8320caee upstream.
      
      The buffer allocation is not currently accounting for an extra byte for
      the report id. This can cause an out of bounds access in function
      i2c_hid_set_or_send_report() with reportID > 15.
      Signed-off-by: default avatarAdrian Salido <salidoa@google.com>
      Reviewed-by: default avatarBenson Leung <bleung@chromium.org>
      Signed-off-by: default avatarGuenter Roeck <groeck@chromium.org>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5abb9cd4
    • Shu Wang's avatar
      ftrace: Fix kmemleak in unregister_ftrace_graph · a3ec1049
      Shu Wang authored
      commit 2b0b8499 upstream.
      
      The trampoline allocated by function tracer was overwriten by function_graph
      tracer, and caused a memory leak. The save_global_trampoline should have
      saved the previous trampoline in register_ftrace_graph() and restored it in
      unregister_ftrace_graph(). But as it is implemented, save_global_trampoline was
      only used in unregister_ftrace_graph as default value 0, and it overwrote the
      previous trampoline's value. Causing the previous allocated trampoline to be
      lost.
      
      kmmeleak backtrace:
          kmemleak_vmalloc+0x77/0xc0
          __vmalloc_node_range+0x1b5/0x2c0
          module_alloc+0x7c/0xd0
          arch_ftrace_update_trampoline+0xb5/0x290
          ftrace_startup+0x78/0x210
          register_ftrace_function+0x8b/0xd0
          function_trace_init+0x4f/0x80
          tracing_set_tracer+0xe6/0x170
          tracing_set_trace_write+0x90/0xd0
          __vfs_write+0x37/0x170
          vfs_write+0xb2/0x1b0
          SyS_write+0x55/0xc0
          do_syscall_64+0x67/0x180
          return_from_SYSCALL_64+0x0/0x6a
      
      [
        Looking further into this, I found that this was left over from when the
        function and function graph tracers shared the same ftrace_ops. But in
        commit 5f151b24 ("ftrace: Fix function_profiler and function tracer
        together"), the two were separated, and the save_global_trampoline no
        longer was necessary (and it may have been broken back then too).
        -- Steven Rostedt
      ]
      
      Link: http://lkml.kernel.org/r/20170912021454.5976-1-shuwang@redhat.com
      
      Fixes: 5f151b24 ("ftrace: Fix function_profiler and function tracer together")
      Signed-off-by: default avatarShu Wang <shuwang@redhat.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3ec1049
    • Alexander Shishkin's avatar
      stm class: Fix a use-after-free · 3ff8bc81
      Alexander Shishkin authored
      commit fd085bb1 upstream.
      
      For reasons unknown, the stm_source removal path uses device_destroy()
      to kill the underlying device object. Because device_destroy() uses
      devt to look for the device to destroy and the fact that stm_source
      devices don't have one (or all have the same one), it just picks the
      first device in the class, which may well be the wrong one.
      
      That is, loading stm_console and stm_heartbeat and then removing both
      will die in dereferencing a freed object.
      
      Since this should have been device_unregister() in the first place,
      use it instead of device_destroy().
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Fixes: 7bd1d409 ("stm class: Introduce an abstraction for System Trace Module devices")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ff8bc81
    • Olaf Hering's avatar
      Drivers: hv: fcopy: restore correct transfer length · c541aaad
      Olaf Hering authored
      commit 549e658a upstream.
      
      Till recently the expected length of bytes read by the
      daemon did depend on the context. It was either hv_start_fcopy or
      hv_do_fcopy. The daemon had a buffer size of two pages, which was much
      larger than needed.
      
      Now the expected length of bytes read by the
      daemon changed slightly. For START_FILE_COPY it is still the size of
      hv_start_fcopy.  But for WRITE_TO_FILE and the other operations it is as
      large as the buffer that arrived via vmbus. In case of WRITE_TO_FILE
      that is slightly larger than a struct hv_do_fcopy. Since the buffer in
      the daemon was still larger everything was fine.
      
      Currently, the daemon reads only what is actually needed.
      The new buffer layout is as large as a struct hv_do_fcopy, for the
      WRITE_TO_FILE operation. Since the kernel expects a slightly larger
      size, hvt_op_read will return -EINVAL because the daemon will read
      slightly less than expected. Address this by restoring the expected
      buffer size in case of WRITE_TO_FILE.
      
      Fixes: 'c7e490fc ("Drivers: hv: fcopy: convert to hv_utils_transport")'
      Fixes: '3f2baa8a ("Tools: hv: update buffer handling in hv_fcopy_daemon")'
      Signed-off-by: default avatarOlaf Hering <olaf@aepfle.de>
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c541aaad
    • Nicolai Stange's avatar
      driver core: platform: Don't read past the end of "driver_override" buffer · a97ca4f7
      Nicolai Stange authored
      commit bf563b01 upstream.
      
      When printing the driver_override parameter when it is 4095 and 4094 bytes
      long, the printing code would access invalid memory because we need count+1
      bytes for printing.
      
      Reject driver_override values of these lengths in driver_override_store().
      
      This is in close analogy to commit 4efe874a ("PCI: Don't read past the
      end of sysfs "driver_override" buffer") from Sasha Levin.
      
      Fixes: 3d713e0e ("driver core: platform: add device binding path 'driver_override'")
      Signed-off-by: default avatarNicolai Stange <nstange@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a97ca4f7
    • Mark Rutland's avatar
      percpu: make this_cpu_generic_read() atomic w.r.t. interrupts · fc3c6722
      Mark Rutland authored
      commit e88d62cd upstream.
      
      As raw_cpu_generic_read() is a plain read from a raw_cpu_ptr() address,
      it's possible (albeit unlikely) that the compiler will split the access
      across multiple instructions.
      
      In this_cpu_generic_read() we disable preemption but not interrupts
      before calling raw_cpu_generic_read(). Thus, an interrupt could be taken
      in the middle of the split load instructions. If a this_cpu_write() or
      RMW this_cpu_*() op is made to the same variable in the interrupt
      handling path, this_cpu_read() will return a torn value.
      
      For native word types, we can avoid tearing using READ_ONCE(), but this
      won't work in all cases (e.g. 64-bit types on most 32-bit platforms).
      This patch reworks this_cpu_generic_read() to use READ_ONCE() where
      possible, otherwise falling back to disabling interrupts.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pranith Kumar <bobby.prani@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc3c6722
    • Gustavo Romero's avatar
      powerpc/tm: Fix illegal TM state in signal handler · 6a988259
      Gustavo Romero authored
      commit 044215d1 upstream.
      
      Currently it's possible that on returning from the signal handler
      through the restore_tm_sigcontexts() code path (e.g. from a signal
      caught due to a `trap` instruction executed in the middle of an HTM
      block, or a deliberately constructed sigframe) an illegal TM state
      (like TS=10 TM=0, i.e. "T0") is set in SRR1 and when `rfid` sets
      implicitly the MSR register from SRR1 register on return to userspace
      it causes a TM Bad Thing exception.
      
      That illegal state can be set (a) by a malicious user that disables
      the TM bit by tweaking the bits in uc_mcontext before returning from
      the signal handler or (b) by a sufficient number of context switches
      occurring such that the load_tm counter overflows and TM is disabled
      whilst in the signal handler.
      
      This commit fixes the illegal TM state by ensuring that TM bit is
      always enabled before we return from restore_tm_sigcontexts(). A small
      comment correction is made as well.
      
      Fixes: 5d176f75 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
      Signed-off-by: default avatarGustavo Romero <gromero@linux.vnet.ibm.com>
      Signed-off-by: default avatarBreno Leitao <leitao@debian.org>
      Signed-off-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a988259
    • Cyril Bur's avatar
      powerpc/64s: Use emergency stack for kernel TM Bad Thing program checks · afebf5ef
      Cyril Bur authored
      commit 265e60a1 upstream.
      
      When using transactional memory (TM), the CPU can be in one of six
      states as far as TM is concerned, encoded in the Machine State
      Register (MSR). Certain state transitions are illegal and if attempted
      trigger a "TM Bad Thing" type program check exception.
      
      If we ever hit one of these exceptions it's treated as a bug, ie. we
      oops, and kill the process and/or panic, depending on configuration.
      
      One case where we can trigger a TM Bad Thing, is when returning to
      userspace after a system call or interrupt, using RFID. When this
      happens the CPU first restores the user register state, in particular
      r1 (the stack pointer) and then attempts to update the MSR. However
      the MSR update is not allowed and so we take the program check with
      the user register state, but the kernel MSR.
      
      This tricks the exception entry code into thinking we have a bad
      kernel stack pointer, because the MSR says we're coming from the
      kernel, but r1 is pointing to userspace.
      
      To avoid this we instead always switch to the emergency stack if we
      take a TM Bad Thing from the kernel. That way none of the user
      register values are used, other than for printing in the oops message.
      
      This is the fix for CVE-2017-1000255.
      
      Fixes: 5d176f75 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
      Signed-off-by: default avatarCyril Bur <cyrilbur@gmail.com>
      [mpe: Rewrite change log & comments, tweak asm slightly]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afebf5ef
    • Eric Dumazet's avatar
      socket, bpf: fix possible use after free · 02f7e410
      Eric Dumazet authored
      
      [ Upstream commit eefca20e ]
      
      Starting from linux-4.4, 3WHS no longer takes the listener lock.
      
      Since this time, we might hit a use-after-free in sk_filter_charge(),
      if the filter we got in the memcpy() of the listener content
      just happened to be replaced by a thread changing listener BPF filter.
      
      To fix this, we need to make sure the filter refcount is not already
      zero before incrementing it again.
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      02f7e410
    • Nikolay Aleksandrov's avatar
      net: rtnetlink: fix info leak in RTM_GETSTATS call · 95206ea3
      Nikolay Aleksandrov authored
      
      [ Upstream commit ce024f42 ]
      
      When RTM_GETSTATS was added the fields of its header struct were not all
      initialized when returning the result thus leaking 4 bytes of information
      to user-space per rtnl_fill_statsinfo call, so initialize them now. Thanks
      to Alexander Potapenko for the detailed report and bisection.
      Reported-by: default avatarAlexander Potapenko <glider@google.com>
      Fixes: 10c9ead9 ("rtnetlink: add new RTM_GETSTATS message to dump link stats")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95206ea3
    • Parthasarathy Bhuvaragan's avatar
      tipc: use only positive error codes in messages · 58b1b840
      Parthasarathy Bhuvaragan authored
      
      [ Upstream commit aad06212 ]
      
      In commit e3a77561 ("tipc: split up function tipc_msg_eval()"),
      we have updated the function tipc_msg_lookup_dest() to set the error
      codes to negative values at destination lookup failures. Thus when
      the function sets the error code to -TIPC_ERR_NO_NAME, its inserted
      into the 4 bit error field of the message header as 0xf instead of
      TIPC_ERR_NO_NAME (1). The value 0xf is an unknown error code.
      
      In this commit, we set only positive error code.
      
      Fixes: e3a77561 ("tipc: split up function tipc_msg_eval()")
      Signed-off-by: default avatarParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      58b1b840
    • Xin Long's avatar
      ip6_tunnel: update mtu properly for ARPHRD_ETHER tunnel device in tx path · 09788d46
      Xin Long authored
      
      [ Upstream commit d41bb33b ]
      
      Now when updating mtu in tx path, it doesn't consider ARPHRD_ETHER tunnel
      device, like ip6gre_tap tunnel, for which it should also subtract ether
      header to get the correct mtu.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09788d46
    • Xin Long's avatar
      ip6_gre: ip6gre_tap device should keep dst · ab4da56f
      Xin Long authored
      
      [ Upstream commit 2d40557c ]
      
      The patch 'ip_gre: ipgre_tap device should keep dst' fixed
      a issue that ipgre_tap mtu couldn't be updated in tx path.
      
      The same fix is needed for ip6gre_tap as well.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab4da56f
    • Jason A. Donenfeld's avatar
      netlink: do not proceed if dump's start() errs · b4a11925
      Jason A. Donenfeld authored
      
      [ Upstream commit fef0035c ]
      
      Drivers that use the start method for netlink dumping rely on dumpit not
      being called if start fails. For example, ila_xlat.c allocates memory
      and assigns it to cb->args[0] in its start() function. It might fail to
      do that and return -ENOMEM instead. However, even when returning an
      error, dumpit will be called, which, in the example above, quickly
      dereferences the memory in cb->args[0], which will OOPS the kernel. This
      is but one example of how this goes wrong.
      
      Since start() has always been a function with an int return type, it
      therefore makes sense to use it properly, rather than ignoring it. This
      patch thus returns early and does not call dumpit() when start() fails.
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Reviewed-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4a11925
    • Christoph Paasch's avatar
      net: Set sk_prot_creator when cloning sockets to the right proto · cf2eaf16
      Christoph Paasch authored
      
      [ Upstream commit 9d538fa6 ]
      
      sk->sk_prot and sk->sk_prot_creator can differ when the app uses
      IPV6_ADDRFORM (transforming an IPv6-socket to an IPv4-one).
      Which is why sk_prot_creator is there to make sure that sk_prot_free()
      does the kmem_cache_free() on the right kmem_cache slab.
      
      Now, if such a socket gets transformed back to a listening socket (using
      connect() with AF_UNSPEC) we will allocate an IPv4 tcp_sock through
      sk_clone_lock() when a new connection comes in. But sk_prot_creator will
      still point to the IPv6 kmem_cache (as everything got copied in
      sk_clone_lock()). When freeing, we will thus put this
      memory back into the IPv6 kmem_cache although it was allocated in the
      IPv4 cache. I have seen memory corruption happening because of this.
      
      With slub-debugging and MEMCG_KMEM enabled this gives the warning
      	"cache_from_obj: Wrong slab cache. TCPv6 but object is from TCP"
      
      A C-program to trigger this:
      
      void main(void)
      {
              int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP);
              int new_fd, newest_fd, client_fd;
              struct sockaddr_in6 bind_addr;
              struct sockaddr_in bind_addr4, client_addr1, client_addr2;
              struct sockaddr unsp;
              int val;
      
              memset(&bind_addr, 0, sizeof(bind_addr));
              bind_addr.sin6_family = AF_INET6;
              bind_addr.sin6_port = ntohs(42424);
      
              memset(&client_addr1, 0, sizeof(client_addr1));
              client_addr1.sin_family = AF_INET;
              client_addr1.sin_port = ntohs(42424);
              client_addr1.sin_addr.s_addr = inet_addr("127.0.0.1");
      
              memset(&client_addr2, 0, sizeof(client_addr2));
              client_addr2.sin_family = AF_INET;
              client_addr2.sin_port = ntohs(42421);
              client_addr2.sin_addr.s_addr = inet_addr("127.0.0.1");
      
              memset(&unsp, 0, sizeof(unsp));
              unsp.sa_family = AF_UNSPEC;
      
              bind(fd, (struct sockaddr *)&bind_addr, sizeof(bind_addr));
      
              listen(fd, 5);
      
              client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
              connect(client_fd, (struct sockaddr *)&client_addr1, sizeof(client_addr1));
              new_fd = accept(fd, NULL, NULL);
              close(fd);
      
              val = AF_INET;
              setsockopt(new_fd, SOL_IPV6, IPV6_ADDRFORM, &val, sizeof(val));
      
              connect(new_fd, &unsp, sizeof(unsp));
      
              memset(&bind_addr4, 0, sizeof(bind_addr4));
              bind_addr4.sin_family = AF_INET;
              bind_addr4.sin_port = ntohs(42421);
              bind(new_fd, (struct sockaddr *)&bind_addr4, sizeof(bind_addr4));
      
              listen(new_fd, 5);
      
              client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
              connect(client_fd, (struct sockaddr *)&client_addr2, sizeof(client_addr2));
      
              newest_fd = accept(new_fd, NULL, NULL);
              close(new_fd);
      
              close(client_fd);
              close(new_fd);
      }
      
      As far as I can see, this bug has been there since the beginning of the
      git-days.
      Signed-off-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf2eaf16
    • Willem de Bruijn's avatar
      packet: only test po->has_vnet_hdr once in packet_snd · 24ee394a
      Willem de Bruijn authored
      
      [ Upstream commit da7c9561 ]
      
      Packet socket option po->has_vnet_hdr can be updated concurrently with
      other operations if no ring is attached.
      
      Do not test the option twice in packet_snd, as the value may change in
      between calls. A race on setsockopt disable may cause a packet > mtu
      to be sent without having GSO options set.
      
      Fixes: bfd5f4a3 ("packet: Add GSO/csum offload support.")
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24ee394a
    • Willem de Bruijn's avatar
      packet: in packet_do_bind, test fanout with bind_lock held · 0f22167d
      Willem de Bruijn authored
      
      [ Upstream commit 4971613c ]
      
      Once a socket has po->fanout set, it remains a member of the group
      until it is destroyed. The prot_hook must be constant and identical
      across sockets in the group.
      
      If fanout_add races with packet_do_bind between the test of po->fanout
      and taking the lock, the bind call may make type or dev inconsistent
      with that of the fanout group.
      
      Hold po->bind_lock when testing po->fanout to avoid this race.
      
      I had to introduce artificial delay (local_bh_enable) to actually
      observe the race.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f22167d
    • Florian Fainelli's avatar
      net: dsa: Fix network device registration order · 6eab1f82
      Florian Fainelli authored
      
      [ Upstream commit e804441c ]
      
      We cannot be registering the network device first, then setting its
      carrier off and finally connecting it to a PHY, doing that leaves a
      window during which the carrier is at best inconsistent, and at worse
      the device is not usable without a down/up sequence since the network
      device is visible to user space with possibly no PHY device attached.
      
      Re-order steps so that they make logical sense. This fixes some devices
      where the port was not usable after e.g: an unbind then bind of the
      driver.
      
      Fixes: 0071f56e ("dsa: Register netdev before phy")
      Fixes: 91da11f8 ("net: Distributed Switch Architecture protocol support")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6eab1f82
    • Alexander Potapenko's avatar
      tun: bail out from tun_get_user() if the skb is empty · b8990d2e
      Alexander Potapenko authored
      
      [ Upstream commit 2580c4c1 ]
      
      KMSAN (https://github.com/google/kmsan) reported accessing uninitialized
      skb->data[0] in the case the skb is empty (i.e. skb->len is 0):
      
      ================================================
      BUG: KMSAN: use of uninitialized memory in tun_get_user+0x19ba/0x3770
      CPU: 0 PID: 3051 Comm: probe Not tainted 4.13.0+ #3140
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
      ...
       __msan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:477
       tun_get_user+0x19ba/0x3770 drivers/net/tun.c:1301
       tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
       call_write_iter ./include/linux/fs.h:1743
       new_sync_write fs/read_write.c:457
       __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
       vfs_write+0x3e4/0x770 fs/read_write.c:518
       SYSC_write+0x12f/0x2b0 fs/read_write.c:565
       SyS_write+0x55/0x80 fs/read_write.c:557
       do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245
      ...
      origin:
      ...
       kmsan_poison_shadow+0x6e/0xc0 mm/kmsan/kmsan.c:211
       slab_alloc_node mm/slub.c:2732
       __kmalloc_node_track_caller+0x351/0x370 mm/slub.c:4351
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x26a/0x810 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:903
       alloc_skb_with_frags+0x1d7/0xc80 net/core/skbuff.c:4756
       sock_alloc_send_pskb+0xabf/0xfe0 net/core/sock.c:2037
       tun_alloc_skb drivers/net/tun.c:1144
       tun_get_user+0x9a8/0x3770 drivers/net/tun.c:1274
       tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
       call_write_iter ./include/linux/fs.h:1743
       new_sync_write fs/read_write.c:457
       __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
       vfs_write+0x3e4/0x770 fs/read_write.c:518
       SYSC_write+0x12f/0x2b0 fs/read_write.c:565
       SyS_write+0x55/0x80 fs/read_write.c:557
       do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245
      ================================================
      
      Make sure tun_get_user() doesn't touch skb->data[0] unless there is
      actual data.
      
      C reproducer below:
      ==========================
          // autogenerated by syzkaller (http://github.com/google/syzkaller)
      
          #define _GNU_SOURCE
      
          #include <fcntl.h>
          #include <linux/if_tun.h>
          #include <netinet/ip.h>
          #include <net/if.h>
          #include <string.h>
          #include <sys/ioctl.h>
      
          int main()
          {
            int sock = socket(PF_INET, SOCK_STREAM, IPPROTO_IP);
            int tun_fd = open("/dev/net/tun", O_RDWR);
            struct ifreq req;
            memset(&req, 0, sizeof(struct ifreq));
            strcpy((char*)&req.ifr_name, "gre0");
            req.ifr_flags = IFF_UP | IFF_MULTICAST;
            ioctl(tun_fd, TUNSETIFF, &req);
            ioctl(sock, SIOCSIFFLAGS, "gre0");
            write(tun_fd, "hi", 0);
            return 0;
          }
      ==========================
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8990d2e
    • Sabrina Dubroca's avatar
      l2tp: fix race condition in l2tp_tunnel_delete · b4a9b12d
      Sabrina Dubroca authored
      
      [ Upstream commit 62b982ee ]
      
      If we try to delete the same tunnel twice, the first delete operation
      does a lookup (l2tp_tunnel_get), finds the tunnel, calls
      l2tp_tunnel_delete, which queues it for deletion by
      l2tp_tunnel_del_work.
      
      The second delete operation also finds the tunnel and calls
      l2tp_tunnel_delete. If the workqueue has already fired and started
      running l2tp_tunnel_del_work, then l2tp_tunnel_delete will queue the
      same tunnel a second time, and try to free the socket again.
      
      Add a dead flag to prevent firing the workqueue twice. Then we can
      remove the check of queue_work's result that was meant to prevent that
      race but doesn't.
      
      Reproducer:
      
          ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 local 192.168.0.2 remote 192.168.0.1 encap udp udp_sport 5000 udp_dport 6000
          ip l2tp add session name l2tp1 tunnel_id 3000 session_id 1000 peer_session_id 2000
          ip link set l2tp1 up
          ip l2tp del tunnel tunnel_id 3000
          ip l2tp del tunnel tunnel_id 3000
      
      Fixes: f8ccac0e ("l2tp: put tunnel socket release on a workqueue")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Acked-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4a9b12d
    • Ridge Kennedy's avatar
      l2tp: Avoid schedule while atomic in exit_net · e5941137
      Ridge Kennedy authored
      
      [ Upstream commit 12d656af ]
      
      While destroying a network namespace that contains a L2TP tunnel a
      "BUG: scheduling while atomic" can be observed.
      
      Enabling lockdep shows that this is happening because l2tp_exit_net()
      is calling l2tp_tunnel_closeall() (via l2tp_tunnel_delete()) from
      within an RCU critical section.
      
      l2tp_exit_net() takes rcu_read_lock_bh()
        << list_for_each_entry_rcu() >>
        l2tp_tunnel_delete()
          l2tp_tunnel_closeall()
            __l2tp_session_unhash()
              synchronize_rcu() << Illegal inside RCU critical section >>
      
      BUG: sleeping function called from invalid context
      in_atomic(): 1, irqs_disabled(): 0, pid: 86, name: kworker/u16:2
      INFO: lockdep is turned off.
      CPU: 2 PID: 86 Comm: kworker/u16:2 Tainted: G        W  O    4.4.6-at1 #2
      Hardware name: Xen HVM domU, BIOS 4.6.1-xs125300 05/09/2016
      Workqueue: netns cleanup_net
       0000000000000000 ffff880202417b90 ffffffff812b0013 ffff880202410ac0
       ffffffff81870de8 ffff880202417bb8 ffffffff8107aee8 ffffffff81870de8
       0000000000000c51 0000000000000000 ffff880202417be0 ffffffff8107b024
      Call Trace:
       [<ffffffff812b0013>] dump_stack+0x85/0xc2
       [<ffffffff8107aee8>] ___might_sleep+0x148/0x240
       [<ffffffff8107b024>] __might_sleep+0x44/0x80
       [<ffffffff810b21bd>] synchronize_sched+0x2d/0xe0
       [<ffffffff8109be6d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff8105c7bb>] ? __local_bh_enable_ip+0x6b/0xc0
       [<ffffffff816a1b00>] ? _raw_spin_unlock_bh+0x30/0x40
       [<ffffffff81667482>] __l2tp_session_unhash+0x172/0x220
       [<ffffffff81667397>] ? __l2tp_session_unhash+0x87/0x220
       [<ffffffff8166888b>] l2tp_tunnel_closeall+0x9b/0x140
       [<ffffffff81668c74>] l2tp_tunnel_delete+0x14/0x60
       [<ffffffff81668dd0>] l2tp_exit_net+0x110/0x270
       [<ffffffff81668d5c>] ? l2tp_exit_net+0x9c/0x270
       [<ffffffff815001c3>] ops_exit_list.isra.6+0x33/0x60
       [<ffffffff81501166>] cleanup_net+0x1b6/0x280
       ...
      
      This bug can easily be reproduced with a few steps:
      
       $ sudo unshare -n bash  # Create a shell in a new namespace
       # ip link set lo up
       # ip addr add 127.0.0.1 dev lo
       # ip l2tp add tunnel remote 127.0.0.1 local 127.0.0.1 tunnel_id 1 \
          peer_tunnel_id 1 udp_sport 50000 udp_dport 50000
       # ip l2tp add session name foo tunnel_id 1 session_id 1 \
          peer_session_id 1
       # ip link set foo up
       # exit  # Exit the shell, in turn exiting the namespace
       $ dmesg
       ...
       [942121.089216] BUG: scheduling while atomic: kworker/u16:3/13872/0x00000200
       ...
      
      To fix this, move the call to l2tp_tunnel_closeall() out of the RCU
      critical section, and instead call it from l2tp_tunnel_del_work(), which
      is running from the l2tp_wq workqueue.
      
      Fixes: 2b551c6e ("l2tp: close sessions before initiating tunnel delete")
      Signed-off-by: default avatarRidge Kennedy <ridge.kennedy@alliedtelesis.co.nz>
      Acked-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e5941137
    • Alexey Kodanev's avatar
      vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit · 6689f835
      Alexey Kodanev authored
      
      [ Upstream commit 36f6ee22 ]
      
      When running LTP IPsec tests, KASan might report:
      
      BUG: KASAN: use-after-free in vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
      Read of size 4 at addr ffff880dc6ad1980 by task swapper/0/0
      ...
      Call Trace:
        <IRQ>
        dump_stack+0x63/0x89
        print_address_description+0x7c/0x290
        kasan_report+0x28d/0x370
        ? vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
        __asan_report_load4_noabort+0x19/0x20
        vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
        ? vti_init_net+0x190/0x190 [ip_vti]
        ? save_stack_trace+0x1b/0x20
        ? save_stack+0x46/0xd0
        dev_hard_start_xmit+0x147/0x510
        ? icmp_echo.part.24+0x1f0/0x210
        __dev_queue_xmit+0x1394/0x1c60
      ...
      Freed by task 0:
        save_stack_trace+0x1b/0x20
        save_stack+0x46/0xd0
        kasan_slab_free+0x70/0xc0
        kmem_cache_free+0x81/0x1e0
        kfree_skbmem+0xb1/0xe0
        kfree_skb+0x75/0x170
        kfree_skb_list+0x3e/0x60
        __dev_queue_xmit+0x1298/0x1c60
        dev_queue_xmit+0x10/0x20
        neigh_resolve_output+0x3a8/0x740
        ip_finish_output2+0x5c0/0xe70
        ip_finish_output+0x4ba/0x680
        ip_output+0x1c1/0x3a0
        xfrm_output_resume+0xc65/0x13d0
        xfrm_output+0x1e4/0x380
        xfrm4_output_finish+0x5c/0x70
      
      Can be fixed if we get skb->len before dst_output().
      
      Fixes: b9959fd3 ("vti: switch to new ip tunnel code")
      Fixes: 22e1b23d ("vti6: Support inter address family tunneling.")
      Signed-off-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6689f835
    • Timur Tabi's avatar
      net: qcom/emac: specify the correct size when mapping a DMA buffer · 852bdea5
      Timur Tabi authored
      
      [ Upstream commit a93ad944 ]
      
      When mapping the RX DMA buffers, the driver was accidentally specifying
      zero for the buffer length.  Under normal circumstances, SWIOTLB does not
      need to allocate a bounce buffer, so the address is just mapped without
      checking the size field.  This is why the error was not detected earlier.
      
      Fixes: b9b17deb ("net: emac: emac gigabit ethernet controller driver")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTimur Tabi <timur@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      852bdea5
    • Konstantin Khlebnikov's avatar
      net_sched: always reset qdisc backlog in qdisc_reset() · 5600c758
      Konstantin Khlebnikov authored
      
      [ Upstream commit c8e18129 ]
      
      SKB stored in qdisc->gso_skb also counted into backlog.
      
      Some qdiscs don't reset backlog to zero in ->reset(),
      for example sfq just dequeue and free all queued skb.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5600c758
    • Meng Xu's avatar
      isdn/i4l: fetch the ppp_write buffer in one shot · 93eef217
      Meng Xu authored
      
      [ Upstream commit 02388bf8 ]
      
      In isdn_ppp_write(), the header (i.e., protobuf) of the buffer is
      fetched twice from userspace. The first fetch is used to peek at the
      protocol of the message and reset the huptimer if necessary; while the
      second fetch copies in the whole buffer. However, given that buf resides
      in userspace memory, a user process can race to change its memory content
      across fetches. By doing so, we can either avoid resetting the huptimer
      for any type of packets (by first setting proto to PPP_LCP and later
      change to the actual type) or force resetting the huptimer for LCP
      packets.
      
      This patch changes this double-fetch behavior into two single fetches
      decided by condition (lp->isdn_device < 0 || lp->isdn_channel <0).
      A more detailed discussion can be found at
      https://marc.info/?l=linux-kernel&m=150586376926123&w=2Signed-off-by: default avatarMeng Xu <mengxu.gatech@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93eef217
    • Yonghong Song's avatar
      bpf: one perf event close won't free bpf program attached by another perf event · 0dee549f
      Yonghong Song authored
      
      [ Upstream commit ec9dd352 ]
      
      This patch fixes a bug exhibited by the following scenario:
        1. fd1 = perf_event_open with attr.config = ID1
        2. attach bpf program prog1 to fd1
        3. fd2 = perf_event_open with attr.config = ID1
           <this will be successful>
        4. user program closes fd2 and prog1 is detached from the tracepoint.
        5. user program with fd1 does not work properly as tracepoint
           no output any more.
      
      The issue happens at step 4. Multiple perf_event_open can be called
      successfully, but only one bpf prog pointer in the tp_event. In the
      current logic, any fd release for the same tp_event will free
      the tp_event->prog.
      
      The fix is to free tp_event->prog only when the closing fd
      corresponds to the one which registered the program.
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0dee549f
    • Willem de Bruijn's avatar
      packet: hold bind lock when rebinding to fanout hook · 6f7cdd4a
      Willem de Bruijn authored
      
      [ Upstream commit 008ba2a1 ]
      
      Packet socket bind operations must hold the po->bind_lock. This keeps
      po->running consistent with whether the socket is actually on a ptype
      list to receive packets.
      
      fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
      binds the fanout object to receive through packet_rcv_fanout.
      
      Make it hold the po->bind_lock when testing po->running and rebinding.
      Else, it can race with other rebind operations, such as that in
      packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
      can result in a socket being added to a fanout group twice, causing
      use-after-free KASAN bug reports, among others.
      
      Reported independently by both trinity and syzkaller.
      Verified that the syzkaller reproducer passes after this patch.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Reported-by: default avatarnixioaming <nixiaoming@huawei.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f7cdd4a
    • Christian Lamparter's avatar
      net: emac: Fix napi poll list corruption · 6eac2cd2
      Christian Lamparter authored
      
      [ Upstream commit f5595606 ]
      
      This patch is pretty much a carbon copy of
      commit 3079c652 ("caif: Fix napi poll list corruption")
      with "caif" replaced by "emac".
      
      The commit d75b1ade ("net: less interrupt masking in NAPI")
      breaks emac.
      
      It is now required that if the entire budget is consumed when poll
      returns, the napi poll_list must remain empty.  However, like some
      other drivers emac tries to do a last-ditch check and if there is
      more work it will call napi_reschedule and then immediately process
      some of this new work.  Should the entire budget be consumed while
      processing such new work then we will violate the new caller
      contract.
      
      This patch fixes this by not touching any work when we reschedule
      in emac.
      Signed-off-by: default avatarChristian Lamparter <chunkeey@googlemail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6eac2cd2
    • Eric Dumazet's avatar
      tcp: fastopen: fix on syn-data transmit failure · b463521d
      Eric Dumazet authored
      
      [ Upstream commit b5b7db8d ]
      
      Our recent change exposed a bug in TCP Fastopen Client that syzkaller
      found right away [1]
      
      When we prepare skb with SYN+DATA, we attempt to transmit it,
      and we update socket state as if the transmit was a success.
      
      In socket RTX queue we have two skbs, one with the SYN alone,
      and a second one containing the DATA.
      
      When (malicious) ACK comes in, we now complain that second one had no
      skb_mstamp.
      
      The proper fix is to make sure that if the transmit failed, we do not
      pretend we sent the DATA skb, and make it our send_head.
      
      When 3WHS completes, we can now send the DATA right away, without having
      to wait for a timeout.
      
      [1]
      WARNING: CPU: 0 PID: 100189 at net/ipv4/tcp_input.c:3117 tcp_clean_rtx_queue+0x2057/0x2ab0 net/ipv4/tcp_input.c:3117()
      
       WARN_ON_ONCE(last_ackt == 0);
      
      Modules linked in:
      CPU: 0 PID: 100189 Comm: syz-executor1 Not tainted
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       0000000000000000 ffff8800b35cb1d8 ffffffff81cad00d 0000000000000000
       ffffffff828a4347 ffff88009f86c080 ffffffff8316eb20 0000000000000d7f
       ffff8800b35cb220 ffffffff812c33c2 ffff8800baad2440 00000009d46575c0
      Call Trace:
       [<ffffffff81cad00d>] __dump_stack
       [<ffffffff81cad00d>] dump_stack+0xc1/0x124
       [<ffffffff812c33c2>] warn_slowpath_common+0xe2/0x150
       [<ffffffff812c361e>] warn_slowpath_null+0x2e/0x40
       [<ffffffff828a4347>] tcp_clean_rtx_queue+0x2057/0x2ab0 n
       [<ffffffff828ae6fd>] tcp_ack+0x151d/0x3930
       [<ffffffff828baa09>] tcp_rcv_state_process+0x1c69/0x4fd0
       [<ffffffff828efb7f>] tcp_v4_do_rcv+0x54f/0x7c0
       [<ffffffff8258aacb>] sk_backlog_rcv
       [<ffffffff8258aacb>] __release_sock+0x12b/0x3a0
       [<ffffffff8258ad9e>] release_sock+0x5e/0x1c0
       [<ffffffff8294a785>] inet_wait_for_connect
       [<ffffffff8294a785>] __inet_stream_connect+0x545/0xc50
       [<ffffffff82886f08>] tcp_sendmsg_fastopen
       [<ffffffff82886f08>] tcp_sendmsg+0x2298/0x35a0
       [<ffffffff82952515>] inet_sendmsg+0xe5/0x520
       [<ffffffff8257152f>] sock_sendmsg_nosec
       [<ffffffff8257152f>] sock_sendmsg+0xcf/0x110
      
      Fixes: 8c72c65b ("tcp: update skb->skb_mstamp more carefully")
      Fixes: 783237e8 ("net-tcp: Fast Open client - sending SYN-data")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b463521d
    • Davide Caratti's avatar
      net/sched: cls_matchall: fix crash when used with classful qdisc · b13bc543
      Davide Caratti authored
      
      [ Upstream commit 3ff4cbec ]
      
      this script, edited from Linux Advanced Routing and Traffic Control guide
      
      tc q a dev en0 root handle 1: htb default a
      tc c a dev en0 parent 1:  classid 1:1 htb rate 6mbit burst 15k
      tc c a dev en0 parent 1:1 classid 1:a htb rate 5mbit ceil 6mbit burst 15k
      tc c a dev en0 parent 1:1 classid 1:b htb rate 1mbit ceil 6mbit burst 15k
      tc f a dev en0 parent 1:0 prio 1 $clsname $clsargs classid 1:b
      ping $address -c1
      tc -s c s dev en0
      
      classifies traffic to 1:b or 1:a, depending on whether the packet matches
      or not the pattern $clsargs of filter $clsname. However, when $clsname is
      'matchall', a systematic crash can be observed in htb_classify(). HTB and
      classful qdiscs don't assign initial value to struct tcf_result, but then
      they expect it to contain valid values after filters have been run. Thus,
      current 'matchall' ignores the TCA_MATCHALL_CLASSID attribute, configured
      by user, and makes HTB (and classful qdiscs) dereference random pointers.
      
      By assigning head->res to *res in mall_classify(), before the actions are
      invoked, we fix this crash and enable TCA_MATCHALL_CLASSID functionality,
      that had no effect on 'matchall' classifier since its first introduction.
      
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1460213Reported-by: default avatarJiri Benc <jbenc@redhat.com>
      Fixes: b87f7936 ("net/sched: introduce Match-all classifier")
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarYotam Gigi <yotamg@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b13bc543
    • Xin Long's avatar
      ip6_tunnel: do not allow loading ip6_tunnel if ipv6 is disabled in cmdline · 13c8bd7a
      Xin Long authored
      
      [ Upstream commit 8c22dab0 ]
      
      If ipv6 has been disabled from cmdline since kernel started, it makes
      no sense to allow users to create any ip6 tunnel. Otherwise, it could
      some potential problem.
      
      Jianlin found a kernel crash caused by this in ip6_gre when he set
      ipv6.disable=1 in grub:
      
      [  209.588865] Unable to handle kernel paging request for data at address 0x00000080
      [  209.588872] Faulting instruction address: 0xc000000000a3aa6c
      [  209.588879] Oops: Kernel access of bad area, sig: 11 [#1]
      [  209.589062] NIP [c000000000a3aa6c] fib_rules_lookup+0x4c/0x260
      [  209.589071] LR [c000000000b9ad90] fib6_rule_lookup+0x50/0xb0
      [  209.589076] Call Trace:
      [  209.589097] fib6_rule_lookup+0x50/0xb0
      [  209.589106] rt6_lookup+0xc4/0x110
      [  209.589116] ip6gre_tnl_link_config+0x214/0x2f0 [ip6_gre]
      [  209.589125] ip6gre_newlink+0x138/0x3a0 [ip6_gre]
      [  209.589134] rtnl_newlink+0x798/0xb80
      [  209.589142] rtnetlink_rcv_msg+0xec/0x390
      [  209.589151] netlink_rcv_skb+0x138/0x150
      [  209.589159] rtnetlink_rcv+0x48/0x70
      [  209.589169] netlink_unicast+0x538/0x640
      [  209.589175] netlink_sendmsg+0x40c/0x480
      [  209.589184] ___sys_sendmsg+0x384/0x4e0
      [  209.589194] SyS_sendmsg+0xd4/0x140
      [  209.589201] SyS_socketcall+0x3e0/0x4f0
      [  209.589209] system_call+0x38/0xe0
      
      This patch is to return -EOPNOTSUPP in ip6_tunnel_init if ipv6 has been
      disabled from cmdline.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      13c8bd7a
    • Fahad Kunnathadi's avatar
      net: phy: Fix mask value write on gmii2rgmii converter speed register · fc2fe7a0
      Fahad Kunnathadi authored
      
      [ Upstream commit f2654a47 ]
      
      To clear Speed Selection in MDIO control register(0x10),
      ie, clear bits 6 and 13 to zero while keeping other bits same.
      Before AND operation,The Mask value has to be perform with bitwise NOT
      operation (ie, ~ operator)
      
      This patch clears current speed selection before writing the
      new speed settings to gmii2rgmii converter
      
      Fixes: f411a616 ("net: phy: Add gmiitorgmii converter support")
      Signed-off-by: default avatarFahad Kunnathadi <fahad.kunnathadi@dexceldesigns.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc2fe7a0
    • Xin Long's avatar
      ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header · e814bae3
      Xin Long authored
      
      [ Upstream commit 76cc0d32 ]
      
      Now in ip6gre_header before packing the ipv6 header, it skb_push t->hlen
      which only includes encap_hlen + tun_hlen. It means greh and inner header
      would be over written by ipv6 stuff and ipv6h might have no chance to set
      up.
      
      Jianlin found this issue when using remote any on ip6_gre, the packets he
      captured on gre dev are truncated:
      
      22:50:26.210866 Out ethertype IPv6 (0x86dd), length 120: truncated-ip6 -\
      8128 bytes missing!(flowlabel 0x92f40, hlim 0, next-header Options (0)  \
      payload length: 8192) ::1:2000:0 > ::1:0:86dd: HBH [trunc] ip-proto-128 \
      8184
      
      It should also skb_push ipv6hdr so that ipv6h points to the right position
      to set ipv6 stuff up.
      
      This patch is to skb_push hlen + sizeof(*ipv6h) and also fix some indents
      in ip6gre_header.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e814bae3
    • Subash Abhinov Kasiviswanathan's avatar
      udpv6: Fix the checksum computation when HW checksum does not apply · f0a5af78
      Subash Abhinov Kasiviswanathan authored
      
      [ Upstream commit 63ecc3d9 ]
      
      While trying an ESP transport mode encryption for UDPv6 packets of
      datagram size 1436 with MTU 1500, checksum error was observed in
      the secondary fragment.
      
      This error occurs due to the UDP payload checksum being missed out
      when computing the full checksum for these packets in
      udp6_hwcsum_outgoing().
      
      Fixes: d39d938c ("ipv6: Introduce udpv6_send_skb()")
      Signed-off-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0a5af78