1. 22 Mar, 2017 40 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.10.5 · 034612ee
      Greg Kroah-Hartman authored
      034612ee
    • Krzysztof Kozlowski's avatar
      crypto: s5p-sss - Fix spinlock recursion on LRW(AES) · 7814c9bd
      Krzysztof Kozlowski authored
      commit 28b62b14 upstream.
      
      Running TCRYPT with LRW compiled causes spinlock recursion:
      
          testing speed of async lrw(aes) (lrw(ecb-aes-s5p)) encryption
          tcrypt: test 0 (256 bit key, 16 byte blocks): 19007 operations in 1 seconds (304112 bytes)
          tcrypt: test 1 (256 bit key, 64 byte blocks): 15753 operations in 1 seconds (1008192 bytes)
          tcrypt: test 2 (256 bit key, 256 byte blocks): 14293 operations in 1 seconds (3659008 bytes)
          tcrypt: test 3 (256 bit key, 1024 byte blocks): 11906 operations in 1 seconds (12191744 bytes)
          tcrypt: test 4 (256 bit key, 8192 byte blocks):
          BUG: spinlock recursion on CPU#1, irq/84-10830000/89
           lock: 0xeea99a68, .magic: dead4ead, .owner: irq/84-10830000/89, .owner_cpu: 1
          CPU: 1 PID: 89 Comm: irq/84-10830000 Not tainted 4.11.0-rc1-00001-g897ca6d0800d #559
          Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
          [<c010e1ec>] (unwind_backtrace) from [<c010ae1c>] (show_stack+0x10/0x14)
          [<c010ae1c>] (show_stack) from [<c03449c0>] (dump_stack+0x78/0x8c)
          [<c03449c0>] (dump_stack) from [<c015de68>] (do_raw_spin_lock+0x11c/0x120)
          [<c015de68>] (do_raw_spin_lock) from [<c0720110>] (_raw_spin_lock_irqsave+0x20/0x28)
          [<c0720110>] (_raw_spin_lock_irqsave) from [<c0572ca0>] (s5p_aes_crypt+0x2c/0xb4)
          [<c0572ca0>] (s5p_aes_crypt) from [<bf1d8aa4>] (do_encrypt+0x78/0xb0 [lrw])
          [<bf1d8aa4>] (do_encrypt [lrw]) from [<bf1d8b00>] (encrypt_done+0x24/0x54 [lrw])
          [<bf1d8b00>] (encrypt_done [lrw]) from [<c05732a0>] (s5p_aes_complete+0x60/0xcc)
          [<c05732a0>] (s5p_aes_complete) from [<c0573440>] (s5p_aes_interrupt+0x134/0x1a0)
          [<c0573440>] (s5p_aes_interrupt) from [<c01667c4>] (irq_thread_fn+0x1c/0x54)
          [<c01667c4>] (irq_thread_fn) from [<c0166a98>] (irq_thread+0x12c/0x1e0)
          [<c0166a98>] (irq_thread) from [<c0136a28>] (kthread+0x108/0x138)
          [<c0136a28>] (kthread) from [<c0107778>] (ret_from_fork+0x14/0x3c)
      
      Interrupt handling routine was calling req->base.complete() under
      spinlock.  In most cases this wasn't fatal but when combined with some
      of the cipher modes (like LRW) this caused recursion - starting the new
      encryption (s5p_aes_crypt()) while still holding the spinlock from
      previous round (s5p_aes_complete()).
      
      Beside that, the s5p_aes_interrupt() error handling path could execute
      two completions in case of error for RX and TX blocks.
      
      Rewrite the interrupt handling routine and the completion by:
      
      1. Splitting the operations on scatterlist copies from
         s5p_aes_complete() into separate s5p_sg_done(). This still should be
         done under lock.
         The s5p_aes_complete() now only calls req->base.complete() and it has
         to be called outside of lock.
      
      2. Moving the s5p_aes_complete() out of spinlock critical sections.
         In interrupt service routine s5p_aes_interrupts(), it appeared in few
         places, including error paths inside other functions called from ISR.
         This code was not so obvious to read so simplify it by putting the
         s5p_aes_complete() only within ISR level.
      Reported-by: default avatarNathan Royce <nroycea+kernel@gmail.com>
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7814c9bd
    • Daniel Axtens's avatar
      crypto: powerpc - Fix initialisation of crc32c context · 4310604e
      Daniel Axtens authored
      commit aa2be9b3 upstream.
      
      Turning on crypto self-tests on a POWER8 shows:
      
          alg: hash: Test 1 failed for crc32c-vpmsum
          00000000: ff ff ff ff
      
      Comparing the code with the Intel CRC32c implementation on which
      ours is based shows that we are doing an init with 0, not ~0
      as CRC32c requires.
      
      This probably wasn't caught because btrfs does its own weird
      open-coded initialisation.
      
      Initialise our internal context to ~0 on init.
      
      This makes the self-tests pass, and btrfs continues to work.
      
      Fixes: 6dd7a82c ("crypto: powerpc - Add POWER8 optimised crc32c")
      Cc: Anton Blanchard <anton@samba.org>
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Acked-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4310604e
    • Niklas Cassel's avatar
      locking/rwsem: Fix down_write_killable() for CONFIG_RWSEM_GENERIC_SPINLOCK=y · de3c88fa
      Niklas Cassel authored
      commit 17fcbd59 upstream.
      
      We hang if SIGKILL has been sent, but the task is stuck in down_read()
      (after do_exit()), even though no task is doing down_write() on the
      rwsem in question:
      
        INFO: task libupnp:21868 blocked for more than 120 seconds.
        libupnp         D    0 21868      1 0x08100008
        ...
        Call Trace:
        __schedule()
        schedule()
        __down_read()
        do_exit()
        do_group_exit()
        __wake_up_parent()
      
      This bug has already been fixed for CONFIG_RWSEM_XCHGADD_ALGORITHM=y in
      the following commit:
      
       04cafed7 ("locking/rwsem: Fix down_write_killable()")
      
      ... however, this bug also exists for CONFIG_RWSEM_GENERIC_SPINLOCK=y.
      Signed-off-by: default avatarNiklas Cassel <niklas.cassel@axis.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Niklas Cassel <niklass@axis.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: d4799608 ("locking/rwsem: Introduce basis for down_write_killable()")
      Link: http://lkml.kernel.org/r/1487981873-12649-1-git-send-email-niklass@axis.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de3c88fa
    • Peter Zijlstra's avatar
      futex: Add missing error handling to FUTEX_REQUEUE_PI · d80e46d9
      Peter Zijlstra authored
      commit 9bbb25af upstream.
      
      Thomas spotted that fixup_pi_state_owner() can return errors and we
      fail to unlock the rt_mutex in that case.
      Reported-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170304093558.867401760@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d80e46d9
    • Peter Zijlstra's avatar
      futex: Fix potential use-after-free in FUTEX_REQUEUE_PI · 575caefc
      Peter Zijlstra authored
      commit c236c8e9 upstream.
      
      While working on the futex code, I stumbled over this potential
      use-after-free scenario. Dmitry triggered it later with syzkaller.
      
      pi_mutex is a pointer into pi_state, which we drop the reference on in
      unqueue_me_pi(). So any access to that pointer after that is bad.
      
      Since other sites already do rt_mutex_unlock() with hb->lock held, see
      for example futex_lock_pi(), simply move the unlock before
      unqueue_me_pi().
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      575caefc
    • Andy Lutomirski's avatar
      x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm · 57ad6c8e
      Andy Lutomirski authored
      commit 5dc855d4 upstream.
      
      If one thread mmaps a perf event while another thread in the same mm
      is in some context where active_mm != mm (which can happen in the
      scheduler, for example), refresh_pce() would write the wrong value
      to CR4.PCE.  This broke some PAPI tests.
      Reported-and-tested-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 7911d3f7 ("perf/x86: Only allow rdpmc if a perf_event is mapped")
      Link: http://lkml.kernel.org/r/0c5b38a76ea50e405f9abe07a13dfaef87c173a1.1489694270.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57ad6c8e
    • Jiri Olsa's avatar
      x86/intel_rdt: Put group node in rdtgroup_kn_unlock · 34314610
      Jiri Olsa authored
      commit 49ec8f5b upstream.
      
      The rdtgroup_kn_unlock waits for the last user to release and put its
      node. But it's calling kernfs_put on the node which calls the
      rdtgroup_kn_unlock, which might not be the group's directory node, but
      another group's file node.
      
      This race could be easily reproduced by running 2 instances
      of following script:
      
        mount -t resctrl resctrl /sys/fs/resctrl/
        pushd /sys/fs/resctrl/
        mkdir krava
        echo "krava" > krava/schemata
        rmdir krava
        popd
        umount  /sys/fs/resctrl
      
      It triggers the slub debug error message with following command
      line config: slub_debug=,kernfs_node_cache.
      
      Call kernfs_put on the group's node to fix it.
      
      Fixes: 60cf5e10 ("x86/intel_rdt: Add mkdir to resctrl file system")
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Shaohua Li <shli@fb.com>
      Link: http://lkml.kernel.org/r/1489501253-20248-1-git-send-email-jolsa@kernel.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34314610
    • Andrey Ryabinin's avatar
      x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y · 7621600b
      Andrey Ryabinin authored
      commit be3606ff upstream.
      
      The kernel doesn't boot with both PROFILE_ANNOTATED_BRANCHES=y and KASAN=y
      options selected. With branch profiling enabled we end up calling
      ftrace_likely_update() before kasan_early_init(). ftrace_likely_update() is
      built with KASAN instrumentation, so calling it before kasan has been
      initialized leads to crash.
      
      Use DISABLE_BRANCH_PROFILING define to make sure that we don't call
      ftrace_likely_update() from early code before kasan_early_init().
      
      Fixes: ef7f0d6a ("x86_64: add KASan support")
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: kasan-dev@googlegroups.com
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: lkp@01.org
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Link: http://lkml.kernel.org/r/20170313163337.1704-1-aryabinin@virtuozzo.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7621600b
    • Peter Zijlstra's avatar
      x86/tsc: Fix ART for TSC_KNOWN_FREQ · bd5ee529
      Peter Zijlstra authored
      commit 44fee88c upstream.
      
      Subhransu reported that convert_art_to_tsc() isn't working for him.
      
      The ART to TSC relation is only set up for systems which use the refined
      TSC calibration. Systems with known TSC frequency (available via CPUID 15)
      are not using the refined calibration and therefor the ART to TSC relation
      is never established.
      
      Add the setup to the known frequency init path which skips ART
      calibration. The init code needs to be duplicated as for systems which use
      refined calibration the ART setup must be delayed until calibration has
      been done.
      
      The problem has been there since the ART support was introdduced, but only
      detected now because Subhransu tested the first time on hardware which has
      TSC frequency enumerated via CPUID 15.
      
      Note for stable: The conditional has changed from TSC_RELIABLE to
           	 	 TSC_KNOWN_FREQUENCY.
      
      [ tglx: Rewrote changelog and identified the proper 'Fixes' commit ]
      
      Fixes: f9677e0f ("x86/tsc: Always Running Timer (ART) correlated clocksource")
      Reported-by: default avatar"Prusty, Subhransu S" <subhransu.s.prusty@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: christopher.s.hall@intel.com
      Cc: kevin.b.stanton@intel.com
      Cc: john.stultz@linaro.org
      Cc: akataria@vmware.com
      Link: http://lkml.kernel.org/r/20170313145712.GI3312@twins.programming.kicks-ass.netSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd5ee529
    • Josh Poimboeuf's avatar
      x86/unwind: Fix last frame check for aligned function stacks · a0256e0c
      Josh Poimboeuf authored
      commit 87a6b297 upstream.
      
      Pavel Machek reported the following warning on x86-32:
      
        WARNING: kernel stack frame pointer at f50cdf98 in swapper/2:0 has bad value   (null)
      
      The warning is caused by the unwinder not realizing that it reached the
      end of the stack, due to an unusual prologue which gcc sometimes
      generates for aligned stacks.  The prologue is based on a gcc feature
      called the Dynamic Realign Argument Pointer (DRAP).  It's almost always
      enabled for aligned stacks when -maccumulate-outgoing-args isn't set.
      
      This issue is similar to the one fixed by the following commit:
      
        8023e0e2 ("x86/unwind: Adjust last frame check for aligned function stacks")
      
      ... but that fix was specific to x86-64.
      
      Make the fix more generic to cover x86-32 as well, and also ensure that
      the return address referred to by the frame pointer is a copy of the
      original return address.
      
      Fixes: acb4608a ("x86/unwind: Create stack frames for saved syscall registers")
      Reported-by: default avatarPavel Machek <pavel@ucw.cz>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Link: http://lkml.kernel.org/r/50d4924db716c264b14f1633037385ec80bf89d2.1489465609.git.jpoimboe@redhat.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0256e0c
    • Imre Deak's avatar
      drm/i915/lspcon: Fix resume time initialization due to unasserted HPD · 5b115b8b
      Imre Deak authored
      commit 4b84b4a5 upstream.
      
      During system resume time initialization the HPD level on LSPCON ports
      can stay low for an extended amount of time, leading to failed AUX
      transfers and LSPCON initialization. Fix this by waiting for HPD to get
      asserted.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99178
      Cc: Shashank Sharma <shashank.sharma@intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: default avatarShashank Sharma <shashank.sharma@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1485509961-9010-3-git-send-email-imre.deak@intel.comSigned-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      (corrected stable tag)
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b115b8b
    • Imre Deak's avatar
      drm/i915/gen9+: Enable hotplug detection early · ebd9dbab
      Imre Deak authored
      commit 2a57d9cc upstream.
      
      For LSPCON resume time initialization we need to sample the
      corresponding pin's HPD level, but this is only available when HPD
      detection is enabled. Currently we enable detection only when enabling
      HPD interrupts which is too late, so bring the enabling of detection
      earlier.
      
      This is needed by the next patch.
      
      Cc: Shashank Sharma <shashank.sharma@intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: default avatarShashank Sharma <shashank.sharma@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1485509961-9010-2-git-send-email-imre.deak@intel.comSigned-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      (rebased onto v4.10.4 due to missing s/IS_BROXTON/IS_GEN9_LP/ upstream change)
      (corrected stable tag)
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ebd9dbab
    • Imre Deak's avatar
      drm/i915/lspcon: Enable AUX interrupts for resume time initialization · b9208ab3
      Imre Deak authored
      commit 908764f6 upstream.
      
      For LSPCON initialization during system resume we need AUX
      functionality, but we call the corresponding encoder reset hook with all
      interrupts disabled. Without interrupts we'll do a poll-wait for AUX
      transfer completions, which adds a significant delay if the transfers
      timeout/need to be retried for some reason.
      
      Fix this by enabling interrupts before calling the reset hooks. Note
      that while this will enable AUX interrupts it will keep HPD interrupts
      disabled, in a similar way to the init time output setup code.
      
      This issue existed since LSPCON support was added.
      
      v2:
      - Rebased on drm-tip.
      
      Cc: Shashank Sharma <shashank.sharma@intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Tested-by: default avatarDavid Weinehall <david.weinehall@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1480448429-27739-1-git-send-email-imre.deak@intel.com
      (rebased onto v4.10.4 due to missing s/dev/dev_priv/ upstream change)
      (corrected stable tag)
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9208ab3
    • Shanker Donthineni's avatar
      irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065 · 1740a61c
      Shanker Donthineni authored
      commit 90922a2d upstream.
      
      On Qualcomm Datacenter Technologies QDF2400 SoCs, the ITS hardware
      implementation uses 16Bytes for Interrupt Translation Entry (ITE),
      but reports an incorrect value of 8Bytes in GITS_TYPER.ITTE_size.
      
      It might cause kernel memory corruption depending on the number
      of MSI(x) that are configured and the amount of memory that has
      been allocated for ITEs in its_create_device().
      
      This patch fixes the potential memory corruption by setting the
      correct ITE size to 16Bytes.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarShanker Donthineni <shankerd@codeaurora.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1740a61c
    • Marc Zyngier's avatar
      arm64: KVM: VHE: Clear HCR_TGE when invalidating guest TLBs · ef217ea7
      Marc Zyngier authored
      commit 68925176 upstream.
      
      When invalidating guest TLBs, special care must be taken to
      actually shoot the guest TLBs and not the host ones if we're
      running on a VHE system.  This is controlled by the HCR_EL2.TGE
      bit, which we forget to clear before invalidating TLBs.
      
      Address the issue by introducing two wrappers (__tlb_switch_to_guest
      and __tlb_switch_to_host) that take care of both the VTTBR_EL2
      and HCR_EL2.TGE switching.
      Reported-by: default avatarTomasz Nowicki <tnowicki@caviumnetworks.com>
      Tested-by: default avatarTomasz Nowicki <tnowicki@caviumnetworks.com>
      Reviewed-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef217ea7
    • Hannes Frederic Sowa's avatar
      dccp: fix memory leak during tear-down of unsuccessful connection request · f70ce6c6
      Hannes Frederic Sowa authored
      [ Upstream commit 72ef9c41 ]
      
      This patch fixes a memory leak, which happens if the connection request
      is not fulfilled between parsing the DCCP options and handling the SYN
      (because e.g. the backlog is full), because we forgot to free the
      list of ack vectors.
      Reported-by: default avatarJianwen Ji <jiji@redhat.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f70ce6c6
    • Hannes Frederic Sowa's avatar
      tun: fix premature POLLOUT notification on tun devices · a79fa23c
      Hannes Frederic Sowa authored
      [ Upstream commit b20e2d54 ]
      
      aszlig observed failing ssh tunnels (-w) during initialization since
      commit cc9da6cc ("ipv6: addrconf: use stable address generator for
      ARPHRD_NONE"). We already had reports that the mentioned commit breaks
      Juniper VPN connections. I can't clearly say that the Juniper VPN client
      has the same problem, but it is worth a try to hint to this patch.
      
      Because of the early generation of link local addresses, the kernel now
      can start asking for routers on the local subnet much earlier than usual.
      Those router solicitation packets arrive inside the ssh channels and
      should be transmitted to the tun fd before the configuration scripts
      might have upped the interface and made it ready for transmission.
      
      ssh polls on the interface and receives back a POLL_OUT. It tries to send
      the earily router solicitation packet to the tun interface.  Unfortunately
      it hasn't been up'ed yet by config scripts, thus failing with -EIO. ssh
      doesn't retry again and considers the tun interface broken forever.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=121131
      Fixes: cc9da6cc ("ipv6: addrconf: use stable address generator for ARPHRD_NONE")
      Cc: Bjørn Mork <bjorn@mork.no>
      Reported-by: default avatarValdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Reported-by: default avatarJonas Lippuner <jonas@lippuner.ca>
      Cc: Jonas Lippuner <jonas@lippuner.ca>
      Reported-by: default avataraszlig <aszlig@redmoonstudios.org>
      Cc: aszlig <aszlig@redmoonstudios.org>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a79fa23c
    • Jon Maxwell's avatar
      dccp/tcp: fix routing redirect race · b34c9f7f
      Jon Maxwell authored
      [ Upstream commit 45caeaa5 ]
      
      As Eric Dumazet pointed out this also needs to be fixed in IPv6.
      v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.
      
      We have seen a few incidents lately where a dst_enty has been freed
      with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
      dst_entry. If the conditions/timings are right a crash then ensues when the
      freed dst_entry is referenced later on. A Common crashing back trace is:
      
       #8 [] page_fault at ffffffff8163e648
          [exception RIP: __tcp_ack_snd_check+74]
      .
      .
       #9 [] tcp_rcv_established at ffffffff81580b64
      #10 [] tcp_v4_do_rcv at ffffffff8158b54a
      #11 [] tcp_v4_rcv at ffffffff8158cd02
      #12 [] ip_local_deliver_finish at ffffffff815668f4
      #13 [] ip_local_deliver at ffffffff81566bd9
      #14 [] ip_rcv_finish at ffffffff8156656d
      #15 [] ip_rcv at ffffffff81566f06
      #16 [] __netif_receive_skb_core at ffffffff8152b3a2
      #17 [] __netif_receive_skb at ffffffff8152b608
      #18 [] netif_receive_skb at ffffffff8152b690
      #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
      #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
      #21 [] net_rx_action at ffffffff8152bac2
      #22 [] __do_softirq at ffffffff81084b4f
      #23 [] call_softirq at ffffffff8164845c
      #24 [] do_softirq at ffffffff81016fc5
      #25 [] irq_exit at ffffffff81084ee5
      #26 [] do_IRQ at ffffffff81648ff8
      
      Of course it may happen with other NIC drivers as well.
      
      It's found the freed dst_entry here:
      
       224 static bool tcp_in_quickack_mode(struct sock *sk)
       225 {
       226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);
       227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);
       228 
       229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
       230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
       231 }
      
      But there are other backtraces attributed to the same freed dst_entry in
      netfilter code as well.
      
      All the vmcores showed 2 significant clues:
      
      - Remote hosts behind the default gateway had always been redirected to a
      different gateway. A rtable/dst_entry will be added for that host. Making
      more dst_entrys with lower reference counts. Making this more probable.
      
      - All vmcores showed a postitive LockDroppedIcmps value, e.g:
      
      LockDroppedIcmps                  267
      
      A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
      regardless of whether user space has the socket locked. This can result in a
      race condition where the same dst_entry cached in sk->sk_dst_entry can be
      decremented twice for the same socket via:
      
      do_redirect()->__sk_dst_check()-> dst_release().
      
      Which leads to the dst_entry being prematurely freed with another socket
      pointing to it via sk->sk_dst_cache and a subsequent crash.
      
      To fix this skip do_redirect() if usespace has the socket locked. Instead let
      the redirect take place later when user space does not have the socket
      locked.
      
      The dccp/IPv6 code is very similar in this respect, so fixing it there too.
      
      As Eric Garver pointed out the following commit now invalidates routes. Which
      can set the dst->obsolete flag so that ipv4_dst_check() returns null and
      triggers the dst_release().
      
      Fixes: ceb33206 ("ipv4: Kill routes during PMTU/redirect updates.")
      Cc: Eric Garver <egarver@redhat.com>
      Cc: Hannes Sowa <hsowa@redhat.com>
      Signed-off-by: default avatarJon Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b34c9f7f
    • Andrey Vagin's avatar
      net: use net->count to check whether a netns is alive or not · 7ebf301d
      Andrey Vagin authored
      [ Upstream commit 91864f58 ]
      
      The previous idea was to check whether a net namespace is in
      net_exit_list or not. It doesn't work, because net->exit_list is used in
      __register_pernet_operations and __unregister_pernet_operations where
      all namespaces are added to a temporary list to make cleanup in a error
      case, so list_empty(&net->exit_list) always returns false.
      Reported-by: default avatarMantas Mikulėnas <grawity@gmail.com>
      Fixes: 002d8a1a ("net: skip genenerating uevents for network namespaces that are exiting")
      Signed-off-by: default avatarAndrei Vagin <avagin@openvz.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ebf301d
    • Florian Westphal's avatar
      bridge: drop netfilter fake rtable unconditionally · 47808872
      Florian Westphal authored
      [ Upstream commit a13b2082 ]
      
      Andreas reports kernel oops during rmmod of the br_netfilter module.
      Hannes debugged the oops down to a NULL rt6info->rt6i_indev.
      
      Problem is that br_netfilter has the nasty concept of adding a fake
      rtable to skb->dst; this happens in a br_netfilter prerouting hook.
      
      A second hook (in bridge LOCAL_IN) is supposed to remove these again
      before the skb is handed up the stack.
      
      However, on module unload hooks get unregistered which means an
      skb could traverse the prerouting hook that attaches the fake_rtable,
      while the 'fake rtable remove' hook gets removed from the hooklist
      immediately after.
      
      Fixes: 34666d46 ("netfilter: bridge: move br_netfilter out of the core")
      Reported-by: default avatarAndreas Karis <akaris@redhat.com>
      Debugged-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Acked-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      47808872
    • Florian Westphal's avatar
      ipv6: avoid write to a possibly cloned skb · fdb09132
      Florian Westphal authored
      [ Upstream commit 79e49503 ]
      
      ip6_fragment, in case skb has a fraglist, checks if the
      skb is cloned.  If it is, it will move to the 'slow path' and allocates
      new skbs for each fragment.
      
      However, right before entering the slowpath loop, it updates the
      nexthdr value of the last ipv6 extension header to NEXTHDR_FRAGMENT,
      to account for the fragment header that will be inserted in the new
      ipv6-fragment skbs.
      
      In case original skb is cloned this munges nexthdr value of another
      skb.  Avoid this by doing the nexthdr update for each of the new fragment
      skbs separately.
      
      This was observed with tcpdump on a bridge device where netfilter ipv6
      reassembly is active:  tcpdump shows malformed fragment headers as
      the l4 header (icmpv6, tcp, etc). is decoded as a fragment header.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: default avatarAndreas Karis <akaris@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fdb09132
    • Sabrina Dubroca's avatar
      ipv6: make ECMP route replacement less greedy · b74b74e2
      Sabrina Dubroca authored
      [ Upstream commit 67e19400 ]
      
      Commit 27596472 ("ipv6: fix ECMP route replacement") introduced a
      loop that removes all siblings of an ECMP route that is being
      replaced. However, this loop doesn't stop when it has replaced
      siblings, and keeps removing other routes with a higher metric.
      We also end up triggering the WARN_ON after the loop, because after
      this nsiblings < 0.
      
      Instead, stop the loop when we have taken care of all routes with the
      same metric as the route being replaced.
      
        Reproducer:
        ===========
          #!/bin/sh
      
          ip netns add ns1
          ip netns add ns2
          ip -net ns1 link set lo up
      
          for x in 0 1 2 ; do
              ip link add veth$x netns ns2 type veth peer name eth$x netns ns1
              ip -net ns1 link set eth$x up
              ip -net ns2 link set veth$x up
          done
      
          ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \
                  nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2
          ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256
          ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048
      
          echo "before replace, 3 routes"
          ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
          echo
      
          ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \
                  nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2
      
          echo "after replace, only 2 routes, metric 2048 is gone"
          ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Reviewed-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b74b74e2
    • David Ahern's avatar
      mpls: Do not decrement alive counter for unregister events · ed44bf89
      David Ahern authored
      [ Upstream commit 79099aab ]
      
      Multipath routes can be rendered usesless when a device in one of the
      paths is deleted. For example:
      
      $ ip -f mpls ro ls
      100
      	nexthop as to 200 via inet 172.16.2.2  dev virt12
      	nexthop as to 300 via inet 172.16.3.2  dev br0
      101
      	nexthop as to 201 via inet6 2000:2::2  dev virt12
      	nexthop as to 301 via inet6 2000:3::2  dev br0
      
      $ ip li del br0
      
      When br0 is deleted the other hop is not considered in
      mpls_select_multipath because of the alive check -- rt_nhn_alive
      is 0.
      
      rt_nhn_alive is decremented once in mpls_ifdown when the device is taken
      down (NETDEV_DOWN) and again when it is deleted (NETDEV_UNREGISTER). For
      a 2 hop route, deleting one device drops the alive count to 0. Since
      devices are taken down before unregistering, the decrement on
      NETDEV_UNREGISTER is redundant.
      
      Fixes: c89359a4 ("mpls: support for dead routes")
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed44bf89
    • David Ahern's avatar
      mpls: Send route delete notifications when router module is unloaded · 61cc1778
      David Ahern authored
      [ Upstream commit e37791ec ]
      
      When the mpls_router module is unloaded, mpls routes are deleted but
      notifications are not sent to userspace leaving userspace caches
      out of sync. Add the call to mpls_notify_route in mpls_net_exit as
      routes are freed.
      
      Fixes: 0189197f ("mpls: Basic routing support")
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61cc1778
    • Etienne Noss's avatar
      act_connmark: avoid crashing on malformed nlattrs with null parms · 8e9bacd9
      Etienne Noss authored
      [ Upstream commit 52491c76 ]
      
      tcf_connmark_init does not check in its configuration if TCA_CONNMARK_PARMS
      is set, resulting in a null pointer dereference when trying to access it.
      
      [501099.043007] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
      [501099.043039] IP: [<ffffffffc10c60fb>] tcf_connmark_init+0x8b/0x180 [act_connmark]
      ...
      [501099.044334] Call Trace:
      [501099.044345]  [<ffffffffa47270e8>] ? tcf_action_init_1+0x198/0x1b0
      [501099.044363]  [<ffffffffa47271b0>] ? tcf_action_init+0xb0/0x120
      [501099.044380]  [<ffffffffa47250a4>] ? tcf_exts_validate+0xc4/0x110
      [501099.044398]  [<ffffffffc0f5fa97>] ? u32_set_parms+0xa7/0x270 [cls_u32]
      [501099.044417]  [<ffffffffc0f60bf0>] ? u32_change+0x680/0x87b [cls_u32]
      [501099.044436]  [<ffffffffa4725d1d>] ? tc_ctl_tfilter+0x4dd/0x8a0
      [501099.044454]  [<ffffffffa44a23a1>] ? security_capable+0x41/0x60
      [501099.044471]  [<ffffffffa470ca01>] ? rtnetlink_rcv_msg+0xe1/0x220
      [501099.044490]  [<ffffffffa470c920>] ? rtnl_newlink+0x870/0x870
      [501099.044507]  [<ffffffffa472cc61>] ? netlink_rcv_skb+0xa1/0xc0
      [501099.044524]  [<ffffffffa47073f4>] ? rtnetlink_rcv+0x24/0x30
      [501099.044541]  [<ffffffffa472c634>] ? netlink_unicast+0x184/0x230
      [501099.044558]  [<ffffffffa472c9d8>] ? netlink_sendmsg+0x2f8/0x3b0
      [501099.044576]  [<ffffffffa46d8880>] ? sock_sendmsg+0x30/0x40
      [501099.044592]  [<ffffffffa46d8e03>] ? SYSC_sendto+0xd3/0x150
      [501099.044608]  [<ffffffffa425fda1>] ? __do_page_fault+0x2d1/0x510
      [501099.044626]  [<ffffffffa47fbd7b>] ? system_call_fast_compare_end+0xc/0x9b
      
      Fixes: 22a5dc0e ("net: sched: Introduce connmark action")
      Signed-off-by: default avatarÉtienne Noss <etienne.noss@wifirst.fr>
      Signed-off-by: default avatarVictorien Molle <victorien.molle@wifirst.fr>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e9bacd9
    • Lendacky, Thomas's avatar
      amd-xgbe: Enable IRQs only if napi_complete_done() is true · cdb9caeb
      Lendacky, Thomas authored
      [ Upstream commit d7aba644 ]
      
      Depending on the hardware, the amd-xgbe driver may use disable_irq_nosync()
      and enable_irq() when an interrupt is received to process Rx packets. If
      the napi_complete_done() return value isn't checked an unbalanced enable
      for the IRQ could result, generating a warning stack trace.
      
      Update the driver to only enable interrupts if napi_complete_done() returns
      true.
      Reported-by: default avatarJeremy Linton <jeremy.linton@arm.com>
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cdb9caeb
    • Dmitry V. Levin's avatar
      uapi: fix linux/packet_diag.h userspace compilation error · 110e7778
      Dmitry V. Levin authored
      [ Upstream commit 745cb7f8 ]
      
      Replace MAX_ADDR_LEN with its numeric value to fix the following
      linux/packet_diag.h userspace compilation error:
      
      /usr/include/linux/packet_diag.h:67:17: error: 'MAX_ADDR_LEN' undeclared here (not in a function)
        __u8 pdmc_addr[MAX_ADDR_LEN];
      
      This is not the first case in the UAPI where the numeric value
      of MAX_ADDR_LEN is used instead of symbolic one, uapi/linux/if_link.h
      already does the same:
      
      $ grep MAX_ADDR_LEN include/uapi/linux/if_link.h
      	__u8 mac[32]; /* MAX_ADDR_LEN */
      
      There are no UAPI headers besides these two that use MAX_ADDR_LEN.
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Acked-by: default avatarPavel Emelyanov <xemul@virtuozzo.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      110e7778
    • Paolo Abeni's avatar
      net/tunnel: set inner protocol in network gro hooks · 5344ec08
      Paolo Abeni authored
      [ Upstream commit 294acf1c ]
      
      The gso code of several tunnels type (gre and udp tunnels)
      takes for granted that the skb->inner_protocol is properly
      initialized and drops the packet elsewhere.
      
      On the forwarding path no one is initializing such field,
      so gro encapsulated packets are dropped on forward.
      
      Since commit 38720352 ("gre: Use inner_proto to obtain
      inner header protocol"), this can be reproduced when the
      encapsulated packets use gre as the tunneling protocol.
      
      The issue happens also with vxlan and geneve tunnels since
      commit 8bce6d7d ("udp: Generalize skb_udp_segment"), if the
      forwarding host's ingress nic has h/w offload for such tunnel
      and a vxlan/geneve device is configured on top of it, regardless
      of the configured peer address and vni.
      
      To address the issue, this change initialize the inner_protocol
      field for encapsulated packets in both ipv4 and ipv6 gro complete
      callbacks.
      
      Fixes: 38720352 ("gre: Use inner_proto to obtain inner header protocol")
      Fixes: 8bce6d7d ("udp: Generalize skb_udp_segment")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5344ec08
    • David Ahern's avatar
      vrf: Fix use-after-free in vrf_xmit · 7360a1fd
      David Ahern authored
      [ Upstream commit f7887d40 ]
      
      KASAN detected a use-after-free:
      
      [  269.467067] BUG: KASAN: use-after-free in vrf_xmit+0x7f1/0x827 [vrf] at addr ffff8800350a21c0
      [  269.467067] Read of size 4 by task ssh/1879
      [  269.467067] CPU: 1 PID: 1879 Comm: ssh Not tainted 4.10.0+ #249
      [  269.467067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [  269.467067] Call Trace:
      [  269.467067]  dump_stack+0x81/0xb6
      [  269.467067]  kasan_object_err+0x21/0x78
      [  269.467067]  kasan_report+0x2f7/0x450
      [  269.467067]  ? vrf_xmit+0x7f1/0x827 [vrf]
      [  269.467067]  ? ip_output+0xa4/0xdb
      [  269.467067]  __asan_load4+0x6b/0x6d
      [  269.467067]  vrf_xmit+0x7f1/0x827 [vrf]
      ...
      
      Which corresponds to the skb access after xmit handling. Fix by saving
      skb->len and using the saved value to update stats.
      
      Fixes: 193125db ("net: Introduce VRF device driver")
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7360a1fd
    • Jarod Wilson's avatar
      team: use ETH_MAX_MTU as max mtu · be18cce7
      Jarod Wilson authored
      [ Upstream commit 3331aa37 ]
      
      This restores the ability to set a team device's mtu to anything higher
      than 1500. Similar to the reported issue with bonding, the team driver
      calls ether_setup(), which sets an initial max_mtu of 1500, while the
      underlying hardware can handle something much larger. Just set it to
      ETH_MAX_MTU to support all possible values, and the limitations of the
      underlying devices will prevent setting anything too large.
      
      Fixes: 91572088 ("net: use core MTU range checking in core net infra")
      CC: Cong Wang <xiyou.wangcong@gmail.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: netdev@vger.kernel.org
      Signed-off-by: default avatarJarod Wilson <jarod@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      be18cce7
    • Eric Dumazet's avatar
      dccp: fix use-after-free in dccp_feat_activate_values · 92ab4dea
      Eric Dumazet authored
      [ Upstream commit 62f8f4d9 ]
      
      Dmitry reported crashes in DCCP stack [1]
      
      Problem here is that when I got rid of listener spinlock, I missed the
      fact that DCCP stores a complex state in struct dccp_request_sock,
      while TCP does not.
      
      Since multiple cpus could access it at the same time, we need to add
      protection.
      
      [1]
      BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0
      net/dccp/feat.c:1541 at addr ffff88003713be68
      Read of size 8 by task syz-executor2/8457
      CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x292/0x398 lib/dump_stack.c:51
       kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
       print_address_description mm/kasan/report.c:200 [inline]
       kasan_report_error mm/kasan/report.c:289 [inline]
       kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311
       kasan_report mm/kasan/report.c:332 [inline]
       __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
       dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541
       dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
       dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
       dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
       dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
       ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
       dst_input include/net/dst.h:507 [inline]
       ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
       __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
       __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
       process_backlog+0xe5/0x6c0 net/core/dev.c:4839
       napi_poll net/core/dev.c:5202 [inline]
       net_rx_action+0xe70/0x1900 net/core/dev.c:5267
       __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
       do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
       </IRQ>
       do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
       do_softirq kernel/softirq.c:176 [inline]
       __local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181
       local_bh_enable include/linux/bottom_half.h:31 [inline]
       rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline]
       ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123
       ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148
       NF_HOOK_COND include/linux/netfilter.h:246 [inline]
       ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162
       ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501
       inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179
       dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141
       dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280
       dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362
       dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:645
       SYSC_sendto+0x660/0x810 net/socket.c:1687
       SyS_sendto+0x40/0x50 net/socket.c:1655
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x4458b9
      RSP: 002b:00007f8ceb77bb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00000000004458b9
      RDX: 0000000000000023 RSI: 0000000020e60000 RDI: 0000000000000017
      RBP: 00000000006e1b90 R08: 00000000200f9fe1 R09: 0000000000000020
      R10: 0000000000008010 R11: 0000000000000282 R12: 00000000007080a8
      R13: 0000000000000000 R14: 00007f8ceb77c9c0 R15: 00007f8ceb77c700
      Object at ffff88003713be50, in cache kmalloc-64 size: 64
      Allocated:
      PID = 8446
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
       kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738
       kmalloc include/linux/slab.h:490 [inline]
       dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467
       dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487
       __feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741
       dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949
       dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012
       dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423
       dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217
       dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377
       dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606
       dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
       sk_backlog_rcv include/net/sock.h:893 [inline]
       __sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479
       dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742
       ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
       dst_input include/net/dst.h:507 [inline]
       ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
       __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
       __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
       process_backlog+0xe5/0x6c0 net/core/dev.c:4839
       napi_poll net/core/dev.c:5202 [inline]
       net_rx_action+0xe70/0x1900 net/core/dev.c:5267
       __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
      Freed:
      PID = 15
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514 [inline]
       kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
       slab_free_hook mm/slub.c:1355 [inline]
       slab_free_freelist_hook mm/slub.c:1377 [inline]
       slab_free mm/slub.c:2954 [inline]
       kfree+0xe8/0x2b0 mm/slub.c:3874
       dccp_feat_entry_destructor.part.4+0x48/0x60 net/dccp/feat.c:418
       dccp_feat_entry_destructor net/dccp/feat.c:416 [inline]
       dccp_feat_list_pop net/dccp/feat.c:541 [inline]
       dccp_feat_activate_values+0x57f/0xab0 net/dccp/feat.c:1543
       dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
       dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
       dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
       dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
       ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
       dst_input include/net/dst.h:507 [inline]
       ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
       __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
       __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
       process_backlog+0xe5/0x6c0 net/core/dev.c:4839
       napi_poll net/core/dev.c:5202 [inline]
       net_rx_action+0xe70/0x1900 net/core/dev.c:5267
       __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
      Memory state around the buggy address:
       ffff88003713bd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88003713bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88003713be00: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
                                                                ^
      
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      92ab4dea
    • Alexey Khoroshilov's avatar
      net/sched: act_skbmod: remove unneeded rcu_read_unlock in tcf_skbmod_dump · a6ff0621
      Alexey Khoroshilov authored
      [ Upstream commit 6c4dc75c ]
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: default avatarAlexey Khoroshilov <khoroshilov@ispras.ru>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a6ff0621
    • Eric Dumazet's avatar
      net: fix socket refcounting in skb_complete_tx_timestamp() · 27d0c80f
      Eric Dumazet authored
      [ Upstream commit 9ac25fc0 ]
      
      TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt
      By the time TX completion happens, sk_refcnt might be already 0.
      
      sock_hold()/sock_put() would then corrupt critical state, like
      sk_wmem_alloc and lead to leaks or use after free.
      
      Fixes: 62bccb8c ("net-timestamp: Make the clone operation stand-alone from phy timestamping")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27d0c80f
    • Eric Dumazet's avatar
      net: fix socket refcounting in skb_complete_wifi_ack() · 80691f38
      Eric Dumazet authored
      [ Upstream commit dd4f1072 ]
      
      TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt
      By the time TX completion happens, sk_refcnt might be already 0.
      
      sock_hold()/sock_put() would then corrupt critical state, like
      sk_wmem_alloc.
      
      Fixes: bf7fa551 ("mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80691f38
    • Eric Dumazet's avatar
      tcp: fix various issues for sockets morphing to listen state · 81a43770
      Eric Dumazet authored
      [ Upstream commit 02b2faaf ]
      
      Dmitry Vyukov reported a divide by 0 triggered by syzkaller, exploiting
      tcp_disconnect() path that was never really considered and/or used
      before syzkaller ;)
      
      I was not able to reproduce the bug, but it seems issues here are the
      three possible actions that assumed they would never trigger on a
      listener.
      
      1) tcp_write_timer_handler
      2) tcp_delack_timer_handler
      3) MTU reduction
      
      Only IPv6 MTU reduction was properly testing TCP_CLOSE and TCP_LISTEN
       states from tcp_v6_mtu_reduced()
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81a43770
    • WANG Cong's avatar
      strparser: destroy workqueue on module exit · 178e86ff
      WANG Cong authored
      [ Upstream commit f78ef7cd ]
      
      Fixes: 43a0c675 ("strparser: Stream parser for messages")
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      178e86ff
    • WANG Cong's avatar
      bonding: use ETH_MAX_MTU as max mtu · aa677aaf
      WANG Cong authored
      [ Upstream commit 31c05415 ]
      
      This restores the ability of setting bond device's mtu to 9000.
      
      Fixes: 91572088 ("net: use core MTU range checking in core net infra")
      Reported-by: daznis@gmail.com
      Reported-by: default avatarBrad Campbell <lists2009@fnarfbargle.com>
      Cc: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa677aaf
    • Lendacky, Thomas's avatar
      amd-xgbe: Don't overwrite SFP PHY mod_absent settings · 0ee7666f
      Lendacky, Thomas authored
      [ Upstream commit 2697ea5a ]
      
      If an SFP module is not present, xgbe_phy_sfp_phy_settings() should
      return after applying the default settings. Currently there is no return
      statement and the default settings are overwritten.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ee7666f
    • Lendacky, Thomas's avatar
      amd-xgbe: Be sure to set MDIO modes on device (re)start · 9919f222
      Lendacky, Thomas authored
      [ Upstream commit b42c6761 ]
      
      The MDIO register mode is set when the device is probed. But when the
      device is brought down and then back up, the MDIO register mode has been
      reset.  Be sure to reset the mode during device startup and only change
      the mode of the address specified.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9919f222