1. 16 Aug, 2017 15 commits
    • Nicholas Bellinger's avatar
      target: Fix node_acl demo-mode + uncached dynamic shutdown regression · 59c74236
      Nicholas Bellinger authored
      commit 6f48655f upstream.
      
      This patch fixes a generate_node_acls = 1 + cache_dynamic_acls = 0
      regression, that was introduced by
      
        commit 01d4d673
        Author: Nicholas Bellinger <nab@linux-iscsi.org>
        Date:   Wed Dec 7 12:55:54 2016 -0800
      
      which originally had the proper list_del_init() usage, but was
      dropped during list review as it was thought unnecessary by HCH.
      
      However, list_del_init() usage is required during the special
      generate_node_acls = 1 + cache_dynamic_acls = 0 case when
      transport_free_session() does a list_del(&se_nacl->acl_list),
      followed by target_complete_nacl() doing the same thing.
      
      This was manifesting as a general protection fault as reported
      by Justin:
      
      kernel: general protection fault: 0000 [#1] SMP
      kernel: Modules linked in:
      kernel: CPU: 0 PID: 11047 Comm: iscsi_ttx Not tainted 4.13.0-rc2.x86_64.1+ #20
      kernel: Hardware name: Intel Corporation S5500BC/S5500BC, BIOS S5500.86B.01.00.0064.050520141428 05/05/2014
      kernel: task: ffff88026939e800 task.stack: ffffc90007884000
      kernel: RIP: 0010:target_put_nacl+0x49/0xb0
      kernel: RSP: 0018:ffffc90007887d70 EFLAGS: 00010246
      kernel: RAX: dead000000000200 RBX: ffff8802556ca000 RCX: 0000000000000000
      kernel: RDX: dead000000000100 RSI: 0000000000000246 RDI: ffff8802556ce028
      kernel: RBP: ffffc90007887d88 R08: 0000000000000001 R09: 0000000000000000
      kernel: R10: ffffc90007887df8 R11: ffffea0009986900 R12: ffff8802556ce020
      kernel: R13: ffff8802556ce028 R14: ffff8802556ce028 R15: ffffffff88d85540
      kernel: FS:  0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
      kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      kernel: CR2: 00007fffe36f5f94 CR3: 0000000009209000 CR4: 00000000003406f0
      kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      kernel: Call Trace:
      kernel:  transport_free_session+0x67/0x140
      kernel:  transport_deregister_session+0x7a/0xc0
      kernel:  iscsit_close_session+0x92/0x210
      kernel:  iscsit_close_connection+0x5f9/0x840
      kernel:  iscsit_take_action_for_connection_exit+0xfe/0x110
      kernel:  iscsi_target_tx_thread+0x140/0x1e0
      kernel:  ? wait_woken+0x90/0x90
      kernel:  kthread+0x124/0x160
      kernel:  ? iscsit_thread_get_cpumask+0x90/0x90
      kernel:  ? kthread_create_on_node+0x40/0x40
      kernel:  ret_from_fork+0x22/0x30
      kernel: Code: 00 48 89 fb 4c 8b a7 48 01 00 00 74 68 4d 8d 6c 24 08 4c
      89 ef e8 e8 28 43 00 48 8b 93 20 04 00 00 48 8b 83 28 04 00 00 4c 89
      ef <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 20
      kernel: RIP: target_put_nacl+0x49/0xb0 RSP: ffffc90007887d70
      kernel: ---[ end trace f12821adbfd46fed ]---
      
      To address this, go ahead and use proper list_del_list() for all
      cases of se_nacl->acl_list deletion.
      Reported-by: default avatarJustin Maggard <jmaggard01@gmail.com>
      Tested-by: default avatarJustin Maggard <jmaggard01@gmail.com>
      Cc: Justin Maggard <jmaggard01@gmail.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59c74236
    • Alan Stern's avatar
      usb-storage: fix deadlock involving host lock and scsi_done · 7b0d44e2
      Alan Stern authored
      commit 8b52291a upstream.
      
      Christoph Hellwig says that since version 4.12, the kernel switched to
      using blk-mq by default.  The old code used a softirq for handling
      request completions, but blk-mq can handle completions in the caller's
      context.  This may cause a problem for usb-storage, because it invokes
      the ->scsi_done callback while holding the host lock, and the
      completion routine sometimes tries to acquire the same lock (when
      running the error handler, for example).
      
      The consequence is that the existing code will sometimes deadlock upon
      error completion of a SCSI command (with a lockdep warning).
      
      This is easy enough to fix, since usb-storage doesn't really need to
      hold the host lock while the callback runs.  It was simpler to write
      it that way, but moving the call outside the locked region is pretty
      easy and there's no downside.  That's what this patch does.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-and-tested-by: default avatarArthur Marsh <arthur.marsh@internode.on.net>
      CC: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b0d44e2
    • Nicholas Bellinger's avatar
      iscsi-target: Fix iscsi_np reset hung task during parallel delete · 42804812
      Nicholas Bellinger authored
      commit 978d13d6 upstream.
      
      This patch fixes a bug associated with iscsit_reset_np_thread()
      that can occur during parallel configfs rmdir of a single iscsi_np
      used across multiple iscsi-target instances, that would result in
      hung task(s) similar to below where configfs rmdir process context
      was blocked indefinately waiting for iscsi_np->np_restart_comp
      to finish:
      
      [ 6726.112076] INFO: task dcp_proxy_node_:15550 blocked for more than 120 seconds.
      [ 6726.119440]       Tainted: G        W  O     4.1.26-3321 #2
      [ 6726.125045] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 6726.132927] dcp_proxy_node_ D ffff8803f202bc88     0 15550      1 0x00000000
      [ 6726.140058]  ffff8803f202bc88 ffff88085c64d960 ffff88083b3b1ad0 ffff88087fffeb08
      [ 6726.147593]  ffff8803f202c000 7fffffffffffffff ffff88083f459c28 ffff88083b3b1ad0
      [ 6726.155132]  ffff88035373c100 ffff8803f202bca8 ffffffff8168ced2 ffff8803f202bcb8
      [ 6726.162667] Call Trace:
      [ 6726.165150]  [<ffffffff8168ced2>] schedule+0x32/0x80
      [ 6726.170156]  [<ffffffff8168f5b4>] schedule_timeout+0x214/0x290
      [ 6726.176030]  [<ffffffff810caef2>] ? __send_signal+0x52/0x4a0
      [ 6726.181728]  [<ffffffff8168d7d6>] wait_for_completion+0x96/0x100
      [ 6726.187774]  [<ffffffff810e7c80>] ? wake_up_state+0x10/0x10
      [ 6726.193395]  [<ffffffffa035d6e2>] iscsit_reset_np_thread+0x62/0xe0 [iscsi_target_mod]
      [ 6726.201278]  [<ffffffffa0355d86>] iscsit_tpg_disable_portal_group+0x96/0x190 [iscsi_target_mod]
      [ 6726.210033]  [<ffffffffa0363f7f>] lio_target_tpg_store_enable+0x4f/0xc0 [iscsi_target_mod]
      [ 6726.218351]  [<ffffffff81260c5a>] configfs_write_file+0xaa/0x110
      [ 6726.224392]  [<ffffffff811ea364>] vfs_write+0xa4/0x1b0
      [ 6726.229576]  [<ffffffff811eb111>] SyS_write+0x41/0xb0
      [ 6726.234659]  [<ffffffff8169042e>] system_call_fastpath+0x12/0x71
      
      It would happen because each iscsit_reset_np_thread() sets state
      to ISCSI_NP_THREAD_RESET, sends SIGINT, and then blocks waiting
      for completion on iscsi_np->np_restart_comp.
      
      However, if iscsi_np was active processing a login request and
      more than a single iscsit_reset_np_thread() caller to the same
      iscsi_np was blocked on iscsi_np->np_restart_comp, iscsi_np
      kthread process context in __iscsi_target_login_thread() would
      flush pending signals and only perform a single completion of
      np->np_restart_comp before going back to sleep within transport
      specific iscsit_transport->iscsi_accept_np code.
      
      To address this bug, add a iscsi_np->np_reset_count and update
      __iscsi_target_login_thread() to keep completing np->np_restart_comp
      until ->np_reset_count has reached zero.
      Reported-by: default avatarGary Guo <ghg@datera.io>
      Tested-by: default avatarGary Guo <ghg@datera.io>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      42804812
    • Varun Prakash's avatar
      iscsi-target: fix memory leak in iscsit_setup_text_cmd() · f838bd17
      Varun Prakash authored
      commit ea8dc5b4 upstream.
      
      On receiving text request iscsi-target allocates buffer for
      payload in iscsit_handle_text_cmd() and assigns buffer pointer
      to cmd->text_in_ptr, this buffer is currently freed in
      iscsit_release_cmd(), if iscsi-target sets 'C' bit in text
      response then it will receive another text request from the
      initiator with ttt != 0xffffffff in this case iscsi-target
      will find cmd using itt and call iscsit_setup_text_cmd()
      which will set cmd->text_in_ptr to NULL without freeing
      previously allocated buffer.
      
      This patch fixes this issue by calling kfree(cmd->text_in_ptr)
      in iscsit_setup_text_cmd() before assigning NULL to it.
      
      For the first text request cmd->text_in_ptr is NULL as
      cmd is memset to 0 in iscsit_allocate_cmd().
      Signed-off-by: default avatarVarun Prakash <varun@chelsio.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f838bd17
    • Boris Brezillon's avatar
      mtd: nand: Declare tBERS, tR and tPROG as u64 to avoid integer overflow · a0e1953e
      Boris Brezillon authored
      commit 6d292310 upstream.
      
      All timings in nand_sdr_timings are expressed in picoseconds but some
      of them may not fit in an u32.
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Fixes: 204e7ecd ("mtd: nand: Add a few more timings to nand_sdr_timings")
      Reported-by: default avatarAlexander Dahl <ada@thorsis.com>
      Reviewed-by: default avatarAlexander Dahl <ada@thorsis.com>
      Tested-by: default avatarAlexander Dahl <ada@thorsis.com>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0e1953e
    • Boris Brezillon's avatar
      mtd: nand: Fix timing setup for NANDs that do not support SET FEATURES · 867c0778
      Boris Brezillon authored
      commit a11bf5ed upstream.
      
      Some ONFI NANDs do not support the SET/GET FEATURES commands, which,
      according to the spec, is perfectly valid.
      
      On these NANDs we can't set a specific timing mode using the "timing
      mode" feature, and we should assume the NAND does not require any setup
      to enter a specific timing mode.
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Fixes: d8e725dd ("mtd: nand: automate NAND timings selection")
      Reported-by: default avatarAlexander Dahl <ada@thorsis.com>
      Tested-by: default avatarAlexander Dahl <ada@thorsis.com>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      867c0778
    • Boris Brezillon's avatar
      mtd: nand: atmel: Fix DT backward compatibility in pmecc.c · a34d48d5
      Boris Brezillon authored
      commit 3aa09076 upstream.
      
      PMECC caps extraction from old DT bindings is broken, thus leading to
      erroneous EL registers offset, which in turn make HW ECC unusable on
      sama5d2 when old bindings are in use.
      
      Passing the NAND dev node instead of the NFC node to of_match_node()
      solves the problem.
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Fixes: f88fc122 ("mtd: nand: Cleanup/rework the atmel_nand driver")
      Tested-by: default avatarRomain Izard <romain.izard.pro@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a34d48d5
    • Gregory CLEMENT's avatar
      pinctrl: armada-37xx: Fix number of pin in south bridge · 0eda7e0b
      Gregory CLEMENT authored
      commit 6b67c390 upstream.
      
      On the south bridge we have pin from to 29, so it gives 30 pins (and not
      29).
      
      Without this patch the kernel complain with the following traces:
      cat /sys/kernel/debug/pinctrl/d0018800.pinctrl/pingroups
      [  154.530205] armada-37xx-pinctrl d0018800.pinctrl: failed to get pin(29) name
      [  154.537567] ------------[ cut here ]------------
      [  154.542348] WARNING: CPU: 1 PID: 1347 at /home/gclement/open/kernel/marvell-mainline-linux/drivers/pinctrl/core.c:1610 pinctrl_groups_show+0x15c/0x1a0
      [  154.555918] Modules linked in:
      [  154.558890] CPU: 1 PID: 1347 Comm: cat Tainted: G        W       4.13.0-rc1-00001-g19e1b9fa219d #525
      [  154.568316] Hardware name: Marvell Armada 3720 Development Board DB-88F3720-DDR3 (DT)
      [  154.576311] task: ffff80001d32d100 task.stack: ffff80001bdc0000
      [  154.583048] PC is at pinctrl_groups_show+0x15c/0x1a0
      [  154.587816] LR is at pinctrl_groups_show+0x148/0x1a0
      [  154.592847] pc : [<ffff0000083e3adc>] lr : [<ffff0000083e3ac8>] pstate: 00000145
      [  154.600840] sp : ffff80001bdc3c80
      [  154.604255] x29: ffff80001bdc3c80 x28: 00000000f7750000
      [  154.609825] x27: ffff80001d05d198 x26: 0000000000000009
      [  154.615224] x25: ffff0000089ead20 x24: 0000000000000002
      [  154.620705] x23: ffff000008c8e1d0 x22: ffff80001be55700
      [  154.626187] x21: ffff80001d05d100 x20: 0000000000000005
      [  154.631667] x19: 0000000000000006 x18: 0000000000000010
      [  154.637238] x17: 0000000000000000 x16: ffff0000081fc4b8
      [  154.642726] x15: 0000000000000006 x14: ffff0000899e537f
      [  154.648214] x13: ffff0000099e538d x12: 206f742064656c69
      [  154.653613] x11: 6166203a6c727463 x10: 0000000005f5e0ff
      [  154.659094] x9 : ffff80001bdc38c0 x8 : 286e697020746567
      [  154.664576] x7 : ffff000008551870 x6 : 000000000000011b
      [  154.670146] x5 : 0000000000000000 x4 : 0000000000000000
      [  154.675544] x3 : 0000000000000000 x2 : 0000000000000000
      [  154.681025] x1 : ffff000008c8e1d0 x0 : ffff80001be55700
      [  154.686507] Call trace:
      [  154.688668] Exception stack(0xffff80001bdc3ab0 to 0xffff80001bdc3be0)
      [  154.695224] 3aa0:                                   0000000000000006 0001000000000000
      [  154.703310] 3ac0: ffff80001bdc3c80 ffff0000083e3adc ffff80001bdc3bb0 00000000ffffffd8
      [  154.711304] 3ae0: 4554535953425553 6f6674616c703d4d 4349564544006d72 6674616c702b3d45
      [  154.719478] 3b00: 313030643a6d726f 6e69702e30303838 ffff80006c727463 ffff0000089635d8
      [  154.727562] 3b20: ffff80001d1ca0cb ffff000008af0fa4 ffff80001bdc3b40 ffff000008c8e1dc
      [  154.735648] 3b40: ffff80001bdc3bc0 ffff000008223174 ffff80001be55700 ffff000008c8e1d0
      [  154.743731] 3b60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [  154.752354] 3b80: 000000000000011b ffff000008551870 286e697020746567 ffff80001bdc38c0
      [  154.760446] 3ba0: 0000000005f5e0ff 6166203a6c727463 206f742064656c69 ffff0000099e538d
      [  154.767910] 3bc0: ffff0000899e537f 0000000000000006 ffff0000081fc4b8 0000000000000000
      [  154.776085] [<ffff0000083e3adc>] pinctrl_groups_show+0x15c/0x1a0
      [  154.782823] [<ffff000008222abc>] seq_read+0x184/0x460
      [  154.787505] [<ffff000008344120>] full_proxy_read+0x60/0xa8
      [  154.793431] [<ffff0000081f9bec>] __vfs_read+0x1c/0x110
      [  154.799001] [<ffff0000081faff4>] vfs_read+0x84/0x140
      [  154.803860] [<ffff0000081fc4fc>] SyS_read+0x44/0xa0
      [  154.808983] [<ffff000008082f30>] el0_svc_naked+0x24/0x28
      [  154.814459] ---[ end trace 4cbb00a92d616b95 ]---
      
      Fixes: 87466ccd ("pinctrl: armada-37xx: Add pin controller support
      for Armada 37xx")
      Signed-off-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0eda7e0b
    • Jan Kara's avatar
      xfs: Fix leak of discard bio · 84524948
      Jan Kara authored
      commit ea7bd56f upstream.
      
      The bio describing discard operation is allocated by
      __blkdev_issue_discard() which returns us a reference to it. That
      reference is never released and thus we leak this bio. Drop the bio
      reference once it completes in xlog_discard_endio().
      
      Fixes: 4560e78fSigned-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84524948
    • Max Filippov's avatar
      xtensa: don't limit csum_partial export by CONFIG_NET · 0af69956
      Max Filippov authored
      commit 7f81e55c upstream.
      
      csum_partial and csum_partial_copy_generic are defined unconditionally
      and are available even when CONFIG_NET is disabled. They are used not
      only by the network drivers, but also by scsi and media.
      Don't limit these functions export by CONFIG_NET.
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0af69956
    • Max Filippov's avatar
      xtensa: mm/cache: add missing EXPORT_SYMBOLs · 094849d6
      Max Filippov authored
      commit bc652eb6 upstream.
      
      Functions clear_user_highpage, copy_user_highpage, flush_dcache_page,
      local_flush_cache_range and local_flush_cache_page may be used from
      modules. Export them.
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      094849d6
    • Max Filippov's avatar
      xtensa: fix cache aliasing handling code for WT cache · 5e96389b
      Max Filippov authored
      commit 6d0f581d upstream.
      
      Currently building kernel for xtensa core with aliasing WT cache fails
      with the following messages:
      
        mm/memory.c:2152: undefined reference to `flush_dcache_page'
        mm/memory.c:2332: undefined reference to `local_flush_cache_page'
        mm/memory.c:1919: undefined reference to `local_flush_cache_range'
        mm/memory.c:4179: undefined reference to `copy_to_user_page'
        mm/memory.c:4183: undefined reference to `copy_from_user_page'
      
      This happens because implementation of these functions is only compiled
      when data cache is WB, which looks wrong: even when data cache doesn't
      need flushing it still needs invalidation. The functions like
      __flush_[invalidate_]dcache_* are correctly defined for both WB and WT
      caches (and even if they weren't that'd still be ok, just slower).
      
      Fix this by providing the same implementation of the above functions for
      both WB and WT cache.
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e96389b
    • Mel Gorman's avatar
      futex: Remove unnecessary warning from get_futex_key · 5c1d458d
      Mel Gorman authored
      commit 48fb6f4d upstream.
      
      Commit 65d8fc77 ("futex: Remove requirement for lock_page() in
      get_futex_key()") removed an unnecessary lock_page() with the
      side-effect that page->mapping needed to be treated very carefully.
      
      Two defensive warnings were added in case any assumption was missed and
      the first warning assumed a correct application would not alter a
      mapping backing a futex key.  Since merging, it has not triggered for
      any unexpected case but Mark Rutland reported the following bug
      triggering due to the first warning.
      
        kernel BUG at kernel/futex.c:679!
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
        Hardware name: linux,dummy-virt (DT)
        task: ffff80001e271780 task.stack: ffff000010908000
        PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
        LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
        pc : [<ffff00000821ac14>] lr : [<ffff00000821ac14>] pstate: 80000145
      
      The fact that it's a bug instead of a warning was due to an unrelated
      arm64 problem, but the warning itself triggered because the underlying
      mapping changed.
      
      This is an application issue but from a kernel perspective it's a
      recoverable situation and the warning is unnecessary so this patch
      removes the warning.  The warning may potentially be triggered with the
      following test program from Mark although it may be necessary to adjust
      NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
      system.
      
          #include <linux/futex.h>
          #include <pthread.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <sys/mman.h>
          #include <sys/syscall.h>
          #include <sys/time.h>
          #include <unistd.h>
      
          #define NR_FUTEX_THREADS 16
          pthread_t threads[NR_FUTEX_THREADS];
      
          void *mem;
      
          #define MEM_PROT  (PROT_READ | PROT_WRITE)
          #define MEM_SIZE  65536
      
          static int futex_wrapper(int *uaddr, int op, int val,
                                   const struct timespec *timeout,
                                   int *uaddr2, int val3)
          {
              syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
          }
      
          void *poll_futex(void *unused)
          {
              for (;;) {
                  futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
              }
          }
      
          int main(int argc, char *argv[])
          {
              int i;
      
              mem = mmap(NULL, MEM_SIZE, MEM_PROT,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      
              printf("Mapping @ %p\n", mem);
      
              printf("Creating futex threads...\n");
      
              for (i = 0; i < NR_FUTEX_THREADS; i++)
                  pthread_create(&threads[i], NULL, poll_futex, NULL);
      
              printf("Flipping mapping...\n");
              for (;;) {
                  mmap(mem, MEM_SIZE, MEM_PROT,
                       MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
              }
      
              return 0;
          }
      Reported-and-tested-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c1d458d
    • Cong Wang's avatar
      mm: fix list corruptions on shmem shrinklist · 5f064f8a
      Cong Wang authored
      commit d041353d upstream.
      
      We saw many list corruption warnings on shmem shrinklist:
      
        WARNING: CPU: 18 PID: 177 at lib/list_debug.c:59 __list_del_entry+0x9e/0xc0
        list_del corruption. prev->next should be ffff9ae5694b82d8, but was ffff9ae5699ba960
        Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt
        CPU: 18 PID: 177 Comm: kswapd1 Not tainted 4.9.34-t3.el7.twitter.x86_64 #1
        Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013
        Call Trace:
          dump_stack+0x4d/0x66
          __warn+0xcb/0xf0
          warn_slowpath_fmt+0x4f/0x60
          __list_del_entry+0x9e/0xc0
          shmem_unused_huge_shrink+0xfa/0x2e0
          shmem_unused_huge_scan+0x20/0x30
          super_cache_scan+0x193/0x1a0
          shrink_slab.part.41+0x1e3/0x3f0
          shrink_slab+0x29/0x30
          shrink_node+0xf9/0x2f0
          kswapd+0x2d8/0x6c0
          kthread+0xd7/0xf0
          ret_from_fork+0x22/0x30
      
        WARNING: CPU: 23 PID: 639 at lib/list_debug.c:33 __list_add+0x89/0xb0
        list_add corruption. prev->next should be next (ffff9ae5699ba960), but was ffff9ae5694b82d8. (prev=ffff9ae5694b82d8).
        Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt
        CPU: 23 PID: 639 Comm: systemd-udevd Tainted: G        W       4.9.34-t3.el7.twitter.x86_64 #1
        Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013
        Call Trace:
          dump_stack+0x4d/0x66
          __warn+0xcb/0xf0
          warn_slowpath_fmt+0x4f/0x60
          __list_add+0x89/0xb0
          shmem_setattr+0x204/0x230
          notify_change+0x2ef/0x440
          do_truncate+0x5d/0x90
          path_openat+0x331/0x1190
          do_filp_open+0x7e/0xe0
          do_sys_open+0x123/0x200
          SyS_open+0x1e/0x20
          do_syscall_64+0x61/0x170
          entry_SYSCALL64_slow_path+0x25/0x25
      
      The problem is that shmem_unused_huge_shrink() moves entries from the
      global sbinfo->shrinklist to its local lists and then releases the
      spinlock.  However, a parallel shmem_setattr() could access one of these
      entries directly and add it back to the global shrinklist if it is
      removed, with the spinlock held.
      
      The logic itself looks solid since an entry could be either in a local
      list or the global list, otherwise it is removed from one of them by
      list_del_init().  So probably the race condition is that, one CPU is in
      the middle of INIT_LIST_HEAD() but the other CPU calls list_empty()
      which returns true too early then the following list_add_tail() sees a
      corrupted entry.
      
      list_empty_careful() is designed to fix this situation.
      
      [akpm@linux-foundation.org: add comments]
      Link: http://lkml.kernel.org/r/20170803054630.18775-1-xiyou.wangcong@gmail.com
      Fixes: 779750d2 ("shmem: split huge pages beyond i_size under memory pressure")
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f064f8a
    • Jonathan Toppins's avatar
      mm: ratelimit PFNs busy info message · 10df3471
      Jonathan Toppins authored
      commit 75dddef3 upstream.
      
      The RDMA subsystem can generate several thousand of these messages per
      second eventually leading to a kernel crash.  Ratelimit these messages
      to prevent this crash.
      
      Doug said:
       "I've been carrying a version of this for several kernel versions. I
        don't remember when they started, but we have one (and only one) class
        of machines: Dell PE R730xd, that generate these errors. When it
        happens, without a rate limit, we get rcu timeouts and kernel oopses.
        With the rate limit, we just get a lot of annoying kernel messages but
        the machine continues on, recovers, and eventually the memory
        operations all succeed"
      
      And:
       "> Well... why are all these EBUSY's occurring? It sounds inefficient
        > (at least) but if it is expected, normal and unavoidable then
        > perhaps we should just remove that message altogether?
      
        I don't have an answer to that question. To be honest, I haven't
        looked real hard. We never had this at all, then it started out of the
        blue, but only on our Dell 730xd machines (and it hits all of them),
        but no other classes or brands of machines. And we have our 730xd
        machines loaded up with different brands and models of cards (for
        instance one dedicated to mlx4 hardware, one for qib, one for mlx5, an
        ocrdma/cxgb4 combo, etc), so the fact that it hit all of the machines
        meant it wasn't tied to any particular brand/model of RDMA hardware.
        To me, it always smelled of a hardware oddity specific to maybe the
        CPUs or mainboard chipsets in these machines, so given that I'm not an
        mm expert anyway, I never chased it down.
      
        A few other relevant details: it showed up somewhere around 4.8/4.9 or
        thereabouts. It never happened before, but the prinkt has been there
        since the 3.18 days, so possibly the test to trigger this message was
        changed, or something else in the allocator changed such that the
        situation started happening on these machines?
      
        And, like I said, it is specific to our 730xd machines (but they are
        all identical, so that could mean it's something like their specific
        ram configuration is causing the allocator to hit this on these
        machine but not on other machines in the cluster, I don't want to say
        it's necessarily the model of chipset or CPU, there are other bits of
        identicalness between these machines)"
      
      Link: http://lkml.kernel.org/r/499c0f6cc10d6eb829a67f2a4d75b4228a9b356e.1501695897.git.jtoppins@redhat.comSigned-off-by: default avatarJonathan Toppins <jtoppins@redhat.com>
      Reviewed-by: default avatarDoug Ledford <dledford@redhat.com>
      Tested-by: default avatarDoug Ledford <dledford@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      10df3471
  2. 13 Aug, 2017 18 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.12.7 · 1ea1c41d
      Greg Kroah-Hartman authored
      1ea1c41d
    • Qu Wenruo's avatar
      btrfs: Remove false alert when fiemap range is smaller than on-disk extent · d5ad3308
      Qu Wenruo authored
      commit 848c23b7 upstream.
      
      Commit 4751832d ("btrfs: fiemap: Cache and merge fiemap extent before
      submit it to user") introduced a warning to catch unemitted cached
      fiemap extent.
      
      However such warning doesn't take the following case into consideration:
      
      0			4K			8K
      |<---- fiemap range --->|
      |<----------- On-disk extent ------------------>|
      
      In this case, the whole 0~8K is cached, and since it's larger than
      fiemap range, it break the fiemap extent emit loop.
      This leaves the fiemap extent cached but not emitted, and caught by the
      final fiemap extent sanity check, causing kernel warning.
      
      This patch removes the kernel warning and renames the sanity check to
      emit_last_fiemap_cache() since it's possible and valid to have cached
      fiemap extent.
      Reported-by: default avatarDavid Sterba <dsterba@suse.cz>
      Reported-by: default avatarAdam Borowski <kilobyte@angband.pl>
      Fixes: 4751832d ("btrfs: fiemap: Cache and merge fiemap extent ...")
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Cc: Holger Hoffstätte <holger@applied-asynchrony.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5ad3308
    • Johannes Thumshirn's avatar
      scsi: sg: only check for dxfer_len greater than 256M · c4b3691a
      Johannes Thumshirn authored
      commit f930c704 upstream.
      
      Don't make any assumptions on the sg_io_hdr_t::dxfer_direction or the
      sg_io_hdr_t::dxferp in order to determine if it is a valid request. The
      only way we can check for bad requests is by checking if the length
      exceeds 256M.
      Signed-off-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Fixes: 28676d86 (scsi: sg: check for valid direction before starting the
      request)
      Reported-by: default avatarJason L Tibbitts III <tibbs@math.uh.edu>
      Tested-by: default avatarJason L Tibbitts III <tibbs@math.uh.edu>
      Suggested-by: default avatarDoug Gilbert <dgilbert@interlog.com>
      Cc: Doug Gilbert <dgilbert@interlog.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      c4b3691a
    • Willem de Bruijn's avatar
      packet: fix tp_reserve race in packet_set_ring · 91b2b39b
      Willem de Bruijn authored
      
      [ Upstream commit c27927e3 ]
      
      Updates to tp_reserve can race with reads of the field in
      packet_set_ring. Avoid this by holding the socket lock during
      updates in setsockopt PACKET_RESERVE.
      
      This bug was discovered by syzkaller.
      
      Fixes: 8913336a ("packet: add PACKET_RESERVE sockopt")
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      91b2b39b
    • Willem de Bruijn's avatar
      udp: consistently apply ufo or fragmentation · 2a8c396a
      Willem de Bruijn authored
      
      [ Upstream commit 85f1bd9a ]
      
      When iteratively building a UDP datagram with MSG_MORE and that
      datagram exceeds MTU, consistently choose UFO or fragmentation.
      
      Once skb_is_gso, always apply ufo. Conversely, once a datagram is
      split across multiple skbs, do not consider ufo.
      
      Sendpage already maintains the first invariant, only add the second.
      IPv6 does not have a sendpage implementation to modify.
      
      A gso skb must have a partial checksum, do not follow sk_no_check_tx
      in udp_send_skb.
      
      Found by syzkaller.
      
      Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach")
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2a8c396a
    • Nikolay Borisov's avatar
      igmp: Fix regression caused by igmp sysctl namespace code. · 2b5e5a8d
      Nikolay Borisov authored
      
      [ Upstream commit 1714020e ]
      
      Commit dcd87999 ("igmp: net: Move igmp namespace init to correct file")
      moved the igmp sysctls initialization from tcp_sk_init to igmp_net_init. This
      function is only called as part of per-namespace initialization, only if
      CONFIG_IP_MULTICAST is defined, otherwise igmp_mc_init() call in ip_init is
      compiled out, casuing the igmp pernet ops to not be registerd and those sysctl
      being left initialized with 0. However, there are certain functions, such as
      ip_mc_join_group which are always compiled and make use of some of those
      sysctls. Let's do a partial revert of the aforementioned commit and move the
      sysctl initialization into inet_init_net, that way they will always have
      sane values.
      
      Fixes: dcd87999 ("igmp: net: Move igmp namespace init to correct file")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196595Reported-by: default avatarGerardo Exequiel Pozzi <vmlinuz386@gmail.com>
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b5e5a8d
    • Willem de Bruijn's avatar
      net: avoid skb_warn_bad_offload false positives on UFO · 8ad4bc97
      Willem de Bruijn authored
      
      [ Upstream commit 8d63bee6 ]
      
      skb_warn_bad_offload triggers a warning when an skb enters the GSO
      stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
      checksum offload set.
      
      Commit b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      observed that SKB_GSO_DODGY producers can trigger the check and
      that passing those packets through the GSO handlers will fix it
      up. But, the software UFO handler will set ip_summed to
      CHECKSUM_NONE.
      
      When __skb_gso_segment is called from the receive path, this
      triggers the warning again.
      
      Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
      Tx these two are equivalent. On Rx, this better matches the
      skb state (checksum computed), as CHECKSUM_NONE here means no
      checksum computed.
      
      See also this thread for context:
      http://patchwork.ozlabs.org/patch/799015/
      
      Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ad4bc97
    • Bjørn Mork's avatar
      qmi_wwan: fix NULL deref on disconnect · 4c345783
      Bjørn Mork authored
      
      [ Upstream commit bbae08e5 ]
      
      qmi_wwan_disconnect is called twice when disconnecting devices with
      separate control and data interfaces.  The first invocation will set
      the interface data to NULL for both interfaces to flag that the
      disconnect has been handled.  But the matching NULL check was left
      out when qmi_wwan_disconnect was added, resulting in this oops:
      
        usb 2-1.4: USB disconnect, device number 4
        qmi_wwan 2-1.4:1.6 wwp0s29u1u4i6: unregister 'qmi_wwan' usb-0000:00:1d.0-1.4, WWAN/QMI device
        BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
        IP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
        PGD 0
        P4D 0
        Oops: 0000 [#1] SMP
        Modules linked in: <stripped irrelevant module list>
        CPU: 2 PID: 33 Comm: kworker/2:1 Tainted: G            E   4.12.3-nr44-normandy-r1500619820+ #1
        Hardware name: LENOVO 4291LR7/4291LR7, BIOS CBET4000 4.6-810-g50522254fb 07/21/2017
        Workqueue: usb_hub_wq hub_event [usbcore]
        task: ffff8c882b716040 task.stack: ffffb8e800d84000
        RIP: 0010:qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
        RSP: 0018:ffffb8e800d87b38 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff8c8824f3f1d0 RDI: ffff8c8824ef6400
        RBP: ffff8c8824ef6400 R08: 0000000000000000 R09: 0000000000000000
        R10: ffffb8e800d87780 R11: 0000000000000011 R12: ffffffffc07ea0e8
        R13: ffff8c8824e2e000 R14: ffff8c8824e2e098 R15: 0000000000000000
        FS:  0000000000000000(0000) GS:ffff8c8835300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000000000e0 CR3: 0000000229ca5000 CR4: 00000000000406e0
        Call Trace:
         ? usb_unbind_interface+0x71/0x270 [usbcore]
         ? device_release_driver_internal+0x154/0x210
         ? qmi_wwan_unbind+0x6d/0xc0 [qmi_wwan]
         ? usbnet_disconnect+0x6c/0xf0 [usbnet]
         ? qmi_wwan_disconnect+0x87/0xc0 [qmi_wwan]
         ? usb_unbind_interface+0x71/0x270 [usbcore]
         ? device_release_driver_internal+0x154/0x210
      Reported-and-tested-by: default avatarNathaniel Roach <nroach44@gmail.com>
      Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
      Cc: Daniele Palmas <dnlplm@gmail.com>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c345783
    • Eric Dumazet's avatar
      tcp: fastopen: tcp_connect() must refresh the route · 70165c27
      Eric Dumazet authored
      
      [ Upstream commit 8ba60924 ]
      
      With new TCP_FASTOPEN_CONNECT socket option, there is a possibility
      to call tcp_connect() while socket sk_dst_cache is either NULL
      or invalid.
      
       +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
       +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
       +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
       +0 connect(4, ..., ...) = 0
      
      << sk->sk_dst_cache becomes obsolete, or even set to NULL >>
      
       +1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000
      
      We need to refresh the route otherwise bad things can happen,
      especially when syzkaller is running on the host :/
      
      Fixes: 19f6d3f3 ("net/tcp-fastopen: Add new API support")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarWei Wang <weiwan@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70165c27
    • Xin Long's avatar
      net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target · 53531574
      Xin Long authored
      
      [ Upstream commit 96d97030 ]
      
      Commit 55917a21 ("netfilter: x_tables: add context to know if
      extension runs from nft_compat") introduced a member nft_compat to
      xt_tgchk_param structure.
      
      But it didn't set it's value for ipt_init_target. With unexpected
      value in par.nft_compat, it may return unexpected result in some
      target's checkentry.
      
      This patch is to set all it's fields as 0 and only initialize the
      non-zero fields in ipt_init_target.
      
      v1->v2:
        As Wang Cong's suggestion, fix it by setting all it's fields as
        0 and only initializing the non-zero fields.
      
      Fixes: 55917a21 ("netfilter: x_tables: add context to know if extension runs from nft_compat")
      Suggested-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53531574
    • Xin Long's avatar
      net: sched: set xt_tgchk_param par.net properly in ipt_init_target · 882cce21
      Xin Long authored
      
      [ Upstream commit ec0acb09 ]
      
      Now xt_tgchk_param par in ipt_init_target is a local varibale,
      par.net is not initialized there. Later when xt_check_target
      calls target's checkentry in which it may access par.net, it
      would cause kernel panic.
      
      Jaroslav found this panic when running:
      
        # ip link add TestIface type dummy
        # tc qd add dev TestIface ingress handle ffff:
        # tc filter add dev TestIface parent ffff: u32 match u32 0 0 \
          action xt -j CONNMARK --set-mark 4
      
      This patch is to pass net param into ipt_init_target and set
      par.net with it properly in there.
      
      v1->v2:
        As Wang Cong pointed, I missed ipt_net_id != xt_net_id, so fix
        it by also passing net_id to __tcf_ipt_init.
      v2->v3:
        Missed the fixes tag, so add it.
      
      Fixes: ecb2421b ("netfilter: add and use nf_ct_netns_get/put")
      Reported-by: default avatarJaroslav Aster <jaster@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      882cce21
    • Davide Caratti's avatar
      net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets · 7be680f7
      Davide Caratti authored
      
      [ Upstream commit e718fe45 ]
      
      if the NIC fails to validate the checksum on TCP/UDP, and validation of IP
      checksum is successful, the driver subtracts the pseudo-header checksum
      from the value obtained by the hardware and sets CHECKSUM_COMPLETE. Don't
      do that if protocol is IPPROTO_SCTP, otherwise CRC32c validation fails.
      
      V2: don't test MLX4_CQE_STATUS_IPV6 if MLX4_CQE_STATUS_IPV4 is set
      Reported-by: default avatarShuang Li <shuali@redhat.com>
      Fixes: f8c6455b ("net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE")
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7be680f7
    • Daniel Borkmann's avatar
      bpf, s390: fix jit branch offset related to ldimm64 · c531692a
      Daniel Borkmann authored
      
      [ Upstream commit b0a0c256 ]
      
      While testing some other work that required JIT modifications, I
      run into test_bpf causing a hang when JIT enabled on s390. The
      problematic test case was the one from ddc665a4 (bpf, arm64:
      fix jit branch offset related to ldimm64), and turns out that we
      do have a similar issue on s390 as well. In bpf_jit_prog() we
      update next instruction address after returning from bpf_jit_insn()
      with an insn_count. bpf_jit_insn() returns either -1 in case of
      error (e.g. unsupported insn), 1 or 2. The latter is only the
      case for ldimm64 due to spanning 2 insns, however, next address
      is only set to i + 1 not taking actual insn_count into account,
      thus fix is to use insn_count instead of 1. bpf_jit_enable in
      mode 2 provides also disasm on s390:
      
      Before fix:
      
        000003ff800349b6: a7f40003   brc     15,3ff800349bc                 ; target
        000003ff800349ba: 0000               unknown
        000003ff800349bc: e3b0f0700024       stg     %r11,112(%r15)
        000003ff800349c2: e3e0f0880024       stg     %r14,136(%r15)
        000003ff800349c8: 0db0               basr    %r11,%r0
        000003ff800349ca: c0ef00000000       llilf   %r14,0
        000003ff800349d0: e320b0360004       lg      %r2,54(%r11)
        000003ff800349d6: e330b03e0004       lg      %r3,62(%r11)
        000003ff800349dc: ec23ffeda065       clgrj   %r2,%r3,10,3ff800349b6 ; jmp
        000003ff800349e2: e3e0b0460004       lg      %r14,70(%r11)
        000003ff800349e8: e3e0b04e0004       lg      %r14,78(%r11)
        000003ff800349ee: b904002e   lgr     %r2,%r14
        000003ff800349f2: e3b0f0700004       lg      %r11,112(%r15)
        000003ff800349f8: e3e0f0880004       lg      %r14,136(%r15)
        000003ff800349fe: 07fe               bcr     15,%r14
      
      After fix:
      
        000003ff80ef3db4: a7f40003   brc     15,3ff80ef3dba
        000003ff80ef3db8: 0000               unknown
        000003ff80ef3dba: e3b0f0700024       stg     %r11,112(%r15)
        000003ff80ef3dc0: e3e0f0880024       stg     %r14,136(%r15)
        000003ff80ef3dc6: 0db0               basr    %r11,%r0
        000003ff80ef3dc8: c0ef00000000       llilf   %r14,0
        000003ff80ef3dce: e320b0360004       lg      %r2,54(%r11)
        000003ff80ef3dd4: e330b03e0004       lg      %r3,62(%r11)
        000003ff80ef3dda: ec230006a065       clgrj   %r2,%r3,10,3ff80ef3de6 ; jmp
        000003ff80ef3de0: e3e0b0460004       lg      %r14,70(%r11)
        000003ff80ef3de6: e3e0b04e0004       lg      %r14,78(%r11)          ; target
        000003ff80ef3dec: b904002e   lgr     %r2,%r14
        000003ff80ef3df0: e3b0f0700004       lg      %r11,112(%r15)
        000003ff80ef3df6: e3e0f0880004       lg      %r14,136(%r15)
        000003ff80ef3dfc: 07fe               bcr     15,%r14
      
      test_bpf.ko suite runs fine after the fix.
      
      Fixes: 05462310 ("s390/bpf: Add s390x eBPF JIT compiler backend")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: default avatarMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c531692a
    • Xin Long's avatar
      ipv6: set rt6i_protocol properly in the route when it is installed · f725c587
      Xin Long authored
      
      [ Upstream commit b91d5329 ]
      
      After commit c2ed1880 ("net: ipv6: check route protocol when
      deleting routes"), ipv6 route checks rt protocol when trying to
      remove a rt entry.
      
      It introduced a side effect causing 'ip -6 route flush cache' not
      to work well. When flushing caches with iproute, all route caches
      get dumped from kernel then removed one by one by sending DELROUTE
      requests to kernel for each cache.
      
      The thing is iproute sends the request with the cache whose proto
      is set with RTPROT_REDIRECT by rt6_fill_node() when kernel dumps
      it. But in kernel the rt_cache protocol is still 0, which causes
      the cache not to be matched and removed.
      
      So the real reason is rt6i_protocol in the route is not set when
      it is allocated. As David Ahern's suggestion, this patch is to
      set rt6i_protocol properly in the route when it is installed and
      remove the codes setting rtm_protocol according to rt6i_flags in
      rt6_fill_node.
      
      This is also an improvement to keep rt6i_protocol consistent with
      rtm_protocol.
      
      Fixes: c2ed1880 ("net: ipv6: check route protocol when deleting routes")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Suggested-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f725c587
    • Eric Dumazet's avatar
      net: fix keepalive code vs TCP_FASTOPEN_CONNECT · 963f8fb6
      Eric Dumazet authored
      
      [ Upstream commit 2dda6400 ]
      
      syzkaller was able to trigger a divide by 0 in TCP stack [1]
      
      Issue here is that keepalive timer needs to be updated to not attempt
      to send a probe if the connection setup was deferred using
      TCP_FASTOPEN_CONNECT socket option added in linux-4.11
      
      [1]
       divide error: 0000 [#1] SMP
       CPU: 18 PID: 0 Comm: swapper/18 Not tainted
       task: ffff986f62f4b040 ti: ffff986f62fa2000 task.ti: ffff986f62fa2000
       RIP: 0010:[<ffffffff8409cc0d>]  [<ffffffff8409cc0d>] __tcp_select_window+0x8d/0x160
       Call Trace:
        <IRQ>
        [<ffffffff8409d951>] tcp_transmit_skb+0x11/0x20
        [<ffffffff8409da21>] tcp_xmit_probe_skb+0xc1/0xe0
        [<ffffffff840a0ee8>] tcp_write_wakeup+0x68/0x160
        [<ffffffff840a151b>] tcp_keepalive_timer+0x17b/0x230
        [<ffffffff83b3f799>] call_timer_fn+0x39/0xf0
        [<ffffffff83b40797>] run_timer_softirq+0x1d7/0x280
        [<ffffffff83a04ddb>] __do_softirq+0xcb/0x257
        [<ffffffff83ae03ac>] irq_exit+0x9c/0xb0
        [<ffffffff83a04c1a>] smp_apic_timer_interrupt+0x6a/0x80
        [<ffffffff83a03eaf>] apic_timer_interrupt+0x7f/0x90
        <EOI>
        [<ffffffff83fed2ea>] ? cpuidle_enter_state+0x13a/0x3b0
        [<ffffffff83fed2cd>] ? cpuidle_enter_state+0x11d/0x3b0
      
      Tested:
      
      Following packetdrill no longer crashes the kernel
      
      `echo 0 >/proc/sys/net/ipv4/tcp_timestamps`
      
      // Cache warmup: send a Fast Open cookie request
          0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
         +0 setsockopt(3, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
         +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation is now in progress)
         +0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop>
       +.01 < S. 123:123(0) ack 1 win 14600 <mss 1460,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop>
         +0 > . 1:1(0) ack 1
         +0 close(3) = 0
         +0 > F. 1:1(0) ack 1
         +0 < F. 1:1(0) ack 2 win 92
         +0 > .  2:2(0) ack 2
      
         +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
         +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
         +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
         +0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
       +.01 connect(4, ..., ...) = 0
         +0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [5], 4) = 0
         +10 close(4) = 0
      
      `echo 1 >/proc/sys/net/ipv4/tcp_timestamps`
      
      Fixes: 19f6d3f3 ("net/tcp-fastopen: Add new API support")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      963f8fb6
    • Yuchung Cheng's avatar
      tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states · 4fe36242
      Yuchung Cheng authored
      
      [ Upstream commit ed254971 ]
      
      If the sender switches the congestion control during ECN-triggered
      cwnd-reduction state (CA_CWR), upon exiting recovery cwnd is set to
      the ssthresh value calculated by the previous congestion control. If
      the previous congestion control is BBR that always keep ssthresh
      to TCP_INIFINITE_SSTHRESH, cwnd ends up being infinite. The safe
      step is to avoid assigning invalid ssthresh value when recovery ends.
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4fe36242
    • Guillaume Nault's avatar
      ppp: fix xmit recursion detection on ppp channels · 5028e6c9
      Guillaume Nault authored
      
      [ Upstream commit 0a0e1a85 ]
      
      Commit e5dadc65 ("ppp: Fix false xmit recursion detect with two ppp
      devices") dropped the xmit_recursion counter incrementation in
      ppp_channel_push() and relied on ppp_xmit_process() for this task.
      But __ppp_channel_push() can also send packets directly (using the
      .start_xmit() channel callback), in which case the xmit_recursion
      counter isn't incremented anymore. If such packets get routed back to
      the parent ppp unit, ppp_xmit_process() won't notice the recursion and
      will call ppp_channel_push() on the same channel, effectively creating
      the deadlock situation that the xmit_recursion mechanism was supposed
      to prevent.
      
      This patch re-introduces the xmit_recursion counter incrementation in
      ppp_channel_push(). Since the xmit_recursion variable is now part of
      the parent ppp unit, incrementation is skipped if the channel doesn't
      have any. This is fine because only packets routed through the parent
      unit may enter the channel recursively.
      
      Finally, we have to ensure that pch->ppp is not going to be modified
      while executing ppp_channel_push(). Instead of taking this lock only
      while calling ppp_xmit_process(), we now have to hold it for the full
      ppp_channel_push() execution. This respects the ppp locks ordering
      which requires locking ->upl before ->downl.
      
      Fixes: e5dadc65 ("ppp: Fix false xmit recursion detect with two ppp devices")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5028e6c9
    • Gao Feng's avatar
      ppp: Fix false xmit recursion detect with two ppp devices · 67410cd2
      Gao Feng authored
      
      [ Upstream commit e5dadc65 ]
      
      The global percpu variable ppp_xmit_recursion is used to detect the ppp
      xmit recursion to avoid the deadlock, which is caused by one CPU tries to
      lock the xmit lock twice. But it would report false recursion when one CPU
      wants to send the skb from two different PPP devices, like one L2TP on the
      PPPoE. It is a normal case actually.
      
      Now use one percpu member of struct ppp instead of the gloable variable to
      detect the xmit recursion of one ppp device.
      
      Fixes: 55454a56 ("ppp: avoid dealock on recursive xmit")
      Signed-off-by: default avatarGao Feng <gfree.wind@vip.163.com>
      Signed-off-by: default avatarLiu Jianying <jianying.liu@ikuai8.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      67410cd2
  3. 11 Aug, 2017 7 commits