1. 14 Jun, 2017 32 commits
    • Jin Yao's avatar
      perf/core: Drop kernel samples even though :u is specified · 389dafda
      Jin Yao authored
      commit cc1582c2 upstream.
      
      When doing sampling, for example:
      
        perf record -e cycles:u ...
      
      On workloads that do a lot of kernel entry/exits we see kernel
      samples, even though :u is specified. This is due to skid existing.
      
      This might be a security issue because it can leak kernel addresses even
      though kernel sampling support is disabled.
      
      The patch drops the kernel samples if exclude_kernel is specified.
      
      For example, test on Haswell desktop:
      
        perf record -e cycles:u <mgen>
        perf report --stdio
      
      Before patch applied:
      
          99.77%  mgen     mgen              [.] buf_read
           0.20%  mgen     mgen              [.] rand_buf_init
           0.01%  mgen     [kernel.vmlinux]  [k] apic_timer_interrupt
           0.00%  mgen     mgen              [.] last_free_elem
           0.00%  mgen     libc-2.23.so      [.] __random_r
           0.00%  mgen     libc-2.23.so      [.] _int_malloc
           0.00%  mgen     mgen              [.] rand_array_init
           0.00%  mgen     [kernel.vmlinux]  [k] page_fault
           0.00%  mgen     libc-2.23.so      [.] __random
           0.00%  mgen     libc-2.23.so      [.] __strcasestr
           0.00%  mgen     ld-2.23.so        [.] strcmp
           0.00%  mgen     ld-2.23.so        [.] _dl_start
           0.00%  mgen     libc-2.23.so      [.] sched_setaffinity@@GLIBC_2.3.4
           0.00%  mgen     ld-2.23.so        [.] _start
      
      We can see kernel symbols apic_timer_interrupt and page_fault.
      
      After patch applied:
      
          99.79%  mgen     mgen           [.] buf_read
           0.19%  mgen     mgen           [.] rand_buf_init
           0.00%  mgen     libc-2.23.so   [.] __random_r
           0.00%  mgen     mgen           [.] rand_array_init
           0.00%  mgen     mgen           [.] last_free_elem
           0.00%  mgen     libc-2.23.so   [.] vfprintf
           0.00%  mgen     libc-2.23.so   [.] rand
           0.00%  mgen     libc-2.23.so   [.] __random
           0.00%  mgen     libc-2.23.so   [.] _int_malloc
           0.00%  mgen     libc-2.23.so   [.] _IO_doallocbuf
           0.00%  mgen     ld-2.23.so     [.] do_lookup_x
           0.00%  mgen     ld-2.23.so     [.] open_verify.constprop.7
           0.00%  mgen     ld-2.23.so     [.] _dl_important_hwcaps
           0.00%  mgen     libc-2.23.so   [.] sched_setaffinity@@GLIBC_2.3.4
           0.00%  mgen     ld-2.23.so     [.] _start
      
      There are only userspace symbols.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Cc: kan.liang@intel.com
      Cc: mark.rutland@arm.com
      Cc: will.deacon@arm.com
      Cc: yao.jin@intel.com
      Link: http://lkml.kernel.org/r/1495706947-3744-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      389dafda
    • Michael Ellerman's avatar
      powerpc/numa: Fix percpu allocations to be NUMA aware · 138bb148
      Michael Ellerman authored
      commit ba4a648f upstream.
      
      In commit 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
      switched to the generic implementation of cpu_to_node(), which uses a percpu
      variable to hold the NUMA node for each CPU.
      
      Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
      of our percpu areas, leading to a chicken and egg problem. In practice what
      happens is when we are setting up the percpu areas, cpu_to_node() reports that
      all CPUs are on node 0, so we allocate all percpu areas on node 0.
      
      This is visible in the dmesg output, as all pcpu allocs being in group 0:
      
        pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
        pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
        pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
        pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
        pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
        pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
      
      To fix it we need an early_cpu_to_node() which can run prior to percpu being
      setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
      in. With the patch dmesg output shows two groups, 0 and 1:
      
        pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
        pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
        pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
        pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
        pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
        pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
      
      We can also check the data_offset in the paca of various CPUs, with the fix we
      see:
      
        CPU 0:  data_offset = 0x0ffe8b0000
        CPU 24: data_offset = 0x1ffe5b0000
      
      And we can see from dmesg that CPU 24 has an allocation on node 1:
      
        node   0: [mem 0x0000000000000000-0x0000000fffffffff]
        node   1: [mem 0x0000001000000000-0x0000001fffffffff]
      
      Fixes: 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      138bb148
    • Russell Currey's avatar
      powerpc/eeh: Avoid use after free in eeh_handle_special_event() · 82ee8ae3
      Russell Currey authored
      commit daeba295 upstream.
      
      eeh_handle_special_event() is called when an EEH event is detected but
      can't be narrowed down to a specific PE.  This function looks through
      every PE to find one in an erroneous state, then calls the regular event
      handler eeh_handle_normal_event() once it knows which PE has an error.
      
      However, if eeh_handle_normal_event() found that the PE cannot possibly
      be recovered, it will free it, rendering the passed PE stale.
      This leads to a use after free in eeh_handle_special_event() as it attempts to
      clear the "recovering" state on the PE after eeh_handle_normal_event() returns.
      
      Thus, make sure the PE is valid when attempting to clear state in
      eeh_handle_special_event().
      
      Fixes: 8a6b1bc7 ("powerpc/eeh: EEH core to handle special event")
      Reported-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarRussell Currey <ruscur@russell.cc>
      Reviewed-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      82ee8ae3
    • Johannes Thumshirn's avatar
      scsi: qla2xxx: don't disable a not previously enabled PCI device · 1c59546e
      Johannes Thumshirn authored
      commit ddff7ed4 upstream.
      
      When pci_enable_device() or pci_enable_device_mem() fail in
      qla2x00_probe_one() we bail out but do a call to
      pci_disable_device(). This causes the dev_WARN_ON() in
      pci_disable_device() to trigger, as the device wasn't enabled
      previously.
      
      So instead of taking the 'probe_out' error path we can directly return
      *iff* one of the pci_enable_device() calls fails.
      
      Additionally rename the 'probe_out' goto label's name to the more
      descriptive 'disable_device'.
      Signed-off-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Fixes: e315cd28 ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
      Reviewed-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarGiridhar Malavali <giridhar.malavali@cavium.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c59546e
    • Jeff Mahoney's avatar
      btrfs: fix memory leak in update_space_info failure path · 167d652e
      Jeff Mahoney authored
      commit 896533a7 upstream.
      
      If we fail to add the space_info kobject, we'll leak the memory
      for the percpu counter.
      
      Fixes: 6ab0a202 (btrfs: publish allocation data in sysfs)
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      167d652e
    • David Sterba's avatar
      btrfs: use correct types for page indices in btrfs_page_exists_in_range · 0ff6be8f
      David Sterba authored
      commit cc2b702c upstream.
      
      Variables start_idx and end_idx are supposed to hold a page index
      derived from the file offsets. The int type is not the right one though,
      offsets larger than 1 << 44 will get silently trimmed off the high bits.
      (1 << 44 is 16TiB)
      
      What can go wrong, if start is below the boundary and end gets trimmed:
      - if there's a page after start, we'll find it (radix_tree_gang_lookup_slot)
      - the final check "if (page->index <= end_idx)" will unexpectedly fail
      
      The function will return false, ie. "there's no page in the range",
      although there is at least one.
      
      btrfs_page_exists_in_range is used to prevent races in:
      
      * in hole punching, where we make sure there are not pages in the
        truncated range, otherwise we'll wait for them to finish and redo
        truncation, but we're going to replace the pages with holes anyway so
        the only problem is the intermediate state
      
      * lock_extent_direct: we want to make sure there are no pages before we
        lock and start DIO, to prevent stale data reads
      
      For practical occurence of the bug, there are several constaints.  The
      file must be quite large, the affected range must cross the 16TiB
      boundary and the internal state of the file pages and pending operations
      must match.  Also, we must not have started any ordered data in the
      range, otherwise we don't even reach the buggy function check.
      
      DIO locking tries hard in several places to avoid deadlocks with
      buffered IO and avoids waiting for ranges. The worst consequence seems
      to be stale data read.
      
      CC: Liu Bo <bo.li.liu@oracle.com>
      Fixes: fc4adbff ("btrfs: Drop EXTENT_UPTODATE check in hole punching and direct locking")
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ff6be8f
    • Daniel Micay's avatar
      stackprotector: Increase the per-task stack canary's random range from 32 bits... · 74d7dce0
      Daniel Micay authored
      stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
      
      commit 5ea30e4e upstream.
      
      The stack canary is an 'unsigned long' and should be fully initialized to
      random data rather than only 32 bits of random data.
      Signed-off-by: default avatarDaniel Micay <danielmicay@gmail.com>
      Acked-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Arjan van Ven <arjan@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-hardening@lists.openwall.com
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170504133209.3053-1-danielmicay@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      74d7dce0
    • Eric Biggers's avatar
      random: properly align get_random_int_hash · 804b19ca
      Eric Biggers authored
      commit b1132dea upstream.
      
      get_random_long() reads from the get_random_int_hash array using an
      unsigned long pointer.  For this code to be guaranteed correct on all
      architectures, the array must be aligned to an unsigned long boundary.
      Signed-off-by: default avatarEric Biggers <ebiggers3@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      804b19ca
    • Daniel Cashman's avatar
      drivers: char: random: add get_random_long() · f7533e65
      Daniel Cashman authored
      commit ec9ee4ac upstream.
      
      Commit d07e2259 ("mm: mmap: add new /proc tunable for mmap_base
      ASLR") added the ability to choose from a range of values to use for
      entropy count in generating the random offset to the mmap_base address.
      
      The maximum value on this range was set to 32 bits for 64-bit x86
      systems, but this value could be increased further, requiring more than
      the 32 bits of randomness provided by get_random_int(), as is already
      possible for arm64.  Add a new function: get_random_long() which more
      naturally fits with the mmap usage of get_random_int() but operates
      exactly the same as get_random_int().
      
      Also, fix the shifting constant in mmap_rnd() to be an unsigned long so
      that values greater than 31 bits generate an appropriate mask without
      overflow.  This is especially important on x86, as its shift instruction
      uses a 5-bit mask for the shift operand, which meant that any value for
      mmap_rnd_bits over 31 acts as a no-op and effectively disables mmap_base
      randomization.
      
      Finally, replace calls to get_random_int() with get_random_long() where
      appropriate.
      
      This patch (of 2):
      
      Add get_random_long().
      Signed-off-by: default avatarDaniel Cashman <dcashman@android.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7533e65
    • Matt Ranostay's avatar
      iio: proximity: as3935: fix AS3935_INT mask · 74eab0e8
      Matt Ranostay authored
      commit 275292d3 upstream.
      
      AS3935 interrupt mask has been incorrect so valid lightning events
      would never trigger an buffer event. Also noise interrupt should be
      BIT(0).
      
      Fixes: 24ddb0e4 ("iio: Add AS3935 lightning sensor support")
      Signed-off-by: default avatarMatt Ranostay <matt.ranostay@konsulko.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74eab0e8
    • Oleg Drokin's avatar
      staging/lustre/lov: remove set_fs() call from lov_getstripe() · 1bd1abfa
      Oleg Drokin authored
      commit 0a33252e upstream.
      
      lov_getstripe() calls set_fs(KERNEL_DS) so that it can handle a struct
      lov_user_md pointer from user- or kernel-space.  This changes the
      behavior of copy_from_user() on SPARC and may result in a misaligned
      access exception which in turn oopses the kernel.  In fact the
      relevant argument to lov_getstripe() is never called with a
      kernel-space pointer and so changing the address limits is unnecessary
      and so we remove the calls to save, set, and restore the address
      limits.
      Signed-off-by: default avatarJohn L. Hammond <john.hammond@intel.com>
      Reviewed-on: http://review.whamcloud.com/6150
      Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3221Reviewed-by: default avatarAndreas Dilger <andreas.dilger@intel.com>
      Reviewed-by: default avatarLi Wei <wei.g.li@intel.com>
      Signed-off-by: default avatarOleg Drokin <green@linuxhacker.ru>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bd1abfa
    • Michael Thalmeier's avatar
      usb: chipidea: debug: check before accessing ci_role · 0d335f5b
      Michael Thalmeier authored
      commit 0340ff83 upstream.
      
      ci_role BUGs when the role is >= CI_ROLE_END.
      Signed-off-by: default avatarMichael Thalmeier <michael.thalmeier@hale.at>
      Signed-off-by: default avatarPeter Chen <peter.chen@nxp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d335f5b
    • Jisheng Zhang's avatar
      usb: chipidea: udc: fix NULL pointer dereference if udc_start failed · 900b4568
      Jisheng Zhang authored
      commit aa1f058d upstream.
      
      Fix below NULL pointer dereference. we set ci->roles[CI_ROLE_GADGET]
      too early in ci_hdrc_gadget_init(), if udc_start() fails due to some
      reason, the ci->roles[CI_ROLE_GADGET] check in  ci_hdrc_gadget_destroy
      can't protect us.
      
      We fix this issue by only setting ci->roles[CI_ROLE_GADGET] if
      udc_start() succeed.
      
      [    1.398550] Unable to handle kernel NULL pointer dereference at
      virtual address 00000000
      ...
      [    1.448600] PC is at dma_pool_free+0xb8/0xf0
      [    1.453012] LR is at dma_pool_free+0x28/0xf0
      [    2.113369] [<ffffff80081817d8>] dma_pool_free+0xb8/0xf0
      [    2.118857] [<ffffff800841209c>] destroy_eps+0x4c/0x68
      [    2.124165] [<ffffff8008413770>] ci_hdrc_gadget_destroy+0x28/0x50
      [    2.130461] [<ffffff800840fa30>] ci_hdrc_probe+0x588/0x7e8
      [    2.136129] [<ffffff8008380fb8>] platform_drv_probe+0x50/0xb8
      [    2.142066] [<ffffff800837f494>] driver_probe_device+0x1fc/0x2a8
      [    2.148270] [<ffffff800837f68c>] __device_attach_driver+0x9c/0xf8
      [    2.154563] [<ffffff800837d570>] bus_for_each_drv+0x58/0x98
      [    2.160317] [<ffffff800837f174>] __device_attach+0xc4/0x138
      [    2.166072] [<ffffff800837f738>] device_initial_probe+0x10/0x18
      [    2.172185] [<ffffff800837e58c>] bus_probe_device+0x94/0xa0
      [    2.177940] [<ffffff800837c560>] device_add+0x3f0/0x560
      [    2.183337] [<ffffff8008380d20>] platform_device_add+0x180/0x240
      [    2.189541] [<ffffff800840f0e8>] ci_hdrc_add_device+0x440/0x4f8
      [    2.195654] [<ffffff8008414194>] ci_hdrc_usb2_probe+0x13c/0x2d8
      [    2.201769] [<ffffff8008380fb8>] platform_drv_probe+0x50/0xb8
      [    2.207705] [<ffffff800837f494>] driver_probe_device+0x1fc/0x2a8
      [    2.213910] [<ffffff800837f5ec>] __driver_attach+0xac/0xb0
      [    2.219575] [<ffffff800837d4b0>] bus_for_each_dev+0x60/0xa0
      [    2.225329] [<ffffff800837ec80>] driver_attach+0x20/0x28
      [    2.230816] [<ffffff800837e880>] bus_add_driver+0x1d0/0x238
      [    2.236571] [<ffffff800837fdb0>] driver_register+0x60/0xf8
      [    2.242237] [<ffffff8008380ef4>] __platform_driver_register+0x44/0x50
      [    2.248891] [<ffffff80086fd440>] ci_hdrc_usb2_driver_init+0x18/0x20
      [    2.255365] [<ffffff8008082950>] do_one_initcall+0x38/0x128
      [    2.261121] [<ffffff80086e0d00>] kernel_init_freeable+0x1ac/0x250
      [    2.267414] [<ffffff800852f0b8>] kernel_init+0x10/0x100
      [    2.272810] [<ffffff8008082680>] ret_from_fork+0x10/0x50
      
      Fixes: 3f124d23 ("usb: chipidea: add role init and destroy APIs")
      Signed-off-by: default avatarJisheng Zhang <jszhang@marvell.com>
      Signed-off-by: default avatarPeter Chen <peter.chen@nxp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      900b4568
    • Thinh Nguyen's avatar
      usb: gadget: f_mass_storage: Serialize wake and sleep execution · 52d2cabe
      Thinh Nguyen authored
      commit dc9217b6 upstream.
      
      f_mass_storage has a memorry barrier issue with the sleep and wake
      functions that can cause a deadlock. This results in intermittent hangs
      during MSC file transfer. The host will reset the device after receiving
      no response to resume the transfer. This issue is seen when dwc3 is
      processing 2 transfer-in-progress events at the same time, invoking
      completion handlers for CSW and CBW. Also this issue occurs depending on
      the system timing and latency.
      
      To increase the chance to hit this issue, you can force dwc3 driver to
      wait and process those 2 events at once by adding a small delay (~100us)
      in dwc3_check_event_buf() whenever the request is for CSW and read the
      event count again. Avoid debugging with printk and ftrace as extra
      delays and memory barrier will mask this issue.
      
      Scenario which can lead to failure:
      -----------------------------------
      1) The main thread sleeps and waits for the next command in
         get_next_command().
      2) bulk_in_complete() wakes up main thread for CSW.
      3) bulk_out_complete() tries to wake up the running main thread for CBW.
      4) thread_wakeup_needed is not loaded with correct value in
         sleep_thread().
      5) Main thread goes to sleep again.
      
      The pattern is shown below. Note the 2 critical variables.
       * common->thread_wakeup_needed
       * bh->state
      
      	CPU 0 (sleep_thread)		CPU 1 (wakeup_thread)
      	==============================  ===============================
      
      					bh->state = BH_STATE_FULL;
      					smp_wmb();
      	thread_wakeup_needed = 0;	thread_wakeup_needed = 1;
      	smp_rmb();
      	if (bh->state != BH_STATE_FULL)
      		sleep again ...
      
      As pointed out by Alan Stern, this is an R-pattern issue. The issue can
      be seen when there are two wakeups in quick succession. The
      thread_wakeup_needed can be overwritten in sleep_thread, and the read of
      the bh->state maybe reordered before the write to thread_wakeup_needed.
      
      This patch applies full memory barrier smp_mb() in both sleep_thread()
      and wakeup_thread() to ensure the order which the thread_wakeup_needed
      and bh->state are written and loaded.
      
      However, a better solution in the future would be to use wait_queue
      method that takes care of managing memory barrier between waker and
      waiter.
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarThinh Nguyen <thinhn@synopsys.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52d2cabe
    • Konstantin Khlebnikov's avatar
      ext4: keep existing extra fields when inode expands · 99362b9c
      Konstantin Khlebnikov authored
      commit 887a9730 upstream.
      
      ext4_expand_extra_isize() should clear only space between old and new
      size.
      
      Fixes: 6dd4ee7c # v2.6.23
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99362b9c
    • Jan Kara's avatar
      ext4: fix SEEK_HOLE · 07e47097
      Jan Kara authored
      commit 7d95eddf upstream.
      
      Currently, SEEK_HOLE implementation in ext4 may both return that there's
      a hole at some offset although that offset already has data and skip
      some holes during a search for the next hole. The first problem is
      demostrated by:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "seek -h 0" file
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (2.054 GiB/sec and 538461.5385 ops/sec)
      Whence	Result
      HOLE	0
      
      Where we can see that SEEK_HOLE wrongly returned offset 0 as containing
      a hole although we have written data there. The second problem can be
      demonstrated by:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
             -c "seek -h 0" file
      
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (1.978 GiB/sec and 518518.5185 ops/sec)
      wrote 8192/8192 bytes at offset 131072
      8 KiB, 2 ops; 0.0000 sec (2 GiB/sec and 500000.0000 ops/sec)
      Whence	Result
      HOLE	139264
      
      Where we can see that hole at offsets 56k..128k has been ignored by the
      SEEK_HOLE call.
      
      The underlying problem is in the ext4_find_unwritten_pgoff() which is
      just buggy. In some cases it fails to update returned offset when it
      finds a hole (when no pages are found or when the first found page has
      higher index than expected), in some cases conditions for detecting hole
      are just missing (we fail to detect a situation where indices of
      returned pages are not contiguous).
      
      Fix ext4_find_unwritten_pgoff() to properly detect non-contiguous page
      indices and also handle all cases where we got less pages then expected
      in one place and handle it properly there.
      
      Fixes: c8c0df24
      CC: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07e47097
    • Alexander Sverdlin's avatar
      dmaengine: ep93xx: Always start from BASE0 · b6c8d568
      Alexander Sverdlin authored
      commit 0037ae47 upstream.
      
      The current buffer is being reset to zero on device_free_chan_resources()
      but not on device_terminate_all(). It could happen that HW is restarted and
      expects BASE0 to be used, but the driver is not synchronized and will start
      from BASE1. One solution is to reset the buffer explicitly in
      m2p_hw_setup().
      Signed-off-by: default avatarAlexander Sverdlin <alexander.sverdlin@gmail.com>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b6c8d568
    • Marc Zyngier's avatar
      arm: KVM: Allow unaligned accesses at HYP · 84b47444
      Marc Zyngier authored
      commit 33b5c388 upstream.
      
      We currently have the HSCTLR.A bit set, trapping unaligned accesses
      at HYP, but we're not really prepared to deal with it.
      
      Since the rest of the kernel is pretty happy about that, let's follow
      its example and set HSCTLR.A to zero. Modern CPUs don't really care.
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84b47444
    • Wanpeng Li's avatar
      KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation · 9cdc5d2e
      Wanpeng Li authored
      commit a3641631 upstream.
      
      If "i" is the last element in the vcpu->arch.cpuid_entries[] array, it
      potentially can be exploited the vulnerability. this will out-of-bounds
      read and write.  Luckily, the effect is small:
      
      	/* when no next entry is found, the current entry[i] is reselected */
      	for (j = i + 1; ; j = (j + 1) % nent) {
      		struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j];
      		if (ej->function == e->function) {
      
      It reads ej->maxphyaddr, which is user controlled.  However...
      
      			ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT;
      
      After cpuid_entries there is
      
      	int maxphyaddr;
      	struct x86_emulate_ctxt emulate_ctxt;  /* 16-byte aligned */
      
      So we have:
      
      - cpuid_entries at offset 1B50 (6992)
      - maxphyaddr at offset 27D0 (6992 + 3200 = 10192)
      - padding at 27D4...27DF
      - emulate_ctxt at 27E0
      
      And it writes in the padding.  Pfew, writing the ops field of emulate_ctxt
      would have been much worse.
      
      This patch fixes it by modding the index to avoid the out-of-bounds
      access. Worst case, i == j and ej->function == e->function,
      the loop can bail out.
      Reported-by: default avatarMoguofang <moguofang@huawei.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Guofang Mo <moguofang@huawei.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9cdc5d2e
    • Paolo Bonzini's avatar
      kvm: async_pf: fix rcu_irq_enter() with irqs enabled · 1173e90e
      Paolo Bonzini authored
      commit bbaf0e2b upstream.
      
      native_safe_halt enables interrupts, and you just shouldn't
      call rcu_irq_enter() with interrupts enabled.  Reorder the
      call with the following local_irq_disable() to respect the
      invariant.
      Reported-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1173e90e
    • J. Bruce Fields's avatar
      nfsd4: fix null dereference on replay · b324fbe2
      J. Bruce Fields authored
      commit 9a307403 upstream.
      
      if we receive a compound such that:
      
      	- the sessionid, slot, and sequence number in the SEQUENCE op
      	  match a cached succesful reply with N ops, and
      	- the Nth operation of the compound is a PUTFH, PUTPUBFH,
      	  PUTROOTFH, or RESTOREFH,
      
      then nfsd4_sequence will return 0 and set cstate->status to
      nfserr_replay_cache.  The current filehandle will not be set.  This will
      cause us to call check_nfsd_access with first argument NULL.
      
      To nfsd4_compound it looks like we just succesfully executed an
      operation that set a filehandle, but the current filehandle is not set.
      
      Fix this by moving the nfserr_replay_cache earlier.  There was never any
      reason to have it after the encode_op label, since the only case where
      he hit that is when opdesc->op_func sets it.
      
      Note that there are two ways we could hit this case:
      
      	- a client is resending a previously sent compound that ended
      	  with one of the four PUTFH-like operations, or
      	- a client is sending a *new* compound that (incorrectly) shares
      	  sessionid, slot, and sequence number with a previously sent
      	  compound, and the length of the previously sent compound
      	  happens to match the position of a PUTFH-like operation in the
      	  new compound.
      
      The second is obviously incorrect client behavior.  The first is also
      very strange--the only purpose of a PUTFH-like operation is to set the
      current filehandle to be used by the following operation, so there's no
      point in having it as the last in a compound.
      
      So it's likely this requires a buggy or malicious client to reproduce.
      Reported-by: default avatarScott Mayhew <smayhew@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b324fbe2
    • Gilad Ben-Yossef's avatar
      crypto: gcm - wait for crypto op not signal safe · 0640eeea
      Gilad Ben-Yossef authored
      commit f3ad5870 upstream.
      
      crypto_gcm_setkey() was using wait_for_completion_interruptible() to
      wait for completion of async crypto op but if a signal occurs it
      may return before DMA ops of HW crypto provider finish, thus
      corrupting the data buffer that is kfree'ed in this case.
      
      Resolve this by using wait_for_completion() instead.
      Reported-by: default avatarEric Biggers <ebiggers3@gmail.com>
      Signed-off-by: default avatarGilad Ben-Yossef <gilad@benyossef.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0640eeea
    • Eric Biggers's avatar
      KEYS: fix freeing uninitialized memory in key_update() · 45771cd7
      Eric Biggers authored
      commit 63a0b050 upstream.
      
      key_update() freed the key_preparsed_payload even if it was not
      initialized first.  This would cause a crash if userspace called
      keyctl_update() on a key with type like "asymmetric" that has a
      ->preparse() method but not an ->update() method.  Possibly it could
      even be triggered for other key types by racing with keyctl_setperm() to
      make the KEY_NEED_WRITE check fail (the permission was already checked,
      so normally it wouldn't fail there).
      
      Reproducer with key type "asymmetric", given a valid cert.der:
      
      keyctl new_session
      keyid=$(keyctl padd asymmetric desc @s < cert.der)
      keyctl setperm $keyid 0x3f000000
      keyctl update $keyid data
      
      [  150.686666] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
      [  150.687601] IP: asymmetric_key_free_kids+0x12/0x30
      [  150.688139] PGD 38a3d067
      [  150.688141] PUD 3b3de067
      [  150.688447] PMD 0
      [  150.688745]
      [  150.689160] Oops: 0000 [#1] SMP
      [  150.689455] Modules linked in:
      [  150.689769] CPU: 1 PID: 2478 Comm: keyctl Not tainted 4.11.0-rc4-xfstests-00187-ga9f6b6b8 #742
      [  150.690916] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
      [  150.692199] task: ffff88003b30c480 task.stack: ffffc90000350000
      [  150.692952] RIP: 0010:asymmetric_key_free_kids+0x12/0x30
      [  150.693556] RSP: 0018:ffffc90000353e58 EFLAGS: 00010202
      [  150.694142] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000004
      [  150.694845] RDX: ffffffff81ee3920 RSI: ffff88003d4b0700 RDI: 0000000000000001
      [  150.697569] RBP: ffffc90000353e60 R08: ffff88003d5d2140 R09: 0000000000000000
      [  150.702483] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
      [  150.707393] R13: 0000000000000004 R14: ffff880038a4d2d8 R15: 000000000040411f
      [  150.709720] FS:  00007fcbcee35700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
      [  150.711504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  150.712733] CR2: 0000000000000001 CR3: 0000000039eab000 CR4: 00000000003406e0
      [  150.714487] Call Trace:
      [  150.714975]  asymmetric_key_free_preparse+0x2f/0x40
      [  150.715907]  key_update+0xf7/0x140
      [  150.716560]  ? key_default_cmp+0x20/0x20
      [  150.717319]  keyctl_update_key+0xb0/0xe0
      [  150.718066]  SyS_keyctl+0x109/0x130
      [  150.718663]  entry_SYSCALL_64_fastpath+0x1f/0xc2
      [  150.719440] RIP: 0033:0x7fcbce75ff19
      [  150.719926] RSP: 002b:00007ffd5d167088 EFLAGS: 00000206 ORIG_RAX: 00000000000000fa
      [  150.720918] RAX: ffffffffffffffda RBX: 0000000000404d80 RCX: 00007fcbce75ff19
      [  150.721874] RDX: 00007ffd5d16785e RSI: 000000002866cd36 RDI: 0000000000000002
      [  150.722827] RBP: 0000000000000006 R08: 000000002866cd36 R09: 00007ffd5d16785e
      [  150.723781] R10: 0000000000000004 R11: 0000000000000206 R12: 0000000000404d80
      [  150.724650] R13: 00007ffd5d16784d R14: 00007ffd5d167238 R15: 000000000040411f
      [  150.725447] Code: 83 c4 08 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 ff 74 23 55 48 89 e5 53 48 89 fb <48> 8b 3f e8 06 21 c5 ff 48 8b 7b 08 e8 fd 20 c5 ff 48 89 df e8
      [  150.727489] RIP: asymmetric_key_free_kids+0x12/0x30 RSP: ffffc90000353e58
      [  150.728117] CR2: 0000000000000001
      [  150.728430] ---[ end trace f7f8fe1da2d5ae8d ]---
      
      Fixes: 4d8c0250 ("KEYS: Call ->free_preparse() even after ->preparse() returns an error")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      45771cd7
    • Eric Biggers's avatar
      KEYS: fix dereferencing NULL payload with nonzero length · 8206e0a2
      Eric Biggers authored
      commit 5649645d upstream.
      
      sys_add_key() and the KEYCTL_UPDATE operation of sys_keyctl() allowed a
      NULL payload with nonzero length to be passed to the key type's
      ->preparse(), ->instantiate(), and/or ->update() methods.  Various key
      types including asymmetric, cifs.idmap, cifs.spnego, and pkcs7_test did
      not handle this case, allowing an unprivileged user to trivially cause a
      NULL pointer dereference (kernel oops) if one of these key types was
      present.  Fix it by doing the copy_from_user() when 'plen' is nonzero
      rather than when '_payload' is non-NULL, causing the syscall to fail
      with EFAULT as expected when an invalid buffer is specified.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8206e0a2
    • Johan Hovold's avatar
      serial: ifx6x60: fix use-after-free on module unload · ff1a321f
      Johan Hovold authored
      commit 1e948479 upstream.
      
      Make sure to deregister the SPI driver before releasing the tty driver
      to avoid use-after-free in the SPI remove callback where the tty
      devices are deregistered.
      
      Fixes: 72d4724e ("serial: ifx6x60: Add modem power off function in the platform reboot process")
      Cc: Jun Chen <jun.d.chen@intel.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff1a321f
    • Max Filippov's avatar
      net: ethoc: enable NAPI before poll may be scheduled · 665fe924
      Max Filippov authored
      
      [ Upstream commit d220b942 ]
      
      ethoc_reset enables device interrupts, ethoc_interrupt may schedule a
      NAPI poll before NAPI is enabled in the ethoc_open, which results in
      device being unable to send or receive anything until it's closed and
      reopened. In case the device is flooded with ingress packets it may be
      unable to recover at all.
      Move napi_enable above ethoc_reset in the ethoc_open to fix that.
      
      Fixes: a1702857 ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.")
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Reviewed-by: default avatarTobias Klauser <tklauser@distanz.ch>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      665fe924
    • Eric Dumazet's avatar
      net: ping: do not abuse udp_poll() · 7c7c21ee
      Eric Dumazet authored
      
      [ Upstream commit 77d4b1d3 ]
      
      Alexander reported various KASAN messages triggered in recent kernels
      
      The problem is that ping sockets should not use udp_poll() in the first
      place, and recent changes in UDP stack finally exposed this old bug.
      
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Fixes: 6d0bfe22 ("net: ipv6: Add IPv6 support to the ping socket.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Cc: Solar Designer <solar@openwall.com>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Acked-By: default avatarLorenzo Colitti <lorenzo@google.com>
      Tested-By: default avatarLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c7c21ee
    • David S. Miller's avatar
      ipv6: Fix leak in ipv6_gso_segment(). · 22315daf
      David S. Miller authored
      
      [ Upstream commit e3e86b51 ]
      
      If ip6_find_1stfragopt() fails and we return an error we have to free
      up 'segs' because nobody else is going to.
      
      Fixes: 2423496a ("ipv6: Prevent overrun when parsing v6 header options")
      Reported-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      22315daf
    • Yuchung Cheng's avatar
      tcp: disallow cwnd undo when switching congestion control · bca288c7
      Yuchung Cheng authored
      
      [ Upstream commit 44abafc4 ]
      
      When the sender switches its congestion control during loss
      recovery, if the recovery is spurious then it may incorrectly
      revert cwnd and ssthresh to the older values set by a previous
      congestion control. Consider a congestion control (like BBR)
      that does not use ssthresh and keeps it infinite: the connection
      may incorrectly revert cwnd to an infinite value when switching
      from BBR to another congestion control.
      
      This patch fixes it by disallowing such cwnd undo operation
      upon switching congestion control.  Note that undo_marker
      is not reset s.t. the packets that were incorrectly marked
      lost would be corrected. We only avoid undoing the cwnd in
      tcp_undo_cwnd_reduction().
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bca288c7
    • Ganesh Goudar's avatar
      cxgb4: avoid enabling napi twice to the same queue · 8c66bf2f
      Ganesh Goudar authored
      
      [ Upstream commit e7519f99 ]
      
      Take uld mutex to avoid race between cxgb_up() and
      cxgb4_register_uld() to enable napi for the same uld
      queue.
      Signed-off-by: default avatarGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c66bf2f
    • Ben Hutchings's avatar
      ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt() · 45039be5
      Ben Hutchings authored
      
      [ Upstream commit 6e80ac5c ]
      
      xfrm6_find_1stfragopt() may now return an error code and we must
      not treat it as a length.
      
      Fixes: 2423496a ("ipv6: Prevent overrun when parsing v6 header options")
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Acked-by: default avatarCraig Gallek <kraig@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      45039be5
    • Mintz, Yuval's avatar
      bnx2x: Fix Multi-Cos · 697d4230
      Mintz, Yuval authored
      
      [ Upstream commit 3968d389 ]
      
      Apparently multi-cos isn't working for bnx2x quite some time -
      driver implements ndo_select_queue() to allow queue-selection
      for FCoE, but the regular L2 flow would cause it to modulo the
      fallback's result by the number of queues.
      The fallback would return a queue matching the needed tc
      [via __skb_tx_hash()], but since the modulo is by the number of TSS
      queues where number of TCs is not accounted, transmission would always
      be done by a queue configured into using TC0.
      
      Fixes: ada7c19e ("bnx2x: use XPS if possible for bnx2x_select_queue instead of pure hash")
      Signed-off-by: default avatarYuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      697d4230
  2. 07 Jun, 2017 8 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.18.56 · 88ff45d0
      Greg Kroah-Hartman authored
      88ff45d0
    • Eric Sandeen's avatar
      xfs: fix unaligned access in xfs_btree_visit_blocks · c6444e02
      Eric Sandeen authored
      commit a4d768e7 upstream.
      
      This structure copy was throwing unaligned access warnings on sparc64:
      
      Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs]
      
      xfs_btree_copy_ptrs does a memcpy, which avoids it.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c6444e02
    • Zorro Lang's avatar
      xfs: bad assertion for delalloc an extent that start at i_size · 0d600519
      Zorro Lang authored
      commit 892d2a5f upstream.
      
      By run fsstress long enough time enough in RHEL-7, I find an
      assertion failure (harder to reproduce on linux-4.11, but problem
      is still there):
      
        XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c
      
      The assertion is in xfs_getbmap() funciton:
      
        if (map[i].br_startblock == DELAYSTARTBLOCK &&
      -->   map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
                ASSERT((iflags & BMV_IF_DELALLOC) != 0);
      
      When map[i].br_startoff == XFS_B_TO_FSB(mp, XFS_ISIZE(ip)), the
      startoff is just at EOF. But we only need to make sure delalloc
      extents that are within EOF, not include EOF.
      Signed-off-by: default avatarZorro Lang <zlang@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d600519
    • Brian Foster's avatar
      xfs: fix indlen accounting error on partial delalloc conversion · 6e12db5d
      Brian Foster authored
      commit 0daaecac upstream.
      
      The delalloc -> real block conversion path uses an incorrect
      calculation in the case where the middle part of a delalloc extent
      is being converted. This is documented as a rare situation because
      XFS generally attempts to maximize contiguity by converting as much
      of a delalloc extent as possible.
      
      If this situation does occur, the indlen reservation for the two new
      delalloc extents left behind by the conversion of the middle range
      is calculated and compared with the original reservation. If more
      blocks are required, the delta is allocated from the global block
      pool. This delta value can be characterized as the difference
      between the new total requirement (temp + temp2) and the currently
      available reservation minus those blocks that have already been
      allocated (startblockval(PREV.br_startblock) - allocated).
      
      The problem is that the current code does not account for previously
      allocated blocks correctly. It subtracts the current allocation
      count from the (new - old) delta rather than the old indlen
      reservation. This means that more indlen blocks than have been
      allocated end up stashed in the remaining extents and free space
      accounting is broken as a result.
      
      Fix up the calculation to subtract the allocated block count from
      the original extent indlen and thus correctly allocate the
      reservation delta based on the difference between the new total
      requirement and the unused blocks from the original reservation.
      Also remove a bogus assert that contradicts the fact that the new
      indlen reservation can be larger than the original indlen
      reservation.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e12db5d
    • Brian Foster's avatar
      xfs: fix up quotacheck buffer list error handling · 04c28605
      Brian Foster authored
      commit 20e8a063 upstream.
      
      The quotacheck error handling of the delwri buffer list assumes the
      resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
      on the buffers that are dequeued. This can lead to assert failures
      on buffer release and possibly other locking problems.
      
      Move this code to a delwri queue cancel helper function to
      encapsulate the logic required to properly release buffers from a
      delwri queue. Update the helper to clear the delwri queue flag and
      call it from quotacheck.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      04c28605
    • Brian Foster's avatar
      xfs: prevent multi-fsb dir readahead from reading random blocks · 955f1151
      Brian Foster authored
      commit cb52ee33 upstream.
      
      Directory block readahead uses a complex iteration mechanism to map
      between high-level directory blocks and underlying physical extents.
      This mechanism attempts to traverse the higher-level dir blocks in a
      manner that handles multi-fsb directory blocks and simultaneously
      maintains a reference to the corresponding physical blocks.
      
      This logic doesn't handle certain (discontiguous) physical extent
      layouts correctly with multi-fsb directory blocks. For example,
      consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory
      block size and a directory with the following extent layout:
      
       EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
         0: [0..7]:          88..95            0 (88..95)             8
         1: [8..15]:         80..87            0 (80..87)             8
         2: [16..39]:        168..191          0 (168..191)          24
         3: [40..63]:        5242952..5242975  1 (72..95)            24
      
      Directory block 0 spans physical extents 0 and 1, dirblk 1 lies
      entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because
      extent 2 is larger than the directory block size, the readahead code
      erroneously assumes the block is contiguous and issues a readahead
      based on the physical mapping of the first fsb of the dirblk. This
      results in read verifier failure and a spurious corruption or crc
      failure, depending on the filesystem format.
      
      Further, the subsequent readahead code responsible for walking
      through the physical table doesn't correctly advance the physical
      block reference for dirblk 2. Instead of advancing two physical
      filesystem blocks, the first iteration of the loop advances 1 block
      (correctly), but the subsequent iteration advances 2 more physical
      blocks because the next physical extent (extent 3, above) happens to
      cover more than dirblk 2. At this point, the higher-level directory
      block walking is completely off the rails of the actual physical
      layout of the directory for the respective mapping table.
      
      Update the contiguous dirblock logic to consider the current offset
      in the physical extent to avoid issuing directory readahead to
      unrelated blocks. Also, update the mapping table advancing code to
      consider the current offset within the current dirblock to avoid
      advancing the mapping reference too far beyond the dirblock.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      955f1151
    • Eric Sandeen's avatar
      xfs: handle array index overrun in xfs_dir2_leaf_readbuf() · c4d3116e
      Eric Sandeen authored
      commit 023cc840 upstream.
      
      Carlos had a case where "find" seemed to start spinning
      forever and never return.
      
      This was on a filesystem with non-default multi-fsb (8k)
      directory blocks, and a fragmented directory with extents
      like this:
      
      0:[0,133646,2,0]
      1:[2,195888,1,0]
      2:[3,195890,1,0]
      3:[4,195892,1,0]
      4:[5,195894,1,0]
      5:[6,195896,1,0]
      6:[7,195898,1,0]
      7:[8,195900,1,0]
      8:[9,195902,1,0]
      9:[10,195908,1,0]
      10:[11,195910,1,0]
      11:[12,195912,1,0]
      12:[13,195914,1,0]
      ...
      
      i.e. the first extent is a contiguous 2-fsb dir block, but
      after that it is fragmented into 1 block extents.
      
      At the top of the readdir path, we allocate a mapping array
      which (for this filesystem geometry) can hold 10 extents; see
      the assignment to map_info->map_size.  During readdir, we are
      therefore able to map extents 0 through 9 above into the array
      for readahead purposes.  If we count by 2, we see that the last
      mapped index (9) is the first block of a 2-fsb directory block.
      
      At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
      more readahead; the outer loop assumes one full dir block is
      processed each loop iteration, and an inner loop that ensures
      that this is so by advancing to the next extent until a full
      directory block is mapped.
      
      The problem is that this inner loop may step past the last
      extent in the mapping array as it tries to reach the end of
      the directory block.  This will read garbage for the extent
      length, and as a result the loop control variable 'j' may
      become corrupted and never fail the loop conditional.
      
      The number of valid mappings we have in our array is stored
      in map->map_valid, so stop this inner loop based on that limit.
      
      There is an ASSERT at the top of the outer loop for this
      same condition, but we never made it out of the inner loop,
      so the ASSERT never fired.
      
      Huge appreciation for Carlos for debugging and isolating
      the problem.
      Debugged-and-analyzed-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Tested-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: default avatarBill O'Donnell <billodo@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4d3116e
    • Darrick J. Wong's avatar
      xfs: fix over-copying of getbmap parameters from userspace · 8d5d3fb3
      Darrick J. Wong authored
      commit be6324c0 upstream.
      
      In xfs_ioc_getbmap, we should only copy the fields of struct getbmap
      from userspace, or else we end up copying random stack contents into the
      kernel.  struct getbmap is a strict subset of getbmapx, so a partial
      structure copy should work fine.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d5d3fb3