1. 30 May, 2018 22 commits
    • idr: fix invalid ptr dereference on item delete · 0472f94c
      Matthew Wilcox authored
      commit 7a4deea1 upstream.
      
      If the radix tree underlying the IDR happens to be full and we attempt
      to remove an id which is larger than any id in the IDR, we will call
      __radix_tree_delete() with an uninitialised 'slot' pointer, at which
      point anything could happen.  This was easiest to hit with a single
      entry at id 0 and attempting to remove a non-0 id, but it could have
      happened with 64 entries and attempting to remove an id >= 64.
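
      The failure mode described above can be modelled in userspace. This is
      only a sketch, not the kernel's radix tree code: toy_idr, toy_lookup()
      and toy_remove() are hypothetical names, and the 64-slot array merely
      mirrors the "64 entries" case. The lookup leaves the caller's slot
      pointer untouched on a miss, so the remove path has to bail out before
      using it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 64   /* hypothetical capacity, mirroring the "64 entries" case */

struct toy_idr {
    void *slots[TOY_SLOTS];
};

/*
 * Look up an id. On a hit, store the slot address in *slotp and return the
 * entry. For an out-of-range id, *slotp is left untouched, which models the
 * lookup that left 'slot' uninitialised.
 */
static void *toy_lookup(struct toy_idr *idr, unsigned long id, void ***slotp)
{
    if (id >= TOY_SLOTS)
        return NULL;
    *slotp = &idr->slots[id];
    return idr->slots[id];
}

/* Remove an id; returns the removed entry, or NULL if nothing was stored. */
static void *toy_remove(struct toy_idr *idr, unsigned long id)
{
    void **slot = NULL;   /* initialised, so a miss cannot leave garbage here */
    void *entry;

    assert(idr != NULL);
    entry = toy_lookup(idr, id, &slot);
    if (!entry)           /* bail out before 'slot' is ever dereferenced */
        return NULL;
    *slot = NULL;
    return entry;
}
```

      Removing id 64 from a tree that only holds id 0 returns NULL and leaves
      the existing entry alone, rather than writing through a stale pointer.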
      
      Roman said:
      
        The syzkaller test boils down to opening /dev/kvm, creating an
        eventfd, and calling a couple of KVM ioctls. None of this requires
        superuser. And the result is dereferencing an uninitialized pointer
        which is likely a crash. The specific path caught by syzbot is via
        KVM_HYPERV_EVENTFD ioctl which is new in 4.17. But I guess there are
        other user-triggerable paths, so cc:stable is probably justified.
      
      Matthew added:
      
        We have around 250 calls to idr_remove() in the kernel today. Many of
        them pass an ID which is embedded in the object they're removing, so
        they're safe. Picking a few likely candidates:
      
        drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
        drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
        drivers/atm/nicstar.c could be taken down by a handcrafted packet
      
      Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
      Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree")
      Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com>
      Debugged-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sr: pass down correctly sized SCSI sense buffer · 2a039b93
      Jens Axboe authored
      commit f7068114 upstream.
      
      We're casting the CDROM layer request_sense to the SCSI sense
      buffer, but the former is 64 bytes and the latter is 96 bytes.
      As we generally allocate these on the stack, we end up blowing
      up the stack.
      
      Fix this by wrapping the scsi_execute() call with a properly
      sized sense buffer, and copying back the bits for the CDROM
      layer.
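
      As a rough illustration of the wrapping approach, the sketch below runs
      a command against a full-size local buffer and copies back only what
      the smaller caller buffer can hold. It is not the sr driver code:
      toy_scsi_execute() and toy_sr_do_command() are hypothetical stand-ins,
      and the two sizes are taken from the description above.

```c
#include <assert.h>
#include <string.h>

#define CDROM_SENSE_SIZE 64   /* CDROM layer request_sense size, per the text above */
#define SCSI_SENSE_SIZE  96   /* SCSI sense buffer size, per the text above */

/* Hypothetical stand-in for scsi_execute(): fills a full SCSI-sized buffer. */
static void toy_scsi_execute(unsigned char *sense)
{
    memset(sense, 0xAB, SCSI_SENSE_SIZE);
}

/*
 * Run the command with a correctly sized local buffer, then copy back only
 * the bytes that fit in the caller's 64-byte CDROM sense buffer.
 */
static void toy_sr_do_command(unsigned char *cdrom_sense)
{
    unsigned char sense[SCSI_SENSE_SIZE];

    toy_scsi_execute(sense);
    if (cdrom_sense)
        memcpy(cdrom_sense, sense, CDROM_SENSE_SIZE);
}
```

      Passing the 64-byte buffer straight to toy_scsi_execute() would write
      32 bytes past its end, which is the overflow the commit describes.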
      
      Cc: stable@vger.kernel.org
      Reported-by: Piotr Gabriel Kosinski <pg.kosinski@gmail.com>
      Reported-by: Daniel Shapira <daniel@twistlock.com>
      Tested-by: Kees Cook <keescook@chromium.org>
      Fixes: 82ed4db4 ("block: split scsi_request out of struct request")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/umem: Use the correct mm during ib_umem_release · a59bd819
      Lidong Chen authored
      commit 8e907ed4 upstream.
      
      User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads.
      
      If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has
      exited, get_pid_task will return NULL and ib_umem_release will not
      decrease mm->pinned_vm.
      
      Instead of using threads to locate the mm, use the overall tgid from the
      ib_ucontext struct. This matches the behavior of ODP and
      disassociate in handling the mm of the process that called ibv_reg_mr.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 87773dd5 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
      Signed-off-by: Lidong Chen <lidongchen@tencent.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/hfi1: Use after free race condition in send context error path · 7a5b3b91
      Michael J. Ruhl authored
      commit f9e76ca3 upstream.
      
      A pio send egress error can occur when the PSM library attempts
      to send a bad packet.  That issue is still being investigated.
      
      The pio error interrupt handler then attempts to progress the recovery
      of the errored pio send context.
      
      Code inspection reveals that the handling lacks the necessary locking
      if that recovery interleaves with a PSM close of the "context" object
      that contains the pio send context.
      
      The lack of the locking can cause the recovery to access the already
      freed pio send context object and incorrectly deduce that the pio
      send context is actually a kernel pio send context as shown by the
      NULL deref stack below:
      
      [<ffffffff8143d78c>] _dev_info+0x6c/0x90
      [<ffffffffc0613230>] sc_restart+0x70/0x1f0 [hfi1]
      [<ffffffff816ab124>] ? __schedule+0x424/0x9b0
      [<ffffffffc06133c5>] sc_halted+0x15/0x20 [hfi1]
      [<ffffffff810aa3ba>] process_one_work+0x17a/0x440
      [<ffffffff810ab086>] worker_thread+0x126/0x3c0
      [<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0
      [<ffffffff810b252f>] kthread+0xcf/0xe0
      [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
      [<ffffffff816b8798>] ret_from_fork+0x58/0x90
      [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
      
      This is the best case scenario and other scenarios can corrupt the
      already freed memory.
      
      Fix by adding the necessary locking in the pio send context error
      handler.
      
      Cc: <stable@vger.kernel.org> # 4.9.x
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/64s: Clear PCR on boot · df07f271
      Michael Neuling authored
      commit faf37c44 upstream.
      
      Clear the PCR (Processor Compatibility Register) on boot to ensure we
      are not running in a compatibility mode.
      
      We've seen this cause problems when a crash (and kdump) occurs while
      running compat mode guests. The kdump kernel then runs with the PCR
      set and causes problems. The symptom in the kdump kernel (also seen in
      petitboot after fast-reboot) is early userspace programs taking
      sigills on newer instructions (seen in libc).

      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: lse: Add early clobbers to some input/output asm operands · 92169a01
      Will Deacon authored
      commit 32c3fa7c upstream.
      
      For LSE atomics that read and write a register operand, we need to
      ensure that these operands are annotated as "early clobber" if the
      register is written before all of the input operands have been consumed.
      Failure to do so can result in the compiler allocating the same register
      to both operands, leading to splats such as:
      
       Unable to handle kernel paging request at virtual address 11111122222221
       [...]
       x1 : 1111111122222222 x0 : 1111111122222221
       Process swapper/0 (pid: 1, stack limit = 0x000000008209f908)
       Call trace:
        test_atomic64+0x1360/0x155c
      
      where x0 has been allocated as both the value to be stored and also the
      atomic_t pointer.
      
      This patch adds the missing clobbers.
      
      Cc: <stable@vger.kernel.org>
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Reported-by: Mark Salter <msalter@redhat.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros · 760e4d7e
      Thomas Hellstrom authored
      commit 938ae725 upstream.
      
      Depending on whether the kernel is compiled with frame-pointer or not,
      the temporary memory location used for the bp parameter in these macros
      is referenced relative to the stack pointer or the frame pointer.
      Hence we can never reference that parameter when we've modified either
      the stack pointer or the frame pointer, because then the compiler would
      generate an incorrect stack reference.
      
      Fix this by pushing the temporary memory parameter on a known location on
      the stack before modifying the stack- and frame pointers.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: Brian Paul <brianp@vmware.com>
      Reviewed-by: Sinclair Yeh <syeh@vmware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent · a0f8cbce
      Joe Jin authored
      commit 4855c92d upstream.
      
      When running raidconfig from Dom0 we found that the Xen DMA heap is reduced,
      but the Dom heap is increased by the same size. Tracing raidconfig we found
      that the related ioctl() in megaraid_sas calls dma_alloc_coherent()
      to allocate memory. If the memory allocated by Dom0 is not in the DMA area,
      it will exchange memory with Xen to meet the requirement. Later, drivers
      call dma_free_coherent() to free the memory. In xen_swiotlb_free_coherent()
      the check condition (dev_addr + size - 1 <= dma_mask) is always false,
      which prevents calling xen_destroy_contiguous_region() to return the memory
      to the Xen DMA heap.

      This issue was introduced by commit 6810df88 "xen-swiotlb: When doing
      coherent alloc/dealloc check before swizzling the MFNs.".

      Signed-off-by: Joe Jin <joe.jin@oracle.com>
      Tested-by: John Sobecki <john.sobecki@oracle.com>
      Reviewed-by: Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libata: blacklist Micron 500IT SSD with MU01 firmware · 4182f5a0
      Sudip Mukherjee authored
      commit 136d769e upstream.
      
      While whitelisting Micron M500DC drives, the tweaked blacklist entry
      enabled queued TRIM from M500IT variants also. But these do not support
      queued TRIM. And while using those SSDs with the latest kernel we have
      seen errors and even the partition table getting corrupted.
      
      Some part from the dmesg:
      [    6.727384] ata1.00: ATA-9: Micron_M500IT_MTFDDAK060MBD, MU01, max UDMA/133
      [    6.727390] ata1.00: 117231408 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
      [    6.741026] ata1.00: supports DRM functions and may not be fully accessible
      [    6.759887] ata1.00: configured for UDMA/133
      [    6.762256] scsi 0:0:0:0: Direct-Access     ATA      Micron_M500IT_MT MU01 PQ: 0 ANSI: 5
      
      and then for the error:
      [  120.860334] ata1.00: exception Emask 0x1 SAct 0x7ffc0007 SErr 0x0 action 0x6 frozen
      [  120.860338] ata1.00: irq_stat 0x40000008
      [  120.860342] ata1.00: failed command: SEND FPDMA QUEUED
      [  120.860351] ata1.00: cmd 64/01:00:00:00:00/00:00:00:00:00/a0 tag 0 ncq dma 512 out
               res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x5 (timeout)
      [  120.860353] ata1.00: status: { DRDY }
      [  120.860543] ata1: hard resetting link
      [  121.166128] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
      [  121.166376] ata1.00: supports DRM functions and may not be fully accessible
      [  121.186238] ata1.00: supports DRM functions and may not be fully accessible
      [  121.204445] ata1.00: configured for UDMA/133
      [  121.204454] ata1.00: device reported invalid CHS sector 0
      [  121.204541] sd 0:0:0:0: [sda] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
      [  121.204546] sd 0:0:0:0: [sda] tag#18 Sense Key : 0x5 [current]
      [  121.204550] sd 0:0:0:0: [sda] tag#18 ASC=0x21 ASCQ=0x4
      [  121.204555] sd 0:0:0:0: [sda] tag#18 CDB: opcode=0x93 93 08 00 00 00 00 00 04 28 80 00 00 00 30 00 00
      [  121.204559] print_req_error: I/O error, dev sda, sector 272512
      
      After a few reboots with these errors, the SSD is corrupted.
      After blacklisting it, the errors are not seen and the SSD does not get
      corrupted any more.
      
      Fixes: 243918be ("libata: Do not blacklist Micron M500DC")
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libata: Blacklist some Sandisk SSDs for NCQ · 21712abb
      Tejun Heo authored
      commit 322579dc upstream.
      
      Sandisk SSDs SD7SN6S256G and SD8SN8U256G are regularly locking up
      under sustained moderate load with NCQ enabled.  Blacklist
      for now.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus · f2a3c8bb
      Corneliu Doban authored
      commit 3de06d5a upstream.
      
      The SDHCI_QUIRK2_HOST_OFF_CARD_ON is needed for the driver to
      properly reset the host controller (reset all) on initialization
      after exiting deep sleep.

      Signed-off-by: Corneliu Doban <corneliu.doban@broadcom.com>
      Signed-off-by: Scott Branden <scott.branden@broadcom.com>
      Reviewed-by: Ray Jui <ray.jui@broadcom.com>
      Reviewed-by: Srinath Mannam <srinath.mannam@broadcom.com>
      Fixes: c833e92b ("mmc: sdhci-iproc: support standard byte register accesses")
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register · 4da8f20a
      Corneliu Doban authored
      commit 5f651b87 upstream.
      
      When the host controller accepts only 32bit writes, the value of the
      16bit TRANSFER_MODE register, which has the same 32bit address as the
      16bit COMMAND register, needs to be saved so that it can be written
      in a 32bit write together with the command, as this will trigger the
      host to send the command on the SD interface.

      When sending the tuning command, TRANSFER_MODE is written and then
      sdhci_set_transfer_mode reads it back to clear the AUTO_CMD12 bit and
      writes it again. This results in a wrong value being written, because
      the initial write value was saved in a shadow and the read-back returned
      a wrong value from the register.

      Fix sdhci_iproc_readw to return the saved value of TRANSFER_MODE
      when a saved value exists.

      Apply the same fix to reads of the BLOCK_SIZE and BLOCK_COUNT registers,
      which are saved for a different reason, although a scenario that would
      cause the mentioned problem on these registers is not probable.
      
      Fixes: b580c52d ("mmc: sdhci-iproc: add IPROC SDHCI driver")
      Signed-off-by: Corneliu Doban <corneliu.doban@broadcom.com>
      Signed-off-by: Scott Branden <scott.branden@broadcom.com>
      Cc: stable@vger.kernel.org # v4.1+
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: sdhci-iproc: remove hard coded mmc cap 1.8v · ebedf0b2
      Srinath Mannam authored
      commit 4c94238f upstream.
      
      Remove hard coded mmc cap 1.8v from platform data as it is board specific.
      The 1.8v DDR mmc caps can be enabled using DTS property for those
      boards that support it.
      
      Fixes: b17b4ab8 ("mmc: sdhci-iproc: define MMC caps in platform data")
      Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
      Signed-off-by: Scott Branden <scott.branden@broadcom.com>
      Reviewed-by: Ray Jui <ray.jui@broadcom.com>
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • do d_instantiate/unlock_new_inode combinations safely · f440ea85
      Al Viro authored
      commit 1e2e547a upstream.
      
      For anything NFS-exported we do _not_ want to unlock new inode
      before it has grown an alias; original set of fixes got the
      ordering right, but missed the nasty complication in case of
      lockdep being enabled - unlock_new_inode() does
      	lockdep_annotate_inode_mutex_key(inode)
      which can only be done before anyone gets a chance to touch
      ->i_mutex.  Unfortunately, flipping the order and doing
      unlock_new_inode() before d_instantiate() opens a window when
      mkdir can race with open-by-fhandle on a guessed fhandle, leading
      to multiple aliases for a directory inode and all the breakage
      that follows from that.
      
      	Correct solution: a new primitive (d_instantiate_new())
      combining these two in the right order - lockdep annotate, then
      d_instantiate(), then the rest of unlock_new_inode().  All
      combinations of d_instantiate() with unlock_new_inode() should
      be converted to that.
      
      Cc: stable@kernel.org	# 2.6.29 and later
      Tested-by: Mike Marshall <hubcap@omnibond.com>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: timer: Fix pause event notification · ba3fbb7a
      Ben Hutchings authored
      commit 3ae18097 upstream.
      
      Commit f65e0d29 ("ALSA: timer: Call notifier in the same spinlock")
      combined the start/continue and stop/pause functions, and in doing so
      changed the event code for the pause case to SNDRV_TIMER_EVENT_CONTINUE.
      Change it back to SNDRV_TIMER_EVENT_PAUSE.
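
      A minimal sketch of the event selection in the combined functions,
      using hypothetical toy_* names rather than the ALSA timer code: the
      start/continue helper and the stop/pause helper each pick between two
      event codes, and the pause case has to map to the pause event.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event codes standing in for the SNDRV_TIMER_EVENT_* values. */
enum toy_timer_event {
    TOY_EVENT_START,
    TOY_EVENT_STOP,
    TOY_EVENT_CONTINUE,
    TOY_EVENT_PAUSE,
};

/* Combined start/continue path: 'start' selects which event is notified. */
static enum toy_timer_event toy_start_event(bool start)
{
    return start ? TOY_EVENT_START : TOY_EVENT_CONTINUE;
}

/* Combined stop/pause path: the non-stop case must report PAUSE, not CONTINUE. */
static enum toy_timer_event toy_stop_event(bool stop)
{
    return stop ? TOY_EVENT_STOP : TOY_EVENT_PAUSE;
}
```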
      
      Fixes: f65e0d29 ("ALSA: timer: Call notifier in the same spinlock")
      Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • aio: fix io_destroy(2) vs. lookup_ioctx() race · fbcede36
      Al Viro authored
      commit baf10564 upstream.
      
      kill_ioctx() used to have an explicit RCU delay between removing the
      reference from ->ioctx_table and percpu_ref_kill() dropping the refcount.
      At some point that delay had been removed, on the theory that
      percpu_ref_kill() itself contained an RCU delay.  Unfortunately, that was
      the wrong kind of RCU delay and it didn't care about rcu_read_lock() used
      by lookup_ioctx().  As the result, we could get ctx freed right under
      lookup_ioctx().  Tejun has fixed that in a6d7cff4 ("fs/aio: Add explicit
      RCU grace period when freeing kioctx"); however, that fix is not enough.
      
      Suppose io_destroy() from one thread races with e.g. io_setup() from another;
      CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2
      has picked it (under rcu_read_lock()).  Then CPU1 proceeds to drop the
      refcount, getting it to 0 and triggering a call of free_ioctx_users(),
      which proceeds to drop the secondary refcount and once that reaches zero
      calls free_ioctx_reqs().  That does
              INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
              queue_rcu_work(system_wq, &ctx->free_rwork);
      and schedules freeing the whole thing after RCU delay.
      
      In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the
      refcount from 0 to 1 and returned the reference to io_setup().
      
      Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get
      freed until after percpu_ref_get().  Sure, we'd increment the counter before
      ctx can be freed.  Now we are out of rcu_read_lock() and there's nothing to
      stop freeing of the whole thing.  Unfortunately, CPU2 assumes that since it
      has grabbed the reference, ctx is *NOT* going away until it gets around to
      dropping that reference.
      
      The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss.
      It's not costlier than what we currently do in normal case, it's safe to
      call since freeing *is* delayed and it closes the race window - either
      lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users
      won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx()
      fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see
      the object in question at all.
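
      The difference between an unconditional get and a "tryget live" can be
      shown with a small userspace model. It is not percpu_ref, which relies
      on per-CPU counters and RCU; toy_ref and its helpers are hypothetical,
      and the model packs a "killed" bit and the count into one C11 atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_DEAD (1u << 31)   /* top bit marks the reference as killed */

struct toy_ref {
    atomic_uint state;        /* low bits: reference count; top bit: killed */
};

static void toy_ref_init(struct toy_ref *r)
{
    atomic_init(&r->state, 1);   /* the initial reference */
}

/* Take a reference only if the object has not been killed yet. */
static bool toy_ref_tryget_live(struct toy_ref *r)
{
    unsigned int old = atomic_load(&r->state);

    do {
        if (old & TOY_DEAD)
            return false;        /* treat as a lookup miss */
    } while (!atomic_compare_exchange_weak(&r->state, &old, old + 1));
    return true;
}

/* Mark the object dead, then drop the initial reference. */
static void toy_ref_kill(struct toy_ref *r)
{
    atomic_fetch_or(&r->state, TOY_DEAD);
    atomic_fetch_sub(&r->state, 1);
}

static unsigned int toy_ref_count(struct toy_ref *r)
{
    return atomic_load(&r->state) & ~TOY_DEAD;
}
```

      A lookup that arrives after toy_ref_kill() fails and leaves the count
      unchanged, so a count that has reached zero is never bumped back up.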
      
      Cc: stable@kernel.org
      Fixes: a6d7cff4 "fs/aio: Add explicit RCU grace period when freeing kioctx"
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs: don't scan the inode cache before SB_BORN is set · b9659ff3
      Dave Chinner authored
      commit 79f546a6 upstream.
      
      We recently had an oops reported on a 4.14 kernel in
      xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
      and so the m_perag_tree lookup walked into lala land.  It produces
      an oops down this path during the failed mount:
      
        radix_tree_gang_lookup_tag+0xc4/0x130
        xfs_perag_get_tag+0x37/0xf0
        xfs_reclaim_inodes_count+0x32/0x40
        xfs_fs_nr_cached_objects+0x11/0x20
        super_cache_count+0x35/0xc0
        shrink_slab.part.66+0xb1/0x370
        shrink_node+0x7e/0x1a0
        try_to_free_pages+0x199/0x470
        __alloc_pages_slowpath+0x3a1/0xd20
        __alloc_pages_nodemask+0x1c3/0x200
        cache_grow_begin+0x20b/0x2e0
        fallback_alloc+0x160/0x200
        kmem_cache_alloc+0x111/0x4e0
      
      The problem is that the superblock shrinker is running before the
      filesystem structures it depends on have been fully set up. i.e.
      the shrinker is registered in sget(), before ->fill_super() has been
      called, and the shrinker can call into the filesystem before
      fill_super() does its setup work. Essentially we are exposed to
      both use-after-free and use-before-initialisation bugs here.
      
      To fix this, add a check for the SB_BORN flag in super_cache_count.
      In general, this flag is not set until ->fs_mount() completes
      successfully, so we know that it is set after the filesystem
      setup has completed. This matches the trylock_super() behaviour
      which will not let super_cache_scan() run if SB_BORN is not set, and
      hence will prevent the superblock shrinker from entering the
      filesystem while it is being set up or after it has failed setup
      and is being torn down.
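
      The guard amounts to an early return in the count callback. The sketch
      below is a userspace model under assumed names (toy_super_block,
      TOY_SB_BORN, toy_super_cache_count()), not the fs/super.c code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SB_BORN 0x1u   /* hypothetical flag bit, set once mount setup succeeded */

struct toy_super_block {
    unsigned int s_flags;
    long nr_cached;        /* stand-in for the objects the filesystem would report */
};

/*
 * Report zero freeable objects until the superblock is fully set up, so the
 * shrinker never calls into a half-initialised filesystem.
 */
static long toy_super_cache_count(const struct toy_super_block *sb)
{
    assert(sb != NULL);
    if (!(sb->s_flags & TOY_SB_BORN))
        return 0;
    return sb->nr_cached;
}
```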
      
      Cc: stable@kernel.org
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • affs_lookup(): close a race with affs_remove_link() · 1e5edf32
      Al Viro authored
      commit 30da870c upstream.
      
      We unlock the directory hash too early - if we are looking at a secondary
      link and the primary (in another directory) gets removed just as we unlock,
      we could have the old primary moved in place of the secondary, leaving
      us to look into a freed entry (and leaving our dentry with ->d_fsdata
      pointing to a freed entry).
      
      Cc: stable@vger.kernel.org # 2.4.4+
      Acked-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable" · 2871a701
      Colin Ian King authored
      commit ba3696e9 upstream.
      
      Trivial fix to spelling mistake in debugfs_entries text.
      
      Fixes: 669e846e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kernel-janitors@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 3.10+
      Signed-off-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs · bba75a0c
      Maciej W. Rozycki authored
      commit 9a3a92cc upstream.
      
      Check the TIF_32BIT_FPREGS task setting of the tracee rather than the
      tracer in determining the layout of floating-point general registers in
      the floating-point context, correcting access to odd-numbered registers
      for o32 tracees where the setting disagrees between the two processes.
      
      Fixes: 597ce172 ("MIPS: Support for 64-bit FP with O32 binaries")
      Signed-off-by: Maciej W. Rozycki <macro@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.14+
      Signed-off-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: ptrace: Expose FIR register through FP regset · 769fc447
      Maciej W. Rozycki authored
      commit 71e909c0 upstream.
      
      Correct commit 7aeb753b ("MIPS: Implement task_user_regset_view.")
      and expose the FIR register using the unused 4 bytes at the end of the
      NT_PRFPREG regset.  Without that register included clients cannot use
      the PTRACE_GETREGSET request to retrieve the complete FPU register set
      and have to resort to one of the older interfaces, either PTRACE_PEEKUSR
      or PTRACE_GETFPREGS, to retrieve the missing piece of data.  Also the
      register is irreversibly missing from core dumps.
      
      This register is architecturally hardwired and read-only so the write
      path does not matter.  Ignore data supplied on writes then.
      
      Fixes: 7aeb753b ("MIPS: Implement task_user_regset_view.")
      Signed-off-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Maciej W. Rozycki <macro@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.13+
      Patchwork: https://patchwork.linux-mips.org/patch/19273/
      Signed-off-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: c-r4k: Fix data corruption related to cache coherence · 368b7085
      NeilBrown authored
      commit 55a2aa08 upstream.
      
      When DMA will be performed to a MIPS32 1004K CPS, the L1-cache for the
      range needs to be flushed and invalidated first.
      The code currently takes one of two approaches.
      1/ If the range is less than the size of the dcache, then HIT type
         requests flush/invalidate cache lines for the particular addresses.
         HIT-type requests are globalised by the CPS so this is safe on SMP.
      
      2/ If the range is larger than the size of dcache, then INDEX type
         requests flush/invalidate the whole cache. INDEX type requests affect
         the local cache only. CPS does not propagate them in any way. So this
         invalidation is not safe on SMP CPS systems.
      
      Data corruption due to '2' can quite easily be demonstrated by
      repeatedly running "echo 3 > /proc/sys/vm/drop_caches" and then running
      sha1sum on a file that is several times the size of available memory.
      Dropping caches means that large contiguous extents (larger than dcache)
      are more likely.
      
      This was not a problem before Linux-4.8 because option 2 was never used
      if CONFIG_MIPS_CPS was defined. The commit which removed that apparently
      didn't appreciate the full consequence of the change.
      
      We could, in theory, globalize the INDEX based flush by sending an IPI
      to other cores. These cache invalidation routines can be called with
      interrupts disabled and synchronous IPI require interrupts to be
      enabled. Asynchronous IPI may not trigger writeback soon enough. So we
      cannot use IPI in practice.
      
      We can already test if IPI would be needed for an INDEX operation with
      r4k_op_needs_ipi(R4K_INDEX). If this is true then we mustn't try the
      INDEX approach as we cannot use IPI. If this is false (e.g. when there
      is only one core and hence one L1 cache) then it is safe to use the
      INDEX approach without IPI.
      
      This patch avoids option 2 if r4k_op_needs_ipi(R4K_INDEX), and so
      eliminates the corruption.
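
      The resulting decision can be sketched as a small pure function. This
      is an illustration only, with hypothetical names (toy_cache_op,
      toy_pick_cache_op()); the index_needs_ipi parameter stands in for the
      result of r4k_op_needs_ipi(R4K_INDEX).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_cache_op {
    TOY_OP_HIT,     /* per-address requests, globalised by the CPS */
    TOY_OP_INDEX,   /* whole-cache requests, local cache only */
};

/*
 * Choose the INDEX approach only for ranges at least as large as the dcache
 * and only when no IPI would be needed; otherwise fall back to HIT requests.
 */
static enum toy_cache_op toy_pick_cache_op(size_t size, size_t dcache_size,
                                           bool index_needs_ipi)
{
    assert(dcache_size > 0);
    if (!index_needs_ipi && size >= dcache_size)
        return TOY_OP_INDEX;
    return TOY_OP_HIT;
}
```

      With more than one L1 cache in play the function always returns
      TOY_OP_HIT, however large the range is.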
      
      Fixes: c00ab489 ("MIPS: Remove cpu_has_safe_index_cacheops")
      Signed-off-by: NeilBrown <neil@brown.name>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 4.8+
      Patchwork: https://patchwork.linux-mips.org/patch/19259/
      Signed-off-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 25 May, 2018 18 commits