1. 30 Oct, 2014 14 commits
    • David Matlack's avatar
      kvm: fix potentially corrupt mmio cache · 553de4db
      David Matlack authored
      commit ee3d1570 upstream.
      
      vcpu exits and memslot mutations can run concurrently as long as the
      vcpu does not aquire the slots mutex. Thus it is theoretically possible
      for memslots to change underneath a vcpu that is handling an exit.
      
      If we increment the memslot generation number again after
      synchronize_srcu_expedited(), vcpus can safely cache memslot generation
      without maintaining a single rcu_dereference through an entire vm exit.
      And much of the x86/kvm code does not maintain a single rcu_dereference
      of the current memslots during each exit.
      
      We can prevent the following case:
      
         vcpu (CPU 0)                             | thread (CPU 1)
      --------------------------------------------+--------------------------
      1  vm exit                                  |
      2  srcu_read_unlock(&kvm->srcu)             |
      3  decide to cache something based on       |
           old memslots                           |
      4                                           | change memslots
                                                  | (increments generation)
      5                                           | synchronize_srcu(&kvm->srcu);
      6  retrieve generation # from new memslots  |
      7  tag cache with new memslot generation    |
      8  srcu_read_unlock(&kvm->srcu)             |
      ...                                         |
         <action based on cache occurs even       |
          though the caching decision was based   |
          on the old memslots>                    |
      ...                                         |
         <action *continues* to occur until next  |
          memslot generation change, which may    |
          be never>                               |
                                                  |
      
      By incrementing the generation after synchronizing with kvm->srcu readers,
      we ensure that the generation retrieved in (6) will become invalid soon
      after (8).
      
      Keeping the existing increment is not strictly necessary, but we
      do keep it and just move it for consistency from update_memslots to
      install_new_memslots.  It invalidates old cached MMIOs immediately,
      instead of having to wait for the end of synchronize_srcu_expedited,
      which makes the code more clearly correct in case CPU 1 is preempted
      right after synchronize_srcu() returns.
      
      To avoid halving the generation space in SPTEs, always presume that the
      low bit of the generation is zero when reconstructing a generation number
      out of an SPTE.  This effectively disables MMIO caching in SPTEs during
      the call to synchronize_srcu_expedited.  Using the low bit this way is
      somewhat like a seqcount---where the protected thing is a cache, and
      instead of retrying we can simply punt if we observe the low bit to be 1.
      Signed-off-by: default avatarDavid Matlack <dmatlack@google.com>
      Reviewed-by: default avatarXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Reviewed-by: default avatarDavid Matlack <dmatlack@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      553de4db
    • David Matlack's avatar
      kvm: x86: fix stale mmio cache bug · bb15dea0
      David Matlack authored
      commit 56f17dd3 upstream.
      
      The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
      up to userspace:
      
      (1) Guest accesses gpa X without a memory slot. The gfn is cached in
      struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
      the SPTE write-execute-noread so that future accesses cause
      EPT_MISCONFIGs.
      
      (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
      covering the page just accessed.
      
      (3) Guest attempts to read or write to gpa X again. On Intel, this
      generates an EPT_MISCONFIG. The memory slot generation number that
      was incremented in (2) would normally take care of this but we fast
      path mmio faults through quickly_check_mmio_pf(), which only checks
      the per-vcpu mmio cache. Since we hit the cache, KVM passes a
      KVM_EXIT_MMIO up to userspace.
      
      This patch fixes the issue by using the memslot generation number
      to validate the mmio cache.
      Signed-off-by: default avatarDavid Matlack <dmatlack@google.com>
      [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.]
      Signed-off-by: default avatarXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Reviewed-by: default avatarDavid Matlack <dmatlack@google.com>
      Reviewed-by: default avatarXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Tested-by: default avatarDavid Matlack <dmatlack@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb15dea0
    • Josef Ahmad's avatar
      pci_ids: Add support for Intel Quark ILB · 002b2d79
      Josef Ahmad authored
      commit bb048713 upstream.
      
      This patch adds the PCI id for Intel Quark ILB.
      It will be used for GPIO and Multifunction device driver.
      Signed-off-by: default avatarJosef Ahmad <josef.ahmad@intel.com>
      Acked-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarChang Rebecca Swee Fun <rebecca.swee.fun.chang@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      002b2d79
    • Bryan O'Donoghue's avatar
      usb: pch_udc: usb gadget device support for Intel Quark X1000 · aabe5263
      Bryan O'Donoghue authored
      commit a68df706 upstream.
      
      This patch is to enable the USB gadget device for Intel Quark X1000
      Signed-off-by: default avatarBryan O'Donoghue <bryan.odonoghue@intel.com>
      Signed-off-by: default avatarBing Niu <bing.niu@intel.com>
      Signed-off-by: default avatarAlvin (Weike) Chen <alvin.chen@intel.com>
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      Signed-off-by: default avatarChang Rebecca Swee Fun <rebecca.swee.fun.chang@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aabe5263
    • Andy Lutomirski's avatar
      fs: Add a missing permission check to do_umount · c436c911
      Andy Lutomirski authored
      commit a1480dcc upstream.
      
      Accessing do_remount_sb should require global CAP_SYS_ADMIN, but
      only one of the two call sites was appropriately protected.
      
      Fixes CVE-2014-7975.
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c436c911
    • Sage Weil's avatar
      Btrfs: fix race in WAIT_SYNC ioctl · 7d6d0aa7
      Sage Weil authored
      commit 42383020 upstream.
      
      We check whether transid is already committed via last_trans_committed and
      then search through trans_list for pending transactions.  If
      last_trans_committed is updated by btrfs_commit_transaction after we check
      it (there is no locking), we will fail to find the committed transaction
      and return EINVAL to the caller.  This has been observed occasionally by
      ceph-osd (which uses this ioctl heavily).
      
      Fix by rechecking whether the provided transid <= last_trans_committed
      after the search fails, and if so return 0.
      Signed-off-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d6d0aa7
    • Josef Bacik's avatar
      Btrfs: fix build_backref_tree issue with multiple shared blocks · b341d2a4
      Josef Bacik authored
      commit bbe90514 upstream.
      
      Marc Merlin sent me a broken fs image months ago where it would blow up in the
      upper->checked BUG_ON() in build_backref_tree.  This is because we had a
      scenario like this
      
      block a -- level 4 (not shared)
         |
      block b -- level 3 (reloc block, shared)
         |
      block c -- level 2 (not shared)
         |
      block d -- level 1 (shared)
         |
      block e -- level 0 (shared)
      
      We go to build a backref tree for block e, we notice block d is shared and add
      it to the list of blocks to lookup it's backrefs for.  Now when we loop around
      we will check edges for the block, so we will see we looked up block c last
      time.  So we lookup block d and then see that the block that points to it is
      block c and we can just skip that edge since we've already been up this path.
      The problem is because we clear need_check when we see block d (as it is shared)
      we never add block b as needing to be checked.  And because block c is in our
      path already we bail out before we walk up to block b and add it to the backref
      check list.
      
      To fix this we need to reset need_check if we trip over a block that doesn't
      need to be checked.  This will make sure that any subsequent blocks in the path
      as we're walking up afterwards are added to the list to be processed.  With this
      patch I can now mount Marc's fs image and it'll complete the balance without
      panicing.  Thanks,
      Reported-by: default avatarMarc MERLIN <marc@merlins.org>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b341d2a4
    • Josef Bacik's avatar
      Btrfs: cleanup error handling in build_backref_tree · eb7ddab5
      Josef Bacik authored
      commit 75bfb9af upstream.
      
      When balance panics it tends to panic in the
      
      BUG_ON(!upper->checked);
      
      test, because it means it couldn't build the backref tree properly.  This is
      annoying to users and frankly a recoverable error, nothing in this function is
      actually fatal since it is just an in-memory building of the backrefs for a
      given bytenr.  So go through and change all the BUG_ON()'s to ASSERT()'s, and
      fix the BUG_ON(!upper->checked) thing to just return an error.
      
      This patch also fixes the error handling so it tears down the work we've done
      properly.  This code was horribly broken since we always just panic'ed instead
      of actually erroring out, so it needed to be completely re-worked.  With this
      patch my broken image no longer panics when I mount it.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb7ddab5
    • Josef Bacik's avatar
      Btrfs: try not to ENOSPC on log replay · 581aa18a
      Josef Bacik authored
      commit 1d52c78a upstream.
      
      When doing log replay we may have to update inodes, which traditionally goes
      through our delayed inode stuff.  This will try to move space over from the
      trans handle, but we don't reserve space in our trans handle on replay since we
      don't know how much we will need, so instead we try to flush.  But because we
      have a trans handle open we won't flush anything, so if we are out of reserve
      space we will simply return ENOSPC.  Since we know that if an operation made it
      into the log then we definitely had space before the box bought the farm then we
      don't need to worry about doing this space reservation.  Use the
      fs_info->log_root_recovering flag to skip the delayed inode stuff and update the
      item directly.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      581aa18a
    • Josef Bacik's avatar
      Btrfs: don't do async reclaim during log replay · 206b329c
      Josef Bacik authored
      commit f6acfd50 upstream.
      
      Trying to reproduce a log enospc bug I hit a panic in the async reclaim code
      during log replay.  This is because we use fs_info->fs_root as our root for
      shrinking and such.  Technically we can use whatever root we want, but let's
      just not allow async reclaim while we're doing log replay.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      206b329c
    • Liu Bo's avatar
      Btrfs: fix up bounds checking in lseek · aee22378
      Liu Bo authored
      commit 4d1a40c6 upstream.
      
      An user reported this, it is because that lseek's SEEK_SET/SEEK_CUR/SEEK_END
      allow a negative value for @offset, but btrfs's SEEK_DATA/SEEK_HOLE don't
      prepare for that and convert the negative @offset into unsigned type,
      so we get (end < start) warning.
      
      [ 1269.835374] ------------[ cut here ]------------
      [ 1269.836809] WARNING: CPU: 0 PID: 1241 at fs/btrfs/extent_io.c:430 insert_state+0x11d/0x140()
      [ 1269.838816] BTRFS: end < start 4094 18446744073709551615
      [ 1269.840334] CPU: 0 PID: 1241 Comm: a.out Tainted: G        W      3.16.0+ #306
      [ 1269.858229] Call Trace:
      [ 1269.858612]  [<ffffffff81801a69>] dump_stack+0x4e/0x68
      [ 1269.858952]  [<ffffffff8107894c>] warn_slowpath_common+0x8c/0xc0
      [ 1269.859416]  [<ffffffff81078a36>] warn_slowpath_fmt+0x46/0x50
      [ 1269.859929]  [<ffffffff813b0fbd>] insert_state+0x11d/0x140
      [ 1269.860409]  [<ffffffff813b1396>] __set_extent_bit+0x3b6/0x4e0
      [ 1269.860805]  [<ffffffff813b21c7>] lock_extent_bits+0x87/0x200
      [ 1269.861697]  [<ffffffff813a5b28>] btrfs_file_llseek+0x148/0x2a0
      [ 1269.862168]  [<ffffffff811f201e>] SyS_lseek+0xae/0xc0
      [ 1269.862620]  [<ffffffff8180b212>] system_call_fastpath+0x16/0x1b
      [ 1269.862970] ---[ end trace 4d33ea885832054b ]---
      
      This assumes that btrfs starts finding DATA/HOLE from the beginning of file
      if the assigned @offset is negative.
      
      Also we add alignment for lock_extent_bits 's range.
      Reported-by: default avatarToralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aee22378
    • Filipe Manana's avatar
      Btrfs: add missing compression property remove in btrfs_ioctl_setflags · 31b89b44
      Filipe Manana authored
      commit 78a017a2 upstream.
      
      The behaviour of a 'chattr -c' consists of getting the current flags,
      clearing the FS_COMPR_FL bit and then sending the result to the set
      flags ioctl - this means the bit FS_NOCOMP_FL isn't set in the flags
      passed to the ioctl. This results in the compression property not being
      cleared from the inode - it was cleared only if the bit FS_NOCOMP_FL
      was set in the received flags.
      
      Reproducer:
      
          $ mkfs.btrfs -f /dev/sdd
          $ mount /dev/sdd /mnt && cd /mnt
          $ mkdir a
          $ chattr +c a
          $ touch a/file
          $ lsattr a/file
          --------c------- a/file
          $ chattr -c a
          $ touch a/file2
          $ lsattr a/file2
          --------c------- a/file2
          $ lsattr -d a
          ---------------- a
      Reported-by: default avatarAndreas Schneider <asn@cryptomilk.org>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      31b89b44
    • Qu Wenruo's avatar
      btrfs: Fix a deadlock in btrfs_dev_replace_finishing() · 88579aa5
      Qu Wenruo authored
      commit 12b894cb upstream.
      
      btrfs-transacion:5657
      [stack snip]
      btrfs_bio_map()
          btrfs_bio_counter_inc_blocked()
              percpu_counter_inc(&fs_info->bio_counter)  ###bio_counter > 0(A)
              __btrfs_bio_map()
                  btrfs_dev_replace_lock()
                      mutex_lock(dev_replace->lock)	   ###wait mutex(B)
      
      btrfs:32612
      [stack snip]
      btrfs_dev_replace_start()
          btrfs_dev_replace_lock()
      	mutex_lock(dev_replace->lock)		   ###hold mutex(B)
          btrfs_dev_replace_finishing()
              btrfs_rm_dev_replace_blocked()
                  wait until percpu_counter_sum == 0	   ###wait on bio_counter(A)
      
      This bug can be triggered quite easily by the following test script:
      http://pastebin.com/MQmb37Cy
      
      This patch will fix the ABBA problem by calling
      btrfs_dev_replace_unlock() before btrfs_rm_dev_replace_blocked().
      
      The consistency of btrfs devices list and their superblocks is protected
      by device_list_mutex, not btrfs_dev_replace_lock/unlock().
      So it is safe the move btrfs_dev_replace_unlock() before
      btrfs_rm_dev_replace_blocked().
      Reported-by: default avatarZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Cc: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      88579aa5
    • David Sterba's avatar
      btrfs: wake up transaction thread from SYNC_FS ioctl · 4abbb927
      David Sterba authored
      commit 2fad4e83 upstream.
      
      The transaction thread may want to do more work, namely it pokes the
      cleaner ktread that will start processing uncleaned subvols.
      
      This can be triggered by user via the 'btrfs fi sync' command, otherwise
      there was a delay up to 30 seconds before the cleaner started to clean
      old snapshots.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4abbb927
  2. 15 Oct, 2014 26 commits