1. 13 Dec, 2013 14 commits
  2. 12 Dec, 2013 26 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 54fb723c
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "Four security fixes for KVM on x86.  Thanks to Andrew Honig and Lars
        Bull from Google for reporting them"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)
        KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)
        KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367)
        KVM: Improve create VCPU parameter (CVE-2013-4587)
      54fb723c
    • Linus Torvalds's avatar
      Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · ea1e61cb
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "Another week, another batch of fixes.
      
        Again, OMAP regressions due to move to DT is the bulk of the changes
        here, but this should be the last of it for 3.13.  There are also a
        handful of OMAP hwmod changes (power management, reset handling) for
        USB on OMAP3 that fixes some longish-standing bugs around USB resets.
      
        There are a couple of other changes that also add up line count a bit:
        One is a long-standing bug with the keyboard layout on one of the PXA
        platforms.  The other is a fix for highbank that moves their
        power-off/reset button handling to be done in-kernel since relying on
        userspace to handle it was fragile and awkward"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        ARM: sun6i: dt: Fix interrupt trigger types
        ARM: sun7i: dt: Fix interrupt trigger types
        MAINTAINERS: merge IMX6 entry into IMX
        ARM: tegra: add missing break to fuse initialization code
        ARM: pxa: prevent PXA270 occasional reboot freezes
        ARM: pxa: tosa: fix keys mapping
        ARM: OMAP2+: omap_device: add fail hook for runtime_pm when bad data is detected
        ARM: OMAP2+: hwmod: Fix usage of invalid iclk / oclk when clock node is not present
        ARM: OMAP3: hwmod data: Don't prevent RESET of USB Host module
        ARM: OMAP2+: hwmod: Fix SOFTRESET logic
        ARM: OMAP4+: hwmod data: Don't prevent RESET of USB Host module
        ARM: dts: Fix booting for secure omaps
        ARM: OMAP2+: Fix the machine entry for am3517
        ARM: dts: Fix missing entries for am3517
        ARM: OMAP2+: Fix overwriting hwmod data with data from device tree
        ARM: davinci: Fix McASP mem resource names
        ARM: highbank: handle soft poweroff and reset key events
        ARM: davinci: fix number of resources passed to davinci_gpio_register()
        gpio: davinci: fix check for unbanked gpio
      ea1e61cb
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · e09f67f1
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "This is a small collection of fixes.  It was rebased this morning, but
        I was just fixing signed-off-by tags with the wrong email"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: fix access_ok() check in btrfs_ioctl_send()
        Btrfs: make sure we cleanup all reloc roots if error happens
        Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation
        Btrfs: fix an oops when doing balance relocation
        Btrfs: don't miss skinny extent items on delayed ref head contention
        btrfs: call mnt_drop_write after interrupted subvol deletion
        Btrfs: don't clear the default compression type
      e09f67f1
    • Linus Torvalds's avatar
      Merge branch 'for-3.13' of git://linux-nfs.org/~bfields/linux · c9111b4d
      Linus Torvalds authored
      Pull nfsd reply cache bugfix from Bruce Fields:
       "One bugfix for nfsd crashes"
      
      * 'for-3.13' of git://linux-nfs.org/~bfields/linux:
        nfsd: when reusing an existing repcache entry, unhash it first
      c9111b4d
    • Gleb Natapov's avatar
      KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376) · 17d68b76
      Gleb Natapov authored
      A guest can cause a BUG_ON() leading to a host kernel crash.
      When the guest writes to the ICR to request an IPI, while in x2apic
      mode the following things happen, the destination is read from
      ICR2, which is a register that the guest can control.
      
      kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
      cluster id.  A BUG_ON is triggered, which is a protection against
      accessing map->logical_map with an out-of-bounds access and manages
      to avoid that anything really unsafe occurs.
      
      The logic in the code is correct from real HW point of view. The problem
      is that KVM supports only one cluster with ID 0 in clustered mode, but
      the code that has the bug does not take this into account.
      Reported-by: default avatarLars Bull <larsbull@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      17d68b76
    • Andy Honig's avatar
      KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368) · fda4e2e8
      Andy Honig authored
      In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the
      potential to corrupt kernel memory if userspace provides an address that
      is at the end of a page.  This patches concerts those functions to use
      kvm_write_guest_cached and kvm_read_guest_cached.  It also checks the
      vapic_address specified by userspace during ioctl processing and returns
      an error to userspace if the address is not a valid GPA.
      
      This is generally not guest triggerable, because the required write is
      done by firmware that runs before the guest.  Also, it only affects AMD
      processors and oldish Intel that do not have the FlexPriority feature
      (unless you disable FlexPriority, of course; then newer processors are
      also affected).
      
      Fixes: b93463aa ('KVM: Accelerated apic support')
      Reported-by: default avatarAndrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAndrew Honig <ahonig@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fda4e2e8
    • Andy Honig's avatar
      KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367) · b963a22e
      Andy Honig authored
      Under guest controllable circumstances apic_get_tmcct will execute a
      divide by zero and cause a crash.  If the guest cpuid support
      tsc deadline timers and performs the following sequence of requests
      the host will crash.
      - Set the mode to periodic
      - Set the TMICT to 0
      - Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline)
      - Set the TMICT to non-zero.
      Then the lapic_timer.period will be 0, but the TMICT will not be.  If the
      guest then reads from the TMCCT then the host will perform a divide by 0.
      
      This patch ensures that if the lapic_timer.period is 0, then the division
      does not occur.
      Reported-by: default avatarAndrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAndrew Honig <ahonig@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b963a22e
    • Andy Honig's avatar
      KVM: Improve create VCPU parameter (CVE-2013-4587) · 338c7dba
      Andy Honig authored
      In multiple functions the vcpu_id is used as an offset into a bitfield.  Ag
      malicious user could specify a vcpu_id greater than 255 in order to set or
      clear bits in kernel memory.  This could be used to elevate priveges in the
      kernel.  This patch verifies that the vcpu_id provided is less than 255.
      The api documentation already specifies that the vcpu_id must be less than
      max_vcpus, but this is currently not checked.
      Reported-by: default avatarAndrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAndrew Honig <ahonig@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      338c7dba
    • Linus Torvalds's avatar
      Merge tag 'sound-3.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 2208f651
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Still a slightly high amount of changes than wished, but they are all
        good regression and/or device-specific fixes.  Majority of commits are
        for HD-audio, an HDMI ctl index fix that hits old graphics boards,
        regression fixes for AD codecs and a few quirks.
      
        Other than that, two major fixes are included: a 64bit ABI fix for
        compress offload, and 64bit dma_addr_t truncation fix, which had hit
        on PAE kernels"
      
      * tag 'sound-3.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Add static DAC/pin mapping for AD1986A codec
        ALSA: hda - One more Dell headset detection quirk
        ALSA: hda - hdmi: Fix IEC958 ctl indexes for some simple HDMI devices
        ALSA: hda - Mute all aamix inputs as default
        ALSA: compress: Fix 64bit ABI incompatibility
        ALSA: memalloc.h - fix wrong truncation of dma_addr_t
        ALSA: hda - Another Dell headset detection quirk
        ALSA: hda - A Dell headset detection quirk
        ALSA: hda - Remove quirk for Dell Vostro 131
        ALSA: usb-audio: fix uninitialized variable compile warning
        ALSA: hda - fix mic issues on Acer Aspire E-572
      2208f651
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · ea4ebd1c
      Linus Torvalds authored
      Pull input fixes from Dmitry Torokhov:
       "A fix for recent sysfs breakage in serio subsystem plus a fixup to
        adxl34x driver"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: adxl34x - Fix bug in definition of ADXL346_2D_ORIENT
        Input: serio - fix sysfs layout
      ea4ebd1c
    • Linus Torvalds's avatar
      Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 846f29a6
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "A dvb core deadlock fix, a couple videobuf2 fixes an a series of media
        driver fixes"
      
      * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (30 commits)
        [media] videobuf2-dma-sg: fix possible memory leak
        [media] vb2: regression fix: always set length field.
        [media] mt9p031: Include linux/of.h header
        [media] rtl2830: add parent for I2C adapter
        [media] media: marvell-ccic: use devm to release clk
        [media] ths7303: Declare as static a private function
        [media] em28xx-video: Swap release order to avoid lock nesting
        [media] usbtv: Add support for PAL video source
        [media] media_tree: Fix spelling errors
        [media] videobuf2: Add support for file access mode flags for DMABUF exporting
        [media] radio-shark2: Mark shark_resume_leds() inline to kill compiler warning
        [media] radio-shark: Mark shark_resume_leds() inline to kill compiler warning
        [media] af9035: unlock on error in af9035_i2c_master_xfer()
        [media] af9033: fix broken I2C
        [media] v4l: omap3isp: Don't check for missing get_fmt op on remote subdev
        [media] af9035: fix broken I2C and USB I/O
        [media] wm8775: fix broken audio routing
        [media] marvell-ccic: drop resource free in driver remove
        [media] tef6862/radio-tea5764: actually assign clamp result
        [media] cx231xx: use after free on error path in probe
        ...
      846f29a6
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · 86b581f6
      Linus Torvalds authored
      Pull hwmon fix from Guenter Roeck:
       "Fix HIH-6130 driver to work with BeagleBone"
      
      * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: HIH-6130: Support I2C bus drivers without I2C_FUNC_SMBUS_QUICK
      86b581f6
    • Linus Torvalds's avatar
      Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging · c8469441
      Linus Torvalds authored
      Pull hwmon fixes from Jean Delvare.
      
      * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
        hwmon: Prevent some divide by zeros in FAN_TO_REG()
        hwmon: (w83l768ng) Fix fan speed control range
        hwmon: (w83l786ng) Fix fan speed control mode setting and reporting
        hwmon: (lm90) Unregister hwmon device if interrupt setup fails
      c8469441
    • Will Deacon's avatar
      word-at-a-time: provide generic big-endian zero_bytemask implementation · 11ec50ca
      Will Deacon authored
      Whilst architectures may be able to do better than this (which they can,
      by simply defining their own macro), this is a generic stab at a
      zero_bytemask implementation for the asm-generic, big-endian
      word-at-a-time implementation.
      
      On arm64, a clz instruction is used to implement the fls efficiently.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      11ec50ca
    • Will Deacon's avatar
      dcache: allow word-at-a-time name hashing with big-endian CPUs · a5c21dce
      Will Deacon authored
      When explicitly hashing the end of a string with the word-at-a-time
      interface, we have to be careful which end of the word we pick up.
      
      On big-endian CPUs, the upper-bits will contain the data we're after, so
      ensure we generate our masks accordingly (and avoid hashing whatever
      random junk may have been sitting after the string).
      
      This patch adds a new dcache helper, bytemask_from_count, which creates
      a mask appropriate for the CPU endianness.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a5c21dce
    • Linus Torvalds's avatar
      Merge tag 'iommu-fixes-for-v3.13-rc4' of git://github.com/awilliam/linux-vfio · 319720f5
      Linus Torvalds authored
      Pull iommu fixes from Alex Williamson:
       "arm/smmu driver updates via Will Deacon fixing locking around page
        table walks and a couple other issues"
      
      * tag 'iommu-fixes-for-v3.13-rc4' of git://github.com/awilliam/linux-vfio:
        iommu/arm-smmu: fix error return code in arm_smmu_device_dt_probe()
        iommu/arm-smmu: remove potential NULL dereference on mapping path
        iommu/arm-smmu: use mutex instead of spinlock for locking page tables
      319720f5
    • Linus Torvalds's avatar
      Merge tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 5dec682c
      Linus Torvalds authored
      Pull misc keyrings fixes from David Howells:
       "These break down into five sets:
      
         - A patch to error handling in the big_key type for huge payloads.
           If the payload is larger than the "low limit" and the backing store
           allocation fails, then big_key_instantiate() doesn't clear the
           payload pointers in the key, assuming them to have been previously
           cleared - but only one of them is.
      
           Unfortunately, the garbage collector still calls big_key_destroy()
           when sees one of the pointers with a weird value in it (and not
           NULL) which it then tries to clean up.
      
         - Three patches to fix the keyring type:
      
           * A patch to fix the hash function to correctly divide keyrings off
             from keys in the topology of the tree inside the associative
             array.  This is only a problem if searching through nested
             keyrings - and only if the hash function incorrectly puts the a
             keyring outside of the 0 branch of the root node.
      
           * A patch to fix keyrings' use of the associative array.  The
             __key_link_begin() function initially passes a NULL key pointer
             to assoc_array_insert() on the basis that it's holding a place in
             the tree whilst it does more allocation and stuff.
      
             This is only a problem when a node contains 16 keys that match at
             that level and we want to add an also matching 17th.  This should
             easily be manufactured with a keyring full of keyrings (without
             chucking any other sort of key into the mix) - except for (a)
             above which makes it on average adding the 65th keyring.
      
           * A patch to fix searching down through nested keyrings, where any
             keyring in the set has more than 16 keyrings and none of the
             first keyrings we look through has a match (before the tree
             iteration needs to step to a more distal node).
      
           Test in keyutils test suite:
      
              http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=8b4ae963ed92523aea18dfbb8cab3f4979e13bd1
      
         - A patch to fix the big_key type's use of a shmem file as its
           backing store causing audit messages and LSM check failures.  This
           is done by setting S_PRIVATE on the file to avoid LSM checks on the
           file (access to the shmem file goes through the keyctl() interface
           and so is gated by the LSM that way).
      
           This isn't normally a problem if a key is used by the context that
           generated it - and it's currently only used by libkrb5.
      
           Test in keyutils test suite:
      
              http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=d9a53cbab42c293962f2f78f7190253fc73bd32e
      
         - A patch to add a generated file to .gitignore.
      
         - A patch to fix the alignment of the system certificate data such
           that it it works on s390.  As I understand it, on the S390 arch,
           symbols must be 2-byte aligned because loading the address discards
           the least-significant bit"
      
      * tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        KEYS: correct alignment of system_certificate_list content in assembly file
        Ignore generated file kernel/x509_certificate_list
        security: shmem: implement kernel private shmem inodes
        KEYS: Fix searching of nested keyrings
        KEYS: Fix multiple key add into associative array
        KEYS: Fix the keyring hash function
        KEYS: Pre-clear struct key on allocation
      5dec682c
    • Linus Torvalds's avatar
      Merge tag 'xfs-for-linus-v3.13-rc4' of git://oss.sgi.com/xfs/xfs · 48a2f0b2
      Linus Torvalds authored
      Pull xfs bugfixes from Ben Myers:
      
       - fix for buffer overrun in agfl with growfs on v4 superblock
      
       - return EINVAL if requested discard length is less than a block
      
       - fix possible memory corruption in xfs_attrlist_by_handle()
      
      * tag 'xfs-for-linus-v3.13-rc4' of git://oss.sgi.com/xfs/xfs:
        xfs: growfs overruns AGFL buffer on V4 filesystems
        xfs: don't perform discard if the given range length is less than block size
        xfs: underflow bug in xfs_attrlist_by_handle()
      48a2f0b2
    • Linus Torvalds's avatar
      futex: move user address verification up to common code · 5cdec2d8
      Linus Torvalds authored
      When debugging the read-only hugepage case, I was confused by the fact
      that get_futex_key() did an access_ok() only for the non-shared futex
      case, since the user address checking really isn't in any way specific
      to the private key handling.
      
      Now, it turns out that the shared key handling does effectively do the
      equivalent checks inside get_user_pages_fast() (it doesn't actually
      check the address range on x86, but does check the page protections for
      being a user page).  So it wasn't actually a bug, but the fact that we
      treat the address differently for private and shared futexes threw me
      for a loop.
      
      Just move the check up, so that it gets done for both cases.  Also, use
      the 'rw' parameter for the type, even if it doesn't actually matter any
      more (it's a historical artifact of the old racy i386 "page faults from
      kernel space don't check write protections").
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5cdec2d8
    • Linus Torvalds's avatar
      futex: fix handling of read-only-mapped hugepages · f12d5bfc
      Linus Torvalds authored
      The hugepage code had the exact same bug that regular pages had in
      commit 7485d0d3 ("futexes: Remove rw parameter from
      get_futex_key()").
      
      The regular page case was fixed by commit 9ea71503 ("futex: Fix
      regression with read only mappings"), but the transparent hugepage case
      (added in a5b338f2: "thp: update futex compound knowledge") case
      remained broken.
      
      Found by Dave Jones and his trinity tool.
      Reported-and-tested-by: default avatarDave Jones <davej@fedoraproject.org>
      Cc: stable@kernel.org # v2.6.38+
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f12d5bfc
    • Dan Carpenter's avatar
      Btrfs: fix access_ok() check in btrfs_ioctl_send() · 700ff4f0
      Dan Carpenter authored
      The closing parenthesis is in the wrong place.  We want to check
      "sizeof(*arg->clone_sources) * arg->clone_sources_count" instead of
      "sizeof(*arg->clone_sources * arg->clone_sources_count)".
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarJie Liu <jeff.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      cc: stable@vger.kernel.org
      700ff4f0
    • Wang Shilong's avatar
      Btrfs: make sure we cleanup all reloc roots if error happens · 467bb1d2
      Wang Shilong authored
      I hit an oops when merging reloc roots fails, the reason is that
      new reloc roots may be added and we should make sure we cleanup
      all reloc roots.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      467bb1d2
    • Wang Shilong's avatar
      Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation · 66463748
      Wang Shilong authored
      Quota tree and UUID Tree is only cowed, they can not be snapshoted.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      66463748
    • Wang Shilong's avatar
      Btrfs: fix an oops when doing balance relocation · c974c464
      Wang Shilong authored
      I hit an oops when inserting reloc root into @reloc_root_tree(it can be
      easily triggered when forcing cow for relocation root)
      
      [  866.494539]  [<ffffffffa0499579>] btrfs_init_reloc_root+0x79/0xb0 [btrfs]
      [  866.495321]  [<ffffffffa044c240>] record_root_in_trans+0xb0/0x110 [btrfs]
      [  866.496109]  [<ffffffffa044d758>] btrfs_record_root_in_trans+0x48/0x80 [btrfs]
      [  866.496908]  [<ffffffffa0494da8>] select_reloc_root+0xa8/0x210 [btrfs]
      [  866.497703]  [<ffffffffa0495c8a>] do_relocation+0x16a/0x540 [btrfs]
      
      This is because reloc root inserted into @reloc_root_tree is not within one
      transaction,reloc root may be cowed and root block bytenr will be reused then
      oops happens.We should update reloc root in @reloc_root_tree when cow reloc
      root node, fix it.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Reviewed-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      c974c464
    • Filipe David Borba Manana's avatar
      Btrfs: don't miss skinny extent items on delayed ref head contention · 639eefc8
      Filipe David Borba Manana authored
      Currently extent-tree.c:btrfs_lookup_extent_info() can miss the lookup
      of skinny extent items. This can happen when the execution flow is the
      following:
      
      * We do an extent tree lookup and fail to find a skinny extent item;
      
      * As a result, we attempt to see if a non-skinny extent item exists,
        either by looking at previous item in the leaf or by doing another
        full extent tree search;
      
      * We have a transaction and then we check for a matching delayed ref
        head in the transaction's delayed refs rbtree;
      
      * We find such delayed ref head and then we try to lock it with a
        call to mutex_trylock();
      
      * The lock was contended so we jump to the label "again", which repeats
        the extent tree search but for a non-skinny extent item, because we set
        previously metadata variable to 0 and the search key to look for a
        non-skinny extent-item;
      
      * After the jump (and after releasing the transaction's delayed refs
        lock), a skinny extent item might have been added to the extent tree
        but we will miss it because metadata is set to 0 and the search key
        is set for a non-skinny extent-item.
      
      The fix here is to not reset metadata to 0 and to jump to the initial search
      key setup if the delayed ref head is contended, instead of jumping directly
      to the extent tree search label ("again").
      
      This issue was found while investigating the issue reported at Bugzilla 64961.
      
      David Sterba suspected this function was missing extent items, and that
      this could be caused by the last change to this function, which was made
      in the following patch:
      
          [PATCH] Btrfs: optimize btrfs_lookup_extent_info()
          (commit 74be9510)
      
      But in fact this issue already existed before, because after failing to find
      a skinny extent item, the code set the search key for a non-skinny extent
      item, and on contention of a matching delayed ref head it would not search
      the extent tree for a skinny extent item anymore.
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      639eefc8
    • David Sterba's avatar
      btrfs: call mnt_drop_write after interrupted subvol deletion · e43f998e
      David Sterba authored
      If btrfs_ioctl_snap_destroy blocks on the mutex and the process is
      killed, mnt_write count is unbalanced and leads to unmountable
      filesystem.
      
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      e43f998e