1. 21 Jun, 2015 5 commits
  2. 28 May, 2015 7 commits
    • Brian Foster's avatar
      xfs: fix broken i_nlink accounting for whiteout tmpfile inode · 22419ac9
      Brian Foster authored
      XFS uses the internal tmpfile() infrastructure for the whiteout inode
      used for RENAME_WHITEOUT operations. For tmpfile inodes, XFS allocates
      the inode, drops di_nlink, adds the inode to the agi unlinked list,
      calls d_tmpfile() which correspondingly drops i_nlink of the vfs inode,
      and then finishes the common inode setup (e.g., clear I_NEW and unlock).
      
      The d_tmpfile() call was originally made inxfs_create_tmpfile(), but was
      pulled up out of that function as part of the following commit to
      resolve a deadlock issue:
      
      	330033d6 xfs: fix tmpfile/selinux deadlock and initialize security
      
      As a result, callers of xfs_create_tmpfile() are responsible for either
      calling d_tmpfile() or fixing up i_nlink appropriately. The whiteout
      tmpfile allocation helper does neither. As a result, the vfs ->i_nlink
      becomes inconsistent with the on-disk ->di_nlink once xfs_rename() links
      it back into the source dentry and calls xfs_bumplink().
      
      Update the assert in xfs_rename() to help detect this problem in the
      future and update xfs_rename_alloc_whiteout() to decrement the link
      count as part of the manual tmpfile inode setup.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      
      22419ac9
    • Dave Chinner's avatar
      xfs: xfs_iozero can return positive errno · cddc1162
      Dave Chinner authored
      It was missed when we converted everything in XFs to use negative error
      numbers, so fix it now. Bug introduced in 3.17 by commit 2451337d ("xfs: global
      error sign conversion"), and should go back to stable kernels.
      
      Thanks to Brian Foster for noticing it.
      
      cc: <stable@vger.kernel.org> # 3.17, 3.18, 3.19, 4.0
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      cddc1162
    • Dave Chinner's avatar
      xfs: xfs_attr_inactive leaves inconsistent attr fork state behind · 6dfe5a04
      Dave Chinner authored
      xfs_attr_inactive() is supposed to clean up the attribute fork when
      the inode is being freed. While it removes attribute fork extents,
      it completely ignores attributes in local format, which means that
      there can still be active attributes on the inode after
      xfs_attr_inactive() has run.
      
      This leads to problems with concurrent inode writeback - the in-core
      inode attribute fork is removed without locking on the assumption
      that nothing will be attempting to access the attribute fork after a
      call to xfs_attr_inactive() because it isn't supposed to exist on
      disk any more.
      
      To fix this, make xfs_attr_inactive() completely remove all traces
      of the attribute fork from the inode, regardless of it's state.
      Further, also remove the in-core attribute fork structure safely so
      that there is nothing further that needs to be done by callers to
      clean up the attribute fork. This means we can remove the in-core
      and on-disk attribute forks atomically.
      
      Also, on error simply remove the in-memory attribute fork. There's
      nothing that can be done with it once we have failed to remove the
      on-disk attribute fork, so we may as well just blow it away here
      anyway.
      
      cc: <stable@vger.kernel.org> # 3.12 to 4.0
      Reported-by: default avatarWaiman Long <waiman.long@hp.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      
      6dfe5a04
    • Dave Chinner's avatar
      xfs: extent size hints can round up extents past MAXEXTLEN · 6dea405e
      Dave Chinner authored
      This results in BMBT corruption, as seen by this test:
      
      # mkfs.xfs -f -d size=40051712b,agcount=4 /dev/vdc
      ....
      # mount /dev/vdc /mnt/scratch
      # xfs_io -ft -c "extsize 16m" -c "falloc 0 30g" -c "bmap -vp" /mnt/scratch/foo
      
      which results in this failure on a debug kernel:
      
      XFS: Assertion failed: (blockcount & xfs_mask64hi(64-BMBT_BLOCKCOUNT_BITLEN)) == 0, file: fs/xfs/libxfs/xfs_bmap_btree.c, line: 211
      ....
      Call Trace:
       [<ffffffff814cf0ff>] xfs_bmbt_set_allf+0x8f/0x100
       [<ffffffff814cf18d>] xfs_bmbt_set_all+0x1d/0x20
       [<ffffffff814f2efe>] xfs_iext_insert+0x9e/0x120
       [<ffffffff814c7956>] ? xfs_bmap_add_extent_hole_real+0x1c6/0xc70
       [<ffffffff814c7956>] xfs_bmap_add_extent_hole_real+0x1c6/0xc70
       [<ffffffff814caaab>] xfs_bmapi_write+0x72b/0xed0
       [<ffffffff811c72ac>] ? kmem_cache_alloc+0x15c/0x170
       [<ffffffff814fe070>] xfs_alloc_file_space+0x160/0x400
       [<ffffffff81ddcc29>] ? down_write+0x29/0x60
       [<ffffffff815063eb>] xfs_file_fallocate+0x29b/0x310
       [<ffffffff811d2bc8>] ? __sb_start_write+0x58/0x120
       [<ffffffff811e3e18>] ? do_vfs_ioctl+0x318/0x570
       [<ffffffff811cd680>] vfs_fallocate+0x140/0x260
       [<ffffffff811ce6f8>] SyS_fallocate+0x48/0x80
       [<ffffffff81ddec09>] system_call_fastpath+0x12/0x17
      
      The tracepoint that indicates the extent that triggered the assert
      failure is:
      
      xfs_iext_insert:   idx 0 offset 0 block 16777224 count 2097152 flag 1
      
      Clearly indicating that the extent length is greater than MAXEXTLEN,
      which is 2097151. A prior trace point shows the allocation was an
      exact size match and that a length greater than MAXEXTLEN was asked
      for:
      
      xfs_alloc_size_done:  agno 1 agbno 8 minlen 2097152 maxlen 2097152
      					    ^^^^^^^        ^^^^^^^
      
      We don't see this problem with extent size hints through the IO path
      because we can't do single IOs large enough to trigger MAXEXTLEN
      allocation. fallocate(), OTOH, is not limited in it's allocation
      sizes and so needs help here.
      
      The issue is that the extent size hint alignment is rounding up the
      extent size past MAXEXTLEN, because xfs_bmapi_write() is not taking
      into account extent size hints when calculating the maximum extent
      length to allocate. xfs_bmapi_reserve_delalloc() is already doing
      this, but direct extent allocation is not.
      
      Unfortunately, the calculation in xfs_bmapi_reserve_delalloc() is
      wrong, and it works only because delayed allocation extents are not
      limited in size to MAXEXTLEN in the in-core extent tree. hence this
      calculation does not work for direct allocation, and the delalloc
      code needs fixing. This may, in fact be the underlying bug that
      occassionally causes transaction overruns in delayed allocation
      extent conversion, so now we know it's wrong we should fix it, too.
      Many thanks to Brian Foster for finding this problem during review
      of this patch.
      
      Hence the fix, after much code reading, is to allow
      xfs_bmap_extsize_align() to align partial extents when full
      alignment would extend the alignment past MAXEXTLEN. We can safely
      do this because all callers have higher layer allocation loops that
      already handle short allocations, and so will simply run another
      allocation to cover the remainder of the requested allocation range
      that we ignored during alignment. The advantage of this approach is
      that it also removes the need for callers to do anything other than
      limit their requests to MAXEXTLEN - they don't really need to be
      aware of extent size hints at all.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      6dea405e
    • Dave Chinner's avatar
      xfs: inode and free block counters need to use __percpu_counter_compare · 8c1903d3
      Dave Chinner authored
      Because the counters use a custom batch size, the comparison
      functions need to be aware of that batch size otherwise the
      comparison does not work correctly. This leads to ASSERT failures
      on generic/027 like this:
      
       XFS: Assertion failed: 0, file: fs/xfs/xfs_mount.c, line: 1099
       ------------[ cut here ]------------
      ....
       Call Trace:
        [<ffffffff81522a39>] xfs_mod_icount+0x99/0xc0
        [<ffffffff815285cb>] xfs_trans_unreserve_and_mod_sb+0x28b/0x5b0
        [<ffffffff8152f941>] xfs_log_commit_cil+0x321/0x580
        [<ffffffff81528e17>] xfs_trans_commit+0xb7/0x260
        [<ffffffff81503d4d>] xfs_bmap_finish+0xcd/0x1b0
        [<ffffffff8151da41>] xfs_inactive_ifree+0x1e1/0x250
        [<ffffffff8151dbe0>] xfs_inactive+0x130/0x200
        [<ffffffff81523a21>] xfs_fs_evict_inode+0x91/0xf0
        [<ffffffff811f3958>] evict+0xb8/0x190
        [<ffffffff811f433b>] iput+0x18b/0x1f0
        [<ffffffff811e8853>] do_unlinkat+0x1f3/0x320
        [<ffffffff811d548a>] ? filp_close+0x5a/0x80
        [<ffffffff811e999b>] SyS_unlinkat+0x1b/0x40
        [<ffffffff81e0892e>] system_call_fastpath+0x12/0x71
      
      This is a regression introduced by commit 501ab323 ("xfs: use generic
      percpu counters for inode counter").
      
      This patch fixes the same problem for both the inode counter and the
      free block counter in the superblocks.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      8c1903d3
    • Dave Chinner's avatar
      percpu_counter: batch size aware __percpu_counter_compare() · 80188b0d
      Dave Chinner authored
      XFS uses non-stanard batch sizes for avoiding frequent global
      counter updates on it's allocated inode counters, as they increment
      or decrement in batches of 64 inodes. Hence the standard percpu
      counter batch of 32 means that the counter is effectively a global
      counter. Currently Xfs uses a batch size of 128 so that it doesn't
      take the global lock on every single modification.
      
      However, Xfs also needs to compare accurately against zero, which
      means we need to use percpu_counter_compare(), and that has a
      hard-coded batch size of 32, and hence will spuriously fail to
      detect when it is supposed to use precise comparisons and hence
      the accounting goes wrong.
      
      Add __percpu_counter_compare() to take a custom batch size so we can
      use it sanely in XFS and factor percpu_counter_compare() to use it.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      80188b0d
    • George Wang's avatar
      xfs: use percpu_counter_read_positive for mp->m_icount · 74f9ce1c
      George Wang authored
      Function percpu_counter_read just return the current counter, which can be
      negative. This will cause the checking of "allocated inode
      counts <= m_maxicount" false positive. Use percpu_counter_read_positive can
      solve this problem, and be consistent with the purpose to introduce percpu
      mechanism to xfs.
      Signed-off-by: default avatarGeorge Wang <xuw2015@gmail.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      
      74f9ce1c
  3. 04 May, 2015 3 commits
    • Linus Torvalds's avatar
      Linux 4.1-rc2 · 5ebe6afa
      Linus Torvalds authored
      5ebe6afa
    • Linus Torvalds's avatar
      Merge tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 8663da2c
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Some miscellaneous bug fixes and some final on-disk and ABI changes
        for ext4 encryption which provide better security and performance"
      
      * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: fix growing of tiny filesystems
        ext4: move check under lock scope to close a race.
        ext4: fix data corruption caused by unwritten and delayed extents
        ext4 crypto: remove duplicated encryption mode definitions
        ext4 crypto: do not select from EXT4_FS_ENCRYPTION
        ext4 crypto: add padding to filenames before encrypting
        ext4 crypto: simplify and speed up filename encryption
      8663da2c
    • Linus Torvalds's avatar
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 101a6fd3
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "One intel fix, one rockchip fix, and a bunch of radeon fixes for some
        regressions from audio rework and vm stability"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/i915/chv: Implement WaDisableShadowRegForCpd
        drm/radeon: fix userptr return value checking (v2)
        drm/radeon: check new address before removing old one
        drm/radeon: reset BOs address after clearing it.
        drm/radeon: fix lockup when BOs aren't part of the VM on release
        drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
        drm/radeon: adjust pll when audio is not enabled
        drm/radeon: only enable audio streams if the monitor supports it
        drm/radeon: only mark audio as connected if the monitor supports it (v3)
        drm/radeon/audio: don't enable packets until the end
        drm/radeon: drop dce6_dp_enable
        drm/radeon: fix ordering of AVI packet setup
        drm/radeon: Use drm_calloc_ab for CS relocs
        drm/rockchip: fix error check when getting irq
        MAINTAINERS: add entry for Rockchip drm drivers
      101a6fd3
  4. 03 May, 2015 8 commits
    • Dave Airlie's avatar
      Merge tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel into drm-fixes · 71aee819
      Dave Airlie authored
      Just a single intel fix
      * tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel:
        drm/i915/chv: Implement WaDisableShadowRegForCpd
      71aee819
    • Dave Airlie's avatar
      Merge branch 'drm-next0420' of https://github.com/markyzq/kernel-drm-rockchip into drm-fixes · df9ebeb2
      Dave Airlie authored
      one fix and maintainers update
      * 'drm-next0420' of https://github.com/markyzq/kernel-drm-rockchip:
        drm/rockchip: fix error check when getting irq
        MAINTAINERS: add entry for Rockchip drm drivers
      df9ebeb2
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 61f06db0
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is three logical fixes (as 5 patches).
      
        The 3ware class of drivers were causing an oops with multiqueue by
        tearing down the command mappings after completing the command (where
        the variables in the command used to tear down the mapping were
        no-longer valid).  There's also a fix for the qnap iscsi target which
        was choking on us sending it commands that were too long and a fix for
        the reworked aha1542 allocating GFP_KERNEL under a lock"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        3w-9xxx: fix command completion race
        3w-xxxx: fix command completion race
        3w-sas: fix command completion race
        aha1542: Allocate memory before taking a lock
        SCSI: add 1024 max sectors black list flag
      61f06db0
    • Linus Torvalds's avatar
      Merge branch 'next' of git://git.infradead.org/users/vkoul/slave-dma · 33332224
      Linus Torvalds authored
      Pull slave dmaengine fixes from Vinod Koul:
       "Here are the fixes in dmaengine subsystem for rc2:
      
         - privatecnt fix for slave dma request API by Christopher
      
         - warn fix for PM ifdef in usb-dmac by Geert
      
         - fix hardware dependency for xgene by Jean"
      
      * 'next' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: increment privatecnt when using dma_get_any_slave_channel
        dmaengine: xgene: Set hardware dependency
        dmaengine: usb-dmac: Protect PM-only functions to kill warning
      33332224
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux · 180d89f6
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       - build fix for SMP=n in book3s_xics.c
       - fix for Daniel's pci_controller_ops on powernv.
       - revert the TM syscall abort patch for now.
       - CPU affinity fix from Nathan.
       - two EEH fixes from Gavin.
       - fix for CR corruption from Sam.
       - selftest build fix.
      
      * tag 'powerpc-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
        powerpc/powernv: Restore non-volatile CRs after nap
        powerpc/eeh: Delay probing EEH device during hotplug
        powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()
        powerpc/pseries: Correct cpu affinity for dlpar added cpus
        selftests/powerpc: Fix the pmu install rule
        Revert "powerpc/tm: Abort syscalls in active transactions"
        powerpc/powernv: Fix early pci_controller_ops loading.
        powerpc/kvm: Fix SMP=n build error in book3s_xics.c
      180d89f6
    • Jan Kara's avatar
      ext4: fix growing of tiny filesystems · 2c869b26
      Jan Kara authored
      The estimate of necessary transaction credits in ext4_flex_group_add()
      is too pessimistic. It reserves credit for sb, resize inode, and resize
      inode dindirect block for each group added in a flex group although they
      are always the same block and thus it is enough to account them only
      once. Also the number of modified GDT block is overestimated since we
      fit EXT4_DESC_PER_BLOCK(sb) descriptors in one block.
      
      Make the estimation more precise. That reduces number of requested
      credits enough that we can grow 20 MB filesystem (which has 1 MB
      journal, 79 reserved GDT blocks, and flex group size 16 by default).
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      2c869b26
    • Davide Italiano's avatar
      ext4: move check under lock scope to close a race. · 280227a7
      Davide Italiano authored
      fallocate() checks that the file is extent-based and returns
      EOPNOTSUPP in case is not. Other tasks can convert from and to
      indirect and extent so it's safe to check only after grabbing
      the inode mutex.
      Signed-off-by: default avatarDavide Italiano <dccitaliano@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      280227a7
    • Lukas Czerner's avatar
      ext4: fix data corruption caused by unwritten and delayed extents · d2dc317d
      Lukas Czerner authored
      Currently it is possible to lose whole file system block worth of data
      when we hit the specific interaction with unwritten and delayed extents
      in status extent tree.
      
      The problem is that when we insert delayed extent into extent status
      tree the only way to get rid of it is when we write out delayed buffer.
      However there is a limitation in the extent status tree implementation
      so that when inserting unwritten extent should there be even a single
      delayed block the whole unwritten extent would be marked as delayed.
      
      At this point, there is no way to get rid of the delayed extents,
      because there are no delayed buffers to write out. So when a we write
      into said unwritten extent we will convert it to written, but it still
      remains delayed.
      
      When we try to write into that block later ext4_da_map_blocks() will set
      the buffer new and delayed and map it to invalid block which causes
      the rest of the block to be zeroed loosing already written data.
      
      For now we can fix this by simply not allowing to set delayed status on
      written extent in the extent status tree. Also add WARN_ON() to make
      sure that we notice if this happens in the future.
      
      This problem can be easily reproduced by running the following xfs_io.
      
      xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
                -c "falloc 0 131072" \
                -c "pwrite -S 0xbb 65536 2048" \
                -c "fsync" /mnt/test/fff
      
      echo 3 > /proc/sys/vm/drop_caches
      xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff
      
      This can be theoretically also reproduced by at random by running fsx,
      but it's not very reliable, though on machines with bigger page size
      (like ppc) this can be seen more often (especially xfstest generic/127)
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      d2dc317d
  5. 02 May, 2015 7 commits
  6. 01 May, 2015 10 commits
    • Ilya Dryomov's avatar
      rbd: end I/O the entire obj_request on error · 082a75da
      Ilya Dryomov authored
      When we end I/O struct request with error, we need to pass
      obj_request->length as @nr_bytes so that the entire obj_request worth
      of bytes is completed.  Otherwise block layer ends up confused and we
      trip on
      
          rbd_assert(more ^ (which == img_request->obj_request_count));
      
      in rbd_img_obj_callback() due to more being true no matter what.  We
      already do it in most cases but we are missing some, in particular
      those where we don't even get a chance to submit any obj_requests, due
      to an early -ENOMEM for example.
      
      A number of obj_request->xferred assignments seem to be redundant but
      I haven't touched any of obj_request->xferred stuff to keep this small
      and isolated.
      
      Cc: Alex Elder <elder@linaro.org>
      Cc: stable@vger.kernel.org # 3.10+
      Reported-by: default avatarShawn Edwards <lesser.evil@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      082a75da
    • Theodore Ts'o's avatar
      ext4 crypto: add padding to filenames before encrypting · a44cd7a0
      Theodore Ts'o authored
      This obscures the length of the filenames, to decrease the amount of
      information leakage.  By default, we pad the filenames to the next 4
      byte boundaries.  This costs nothing, since the directory entries are
      aligned to 4 byte boundaries anyway.  Filenames can also be padded to
      8, 16, or 32 bytes, which will consume more directory space.
      
      Change-Id: Ibb7a0fb76d2c48e2061240a709358ff40b14f322
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      a44cd7a0
    • Theodore Ts'o's avatar
      ext4 crypto: simplify and speed up filename encryption · 5de0b4d0
      Theodore Ts'o authored
      Avoid using SHA-1 when calculating the user-visible filename when the
      encryption key is available, and avoid decrypting lots of filenames
      when searching for a directory entry in a directory block.
      
      Change-Id: If4655f144784978ba0305b597bfa1c8d7bb69e63
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      5de0b4d0
    • Linus Torvalds's avatar
      Merge branch 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 64887b68
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "A few more btrfs fixes.
      
        These range from corners Filipe found in the new free space cache
        writeback to a grab bag of fixes from the list"
      
      * 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: btrfs_release_extent_buffer_page didn't free pages of dummy extent
        Btrfs: fill ->last_trans for delayed inode in btrfs_fill_inode.
        btrfs: unlock i_mutex after attempting to delete subvolume during send
        btrfs: check io_ctl_prepare_pages return in __btrfs_write_out_cache
        btrfs: fix race on ENOMEM in alloc_extent_buffer
        btrfs: handle ENOMEM in btrfs_alloc_tree_block
        Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole
        Btrfs: don't check for delalloc_bytes in cache_save_setup
        Btrfs: fix deadlock when starting writeback of bg caches
        Btrfs: fix race between start dirty bg cache writeout and bg deletion
      64887b68
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 036f351e
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "Not too much here, but we've addressed a couple of nasty issues in the
        dma-mapping code as well as adding the halfword and byte variants of
        load_acquire/store_release following on from the CSD locking bug that
        you fixed in the core.
      
         - fix perf devicetree warnings at probe time
      
         - fix memory leak in __dma_free()
      
         - ensure DMA buffers are always zeroed
      
         - show IRQ trigger in /proc/interrupts (for parity with ARM)
      
         - implement byte and halfword access for smp_{load_acquire,store_release}"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: perf: Fix the pmu node name in warning message
        arm64: perf: don't warn about missing interrupt-affinity property for PPIs
        arm64: add missing PAGE_ALIGN() to __dma_free()
        arm64: dma-mapping: always clear allocated buffers
        ARM64: Enable CONFIG_GENERIC_IRQ_SHOW_LEVEL
        arm64: add missing data types in smp_load_acquire/smp_store_release
      036f351e
    • Sam Bobroff's avatar
      powerpc/powernv: Restore non-volatile CRs after nap · 0aab3747
      Sam Bobroff authored
      Patches 7cba160a "powernv/cpuidle: Redesign idle states management"
      and 77b54e9f "powernv/powerpc: Add winkle support for offline cpus"
      use non-volatile condition registers (cr2, cr3 and cr4) early in the system
      reset interrupt handler (system_reset_pSeries()) before it has been determined
      if state loss has occurred. If state loss has not occurred, control returns via
      the power7_wakeup_noloss() path which does not restore those condition
      registers, leaving them corrupted.
      
      Fix this by restoring the condition registers in the power7_wakeup_noloss()
      case.
      
      This is apparent when running a KVM guest on hardware that does not
      support winkle or sleep and the guest makes use of secondary threads. In
      practice this means Power7 machines, though some early unreleased Power8
      machines may also be susceptible.
      
      The secondary CPUs are taken off line before the guest is started and
      they call pnv_smp_cpu_kill_self(). This checks support for sleep
      states (in this case there is no support) and power7_nap() is called.
      
      When the CPU is woken, power7_nap() returns and because the CPU is
      still off line, the main while loop executes again. The sleep states
      support test is executed again, but because the tested values cannot
      have changed, the compiler has optimized the test away and instead we
      rely on the result of the first test, which has been left in cr3
      and/or cr4. With the result overwritten, the wrong branch is taken and
      power7_winkle() is called on a CPU that does not support it, leading
      to it stalling.
      
      Fixes: 7cba160a ("powernv/cpuidle: Redesign idle states management")
      Fixes: 77b54e9f ("powernv/powerpc: Add winkle support for offline cpus")
      [mpe: Massage change log a bit more]
      Signed-off-by: default avatarSam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      0aab3747
    • Gavin Shan's avatar
      powerpc/eeh: Delay probing EEH device during hotplug · d91dafc0
      Gavin Shan authored
      Commit 1c509148b ("powerpc/eeh: Do probe on pci_dn") probes EEH
      devices in early stage, which is reasonable to pSeries platform.
      However, it's wrong for PowerNV platform because the PE# isn't
      determined until the resources (IO and MMIO) are assigned to
      PE in hotplug case. So we have to delay probing EEH devices
      for PowerNV platform until the PE# is assigned.
      
      Fixes: ff57b454 ("powerpc/eeh: Do probe on pci_dn")
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      d91dafc0
    • Gavin Shan's avatar
      powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state() · 1ae79b78
      Gavin Shan authored
      When asserting reset in pcibios_set_pcie_reset_state(), the PE
      is enforced to (hardware) frozen state in order to drop unexpected
      PCI transactions (except PCI config read/write) automatically by
      hardware during reset, which would cause recursive EEH error.
      However, the (software) frozen state EEH_PE_ISOLATED is missed.
      When users get 0xFF from PCI config or MMIO read, EEH_PE_ISOLATED
      is set in PE state retrival backend. Unfortunately, nobody (the
      reset handler or the EEH recovery functinality in host) will clear
      EEH_PE_ISOLATED when the PE has been passed through to guest.
      
      The patch sets and clears EEH_PE_ISOLATED properly during reset
      in function pcibios_set_pcie_reset_state() to fix the issue.
      
      Fixes: 28158cd1 ("Enhance pcibios_set_pcie_reset_state()")
      Reported-by: default avatarCarol L. Soto <clsoto@us.ibm.com>
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Tested-by: default avatarCarol L. Soto <clsoto@us.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      1ae79b78
    • Nathan Fontenot's avatar
      powerpc/pseries: Correct cpu affinity for dlpar added cpus · f32393c9
      Nathan Fontenot authored
      The incorrect ordering of operations during cpu dlpar add results in invalid
      affinity for the cpu being added. The ibm,associativity property in the
      device tree is populated with all zeroes for the added cpu which results in
      invalid affinity mappings and all cpus appear to belong to node 0.
      
      This occurs because rtas configure-connector is called prior to making the
      rtas set-indicator calls. Phyp does not assign affinity information
      for a cpu until the rtas set-indicator calls are made to set the isolation
      and allocation state.
      
      Correct the order of operations to make the rtas set-indicator
      calls (done in dlpar_acquire_drc) before calling rtas configure-connector.
      
      Fixes: 1a8061c4 ("powerpc/pseries: Add kernel based CPU DLPAR handling")
      Signed-off-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      f32393c9
    • Michael Ellerman's avatar
      selftests/powerpc: Fix the pmu install rule · 2fa30fe9
      Michael Ellerman authored
      My patch to add install support for the powerpc selftests had a typo,
      leading to the three tests in the pmu directory itself not being
      installed.
      
      Fixes: 6faeeea4 ("selftests: Add install support for the powerpc tests")
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      2fa30fe9