1. 12 Oct, 2015 13 commits
    • Dave Chinner's avatar
      Merge branch 'xfs-io-fixes' into for-next · 8a56d7c3
      Dave Chinner authored
      8a56d7c3
    • Dave Chinner's avatar
      316433be
    • Eric Sandeen's avatar
      xfs: simplify /proc teardown & error handling · 9e92054e
      Eric Sandeen authored
      remove_proc_subtree() was added in 3.9, and can be
      used to simplify our procfile creation error handling
      and cleanup, removing the nested gotos.  It simply
      removes fs/xfs and everything created under it.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      9e92054e
    • Bill O'Donnell's avatar
      xfs: per-filesystem stats counter implementation · ff6d6af2
      Bill O'Donnell authored
      This patch modifies the stats counting macros and the callers
      to those macros to properly increment, decrement, and add-to
      the xfs stats counts. The counts for global and per-fs stats
      are correctly advanced, and cleared by writing a "1" to the
      corresponding clear file.
      
      global counts: /sys/fs/xfs/stats/stats
      per-fs counts: /sys/fs/xfs/sda*/stats/stats
      
      global clear:  /sys/fs/xfs/stats/stats_clear
      per-fs clear:  /sys/fs/xfs/sda*/stats/stats_clear
      
      [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
       stats structures and macros. ]
      Signed-off-by: default avatarBill O'Donnell <billodo@redhat.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      ff6d6af2
    • Bill O'Donnell's avatar
      xfs: per-filesystem stats in sysfs · 225e4635
      Bill O'Donnell authored
      This patch implements per-filesystem stats objects in sysfs. It
      depends on the application of the previous patch series that
      develops the infrastructure to support both xfs global stats and
      xfs per-fs stats in sysfs.
      
      Stats objects are instantiated when an xfs filesystem is mounted
      and deleted on unmount. With this patch, the stats directory is
      created and populated with the familiar stats and stats_clear files.
      Example:
              /sys/fs/xfs/sda9/stats/stats
              /sys/fs/xfs/sda9/stats/stats_clear
      
      With this patch, the individual counts within the new per-fs
      stats file(s) remain at zero. Functions that use the the macros
      to increment, decrement, and add-to the per-fs stats counts will
      be covered in a separate new patch to follow this one. Note that
      the counts within the global stats file (/sys/fs/xfs/stats/stats)
      advance normally and can be cleared as it was prior to this patch.
      
      [dchinner: move setup/teardown to xfs_fs_{fill|put}_super() so
      it is down before/after any path that uses the per-mount stats. ]
      Signed-off-by: default avatarBill O'Donnell <billodo@redhat.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      225e4635
    • Eric Sandeen's avatar
      xfs: avoid null *src in memcpy call in xlog_write · 91f9f5fe
      Eric Sandeen authored
      The gcc undefined behavior sanitizer caught this; surely
      any sane memcpy implementation will no-op if size == 0,
      but behavior with a *src of NULL is technically undefined
      (declared nonnull), so avoid it here.
      
      We are actually in this situation frequently via
      xlog_commit_record(), because:
      
              struct xfs_log_iovec reg = {
                      .i_addr = NULL,
                      .i_len = 0,
                      .i_type = XLOG_REG_TYPE_COMMIT,
              };
      Reported-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      91f9f5fe
    • Brian Foster's avatar
      xfs: pass total block res. as total xfs_bmapi_write() parameter · dbd5c8c9
      Brian Foster authored
      The total field from struct xfs_alloc_arg is a bit of an unknown
      commodity. It is documented as the total block requirement for the
      transaction and is used in this manner from most call sites by virtue of
      passing the total block reservation of the transaction associated with
      an allocation. Several xfs_bmapi_write() callers pass hardcoded values
      of 0 or 1 for the total block requirement, which is a historical oddity
      without any clear reasoning.
      
      The xfs_iomap_write_direct() caller, for example, passes 0 for the total
      block requirement. This has been determined to cause problems in the
      form of ABBA deadlocks of AGF buffers due to incorrect AG selection in
      the block allocator. Specifically, the xfs_alloc_space_available()
      function incorrectly selects an AG that doesn't actually have sufficient
      space for the allocation. This occurs because the args.total field is 0
      and thus the remaining free space check on the AG doesn't actually
      consider the size of the allocation request. This locks the AGF buffer,
      the allocation attempt proceeds and ultimately fails (in
      xfs_alloc_fix_minleft()), and xfs_alloc_vexent() moves on to the next
      AG. In turn, this can lead to incorrect AG locking order (if the
      allocator wraps around, attempting to lock AG 0 after acquiring AG N)
      and thus deadlock if racing with another operation. This problem has
      been reproduced via generic/299 on smallish (1GB) ramdisk test devices.
      
      To avoid this problem, replace the undocumented hardcoded total
      parameters from the iomap and utility callers to pass the block
      reservation used for the associated transaction. This is consistent with
      other xfs_bmapi_write() callers throughout XFS. The assumption is that
      the total field allows the selection of an AG that can handle the entire
      operation rather than simply the allocation/range being requested (e.g.,
      resulting btree splits, etc.). This addresses the aforementioned
      generic/299 hang by ensuring AG selection only occurs when the
      allocation can be satisfied by the AG.
      Reported-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      dbd5c8c9
    • Brian Foster's avatar
      xfs: add an xfs_zero_eof() tracepoint · 0a50f162
      Brian Foster authored
      Add a tracepoint in xfs_zero_eof() to facilitate tracking and debugging
      EOF zeroing events. This has proven useful in the context of other
      direct I/O tracepoints to ensure EOF zeroing occurs within appropriate
      file ranges.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      0a50f162
    • Brian Foster's avatar
      xfs: always drain dio before extending aio write submission · 3136e8bb
      Brian Foster authored
      XFS supports and typically allows concurrent asynchronous direct I/O
      submission to a single file. One exception to the rule is that file
      extending dio writes that start beyond the current EOF (e.g.,
      potentially create a hole at EOF) require exclusive I/O access to the
      file. This is because such writes must zero any pre-existing blocks
      beyond EOF that are exposed by virtue of now residing within EOF as a
      result of the write about to be submitted.
      
      Before EOF zeroing can occur, the current file i_size must be stabilized
      to avoid data corruption. In this scenario, XFS upgrades the iolock to
      exclude any further I/O submission, waits on in-flight I/O to complete
      to ensure i_size is up to date (i_size is updated on dio write
      completion) and restarts the various checks against the state of the
      file. The problem is that this protection sequence is triggered only
      when the iolock is currently held shared. While this is true for async
      dio in most cases, the caller may upgrade the lock in advance based on
      arbitrary circumstances with respect to EOF zeroing. For example, the
      iolock is always acquired exclusively if the start offset is not block
      aligned. This means that even though the iolock is already held
      exclusive for such I/Os, pending I/O is not drained and thus EOF zeroing
      can occur based on an unstable i_size.
      
      This problem has been reproduced as guest data corruption in virtual
      machines with file-backed qcow2 virtual disks hosted on an XFS
      filesystem. The virtual disks must be configured with aio=native mode
      and the must not be truncated out to the maximum file size (as some virt
      managers will do).
      
      Update xfs_file_aio_write_checks() to unconditionally drain in-flight
      dio before EOF zeroing can occur. Rather than trigger the wait based on
      iolock state, use a new flag and upgrade the iolock when necessary. Note
      that this results in a full restart of the inode checks even when the
      iolock was already held exclusive when technically it is only required
      to recheck i_size. This should be a rare enough occurrence that it is
      preferable to keep the code simple rather than create an alternate
      restart jump target.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      3136e8bb
    • Brian Foster's avatar
      xfs: validate metadata LSNs against log on v5 superblocks · a45086e2
      Brian Foster authored
      Since the onset of v5 superblocks, the LSN of the last modification has
      been included in a variety of on-disk data structures. This LSN is used
      to provide log recovery ordering guarantees (e.g., to ensure an older
      log recovery item is not replayed over a newer target data structure).
      
      While this works correctly from the point a filesystem is formatted and
      mounted, userspace tools have some problematic behaviors that defeat
      this mechanism. For example, xfs_repair historically zeroes out the log
      unconditionally (regardless of whether corruption is detected). If this
      occurs, the LSN of the filesystem is reset and the log is now in a
      problematic state with respect to on-disk metadata structures that might
      have a larger LSN. Until either the log catches up to the highest
      previously used metadata LSN or each affected data structure is modified
      and written out without incident (which resets the metadata LSN), log
      recovery is susceptible to filesystem corruption.
      
      This problem is ultimately addressed and repaired in the associated
      userspace tools. The kernel is still responsible to detect the problem
      and notify the user that something is wrong. Check the superblock LSN at
      mount time and fail the mount if it is invalid. From that point on,
      trigger verifier failure on any metadata I/O where an invalid LSN is
      detected. This results in a filesystem shutdown and guarantees that we
      do not log metadata changes with invalid LSNs on disk. Since this is a
      known issue with a known recovery path, present a warning to instruct
      the user how to recover.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      a45086e2
    • Brian Foster's avatar
      xfs: log local to remote symlink conversions correctly on v5 supers · b7cdc66b
      Brian Foster authored
      A local format symlink inode is converted to extent format when an
      extended attribute is set on an inode as part of the attribute fork
      creation. This means a block is allocated, the local symlink target name
      is copied to the block and the block is logged. Currently,
      xfs_bmap_local_to_extents() handles logging the remote block data based
      on the size of the data fork prior to the conversion. This is not
      correct on v5 superblock filesystems, which add an additional header to
      remote symlink blocks that is nonexistent in local format inodes.
      
      As a result, the full length of the remote symlink block content is not
      logged. This can lead to corruption should a crash occur and log
      recovery replay this transaction.
      
      Since a callout is already used to initialize the new remote symlink
      block, update the local-to-extents conversion mechanism to make the
      callout also responsible for logging the block. It is already required
      to set the log buffer type and format the block appropriately based on
      the superblock version. This ensures the remote symlink is always logged
      correctly. Note that xfs_bmap_local_to_extents() is only called for
      symlinks so there are no other callouts that require modification.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      b7cdc66b
    • Brian Foster's avatar
      xfs: add missing ilock around dio write last extent alignment · 009c6e87
      Brian Foster authored
      The iomap codepath (via get_blocks()) acquires and release the inode
      lock in the case of a direct write that requires block allocation. This
      is because xfs_iomap_write_direct() allocates a transaction, which means
      the ilock must be dropped and reacquired after the transaction is
      allocated and reserved.
      
      xfs_iomap_write_direct() invokes xfs_iomap_eof_align_last_fsb() before
      the transaction is created and thus before the ilock is reacquired. This
      can lead to calls to xfs_iread_extents() and reads of the in-core extent
      list without any synchronization (via xfs_bmap_eof() and
      xfs_bmap_last_extent()). xfs_iread_extents() assert fails if the ilock
      is not held, but this is not currently seen in practice as the current
      callers had already invoked xfs_bmapi_read().
      
      What has been seen in practice are reports of crashes down in the
      xfs_bmap_eof() codepath on direct writes due to seemingly bogus pointer
      references from xfs_iext_get_ext(). While an explicit reproducer is not
      currently available to confirm the cause of the problem, crash analysis
      and code inspection from David Jeffrey had identified the insufficient
      locking.
      
      xfs_iomap_eof_align_last_fsb() is called from other contexts with the
      inode lock already held, so we cannot acquire it therein.
      __xfs_get_blocks() acquires and drops the ilock with variable flags to
      cover the event that the extent list must be read in. The common case is
      that __xfs_get_blocks() acquires the shared ilock. To provide locking
      around the last extent alignment call without adding more lock cycles to
      the dio path, update xfs_iomap_write_direct() to expect the shared ilock
      held on entry and do the extent alignment under its protection. Demote
      the lock, if necessary, from __xfs_get_blocks() and push the
      xfs_qm_dqattach() call outside of the shared lock critical section.
      Also, add an assert to document that the extent list is always expected
      to be present in this path. Otherwise, we risk a call to
      xfs_iread_extents() while under the shared ilock. This is safe as all
      current callers have executed an xfs_bmapi_read() call under the current
      iolock context.
      Reported-by: default avatarDavid Jeffery <djeffery@redhat.com>
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      009c6e87
    • Zhaohongjiang's avatar
      cancel the setfilesize transation when io error happen · 5cb13dcd
      Zhaohongjiang authored
      When I ran xfstest/073 case, the remount process was blocked to wait
      transactions to be zero. I found there was a io error happened, and
      the setfilesize transaction was not released properly. We should add
      the changes to cancel the io error in this case.
      
      Reproduction steps:
      1. dd if=/dev/zero of=xfs1.img bs=1M count=2048
      2. mkfs.xfs xfs1.img
      3. losetup -f ./xfs1.img /dev/loop0
      4. mount -t xfs /dev/loop0 /home/test_dir/
      5. mkdir /home/test_dir/test
      6. mkfs.xfs -dfile,name=image,size=2g
      7. mount -t xfs -o loop image /home/test_dir/test
      8. cp a file bigger than 2g to /home/test_dir/test
      9. mount -t xfs -o remount,ro /home/test_dir/test
      
      [ dchinner: moved io error detection to xfs_setfilesize_ioend() after
        transaction context restoration. ]
      Signed-off-by: default avatarZhao Hongjiang <zhaohongjiang@huawei.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      5cb13dcd
  2. 11 Oct, 2015 5 commits
  3. 20 Sep, 2015 10 commits
    • Linus Torvalds's avatar
      Linux 4.3-rc2 · 1f93e4a9
      Linus Torvalds authored
      1f93e4a9
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm · 99bc7215
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
       "Three fixes and a resulting cleanup for -rc2:
      
         - Andre Przywara reported that he was seeing a warning with the new
           cast inside DMA_ERROR_CODE's definition, and fixed the incorrect
           use.
      
         - Doug Anderson noticed that kgdb causes a "scheduling while atomic"
           bug.
      
         - OMAP5 folk noticed that their Thumb-2 compiled X servers crashed
           when enabling support to cover ARMv6 CPUs due to a kernel bug
           leaking some conditional context into the signal handler"
      
      * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
        ARM: 8425/1: kgdb: Don't try to stop the machine when setting breakpoints
        ARM: 8437/1: dma-mapping: fix build warning with new DMA_ERROR_CODE definition
        ARM: get rid of needless #if in signal handling code
        ARM: fix Thumb2 signal handling when ARMv6 is enabled
      99bc7215
    • Linus Torvalds's avatar
      Merge tag 'linux-kselftest-4.3-rc2' of... · 30ec5682
      Linus Torvalds authored
      Merge tag 'linux-kselftest-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest fixes from Shuah Khan:
       "This update contains 7 fixes for problems ranging from build failurs
        to incorrect error reporting"
      
      * tag 'linux-kselftest-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests: exec: revert to default emit rule
        selftests: change install command to rsync
        selftests: mqueue: simplify the Makefile
        selftests: mqueue: allow extra cflags
        selftests: rename jump label to static_keys
        selftests/seccomp: add support for s390
        seltests/zram: fix syntax error
      30ec5682
    • Linus Torvalds's avatar
      Merge tag 'pm+acpi-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 009884f3
      Linus Torvalds authored
      Pull power management and ACPI updates from Rafael Wysocki:
       "Included are: a somewhat late devfreq update which however is mostly
        fixes and cleanups with one new thing only (the PPMUv2 support on
        Exynos5433), an ACPI cpufreq driver fixup and two ACPI core cleanups
        related to preprocessor directives.
      
        Specifics:
      
         - Fix a memory allocation size in the devfreq core (Xiaolong Ye).
      
         - Fix a mistake in the exynos-ppmu DT binding (Javier Martinez
           Canillas).
      
         - Add support for PPMUv2 ((Platform Performance Monitoring Unit
           version 2.0) on the Exynos5433 SoCs (Chanwoo Choi).
      
         - Fix a type casting bug in the Exynos PPMU code (MyungJoo Ham).
      
         - Assorted devfreq code cleanups and optimizations (Javi Merino,
           MyungJoo Ham, Viresh Kumar).
      
         - Fix up the ACPI cpufreq driver to use a more lightweight way to get
           to its private data in the ->get() callback (Rafael J Wysocki).
      
         - Fix a CONFIG_ prefix bug in one of the ACPI drivers and make the
           ACPI subsystem use IS_ENABLED() instead of #ifdefs in function
           bodies (Sudeep Holla)"
      
      * tag 'pm+acpi-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: acpi-cpufreq: Use cpufreq_cpu_get_raw() in ->get()
        ACPI: Eliminate CONFIG_.*{, _MODULE} #ifdef in favor of IS_ENABLED()
        ACPI: int340x_thermal: add missing CONFIG_ prefix
        PM / devfreq: Fix incorrect type issue.
        PM / devfreq: tegra: Update governor to use devfreq_update_stats()
        PM / devfreq: comments for get_dev_status usage updated
        PM / devfreq: drop comment about thermal setting max_freq
        PM / devfreq: cache the last call to get_dev_status()
        PM / devfreq: Drop unlikely before IS_ERR(_OR_NULL)
        PM / devfreq: exynos-ppmu: bit-wise operation bugfix.
        PM / devfreq: exynos-ppmu: Update documentation to support PPMUv2
        PM / devfreq: exynos-ppmu: Add the support of PPMUv2 for Exynos5433
        PM / devfreq: event: Remove incorrect property in exynos-ppmu DT binding
      009884f3
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · d590b2d4
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "A few driver fixes for tegra, rockchip, and st SoCs and a two-liner in
        the framework to avoid oops when get_parent ops return out of range
        values on tegra platforms"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        drivers: clk: st: Rename st_pll3200c32_407_c0_x into st_pll3200c32_cx_x
        clk: check for invalid parent index of orphans in __clk_init()
        clk: tegra: dfll: Properly protect OPP list
        clk: rockchip: add critical clock for rk3368
      d590b2d4
    • Linus Torvalds's avatar
      Merge tag 'led-fixes-for-v4.3-rc2' of... · e6827baf
      Linus Torvalds authored
      Merge tag 'led-fixes-for-v4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
      
      Pull LED fixes from Jacek Anaszewski:
       - fix module autoload for six OF platform drivers (aat1290, bcm6328,
         bcm6358, ktd2692, max77693, ns2)
       - aat1290: add missing static modifier
       - ipaq-micro: add missing LEDS_CLASS dependency
       - lp55xx: correct Kconfig dependecy for f/w user helper
      
      * tag 'led-fixes-for-v4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
        leds:lp55xx: Correct Kconfig dependency for f/w user helper
        leds: leds-ipaq-micro: Add LEDS_CLASS dependency
        leds: aat1290: add 'static' modifier to init_mm_current_scale
        leds: leds-ns2: Fix module autoload for OF platform driver
        leds: max77693: Fix module autoload for OF platform driver
        leds: ktd2692: Fix module autoload for OF platform driver
        leds: bcm6358: Fix module autoload for OF platform driver
        leds: bcm6328: Fix module autoload for OF platform driver
        leds: aat1290: Fix module autoload for OF platform driver
      e6827baf
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · dc847d5b
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "The new hfi1 driver in staging/rdma has had a number of fixup patches
        since being added to the tree.  This is the first batch of those fixes"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        IB/hfi: Properly set permissions for user device files
        IB/hfi1: mask vs shift confusion
        IB/hfi1: clean up some defines
        IB/hfi1: info leak in get_ctxt_info()
        IB/hfi1: fix a locking bug
        IB/hfi1: checking for NULL instead of IS_ERR
        IB/hfi1: fix sdma_descq_cnt parameter parsing
        IB/hfi1: fix copy_to/from_user() error handling
        IB/hfi1: fix pstateinfo from returning improperly byteswapped value
      dc847d5b
    • Linus Torvalds's avatar
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 2673ee56
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
      
       - a boot regression (since v4.2) fix for some ARM configurations from
         Tyler
      
       - regression (since v4.1) fixes for mkfs.xfs on a DAX enabled device
         from Jeff.  These are tagged for -stable.
      
       - a pair of locking fixes from Axel that are hidden from lockdep since
         they involve device_lock().  The "btt" one is tagged for -stable, the
         other only applies to the new "pfn" mechanism in v4.3.
      
       - a fix for the pmem ->rw_page() path to use wmb_pmem() from Ross.
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        mm: fix type cast in __pfn_to_phys()
        pmem: add proper fencing to pmem_rw_page()
        libnvdimm: pfn_devs: Fix locking in namespace_store
        libnvdimm: btt_devs: Fix locking in namespace_store
        blockdev: don't set S_DAX for misaligned partitions
        dax: fix O_DIRECT I/O to the last block of a blockdev
      2673ee56
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 133bb595
      Linus Torvalds authored
      Pull block updates from Jens Axboe:
       "This is a bit bigger than it should be, but I could (did) not want to
        send it off last week due to both wanting extra testing, and expecting
        a fix for the bounce regression as well.  In any case, this contains:
      
         - Fix for the blk-merge.c compilation warning on gcc 5.x from me.
      
         - A set of back/front SG gap merge fixes, from me and from Sagi.
           This ensures that we honor SG gapping for integrity payloads as
           well.
      
         - Two small fixes for null_blk from Matias, fixing a leak and a
           capacity propagation issue.
      
         - A blkcg fix from Tejun, fixing a NULL dereference.
      
         - A fast clone optimization from Ming, fixing a performance
           regression since the arbitrarily sized bio's were introduced.
      
         - Also from Ming, a regression fix for bouncing IOs"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: fix bounce_end_io
        block: blk-merge: fast-clone bio when splitting rw bios
        block: blkg_destroy_all() should clear q->root_blkg and ->root_rl.blkg
        block: Copy a user iovec if it includes gaps
        block: Refuse adding appending a gapped integrity page to a bio
        block: Refuse request/bio merges with gaps in the integrity payload
        block: Check for gaps on front and back merges
        null_blk: fix wrong capacity when bs is not 512 bytes
        null_blk: fix memory leak on cleanup
        block: fix bogus compiler warnings in blk-merge.c
      133bb595
    • Chris Mason's avatar
      fs-writeback: unplug before cond_resched in writeback_sb_inodes · 590dca3a
      Chris Mason authored
      Commit 505a666e ("writeback: plug writeback in wb_writeback() and
      writeback_inodes_wb()") has us holding a plug during writeback_sb_inodes,
      which increases the merge rate when relatively contiguous small files
      are written by the filesystem.  It helps both on flash and spindles.
      
      For an fs_mark workload creating 4K files in parallel across 8 drives,
      this commit improves performance ~9% more by unplugging before calling
      cond_resched().  cond_resched() doesn't trigger an implicit unplug, so
      explicitly getting the IO down to the device before scheduling reduces
      latencies for anyone waiting on clean pages.
      
      It also cuts down on how often we use kblockd to unplug, which means
      less work bouncing from one workqueue to another.
      
      Many more details about how we got here:
      
        https://lkml.org/lkml/2015/9/11/570Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      590dca3a
  4. 19 Sep, 2015 1 commit
  5. 18 Sep, 2015 11 commits