1. 03 Nov, 2015 6 commits
    • Dave Chinner's avatar
      xfs: optimise away log forces on timestamp updates for fdatasync · fc0561ce
      Dave Chinner authored
      xfs: timestamp updates cause excessive fdatasync log traffic
      
      Sage Weil reported that a ceph test workload was writing to the
      log on every fdatasync during an overwrite workload. Event tracing
      showed that the only metadata modification being made was the
      timestamp updates during the write(2) syscall, but fdatasync(2)
      is supposed to ignore them. The key observation was that the
      transactions in the log all looked like this:
      
      INODE: #regs: 4   ino: 0x8b  flags: 0x45   dsize: 32
      
      And contained a flags field of 0x45 or 0x85, and had data and
      attribute forks following the inode core. This means that the
      timestamp updates were triggering dirty relogging of previously
      logged parts of the inode that hadn't yet been flushed back to
      disk.
      
      There are two parts to this problem. The first is that XFS relogs
      dirty regions in subsequent transactions, so it carries around the
      fields that have been dirtied since the last time the inode was
      written back to disk, not since the last time the inode was forced
      into the log.
      
      The second part is that on v5 filesystems, the inode change count
      update during inode dirtying also sets the XFS_ILOG_CORE flag, so
      on v5 filesystems this makes a timestamp update dirty the entire
      inode.
      
      As a result when fdatasync is run, it looks at the dirty fields in
      the inode, and sees more than just the timestamp flag, even though
      the only metadata change since the last fdatasync was just the
      timestamps. Hence we force the log on every subsequent fdatasync
      even though it is not needed.
      
      To fix this, add a new field to the inode log item that tracks
      changes since the last time fsync/fdatasync forced the log to flush
      the changes to the journal. This flag is updated when we dirty the
      inode, but we do it before updating the change count so it does not
      carry the "core dirty" flag from timestamp updates. The fields are
      zeroed when the inode is marked clean (due to writeback/freeing) or
      when an fsync/datasync forces the log. Hence if we only dirty the
      timestamps on the inode between fsync/fdatasync calls, the fdatasync
      will not trigger another log force.
      
      Over 100 runs of the test program:
      
      Ext4 baseline:
      	runtime: 1.63s +/- 0.24s
      	avg lat: 1.59ms +/- 0.24ms
      	iops: ~2000
      
      XFS, vanilla kernel:
              runtime: 2.45s +/- 0.18s
      	avg lat: 2.39ms +/- 0.18ms
      	log forces: ~400/s
      	iops: ~1000
      
      XFS, patched kernel:
              runtime: 1.49s +/- 0.26s
      	avg lat: 1.46ms +/- 0.25ms
      	log forces: ~30/s
      	iops: ~1500
      Reported-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      fc0561ce
    • Darrick J. Wong's avatar
      xfs: don't leak uuid table on rmmod · af3b6382
      Darrick J. Wong authored
      Don't leak the UUID table when the module is unloaded.
      (Found with kmemleak.)
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      af3b6382
    • Andreas Gruenbacher's avatar
      xfs: invalidate cached acl if set via ioctl · 47e1bf64
      Andreas Gruenbacher authored
      Setting or removing the "SGI_ACL_[FILE|DEFAULT]" attributes via the
      XFS_IOC_ATTRMULTI_BY_HANDLE ioctl completely bypasses the POSIX ACL
      infrastructure, like setting the "trusted.SGI_ACL_[FILE|DEFAULT]" xattrs
      did until commit 6caa1056.  Similar to that commit, invalidate cached
      acls when setting/removing them via the ioctl as well.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      47e1bf64
    • Andreas Gruenbacher's avatar
      xfs: Plug memory leak in xfs_attrmulti_attr_set · 09cb22d2
      Andreas Gruenbacher authored
      When setting attributes via XFS_IOC_ATTRMULTI_BY_HANDLE, the user-space
      buffer is copied into a new kernel-space buffer via memdup_user; that
      buffer then isn't freed.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      09cb22d2
    • Andreas Gruenbacher's avatar
      xfs: Validate the length of on-disk ACLs · 86a21c79
      Andreas Gruenbacher authored
      In xfs_acl_from_disk, instead of trusting that xfs_acl.acl_cnt is correct,
      make sure that the length of the attributes is correct as well.  Also, turn
      the aclp parameter into a const pointer.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      86a21c79
    • Brian Foster's avatar
      xfs: invalidate cached acl if set directly via xattr · 67d8e04e
      Brian Foster authored
      ACLs are stored as extended attributes of the inode to which they apply.
      XFS converts the standard "system.posix_acl_[access|default]" attribute
      names used to control ACLs to "trusted.SGI_ACL_[FILE|DEFAULT]" as stored
      on-disk. These xattrs are directly exposed in on-disk format via
      getxattr/setxattr, without any ACL aware code in the path to perform
      validation, etc. This is partly historical and supports backup/restore
      applications such as xfsdump to back up and restore the binary blob that
      represents ACLs as-is.
      
      Andreas reports that the ACLs observed via the getfacl interface is not
      consistent when ACLs are set directly via the setxattr path. This occurs
      because the ACLs are cached in-core against the inode and the xattr path
      has no knowledge that the operation relates to ACLs.
      
      Update the xattr set codepath to trap writes of the special XFS ACL
      attributes and invalidate the associated cached ACL when this occurs.
      This ensures that the correct ACLs are used on a subsequent operation
      through the actual ACL interface.
      
      Note that this does not update or add support for setting the ACL xattrs
      directly beyond the restore use case that requires a correctly formatted
      binary blob and to restore a consistent i_mode at the same time. It is
      still possible for a root user to set an invalid or inconsistent (with
      i_mode) ACL blob on-disk and potentially cause corruption.
      
      [ With fixes from Andreas Gruenbacher. ]
      Reported-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      67d8e04e
  2. 02 Nov, 2015 1 commit
  3. 20 Sep, 2015 10 commits
    • Linus Torvalds's avatar
      Linux 4.3-rc2 · 1f93e4a9
      Linus Torvalds authored
      1f93e4a9
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm · 99bc7215
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
       "Three fixes and a resulting cleanup for -rc2:
      
         - Andre Przywara reported that he was seeing a warning with the new
           cast inside DMA_ERROR_CODE's definition, and fixed the incorrect
           use.
      
         - Doug Anderson noticed that kgdb causes a "scheduling while atomic"
           bug.
      
         - OMAP5 folk noticed that their Thumb-2 compiled X servers crashed
           when enabling support to cover ARMv6 CPUs due to a kernel bug
           leaking some conditional context into the signal handler"
      
      * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
        ARM: 8425/1: kgdb: Don't try to stop the machine when setting breakpoints
        ARM: 8437/1: dma-mapping: fix build warning with new DMA_ERROR_CODE definition
        ARM: get rid of needless #if in signal handling code
        ARM: fix Thumb2 signal handling when ARMv6 is enabled
      99bc7215
    • Linus Torvalds's avatar
      Merge tag 'linux-kselftest-4.3-rc2' of... · 30ec5682
      Linus Torvalds authored
      Merge tag 'linux-kselftest-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest fixes from Shuah Khan:
       "This update contains 7 fixes for problems ranging from build failurs
        to incorrect error reporting"
      
      * tag 'linux-kselftest-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests: exec: revert to default emit rule
        selftests: change install command to rsync
        selftests: mqueue: simplify the Makefile
        selftests: mqueue: allow extra cflags
        selftests: rename jump label to static_keys
        selftests/seccomp: add support for s390
        seltests/zram: fix syntax error
      30ec5682
    • Linus Torvalds's avatar
      Merge tag 'pm+acpi-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 009884f3
      Linus Torvalds authored
      Pull power management and ACPI updates from Rafael Wysocki:
       "Included are: a somewhat late devfreq update which however is mostly
        fixes and cleanups with one new thing only (the PPMUv2 support on
        Exynos5433), an ACPI cpufreq driver fixup and two ACPI core cleanups
        related to preprocessor directives.
      
        Specifics:
      
         - Fix a memory allocation size in the devfreq core (Xiaolong Ye).
      
         - Fix a mistake in the exynos-ppmu DT binding (Javier Martinez
           Canillas).
      
         - Add support for PPMUv2 ((Platform Performance Monitoring Unit
           version 2.0) on the Exynos5433 SoCs (Chanwoo Choi).
      
         - Fix a type casting bug in the Exynos PPMU code (MyungJoo Ham).
      
         - Assorted devfreq code cleanups and optimizations (Javi Merino,
           MyungJoo Ham, Viresh Kumar).
      
         - Fix up the ACPI cpufreq driver to use a more lightweight way to get
           to its private data in the ->get() callback (Rafael J Wysocki).
      
         - Fix a CONFIG_ prefix bug in one of the ACPI drivers and make the
           ACPI subsystem use IS_ENABLED() instead of #ifdefs in function
           bodies (Sudeep Holla)"
      
      * tag 'pm+acpi-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: acpi-cpufreq: Use cpufreq_cpu_get_raw() in ->get()
        ACPI: Eliminate CONFIG_.*{, _MODULE} #ifdef in favor of IS_ENABLED()
        ACPI: int340x_thermal: add missing CONFIG_ prefix
        PM / devfreq: Fix incorrect type issue.
        PM / devfreq: tegra: Update governor to use devfreq_update_stats()
        PM / devfreq: comments for get_dev_status usage updated
        PM / devfreq: drop comment about thermal setting max_freq
        PM / devfreq: cache the last call to get_dev_status()
        PM / devfreq: Drop unlikely before IS_ERR(_OR_NULL)
        PM / devfreq: exynos-ppmu: bit-wise operation bugfix.
        PM / devfreq: exynos-ppmu: Update documentation to support PPMUv2
        PM / devfreq: exynos-ppmu: Add the support of PPMUv2 for Exynos5433
        PM / devfreq: event: Remove incorrect property in exynos-ppmu DT binding
      009884f3
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · d590b2d4
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "A few driver fixes for tegra, rockchip, and st SoCs and a two-liner in
        the framework to avoid oops when get_parent ops return out of range
        values on tegra platforms"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        drivers: clk: st: Rename st_pll3200c32_407_c0_x into st_pll3200c32_cx_x
        clk: check for invalid parent index of orphans in __clk_init()
        clk: tegra: dfll: Properly protect OPP list
        clk: rockchip: add critical clock for rk3368
      d590b2d4
    • Linus Torvalds's avatar
      Merge tag 'led-fixes-for-v4.3-rc2' of... · e6827baf
      Linus Torvalds authored
      Merge tag 'led-fixes-for-v4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
      
      Pull LED fixes from Jacek Anaszewski:
       - fix module autoload for six OF platform drivers (aat1290, bcm6328,
         bcm6358, ktd2692, max77693, ns2)
       - aat1290: add missing static modifier
       - ipaq-micro: add missing LEDS_CLASS dependency
       - lp55xx: correct Kconfig dependecy for f/w user helper
      
      * tag 'led-fixes-for-v4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
        leds:lp55xx: Correct Kconfig dependency for f/w user helper
        leds: leds-ipaq-micro: Add LEDS_CLASS dependency
        leds: aat1290: add 'static' modifier to init_mm_current_scale
        leds: leds-ns2: Fix module autoload for OF platform driver
        leds: max77693: Fix module autoload for OF platform driver
        leds: ktd2692: Fix module autoload for OF platform driver
        leds: bcm6358: Fix module autoload for OF platform driver
        leds: bcm6328: Fix module autoload for OF platform driver
        leds: aat1290: Fix module autoload for OF platform driver
      e6827baf
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · dc847d5b
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "The new hfi1 driver in staging/rdma has had a number of fixup patches
        since being added to the tree.  This is the first batch of those fixes"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        IB/hfi: Properly set permissions for user device files
        IB/hfi1: mask vs shift confusion
        IB/hfi1: clean up some defines
        IB/hfi1: info leak in get_ctxt_info()
        IB/hfi1: fix a locking bug
        IB/hfi1: checking for NULL instead of IS_ERR
        IB/hfi1: fix sdma_descq_cnt parameter parsing
        IB/hfi1: fix copy_to/from_user() error handling
        IB/hfi1: fix pstateinfo from returning improperly byteswapped value
      dc847d5b
    • Linus Torvalds's avatar
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 2673ee56
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
      
       - a boot regression (since v4.2) fix for some ARM configurations from
         Tyler
      
       - regression (since v4.1) fixes for mkfs.xfs on a DAX enabled device
         from Jeff.  These are tagged for -stable.
      
       - a pair of locking fixes from Axel that are hidden from lockdep since
         they involve device_lock().  The "btt" one is tagged for -stable, the
         other only applies to the new "pfn" mechanism in v4.3.
      
       - a fix for the pmem ->rw_page() path to use wmb_pmem() from Ross.
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        mm: fix type cast in __pfn_to_phys()
        pmem: add proper fencing to pmem_rw_page()
        libnvdimm: pfn_devs: Fix locking in namespace_store
        libnvdimm: btt_devs: Fix locking in namespace_store
        blockdev: don't set S_DAX for misaligned partitions
        dax: fix O_DIRECT I/O to the last block of a blockdev
      2673ee56
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 133bb595
      Linus Torvalds authored
      Pull block updates from Jens Axboe:
       "This is a bit bigger than it should be, but I could (did) not want to
        send it off last week due to both wanting extra testing, and expecting
        a fix for the bounce regression as well.  In any case, this contains:
      
         - Fix for the blk-merge.c compilation warning on gcc 5.x from me.
      
         - A set of back/front SG gap merge fixes, from me and from Sagi.
           This ensures that we honor SG gapping for integrity payloads as
           well.
      
         - Two small fixes for null_blk from Matias, fixing a leak and a
           capacity propagation issue.
      
         - A blkcg fix from Tejun, fixing a NULL dereference.
      
         - A fast clone optimization from Ming, fixing a performance
           regression since the arbitrarily sized bio's were introduced.
      
         - Also from Ming, a regression fix for bouncing IOs"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: fix bounce_end_io
        block: blk-merge: fast-clone bio when splitting rw bios
        block: blkg_destroy_all() should clear q->root_blkg and ->root_rl.blkg
        block: Copy a user iovec if it includes gaps
        block: Refuse adding appending a gapped integrity page to a bio
        block: Refuse request/bio merges with gaps in the integrity payload
        block: Check for gaps on front and back merges
        null_blk: fix wrong capacity when bs is not 512 bytes
        null_blk: fix memory leak on cleanup
        block: fix bogus compiler warnings in blk-merge.c
      133bb595
    • Chris Mason's avatar
      fs-writeback: unplug before cond_resched in writeback_sb_inodes · 590dca3a
      Chris Mason authored
      Commit 505a666e ("writeback: plug writeback in wb_writeback() and
      writeback_inodes_wb()") has us holding a plug during writeback_sb_inodes,
      which increases the merge rate when relatively contiguous small files
      are written by the filesystem.  It helps both on flash and spindles.
      
      For an fs_mark workload creating 4K files in parallel across 8 drives,
      this commit improves performance ~9% more by unplugging before calling
      cond_resched().  cond_resched() doesn't trigger an implicit unplug, so
      explicitly getting the IO down to the device before scheduling reduces
      latencies for anyone waiting on clean pages.
      
      It also cuts down on how often we use kblockd to unplug, which means
      less work bouncing from one workqueue to another.
      
      Many more details about how we got here:
      
        https://lkml.org/lkml/2015/9/11/570Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      590dca3a
  4. 19 Sep, 2015 1 commit
  5. 18 Sep, 2015 22 commits