Commits · 8a56d7c305b9613dfe416fd1af06871ec34bb103 · nexedi / linux

12 Oct, 2015 13 commits

Merge branch 'xfs-io-fixes' into for-next · 8a56d7c3
Dave Chinner authored Oct 12, 2015

8a56d7c3
Merge branch 'xfs-logging-fixes' into for-next · 316433be
Dave Chinner authored Oct 12, 2015

316433be

xfs: simplify /proc teardown & error handling · 9e92054e

Eric Sandeen authored Oct 12, 2015

remove_proc_subtree() was added in 3.9, and can be
used to simplify our procfile creation error handling
and cleanup, removing the nested gotos.  It simply
removes fs/xfs and everything created under it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

9e92054e

xfs: per-filesystem stats counter implementation · ff6d6af2

Bill O'Donnell authored Oct 12, 2015

This patch modifies the stats counting macros and the callers
to those macros to properly increment, decrement, and add-to
the xfs stats counts. The counts for global and per-fs stats
are correctly advanced, and cleared by writing a "1" to the
corresponding clear file.

global counts: /sys/fs/xfs/stats/stats
per-fs counts: /sys/fs/xfs/sda*/stats/stats

global clear:  /sys/fs/xfs/stats/stats_clear
per-fs clear:  /sys/fs/xfs/sda*/stats/stats_clear

[dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
 stats structures and macros. ]
Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

ff6d6af2

xfs: per-filesystem stats in sysfs · 225e4635

Bill O'Donnell authored Oct 12, 2015

This patch implements per-filesystem stats objects in sysfs. It
depends on the application of the previous patch series that
develops the infrastructure to support both xfs global stats and
xfs per-fs stats in sysfs.

Stats objects are instantiated when an xfs filesystem is mounted
and deleted on unmount. With this patch, the stats directory is
created and populated with the familiar stats and stats_clear files.
Example:
        /sys/fs/xfs/sda9/stats/stats
        /sys/fs/xfs/sda9/stats/stats_clear

With this patch, the individual counts within the new per-fs
stats file(s) remain at zero. Functions that use the the macros
to increment, decrement, and add-to the per-fs stats counts will
be covered in a separate new patch to follow this one. Note that
the counts within the global stats file (/sys/fs/xfs/stats/stats)
advance normally and can be cleared as it was prior to this patch.

[dchinner: move setup/teardown to xfs_fs_{fill|put}_super() so
it is down before/after any path that uses the per-mount stats. ]
Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

225e4635

xfs: avoid null *src in memcpy call in xlog_write · 91f9f5fe

Eric Sandeen authored Oct 12, 2015

The gcc undefined behavior sanitizer caught this; surely
any sane memcpy implementation will no-op if size == 0,
but behavior with a *src of NULL is technically undefined
(declared nonnull), so avoid it here.

We are actually in this situation frequently via
xlog_commit_record(), because:

        struct xfs_log_iovec reg = {
                .i_addr = NULL,
                .i_len = 0,
                .i_type = XLOG_REG_TYPE_COMMIT,
        };
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

91f9f5fe

xfs: pass total block res. as total xfs_bmapi_write() parameter · dbd5c8c9

Brian Foster authored Oct 12, 2015

The total field from struct xfs_alloc_arg is a bit of an unknown
commodity. It is documented as the total block requirement for the
transaction and is used in this manner from most call sites by virtue of
passing the total block reservation of the transaction associated with
an allocation. Several xfs_bmapi_write() callers pass hardcoded values
of 0 or 1 for the total block requirement, which is a historical oddity
without any clear reasoning.

The xfs_iomap_write_direct() caller, for example, passes 0 for the total
block requirement. This has been determined to cause problems in the
form of ABBA deadlocks of AGF buffers due to incorrect AG selection in
the block allocator. Specifically, the xfs_alloc_space_available()
function incorrectly selects an AG that doesn't actually have sufficient
space for the allocation. This occurs because the args.total field is 0
and thus the remaining free space check on the AG doesn't actually
consider the size of the allocation request. This locks the AGF buffer,
the allocation attempt proceeds and ultimately fails (in
xfs_alloc_fix_minleft()), and xfs_alloc_vexent() moves on to the next
AG. In turn, this can lead to incorrect AG locking order (if the
allocator wraps around, attempting to lock AG 0 after acquiring AG N)
and thus deadlock if racing with another operation. This problem has
been reproduced via generic/299 on smallish (1GB) ramdisk test devices.

To avoid this problem, replace the undocumented hardcoded total
parameters from the iomap and utility callers to pass the block
reservation used for the associated transaction. This is consistent with
other xfs_bmapi_write() callers throughout XFS. The assumption is that
the total field allows the selection of an AG that can handle the entire
operation rather than simply the allocation/range being requested (e.g.,
resulting btree splits, etc.). This addresses the aforementioned
generic/299 hang by ensuring AG selection only occurs when the
allocation can be satisfied by the AG.
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

dbd5c8c9

xfs: add an xfs_zero_eof() tracepoint · 0a50f162

Brian Foster authored Oct 12, 2015

Add a tracepoint in xfs_zero_eof() to facilitate tracking and debugging
EOF zeroing events. This has proven useful in the context of other
direct I/O tracepoints to ensure EOF zeroing occurs within appropriate
file ranges.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

0a50f162

xfs: always drain dio before extending aio write submission · 3136e8bb

Brian Foster authored Oct 12, 2015

XFS supports and typically allows concurrent asynchronous direct I/O
submission to a single file. One exception to the rule is that file
extending dio writes that start beyond the current EOF (e.g.,
potentially create a hole at EOF) require exclusive I/O access to the
file. This is because such writes must zero any pre-existing blocks
beyond EOF that are exposed by virtue of now residing within EOF as a
result of the write about to be submitted.

Before EOF zeroing can occur, the current file i_size must be stabilized
to avoid data corruption. In this scenario, XFS upgrades the iolock to
exclude any further I/O submission, waits on in-flight I/O to complete
to ensure i_size is up to date (i_size is updated on dio write
completion) and restarts the various checks against the state of the
file. The problem is that this protection sequence is triggered only
when the iolock is currently held shared. While this is true for async
dio in most cases, the caller may upgrade the lock in advance based on
arbitrary circumstances with respect to EOF zeroing. For example, the
iolock is always acquired exclusively if the start offset is not block
aligned. This means that even though the iolock is already held
exclusive for such I/Os, pending I/O is not drained and thus EOF zeroing
can occur based on an unstable i_size.

This problem has been reproduced as guest data corruption in virtual
machines with file-backed qcow2 virtual disks hosted on an XFS
filesystem. The virtual disks must be configured with aio=native mode
and the must not be truncated out to the maximum file size (as some virt
managers will do).

Update xfs_file_aio_write_checks() to unconditionally drain in-flight
dio before EOF zeroing can occur. Rather than trigger the wait based on
iolock state, use a new flag and upgrade the iolock when necessary. Note
that this results in a full restart of the inode checks even when the
iolock was already held exclusive when technically it is only required
to recheck i_size. This should be a rare enough occurrence that it is
preferable to keep the code simple rather than create an alternate
restart jump target.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

3136e8bb

xfs: validate metadata LSNs against log on v5 superblocks · a45086e2

Brian Foster authored Oct 12, 2015

Since the onset of v5 superblocks, the LSN of the last modification has
been included in a variety of on-disk data structures. This LSN is used
to provide log recovery ordering guarantees (e.g., to ensure an older
log recovery item is not replayed over a newer target data structure).

While this works correctly from the point a filesystem is formatted and
mounted, userspace tools have some problematic behaviors that defeat
this mechanism. For example, xfs_repair historically zeroes out the log
unconditionally (regardless of whether corruption is detected). If this
occurs, the LSN of the filesystem is reset and the log is now in a
problematic state with respect to on-disk metadata structures that might
have a larger LSN. Until either the log catches up to the highest
previously used metadata LSN or each affected data structure is modified
and written out without incident (which resets the metadata LSN), log
recovery is susceptible to filesystem corruption.

This problem is ultimately addressed and repaired in the associated
userspace tools. The kernel is still responsible to detect the problem
and notify the user that something is wrong. Check the superblock LSN at
mount time and fail the mount if it is invalid. From that point on,
trigger verifier failure on any metadata I/O where an invalid LSN is
detected. This results in a filesystem shutdown and guarantees that we
do not log metadata changes with invalid LSNs on disk. Since this is a
known issue with a known recovery path, present a warning to instruct
the user how to recover.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

a45086e2

xfs: log local to remote symlink conversions correctly on v5 supers · b7cdc66b

Brian Foster authored Oct 12, 2015

A local format symlink inode is converted to extent format when an
extended attribute is set on an inode as part of the attribute fork
creation. This means a block is allocated, the local symlink target name
is copied to the block and the block is logged. Currently,
xfs_bmap_local_to_extents() handles logging the remote block data based
on the size of the data fork prior to the conversion. This is not
correct on v5 superblock filesystems, which add an additional header to
remote symlink blocks that is nonexistent in local format inodes.

As a result, the full length of the remote symlink block content is not
logged. This can lead to corruption should a crash occur and log
recovery replay this transaction.

Since a callout is already used to initialize the new remote symlink
block, update the local-to-extents conversion mechanism to make the
callout also responsible for logging the block. It is already required
to set the log buffer type and format the block appropriately based on
the superblock version. This ensures the remote symlink is always logged
correctly. Note that xfs_bmap_local_to_extents() is only called for
symlinks so there are no other callouts that require modification.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

b7cdc66b

xfs: add missing ilock around dio write last extent alignment · 009c6e87

Brian Foster authored Oct 12, 2015

The iomap codepath (via get_blocks()) acquires and release the inode
lock in the case of a direct write that requires block allocation. This
is because xfs_iomap_write_direct() allocates a transaction, which means
the ilock must be dropped and reacquired after the transaction is
allocated and reserved.

xfs_iomap_write_direct() invokes xfs_iomap_eof_align_last_fsb() before
the transaction is created and thus before the ilock is reacquired. This
can lead to calls to xfs_iread_extents() and reads of the in-core extent
list without any synchronization (via xfs_bmap_eof() and
xfs_bmap_last_extent()). xfs_iread_extents() assert fails if the ilock
is not held, but this is not currently seen in practice as the current
callers had already invoked xfs_bmapi_read().

What has been seen in practice are reports of crashes down in the
xfs_bmap_eof() codepath on direct writes due to seemingly bogus pointer
references from xfs_iext_get_ext(). While an explicit reproducer is not
currently available to confirm the cause of the problem, crash analysis
and code inspection from David Jeffrey had identified the insufficient
locking.

xfs_iomap_eof_align_last_fsb() is called from other contexts with the
inode lock already held, so we cannot acquire it therein.
__xfs_get_blocks() acquires and drops the ilock with variable flags to
cover the event that the extent list must be read in. The common case is
that __xfs_get_blocks() acquires the shared ilock. To provide locking
around the last extent alignment call without adding more lock cycles to
the dio path, update xfs_iomap_write_direct() to expect the shared ilock
held on entry and do the extent alignment under its protection. Demote
the lock, if necessary, from __xfs_get_blocks() and push the
xfs_qm_dqattach() call outside of the shared lock critical section.
Also, add an assert to document that the extent list is always expected
to be present in this path. Otherwise, we risk a call to
xfs_iread_extents() while under the shared ilock. This is safe as all
current callers have executed an xfs_bmapi_read() call under the current
iolock context.
Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

009c6e87

cancel the setfilesize transation when io error happen · 5cb13dcd

Zhaohongjiang authored Oct 12, 2015

When I ran xfstest/073 case, the remount process was blocked to wait
transactions to be zero. I found there was a io error happened, and
the setfilesize transaction was not released properly. We should add
the changes to cancel the io error in this case.

Reproduction steps:
1. dd if=/dev/zero of=xfs1.img bs=1M count=2048
2. mkfs.xfs xfs1.img
3. losetup -f ./xfs1.img /dev/loop0
4. mount -t xfs /dev/loop0 /home/test_dir/
5. mkdir /home/test_dir/test
6. mkfs.xfs -dfile,name=image,size=2g
7. mount -t xfs -o loop image /home/test_dir/test
8. cp a file bigger than 2g to /home/test_dir/test
9. mount -t xfs -o remount,ro /home/test_dir/test

[ dchinner: moved io error detection to xfs_setfilesize_ioend() after
  transaction context restoration. ]
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

5cb13dcd

11 Oct, 2015 5 commits

xfs: pass xfsstats structures to handlers and macros · 80529c45

Bill O'Donnell authored Oct 12, 2015

This patch is the next step toward per-fs xfs stats. The patch makes
the show and clear routines able to handle any stats structure
associated with a kobject.

Instead of a single global xfsstats structure, add kobject and a pointer
to a per-cpu struct xfsstats. Modify the macros that manipulate the stats
accordingly: XFS_STATS_INC, XFS_STATS_DEC, and XFS_STATS_ADD now access
xfsstats->xs_stats.

The sysfs functions need to get from the kobject back to the xfsstats
structure which contains it, and pass the pointer to the ->xs_stats
percpu structure into the show & clear routines.
Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

80529c45

xfs: consolidate sysfs ops · a27c2640

Bill O'Donnell authored Oct 12, 2015

As a part of the series to move xfs global stats from procfs to sysfs,
this patch consolidates the sysfs ops functions and removes redundancy.
Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

a27c2640

xfs: remove unused procfs code · 50cf5b74

Bill O'Donnell authored Oct 12, 2015

As a part of the work to move xfs global stats from procfs to sysfs,
this patch removes the now unused procfs code that was xfs stat specific.
Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

50cf5b74

xfs: create symlink proc/fs/xfs/stat to sys/fs/xfs/stats · 32f0ea05

Bill O'Donnell authored Oct 12, 2015

As a part of the work to move xfs global stats from procfs to sysfs,
this patch creates the symlink from proc/fs/xfs/stat to sys/fs/xfs/stats.
Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

32f0ea05

xfs: create global stats and stats_clear in sysfs · bb230c12

Bill O'Donnell authored Oct 12, 2015

Currently, xfs global stats are in procfs. This patch introduces
(replicates) the global stats in sysfs. Additionally a stats_clear file
is introduced in sysfs.
Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

bb230c12

20 Sep, 2015 10 commits

Linux 4.3-rc2 · 1f93e4a9
Linus Torvalds authored Sep 20, 2015

1f93e4a9

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm · 99bc7215

Linus Torvalds authored Sep 19, 2015

Pull ARM fixes from Russell King:
 "Three fixes and a resulting cleanup for -rc2:

   - Andre Przywara reported that he was seeing a warning with the new
     cast inside DMA_ERROR_CODE's definition, and fixed the incorrect
     use.

   - Doug Anderson noticed that kgdb causes a "scheduling while atomic"
     bug.

   - OMAP5 folk noticed that their Thumb-2 compiled X servers crashed
     when enabling support to cover ARMv6 CPUs due to a kernel bug
     leaking some conditional context into the signal handler"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8425/1: kgdb: Don't try to stop the machine when setting breakpoints
  ARM: 8437/1: dma-mapping: fix build warning with new DMA_ERROR_CODE definition
  ARM: get rid of needless #if in signal handling code
  ARM: fix Thumb2 signal handling when ARMv6 is enabled

99bc7215

Merge tag 'linux-kselftest-4.3-rc2' of... · 30ec5682

Linus Torvalds authored Sep 19, 2015

Merge tag 'linux-kselftest-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "This update contains 7 fixes for problems ranging from build failurs
  to incorrect error reporting"

* tag 'linux-kselftest-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: exec: revert to default emit rule
  selftests: change install command to rsync
  selftests: mqueue: simplify the Makefile
  selftests: mqueue: allow extra cflags
  selftests: rename jump label to static_keys
  selftests/seccomp: add support for s390
  seltests/zram: fix syntax error

30ec5682

Merge tag 'pm+acpi-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 009884f3

Linus Torvalds authored Sep 19, 2015

Pull power management and ACPI updates from Rafael Wysocki:
 "Included are: a somewhat late devfreq update which however is mostly
  fixes and cleanups with one new thing only (the PPMUv2 support on
  Exynos5433), an ACPI cpufreq driver fixup and two ACPI core cleanups
  related to preprocessor directives.

  Specifics:

   - Fix a memory allocation size in the devfreq core (Xiaolong Ye).

   - Fix a mistake in the exynos-ppmu DT binding (Javier Martinez
     Canillas).

   - Add support for PPMUv2 ((Platform Performance Monitoring Unit
     version 2.0) on the Exynos5433 SoCs (Chanwoo Choi).

   - Fix a type casting bug in the Exynos PPMU code (MyungJoo Ham).

   - Assorted devfreq code cleanups and optimizations (Javi Merino,
     MyungJoo Ham, Viresh Kumar).

   - Fix up the ACPI cpufreq driver to use a more lightweight way to get
     to its private data in the ->get() callback (Rafael J Wysocki).

   - Fix a CONFIG_ prefix bug in one of the ACPI drivers and make the
     ACPI subsystem use IS_ENABLED() instead of #ifdefs in function
     bodies (Sudeep Holla)"

* tag 'pm+acpi-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: acpi-cpufreq: Use cpufreq_cpu_get_raw() in ->get()
  ACPI: Eliminate CONFIG_.*{, _MODULE} #ifdef in favor of IS_ENABLED()
  ACPI: int340x_thermal: add missing CONFIG_ prefix
  PM / devfreq: Fix incorrect type issue.
  PM / devfreq: tegra: Update governor to use devfreq_update_stats()
  PM / devfreq: comments for get_dev_status usage updated
  PM / devfreq: drop comment about thermal setting max_freq
  PM / devfreq: cache the last call to get_dev_status()
  PM / devfreq: Drop unlikely before IS_ERR(_OR_NULL)
  PM / devfreq: exynos-ppmu: bit-wise operation bugfix.
  PM / devfreq: exynos-ppmu: Update documentation to support PPMUv2
  PM / devfreq: exynos-ppmu: Add the support of PPMUv2 for Exynos5433
  PM / devfreq: event: Remove incorrect property in exynos-ppmu DT binding

009884f3

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · d590b2d4

Linus Torvalds authored Sep 19, 2015

Pull clk fixes from Stephen Boyd:
 "A few driver fixes for tegra, rockchip, and st SoCs and a two-liner in
  the framework to avoid oops when get_parent ops return out of range
  values on tegra platforms"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  drivers: clk: st: Rename st_pll3200c32_407_c0_x into st_pll3200c32_cx_x
  clk: check for invalid parent index of orphans in __clk_init()
  clk: tegra: dfll: Properly protect OPP list
  clk: rockchip: add critical clock for rk3368

d590b2d4

Merge tag 'led-fixes-for-v4.3-rc2' of... · e6827baf

Linus Torvalds authored Sep 19, 2015

Merge tag 'led-fixes-for-v4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds

Pull LED fixes from Jacek Anaszewski:
 - fix module autoload for six OF platform drivers (aat1290, bcm6328,
   bcm6358, ktd2692, max77693, ns2)
 - aat1290: add missing static modifier
 - ipaq-micro: add missing LEDS_CLASS dependency
 - lp55xx: correct Kconfig dependecy for f/w user helper

* tag 'led-fixes-for-v4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  leds:lp55xx: Correct Kconfig dependency for f/w user helper
  leds: leds-ipaq-micro: Add LEDS_CLASS dependency
  leds: aat1290: add 'static' modifier to init_mm_current_scale
  leds: leds-ns2: Fix module autoload for OF platform driver
  leds: max77693: Fix module autoload for OF platform driver
  leds: ktd2692: Fix module autoload for OF platform driver
  leds: bcm6358: Fix module autoload for OF platform driver
  leds: bcm6328: Fix module autoload for OF platform driver
  leds: aat1290: Fix module autoload for OF platform driver

e6827baf

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · dc847d5b

Linus Torvalds authored Sep 19, 2015

Pull rdma fixes from Doug Ledford:
 "The new hfi1 driver in staging/rdma has had a number of fixup patches
  since being added to the tree.  This is the first batch of those fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/hfi: Properly set permissions for user device files
  IB/hfi1: mask vs shift confusion
  IB/hfi1: clean up some defines
  IB/hfi1: info leak in get_ctxt_info()
  IB/hfi1: fix a locking bug
  IB/hfi1: checking for NULL instead of IS_ERR
  IB/hfi1: fix sdma_descq_cnt parameter parsing
  IB/hfi1: fix copy_to/from_user() error handling
  IB/hfi1: fix pstateinfo from returning improperly byteswapped value

dc847d5b

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 2673ee56

Linus Torvalds authored Sep 19, 2015

Pull libnvdimm fixes from Dan Williams:

 - a boot regression (since v4.2) fix for some ARM configurations from
   Tyler

 - regression (since v4.1) fixes for mkfs.xfs on a DAX enabled device
   from Jeff.  These are tagged for -stable.

 - a pair of locking fixes from Axel that are hidden from lockdep since
   they involve device_lock().  The "btt" one is tagged for -stable, the
   other only applies to the new "pfn" mechanism in v4.3.

 - a fix for the pmem ->rw_page() path to use wmb_pmem() from Ross.

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  mm: fix type cast in __pfn_to_phys()
  pmem: add proper fencing to pmem_rw_page()
  libnvdimm: pfn_devs: Fix locking in namespace_store
  libnvdimm: btt_devs: Fix locking in namespace_store
  blockdev: don't set S_DAX for misaligned partitions
  dax: fix O_DIRECT I/O to the last block of a blockdev

2673ee56

Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 133bb595

Linus Torvalds authored Sep 19, 2015

Pull block updates from Jens Axboe:
 "This is a bit bigger than it should be, but I could (did) not want to
  send it off last week due to both wanting extra testing, and expecting
  a fix for the bounce regression as well.  In any case, this contains:

   - Fix for the blk-merge.c compilation warning on gcc 5.x from me.

   - A set of back/front SG gap merge fixes, from me and from Sagi.
     This ensures that we honor SG gapping for integrity payloads as
     well.

   - Two small fixes for null_blk from Matias, fixing a leak and a
     capacity propagation issue.

   - A blkcg fix from Tejun, fixing a NULL dereference.

   - A fast clone optimization from Ming, fixing a performance
     regression since the arbitrarily sized bio's were introduced.

   - Also from Ming, a regression fix for bouncing IOs"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix bounce_end_io
  block: blk-merge: fast-clone bio when splitting rw bios
  block: blkg_destroy_all() should clear q->root_blkg and ->root_rl.blkg
  block: Copy a user iovec if it includes gaps
  block: Refuse adding appending a gapped integrity page to a bio
  block: Refuse request/bio merges with gaps in the integrity payload
  block: Check for gaps on front and back merges
  null_blk: fix wrong capacity when bs is not 512 bytes
  null_blk: fix memory leak on cleanup
  block: fix bogus compiler warnings in blk-merge.c

133bb595

fs-writeback: unplug before cond_resched in writeback_sb_inodes · 590dca3a

Chris Mason authored Sep 18, 2015

Commit 505a666e ("writeback: plug writeback in wb_writeback() and
writeback_inodes_wb()") has us holding a plug during writeback_sb_inodes,
which increases the merge rate when relatively contiguous small files
are written by the filesystem. It helps both on flash and spindles.

For an fs_mark workload creating 4K files in parallel across 8 drives,
this commit improves performance ~9% more by unplugging before calling
cond_resched(). cond_resched() doesn't trigger an implicit unplug, so
explicitly getting the IO down to the device before scheduling reduces
latencies for anyone waiting on clean pages.

It also cuts down on how often we use kblockd to unplug, which means
less work bouncing from one workqueue to another.

Many more details about how we got here:

https://lkml.org/lkml/2015/9/11/570Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

590dca3a

19 Sep, 2015 1 commit

mm: fix type cast in __pfn_to_phys() · ae4f9769

Tyler Baker authored Sep 19, 2015

The various definitions of __pfn_to_phys() have been consolidated to
use a generic macro in include/asm-generic/memory_model.h. This hit
mainline in the form of 012dcef3 "mm: move __phys_to_pfn and
__pfn_to_phys to asm/generic/memory_model.h". When the generic macro
was implemented the type cast to phys_addr_t was dropped which caused
boot regressions on ARM platforms with more than 4GB of memory and
LPAE enabled.

It was suggested to use PFN_PHYS() defined in include/linux/pfn.h
as provides the correct logic and avoids further duplication.
Reported-by: kernelci.org bot <bot@kernelci.org>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tyler Baker <tyler.baker@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ae4f9769

18 Sep, 2015 11 commits

Merge branch 'acpi-bus' · 0f40314b

Rafael J. Wysocki authored Sep 18, 2015

* acpi-bus:
  ACPI: Eliminate CONFIG_.*{, _MODULE} #ifdef in favor of IS_ENABLED()
  ACPI: int340x_thermal: add missing CONFIG_ prefix

0f40314b

Merge branches 'pm-cpufreq' and 'pm-devfreq' · 7dc1d36e

Rafael J. Wysocki authored Sep 18, 2015

* pm-cpufreq:
  cpufreq: acpi-cpufreq: Use cpufreq_cpu_get_raw() in ->get()

* pm-devfreq:
  PM / devfreq: Fix incorrect type issue.
  PM / devfreq: tegra: Update governor to use devfreq_update_stats()
  PM / devfreq: comments for get_dev_status usage updated
  PM / devfreq: drop comment about thermal setting max_freq
  PM / devfreq: cache the last call to get_dev_status()
  PM / devfreq: Drop unlikely before IS_ERR(_OR_NULL)
  PM / devfreq: exynos-ppmu: bit-wise operation bugfix.
  PM / devfreq: exynos-ppmu: Update documentation to support PPMUv2
  PM / devfreq: exynos-ppmu: Add the support of PPMUv2 for Exynos5433
  PM / devfreq: event: Remove incorrect property in exynos-ppmu DT binding

7dc1d36e

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 00ade1f5

Linus Torvalds authored Sep 18, 2015

Pull virtio fixes and cleanups from Michael Tsirkin:
 "This fixes the virtio-test tool, and improves the error handling for
  virtio-ccw"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio/s390: handle failures of READ_VQ_CONF ccw
  tools/virtio: propagate V=X to kernel build
  vhost: move features to core
  tools/virtio: fix build after 4.2 changes

00ade1f5

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 3ae83945

Linus Torvalds authored Sep 18, 2015

Pull KVM fixes from Paolo Bonzini:
 "Mostly stable material, a lot of ARM fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  sched: access local runqueue directly in single_task_running
  arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'
  arm64: KVM: Remove all traces of the ThumbEE registers
  arm: KVM: Disable virtual timer even if the guest is not using it
  arm64: KVM: Disable virtual timer even if the guest is not using it
  arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources
  KVM: s390: Replace incorrect atomic_or with atomic_andnot
  arm: KVM: Fix incorrect device to IPA mapping
  arm64: KVM: Fix user access for debug registers
  KVM: vmx: fix VPID is 0000H in non-root operation
  KVM: add halt_attempted_poll to VCPU stats
  kvm: fix zero length mmio searching
  kvm: fix double free for fast mmio eventfd
  kvm: factor out core eventfd assign/deassign logic
  kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd
  KVM: make the declaration of functions within 80 characters
  KVM: arm64: add workaround for Cortex-A57 erratum #852523
  KVM: fix polling for guest halt continued even if disable it
  arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores
  arm64: KVM: set {v,}TCR_EL2 RES1 bits
  ...

3ae83945

IB/hfi: Properly set permissions for user device files · e116a64f

Ira Weiny authored Sep 17, 2015

Some of the device files are required to be user accessible for PSM while
most should remain accessible only by root.

Add a parameter to hfi1_cdev_init which controls if the user should have access
to this device which places it in a different class with the appropriate
devnode callback.

In addition set the devnode call back for the existing class to be a bit more
explicit for those permissions.

Finally remove the unnecessary null check before class_destroy
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Haralanov, Mitko (mitko.haralanov@intel.com)
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e116a64f

IB/hfi1: mask vs shift confusion · 7d630467

Dan Carpenter authored Sep 16, 2015

We are shifting by the _MASK macros instead of the _SHIFT ones.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

7d630467

IB/hfi1: clean up some defines · 3f2686a2

Dan Carpenter authored Sep 16, 2015

I added spaces around operators so it matches kernel style because
normally "-1ULL" is a number and " - 1" is a subtract operation.  Also
removed some superflous "ULL" types so "1ULL" becomes "1".
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3f2686a2

IB/hfi1: info leak in get_ctxt_info() · ebe6b2e8

Dan Carpenter authored Sep 16, 2015

The cinfo struct has a hole after the last struct member so we need to
zero it out.  Otherwise we disclose some uninitialized stack data.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ebe6b2e8

IB/hfi1: fix a locking bug · 951842b0

Dan Carpenter authored Sep 16, 2015

mutex_trylock() returns zero on failure, not EBUSY.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

951842b0

IB/hfi1: checking for NULL instead of IS_ERR · 50b19729

Dan Carpenter authored Sep 16, 2015

__get_txreq() returns an ERR_PTR() but this checks for NULL so it would
oops on failure.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

50b19729

IB/hfi1: fix sdma_descq_cnt parameter parsing · aeef010a

Mike Marciniszyn authored Sep 15, 2015

The boolean tests should have been or-ed.
Reported-by: David Binderman <dcb314@hotmail.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

aeef010a