  1. 30 Oct, 2015 1 commit
    • md/raid5: fix locking in handle_stripe_clean_event() · b8a9d66d
      Roman Gushchin authored
      After commit 566c09c5 ("raid5: relieve lock contention in get_active_stripe()")
      __find_stripe() is called under conf->hash_locks + hash.
      But handle_stripe_clean_event() calls remove_hash() under
      conf->device_lock.
      
      Under some circumstances the hash chain can become circular,
      and we get an infinite loop with interrupts disabled and the hash
      lock held in __find_stripe(). This leads to a hard lockup on multiple CPUs
      and a subsequent system crash.
      
      I was able to reproduce this behavior on raid6 over 6 ssd disks.
      The devices_handle_discard_safely option should be set to enable trim
      support. The following script was used:
      
      for i in `seq 1 32`; do
          dd if=/dev/zero of=large$i bs=10M count=100 &
      done
      
      neilb: original was against a 3.x kernel.  I forward-ported
        to 4.3-rc.  This version is suitable for any kernel since
        Commit: 59fc630b ("RAID5: batch adjacent full stripe write")
        (v4.1+).  I'll post a version for earlier kernels to stable.
      Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
      Fixes: 566c09c5 ("raid5: relieve lock contention in get_active_stripe()")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <stable@vger.kernel.org> # 3.13 - 4.2
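A user-space sketch of the locking rule this fix restores: a hash chain must be modified under the same lock that lookups on that chain hold. This is not the kernel code. `StripeCache` and its methods are hypothetical names, `threading.Lock` stands in for the kernel spinlocks, and `NR_HASH_LOCKS = 8` is an illustrative value.

```python
import threading

NR_HASH_LOCKS = 8  # illustrative value, not taken from the kernel source


class StripeCache:
    """Toy stripe cache with one lock per hash chain (hypothetical model)."""

    def __init__(self):
        self.hash_locks = [threading.Lock() for _ in range(NR_HASH_LOCKS)]
        self.chains = [[] for _ in range(NR_HASH_LOCKS)]

    def _bucket(self, sector):
        return sector % NR_HASH_LOCKS

    def insert_stripe(self, sector):
        b = self._bucket(sector)
        with self.hash_locks[b]:
            self.chains[b].append(sector)

    def find_stripe(self, sector):
        # Lookups walk the chain under the per-chain lock.
        b = self._bucket(sector)
        with self.hash_locks[b]:
            return sector in self.chains[b]

    def remove_stripe(self, sector):
        # Removal takes the same per-chain lock as the lookup path.
        # Holding some other lock here would let a concurrent walker
        # observe the chain in the middle of an update.
        b = self._bucket(sector)
        with self.hash_locks[b]:
            self.chains[b].remove(sector)
```

The point of the sketch is only that `find_stripe` and `remove_stripe` agree on which lock protects a chain; the commit message describes the original bug as those two paths using different locks.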
  2. 24 Oct, 2015 3 commits
    • md/raid10: fix the 'new' raid10 layout to work correctly. · 8bce6d35
      NeilBrown authored
      In Linux 3.9 we introduced a new 'far' layout for RAID10 which was
      supposed to rotate the replicas differently and so provide better
      resilience.  In particular it could survive more combinations of 2
      drive failures.
      
      Unfortunately, due to a coding error, this sometimes did what was wanted,
      sometimes improved less than we hoped, and sometimes - in very
      unlikely circumstances - put multiple replicas on the same device so
      the redundancy was harmed.
      
      No public user-space tool has created arrays using this layout so it
      is very unlikely that zero-redundancy arrays actually exist.  Probably
      no arrays using any form of the new layout exist.  But we cannot be
      certain.
      
      So use another bit in the 'layout' number and introduce a bug-fixed
      version of the layout.
      Also when assembling an array, if it has a zero-redundancy layout,
      give a warning.
      Reported-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md/raid10: don't clear bitmap bit when bad-block-list write fails. · c340702c
      NeilBrown authored
      When a write fails and a bad-block-list is present, we can
      update the bad-block-list instead of writing the data.  If
      this succeeds then it is OK to clear the relevant bitmap-bit as
      no further 'sync' of the block is needed.
      
      However if writing the bad-block-list fails then we need to
      treat the write as failed and particularly must not clear
      the bitmap bit.  Otherwise the device can be re-added (after
      any hardware connection issues are resolved) and because the
      relevant bit in the bitmap is clear, that block will not be
      resynced.  This leads to data corruption.
      
      We already delay the final bio_endio() on the write until
      the bad-block-list is written so that when the write
      returns: either that data is safe, the bad-block record is
      safe, or the fact that the device is faulty is safe.
      However we *don't* delay the clearing of the bitmap, so the
      bitmap bit can be recorded as cleared before we know if the
      bad-block-list was written safely.
      
      So: delay that until the write really is safe.
      i.e. move the call to close_write() until just before
      calling bio_endio(), and recheck the 'is array degraded'
      status before making that call.
      
      This bug goes back to v3.1 when bad-block-lists were
      introduced, though it only affects arrays created with
      mdadm-3.3 or later as only those have bad-block lists.
      
      Backports will require at least
      Commit: 95af587e ("md/raid10: ensure device failure recorded before write request returns.")
      as well.  I'll send that to 'stable' separately.
      
      Note that of the two tests of R10BIO_WriteError that this
      patch adds, the first is certain to fail and the second is
      certain to succeed.  However doing it this way makes the
      patch more obviously correct.  I will tidy the code up in a
      future merge window.
      Reported-by: Nate Dailey <nate.dailey@stratus.com>
      Fixes: bd870a16 ("md/raid10:  Handle write errors by updating badblock log.")
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md/raid1: don't clear bitmap bit when bad-block-list write fails. · bd8688a1
      NeilBrown authored
      When a write fails and a bad-block-list is present, we can
      update the bad-block-list instead of writing the data.  If
      this succeeds then it is OK to clear the relevant bitmap-bit as
      no further 'sync' of the block is needed.
      
      However if writing the bad-block-list fails then we need to
      treat the write as failed and particularly must not clear
      the bitmap bit.  Otherwise the device can be re-added (after
      any hardware connection issues are resolved) and because the
      relevant bit in the bitmap is clear, that block will not be
      resynced.  This leads to data corruption.
      
      We already delay the final bio_endio() on the write until
      the bad-block-list is written so that when the write
      returns: either that data is safe, the bad-block record is
      safe, or the fact that the device is faulty is safe.
      However we *don't* delay the clearing of the bitmap, so the
      bitmap bit can be recorded as cleared before we know if the
      bad-block-list was written safely.
      
      So: delay that until the write really is safe.
      i.e. move the call to close_write() until just before
      calling bio_endio(), and recheck the 'is array degraded'
      status before making that call.
      
      This bug goes back to v3.1 when bad-block-lists were
      introduced, though it only affects arrays created with
      mdadm-3.3 or later as only those have bad-block lists.
      
      Backports will require at least
      Commit: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.")
      as well.  I'll send that to 'stable' separately.
      
      Note that of the two tests of R1BIO_WriteError that this
      patch adds, the first is certain to fail and the second is
      certain to succeed.  However doing it this way makes the
      patch more obviously correct.  I will tidy the code up in a
      future merge window.
      Reported-and-tested-by: Nate Dailey <nate.dailey@stratus.com>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Fixes: cd5ff9a1 ("md/raid1:  Handle write errors by updating badblock log.")
      Signed-off-by: NeilBrown <neilb@suse.com>
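The ordering that this commit and the matching raid10 commit describe can be sketched as a small decision function: the bitmap bit is cleared only after the outcome of the bad-block-list write is known. This is a simplified model, not the md code. The function names are hypothetical, `write_badblock_list` is an assumed callable that returns True on success, and the "is array degraded" recheck is left out.

```python
def may_clear_bitmap_bit(data_write_ok, badblock_recorded):
    """Return True only when no later resync of the block is needed."""
    if data_write_ok:
        return True
    # The data write failed: the bit may be cleared only if the
    # bad-block record itself was written successfully.
    return badblock_recorded


def complete_write(data_write_ok, write_badblock_list):
    """Return the completion steps in the order they would run.

    write_badblock_list is a hypothetical callable returning True on success.
    """
    badblock_recorded = False
    if not data_write_ok:
        badblock_recorded = write_badblock_list()

    steps = []
    # Decide about the bitmap only now, after the bad-block-list
    # write has either succeeded or failed.
    if may_clear_bitmap_bit(data_write_ok, badblock_recorded):
        steps.append("clear_bitmap_bit")
    steps.append("end_io")
    return steps
```

With a failed data write and a failed bad-block-list write, `complete_write(False, lambda: False)` leaves the bit set, so the block would still be resynced after a re-add.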
  3. 20 Oct, 2015 2 commits
  4. 18 Oct, 2015 3 commits
    • Linux 4.3-rc6 · 7379047d
      Linus Torvalds authored
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · c44b3255
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Here are some bugfixes for the I2C subsystem.
      
        Kieran found a flaw in the recently renewed wake irq handling.  Mika
        handled a user bug report where the ACPI info turned out to be
        unusable.  I updated MAINTAINERS so that such bug reports will sooner
        get to the right people.  Geert pointed me to a problem of some i2c
        drivers regarding PM which I fixed"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: designware: Do not use parameters from ACPI on Dell Inspiron 7348
        MAINTAINERS: add maintainers for Synopsis Designware I2C drivers
        i2c: designware-platdrv: enable RuntimePM before registering to the core
        i2c: s3c2410: enable RuntimePM before registering to the core
        i2c: rcar: enable RuntimePM before registering to the core
        i2c: return probe deferred status on dev_pm_domain_attach
    • i2c: designware: Do not use parameters from ACPI on Dell Inspiron 7348 · 56d4b8a2
      Mika Westerberg authored
      ACPI SSCN/FMCN methods were originally added because then the platform can
      provide the most accurate HCNT/LCNT values to the driver. However, this
      seems not to be true for Dell Inspiron 7348 where using these causes the
      touchpad to fail in boot:
      
        i2c_hid i2c-DLL0675:00: failed to retrieve report from device.
        i2c_designware INT3433:00: i2c_dw_handle_tx_abort: lost arbitration
        i2c_hid i2c-DLL0675:00: failed to retrieve report from device.
        i2c_designware INT3433:00: controller timed out
      
      The values received from ACPI are (in fast mode):
      
        HCNT: 72
        LCNT: 160
      
      this translates to following timings (input clock is 100MHz on Broadwell):
      
        tHIGH: 720 ns (spec min 600 ns)
        tLOW: 1600 ns (spec min 1300 ns)
        Bus period: 2920 ns (assuming 300 ns tf and tr)
        Bus speed: 342.5 kHz
      
      Both tHIGH and tLOW are within the I2C specification.
      
      The calculated values when ACPI parameters are not used are (in fast mode):
      
        HCNT: 87
        LCNT: 159
      
      which translates to:
      
        tHIGH: 870 ns (spec min 600 ns)
        tLOW: 1590 ns (spec min 1300 ns)
        Bus period 3060 ns (assuming 300 ns tf and tr)
        Bus speed 326.8 kHz
      
      These values are also within the I2C specification.
      
      Since both ACPI and calculated values meet the I2C specification timing
      requirements it is hard to say why the touchpad does not function properly
      with the ACPI values except that the bus speed is higher in this case (but
      still well below the max 400kHz).
      
      Solve this by adding a DMI quirk to the driver that disables using ACPI
      parameters on this particular machine.
      Reported-by: Pavel Roskin <plroskin@gmail.com>
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Tested-by: Pavel Roskin <plroskin@gmail.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Cc: stable@kernel.org
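The timing figures quoted in this commit message can be reproduced with a few lines of arithmetic. The sketch below follows the message's own simplification (count times input clock period, plus 300 ns each for rise and fall time). `bus_timing` is a hypothetical helper, not a driver function.

```python
def bus_timing(hcnt, lcnt, clk_hz=100_000_000, t_rise_ns=300, t_fall_ns=300):
    """Convert HCNT/LCNT register counts into I2C bus timings.

    Returns (tHIGH in ns, tLOW in ns, bus period in ns, bus speed in kHz).
    """
    ns_per_tick = 1e9 / clk_hz           # 10 ns at 100 MHz
    t_high = hcnt * ns_per_tick
    t_low = lcnt * ns_per_tick
    period = t_high + t_low + t_rise_ns + t_fall_ns
    speed_khz = 1e6 / period             # 1e6 ns per ms gives kHz
    return t_high, t_low, period, speed_khz


# Values from ACPI and values calculated by the driver, as quoted above.
acpi = bus_timing(72, 160)
calculated = bus_timing(87, 159)
print(acpi, calculated)
```

Both results stay under the 400 kHz fast-mode limit, which matches the message's observation that the specification alone does not explain the touchpad failure.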
  5. 17 Oct, 2015 3 commits
  6. 16 Oct, 2015 22 commits
    • Merge tag 'dm-4.3-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 045ce743
      Linus Torvalds authored
      Pull device mapper fixes from Mike Snitzer:
       "Two DM target error path cleanup fixes (one for stable in DM thinp and
        one for a v4.3-rc5 thinko in DM snapshot)"
      
      * tag 'dm-4.3-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm thin: fix missing pool reference count decrement in pool_ctr error path
        dm snapshot persistent: fix missing cleanup in persistent_ctr error path
    • Merge branch 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 6aa8ca4d
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "I have two more bug fixes for btrfs.
      
        My commit fixes a bug we hit last week at FB, a combination of lots of
        hard links and an admin command to resolve inode numbers.
      
        Dave is adding checks to make sure balance on current kernels ignores
        filters it doesn't understand.  The penalty for being wrong is just
        doing more work (not crashing etc), but it's a good fix"
      
      * 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: fix use after free iterating extrefs
        btrfs: check unsupported filters in balance arguments
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 59bcce12
      Linus Torvalds authored
      Pull Ceph fixes from Sage Weil:
       "Just two small items from Ilya:
      
        The first patch fixes the RBD readahead to grab full objects.  The
        second fixes the write ops to prevent undue promotion when a cache
        tier is configured on the server side"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        rbd: use writefull op for object size writes
        rbd: set max_sectors explicitly
    • Merge tag 'pm+acpi-4.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · a4c4c49a
      Linus Torvalds authored
      Pull power management and ACPI fixes from Rafael Wysocki:
       "These fix two recent regressions (ACPICA, the generic power domains
        framework) and one crash that may happen on specific hardware
        supported since 4.1 (intel_pstate).
      
        Specifics:
      
         - Fix a regression introduced by a recent ACPICA cleanup that
           uncovered a latent bug (Lv Zheng).
      
         - Fix a recent regression in the generic power domains framework that
           may cause it to violate PM QoS latency constraints in some cases
           (Ulf Hansson).
      
         - Fix an intel_pstate driver crash on the Knights Landing chips that
           do not update the MPERF counter as often as expected by the driver
           which may result in a divide by 0 (Srinivas Pandruvada)"
      
      * tag 'pm+acpi-4.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: intel_pstate: Fix divide by zero on Knights Landing (KNL)
        ACPICA: Tables: Fix FADT dependency regression
        PM / Domains: Fix validation of latency constraints in genpd governor
    • Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 8b7b56f3
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Nothing too crazy or exciting:
      
         - two MAINTAINERS entries that I didn't see the point in delaying.
         - one drm mst fix to stop sending uninitialised data to monitors
         - two amdgpu fixes
         - one radeon mst tiling fix
         - one vmwgfx regression fix
         - one virtio warning fix.
      
        I have found one locking problem that needs a bit of reorg to fix, but
        I'm not sure it's worth putting in -fixes as I don't think we've seen
        it hit in the real world ever, I just found it using the virtio-gpu
        driver when working on it.  I'll possibly send it next week once I've
        time to discuss with Daniel"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/virtio: use %llu format string form atomic64_t
        MAINTAINERS: Add myself as maintainer for the gma500 driver
        MAINTAINERS: add a maintainer for the atmel-hlcdc DRM driver
        drm/amdgpu: Keep the pflip interrupts always enabled v7
        drm/amdgpu: adjust default dispclk (v2)
        drm/dp/mst: make mst i2c transfer code more robust.
        drm/radeon: attach tile property to mst connector
        drm/vmwgfx: Fix kernel NULL pointer dereference on older hardware
    • Merge tag 'powerpc-4.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ebb65c81
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       - Re-enable CONFIG_SCSI_DH in our defconfigs
       - Remove unused os_area_db_id_video_mode
       - cxl: fix leak of IRQ names in cxl_free_afu_irqs() from Andrew
       - cxl: fix leak of ctx->irq_bitmap when releasing context via kernel API from Andrew
       - cxl: fix leak of ctx->mapping when releasing kernel API contexts from Andrew
       - cxl: Workaround malformed pcie packets on some cards from Philippe
       - cxl: Fix number of allocated pages in SPA from Christophe Lombard
       - Fix checkstop in native_hpte_clear() with lockdep from Cyril
       - Panic on unhandled Machine Check on powernv from Daniel
       - selftests/powerpc: Fix build failure of load_unaligned_zeropad test
      
      * tag 'powerpc-4.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        selftests/powerpc: Fix build failure of load_unaligned_zeropad test
        powerpc/powernv: Panic on unhandled Machine Check
        powerpc: Fix checkstop in native_hpte_clear() with lockdep
        cxl: Fix number of allocated pages in SPA
        cxl: Workaround malformed pcie packets on some cards
        cxl: fix leak of ctx->mapping when releasing kernel API contexts
        cxl: fix leak of ctx->irq_bitmap when releasing context via kernel API
        cxl: fix leak of IRQ names in cxl_free_afu_irqs()
        powerpc/ps3: Remove unused os_area_db_id_video_mode
        powerpc/configs: Re-enable CONFIG_SCSI_DH
    • Merge branch 'akpm' (patches from Andrew) · 3d875182
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "6 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        sh: add copy_user_page() alias for __copy_user()
        lib/Kconfig: ZLIB_DEFLATE must select BITREVERSE
        mm, dax: fix DAX deadlocks
        memcg: convert threshold to bytes
        builddeb: remove debian/files before build
        mm, fs: obey gfp_mapping for add_to_page_cache()
    • sh: add copy_user_page() alias for __copy_user() · 934ed25e
      Ross Zwisler authored
      copy_user_page() is needed by DAX.  Without this we get a compile error
      for DAX on SH:
      
        fs/dax.c:280:2: error: implicit declaration of function `copy_user_page' [-Werror=implicit-function-declaration]
          copy_user_page(vto, (void __force *)vfrom, vaddr, to);
            ^
      
      This was done with a random config that happened to include DAX support.
      
      This patch has only been compile tested.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/Kconfig: ZLIB_DEFLATE must select BITREVERSE · 1fd4e5c3
      Andrew Morton authored
      lib/built-in.o: In function `__bitrev32':
      deftree.c:(.text+0x1e799): undefined reference to `byte_rev_table'
      deftree.c:(.text+0x1e7a0): undefined reference to `byte_rev_table'
      deftree.c:(.text+0x1e7b4): undefined reference to `byte_rev_table'
      deftree.c:(.text+0x1e7c1): undefined reference to `byte_rev_table'
      
      Anything which uses bitrevX() has to select BITREVERSE, to grab
      lib/bitrev.o.
      Reported-by: Jim Davis <jim.epost@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, dax: fix DAX deadlocks · 0f90cc66
      Ross Zwisler authored
      The following two locking commits in the DAX code:
      
      commit 84317297 ("dax: fix race between simultaneous faults")
      commit 46c043ed ("mm: take i_mmap_lock in unmap_mapping_range() for DAX")
      
      introduced a number of deadlocks and other issues which need to be fixed
      for the v4.3 kernel.  The list of issues in DAX after these commits
      (some newly introduced by the commits, some preexisting) can be found
      here:
      
        https://lkml.org/lkml/2015/9/25/602 (Subject: "Re: [PATCH] dax: fix deadlock in __dax_fault").
      
      This undoes most of the changes introduced by those two commits,
      essentially returning us to the DAX locking scheme that was used in
      v4.2.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Dave Chinner <dchinner@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: convert threshold to bytes · 424cdc14
      Shaohua Li authored
      page_counter_memparse() returns pages for the threshold, while
      mem_cgroup_usage() returns bytes for memory usage.  Convert the
      threshold to bytes.
      
      Fixes: 3e32cb2e ("memcg: rename cgroup_event to mem_cgroup_event").
      Signed-off-by: Shaohua Li <shli@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
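The unit mismatch this commit fixes is easy to see with numbers. In the sketch below the helper names are hypothetical and `PAGE_SIZE = 4096` is an assumption (the page size depends on the architecture); it shows why a threshold parsed in pages has to be converted before it is compared with a usage figure in bytes.

```python
PAGE_SIZE = 4096  # assumed page size; it varies by architecture


def threshold_to_bytes(threshold_pages, page_size=PAGE_SIZE):
    """Convert a threshold expressed in pages into bytes."""
    return threshold_pages * page_size


def threshold_crossed(usage_bytes, threshold_pages, page_size=PAGE_SIZE):
    """Compare usage and threshold in the same unit (bytes)."""
    return usage_bytes >= threshold_to_bytes(threshold_pages, page_size)


usage = 512 * 1024   # 512 KiB in use
threshold = 256      # 256 pages, i.e. 1 MiB with 4 KiB pages

print(threshold_crossed(usage, threshold))  # compared in bytes
print(usage >= threshold)                   # mixed units: bytes against pages
```

Comparing the raw numbers treats 256 pages as 256 bytes, so the mixed-unit check reports the threshold as crossed long before usage reaches 1 MiB.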
    • builddeb: remove debian/files before build · 8d740a37
      Riku Voipio authored
      Commit 3716001b ("deb-pkg: add source package") added the ability to
      create a debian changelog file.  This exposed that previously the
      builddeb script hasn't cleared debian/files between builds.
      
      As debian/files keeps accumulating entries, the changes file will end up
      growing indefinitely.  With outdated entries in debian/files, builddeb
      script will exit with failure.  This regression impacts those who use
      "make deb-pkg" target to build kernel into a .deb package and never use
      "make mrproper" or other means to clean kernel tree from generated
      directories.
      
      To fix the regression, remove debian/files before starting build and in
      the generated clean rule.
      
      Fixes: 3716001b ("deb-pkg: add source package")
      Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
      Reported-by: Doug Smythies <dsmythies@telus.net>
      Tested-by: Doug Smythies <dsmythies@telus.net>
      Tested-by: Kalle Valo <kvalo@codeaurora.org>
      Acked-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: maximilian attems <maks@stro.at>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, fs: obey gfp_mapping for add_to_page_cache() · 063d99b4
      Michal Hocko authored
      Commit 6afdb859 ("mm: do not ignore mapping_gfp_mask in page cache
      allocation paths") has caught some users of hardcoded GFP_KERNEL used in
      the page cache allocation paths.  This, however, wasn't complete and
      there were others which went unnoticed.
      
      Dave Chinner has reported the following deadlock for xfs on loop device:
      : With the recent merge of the loop device changes, I'm now seeing
      : XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
      :
      : The deadlock is as follows:
      :
      : kloopd1: loop_queue_read_work
      :       xfs_file_iter_read
      :       lock XFS inode XFS_IOLOCK_SHARED (on image file)
      :       page cache read (GFP_KERNEL)
      :       radix tree alloc
      :       memory reclaim
      :       reclaim XFS inodes
      :       log force to unpin inodes
      :       <wait for log IO completion>
      :
      : xfs-cil/loop1: <does log force IO work>
      :       xlog_cil_push
      :       xlog_write
      :       <loop issuing log writes>
      :               xlog_state_get_iclog_space()
      :               <blocks due to all log buffers under write io>
      :               <waits for IO completion>
      :
      : kloopd1: loop_queue_write_work
      :       xfs_file_write_iter
      :       lock XFS inode XFS_IOLOCK_EXCL (on image file)
      :       <wait for inode to be unlocked>
      :
      : i.e. the kloopd, with its split read and write work queues, has
      : introduced a dependency through memory reclaim. i.e. that writes
      : need to be able to progress for reads to make progress.
      :
      : The problem, fundamentally, is that mpage_readpages() does a
      : GFP_KERNEL allocation, rather than paying attention to the inode's
      : mapping gfp mask, which is set to GFP_NOFS.
      :
      : This didn't use to happen, because the loop device used to issue
      : reads through the splice path and that does:
      :
      :       error = add_to_page_cache_lru(page, mapping, index,
      :                       GFP_KERNEL & mapping_gfp_mask(mapping));
      
      This was changed by commit aa4d8616 ("block: loop: switch to VFS
      ITER_BVEC").
      
      This patch changes mpage_readpage{s} to follow gfp mask set for the
      mapping.  There are, however, other places which are doing basically the
      same.
      
      lustre:ll_dir_filler is doing GFP_KERNEL from the function which
      apparently uses GFP_NOFS for other allocations so let's make this
      consistent.
      
      cifs:readpages_get_pages is called from cifs_readpages and
      __cifs_readpages_from_fscache called from the same path obeys mapping
      gfp.
      
      ramfs_nommu_expand_for_mapping is hardcoding GFP_KERNEL as well,
      even though it uses mapping_gfp_mask for the page allocation.
      
      ext4_mpage_readpages is called from the page cache allocation path,
      the same as read_pages and read_cache_pages.
      
      As I noted in my previous post, I cannot say I would be happy about
      sprinkling mapping_gfp_mask all over the place.  It sounds like we
      should drop the gfp_mask argument altogether and use it internally in
      __add_to_page_cache_locked, but that would require all the filesystems to use
      mapping gfp consistently, which I am not sure is the case here.  From a
      quick glance it seems that some file systems use it all the time while
      others are selective.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Dave Chinner <david@fromorbit.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Andreas Dilger <andreas.dilger@intel.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
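The expression `GFP_KERNEL & mapping_gfp_mask(mapping)` quoted in this commit message is a plain bitmask intersection: the caller asks for the permissive mask, and the mapping's mask strips whatever that mapping forbids. The sketch below uses made-up flag bits to show the effect. The values are illustrative only and do not match the kernel's real GFP constants, and `page_cache_gfp` is a hypothetical helper.

```python
# Illustrative flag bits only; the kernel's real GFP values differ.
FLAG_RECLAIM = 0x1   # allocation may wait and reclaim memory
FLAG_IO = 0x2        # reclaim may start block I/O
FLAG_FS = 0x4        # reclaim may call back into the filesystem

GFP_KERNEL = FLAG_RECLAIM | FLAG_IO | FLAG_FS
GFP_NOFS = FLAG_RECLAIM | FLAG_IO


def page_cache_gfp(requested, mapping_gfp_mask):
    """Restrict the requested flags to those the mapping allows."""
    return requested & mapping_gfp_mask


# A mapping whose mask is GFP_NOFS, as described for XFS above.
effective = page_cache_gfp(GFP_KERNEL, GFP_NOFS)
print(hex(effective), bool(effective & FLAG_FS))
```

With the intersection applied, the filesystem bit is gone from the effective mask, which is the property that keeps reclaim from re-entering the filesystem in the deadlock trace above.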
    • rbd: use writefull op for object size writes · e30b7577
      Ilya Dryomov authored
      This covers only the simplest case - an object-sized write, but
      it's still useful in tiering setups when EC is used for the base tier
      as writefull op can be proxied, saving an object promotion.
      
      Even though updating ceph_osdc_new_request() to allow writefull should
      just be a matter of fixing an assert, I didn't do it because its only
      user is cephfs.  All other sites were updated.
      
      Reflects ceph.git commit 7bfb7f9025a8ee0d2305f49bf0336d2424da5b5b.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
    • rbd: set max_sectors explicitly · 0d9fde4f
      Ilya Dryomov authored
      Commit 30e2bc08 ("Revert "block: remove artifical max_hw_sectors
      cap"") restored a clamp on max_sectors.  It's now 2560 sectors instead
      of 1024, but it's not good enough: we set max_hw_sectors to rbd object
      size because we don't want object sized I/Os to be split, and the
      default object size is 4M.
      
      So, set max_sectors to max_hw_sectors in rbd at queue init time.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
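A quick calculation shows why the 2560-sector clamp is too small for the default rbd object size. This is a simplified model that assumes 512-byte sectors and ignores alignment and other block-layer splitting rules; the helper names are hypothetical.

```python
SECTOR_SIZE = 512  # bytes per sector, the usual block-layer unit


def sectors(nbytes):
    """Number of 512-byte sectors in nbytes."""
    return nbytes // SECTOR_SIZE


def requests_needed(io_bytes, max_sectors):
    """Minimum number of requests for one I/O under a max_sectors limit."""
    return -(-sectors(io_bytes) // max_sectors)  # ceiling division


object_size = 4 * 1024 * 1024  # default rbd object size: 4M

print(sectors(object_size))                                 # sectors per object
print(requests_needed(object_size, 2560))                   # under the clamp
print(requests_needed(object_size, sectors(object_size)))   # limit = object size
```

Under the clamp a single object-sized I/O has to be split into several requests, while raising max_sectors to the object size lets it go out as one.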
    • timekeeping: Increment clock_was_set_seq in timekeeping_init() · 56fd16ca
      Thomas Gleixner authored
      timekeeping_init() can set the wall time offset, so we need to
      increment the clock_was_set_seq counter. That way hrtimers will pick
      up the early offset immediately. Otherwise on a machine which does not
      set wall time later in the boot process the hrtimer offset is stale at
      0 and wall time timers are going to expire with a delay of 45 years.
      
      Fixes: 868a3e91 "hrtimer: Make offset update smarter"
      Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stefan Liebler <stli@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
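The "45 years" figure follows from the Unix epoch: in late 2015, wall time was about 1.44 billion seconds past 1970, while the monotonic clock starts near zero at boot. The sketch below is a simplified model of that arithmetic, not the hrtimer code. The function names are hypothetical, and the date (the day this commit is listed under) and the ten-second uptime are assumptions.

```python
from datetime import datetime, timezone

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def monotonic_expiry(wall_expiry_s, realtime_offset_s):
    """Translate a wall-clock expiry time into the monotonic time base."""
    return wall_expiry_s - realtime_offset_s


def expiry_delay_years(wall_now_s, mono_now_s, offset_seen_s):
    """Delay, in years, of a timer armed to fire 'now' in wall time."""
    delay_s = monotonic_expiry(wall_now_s, offset_seen_s) - mono_now_s
    return delay_s / SECONDS_PER_YEAR


wall_now = datetime(2015, 10, 16, tzinfo=timezone.utc).timestamp()
mono_now = 10.0  # assumed: ten seconds after boot

correct_offset = wall_now - mono_now
print(expiry_delay_years(wall_now, mono_now, correct_offset))  # offset picked up
print(expiry_delay_years(wall_now, mono_now, 0.0))             # stale offset of 0
```

With the correct offset the timer fires immediately; with the stale offset of 0 the computed delay is roughly the whole time since 1970.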
    • Merge branches 'acpica', 'pm-domains' and 'pm-cpufreq' · fa548237
      Rafael J. Wysocki authored
      * acpica:
        ACPICA: Tables: Fix FADT dependency regression
      
      * pm-domains:
        PM / Domains: Fix validation of latency constraints in genpd governor
      
      * pm-cpufreq:
        cpufreq: intel_pstate: Fix divide by zero on Knights Landing (KNL)
    • genirq/msi: Do not use pci_msi_[un]mask_irq as default methods · 0701c53e
      Marc Zyngier authored
      When we create a generic MSI domain with MSI_FLAG_USE_DEF_CHIP_OPS
      set, and either .mask or .unmask is NULL in the irq_chip
      structure, we set them to pci_msi_[un]mask_irq.
      
      This is a bad idea for at least two reasons:
      - PCI_MSI might not be selected, kernel fails to build (yes, this is
        legitimate, at least on arm64!)
      - This may not be a PCI/MSI domain at all (platform MSI, for example)
      
      Either way, this looks wrong. Move the overriding of mask/unmask to
      the PCI counterpart, and panic if either of these two methods is not
      set in the core code (they really should be present).
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Link: http://lkml.kernel.org/r/1444760085-27857-1-git-send-email-marc.zyngier@arm.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • drm/virtio: use %llu format string form atomic64_t · d549f545
      Arnd Bergmann authored
      The virtgpu driver prints the last_seq variable using the %ld or
      %lu format string, which does not work correctly on all architectures
      and causes this compiler warning on ARM:
      
      drivers/gpu/drm/virtio/virtgpu_fence.c: In function 'virtio_timeline_value_str':
      drivers/gpu/drm/virtio/virtgpu_fence.c:64:22: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'long long int' [-Wformat=]
        snprintf(str, size, "%lu", atomic64_read(&fence->drv->last_seq));
                            ^
      drivers/gpu/drm/virtio/virtgpu_debugfs.c: In function 'virtio_gpu_debugfs_irq_info':
      drivers/gpu/drm/virtio/virtgpu_debugfs.c:37:16: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Wformat=]
        seq_printf(m, "fence %ld %lld\n",
                      ^
      
      In order to avoid the warnings, this changes the format strings to %llu
      and adds a cast to u64, which makes it work the same way everywhere.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • Patrik Jakobsson's avatar
    • MAINTAINERS: add a maintainer for the atmel-hlcdc DRM driver · 99763bb8
      Boris BREZILLON authored
      Add myself as the maintainer of the atmel-hlcdc DRM driver.
      Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • Merge branch 'drm-fixes-4.3' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 57606c73
      Dave Airlie authored
      Just two fixes for amdgpu:
      - fix pageflip interrupt issue
      - fix display clock handling on certain fiji boards
      
      * 'drm-fixes-4.3' of git://people.freedesktop.org/~agd5f/linux:
        drm/amdgpu: Keep the pflip interrupts always enabled v7
        drm/amdgpu: adjust default dispclk (v2)
  7. 15 Oct, 2015 6 commits