1. 09 May, 2017 8 commits
      mm, compaction: finish whole pageblock to reduce fragmentation · baf6a9a1
      Vlastimil Babka authored
      The main goal of direct compaction is to form a high-order page for
      allocation, but it should also help against long-term fragmentation when
      possible.
      
      Most lower-than-pageblock-order compactions are for non-movable
      allocations, which means that if we compact in a movable pageblock and
      terminate as soon as we create the high-order page, it's unlikely that
      the fallback heuristics will claim the whole block.  Instead there might
      be a single unmovable page in a pageblock full of movable pages, and the
      next unmovable allocation might pick another pageblock and increase
      long-term fragmentation.
      
      To help against such scenarios, this patch changes the termination
      criteria for compaction so that the current pageblock is finished even
      though the high-order page already exists.  Note that it might be
      possible that the high-order page formed elsewhere in the zone due to
      parallel activity, but this patch doesn't try to detect that.
      
      This is only done with sync compaction, because async compaction is
      limited to pageblock of the same migratetype, where it cannot result in
      a migratetype fallback.  (Async compaction also eagerly skips
      order-aligned blocks where isolation fails, which is against the goal of
      migrating away as much of the pageblock as possible.)
      
      As a result of this patch, long-term memory fragmentation should be
      reduced.
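
      The changed termination rule can be sketched in user space roughly as
      follows.  This is only a minimal model, not the patch itself: the
      structure, the finishing_block flag, should_continue() and the
      512-page pageblock size are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed pageblock size in pages (e.g. 2 MiB blocks of 4 KiB pages). */
#define PAGEBLOCK_NR_PAGES 512UL

/* Illustrative scanner state; the field names are hypothetical. */
struct scan_state {
	unsigned long migrate_pfn; /* next page frame the migrate scanner visits */
	bool sync;                 /* true for sync compaction */
	bool finishing_block;      /* set while the current pageblock is finished */
};

/*
 * Decide whether compaction should keep going.
 * Returns true to continue scanning, false to terminate.
 */
bool should_continue(struct scan_state *s, bool high_order_page_found)
{
	assert(s != NULL);

	if (s->finishing_block) {
		/* Keep migrating until the scanner reaches a pageblock boundary. */
		if (s->migrate_pfn % PAGEBLOCK_NR_PAGES != 0)
			return true;
		s->finishing_block = false;
	}

	if (!high_order_page_found)
		return true;

	/* Async compaction keeps the old rule and stops right away. */
	if (!s->sync)
		return false;

	/* Sync compaction already at a block boundary has nothing to finish. */
	if (s->migrate_pfn % PAGEBLOCK_NR_PAGES == 0)
		return false;

	/* Otherwise finish the current pageblock before terminating. */
	s->finishing_block = true;
	return true;
}
```

      With sync compaction and the scanner in the middle of a pageblock,
      the function keeps returning true until migrate_pfn is block-aligned,
      even though the high-order page already exists.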
      
      In testing based on 4.9 kernel with stress-highalloc from mmtests
      configured for order-4 GFP_KERNEL allocations, this patch has reduced
      the number of unmovable allocations falling back to movable pageblocks
      by 20%.  The number
      
      Link: http://lkml.kernel.org/r/20170307131545.28577-9-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      mm, compaction: restrict async compaction to pageblocks of same migratetype · 282722b0
      Vlastimil Babka authored
      The migrate scanner in async compaction is currently limited to
      MIGRATE_MOVABLE pageblocks.  This is a heuristic intended to reduce
      latency, based on the assumption that non-MOVABLE pageblocks are
      unlikely to contain movable pages.
      
      However, with the exception of THPs, most high-order allocations are
      not movable.  Should the async compaction succeed, this increases the
      chance that the non-MOVABLE allocations will fall back to a MOVABLE
      pageblock, making the long-term fragmentation worse.
      
      This patch attempts to help the situation by changing async direct
      compaction so that the migrate scanner only scans the pageblocks of the
      requested migratetype.  If it's a non-MOVABLE type and there are such
      pageblocks that do contain movable pages, chances are that the
      allocation can succeed within one of such pageblocks, removing the need
      for a fallback.  If that fails, the subsequent sync attempt will ignore
      this restriction.
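
      The restriction described above could look roughly like this in a
      simplified user-space model.  The enum, struct compact_request and
      block_is_suitable_source() are hypothetical names for illustration;
      only the rule itself (async direct compaction scans pageblocks of the
      requested migratetype, everything else is unrestricted) comes from
      the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified migratetypes; the kernel defines more than these three. */
enum migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE, MT_COUNT };

/* Illustrative subset of the compaction request state. */
struct compact_request {
	enum migratetype migratetype; /* migratetype of the allocation */
	bool async;                   /* async (true) or sync (false) compaction */
	bool direct;                  /* direct compaction by the allocating task */
};

/* Should the migrate scanner look inside a pageblock of type block_mt? */
bool block_is_suitable_source(const struct compact_request *req,
			      enum migratetype block_mt)
{
	assert(req != NULL);
	assert(block_mt < MT_COUNT);

	/* Only async direct compaction is restricted. */
	if (!req->async || !req->direct)
		return true;

	/* Scan only pageblocks matching the requested migratetype. */
	return block_mt == req->migratetype;
}
```

      An async request for an unmovable allocation thus skips movable
      pageblocks, while the subsequent sync attempt scans them again.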
      
      In testing based on 4.9 kernel with stress-highalloc from mmtests
      configured for order-4 GFP_KERNEL allocations, this patch has reduced
      the number of unmovable allocations falling back to movable pageblocks
      by 30%.  The number of movable allocations falling back is reduced by
      12%.
      
      Link: http://lkml.kernel.org/r/20170307131545.28577-8-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      mm, compaction: add migratetype to compact_control · d39773a0
      Vlastimil Babka authored
      Preparation patch.  We are going to need migratetype at lower layers
      than compact_zone() and compact_finished().
      
      Link: http://lkml.kernel.org/r/20170307131545.28577-7-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      mm, compaction: change migrate_async_suitable() to suitable_migration_source() · b682debd
      Vlastimil Babka authored
      Preparation for making the decisions more complex and depending on
      compact_control flags.  No functional change.
      
      Link: http://lkml.kernel.org/r/20170307131545.28577-6-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      mm, page_alloc: count movable pages when stealing from pageblock · 02aa0cdd
      Vlastimil Babka authored
      When stealing pages from pageblock of a different migratetype, we count
      how many free pages were stolen, and change the pageblock's migratetype
      if more than half of the pageblock was free.  This might be too
      conservative, as there might be other pages that are not free, but were
      allocated with the same migratetype as our allocation requested.
      
      While we cannot determine the migratetype of allocated pages precisely
      (at least without the page_owner functionality enabled), we can count
      pages that compaction would try to isolate for migration - those are
      either on LRU or __PageMovable().  The rest can be assumed to be
      MIGRATE_RECLAIMABLE or MIGRATE_UNMOVABLE, which we cannot easily
      distinguish.  This counting can be done as part of free page stealing
      with little additional overhead.
      
      The page stealing code is changed so that it considers free pages plus
      pages of the "good" migratetype for the decision whether to change
      pageblock's migratetype.
      
      The result should be more accurate migratetype of pageblocks wrt the
      actual pages in the pageblocks, when stealing from semi-occupied
      pageblocks.  This should help the efficiency of page grouping by
      mobility.
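
      A minimal sketch of the decision described above, assuming a
      512-page pageblock and an "at least half" threshold.  The function
      name, the enum and the exact comparison are illustrative assumptions
      rather than the code of this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed pageblock size in pages. */
#define PAGEBLOCK_NR_PAGES 512UL

/* Simplified migratetypes for illustration. */
enum migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };

/*
 * Decide whether a pageblock should change its migratetype to alloc_mt.
 * free_pages:    free pages found (and stolen) in the block
 * movable_pages: allocated pages that compaction could isolate
 */
bool should_convert_block(enum migratetype alloc_mt, enum migratetype block_mt,
			  unsigned long free_pages, unsigned long movable_pages)
{
	unsigned long alike_pages;

	assert(free_pages + movable_pages <= PAGEBLOCK_NR_PAGES);

	if (alloc_mt == MT_MOVABLE) {
		/* The counted movable pages match a movable allocation. */
		alike_pages = movable_pages;
	} else if (block_mt == MT_MOVABLE) {
		/*
		 * Pages that are neither free nor movable are assumed to be
		 * unmovable or reclaimable, which cannot be told apart, so
		 * they count as compatible with a non-movable allocation.
		 */
		alike_pages = PAGEBLOCK_NR_PAGES - (free_pages + movable_pages);
	} else {
		/* No reliable information about the remaining pages. */
		alike_pages = 0;
	}

	/* Threshold assumed to be half of the pageblock. */
	return free_pages + alike_pages >= PAGEBLOCK_NR_PAGES / 2;
}
```

      For example, a movable pageblock with 100 free and 100 movable pages
      would be converted for an unmovable allocation in this model, because
      the remaining 312 pages are counted as alike; free pages alone would
      not cross the threshold.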
      
      In testing based on 4.9 kernel with stress-highalloc from mmtests
      configured for order-4 GFP_KERNEL allocations, this patch has reduced
      the number of unmovable allocations falling back to movable pageblocks
      by 47%.  The number of movable allocations falling back to other
      pageblocks is increased by 55%, but these events don't cause permanent
      fragmentation, so the tradeoff should be positive.  Later patches also
      offset the movable fallback increase to some extent.
      
      [akpm@linux-foundation.org: merge fix]
      Link: http://lkml.kernel.org/r/20170307131545.28577-5-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      mm, page_alloc: split smallest stolen page in fallback · 3bc48f96
      Vlastimil Babka authored
      The __rmqueue_fallback() function is called when there's no free page of
      requested migratetype, and we need to steal from a different one.
      
      There are various heuristics to make this event infrequent and reduce
      permanent fragmentation.  The main one is to try stealing from a
      pageblock that has the most free pages, and possibly steal them all at
      once and convert the whole pageblock.  Precise searching for such a
      pageblock would be expensive, so instead the heuristic walks the free
      lists from MAX_ORDER down to the requested order and assumes that the block
      with highest-order free page is likely to also have the most free pages
      in total.
      
      Chances are that together with the highest-order page, we also steal
      pages of lower orders from the same block.  But then we still split the
      highest order page.  This is wasteful and can contribute to
      fragmentation instead of avoiding it.
      
      This patch thus changes __rmqueue_fallback() to just steal the page(s)
      and put them on the freelist of the requested migratetype, and only
      report whether it was successful.  Then we pick (and eventually split)
      the smallest page with __rmqueue_smallest().  This all happens under
      zone lock, so nobody can steal it from us in the process.  This should
      reduce fragmentation due to fallbacks.  At worst we are only stealing a
      single highest-order page and waste some cycles by moving it between
      lists and then removing it, but fallback is not exactly hot path so that
      should not be a concern.  As a side benefit the patch removes some
      duplicate code by reusing __rmqueue_smallest().
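
      The "steal first, then take the smallest page" flow can be modelled
      with simple per-order counters.  This is a rough sketch under strong
      assumptions: each migratetype's free lists stand in for a single
      donor pageblock, only five orders exist, and all names
      (zone_model, take_smallest(), steal_fallback(), alloc_page_model())
      are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ORDERS 5 /* orders 0..4; the kernel's MAX_ORDER is larger */
#define NR_TYPES  2

/* free_count[type][order]: number of free pages of that order on that list. */
struct zone_model {
	unsigned int free_count[NR_TYPES][NR_ORDERS];
};

/*
 * Take the smallest free page of at least 'order', splitting it if needed.
 * Returns the order of the page removed from the list, or -1 if none.
 */
int take_smallest(struct zone_model *z, int type, int order)
{
	assert(z != NULL && order >= 0 && order < NR_ORDERS);

	for (int o = order; o < NR_ORDERS; o++) {
		if (z->free_count[type][o] == 0)
			continue;
		z->free_count[type][o]--;
		/* Splitting leaves one free buddy at each order below 'o'. */
		for (int s = o - 1; s >= order; s--)
			z->free_count[type][s]++;
		return o;
	}
	return -1;
}

/*
 * Find a donor by its highest-order free page and move all of its free
 * pages to the lists of 'type'.  Only reports whether anything was stolen.
 */
bool steal_fallback(struct zone_model *z, int type, int order)
{
	for (int o = NR_ORDERS - 1; o >= order; o--) {
		for (int donor = 0; donor < NR_TYPES; donor++) {
			if (donor == type || z->free_count[donor][o] == 0)
				continue;
			for (int m = 0; m < NR_ORDERS; m++) {
				z->free_count[type][m] += z->free_count[donor][m];
				z->free_count[donor][m] = 0;
			}
			return true;
		}
	}
	return false;
}

/* Allocate: try the requested type, steal on failure, then retry once. */
int alloc_page_model(struct zone_model *z, int type, int order)
{
	int got = take_smallest(z, type, order);

	if (got < 0 && steal_fallback(z, type, order))
		got = take_smallest(z, type, order);
	return got;
}
```

      If the donor holds one order-4 page and one order-1 page, an order-1
      request in this model is served from the stolen order-1 page and the
      order-4 page stays intact, instead of being split.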
      
      [vbabka@suse.cz: fix endless loop in the modified __rmqueue()]
        Link: http://lkml.kernel.org/r/59d71b35-d556-4fc9-ee2e-1574259282fd@suse.cz
      Link: http://lkml.kernel.org/r/20170307131545.28577-4-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      mm, compaction: remove redundant watermark check in compact_finished() · 228d7e33
      Vlastimil Babka authored
      When detecting whether compaction has succeeded in forming a high-order
      page, __compact_finished() employs a watermark check, followed by its own
      search for a suitable page in the freelists.  This is not ideal for two
      reasons:
      
       - The watermark check also searches high-order freelists, but has
         less strict criteria wrt fallback. It's therefore redundant and a waste
         of cycles. This was different in the past when high-order watermark
         check attempted to apply reserves to high-order pages.
      
       - The watermark check might actually fail due to lack of order-0 pages.
         Compaction can't help with that, so there's no point in continuing
         because of that. It's possible that a high-order page still exists, in
         which case compaction should terminate.
      
      This patch therefore removes the watermark check.  This should save some
      cycles and terminate compaction sooner in some cases.
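
      What remains after dropping the watermark check is a direct search of
      the freelists.  A simplified sketch, assuming per-order counters and
      ignoring the fallback cases the real search also considers; the
      structure and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ORDERS 11 /* assumed number of orders */
#define NR_TYPES  3  /* assumed number of migratetypes */

/* nr_free[order][type]: free pages of that order and migratetype. */
struct freelists {
	unsigned int nr_free[NR_ORDERS][NR_TYPES];
};

/*
 * Has compaction produced a usable page?  True if any freelist of the
 * requested migratetype holds a page of at least the requested order.
 * The number of order-0 pages plays no role in this decision.
 */
bool suitable_page_exists(const struct freelists *fl, int order, int type)
{
	assert(fl != NULL);
	assert(order >= 0 && order < NR_ORDERS);
	assert(type >= 0 && type < NR_TYPES);

	for (int o = order; o < NR_ORDERS; o++) {
		if (fl->nr_free[o][type] > 0)
			return true;
	}
	return false;
}
```

      In this model a zone with a free order-5 page and no order-0 pages
      still reports success for an order-4 request, which is the case where
      the watermark check would have kept compaction running needlessly.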
      
      Link: http://lkml.kernel.org/r/20170307131545.28577-3-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      mm, compaction: reorder fields in struct compact_control · f25ba6dc
      Vlastimil Babka authored
      Patch series "try to reduce fragmenting fallbacks", v3.
      
      Last year, Johannes Weiner reported a regression in page mobility
      grouping [1] and while the exact cause was not found, I've come up with
      some ways to improve it by reducing the number of allocations falling
      back to different migratetype and causing permanent fragmentation.
      
      The series was tested with mmtests stress-highalloc modified to do
      GFP_KERNEL order-4 allocations, on 4.9 with "mm, vmscan: fix zone
      balance check in prepare_kswapd_sleep" (without that, kcompactd indeed
      wasn't woken up) on UMA machine with 4GB memory.  There were 5 repeats
      of each run, as the extfrag stats are quite volatile (note the stats
      below are sums, not averages, as it was less perl hacking for me).
      
      Success rates are the same, already high due to the low allocation order
      used, so I'm not including them.
      
      Compaction stats:
      (the patches are stacked, and I haven't measured the non-functional-changes
      patches separately)
      
                                           patch 1     patch 2     patch 3     patch 4     patch 7     patch 8
        Compaction stalls                    22449       24680       24846       19765       22059       17480
        Compaction success                   12971       14836       14608       10475       11632        8757
        Compaction failures                   9477        9843       10238        9290       10426        8722
        Page migrate success               3109022     3370438     3312164     1695105     1608435     2111379
        Page migrate failure                911588     1149065     1028264     1112675     1077251     1026367
        Compaction pages isolated          7242983     8015530     7782467     4629063     4402787     5377665
        Compaction migrate scanned       980838938   987367943   957690188   917647238   947155598  1018922197
        Compaction free scanned          557926893   598946443   602236894   594024490   541169699   763651731
        Compaction cost                      10243       10578       10304        8286        8398        9440
      
      Compaction stats are mostly within noise until patch 4, which decreases
      the number of compactions and migrations.  Part of that could be due to
      more pageblocks marked as unmovable, and async compaction skipping
      those.  This changes a bit with patch 7, but not so much.  Patch 8
      increases free scanner stats and migrations, which comes from the
      changed termination criteria.  Interestingly, the number of compactions
      decreases - probably the fully compacted pageblock satisfies multiple
      subsequent allocations, so it amortizes.
      
      Next comes the extfrag tracepoint, where "fragmenting" means that an
      allocation had to fall back to a pageblock of another migratetype which
      wasn't fully free (which is almost all of the fallbacks).  I have
      locally added another tracepoint for "Page steal" into
      steal_suitable_fallback() which triggers in situations where we are
      allowed to do move_freepages_block().  If we decide to also do
      set_pageblock_migratetype(), it's "Pages steal with pageblock" with
      a breakdown of which allocation migratetype we are stealing for and from
      which fallback migratetype.  The last part "due to counting" comes from
      patch 4 and counts the events where the counting of movable pages
      allowed us to change pageblock's migratetype, while the number of free
      pages alone wouldn't be enough to cross the threshold.
      
                                                             patch 1     patch 2     patch 3     patch 4     patch 7     patch 8
        Page alloc extfrag event                            10155066     8522968    10164959    15622080    13727068    13140319
        Extfrag fragmenting                                 10149231     8517025    10159040    15616925    13721391    13134792
        Extfrag fragmenting for unmovable                     159504      168500      184177       97835       70625       56948
        Extfrag fragmenting unmovable placed with movable     153613      163549      172693       91740       64099       50917
        Extfrag fragmenting unmovable placed with reclaim.      5891        4951       11484        6095        6526        6031
        Extfrag fragmenting for reclaimable                     4738        4829        6345        4822        5640        5378
        Extfrag fragmenting reclaimable placed with movable     1836        1902        1851        1579        1739        1760
        Extfrag fragmenting reclaimable placed with unmov.      2902        2927        4494        3243        3901        3618
        Extfrag fragmenting for movable                      9984989     8343696     9968518    15514268    13645126    13072466
        Pages steal                                           179954      192291      210880      123254       94545       81486
        Pages steal with pageblock                             22153       18943       20154       33562       29969       33444
        Pages steal with pageblock for unmovable               14350       12858       13256       20660       19003       20852
        Pages steal with pageblock for unmovable from mov.     12812       11402       11683       19072       17467       19298
        Pages steal with pageblock for unmovable from recl.     1538        1456        1573        1588        1536        1554
        Pages steal with pageblock for movable                  7114        5489        5965       11787       10012       11493
        Pages steal with pageblock for movable from unmov.      6885        5291        5541       11179        9525       10885
        Pages steal with pageblock for movable from recl.        229         198         424         608         487         608
        Pages steal with pageblock for reclaimable               689         596         933        1115         954        1099
        Pages steal with pageblock for reclaimable from unmov.   273         219         537         658         547         667
        Pages steal with pageblock for reclaimable from mov.     416         377         396         457         407         432
        Pages steal with pageblock due to counting                                                 11834       10075        7530
        ... for unmovable                                                                           8993        7381        4616
        ... for movable                                                                             2792        2653        2851
        ... for reclaimable                                                                           49          41          63
      
      What we can see is that "Extfrag fragmenting for unmovable" and "...
      placed with movable" drop with almost every patch, which is good as we
      are polluting fewer movable pageblocks with unmovable pages.
      
      The most significant change is patch 4 with movable page counting.  On
      the other hand it increases "Extfrag fragmenting for movable" by 50%.
      "Pages steal" drops though, so these movable allocation fallbacks find
      only small free pages and are not allowed to steal whole pageblocks
      back.  "Pages steal with pageblock" rises, because the patch increases
      the chances of pageblock migratetype changes happening.  This affects
      all migratetypes.
      
      The summary is that patch 4 is not a clear win wrt these stats, but I
      believe that the tradeoff it makes is a good one.  There's less
      pollution of movable pageblocks by unmovable allocations.  There's less
      stealing between pageblocks, and the steals that remain have a higher
      chance of also changing the migratetype of the pageblock itself, so it
      should more faithfully reflect the migratetype of the pages within the
      pageblock.  The increase of movable allocations falling back to
      unmovable pageblocks might look dramatic, but those allocations can be
      migrated by compaction when needed, and other patches in the series
      (7-9) improve that aspect.
      
      Patches 7 and 8 continue the trend of reduced unmovable fallbacks and
      also reduce the impact on movable fallbacks from patch 4.
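
      As a rough cross-check, the relative changes between patches can be
      recomputed from the extfrag table.  A minimal sketch: the values are
      copied from the "unmovable placed with movable" and "fragmenting for
      movable" rows, while percent_change() and the array names are
      illustrative only.

```c
#include <assert.h>

/* Columns: patch 1, patch 2, patch 3, patch 4, patch 7, patch 8. */

/* Row "Extfrag fragmenting unmovable placed with movable". */
const double unmovable_in_movable[6] = {
	153613, 163549, 172693, 91740, 64099, 50917
};

/* Row "Extfrag fragmenting for movable". */
const double fragmenting_movable[6] = {
	9984989, 8343696, 9968518, 15514268, 13645126, 13072466
};

/* Relative change from 'before' to 'after', in percent. */
double percent_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

      Comparing each functional patch with the previous column gives about
      -46.9% (patch 4), -30.1% (patch 7) and -20.6% (patch 8) for unmovable
      allocations placed in movable pageblocks, and about +55.6% (patch 4)
      and -12.0% (patch 7) for movable fallbacks.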
      
      [1] https://www.spinics.net/lists/linux-mm/msg114237.html
      
      This patch (of 8):
      
      While currently there are (mostly by accident) no holes in struct
      compact_control (on x86_64), we are going to add more bool flags, so
      place them all together at the end of the structure.  While at it, just
      order all fields from largest to smallest.
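
      The effect of field ordering on padding can be seen with two toy
      structures.  These are not the real compact_control fields; the
      structures and bytes_saved() are made up purely to illustrate the
      layout rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flags interleaved with wider fields: each bool is followed by padding. */
struct scattered {
	unsigned long a;
	bool flag1;
	unsigned long b;
	bool flag2;
	int c;
	bool flag3;
};

/* Fields ordered from largest to smallest, with the bools at the end. */
struct grouped {
	unsigned long a;
	unsigned long b;
	int c;
	bool flag1;
	bool flag2;
	bool flag3;
};

/* Bytes saved by grouping the small fields together. */
size_t bytes_saved(void)
{
	assert(sizeof(struct grouped) <= sizeof(struct scattered));
	return sizeof(struct scattered) - sizeof(struct grouped);
}
```

      On a typical x86_64 Linux ABI the first structure occupies 40 bytes
      and the second 24, so adding further bool flags at the end is far
      less likely to open new holes.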
      
      Link: http://lkml.kernel.org/r/20170307131545.28577-2-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 07 May, 2017 1 commit
  3. 06 May, 2017 13 commits
      Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · fe7a719b
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Various fixes for stable for CIFS/SMB3 especially for better
        interoperability for SMB3 to Macs.
      
        It also includes Pavel's improvements to SMB3 async i/o support
        (which is much faster now)"
      
      * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
        CIFS: add misssing SFM mapping for doublequote
        SMB3: Work around mount failure when using SMB3 dialect to Macs
        cifs: fix CIFS_IOC_GET_MNT_INFO oops
        CIFS: fix mapping of SFM_SPACE and SFM_PERIOD
        CIFS: fix oplock break deadlocks
        cifs: fix CIFS_ENUMERATE_SNAPSHOTS oops
        cifs: fix leak in FSCTL_ENUM_SNAPS response handling
        Set unicode flag on cifs echo request to avoid Mac error
        CIFS: Add asynchronous write support through kernel AIO
        CIFS: Add asynchronous read support through kernel AIO
        CIFS: Add asynchronous context to support kernel AIO
        cifs: fix IPv6 link local, with scope id, address parsing
        cifs: small underflow in cnvrtDosUnixTm()
      Merge tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · d484467c
      Linus Torvalds authored
      Pull xfs updates from Darrick Wong:
       "Here are the XFS changes for 4.12. The big new feature for this
        release is the new space mapping ioctl that we've been discussing
        since LSF2016, but other than that most of the patches are larger bug
        fixes, memory corruption prevention, and other cleanups.
      
        Summary:
         - various code cleanups
         - introduce GETFSMAP ioctl
         - various refactoring
         - avoid dio reads past eof
         - fix memory corruption and other errors with fragmented directory blocks
         - fix accidental userspace memory corruptions
         - publish fs uuid in superblock
         - make fstrim terminatable
         - fix race between quotaoff and in-core inode creation
         - avoid use-after-free when finishing up w/ buffer heads
         - reserve enough space to handle bmap tree resizing during cow remap"
      
      * tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits)
        xfs: fix use-after-free in xfs_finish_page_writeback
        xfs: reserve enough blocks to handle btree splits when remapping
        xfs: wait on new inodes during quotaoff dquot release
        xfs: update ag iterator to support wait on new inodes
        xfs: support ability to wait on new inodes
        xfs: publish UUID in struct super_block
        xfs: Allow user to kill fstrim process
        xfs: better log intent item refcount checking
        xfs: fix up quotacheck buffer list error handling
        xfs: remove xfs_trans_ail_delete_bulk
        xfs: don't use bool values in trace buffers
        xfs: fix getfsmap userspace memory corruption while setting OF_LAST
        xfs: fix __user annotations for xfs_ioc_getfsmap
        xfs: corruption needs to respect endianess too!
        xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap
        xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap
        xfs: simplify validation of the unwritten extent bit
        xfs: remove unused values from xfs_exntst_t
        xfs: remove the unused XFS_MAXLINK_1 define
        xfs: more do_div cleanups
        ...
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 044f1daa
      Linus Torvalds authored
      Pull block fixes and updates from Jens Axboe:
       "Some fixes and followup features/changes that should go in, in this
        merge window. This contains:
      
         - Two fixes for lightnvm from Javier, fixing problems in the new code
           merged previously in this merge window.
      
         - A fix from Jan for the backing device changes, fixing an issue in
           NFS that causes a failure to mount on certain setups.
      
         - A change from Christoph, cleaning up the blk-mq init and exit
           request paths.
      
         - Remove elevator_change(), which is now unused. From Bart.
      
         - A fix for queue operation invocation on a dead queue, from Bart.
      
         - A series fixing up mtip32xx for blk-mq scheduling, removing a
           bandaid we previously had in place for this. From me.
      
         - A regression fix for this series, fixing a case where we wait on
           workqueue flushing from an invalid (non-blocking) context. From me.
      
         - A fix/optimization from Ming, ensuring that we don't both quiesce
           and freeze a queue at the same time.
      
         - A fix from Peter on lock ordering for CPU hotplug. Not a real
           problem right now, but will be once the CPU hotplug rework goes in.
      
         - A series from Omar, cleaning up our blk-mq debugfs support, and
           adding support for exporting info from schedulers in debugfs as
           well. This is really useful in debugging stalls or livelocks. From
           Omar"
      
      * 'for-linus' of git://git.kernel.dk/linux-block: (28 commits)
        mq-deadline: add debugfs attributes
        kyber: add debugfs attributes
        blk-mq-debugfs: allow schedulers to register debugfs attributes
        blk-mq: untangle debugfs and sysfs
        blk-mq: move debugfs declarations to a separate header file
        blk-mq: Do not invoke queue operations on a dead queue
        blk-mq-debugfs: get rid of a bunch of boilerplate
        blk-mq-debugfs: rename hw queue directories from <n> to hctx<n>
        blk-mq-debugfs: don't open code strstrip()
        blk-mq-debugfs: error on long write to queue "state" file
        blk-mq-debugfs: clean up flag definitions
        blk-mq-debugfs: separate flags with |
        nfs: Fix bdi handling for cloned superblocks
        block/mq: Cure cpu hotplug lock inversion
        lightnvm: fix bad back free on error path
        lightnvm: create cmd before allocating request
        blk-mq: don't use sync workqueue flushing from drivers
        mtip32xx: convert internal commands to regular block infrastructure
        mtip32xx: cleanup internal tag assumptions
        block: don't call blk_mq_quiesce_queue() after queue is frozen
        ...
      refcount: change EXPORT_SYMBOL markings · d557d1b5
      Greg Kroah-Hartman authored
      Now that kref is using the refcount apis, the _GPL markings are getting
      exported to places where they previously weren't.  Now kref.h is GPLv2
      licensed, so any non-GPL code using it better be talking to some
      lawyers, but changing api markings isn't considered "nice", so let's fix
      this up.
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      docs: bump minimal GNU Make version to 3.81 · 37d69ee3
      Masahiro Yamada authored
      Since 2014, you can't successfully build kernels with GNU Make version
      3.80. Example errors:
      
        $ git describe
        v4.11
        $ make --version | head -1
        GNU Make 3.80
        $ make defconfig
          HOSTCC  scripts/basic/fixdep
        scripts/Makefile.host:135: *** missing separator.  Stop.
        make: *** [defconfig] Error 2
        $ make ARCH=arm64 help
        arch/arm64/Makefile:43: *** unterminated call to function `warning': missing `)'.  Stop.
        $ make help >/dev/null
        ./Documentation/Makefile.sphinx:25: Extraneous text after `else' directive
        ./Documentation/Makefile.sphinx:31: *** only one `else' per conditional.  Stop.
        make: *** [help] Error 2
      
      The first breakage was introduced by commit c8589d1e ("kbuild:
      handle multi-objs dependency appropriately").  Since then (i.e. v3.18),
      GNU Make 3.80 has not been able to compile the kernel, but nobody has
      ever complained about (or noticed) it.
      
      Even GNU Make 3.81 is more than 10 years old.  It would not hurt to
      match the documentation with reality instead of fixing makefiles.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      initramfs: avoid "label at end of compound statement" error · 394e4f5d
      Linus Torvalds authored
      Commit 17a9be31 ("initramfs: Always do fput() and load modules after
      rootfs populate") introduced an error for the
      
          CONFIG_BLK_DEV_RAM=y
      
      case, because even though the code looks fine, the compiler really wants
      a statement after a label, or you'll get complaints:
      
        init/initramfs.c: In function 'populate_rootfs':
        init/initramfs.c:644:2: error: label at end of compound statement
      
      That commit moved the subsequent statements to outside the compound
      statement, leaving the label without any associated statements.
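
      The compiler complaint is easy to reproduce and to avoid in a small
      example.  The function below is unrelated to populate_rootfs() and is
      only a hypothetical illustration: a label that ends a compound
      statement needs at least an empty statement after it.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the entries before the first negative one, skipping zeros. */
int count_positive(const int *v, size_t n)
{
	int count = 0;

	assert(v != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (v[i] == 0)
			goto next;
		if (v[i] < 0)
			break;
		count++;
next:
		; /* empty statement: the label must be followed by a statement */
	}
	return count;
}
```

      Removing the lone semicolon after the label makes older compilers
      reject the function with the same "label at end of compound
      statement" error.
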
      Reported-by: default avatarJörg Otte <jrg.otte@gmail.com>
      Fixes: 17a9be31 ("initramfs: Always do fput() and load modules after rootfs populate")
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: stable@vger.kernel.org  # if 17a9be31 gets backported
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      394e4f5d
    • Linus Torvalds's avatar
      Merge tag 'devicetree-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 3ef2bc09
      Linus Torvalds authored
      Pull DeviceTree updates from Rob Herring:
      
       - fix sparse warnings in drivers/of/
      
       - add more overlay unittests
      
       - update dtc to v1.4.4-8-g756ffc4f52f6. This adds more checks on dts
         files such as unit-address formatting and stricter character sets for
         node and property names
      
       - add a common DT modalias function
      
       - move trivial-devices.txt up and out of i2c dir
      
       - ARM NVIC interrupt controller binding
      
       - vendor prefixes for Sensirion, Dioo, Nordic, ROHM
      
       - correct some binding file locations
      
      * tag 'devicetree-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (24 commits)
        of: fix sparse warnings in fdt, irq, reserved mem, and resolver code
        of: fix sparse warning in of_pci_range_parser_one
        of: fix sparse warnings in of_find_next_cache_node
        of/unittest: Missing unlocks on error
        of: fix uninitialized variable warning for overlay test
        of: fix unittest build without CONFIG_OF_OVERLAY
        of: Add unit tests for applying overlays
        of: per-file dtc compiler flags
        fpga: region: add missing DT documentation for config complete timeout
        of: Add vendor prefix for ROHM Semiconductor
        of: fix "/cpus" reference leak in of_numa_parse_cpu_nodes()
        of: Add vendor prefix for Nordic Semiconductor
        dt-bindings: arm,nvic: Binding for ARM NVIC interrupt controller on Cortex-M
        dtc: update warning settings for new bus and node/property name checks
        scripts/dtc: Update to upstream version v1.4.4-8-g756ffc4f52f6
        scripts/dtc: automate getting dtc version and log in update script
        of: Add function for generating a DT modalias with a newline
        of: fix of_device_get_modalias returned length when truncating buffers
        Documentation: devicetree: move trivial-devices out of I2C realm
        dt-bindings: add vendor prefix for Dioo
        ...
      3ef2bc09
    • Linus Torvalds's avatar
      Merge tag 'for-4.12/dm-fixes' of... · 2eecf3a4
      Linus Torvalds authored
      Merge tag 'for-4.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - DM cache metadata fixes to short-circuit operations that require the
         metadata not be in 'fail_io' mode. Otherwise crashes are possible.
      
       - a DM cache fix to address the inability to adapt to continuous IO
         that happened to also reflect a changing working set (which required
         old blocks be demoted before the new working set could be promoted)
      
       - a DM cache smq policy cleanup that fell out from reviewing the above
      
       - fix the Kconfig help text for CONFIG_DM_INTEGRITY
      
      * tag 'for-4.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm cache metadata: fail operations if fail_io mode has been established
        dm integrity: improve the Kconfig help text for DM_INTEGRITY
        dm cache policy smq: cleanup free_target_met() and clean_target_met()
        dm cache policy smq: allow demotions to happen even during continuous IO
      2eecf3a4
    • Linus Torvalds's avatar
      Merge tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 53ef7d0e
      Linus Torvalds authored
      Pull libnvdimm updates from Dan Williams:
       "The bulk of this has been in multiple -next releases. There were a few
        late breaking fixes and small features that got added in the last
        couple days, but the whole set has received a build success
        notification from the kbuild robot.
      
        Change summary:
      
         - Region media error reporting: A libnvdimm region device is the
           parent to one or more namespaces. To date, media errors have been
           reported via the "badblocks" attribute attached to pmem block
           devices for namespaces in "raw" or "memory" mode. Given that
           namespaces can be in "device-dax" or "btt-sector" mode this new
           interface reports media errors generically, i.e. independent of
           namespace modes or state.
      
           This subsequently allows userspace tooling to craft "ACPI 6.1
           Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
           requests and submit them via the ioctl path for NVDIMM root bus
           devices.
      
         - Introduce 'struct dax_device' and 'struct dax_operations': Prompted
           by a request from Linus and feedback from Christoph this allows for
           dax capable drivers to publish their own custom dax operations.
           This fixes the broken assumption that all dax operations are
           related to a persistent memory device, and makes it easier for
           other architectures and platforms to add customized persistent
           memory support.
      
         - 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
           available for storage appliance applications to manually trigger
           memory controllers to drain write-pending buffers that would
           otherwise be flushed automatically by the platform ADR
           (asynchronous-DRAM-refresh) mechanism at a power loss event.
           Support for "locked" DIMMs is included to prevent namespaces from
           surfacing when the namespace label data area is locked. Finally,
           fixes for various reported deadlocks and crashes, also tagged for
           -stable.
      
         - ACPI / nfit driver updates: General updates of the nfit driver to
           add DSM command overrides, ACPI 6.1 health state flags support, DSM
           payload debug available by default, and various fixes.
      
        Acknowledgements that came after the branch was pushed:
      
         - commit 565851c9 "device-dax: fix sysfs attribute deadlock":
           Tested-by: Yi Zhang <yizhan@redhat.com>
      
         - commit 23f49844 "libnvdimm: rework region badblocks clearing"
           Tested-by: Toshi Kani <toshi.kani@hpe.com>"
      
      * tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
        libnvdimm, pfn: fix 'npfns' vs section alignment
        libnvdimm: handle locked label storage areas
        libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
        brd: fix uninitialized use of brd->dax_dev
        block, dax: use correct format string in bdev_dax_supported
        device-dax: fix sysfs attribute deadlock
        libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
        libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
        libnvdimm: rework region badblocks clearing
        acpi, nfit: kill ACPI_NFIT_DEBUG
        libnvdimm: fix clear length of nvdimm_forget_poison()
        libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
        libnvdimm, region: sysfs trigger for nvdimm_flush()
        libnvdimm: fix phys_addr for nvdimm_clear_poison
        x86, dax, pmem: remove indirection around memcpy_from_pmem()
        block: remove block_device_operations ->direct_access()
        block, dax: convert bdev_dax_supported() to dax_direct_access()
        filesystem-dax: convert to dax_direct_access()
        Revert "block: use DAX for partition table reads"
        ext2, ext4, xfs: retrieve dax_device for iomap operations
        ...
      53ef7d0e
    • Linus Torvalds's avatar
      Merge tag 'staging-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · c6a677c6
      Linus Torvalds authored
      Pull staging/IIO updates from Greg KH:
       "Here is the big staging tree update for 4.12-rc1.
      
        It's a big one, adding about 350k new lines of crap^Wcode, mostly all
        in a big dump of media drivers from Intel. But there's other new
        drivers in here as well, yet-another-wifi driver, new IIO drivers, and
        a new crypto accelerator.
      
        We also deleted a bunch of stuff, mostly in patch cleanups, but also
        the Android ION code has shrunk a lot, and the Android low memory
        killer driver was finally deleted, much to the celebration of the -mm
        developers.
      
        All of these have been in linux-next with a few build issues that will
        show up when you merge to your tree"
      
      Merge conflicts in the new rtl8723bs driver (due to the wifi changes
      this merge window) handled as per linux-next, courtesy of Stephen
      Rothwell.
      
      * tag 'staging-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1182 commits)
        staging: fsl-mc/dpio: add cpu <--> LE conversion for dpaa2_fd
        staging: ks7010: remove line continuations in quoted strings
        staging: vt6656: use tabs instead of spaces
        staging: android: ion: Fix unnecessary initialization of static variable
        staging: media: atomisp: fix range checking on clk_num
        staging: media: atomisp: fix misspelled word in comment
        staging: media: atomisp: kmap() can't fail
        staging: atomisp: remove #ifdef for runtime PM functions
        staging: atomisp: satm include directory is gone
        atomisp: remove some more unused files
        atomisp: remove hmm_load/store/clear indirections
        atomisp: kill off mmgr_free
        atomisp: clean up the hmm init/cleanup indirections
        atomisp: handle allocation calls before init in the hmm layer
        staging: fsl-dpaa2/eth: Add maintainer for Ethernet driver
        staging: fsl-dpaa2/eth: Add TODO file
        staging: fsl-dpaa2/eth: Add trace points
        staging: fsl-dpaa2/eth: Add driver specific stats
        staging: fsl-dpaa2/eth: Add ethtool support
        staging: fsl-dpaa2/eth: Add Freescale DPAA2 Ethernet driver
        ...
      c6a677c6
    • Linus Torvalds's avatar
      Merge tag 'media/v4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · e87d51ac
      Linus Torvalds authored
      Pull media updates from Mauro Carvalho Chehab:
       "Media updates for v4.12-rc1:
      
         - new driver to support mediatek jpeg in hardware codec
      
         - rc-lirc, s5p-cec and st-cec staging drivers got promoted
      
         - hardware histogram support for vsp1 driver
      
         - added Virtual Media Controller driver, to make it easier to test
           the media controller
      
         - added a new CEC driver (rainshadow-cec)
      
         - removed two staging LIRC drivers for obscure hardware that are too
           obsolete
      
         - added support for Intel SR300 Depth camera
      
         - some improvements at CEC and RC core
      
         - lots of driver cleanups, improvements all over the tree
      
        With this series, we're finally getting rid of the LIRC staging
        drivers. There's just one left (lirc_zilog), which requires more
        care, as part of its functionality (IR RX) is already provided by
        another driver. Work is in progress to convert it the proper way"
      
      * tag 'media/v4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (304 commits)
        [media] ov2640: print error if devm_*_optional*() fails
        [media] atmel-isc: Fix the static checker warning
        [media] ov2640: add support for MEDIA_BUS_FMT_YVYU8_2X8 and MEDIA_BUS_FMT_VYUY8_2X8
        [media] ov2640: fix vflip control
        [media] ov2640: fix duplicate width+height returning from ov2640_select_win()
        [media] ov2640: add missing write to size change preamble
        [media] ov2640: add information about DSP register 0xc7
        [media] ov2640: improve banding filter register definitions/documentation
        [media] ov2640: fix init sequence alignment
        [media] ov2640: make GPIOLIB an optional dependency
        [media] xc5000: fix spelling mistake: "calibration"
        [media] vidioc-queryctrl.rst: fix menu/int menu references
        [media] media-entity: only call dev_dbg_obj if mdev is not NULL
        [media] pixfmt-meta-vsp1-hgo.rst: remove spurious '-'
        [media] mtk-vcodec: avoid warnings because of empty macros
        [media] coda: bump maximum number of internal framebuffers to 17
        [media] media: mtk-vcodec: remove informative log
        [media] subdev-formats.rst: remove spurious '-'
        [media] dw2102: limit messages to buffer size
        [media] ttusb2: limit messages to buffer size
        ...
      e87d51ac
    • Linus Torvalds's avatar
      Merge tag 'drm-coc-for-v4.12-rc1' of git://people.freedesktop.org/~airlied/linux · bdc713bf
      Linus Torvalds authored
      Pull drm CoC pointer from Dave Airlie:
       "Small supplementary pull request. I didn't want anyone saying we snuck
        this in in the middle of a big pile of changes, so here is a clearly
        separate pull request documenting the code of conduct introduced for
        freedesktop.org and how it relates to dri-devel community"
      
      * tag 'drm-coc-for-v4.12-rc1' of git://people.freedesktop.org/~airlied/linux:
        drm: Document code of conduct
      bdc713bf
    • Linus Torvalds's avatar
      Merge tag 'drm-forgot-about-tegra-for-v4.12-rc1' of git://people.freedesktop.org/~airlied/linux · 1062ae49
      Linus Torvalds authored
      Pull drm tegra updates from Dave Airlie:
       "I missed a pull request from Thierry, this stuff has been in
        linux-next for a while anyways.
      
        It does contain a branch from the iommu tree, but Thierry said it
        should be fine"
      
      * tag 'drm-forgot-about-tegra-for-v4.12-rc1' of git://people.freedesktop.org/~airlied/linux:
        gpu: host1x: Fix host1x driver shutdown
        gpu: host1x: Support module reset
        gpu: host1x: Sort includes alphabetically
        drm/tegra: Add VIC support
        dt-bindings: Add bindings for the Tegra VIC
        drm/tegra: Add falcon helper library
        drm/tegra: Add Tegra DRM allocation API
        drm/tegra: Add tiling FB modifiers
        drm/tegra: Don't leak kernel pointer to userspace
        drm/tegra: Protect IOMMU operations by mutex
        drm/tegra: Enable IOVA API when IOMMU support is enabled
        gpu: host1x: Add IOMMU support
        gpu: host1x: Fix potential out-of-bounds access
        iommu/iova: Fix compile error with CONFIG_IOMMU_IOVA=m
        iommu: Add dummy implementations for !IOMMU_IOVA
        MAINTAINERS: Add related headers to IOMMU section
        iommu/iova: Consolidate code for adding new node to iovad domain rbtree
      1062ae49
  4. 05 May, 2017 18 commits
    • Linus Torvalds's avatar
      Merge tag 'gfs2-4.12.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 1a5fb64f
      Linus Torvalds authored
      Pull GFS2 updates from Bob Peterson:
       "We've got ten GFS2 patches for this merge window.
      
         - Andreas Gruenbacher wrote a patch to replace the deprecated call to
           rhashtable_walk_init with rhashtable_walk_enter.
      
         - Andreas also wrote a patch to eliminate redundant code in two of
           our debugfs sequence files.
      
         - Andreas also cleaned up the rhashtable key ugliness Linus pointed
           out during this cycle, following Linus's suggestions.
      
         - Andreas also wrote a patch to take advantage of his new function
           rhashtable_lookup_get_insert_fast. This makes glock lookup faster
           and more bullet-proof.
      
         - Andreas also wrote a patch to revert a patch in the evict path that
           caused occasional deadlocks, and is no longer needed.
      
         - Andrew Price wrote a patch to re-enable fallocate for the rindex
           system file to enable gfs2_grow to grow properly on secondary file
           system grow operations.
      
         - I wrote a patch to initialize an inode number field to make certain
           kernel trace points more understandable.
      
         - I also wrote a patch that makes GFS2 file system "withdraw" work
           more like it should by ignoring operations after a withdraw that
           would formerly cause a BUG() and kernel panic.
      
         - I also reworked the entire truncate/delete algorithm, scrapping the
           old recursive algorithm in favor of a new non-recursive algorithm.
           This was done for performance: This way, GFS2 no longer needs to
           lock multiple resource groups while doing truncates and deletes of
           files that cross multiple resource group boundaries, allowing for
           better parallelism. It also solves a problem whereby deleting large
           files would request a large chunk of kernel memory, which resulted
           in a get_page_from_freelist warning.
      
         - Due to a regression found during testing, I added a new patch to
           correct 'GFS2: Prevent BUG from occurring when normal Withdraws
           occur'."
      
      * tag 'gfs2-4.12.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        GFS2: Allow glocks to be unlocked after withdraw
        GFS2: Non-recursive delete
        gfs2: Re-enable fallocate for the rindex
        Revert "GFS2: Wait for iopen glock dequeues"
        gfs2: Switch to rhashtable_lookup_get_insert_fast
        GFS2: Temporarily zero i_no_addr when creating a dinode
        gfs2: Don't pack struct lm_lockname
        gfs2: Deduplicate gfs2_{glocks,glstats}_open
        gfs2: Replace rhashtable_walk_init with rhashtable_walk_enter
        GFS2: Prevent BUG from occurring when normal Withdraws occur
      1a5fb64f
    • Linus Torvalds's avatar
      Merge tag 'for-linus-4.12-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · aeced661
      Linus Torvalds authored
      Pull orangefs updates from Mike Marshall:
       "Orangefs cleanups, fixes and statx support.
      
        Some cleanups:
      
         - remove unused get_fsid_from_ino
         - fix bounds check for listxattr
         - clean up oversize xattr validation
         - do not set getattr_time on orangefs_lookup
         - return from orangefs_devreq_read quickly if possible
         - do not wait for timeout if umounting
         - handle zero size write in debugfs
      
        Bug fixes:
      
         - do not check possibly stale size on truncate
         - ensure the userspace component is unmounted if mount fails
         - total reimplementation of dir.c
      
        New feature:
      
         - implement statx
      
        The new implementation of dir.c is kind of a big deal, all new code.
        It has been posted to fs-devel during the previous rc period, we
        didn't get much review or feedback from there, but it has been
        reviewed very heavily here, so much so that we have two entire
        versions of the reimplementation.
      
        Not only does the new implementation fix some xfstests, but it passes
        all the new tests we made here that involve seeking and rewinding and
        giant directories and long file names. The new dir code has three
        patches itself:
      
         - skip forward to the next directory entry if seek is short
         - invalidate stored directory on seek
         - count directory pieces correctly"
      
      * tag 'for-linus-4.12-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
        orangefs: count directory pieces correctly
        orangefs: invalidate stored directory on seek
        orangefs: skip forward to the next directory entry if seek is short
        orangefs: handle zero size write in debugfs
        orangefs: do not wait for timeout if umounting
        orangefs: return from orangefs_devreq_read quickly if possible
        orangefs: ensure the userspace component is unmounted if mount fails
        orangefs: do not check possibly stale size on truncate
        orangefs: implement statx
        orangefs: remove ORANGEFS_READDIR macros
        orangefs: support very large directories
        orangefs: support llseek on directories
        orangefs: rewrite readdir to fix several bugs
        orangefs: do not set getattr_time on orangefs_lookup
        orangefs: clean up oversize xattr validation
        orangefs: fix bounds check for listxattr
        orangefs: remove unused get_fsid_from_ino
      aeced661
    • Linus Torvalds's avatar
      Merge tag 'befs-v4.12-rc1' of git://github.com/luisbg/linux-befs · 414975eb
      Linus Torvalds authored
      Pull befs fix from Luis de Bethencourt:
       "One fix from Fabian Frederick making the nfs client still work after a
        cache drop"
      
      * tag 'befs-v4.12-rc1' of git://github.com/luisbg/linux-befs:
        befs: make export work with cold dcache
      414975eb
    • Linus Torvalds's avatar
      Merge tag 'initramfs-fix-4.12-rc1' of git://github.com/stffrdhrn/linux · 58017a3e
      Linus Torvalds authored
      Pull initramfs fix from Stafford Horne:
       "This is a fix for an issue that has caused 4.11 to not boot on
        OpenRISC. I should have caught this during the 4.11 cycle but I had
        been busy on testing some other series of patches.
      
        I would have considered pushing it through a different path but Al Viro
        suggested submitting directly to you.

        Also, it's just one as I haven't really got anything else ready on my
        queue for 4.12.
      
        Summary:
      
         - Ensure fput() flush is done even for builtin initramfs"
      
      * tag 'initramfs-fix-4.12-rc1' of git://github.com/stffrdhrn/linux:
        initramfs: Always do fput() and load modules after rootfs populate
      58017a3e
    • Bob Peterson's avatar
      GFS2: Allow glocks to be unlocked after withdraw · ed17545d
      Bob Peterson authored
      This patch fixes a regression introduced by patch 0d1c7ae9.
      
      The intent of the patch was to stop promoting glocks after a
      file system is withdrawn due to a variety of errors, because doing
      so results in a BUG(). (You should be able to unmount after a
      withdraw rather than having the kernel panic.)
      
      Unfortunately, it also stopped demotions, so glocks could not be
      unlocked after withdraw, which means the unmount would hang.
      
      This patch allows function do_xmote to demote locks to an
      unlocked state after a withdraw, but not promote them.
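
      The rule described above can be sketched as a small predicate. This is
      a simplified illustration under assumed names (the enum, its values and
      xmote_allowed() are hypothetical), not the actual do_xmote code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock states for illustration; not the GFS2 definitions. */
enum lock_state { STATE_UNLOCKED, STATE_SHARED, STATE_EXCLUSIVE };

/*
 * Decide whether a requested state change may proceed. After a withdraw
 * only a transition to the unlocked state (a demotion) is let through, so
 * that unmount can release its locks; every other request is refused.
 */
bool xmote_allowed(bool withdrawn, enum lock_state target)
{
	if (!withdrawn)
		return true;
	return target == STATE_UNLOCKED;
}

/* Usage example. */
void example_xmote(void)
{
	assert(xmote_allowed(true, STATE_UNLOCKED));	/* demotion still works */
	assert(!xmote_allowed(true, STATE_EXCLUSIVE));	/* promotion is refused */
}
```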
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      ed17545d
    • Eryu Guan's avatar
      xfs: fix use-after-free in xfs_finish_page_writeback · 161f55ef
      Eryu Guan authored
      Commit 28b783e4 ("xfs: bufferhead chains are invalid after
      end_page_writeback") fixed one use-after-free issue by
      pre-calculating the loop conditionals before calling bh->b_end_io()
      in the end_io processing loop, but it assigned 'next' pointer before
      checking end offset boundary & breaking the loop, at which point the
      bh might be freed already, and caused use-after-free.
      
      This is caught by KASAN when running fstests generic/127 on sub-page
      block size XFS.
      
      [ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50
      [ 2747.868840] ==================================================================
      [ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr ffff8801395ae698
      ...
      [ 2747.918245] Call Trace:
      [ 2747.920975]  dump_stack+0x63/0x84
      [ 2747.924673]  kasan_object_err+0x21/0x70
      [ 2747.928950]  kasan_report+0x271/0x530
      [ 2747.933064]  ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
      [ 2747.938409]  ? end_page_writeback+0xce/0x110
      [ 2747.943171]  __asan_report_load8_noabort+0x19/0x20
      [ 2747.948545]  xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
      [ 2747.953724]  xfs_end_io+0x1af/0x2b0 [xfs]
      [ 2747.958197]  process_one_work+0x5ff/0x1000
      [ 2747.962766]  worker_thread+0xe4/0x10e0
      [ 2747.966946]  kthread+0x2d3/0x3d0
      [ 2747.970546]  ? process_one_work+0x1000/0x1000
      [ 2747.975405]  ? kthread_create_on_node+0xc0/0xc0
      [ 2747.980457]  ? syscall_return_slowpath+0xe6/0x140
      [ 2747.985706]  ? do_page_fault+0x30/0x80
      [ 2747.989887]  ret_from_fork+0x2c/0x40
      [ 2747.993874] Object at ffff8801395ae690, in cache buffer_head size: 104
      [ 2748.001155] Allocated:
      [ 2748.003782] PID = 8327
      [ 2748.006411]  save_stack_trace+0x1b/0x20
      [ 2748.010688]  save_stack+0x46/0xd0
      [ 2748.014383]  kasan_kmalloc+0xad/0xe0
      [ 2748.018370]  kasan_slab_alloc+0x12/0x20
      [ 2748.022648]  kmem_cache_alloc+0xb8/0x1b0
      [ 2748.027024]  alloc_buffer_head+0x22/0xc0
      [ 2748.031399]  alloc_page_buffers+0xd1/0x250
      [ 2748.035968]  create_empty_buffers+0x30/0x410
      [ 2748.040730]  create_page_buffers+0x120/0x1b0
      [ 2748.045493]  __block_write_begin_int+0x17a/0x1800
      [ 2748.050740]  iomap_write_begin+0x100/0x2f0
      [ 2748.055308]  iomap_zero_range_actor+0x253/0x5c0
      [ 2748.060362]  iomap_apply+0x157/0x270
      [ 2748.064347]  iomap_zero_range+0x5a/0x80
      [ 2748.068624]  iomap_truncate_page+0x6b/0xa0
      [ 2748.073227]  xfs_setattr_size+0x1f7/0xa10 [xfs]
      [ 2748.078312]  xfs_vn_setattr_size+0x68/0x140 [xfs]
      [ 2748.083589]  xfs_file_fallocate+0x4ac/0x820 [xfs]
      [ 2748.088838]  vfs_fallocate+0x2cf/0x780
      [ 2748.093021]  SyS_fallocate+0x48/0x80
      [ 2748.097006]  do_syscall_64+0x18a/0x430
      [ 2748.101186]  return_from_SYSCALL_64+0x0/0x6a
      [ 2748.105948] Freed:
      [ 2748.108189] PID = 8327
      [ 2748.110816]  save_stack_trace+0x1b/0x20
      [ 2748.115093]  save_stack+0x46/0xd0
      [ 2748.118788]  kasan_slab_free+0x73/0xc0
      [ 2748.122969]  kmem_cache_free+0x7a/0x200
      [ 2748.127247]  free_buffer_head+0x41/0x80
      [ 2748.131524]  try_to_free_buffers+0x178/0x250
      [ 2748.136316]  xfs_vm_releasepage+0x2e9/0x3d0 [xfs]
      [ 2748.141563]  try_to_release_page+0x100/0x180
      [ 2748.146325]  invalidate_inode_pages2_range+0x7da/0xcf0
      [ 2748.152087]  xfs_shift_file_space+0x37d/0x6e0 [xfs]
      [ 2748.157557]  xfs_collapse_file_space+0x49/0x120 [xfs]
      [ 2748.163223]  xfs_file_fallocate+0x2a7/0x820 [xfs]
      [ 2748.168462]  vfs_fallocate+0x2cf/0x780
      [ 2748.172642]  SyS_fallocate+0x48/0x80
      [ 2748.176629]  do_syscall_64+0x18a/0x430
      [ 2748.180810]  return_from_SYSCALL_64+0x0/0x6a
      
      Fix it by checking the offset against end and breaking out first, and
      dereferencing bh only if there are still bufferheads to process.
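
      The ordering can be sketched on a miniature ring of buffers. This is an
      illustration under assumed names (struct buf, complete() and
      finish_range() are hypothetical), not the XFS code: the range check
      comes before the buffer is dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a per-page buffer ring; not struct buffer_head. */
struct buf {
	struct buf *next;	/* circular link to the next buffer on the page */
	int completed;		/* set once the buffer has been completed */
};

/* Stand-in for the completion callback; afterwards the ring may be gone. */
static void complete(struct buf *b)
{
	b->completed = 1;
}

/*
 * Complete every buffer whose offset lies in [start, end]. The end check
 * runs first and leaves the loop, so b->next is read only while there are
 * still buffers left to process.
 */
int finish_range(struct buf *head, unsigned int bsize,
		 unsigned int start, unsigned int end)
{
	struct buf *b = head, *next;
	unsigned int off = 0;
	int done = 0;

	do {
		if (off > end)
			break;		/* past the range: do not touch b */
		next = b->next;		/* b has not been completed yet */
		if (off >= start) {
			complete(b);
			done++;
		}
		off += bsize;
	} while ((b = next) != head);

	return done;
}

/* Usage example: four 512-byte buffers, complete only the second one. */
void example_finish_range(void)
{
	struct buf ring[4] = { { NULL, 0 } };

	for (int i = 0; i < 4; i++)
		ring[i].next = &ring[(i + 1) % 4];

	assert(finish_range(&ring[0], 512, 512, 1023) == 1);
	assert(ring[1].completed && !ring[2].completed);
}
```

      In the sketch, the buffer after the last completed one is never
      dereferenced, which corresponds to the access KASAN flagged above.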
      Signed-off-by: Eryu Guan <eguan@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      161f55ef
    • Linus Torvalds's avatar
      Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · ab182e67
      Linus Torvalds authored
      Pull arm64 updates from Catalin Marinas:
      
       - kdump support, including two necessary memblock additions:
         memblock_clear_nomap() and memblock_cap_memory_range()
      
       - ARMv8.3 HWCAP bits for JavaScript conversion instructions, complex
         numbers and weaker release consistency
      
       - arm64 ACPI platform MSI support
      
       - arm perf updates: ACPI PMU support, L3 cache PMU in some Qualcomm
         SoCs, Cortex-A53 L2 cache events and DTLB refills, MAINTAINERS update
         for DT perf bindings
      
       - architected timer errata framework (the arch/arm64 changes only)
      
       - support for DMA_ATTR_FORCE_CONTIGUOUS in the arm64 iommu DMA API
      
       - arm64 KVM refactoring to use common system register definitions
      
       - remove support for ASID-tagged VIVT I-cache (no ARMv8 implementation
         using it and deprecated in the architecture) together with some
         I-cache handling clean-up
      
       - PE/COFF EFI header clean-up/hardening
      
       - define BUG() instruction without CONFIG_BUG
      
      * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits)
        arm64: Fix the DMA mmap and get_sgtable API with DMA_ATTR_FORCE_CONTIGUOUS
        arm64: Print DT machine model in setup_machine_fdt()
        arm64: pmu: Wire-up Cortex A53 L2 cache events and DTLB refills
        arm64: module: split core and init PLT sections
        arm64: pmuv3: handle pmuv3+
        arm64: Add CNTFRQ_EL0 trap handler
        arm64: Silence spurious kbuild warning on menuconfig
        arm64: pmuv3: use arm_pmu ACPI framework
        arm64: pmuv3: handle !PMUv3 when probing
        drivers/perf: arm_pmu: add ACPI framework
        arm64: add function to get a cpu's MADT GICC table
        drivers/perf: arm_pmu: split out platform device probe logic
        drivers/perf: arm_pmu: move irq request/free into probe
        drivers/perf: arm_pmu: split cpu-local irq request/free
        drivers/perf: arm_pmu: rename irq request/free functions
        drivers/perf: arm_pmu: handle no platform_device
        drivers/perf: arm_pmu: simplify cpu_pmu_request_irqs()
        drivers/perf: arm_pmu: factor out pmu registration
        drivers/perf: arm_pmu: fold init into alloc
        drivers/perf: arm_pmu: define armpmu_init_fn
        ...
      ab182e67
    • Mike Snitzer's avatar
      dm cache metadata: fail operations if fail_io mode has been established · 10add84e
      Mike Snitzer authored
      Otherwise it is possible to trigger crashes: the metadata is
      inaccessible, and without these checks these methods do not safely
      account for that possibility.
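
      A minimal sketch of such a guard, under assumed names: struct cache_md,
      its fields and md_get_nr_blocks() are hypothetical, and the -EINVAL
      return value is an assumption rather than something taken from the
      patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical metadata handle; field names are illustrative only. */
struct cache_md {
	bool fail_io;			/* set once the metadata is inaccessible */
	unsigned int nr_blocks;
};

/*
 * Guarded accessor: once fail_io is set, report an error instead of
 * touching the metadata. The error code is an assumption of this sketch.
 */
int md_get_nr_blocks(const struct cache_md *md, unsigned int *result)
{
	if (md->fail_io)
		return -EINVAL;
	*result = md->nr_blocks;
	return 0;
}

/* Usage example. */
void example_fail_io(void)
{
	struct cache_md bad = { .fail_io = true, .nr_blocks = 42 };
	unsigned int n = 0;

	assert(md_get_nr_blocks(&bad, &n) == -EINVAL);
	assert(n == 0);		/* the output was left untouched */
}
```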
      
      Cc: stable@vger.kernel.org
      Reported-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      10add84e
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 7246f600
      Linus Torvalds authored
      Pull powerpc updates from Michael Ellerman:
       "Highlights include:
      
         - Larger virtual address space on 64-bit server CPUs. By default we
           use a 128TB virtual address space, but a process can request access
           to the full 512TB by passing a hint to mmap().
      
         - Support for the new Power9 "XIVE" interrupt controller.
      
         - TLB flushing optimisations for the radix MMU on Power9.
      
         - Support for CAPI cards on Power9, using the "Coherent Accelerator
           Interface Architecture 2.0".
      
         - The ability to configure the mmap randomisation limits at build and
           runtime.
      
         - Several small fixes and cleanups to the kprobes code, as well as
           support for KPROBES_ON_FTRACE.
      
         - Major improvements to handling of system reset interrupts,
           correctly treating them as NMIs, giving them a dedicated stack and
           using a new hypervisor call to trigger them, all of which should
           aid debugging and robustness.
      
         - Many fixes and other minor enhancements.
      
        Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple,
        Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton
        Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt,
        Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy,
        Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy,
        Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin,
        Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J
        Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew
        R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver
        O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell
        Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C.
        Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar,
        Yang Shi"
      
      * tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
        powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it
        powerpc/powernv: Fix TCE kill on NVLink2
        powerpc/mm/radix: Drop support for CPUs without lockless tlbie
        powerpc/book3s/mce: Move add_taint() later in virtual mode
        powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body
        powerpc/smp: Document irq enable/disable after migrating IRQs
        powerpc/mpc52xx: Don't select user-visible RTAS_PROC
        powerpc/powernv: Document cxl dependency on special case in pnv_eeh_reset()
        powerpc/eeh: Clean up and document event handling functions
        powerpc/eeh: Avoid use after free in eeh_handle_special_event()
        cxl: Mask slice error interrupts after first occurrence
        cxl: Route eeh events to all drivers in cxl_pci_error_detected()
        cxl: Force context lock during EEH flow
        powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TEST
        powerpc/xmon: Teach xmon oops about radix vectors
        powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids
        powerpc/pseries: Enable VFIO
        powerpc/powernv: Fix iommu table size calculation hook for small tables
        powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc
        powerpc: Add arch/powerpc/tools directory
        ...
      7246f600
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · e579dde6
      Linus Torvalds authored
      Pull namespace updates from Eric Biederman:
       "This is a set of small fixes that were mostly stumbled over during
        more significant development. This proc fix and the fix to
        posix-timers are the most significant of the lot.
      
        There is a lot of good development going on but unfortunately it
        didn't quite make the merge window"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        proc: Fix unbalanced hard link numbers
        signal: Make kill_proc_info static
        rlimit: Properly call security_task_setrlimit
        signal: Remove unused definition of sig_user_definied
        ia64: Remove unused IA64_TASK_SIGHAND_OFFSET and IA64_SIGHAND_SIGLOCK_OFFSET
        ipc: Remove unused declaration of recompute_msgmni
        posix-timers: Correct sanity check in posix_cpu_nsleep
        sysctl: Remove dead register_sysctl_root
      e579dde6
    • Björn Jacke's avatar
      CIFS: add misssing SFM mapping for doublequote · 85435d7a
      Björn Jacke authored
      SFM is mapping doublequote to 0xF020
      
      Without this patch creating files with doublequote fails to Windows/Mac
      Signed-off-by: Bjoern Jacke <bjacke@samba.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
      CC: stable <stable@vger.kernel.org>
      85435d7a
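A minimal sketch of the idea behind the CIFS doublequote commit above: a character the server rejects in file names is swapped for a private-use code point on the way out and restored on the way back. Only the doublequote to 0xF020 mapping comes from the commit message. The function names and the table layout are hypothetical illustrations, not the kernel's code, and the other SFM mappings are deliberately left out.

```python
# Hypothetical illustration of an SFM-style character remap.
# Only the doublequote -> 0xF020 pair is taken from the commit message;
# everything else here (names, structure) is an assumption for the sketch.
SFM_DOUBLEQUOTE = 0xF020

# Translation tables keyed by code point, as str.translate() expects.
TO_SFM = {ord('"'): SFM_DOUBLEQUOTE}
FROM_SFM = {remapped: original for original, remapped in TO_SFM.items()}


def to_sfm(name: str) -> str:
    """Replace reserved characters with their private-use stand-ins."""
    return name.translate(TO_SFM)


def from_sfm(name: str) -> str:
    """Restore the original characters from their private-use stand-ins."""
    return name.translate(FROM_SFM)


if __name__ == "__main__":
    wire_name = to_sfm('say "hi".txt')
    # The remapped name no longer contains a literal doublequote.
    print([hex(ord(c)) for c in wire_name if ord(c) >= 0xF000])
    print(from_sfm(wire_name))
```

Keeping the mapping in one table and deriving the reverse table from it means a missing entry, which is what this commit fixes, only has to be added in one place.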
    • Catalin Marinas's avatar
      arm64: Fix the DMA mmap and get_sgtable API with DMA_ATTR_FORCE_CONTIGUOUS · 92f66f84
      Catalin Marinas authored
      While honouring the DMA_ATTR_FORCE_CONTIGUOUS on arm64 (commit
      44176bb3: "arm64: Add support for DMA_ATTR_FORCE_CONTIGUOUS to
      IOMMU"), the existing uses of dma_mmap_attrs() and dma_get_sgtable()
      have been broken by passing a physically contiguous vm_struct with an
      invalid pages pointer through the common iommu API.
      
      Since the coherent allocation with DMA_ATTR_FORCE_CONTIGUOUS uses CMA,
      this patch simply reuses the existing swiotlb logic for mmap and
      get_sgtable.
      
      Note that the current implementation of get_sgtable (both swiotlb and
      iommu) is broken if dma_declare_coherent_memory() is used since such
      memory does not have a corresponding struct page. To be addressed in a
      subsequent patch.
      
      Fixes: 44176bb3 ("arm64: Add support for DMA_ATTR_FORCE_CONTIGUOUS to IOMMU")
      Reported-by: Andrzej Hajda <a.hajda@samsung.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Robin Murphy <robin.murphy@arm.com>
      Tested-by: Andrzej Hajda <a.hajda@samsung.com>
      Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      92f66f84
    • Fabian Frederick's avatar
      befs: make export work with cold dcache · dcfd9b21
      Fabian Frederick authored
      based on commit b3b42c0d
      ("fs/affs: make export work with cold dcache")
      
      This adds get_parent function so that nfs client can still work after
      cache drop (Tested on NFS v4 with echo 3 > /proc/sys/vm/drop_caches)
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
      dcfd9b21
    • Stafford Horne's avatar
      initramfs: Always do fput() and load modules after rootfs populate · 17a9be31
      Stafford Horne authored
      In OpenRISC we do not have a bootloader passed initrd, but the built in
      initramfs does contain the /init and other binaries, including modules.
      The previous commit 08865514 ("initramfs: finish fput() before
      accessing any binary from initramfs") made a change to only call fput()
      if the bootloader initrd was available, this caused intermittent crashes
      for OpenRISC.
      
      This patch changes the fput() to happen unconditionally if any rootfs is
      loaded. Also, I added some comments to make it a bit more clear why we
      call unpack_to_rootfs() multiple times.
      
      Fixes: 08865514 ("initramfs: finish fput() before accessing any binary from initramfs")
      Cc: stable@vger.kernel.org
      Cc: Lokesh Vutla <lokeshvutla@ti.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Stafford Horne <shorne@gmail.com>
      17a9be31
    • Dan Williams's avatar
      73616367
    • Dan Williams's avatar
      libnvdimm, pfn: fix 'npfns' vs section alignment · d5483fed
      Dan Williams authored
      Fix failures to create namespaces due to the vmem_altmap not advertising
      enough free space to store the memmap.
      
       WARNING: CPU: 15 PID: 8022 at arch/x86/mm/init_64.c:656 arch_add_memory+0xde/0xf0
       [..]
       Call Trace:
        dump_stack+0x63/0x83
        __warn+0xcb/0xf0
        warn_slowpath_null+0x1d/0x20
        arch_add_memory+0xde/0xf0
        devm_memremap_pages+0x244/0x440
        pmem_attach_disk+0x37e/0x490 [nd_pmem]
        nd_pmem_probe+0x7e/0xa0 [nd_pmem]
        nvdimm_bus_probe+0x71/0x120 [libnvdimm]
        driver_probe_device+0x2bb/0x460
        bind_store+0x114/0x160
        drv_attr_store+0x25/0x30
      
      In commit 658922e5 "libnvdimm, pfn: fix memmap reservation sizing"
      we arranged for the capacity to be allocated, but failed to also update
      the 'npfns' parameter. This leads to cases where there is enough
      capacity reserved to hold all the allocated sections, but
      vmemmap_populate_hugepages() still encounters -ENOMEM from
      altmap_alloc_block_buf().
      
      This fix is a stop-gap until we can teach the core memory hotplug
      implementation to permit sub-section hotplug.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 658922e5 ("libnvdimm, pfn: fix memmap reservation sizing")
      Reported-by: Anisha Allada <anisha.allada@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      d5483fed
    • Linus Torvalds's avatar
      Merge tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · af82455f
      Linus Torvalds authored
      Pull char/misc driver updates from Greg KH:
       "Here is the big set of new char/misc drivers and features for
        4.12-rc1.
      
        There's lots of new drivers added this time around, new firmware
        drivers from Google, more auxdisplay drivers, extcon drivers, fpga
        drivers, and a bunch of other driver updates. Nothing major, except if
        you happen to have the hardware for these drivers, and then you will
        be happy :)
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits)
        firmware: google memconsole: Fix return value check in platform_memconsole_init()
        firmware: Google VPD: Fix return value check in vpd_platform_init()
        goldfish_pipe: fix build warning about using too much stack.
        goldfish_pipe: An implementation of more parallel pipe
        fpga fr br: update supported version numbers
        fpga: region: release FPGA region reference in error path
        fpga altera-hps2fpga: disable/unprepare clock on error in alt_fpga_bridge_probe()
        mei: drop the TODO from samples
        firmware: Google VPD sysfs driver
        firmware: Google VPD: import lib_vpd source files
        misc: lkdtm: Add volatile to intentional NULL pointer reference
        eeprom: idt_89hpesx: Add OF device ID table
        misc: ds1682: Add OF device ID table
        misc: tsl2550: Add OF device ID table
        w1: Remove unneeded use of assert() and remove w1_log.h
        w1: Use kernel common min() implementation
        uio_mf624: Align memory regions to page size and set correct offsets
        uio_mf624: Refactor memory info initialization
        uio: Allow handling of non page-aligned memory regions
        hangcheck-timer: Fix typo in comment
        ...
      af82455f
    • Daniel Vetter's avatar
      drm: Document code of conduct · 8676df50
      Daniel Vetter authored
      freedesktop.org has adopted a formal&enforced code of conduct:
      
      https://www.fooishbar.org/blog/fdo-contributor-covenant/
      https://www.freedesktop.org/wiki/CodeOfConduct/
      
      Besides formalizing things a bit more I don't think this changes
      anything for us, we've already peer-enforced respectful and
      constructive interactions since a long time. But it's good to document
      things properly.
      
      v2: Drop confusing note from commit message and clarify the grammar
      (Chris, Alex and others).
      
      Cc: Daniel Stone <daniels@collabora.com>
      Cc: Keith Packard <keithp@keithp.com>
      Cc: tfheen@err.no
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Reviewed-by: Daniel Stone <daniels@collabora.com>
      Reviewed-by: Sumit Semwal <sumit.semwal@linaro.org>
      Acked-by: Archit Taneja <architt@codeaurora.org>
      Reviewed-by: Martin Peres <martin.peres@free.fr>
      Acked-by: Thierry Reding <treding@nvidia.com>
      Acked-by: Jani Nikula <jani.nikula@intel.com>
      Acked-by: Vincent Abriou <vincent.abriou@st.com>
      Acked-by: Neil Armstrong <narmstrong@baylibre.com>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Acked-by: Brian Starkey <brian.starkey@arm.com>
      Acked-by: Rob Clark <robdclark@gmail.com>
      Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
      Acked-by: Sean Paul <seanpaul@chromium.org>
      Reviewed-by: Harry Wentland <harry.wentland@amd.com>
      Reviewed-by: Eric Anholt <eric@anholt.net>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Acked-by: Gustavo Padovan <gustavo.padovan@collabora.com>
      Acked-by: Michel Dänzer <michel.daenzer@amd.com>
      Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
      Acked-by: Keith Packard <keithp@keithp.com>
      Acked-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
      Acked-by: Adam Jackson <ajax@redhat.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      8676df50