1. 11 Nov, 2016 20 commits
    • Merge branch 'akpm' (patches from Andrew) · 968ef8de
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "15 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        lib/stackdepot: export save/fetch stack for drivers
        mm: kmemleak: scan .data.ro_after_init
        memcg: prevent memcg caches to be both OFF_SLAB & OBJFREELIST_SLAB
        coredump: fix unfreezable coredumping task
        mm/filemap: don't allow partially uptodate page for pipes
        mm/hugetlb: fix huge page reservation leak in private mapping error paths
        ocfs2: fix not enough credit panic
        Revert "console: don't prefer first registered if DT specifies stdout-path"
        mm: hwpoison: fix thp split handling in memory_failure()
        swapfile: fix memory corruption via malformed swapfile
        mm/cma.c: check the max limit for cma allocation
        scripts/bloat-o-meter: fix SIGPIPE
        shmem: fix pageflags after swapping DMA32 object
        mm, frontswap: make sure allocated frontswap map is assigned
        mm: remove extra newline from allocation stall warning
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · c5e4ca6d
      Linus Torvalds authored
      Pull VFS fixes from Al Viro:
       "Christoph's and Jan's aio fixes, fixup for generic_file_splice_read
        (removal of pointless detritus that actually breaks it when used for
        gfs2 ->splice_read()) and fixup for generic_file_read_iter()
        interaction with ITER_PIPE destinations."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        splice: remove detritus from generic_file_splice_read()
        mm/filemap: don't allow partially uptodate page for pipes
        aio: fix freeze protection of aio writes
        fs: remove aio_run_iocb
        fs: remove the never implemented aio_fsync file operation
        aio: hold an extra file reference over AIO read/write operations
    • Merge tag 'ceph-for-4.9-rc5' of git://github.com/ceph/ceph-client · ef091b3c
      Linus Torvalds authored
      Pull Ceph fixes from Ilya Dryomov:
       "Ceph's ->read_iter() implementation is incompatible with the new
        generic_file_splice_read() code that went into -rc1.  Switch to the
        less efficient default_file_splice_read() for now; the proper fix is
        being held for 4.10.
      
        We also have a fix for a 4.8 regression and a trivial libceph fixup"
      
      * tag 'ceph-for-4.9-rc5' of git://github.com/ceph/ceph-client:
        libceph: initialize last_linger_id with a large integer
        libceph: fix legacy layout decode with pool 0
        ceph: use default file splice read callback
    • Merge tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfs · ef5beed9
      Linus Torvalds authored
      Pull NFS client bugfixes from Anna Schumaker:
       "Most of these fix regressions in 4.9, and none are going to stable
        this time around.
      
        Bugfixes:
         - Trim extra slashes in v4 nfs_paths to fix tools that use this
         - Fix a -Wmaybe-uninitialized warning
         - Fix suspicious RCU usages
         - Fix Oops when mounting multiple servers at once
         - Suppress a false-positive pNFS error
         - Fix a DMAR failure in NFS over RDMA"
      
      * tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect
        fs/nfs: Fix used uninitialized warn in nfs4_slot_seqid_in_use()
        NFS: Don't print a pNFS error if we aren't using pNFS
        NFS: Ignore connections that have cl_rpcclient uninitialized
        SUNRPC: Fix suspicious RCU usage
        NFSv4.1: work around -Wmaybe-uninitialized warning
        NFS: Trim extra slash in v4 nfs_path
    • Merge tag 'xfs-fixes-for-linus-4.9-rc5' of... · a4fac3b5
      Linus Torvalds authored
      Merge tag 'xfs-fixes-for-linus-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
      
      Pull xfs fix from Dave Chinner:
       "This is a fix for an unmount hang (regression) when the filesystem is
        shutdown.  It was supposed to go to you for -rc3, but I accidentally
        tagged the commit prior to it in that pullreq.
      
        Summary:
      
         - fix for aborting deferred transactions on filesystem shutdown"
      
      * tag 'xfs-fixes-for-linus-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
        xfs: defer should abort intent items if the trans roll fails
    • lib/stackdepot: export save/fetch stack for drivers · ae65a21f
      Chris Wilson authored
      Some drivers would like to record stacktraces in order to aid leak
      tracing.  As stackdepot already provides a facility for only storing the
      unique traces, thereby reducing the memory required, export that
      functionality for use by drivers.
      
      The code was originally created for KASAN and moved under lib in commit
      cd11016e ("mm, kasan: stackdepot implementation.  Enable stackdepot
      for SLAB") so that it could be shared with mm/.  In turn, we want to
      share it now with drivers.
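
The deduplication idea described above can be sketched in user space. This is only a toy model under stated assumptions: the names `depot_save`, `depot_fetch`, `DEPOT_SLOTS` and `MAX_FRAMES` are hypothetical and are not the kernel's stackdepot interface. It shows how storing each unique trace once lets callers keep a small handle instead of a full copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space model of a stack depot; not the kernel API. */
#define DEPOT_SLOTS 64   /* capacity of this toy depot */
#define MAX_FRAMES  16   /* longest trace the toy depot accepts */

struct depot_entry {
    size_t nr;                      /* number of frames, 0 means unused slot */
    uintptr_t frames[MAX_FRAMES];   /* return addresses of the trace */
};

static struct depot_entry depot[DEPOT_SLOTS];

/* FNV-1a over the frame addresses; any reasonable hash would do here. */
static uint32_t hash_trace(const uintptr_t *frames, size_t nr)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < nr; i++) {
        uintptr_t v = frames[i];
        for (size_t b = 0; b < sizeof(v); b++) {
            h ^= (uint32_t)(v & 0xff);
            h *= 16777619u;
            v >>= 8;
        }
    }
    return h;
}

/*
 * Store a trace and return a small non-zero handle. Saving an identical
 * trace again returns the same handle instead of a second copy.
 * Returns 0 if the trace is empty, too long, or the depot is full.
 */
static uint32_t depot_save(const uintptr_t *frames, size_t nr)
{
    assert(frames != NULL);
    if (nr == 0 || nr > MAX_FRAMES)
        return 0;
    uint32_t start = hash_trace(frames, nr) % DEPOT_SLOTS;
    for (uint32_t probe = 0; probe < DEPOT_SLOTS; probe++) {
        uint32_t slot = (start + probe) % DEPOT_SLOTS;
        struct depot_entry *e = &depot[slot];
        if (e->nr == 0) {                       /* free slot: store here */
            e->nr = nr;
            memcpy(e->frames, frames, nr * sizeof(*frames));
            return slot + 1;
        }
        if (e->nr == nr &&
            memcmp(e->frames, frames, nr * sizeof(*frames)) == 0)
            return slot + 1;                    /* already stored */
    }
    return 0;
}

/* Resolve a handle; returns the frame count and points *frames at the trace. */
static size_t depot_fetch(uint32_t handle, const uintptr_t **frames)
{
    if (handle == 0 || handle > DEPOT_SLOTS || depot[handle - 1].nr == 0)
        return 0;
    *frames = depot[handle - 1].frames;
    return depot[handle - 1].nr;
}
```

A leak tracker built on such a depot would record one handle per tracked object, so many objects allocated from the same call path share a single stored trace.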
      
      Link: http://lkml.kernel.org/r/20161108133209.22704-1-chris@chris-wilson.co.uk
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: kmemleak: scan .data.ro_after_init · d7c19b06
      Jakub Kicinski authored
      Limit the number of kmemleak false positives by including
      .data.ro_after_init in memory scanning.  To achieve this we need to add
      symbols for start and end of the section to the linker scripts.
      
      The problem was uncovered by commit 56989f6d ("genetlink: mark
      families as __ro_after_init").
      
      Link: http://lkml.kernel.org/r/1478274173-15218-1-git-send-email-jakub.kicinski@netronome.com
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: prevent memcg caches to be both OFF_SLAB & OBJFREELIST_SLAB · f773e36d
      Greg Thelen authored
      While testing OBJFREELIST_SLAB integration with pagealloc, we found a
      bug where kmem_cache(sys) would be created with both CFLGS_OFF_SLAB &
      CFLGS_OBJFREELIST_SLAB.  When it happened, critical allocations needed
      for loading drivers or creating new caches will fail.
      
      The original kmem_cache is created early making OFF_SLAB not possible.
      When kmem_cache(sys) is created, OFF_SLAB is possible and if pagealloc
      is enabled it will try to enable it first under certain conditions.
      Given kmem_cache(sys) reuses the original flag, you can have both flags
      at the same time resulting in allocation failures and odd behaviors.
      
      This fix discards allocator specific flags from memcg before calling
      create_cache.
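
The flag-discarding step described above amounts to masking out allocator-internal bits before the inherited flags are reused. A minimal sketch, assuming made-up flag values and a hypothetical helper `child_cache_flags` (none of these are taken from the kernel sources):

```c
#include <assert.h>

/* Hypothetical flag values for illustration only; the kernel's may differ. */
#define FLAG_HWCACHE_ALIGN   0x00002000u   /* a caller-visible request */
#define FLAG_OFF_SLAB        0x80000000u   /* allocator-internal choice */
#define FLAG_OBJFREELIST     0x40000000u   /* allocator-internal choice */
#define FLAGS_INTERNAL_MASK  (FLAG_OFF_SLAB | FLAG_OBJFREELIST)

/*
 * Derive the flags for a child cache from its root cache. Internal
 * management bits are dropped so the allocator decides them afresh and
 * can never end up with both internal flags set at once.
 */
static unsigned int child_cache_flags(unsigned int root_flags)
{
    return root_flags & ~FLAGS_INTERNAL_MASK;
}
```

With the internal bits cleared, the child cache starts from the caller-visible flags alone, and whichever management type the allocator then picks is the only one recorded.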
      
      The bug exists since 4.6-rc1 and affects testing debug pagealloc
      configurations.
      
      Fixes: b03a017b ("mm/slab: introduce new slab management type, OBJFREELIST_SLAB")
      Link: http://lkml.kernel.org/r/1478553075-120242-1-git-send-email-thgarnie@google.com
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Tested-by: Thomas Garnier <thgarnie@google.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: fix unfreezable coredumping task · 70d78fe7
      Andrey Ryabinin authored
      It may not be possible to freeze a coredumping task while it waits for
      'core_state->startup' completion, because the threads are frozen in
      get_signal() before they get a chance to complete 'core_state->startup'.
      
      Inability to freeze a task during suspend will cause suspend to fail.
      Also CRIU uses cgroup freezer during dump operation.  So with an
      unfreezable task the CRIU dump will fail because it waits for a
      transition from 'FREEZING' to 'FROZEN' state which will never happen.
      
      Use freezer_do_not_count() to tell freezer to ignore coredumping task
      while it waits for core_state->startup completion.
      
      Link: http://lkml.kernel.org/r/1475225434-3753-1-git-send-email-aryabinin@virtuozzo.com
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/filemap: don't allow partially uptodate page for pipes · 60da81ea
      Eryu Guan authored
      Starting with the 4.9-rc1 kernel, I noticed test failures of
      sendfile(2) and splice(2) (sendfile0N and splice01 from LTP) when
      testing on sub-page block size filesystems (tested both XFS and ext4):
      these syscalls start to return EIO in the tests, e.g.
      
        sendfile02    1  TFAIL  :  sendfile02.c:133: sendfile(2) failed to return expected value, expected: 26, got: -1
        sendfile02    2  TFAIL  :  sendfile02.c:133: sendfile(2) failed to return expected value, expected: 24, got: -1
        sendfile02    3  TFAIL  :  sendfile02.c:133: sendfile(2) failed to return expected value, expected: 22, got: -1
        sendfile02    4  TFAIL  :  sendfile02.c:133: sendfile(2) failed to return expected value, expected: 20, got: -1
      
      This is because in sub-page block size cases we don't need the
      whole page to be uptodate; it is enough that the part we care about is
      uptodate (if the fs has ->is_partially_uptodate defined).
      
      But page_cache_pipe_buf_confirm() cannot check the
      partially-uptodate case; it needs the whole page to be uptodate.  So it
      returns EIO in this case.
      
      This is a regression introduced by commit 82c156f8 ("switch
      generic_file_splice_read() to use of ->read_iter()").  Prior to the
      change, generic_file_splice_read() didn't allow partially-uptodate pages
      either, so it worked fine.
      
      Fix it by skipping the partially-uptodate check if we're working on a
      pipe in do_generic_file_read(), so we read the whole page from disk as
      long as the page is not uptodate.
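
The decision described above can be modelled as a small pure function. This is a sketch only: `choose_read_action` and its three boolean inputs are hypothetical names standing in for the page state and destination type, not code from do_generic_file_read().

```c
#include <assert.h>
#include <stdbool.h>

enum read_action {
    USE_CACHED_DATA,   /* serve the request from the page cache as it is */
    READ_WHOLE_PAGE    /* bring the whole page uptodate from disk first */
};

/*
 * page_uptodate:  the entire page is uptodate
 * range_uptodate: only the requested byte range is known to be uptodate
 * dest_is_pipe:   the data is headed into a pipe (splice/sendfile path)
 */
static enum read_action choose_read_action(bool page_uptodate,
                                           bool range_uptodate,
                                           bool dest_is_pipe)
{
    if (page_uptodate)
        return USE_CACHED_DATA;
    /* The partial-page shortcut is only taken for non-pipe destinations. */
    if (!dest_is_pipe && range_uptodate)
        return USE_CACHED_DATA;
    return READ_WHOLE_PAGE;
}
```

The pipe case falls through to a full read because the pipe-side confirmation step only accepts pages that are uptodate as a whole.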
      
      I think the other way to fix it is to add the ability to check & allow
      partially-uptodate pages to page_cache_pipe_buf_confirm(), but that is
      much harder to do and seems to gain little.
      
      Link: http://lkml.kernel.org/r/1477986187-12717-1-git-send-email-guaneryu@gmail.com
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb: fix huge page reservation leak in private mapping error paths · 96b96a96
      Mike Kravetz authored
      Error paths in hugetlb_cow() and hugetlb_no_page() may free a newly
      allocated huge page.
      
      If a reservation was associated with the huge page, alloc_huge_page()
      consumed the reservation while allocating.  When the newly allocated
      page is freed in free_huge_page(), it will increment the global
      reservation count.  However, the reservation entry in the reserve map
      will remain.
      
      This is not an issue for shared mappings as the entry in the reserve map
      indicates a reservation exists.  But, an entry in a private mapping
      reserve map indicates the reservation was consumed and no longer exists.
      This results in an inconsistency between the reserve map and the global
      reservation count.  This 'leaks' a reserved huge page.
      
      Create a new routine restore_reserve_on_error() to restore the reserve
      entry in these specific error paths.  This routine makes use of a new
      function vma_add_reservation() which will add a reserve entry for a
      specific address/page.
      
      In general, these error paths were rarely (if ever) taken on most
      architectures.  However, powerpc contained arch specific code that
      resulted in an extra fault and execution of these error paths on all
      private mappings.
      
      Fixes: 67961f9d ("mm/hugetlb: fix huge page reserve accounting for private mappings")
      Link: http://lkml.kernel.org/r/1476933077-23091-2-git-send-email-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reported-by: Jan Stancek <jstancek@redhat.com>
      Tested-by: Jan Stancek <jstancek@redhat.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Kirill A . Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix not enough credit panic · d006c71f
      Junxiao Bi authored
      The following panic was caught when running the ocfs2 disconfig single
      test (block size 512 and cluster size 8192).  ocfs2_journal_dirty()
      returned -ENOSPC, which means the credits were used up.
      
      The total credit should include 3 times "num_dx_leaves" from
      ocfs2_dx_dir_rebalance(), because 2 times will be consumed in
      ocfs2_dx_dir_transfer_leaf() and 1 time will be consumed in
      ocfs2_dx_dir_new_cluster() -> __ocfs2_dx_dir_new_cluster() ->
      ocfs2_dx_dir_format_cluster().  But only 2 times is included in
      ocfs2_dx_dir_rebalance_credits(); fix it.
      
      This can cause read-only fs(v4.1+) or panic for mainline linux depending
      on mount option.
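
The credit arithmetic above can be worked through for the reported geometry. This sketch assumes that num_dx_leaves equals the number of blocks in one cluster, which is an assumption made here and not stated in the message; the helper names are hypothetical, and only the leaf-block term of the budget is modelled.

```c
#include <assert.h>

/* Blocks per cluster, taken here as the number of dx leaves (an assumption). */
static unsigned int leaves_per_cluster(unsigned int cluster_size,
                                       unsigned int block_size)
{
    assert(block_size != 0 && cluster_size % block_size == 0);
    return cluster_size / block_size;
}

/* Leaf-block credits only; every other term of the real budget is omitted. */
static unsigned int leaf_credits(unsigned int num_dx_leaves, unsigned int times)
{
    return num_dx_leaves * times;
}
```

Under that assumption, a cluster size of 8192 and a block size of 512 give 16 leaves, so the rebalance path consumes 3 * 16 = 48 leaf credits while only 2 * 16 = 32 were reserved, a shortfall of 16.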
      
        ------------[ cut here ]------------
        kernel BUG at fs/ocfs2/journal.c:775!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport acpi_cpufreq i2c_piix4 i2c_core pcspkr ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
        CPU: 2 PID: 10601 Comm: dd Not tainted 4.1.12-71.el6uek.bug24939243.x86_64 #2
        Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
        task: ffff8800b6de6200 ti: ffff8800a7d48000 task.ti: ffff8800a7d48000
        RIP: ocfs2_journal_dirty+0xa7/0xb0 [ocfs2]
        RSP: 0018:ffff8800a7d4b6d8  EFLAGS: 00010286
        RAX: 00000000ffffffe4 RBX: 00000000814d0a9c RCX: 00000000000004f9
        RDX: ffffffffa008e990 RSI: ffffffffa008f1ee RDI: ffff8800622b6460
        RBP: ffff8800a7d4b6f8 R08: ffffffffa008f288 R09: ffff8800622b6460
        R10: 0000000000000000 R11: 0000000000000282 R12: 0000000002c8421e
        R13: ffff88006d0cad00 R14: ffff880092beef60 R15: 0000000000000070
        FS:  00007f9b83e92700(0000) GS:ffff8800be880000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fb2c0d1a000 CR3: 0000000008f80000 CR4: 00000000000406e0
        Call Trace:
          ocfs2_dx_dir_transfer_leaf+0x159/0x1a0 [ocfs2]
          ocfs2_dx_dir_rebalance+0xd9b/0xea0 [ocfs2]
          ocfs2_find_dir_space_dx+0xd3/0x300 [ocfs2]
          ocfs2_prepare_dx_dir_for_insert+0x219/0x450 [ocfs2]
          ocfs2_prepare_dir_for_insert+0x1d6/0x580 [ocfs2]
          ocfs2_mknod+0x5a2/0x1400 [ocfs2]
          ocfs2_create+0x73/0x180 [ocfs2]
          vfs_create+0xd8/0x100
          lookup_open+0x185/0x1c0
          do_last+0x36d/0x780
          path_openat+0x92/0x470
          do_filp_open+0x4a/0xa0
          do_sys_open+0x11a/0x230
          SyS_open+0x1e/0x20
          system_call_fastpath+0x12/0x71
        Code: 1d 3f 29 09 00 48 85 db 74 1f 48 8b 03 0f 1f 80 00 00 00 00 48 8b 7b 08 48 83 c3 10 4c 89 e6 ff d0 48 8b 03 48 85 c0 75 eb eb 90 <0f> 0b eb fe 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54
        RIP  ocfs2_journal_dirty+0xa7/0xb0 [ocfs2]
        ---[ end trace 91ac5312a6ee1288 ]---
        Kernel panic - not syncing: Fatal exception
        Kernel Offset: disabled
      
      Link: http://lkml.kernel.org/r/1478248135-31963-1-git-send-email-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "console: don't prefer first registered if DT specifies stdout-path" · c6c7d83b
      Hans de Goede authored
      This reverts commit 05fd007e ("console: don't prefer first
      registered if DT specifies stdout-path").
      
      The reverted commit changes existing behavior on which many ARM boards
      rely.  Many ARM small-board computers, such as the Raspberry Pi, have
      both a video output and a serial console.  Depending on whether the user
      is using the device as a more regular computer or as a headless device,
      we need to have the console on either one or the other.
      
      Many users rely on the kernel behavior of the console being present on
      both outputs.  Before the reverted commit, the console setup with no
      console= kernel arguments on an ARM board which sets stdout-path in dt
      would look like this:
      
        [root@localhost ~]# cat /proc/consoles
        ttyS0                -W- (EC p a)    4:64
        tty0                 -WU (E  p  )    4:1
      
      Whereas after the reverted commit, it looks like this:
      
        [root@localhost ~]# cat /proc/consoles
        ttyS0                -W- (EC p a)    4:64
      
      This commit reverts commit 05fd007e ("console: don't prefer first
      registered if DT specifies stdout-path") restoring the original
      behavior.
      
      Fixes: 05fd007e ("console: don't prefer first registered if DT specifies stdout-path")
      Link: http://lkml.kernel.org/r/20161104121135.4780-2-hdegoede@redhat.com
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: hwpoison: fix thp split handling in memory_failure() · c3901e72
      Naoya Horiguchi authored
      When memory_failure() runs on a thp tail page after pmd is split, we
      trigger the following VM_BUG_ON_PAGE():
      
         page:ffffd7cd819b0040 count:0 mapcount:0 mapping:         (null) index:0x1
         flags: 0x1fffc000400000(hwpoison)
         page dumped because: VM_BUG_ON_PAGE(!page_count(p))
         ------------[ cut here ]------------
         kernel BUG at /src/linux-dev/mm/memory-failure.c:1132!
      
      memory_failure() passed refcount and page lock from tail page to head
      page, which is not needed because we can pass any subpage to
      split_huge_page().
      
      Fixes: 61f5d698 ("mm: re-enable THP")
      Link: http://lkml.kernel.org/r/1477961577-7183-1-git-send-email-n-horiguchi@ah.jp.nec.com
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: <stable@vger.kernel.org>	[4.5+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapfile: fix memory corruption via malformed swapfile · dd111be6
      Jann Horn authored
      When root activates a swap partition whose header has the wrong
      endianness, nr_badpages elements of badpages are swabbed before
      nr_badpages has been checked, leading to a buffer overrun of up to 8GB.
      
      This normally is not a security issue because it can only be exploited
      by root (more specifically, a process with CAP_SYS_ADMIN or the ability
      to modify a swap file/partition), and such a process can already e.g.
      modify swapped-out memory of any other userspace process on the system.
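
The ordering problem described above (byte-swapping array elements before the element count has been validated) can be sketched with a toy header. The structure, `MAX_BADPAGES` and `fix_header_endianness` are hypothetical simplifications and do not reproduce the kernel's swap header layout.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BADPAGES 8   /* toy capacity of the badpages array */

/* Simplified stand-in for an on-disk swap header. */
struct toy_swap_header {
    uint32_t version;
    uint32_t last_page;
    uint32_t nr_badpages;
    uint32_t badpages[MAX_BADPAGES];
};

/* Reverse the byte order of a 32-bit value. */
static uint32_t swab32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/*
 * Convert a wrong-endian header in place. The count is swapped and
 * checked first, so the loop below can never run past the array.
 * Returns 0 on success, -1 if the header is rejected.
 */
static int fix_header_endianness(struct toy_swap_header *h)
{
    h->version = swab32(h->version);
    h->last_page = swab32(h->last_page);
    h->nr_badpages = swab32(h->nr_badpages);
    if (h->nr_badpages > MAX_BADPAGES)
        return -1;                      /* reject before touching badpages */
    for (uint32_t i = 0; i < h->nr_badpages; i++)
        h->badpages[i] = swab32(h->badpages[i]);
    return 0;
}
```

Placing the bounds check between swapping the count and entering the loop is the whole point of the sketch: a malformed count leaves the array untouched.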
      
      Link: http://lkml.kernel.org/r/1477949533-2509-1-git-send-email-jann@thejh.net
      Signed-off-by: Jann Horn <jann@thejh.net>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Jerome Marchand <jmarchan@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/cma.c: check the max limit for cma allocation · 6b36ba59
      Shiraz Hashim authored
      The CMA allocation request size is represented by a size_t that gets
      truncated when it is passed as an int to bitmap_find_next_zero_area_off.
      
      We observed during fuzz testing that when a cma allocation request is too
      large, bitmap_find_next_zero_area_off still returns success due to the
      truncation.  This leads to a kernel crash, as subsequent code assumes that
      the requested memory is available.
      
      Fail cma allocation in case the request breaches the corresponding cma
      region size.
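
The truncation hazard and the bounds check described above can be illustrated in user space. The names `toy_cma`, `narrowed` and `request_fits` are hypothetical. The sketch narrows to unsigned int instead of int so that the dropped high bits are well defined in portable C, and it assumes a platform where size_t is wider than unsigned int.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CMA region; only its size matters here. */
struct toy_cma {
    size_t count;   /* number of pages the region covers */
};

/* What a narrower parameter would see if the size_t were passed unchecked. */
static unsigned int narrowed(size_t nr_pages)
{
    return (unsigned int)nr_pages;   /* high bits are silently dropped */
}

/* Reject requests larger than the region before any narrowing happens. */
static bool request_fits(const struct toy_cma *cma, size_t nr_pages)
{
    return nr_pages != 0 && nr_pages <= cma->count;
}
```

A request of `(size_t)UINT_MAX + 5` pages narrows to 4 on such a platform, which a small region could satisfy; checking the full-width value against the region size first rejects it.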
      
      Link: http://lkml.kernel.org/r/1478189211-3467-1-git-send-email-shashim@codeaurora.org
      Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • scripts/bloat-o-meter: fix SIGPIPE · eef06b82
      Alexey Dobriyan authored
      Fix piping output to a program which quickly exits (read: head -n1)
      
      	$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux | head -n1
      	add/remove: 0/0 grow/shrink: 9/60 up/down: 124/-305 (-181)
      	close failed in file object destructor:
      	sys.excepthook is missing
      	lost sys.stderr
      
      Link: http://lkml.kernel.org/r/20161028204618.GA29923@avx2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • shmem: fix pageflags after swapping DMA32 object · 9956edf3
      Hugh Dickins authored
      If shmem_alloc_page() does not set PageLocked and PageSwapBacked, then
      shmem_replace_page() needs to do so for itself.  Without this, it puts
      newpage on the wrong lru, re-unlocks the unlocked newpage, and system
      descends into "Bad page" reports and freeze; or if CONFIG_DEBUG_VM=y, it
      hits an earlier VM_BUG_ON_PAGE(!PageLocked), depending on config.
      
      But shmem_replace_page() is not a common path: it's only called when
      swapin (or swapoff) finds the page was already read into an unsuitable
      zone: usually all zones are suitable, but gem objects for a few drm
      devices (gma500, omapdrm, crestline, broadwater) require zone DMA32 if
      there's more than 4GB of ram.
      
      Fixes: 800d8c63 ("shmem: add huge pages support")
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1611062003510.11253@eggly.anvils
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[4.8.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, frontswap: make sure allocated frontswap map is assigned · 5e322bee
      Vlastimil Babka authored
      Christian Borntraeger reports:
      
      With commit 8ea1d2a1 ("mm, frontswap: convert frontswap_enabled to
      static key") kmemleak complains about a memory leak in swapon
      
          unreferenced object 0x3e09ba56000 (size 32112640):
            comm "swapon", pid 7852, jiffies 4294968787 (age 1490.770s)
            hex dump (first 32 bytes):
              00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
              00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            backtrace:
               __vmalloc_node_range+0x194/0x2d8
               vzalloc+0x58/0x68
               SyS_swapon+0xd60/0x12f8
               system_call+0xd6/0x270
      
      Turns out kmemleak is right.  We now allocate the frontswap map
      depending on the kernel config (and no longer on the enablement)
      
        swapfile.c:
        [...]
            if (IS_ENABLED(CONFIG_FRONTSWAP))
                      frontswap_map = vzalloc(BITS_TO_LONGS(maxpages) * sizeof(long));
      
      but later on this is passed along
        --> enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);
      
      and ignored if frontswap is disabled
        --> frontswap_init(p->type, frontswap_map);
      
        static inline void frontswap_init(unsigned type, unsigned long *map)
        {
              if (frontswap_enabled())
                      __frontswap_init(type, map);
        }
      
      Thing is, that frontswap map is never freed.
      
      The leak itself is not that bad, because swapon is an infrequent
      and privileged operation.  However, if the first frontswap backend is
      registered after a swap type has already been enabled, it will WARN_ON
      in frontswap_register_ops() and frontswap will not be available for the
      swap type.
      
      Fix this by making sure the map is assigned by frontswap_init() as long
      as CONFIG_FRONTSWAP is enabled.
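
The difference between the leaking behavior and the described fix can be modelled in user space. Everything here is a hypothetical simplification: `toy_swap_info`, `backend_registered` and the two init functions are illustrative names, not the kernel's frontswap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for the per-swap-device bookkeeping. */
struct toy_swap_info {
    unsigned long *frontswap_map;
};

static bool backend_registered;   /* models "a frontswap backend exists" */
static int backend_inits;         /* counts backend-specific initialisations */

/* Leaky variant: the map is only recorded when a backend already exists. */
static void init_before_fix(struct toy_swap_info *si, unsigned long *map)
{
    if (backend_registered) {
        si->frontswap_map = map;
        backend_inits++;
    }
    /* otherwise the caller's pointer is dropped and the map is never freed */
}

/* Fixed variant: always record the map; only backend work is conditional. */
static void init_after_fix(struct toy_swap_info *si, unsigned long *map)
{
    si->frontswap_map = map;
    if (backend_registered)
        backend_inits++;
}
```

Recording the map unconditionally means teardown can always find and free it, and a backend registered later still sees a map for the already-enabled swap type.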
      
      Fixes: 8ea1d2a1 ("mm, frontswap: convert frontswap_enabled to static key")
      Link: http://lkml.kernel.org/r/20161026134220.2566-1-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove extra newline from allocation stall warning · 9e80c719
      Tetsuo Handa authored
      Commit 63f53dea ("mm: warn about allocations which stall for too
      long") mistakenly embedded "\n" in the format string, resulting in
      strange output.
      
        [  722.876655] kworker/0:1: page alloction stalls for 160001ms, order:0
        [  722.876656] , mode:0x2400000(GFP_NOIO)
        [  722.876657] CPU: 0 PID: 6966 Comm: kworker/0:1 Not tainted 4.8.0+ #69
      
      Link: http://lkml.kernel.org/r/1476026219-7974-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 10 Nov, 2016 5 commits
  3. 09 Nov, 2016 3 commits
    • Merge tag 'sound-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 27bcd37e
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This became a largish pull-request, as we've got a bunch of pending
        ASoC fixes at this time. One noticeable change is the removal of the
        error directive in uapi/sound/asoc.h. We found that the API has already
        been used on Chromebooks, so we need to support it even now.
      
        A slightly large LOC count is found in the Qualcomm lpass driver, but the rest are
        all small and easy fixes for ASoC drivers (sti, sun4i, Realtek codecs,
        Intel, tas571x, etc) in addition to the patches to harden the ALSA
        core proc file accesses"
      
      * tag 'sound-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (26 commits)
        ALSA: info: Return error for invalid read/write
        ALSA: info: Limit the proc text input size
        ASoC: samsung: spdif: Fix DMA filter initialization
        ASoC: sun4i-codec: Enable bus clock after getting GPIO
        ASoC: lpass-cpu: add module licence and description
        ASoC: lpass-platform: Fix broken pcm data usage
        ASoC: sun4i-codec: return error code instead of NULL when create_card fails
        ASoC: hdmi-codec: Fix hdmi_of_xlate_dai_name when #sound-dai-cells = <0>
        ASoC: samsung: get access to DMA engine early to defer probe properly
        ASoC: da7219: Connect output enable register to DAIOUT
        ASoC: Intel: Skylake: Fix to turn off hdmi power on probe failure
        ASoC: sti-sas: enable fast io for regmap
        ASoC: sti: fix channel status update after playback start
        ASoC: PXA: Brownstone needs I2C
        ASoC: Intel: Skylake: Always acquire runtime pm ref on unload
        ASoC: Intel: Atom: add terminate entry for dmi_system_id tables
        ASoC: rt298: fix jack type detect error
        ASoC: rt5663: fix a debug statement
        ASoC: cs4270: fix DAPM stream name mismatch
        ASoC: Intel: haswell depends on sst-firmware
        ...
      27bcd37e
    • Merge tag 'for-linus-4.9-rc4-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · 3c6106da
      Linus Torvalds authored
      Pull orangefs fix from Mike Marshall:
       "We recently refactored the Orangefs debugfs code. The refactor seemed
        to trigger dan.carpenter@oracle.com's static tester to find a possible
        double-free in the code.
      
        While designing the fix we saw a condition under which the buffer
        being freed could also be overflowed.
      
        We also realized how to rebuild the related debugfs file's "contents"
        (a string) without deleting and re-creating the file.
      
        This fix should eliminate the possible double-free, the potential
        overflow and improve code readability"
      
      * tag 'for-linus-4.9-rc4-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
        orangefs: clean up debugfs
      3c6106da
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · ae67e87f
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
       "Two bug fixes
      
         - a memory alignment fix in the s390 only hypfs code
      
         - a fix for the generic percpu code that caused ftrace to break on
           s390. This is not relevant for x86 but for all architectures that
           use the generic percpu code"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        percpu: use notrace variant of preempt_disable/preempt_enable
        s390/hypfs: Use get_free_page() instead of kmalloc to ensure page alignment
      ae67e87f
  4. 08 Nov, 2016 9 commits
    • Merge tag 'iommu-fixes-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · e3a00f68
      Linus Torvalds authored
      Pull IOMMU fixes from Joerg Roedel:
      
       - Four patches from Robin Murphy fix several issues with the recently
         merged generic DT-bindings support for arm-smmu drivers
      
       - A fix for a dead-lock issue in the VT-d driver, which shows up on
         iommu hotplug
      
      * tag 'iommu-fixes-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path
        iommu/arm-smmu: Fix out-of-bounds dereference
        iommu/arm-smmu: Check that iommu_fwspecs are ours
        iommu/arm-smmu: Don't inadvertently reject multiple SMMUv3s
        iommu/arm-smmu: Work around ARM DMA configuration
      e3a00f68
    • iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path · bea64033
      Joerg Roedel authored
      It turns out that the disable_dmar_iommu() code-path tried
      to get the device_domain_lock recursively, which will
      dead-lock when this code runs on dmar removal. Fix both
      code-paths that could lead to the dead-lock.
      
      Fixes: 55d94043 ('iommu/vt-d: Get rid of domain->iommu_lock')
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      bea64033
    • iommu/arm-smmu: Fix out-of-bounds dereference · 8c82d6ec
      Robin Murphy authored
      When we iterate a master's config entries, what we generally care
      about is the entry's stream map index, rather than the entry index
      itself, so it's nice to have the iterator automatically assign the
      former from the latter. Unfortunately, booting with KASAN reveals
      the oversight that using a simple comma operator results in the
      entry index being dereferenced before being checked for validity,
      so we always access one element past the end of the fwspec array.
      
      Flip things around so that the check always happens before the index
      may be dereferenced.
      
      Fixes: adfec2e7 ("iommu/arm-smmu: Convert to iommu_fwspec")
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      8c82d6ec
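      The comma-operator pitfall described in this commit can be sketched
      with a small userspace analogue. Everything here is an assumption made
      for illustration (struct entry_table, INVALID_IDX, entry_idx() and both
      macros are hypothetical names, not the driver's code); the point is
      only the order of evaluation in the loop condition.

```c
#include <stddef.h>

#define INVALID_IDX (-1)

/* Hypothetical table: map[i] plays the role of the "stream map index"
 * belonging to entry i. */
struct entry_table {
    size_t num_ids;
    const int *map;
};

/* Buggy pattern: the left operand of the comma operator runs first, so
 * map[i] is read before "i < num_ids" is tested. On the final pass the
 * loop reads map[num_ids], one element past the end. */
#define for_each_entry_buggy(t, i, idx) \
    for ((i) = 0; (idx) = (t)->map[(i)], (i) < (t)->num_ids; ++(i))

/* Checked accessor: validate the index before dereferencing. */
static int entry_idx(const struct entry_table *t, size_t i)
{
    return i >= t->num_ids ? INVALID_IDX : t->map[i];
}

/* Fixed pattern: same loop shape, but the assignment goes through the
 * checked accessor, so the terminating pass never touches the array. */
#define for_each_entry(t, i, idx) \
    for ((i) = 0; (idx) = entry_idx((t), (i)), (i) < (t)->num_ids; ++(i))

/* Example use of the fixed iterator. */
static int sum_indices(const struct entry_table *t)
{
    size_t i;
    int idx, sum = 0;

    for_each_entry(t, i, idx)
        sum += idx;
    return sum;
}
```

      Keeping the assignment inside the loop condition preserves the
      convenience of the iterator; only the unchecked read is removed.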
    • iommu/arm-smmu: Check that iommu_fwspecs are ours · 3c117b54
      Robin Murphy authored
      We seem to have forgotten to check that iommu_fwspecs actually belong to
      us before we go ahead and dereference their private data. Oops.
      
      Fixes: 021bb842 ("iommu/arm-smmu: Wire up generic configuration support")
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      3c117b54
    • iommu/arm-smmu: Don't inadvertently reject multiple SMMUv3s · ec615f43
      Robin Murphy authored
      We now delay installing our per-bus iommu_ops until we know an SMMU has
      successfully probed, as they don't serve much purpose beforehand, and
      doing so also avoids fights between multiple IOMMU drivers in a single
      kernel. However, the upshot of passing the return value of bus_set_iommu()
      back from our probe function is that if there happens to be more than
      one SMMUv3 device in a system, the second and subsequent probes will
      wind up returning -EBUSY to the driver core and getting torn down again.
      
      Avoid re-setting ops if ours are already installed, so that any genuine
      failures stand out.
      
      Fixes: 08d4ca2a ("iommu/arm-smmu: Support non-PCI devices with SMMUv3")
      CC: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      CC: Hanjun Guo <hanjun.guo@linaro.org>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      ec615f43
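      The "avoid re-setting ops if ours are already installed" idea can be
      sketched as follows. This is a userspace model under stated
      assumptions: struct bus, struct ops, bus_set_ops() and the probe
      functions are hypothetical stand-ins, with bus_set_ops() modelled on
      the -EBUSY behaviour the commit message attributes to bus_set_iommu().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a bus and its per-bus ops pointer. */
struct ops { const char *name; };
struct bus { const struct ops *ops; };

/* Modelled behaviour: refuse to install ops when some are already set. */
static int bus_set_ops(struct bus *b, const struct ops *o)
{
    if (b->ops)
        return -EBUSY;
    b->ops = o;
    return 0;
}

static const struct ops my_ops = { "my-driver" };

/* Buggy probe: the second device of the same type gets -EBUSY back,
 * even though the ops already installed are this driver's own. */
static int probe_buggy(struct bus *b)
{
    return bus_set_ops(b, &my_ops);
}

/* Fixed probe: skip the call when our ops are already in place, so the
 * only errors that propagate are genuine ones (e.g. a different driver
 * owning the bus). */
static int probe_fixed(struct bus *b)
{
    if (b->ops == &my_ops)
        return 0;
    return bus_set_ops(b, &my_ops);
}
```

      In this model a second probe_fixed() call on the same bus succeeds,
      while a bus owned by another driver's ops still reports -EBUSY.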
    • iommu/arm-smmu: Work around ARM DMA configuration · fba4f8e5
      Robin Murphy authored
      The 32-bit ARM DMA configuration code predates the IOMMU core's default
      domain functionality, and instead relies on allocating its own domains
      and attaching any devices using the generic IOMMU binding to them.
      Unfortunately, it does this relatively early on in the creation of the
      device, before we've seen our add_device callback, which leads us to
      attempt to operate on a half-configured master.
      
      To avoid a crash, check for this situation on attach, but refuse to
      play, as there's nothing we can do. This at least allows VFIO to keep
      working for people who update their 32-bit DTs to the generic binding,
      albeit with a few (innocuous) warnings from the DMA layer on boot.
      
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      fba4f8e5
    • ALSA: info: Return error for invalid read/write · 6809cd68
      Takashi Iwai authored
      Currently the ALSA proc handler allows a read or write even if the proc
      file is write-only or read-only.  It's mostly harmless: it does nothing
      but allocate memory and ignore the input/output.  But it doesn't
      tell the user about the invalid use, and it's confusing and inconsistent
      in comparison with other proc files.
      
      This patch adds some sanity checks and makes the proc handler return
      an -EIO error when an invalid read/write is performed.
      
      Cc: <stable@vger.kernel.org> # v4.2+
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      6809cd68
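      A minimal sketch of this kind of sanity check, in userspace C. The
      types and function names are hypothetical and do not mirror the ALSA
      code; the sketch assumes that a missing callback is what marks an
      entry as read-only or write-only.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical entry: a NULL callback means that direction is unsupported. */
struct text_entry {
    long (*read)(char *buf, size_t count);
    long (*write)(const char *buf, size_t count);
};

/* Reject reads on a write-only entry instead of silently doing nothing. */
static long text_entry_read(const struct text_entry *e, char *buf, size_t count)
{
    if (!e->read)
        return -EIO;
    return e->read(buf, count);
}

/* Reject writes on a read-only entry in the same way. */
static long text_entry_write(const struct text_entry *e, const char *buf,
                             size_t count)
{
    if (!e->write)
        return -EIO;
    return e->write(buf, count);
}

/* Example write callback that accepts every byte it is given. */
static long accept_all(const char *buf, size_t count)
{
    (void)buf;
    return (long)count;
}
```

      Checking up front means the caller sees the misuse immediately, before
      any buffer is allocated on its behalf.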
    • ALSA: info: Limit the proc text input size · 027a9fe6
      Takashi Iwai authored
      The ALSA proc handler currently allows writes of unlimited size
      until kmalloc() fails.  But the write is supposed to be only
      for small inputs, mostly one-line inputs, and we don't have to
      handle very large sizes at all.  Since a kmalloc error results in a
      kernel warning, it's better to limit the size beforehand.
      
      This patch adds a limit of 16kB, which must be large enough for the
      currently existing code.
      
      Cc: stable@vger.kernel.org # v4.2+
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      027a9fe6
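      Capping the input size before allocating can be sketched like this.
      It is a userspace illustration under assumptions: struct text_buf,
      text_buf_write() and MAX_TEXT_INPUT are hypothetical names, realloc()
      stands in for the kernel allocator, and the -EIO return for oversized
      input is an assumed choice that the commit message does not state.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TEXT_INPUT (16 * 1024) /* the 16kB limit named in the commit */

/* Hypothetical growable buffer that collects written text. */
struct text_buf {
    char *data;
    size_t len;
};

/* Copy count bytes to offset pos, growing the buffer as needed.
 * Returns the number of bytes written or a negative errno value. */
static long text_buf_write(struct text_buf *b, const char *src,
                           size_t count, size_t pos)
{
    size_t end;
    char *grown;

    if (count == 0)
        return 0;
    /* Check the limit before allocating anything; written this way the
     * comparison cannot overflow. The error code is an assumption. */
    if (pos > MAX_TEXT_INPUT || count > MAX_TEXT_INPUT - pos)
        return -EIO;

    end = pos + count;
    if (end > b->len) {
        grown = realloc(b->data, end);
        if (!grown)
            return -ENOMEM;
        /* Zero any gap between the old end and the write position. */
        if (pos > b->len)
            memset(grown + b->len, 0, pos - b->len);
        b->data = grown;
        b->len = end;
    }
    memcpy(b->data + pos, src, count);
    return (long)count;
}
```

      Because the limit is tested first, an oversized request fails cheaply
      and never reaches the allocator.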
    • percpu: use notrace variant of preempt_disable/preempt_enable · 7f8d61f0
      Heiko Carstens authored
      Commit 345ddcc8 ("ftrace: Have set_ftrace_pid use the bitmap like
      events do") added a couple of this_cpu_read calls to the ftrace code.
      
      On x86 this is not a problem, since it has single instructions to read
      percpu data. Other architectures which use the generic variant now
      have additional preempt_disable and preempt_enable calls in the core
      ftrace code. This may lead to recursive calls and, as a result, to a
      dead machine, e.g. if preemption and debugging options are enabled.
      
      To fix this use the notrace variant of preempt_disable and
      preempt_enable within the generic percpu code.
      
      Reported-and-bisected-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Fixes: 345ddcc8 ("ftrace: Have set_ftrace_pid use the bitmap like events do")
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      7f8d61f0
  5. 07 Nov, 2016 3 commits