1. 11 Jan, 2017 10 commits
    • Michal Hocko's avatar
      mm: fix remote numa hits statistics · 2df26639
      Michal Hocko authored
      Jia He has noticed that commit b9f00e14 ("mm, page_alloc: reduce
      branches in zone_statistics") has an unintentional side effect that
      remote node allocation requests are accounted as NUMA_MISS rathat than
      NUMA_HIT and NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE.
      
      There are many of these potentially because the flag is used very rarely
      while we have many users of __alloc_pages_node.
      
      Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a
      follow up patch) and treat all allocations that were satisfied from the
      preferred zone's node as NUMA_HITS because this is the same node we
      requested the allocation from in most cases.  If this is not the local
      node then we just account it as NUMA_OTHER rather than NUMA_LOCAL.
      
      One downsize would be that an allocation request for a node which is
      outside of the mempolicy nodemask would be reported as a hit which is a
      bit weird but that was the case before b9f00e14 already.
      
      Fixes: b9f00e14 ("mm, page_alloc: reduce branches in zone_statistics")
      Link: http://lkml.kernel.org/r/20170102153057.9451-2-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarJia He <hejianet@gmail.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz> # with cbmc[1] superpowers
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2df26639
    • Dan Williams's avatar
      mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} · f931ab47
      Dan Williams authored
      Both arch_add_memory() and arch_remove_memory() expect a single threaded
      context.
      
      For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does
      not hold any locks over this check and branch:
      
          if (pgd_val(*pgd)) {
          	pud = (pud_t *)pgd_page_vaddr(*pgd);
          	paddr_last = phys_pud_init(pud, __pa(vaddr),
          				   __pa(vaddr_end),
          				   page_size_mask);
          	continue;
          }
      
          pud = alloc_low_page();
          paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
          			   page_size_mask);
      
      The result is that two threads calling devm_memremap_pages()
      simultaneously can end up colliding on pgd initialization.  This leads
      to crash signatures like the following where the loser of the race
      initializes the wrong pgd entry:
      
          BUG: unable to handle kernel paging request at ffff888ebfff0000
          IP: memcpy_erms+0x6/0x10
          PGD 2f8e8fc067 PUD 0 /* <---- Invalid PUD */
          Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
          CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13
          task: ffff882fac290040 ti: ffff882f887a4000 task.ti: ffff882f887a4000
          RIP: memcpy_erms+0x6/0x10
          [..]
          Call Trace:
            ? pmem_do_bvec+0x205/0x370 [nd_pmem]
            ? blk_queue_enter+0x3a/0x280
            pmem_rw_page+0x38/0x80 [nd_pmem]
            bdev_read_page+0x84/0xb0
      
      Hold the standard memory hotplug mutex over calls to
      arch_{add,remove}_memory().
      
      Fixes: 41e94a85 ("add devm_memremap_pages")
      Link: http://lkml.kernel.org/r/148357647831.9498.12606007370121652979.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f931ab47
    • Eric Ren's avatar
      ocfs2: fix crash caused by stale lvb with fsdlm plugin · e7ee2c08
      Eric Ren authored
      The crash happens rather often when we reset some cluster nodes while
      nodes contend fiercely to do truncate and append.
      
      The crash backtrace is below:
      
         dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources
         dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms
         ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18)
         ocfs2: End replay journal (node 318952601, slot 2) on device (253,18)
         ocfs2: Beginning quota recovery on device (253,18) for slot 2
         ocfs2: Finishing quota recovery on device (253,18) for slot 2
         (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
         (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
         ------------[ cut here ]------------
         kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
         invalid opcode: 0000 [#1] SMP
         Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod    iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport      joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix               drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd       usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
         Supported: No, Unsupported modules are loaded
         CPU: 1 PID: 30154 Comm: truncate Tainted: G           OE   N  4.4.21-69-default #1
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
         task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000
         RIP: 0010:[<ffffffffa05c8c30>]  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
         RSP: 0018:ffff880074e6bd50  EFLAGS: 00010282
         RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000
         RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
         RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414
         R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448
         R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020
         FS:  00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
         CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0
         Call Trace:
           ocfs2_setattr+0x698/0xa90 [ocfs2]
           notify_change+0x1ae/0x380
           do_truncate+0x5e/0x90
           do_sys_ftruncate.constprop.11+0x108/0x160
           entry_SYSCALL_64_fastpath+0x12/0x6d
         Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
         RIP  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
      
      It's because ocfs2_inode_lock() get us stale LVB in which the i_size is
      not equal to the disk i_size.  We mistakenly trust the LVB because the
      underlaying fsdlm dlm_lock() doesn't set lkb_sbflags with
      DLM_SBF_VALNOTVALID properly for us.  But, why?
      
      The current code tries to downconvert lock without DLM_LKF_VALBLK flag
      to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion, even
      if the lock resource type needs LVB.  This is not the right way for
      fsdlm.
      
      The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on
      DLM_LKF_VALBLK to decide if we care about the LVB in the LKB.  If
      DLM_LKF_VALBLK is not set, fsdlm will skip recovering RSB's LVB from
      this lkb and set the right DLM_SBF_VALNOTVALID appropriately when node
      failure happens.
      
      The following diagram briefly illustrates how this crash happens:
      
      RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB;
      
      The 1st round:
      
                   Node1                                    Node2
      RSB1: PR
                                                        RSB1(master): NULL->EX
      ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
        ocfs2_dlm_lock(no DLM_LKF_VALBLK)
      
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      
      dlm_lock(no DLM_LKF_VALBLK)
        convert_lock(overwrite lkb->lkb_exflags
                     with no DLM_LKF_VALBLK)
      
      RSB1: NULL                                        RSB1: EX
                                                        reset Node2
      dlm_recover_rsbs()
        recover_lvb()
      
      /* The LVB is not trustable if the node with EX fails and
       * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
       */
      
       if(!(kb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to
                 return;                   * to invalid the LVB here.
                                           */
      
      The 2nd round:
      
               Node 1                                Node2
      RSB1(become master from recovery)
      
      ocfs2_setattr()
        ocfs2_inode_lock(NULL->EX)
          /* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */
          ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */
        ocfs2_truncate_file()
            mlog_bug_on_msg(disk isize != i_size_read(inode))  /* crash! */
      
      The fix is quite straightforward.  We keep to set DLM_LKF_VALBLK flag
      for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin
      is uesed.
      
      Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.comSigned-off-by: default avatarEric Ren <zren@suse.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e7ee2c08
    • Michal Hocko's avatar
      bpf: do not use KMALLOC_SHIFT_MAX · 7984c27c
      Michal Hocko authored
      Commit 01b3f521 ("bpf: fix allocation warnings in bpf maps and
      integer overflow") has added checks for the maximum allocateable size.
      It (ab)used KMALLOC_SHIFT_MAX for that purpose.
      
      While this is not incorrect it is not very clean because we already have
      KMALLOC_MAX_SIZE for this very reason so let's change both checks to use
      KMALLOC_MAX_SIZE instead.
      
      The original motivation for using KMALLOC_SHIFT_MAX was to work around
      an incorrect KMALLOC_MAX_SIZE which could lead to allocation warnings
      but it is no longer needed since "slab: make sure that KMALLOC_MAX_SIZE
      will fit into MAX_ORDER".
      
      Link: http://lkml.kernel.org/r/20161220130659.16461-3-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7984c27c
    • Michal Hocko's avatar
      mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER · bb1107f7
      Michal Hocko authored
      Andrey Konovalov has reported the following warning triggered by the
      syzkaller fuzzer.
      
        WARNING: CPU: 1 PID: 9935 at mm/page_alloc.c:3511 __alloc_pages_nodemask+0x159c/0x1e20
        Kernel panic - not syncing: panic_on_warn set ...
        CPU: 1 PID: 9935 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #34
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Call Trace:
          __alloc_pages_slowpath mm/page_alloc.c:3511
          __alloc_pages_nodemask+0x159c/0x1e20 mm/page_alloc.c:3781
          alloc_pages_current+0x1c7/0x6b0 mm/mempolicy.c:2072
          alloc_pages include/linux/gfp.h:469
          kmalloc_order+0x1f/0x70 mm/slab_common.c:1015
          kmalloc_order_trace+0x1f/0x160 mm/slab_common.c:1026
          kmalloc_large include/linux/slab.h:422
          __kmalloc+0x210/0x2d0 mm/slub.c:3723
          kmalloc include/linux/slab.h:495
          ep_write_iter+0x167/0xb50 drivers/usb/gadget/legacy/inode.c:664
          new_sync_write fs/read_write.c:499
          __vfs_write+0x483/0x760 fs/read_write.c:512
          vfs_write+0x170/0x4e0 fs/read_write.c:560
          SYSC_write fs/read_write.c:607
          SyS_write+0xfb/0x230 fs/read_write.c:599
          entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      The issue is caused by a lack of size check for the request size in
      ep_write_iter which should be fixed.  It, however, points to another
      problem, that SLUB defines KMALLOC_MAX_SIZE too large because the its
      KMALLOC_SHIFT_MAX is (MAX_ORDER + PAGE_SHIFT) which means that the
      resulting page allocator request might be MAX_ORDER which is too large
      (see __alloc_pages_slowpath).
      
      The same applies to the SLOB allocator which allows even larger sizes.
      Make sure that they are capped properly and never request more than
      MAX_ORDER order.
      
      Link: http://lkml.kernel.org/r/20161220130659.16461-2-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bb1107f7
    • Ross Zwisler's avatar
      dax: wrprotect pmd_t in dax_mapping_entry_mkclean · f729c8c9
      Ross Zwisler authored
      Currently dax_mapping_entry_mkclean() fails to clean and write protect
      the pmd_t of a DAX PMD entry during an *sync operation.  This can result
      in data loss in the following sequence:
      
      1) mmap write to DAX PMD, dirtying PMD radix tree entry and making the
         pmd_t dirty and writeable
      2) fsync, flushing out PMD data and cleaning the radix tree entry. We
         currently fail to mark the pmd_t as clean and write protected.
      3) more mmap writes to the PMD.  These don't cause any page faults since
         the pmd_t is dirty and writeable.  The radix tree entry remains clean.
      4) fsync, which fails to flush the dirty PMD data because the radix tree
         entry was clean.
      5) crash - dirty data that should have been fsync'd as part of 4) could
         still have been in the processor cache, and is lost.
      
      Fix this by marking the pmd_t clean and write protected in
      dax_mapping_entry_mkclean(), which is called as part of the fsync
      operation 2).  This will cause the writes in step 3) above to generate
      page faults where we'll re-dirty the PMD radix tree entry, resulting in
      flushes in the fsync that happens in step 4).
      
      Fixes: 4b4bb46d ("dax: clear dirty entry tags on cache flush")
      Link: http://lkml.kernel.org/r/1482272586-21177-3-git-send-email-ross.zwisler@linux.intel.comSigned-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f729c8c9
    • Ross Zwisler's avatar
      mm: add follow_pte_pmd() · 09796395
      Ross Zwisler authored
      Patch series "Write protect DAX PMDs in *sync path".
      
      Currently dax_mapping_entry_mkclean() fails to clean and write protect
      the pmd_t of a DAX PMD entry during an *sync operation.  This can result
      in data loss, as detailed in patch 2.
      
      This series is based on Dan's "libnvdimm-pending" branch, which is the
      current home for Jan's "dax: Page invalidation fixes" series.  You can
      find a working tree here:
      
        https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_pmd_clean
      
      This patch (of 2):
      
      Similar to follow_pte(), follow_pte_pmd() allows either a PTE leaf or a
      huge page PMD leaf to be found and returned.
      
      Link: http://lkml.kernel.org/r/1482272586-21177-2-git-send-email-ross.zwisler@linux.intel.comSigned-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: default avatarDave Hansen <dave.hansen@intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      09796395
    • Aneesh Kumar K.V's avatar
      mm/thp/pagecache/collapse: free the pte page table on collapse for thp page cache. · d670ffd8
      Aneesh Kumar K.V authored
      With THP page cache, when trying to build a huge page from regular pte
      pages, we just clear the pmd entry.  We will take another fault and at
      that point we will find the huge page in the radix tree, thereby using
      the huge page to complete the page fault
      
      The second fault path will allocate the needed pgtable_t page for archs
      like ppc64.  So no need to deposit the same in collapse path.
      Depositing them in the collapse path resulting in a pgtable_t memory
      leak also giving errors like
      
        BUG: non-zero nr_ptes on freeing mm: 3
      
      Fixes: 953c66c2 ("mm: THP page cache support for ppc64")
      Link: http://lkml.kernel.org/r/20161212163428.6780-2-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d670ffd8
    • Ross Zwisler's avatar
      dax: fix deadlock with DAX 4k holes · 965d004a
      Ross Zwisler authored
      Currently in DAX if we have three read faults on the same hole address we
      can end up with the following:
      
      Thread 0		Thread 1		Thread 2
      --------		--------		--------
      dax_iomap_fault
       grab_mapping_entry
        lock_slot
         <locks empty DAX entry>
      
        			dax_iomap_fault
      			 grab_mapping_entry
      			  get_unlocked_mapping_entry
      			   <sleeps on empty DAX entry>
      
      						dax_iomap_fault
      						 grab_mapping_entry
      						  get_unlocked_mapping_entry
      						   <sleeps on empty DAX entry>
        dax_load_hole
         find_or_create_page
         ...
          page_cache_tree_insert
           dax_wake_mapping_entry_waiter
            <wakes one sleeper>
           __radix_tree_replace
            <swaps empty DAX entry with 4k zero page>
      
      			<wakes>
      			get_page
      			lock_page
      			...
      			put_locked_mapping_entry
      			unlock_page
      			put_page
      
      						<sleeps forever on the DAX
      						 wait queue>
      
      The crux of the problem is that once we insert a 4k zero page, all
      locking from then on is done in terms of that 4k zero page and any
      additional threads sleeping on the empty DAX entry will never be woken.
      
      Fix this by waking all sleepers when we replace the DAX radix tree entry
      with a 4k zero page.  This will allow all sleeping threads to
      successfully transition from locking based on the DAX empty entry to
      locking on the 4k zero page.
      
      With the test case reported by Xiong this happens very regularly in my
      test setup, with some runs resulting in 9+ threads in this deadlocked
      state.  With this fix I've been able to run that same test dozens of
      times in a loop without issue.
      
      Fixes: ac401cc7 ("dax: New fault locking")
      Link: http://lkml.kernel.org/r/1483479365-13607-1-git-send-email-ross.zwisler@linux.intel.comSigned-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: default avatarXiong Zhou <xzhou@redhat.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>	[4.7+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      965d004a
    • Vlastimil Babka's avatar
      MAINTAINERS: remove duplicate bug filling description · 5771f6ea
      Vlastimil Babka authored
      I have noticed that two different descriptions for B: entries in
      MAINTAINERS were merged: commit 68656443 ("MAINTAINERS: Add bug
      tracking system location entry type") and 2de2bd95 ("MAINTAINERS:
      add "B:" for URI where to file bugs").
      
      This patch keeps the description from 2de2bd95.  There has been a
      discussion [1] about whether this more detailed description is useful
      and what it exactly implies.  I find it more useful and general, and the
      author of 68656443 agreed in the end that either is fine.
      
      [1] https://lkml.org/lkml/2016/12/8/71
      
      Link: http://lkml.kernel.org/r/20161219085158.12114-1-vbabka@suse.czSigned-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5771f6ea
  2. 09 Jan, 2017 6 commits
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.10-rc4' of git://people.freedesktop.org/~airlied/linux · bd5d7428
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "amdgpu, radeon, msm, meson, tilcdc, drm fixes.
      
        Just back online for a couple of days, gathered up the remaining fixes
        pull requests.
      
        This contains fixes for a few ARM platforms (msm, tilcdc, meson), and
        one core atomic fix. The AMD pull has some new hardware support
        (Polaris12) in it, but this is pretty limited to just hw enablement
        and shouldn't cause any problems"
      
      * tag 'drm-fixes-for-v4.10-rc4' of git://people.freedesktop.org/~airlied/linux:
        drm/amdgpu: drop verde dpm quirks
        drm/radeon: drop verde dpm quirks
        drm/radeon: update smc firmware selection for SI
        drm/amdgpu: update si kicker smc firmware
        drm/amd/powerplay: extend smu's response timeout time.
        drm/amdgpu: remove static integer for uvd pp state
        drm/amd/amdgpu: add Polaris12 PCI ID
        drm/amdgpu/powerplay: add Polaris12 support
        drm/amd/amdgpu: add Polaris12 support (v3)
        MAINTAINERS: Update mailing list for radeon and amdgpu
        drm/meson: Fix CVBS VDAC disable
        drm/meson: Fix CVBS initialization when HDMI is configured by bootloader
        drm: Clean up planes in atomic commit helper failure path
        drm: tilcdc: simplify the recovery from sync lost error on rev1
        drm/meson: Fix plane atomic check when no crtc for the plane
        drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set
        drm/msm: Put back the vaddr in submit_reloc()
        drm/msm: Ensure that the hardware write pointer is valid
      bd5d7428
    • Linus Torvalds's avatar
      Merge tag 'gpio-v4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 756a7334
      Linus Torvalds authored
      Pull GPIO fixes from Linus Walleij:
      
       - move freeing of GPIO hogs to after freeing the device to get rid of a
         warning state.
      
       - a small compile warning fix
      
      * tag 'gpio-v4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio: Move freeing of GPIO hogs before numbing of the device
        gpio: mxs: remove __init annotation
      756a7334
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c92f5bdc
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix dumping of nft_quota entries, from Pablo Neira Ayuso.
      
       2) Fix out of bounds access in nf_tables discovered by KASAN, from
          Florian Westphal.
      
       3) Fix IRQ enabling in dp83867 driver, from Grygorii Strashko.
      
       4) Fix unicast filtering in be2net driver, from Ivan Vecera.
      
       5) tg3_get_stats64() can race with driver close and ethtool
          reconfigurations, fix from Michael Chan.
      
       6) Fix error handling when pass limit is reached in bpf code gen on
          x86. From Daniel Borkmann.
      
       7) Don't clobber switch ops and use proper MDIO nested reads and writes
          in bcm_sf2 driver, from Florian Fainelli.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
        net: dsa: bcm_sf2: Utilize nested MDIO read/write
        net: dsa: bcm_sf2: Do not clobber b53_switch_ops
        net: stmmac: fix maxmtu assignment to be within valid range
        bpf: change back to orig prog on too many passes
        tg3: Fix race condition in tg3_get_stats64().
        be2net: fix unicast list filling
        be2net: fix accesses to unicast list
        netlabel: add CALIPSO to the list of built-in protocols
        vti6: fix device register to report IFLA_INFO_KIND
        net: phy: dp83867: fix irq generation
        amd-xgbe: Fix IRQ processing when running in single IRQ mode
        sh_eth: R8A7740 supports packet shecksumming
        sh_eth: fix EESIPR values for SH77{34|63}
        r8169: fix the typo in the comment
        nl80211: fix sched scan netlink socket owner destruction
        bridge: netfilter: Fix dropping packets that moving through bridge interface
        netfilter: ipt_CLUSTERIP: check duplicate config when initializing
        netfilter: nft_payload: mangle ckecksum if NFT_PAYLOAD_L4CSUM_PSEUDOHDR is set
        netfilter: nf_tables: fix oob access
        netfilter: nft_queue: use raw_smp_processor_id()
        ...
      c92f5bdc
    • David S. Miller's avatar
      Merge branch 'bcm_sf2-fixes' · 03430fa1
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: dsa: bcm_sf2: Couple fixes
      
      Here are a couple of fixes for bcm_sf2, please queue these up for -stable
      as well, thank you very much!
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      03430fa1
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Utilize nested MDIO read/write · 2cfe8f82
      Florian Fainelli authored
      We are implementing a MDIO bus which is behind another one, so use the
      nested version of the accessors to get lockdep annotations correct.
      
      Fixes: 461cd1b0 ("net: dsa: bcm_sf2: Register our slave MDIO bus")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2cfe8f82
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Do not clobber b53_switch_ops · a4c61b92
      Florian Fainelli authored
      We make the bcm_sf2 driver override ds->ops which points to
      b53_switch_ops since b53_switch_alloc() did the assignent. This is all
      well and good until a second b53 switch comes in, and ends up using the
      bcm_sf2 operations. Make a proper local copy, substitute the ds->ops
      pointer and then override the operations.
      
      Fixes: f458995b ("net: dsa: bcm_sf2: Utilize core B53 driver when possible")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a4c61b92
  3. 08 Jan, 2017 15 commits
    • Dave Airlie's avatar
      Merge branch 'drm-fixes-4.10' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 6edd870b
      Dave Airlie authored
      Fixes for 4.10:
      - Polaris 12 support
      - Add new amd-gfx mailing list to MAINTAINERS file
      - UVD clockgating fix
      - SI dpm fixes
      
      * 'drm-fixes-4.10' of git://people.freedesktop.org/~agd5f/linux:
        drm/amdgpu: drop verde dpm quirks
        drm/radeon: drop verde dpm quirks
        drm/radeon: update smc firmware selection for SI
        drm/amdgpu: update si kicker smc firmware
        drm/amd/powerplay: extend smu's response timeout time.
        drm/amdgpu: remove static integer for uvd pp state
        drm/amd/amdgpu: add Polaris12 PCI ID
        drm/amdgpu/powerplay: add Polaris12 support
        drm/amd/amdgpu: add Polaris12 support (v3)
        MAINTAINERS: Update mailing list for radeon and amdgpu
      6edd870b
    • Kweh, Hock Leong's avatar
      net: stmmac: fix maxmtu assignment to be within valid range · a2cd64f3
      Kweh, Hock Leong authored
      There is no checking valid value of maxmtu when getting it from
      device tree. This resolution added the checking condition to
      ensure the assignment is made within a valid range.
      Signed-off-by: default avatarKweh, Hock Leong <hock.leong.kweh@intel.com>
      Reviewed-by: default avatarAndy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a2cd64f3
    • Dave Airlie's avatar
      Merge branch 'msm-fixes-4.10' of git://people.freedesktop.org/~robclark/linux into drm-fixes · 6906407e
      Dave Airlie authored
      A few fixes for 4.10.. the first fixes a long-standing logic bug, that
      by luck (ie. size of packets written into RB for a submit) wasn't hit
      on a3xx/a4xx but was causing intermittent GPU lockups on a5xx.  And a
      couple other robustness issues that Jordan noticed.
      
      * 'msm-fixes-4.10' of git://people.freedesktop.org/~robclark/linux:
        drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set
        drm/msm: Put back the vaddr in submit_reloc()
        drm/msm: Ensure that the hardware write pointer is valid
      6906407e
    • Dave Airlie's avatar
      Merge tag 'meson-drm-fixes-for-4.10' of... · 90e5d2d4
      Dave Airlie authored
      Merge tag 'meson-drm-fixes-for-4.10' of git://people.freedesktop.org/~narmstrong/linux into drm-fixes
      
      - plan atomic check oops fix
      - fix CVBS init when HDMI is configured by bootloader
      - fix CVBS VDAC disable
      
      * tag 'meson-drm-fixes-for-4.10' of git://people.freedesktop.org/~narmstrong/linux:
        drm/meson: Fix CVBS VDAC disable
        drm/meson: Fix CVBS initialization when HDMI is configured by bootloader
        drm/meson: Fix plane atomic check when no crtc for the plane
      90e5d2d4
    • Dave Airlie's avatar
      Merge tag 'tilcdc-4.10-fixes' of https://github.com/jsarha/linux into drm-fixes · 13fe46b5
      Dave Airlie authored
      tilcdc fixes for v4.10.
      
      * tag 'tilcdc-4.10-fixes' of https://github.com/jsarha/linux:
        drm: tilcdc: simplify the recovery from sync lost error on rev1
      13fe46b5
    • Dave Airlie's avatar
      Merge tag 'drm-misc-fixes-2017-01-04' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes · e1ef6f71
      Dave Airlie authored
      single drm fix.
      
      * tag 'drm-misc-fixes-2017-01-04' of git://anongit.freedesktop.org/git/drm-misc:
        drm: Clean up planes in atomic commit helper failure path
      e1ef6f71
    • Linus Torvalds's avatar
      Linux 4.10-rc3 · a121103c
      Linus Torvalds authored
      a121103c
    • Daniel Borkmann's avatar
      bpf: change back to orig prog on too many passes · 9d5ecb09
      Daniel Borkmann authored
      If after too many passes still no image could be emitted, then
      swap back to the original program as we do in all other cases
      and don't use the one with blinding.
      
      Fixes: 959a7579 ("bpf, x86: add support for constant blinding")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9d5ecb09
    • Linus Torvalds's avatar
      Merge tag 'usb-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 83280e90
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a bunch of USB fixes for 4.10-rc3. Yeah, it's a lot, an
        artifact of the holiday break I think.
      
        Lots of gadget and the usual XHCI fixups for reported issues (one day
        that driver will calm down...) Also included are a bunch of usb-serial
        driver fixes, and for good measure, a number of much-reported MUSB
        driver issues have finally been resolved.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (72 commits)
        USB: fix problems with duplicate endpoint addresses
        usb: ohci-at91: use descriptor-based gpio APIs correctly
        usb: storage: unusual_uas: Add JMicron JMS56x to unusual device
        usb: hub: Move hub_port_disable() to fix warning if PM is disabled
        usb: musb: blackfin: add bfin_fifo_offset in bfin_ops
        usb: musb: fix compilation warning on unused function
        usb: musb: Fix trying to free already-free IRQ 4
        usb: musb: dsps: implement clear_ep_rxintr() callback
        usb: musb: core: add clear_ep_rxintr() to musb_platform_ops
        USB: serial: ti_usb_3410_5052: fix NULL-deref at open
        USB: serial: spcp8x5: fix NULL-deref at open
        USB: serial: quatech2: fix sleep-while-atomic in close
        USB: serial: pl2303: fix NULL-deref at open
        USB: serial: oti6858: fix NULL-deref at open
        USB: serial: omninet: fix NULL-derefs at open and disconnect
        USB: serial: mos7840: fix misleading interrupt-URB comment
        USB: serial: mos7840: remove unused write URB
        USB: serial: mos7840: fix NULL-deref at open
        USB: serial: mos7720: remove obsolete port initialisation
        USB: serial: mos7720: fix parallel probe
        ...
      83280e90
    • Linus Torvalds's avatar
      Merge tag 'char-misc-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · cc250e26
      Linus Torvalds authored
      Pull char/misc fixes from Greg KH:
       "Here are a few small char/misc driver fixes for 4.10-rc3.
      
        Two MEI driver fixes, and three NVMEM patches for reported issues, and
        a new Hyper-V driver MAINTAINER update. Nothing major at all, all have
        been in linux-next with no reported issues"
      
      * tag 'char-misc-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        hyper-v: Add myself as additional MAINTAINER
        nvmem: fix nvmem_cell_read() return type doc
        nvmem: imx-ocotp: Fix wrong register size
        nvmem: qfprom: Allow single byte accesses for read/write
        mei: move write cb to completion on credentials failures
        mei: bus: fix mei_cldev_enable KDoc
      cc250e26
    • Linus Torvalds's avatar
      Merge tag 'staging-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 6ea17ed1
      Linus Torvalds authored
      Pull staging/IIO fixes from Greg KH:
       "Here are some staging and IIO driver fixes for 4.10-rc3.
      
        Most of these are minor IIO fixes of reported issues, along with one
        network driver fix to resolve an issue. And a MAINTAINERS update with
        a new mailing list. All of these, except the MAINTAINERS file update,
        have been in linux-next with no reported issues (the MAINTAINERS patch
        happened on Friday...)"
      
      * tag 'staging-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        MAINTAINERS: add greybus subsystem mailing list
        staging: octeon: Call SET_NETDEV_DEV()
        iio: accel: st_accel: fix LIS3LV02 reading and scaling
        iio: common: st_sensors: fix channel data parsing
        iio: max44000: correct value in illuminance_integration_time_available
        iio: adc: TI_AM335X_ADC should depend on HAS_DMA
        iio: bmi160: Fix time needed to sleep after command execution
        iio: 104-quad-8: Fix active level mismatch for the preset enable option
        iio: 104-quad-8: Fix off-by-one errors when addressing IOR
        iio: 104-quad-8: Fix index control configuration
      6ea17ed1
    • Michael Chan's avatar
      tg3: Fix race condition in tg3_get_stats64(). · f5992b72
      Michael Chan authored
      The driver's ndo_get_stats64() method is not always called under RTNL.
      So it can race with driver close or ethtool reconfigurations.  Fix the
      race condition by taking tp->lock spinlock in tg3_free_consistent()
      when freeing the tp->hw_stats memory block.  tg3_get_stats64() is
      already taking tp->lock.
      Reported-by: default avatarWang Yufen <wangyufen@huawei.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f5992b72
    • Ivan Vecera's avatar
      be2net: fix unicast list filling · 6052cd1a
      Ivan Vecera authored
      The adapter->pmac_id[0] item is used for primary MAC address but
      this is not true for adapter->uc_list[0] as is assumed in
      be_set_uc_list(). There are N UC addresses copied first from net_device
      to adapter->uc_list[1..N] and then N UC addresses from
      adapter->uc_list[0..N-1] are sent to HW. So the last UC address is never
      stored into HW and address 00:00:00:00;00:00 (from uc_list[0]) is used
      instead.
      
      Cc: Sathya Perla <sathya.perla@broadcom.com>
      Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Cc: Somnath Kotur <somnath.kotur@broadcom.com>
      Fixes: b7172414 be2net: replace polling with sleeping in the FW completion path
      Signed-off-by: default avatarIvan Vecera <cera@cera.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6052cd1a
    • Johannes Weiner's avatar
      mm: workingset: fix use-after-free in shadow node shrinker · ea07b862
      Johannes Weiner authored
      Several people report seeing warnings about inconsistent radix tree
      nodes followed by crashes in the workingset code, which all looked like
      use-after-free access from the shadow node shrinker.
      
      Dave Jones managed to reproduce the issue with a debug patch applied,
      which confirmed that the radix tree shrinking indeed frees shadow nodes
      while they are still linked to the shadow LRU:
      
        WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
        CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
        Call Trace:
           delete_node+0x1e4/0x200
           __radix_tree_delete_node+0xd/0x10
           shadow_lru_isolate+0xe6/0x220
           __list_lru_walk_one.isra.4+0x9b/0x190
           list_lru_walk_one+0x23/0x30
           scan_shadow_nodes+0x2e/0x40
           shrink_slab.part.44+0x23d/0x5d0
           shrink_node+0x22c/0x330
           kswapd+0x392/0x8f0
      
      This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the
      inlined radix_tree_shrink().
      
      The problem is with 14b46879 ("mm: workingset: move shadow entry
      tracking to radix tree exceptional tracking"), which passes an update
      callback into the radix tree to link and unlink shadow leaf nodes when
      tree entries change, but forgot to pass the callback when reclaiming a
      shadow node.
      
      While the reclaimed shadow node itself is unlinked by the shrinker, its
      deletion from the tree can cause the left-most leaf node in the tree to
      be shrunk.  If that happens to be a shadow node as well, we don't unlink
      it from the LRU as we should.
      
      Consider this tree, where the s are shadow entries:
      
             root->rnode
                  |
             [0       n]
              |       |
           [s    ] [sssss]
      
      Now the shadow node shrinker reclaims the rightmost leaf node through
      the shadow node LRU:
      
             root->rnode
                  |
             [0        ]
              |
          [s     ]
      
      Because the parent of the deleted node is the first level below the
      root and has only one child in the left-most slot, the intermediate
      level is shrunk and the node containing the single shadow is put in
      its place:
      
             root->rnode
                  |
             [s        ]
      
      The shrinker again sees a single left-most slot in a first level node
      and thus decides to store the shadow in root->rnode directly and free
      the node - which is a leaf node on the shadow node LRU.
      
        root->rnode
             |
             s
      
      Without the update callback, the freed node remains on the shadow LRU,
      where it causes later shrinker runs to crash.
      
      Pass the node updater callback into __radix_tree_delete_node() in case
      the deletion causes the left-most branch in the tree to collapse too.
      
      Also add warnings when linked nodes are freed right away, rather than
      wait for the use-after-free when the list is scanned much later.
      
      Fixes: 14b46879 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
      Reported-by: default avatarDave Chinner <david@fromorbit.com>
      Reported-by: default avatarHugh Dickins <hughd@google.com>
      Reported-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reported-and-tested-by: default avatarDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Leech <cleech@redhat.com>
      Cc: Lee Duncan <lduncan@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ea07b862
    • Hugh Dickins's avatar
      mm: stop leaking PageTables · b0b9b3df
      Hugh Dickins authored
      4.10-rc loadtest (even on x86, and even without THPCache) fails with
      "fork: Cannot allocate memory" or some such; and /proc/meminfo shows
      PageTables growing.
      
      Commit 953c66c2 ("mm: THP page cache support for ppc64") that got
      merged in rc1 removed the freeing of an unused preallocated pagetable
      after do_fault_around() has called map_pages().
      
      This is usually a good optimization, so that the followup doesn't have
      to reallocate one; but it's not sufficient to shift the freeing into
      alloc_set_pte(), since there are failure cases (most commonly
      VM_FAULT_RETRY) which never reach finish_fault().
      
      Check and free it at the outer level in do_fault(), then we don't need
      to worry in alloc_set_pte(), and can restore that to how it was (I
      cannot find any reason to pte_free() under lock as it was doing).
      
      And fix a separate pagetable leak, or crash, introduced by the same
      change, that could only show up on some ppc64: why does do_set_pmd()'s
      failure case attempt to withdraw a pagetable when it never deposited
      one, at the same time overwriting (so leaking) the vmf->prealloc_pte?
      Residue of an earlier implementation, perhaps? Delete it.
      
      Fixes: 953c66c2 ("mm: THP page cache support for ppc64")
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b0b9b3df
  4. 07 Jan, 2017 4 commits
  5. 06 Jan, 2017 5 commits