1. 20 Jun, 2019 3 commits
  2. 17 Jun, 2019 1 commit
  3. 15 Jun, 2019 20 commits
  4. 14 Jun, 2019 16 commits
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 72a20cee
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "Here are some arm64 fixes for -rc5.
      
        The only non-trivial change (in terms of the diffstat) is fixing our
        SVE ptrace API for big-endian machines, but the majority of this is
        actually the addition of much-needed comments and updates to the
        documentation to try to avoid this mess biting us again in future.
      
        There are still a couple of small things on the horizon, but nothing
        major at this point.
      
        Summary:
      
         - Fix broken SVE ptrace API when running in a big-endian configuration
      
         - Fix performance regression due to off-by-one in TLBI range checking
      
         - Fix build regression when using Clang"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/sve: Fix missing SVE/FPSIMD endianness conversions
        arm64: tlbflush: Ensure start/end of address range are aligned to stride
        arm64: Don't unconditionally add -Wno-psabi to KBUILD_CFLAGS
      72a20cee
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · fd6b99fa
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "16 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/devm_memremap_pages: fix final page put race
        PCI/P2PDMA: track pgmap references per resource, not globally
        lib/genalloc: introduce chunk owners
        PCI/P2PDMA: fix the gen_pool_add_virt() failure path
        mm/devm_memremap_pages: introduce devm_memunmap_pages
        drivers/base/devres: introduce devm_release_action()
        mm/vmscan.c: fix trying to reclaim unevictable LRU page
        coredump: fix race condition between collapse_huge_page() and core dumping
        mm/mlock.c: change count_mm_mlocked_page_nr return type
        mm: mmu_gather: remove __tlb_reset_range() for force flush
        fs/ocfs2: fix race in ocfs2_dentry_attach_lock()
        mm/vmscan.c: fix recent_rotated history
        mm/mlock.c: mlockall error for flag MCL_ONFAULT
        scripts/decode_stacktrace.sh: prefix addr2line with $CROSS_COMPILE
        mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node
        mm: memcontrol: don't batch updates of local VM stats and events
      fd6b99fa
    • Linus Torvalds's avatar
      Merge tag 'iommu-fixes-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · c78ad1be
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - three fixes for Intel VT-d to fix a potential dead-lock, a formatting
         fix and a bit setting fix
      
       - one fix for the ARM-SMMU to make it work on some platforms with
         sub-optimal SMMU emulation
      
      * tag 'iommu-fixes-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/arm-smmu: Avoid constant zero in TLBI writes
        iommu/vt-d: Set the right field for Page Walk Snoop
        iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock
        iommu: Add missing new line for dma type
      c78ad1be
    • Linus Torvalds's avatar
      Merge tag 'gpio-v5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 7617c9a0
      Linus Torvalds authored
      Pull GPIO fix from Linus Walleij:
       "A single fix for the PCA953x driver affecting some fringe variants of
        the chip"
      
      * tag 'gpio-v5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio: pca953x: hack to fix 24 bit gpio expanders
      7617c9a0
    • Linus Torvalds's avatar
      Merge tag 'sound-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · bcb46a0e
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "It might feel like deja vu to receive a bulk of changes at rc5, and it
        happens again; we've got a collection of fixes for ASoC. Most of fixes
        are targeted for the newly merged SOF (Sound Open Firmware) stuff and
        the relevant fixes for Intel platforms.
      
        Other than that, there are a few regression fixes for the recent ASoC
        core changes and HD-audio quirk, as well as a couple of FireWire fixes
        and for other ASoC codecs"
      
      * tag 'sound-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (54 commits)
        Revert "ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops"
        ALSA: ice1712: Check correct return value to snd_i2c_sendbytes (EWS/DMX 6Fire)
        ALSA: oxfw: allow PCM capture for Stanton SCS.1m
        ALSA: firewire-motu: fix destruction of data for isochronous resources
        ASoC: Intel: sst: fix kmalloc call with wrong flags
        ASoC: core: Fix deadlock in snd_soc_instantiate_card()
        SoC: rt274: Fix internal jack assignment in set_jack callback
        ALSA: hdac: fix memory release for SST and SOF drivers
        ASoC: SOF: Intel: hda: use the defined ppcap functions
        ASoC: core: move DAI pre-links initiation to snd_soc_instantiate_card
        ASoC: Intel: cht_bsw_rt5672: fix kernel oops with platform_name override
        ASoC: Intel: cht_bsw_nau8824: fix kernel oops with platform_name override
        ASoC: Intel: bytcht_es8316: fix kernel oops with platform_name override
        ASoC: Intel: cht_bsw_max98090: fix kernel oops with platform_name override
        ASoC: sun4i-i2s: Add offset to RX channel select
        ASoC: sun4i-i2s: Fix sun8i tx channel offset mask
        ASoC: max98090: remove 24-bit format support if RJ is 0
        ASoC: da7219: Fix build error without CONFIG_I2C
        ASoC: SOF: Intel: hda: Fix COMPILE_TEST build error
        ASoC: SOF: fix DSP oops definitions in FW ABI
        ...
      bcb46a0e
    • Hui Wang's avatar
      Revert "ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops" · 17d30460
      Hui Wang authored
      This reverts commit 9cb40eb1.
      
      This patch introduces noise and headphone playback issue after
      rebooting or suspending/resuming. Let us revert it.
      
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203831
      Fixes: 9cb40eb1 ("ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarHui Wang <hui.wang@canonical.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      17d30460
    • Dan Williams's avatar
      mm/devm_memremap_pages: fix final page put race · 50f44ee7
      Dan Williams authored
      Logan noticed that devm_memremap_pages_release() kills the percpu_ref
      drops all the page references that were acquired at init and then
      immediately proceeds to unplug, arch_remove_memory(), the backing pages
      for the pagemap.  If for some reason device shutdown actually collides
      with a busy / elevated-ref-count page then arch_remove_memory() should
      be deferred until after that reference is dropped.
      
      As it stands the "wait for last page ref drop" happens *after*
      devm_memremap_pages_release() returns, which is obviously too late and
      can lead to crashes.
      
      Fix this situation by assigning the responsibility to wait for the
      percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup()
      callback.  Implement the new cleanup callback for all
      devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma.
      
      Link: http://lkml.kernel.org/r/155727339156.292046.5432007428235387859.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: 41e94a85 ("add devm_memremap_pages")
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reported-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reviewed-by: default avatarIra Weiny <ira.weiny@intel.com>
      Reviewed-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      50f44ee7
    • Dan Williams's avatar
      PCI/P2PDMA: track pgmap references per resource, not globally · 1570175a
      Dan Williams authored
      In preparation for fixing a race between devm_memremap_pages_release()
      and the final put of a page from the device-page-map, allocate a
      percpu-ref per p2pdma resource mapping.
      
      Link: http://lkml.kernel.org/r/155727338646.292046.9922678317501435597.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarIra Weiny <ira.weiny@intel.com>
      Reviewed-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1570175a
    • Dan Williams's avatar
      lib/genalloc: introduce chunk owners · 795ee306
      Dan Williams authored
      The p2pdma facility enables a provider to publish a pool of dma
      addresses for a consumer to allocate.  A genpool is used internally by
      p2pdma to collect dma resources, 'chunks', to be handed out to
      consumers.  Whenever a consumer allocates a resource it needs to pin the
      'struct dev_pagemap' instance that backs the chunk selected by
      pci_alloc_p2pmem().
      
      Currently that reference is taken globally on the entire provider
      device.  That sets up a lifetime mismatch whereby the p2pdma core needs
      to maintain hacks to make sure the percpu_ref is not released twice.
      
      This lifetime mismatch also stands in the way of a fix to
      devm_memremap_pages() whereby devm_memremap_pages_release() must wait for
      the percpu_ref ->release() callback to complete before it can proceed to
      teardown pages.
      
      So, towards fixing this situation, introduce the ability to store a 'chunk
      owner' at gen_pool_add() time, and a facility to retrieve the owner at
      gen_pool_{alloc,free}() time.  For p2pdma this will be used to store and
      recall individual dev_pagemap reference counter instances per-chunk.
      
      Link: http://lkml.kernel.org/r/155727338118.292046.13407378933221579644.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarIra Weiny <ira.weiny@intel.com>
      Reviewed-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      795ee306
    • Dan Williams's avatar
      PCI/P2PDMA: fix the gen_pool_add_virt() failure path · e615a191
      Dan Williams authored
      The pci_p2pdma_add_resource() implementation immediately frees the pgmap
      if gen_pool_add_virt() fails.  However, that means that when @dev
      triggers a devres release devm_memremap_pages_release() will crash
      trying to access the freed @pgmap.
      
      Use the new devm_memunmap_pages() to manually free the mapping in the
      error path.
      
      Link: http://lkml.kernel.org/r/155727337603.292046.13101332703665246702.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Fixes: 52916982 ("PCI/P2PDMA: Support peer-to-peer memory")
      Reviewed-by: default avatarIra Weiny <ira.weiny@intel.com>
      Acked-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e615a191
    • Dan Williams's avatar
      mm/devm_memremap_pages: introduce devm_memunmap_pages · 2e3f139e
      Dan Williams authored
      Use the new devm_release_action() facility to allow
      devm_memremap_pages_release() to be manually triggered.
      
      Link: http://lkml.kernel.org/r/155727337088.292046.5774214552136776763.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarIra Weiny <ira.weiny@intel.com>
      Reviewed-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2e3f139e
    • Dan Williams's avatar
      drivers/base/devres: introduce devm_release_action() · 2374b682
      Dan Williams authored
      Patch series "mm/devm_memremap_pages: Fix page release race", v2.
      
      Logan audited the devm_memremap_pages() shutdown path and noticed that
      it was possible to proceed to arch_remove_memory() before all potential
      page references have been reaped.
      
      Introduce a new ->cleanup() callback to do the work of waiting for any
      straggling page references and then perform the percpu_ref_exit() in
      devm_memremap_pages_release() context.
      
      For p2pdma this involves some deeper reworks to reference count
      resources on a per-instance basis rather than a per pci-device basis.  A
      modified genalloc api is introduced to convey a driver-private pointer
      through gen_pool_{alloc,free}() interfaces.  Also, a
      devm_memunmap_pages() api is introduced since p2pdma does not
      auto-release resources on a setup failure.
      
      The dax and pmem changes pass the nvdimm unit tests, and the p2pdma
      changes should now pass testing with the pci_p2pdma_release() fix.
      Jrme, how does this look for HMM?
      
      This patch (of 6):
      
      The devm_add_action() facility allows a resource allocation routine to
      add custom devm semantics.  One such user is devm_memremap_pages().
      
      There is now a need to manually trigger
      devm_memremap_pages_release().  Introduce devm_release_action() so the
      release action can be triggered via a new devm_memunmap_pages() api in a
      follow-on change.
      
      Link: http://lkml.kernel.org/r/155727336530.292046.2926860263201336366.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarIra Weiny <ira.weiny@intel.com>
      Reviewed-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2374b682
    • Minchan Kim's avatar
      mm/vmscan.c: fix trying to reclaim unevictable LRU page · a58f2cef
      Minchan Kim authored
      There was the below bug report from Wu Fangsuo.
      
      On the CMA allocation path, isolate_migratepages_range() could isolate
      unevictable LRU pages and reclaim_clean_page_from_list() can try to
      reclaim them if they are clean file-backed pages.
      
        page:ffffffbf02f33b40 count:86 mapcount:84 mapping:ffffffc08fa7a810 index:0x24
        flags: 0x19040c(referenced|uptodate|arch_1|mappedtodisk|unevictable|mlocked)
        raw: 000000000019040c ffffffc08fa7a810 0000000000000024 0000005600000053
        raw: ffffffc009b05b20 ffffffc009b05b20 0000000000000000 ffffffc09bf3ee80
        page dumped because: VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page))
        page->mem_cgroup:ffffffc09bf3ee80
        ------------[ cut here ]------------
        kernel BUG at /home/build/farmland/adroid9.0/kernel/linux/mm/vmscan.c:1350!
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 0 PID: 7125 Comm: syz-executor Tainted: G S              4.14.81 #3
        Hardware name: ASR AQUILAC EVB (DT)
        task: ffffffc00a54cd00 task.stack: ffffffc009b00000
        PC is at shrink_page_list+0x1998/0x3240
        LR is at shrink_page_list+0x1998/0x3240
        pc : [<ffffff90083a2158>] lr : [<ffffff90083a2158>] pstate: 60400045
        sp : ffffffc009b05940
        ..
           shrink_page_list+0x1998/0x3240
           reclaim_clean_pages_from_list+0x3c0/0x4f0
           alloc_contig_range+0x3bc/0x650
           cma_alloc+0x214/0x668
           ion_cma_allocate+0x98/0x1d8
           ion_alloc+0x200/0x7e0
           ion_ioctl+0x18c/0x378
           do_vfs_ioctl+0x17c/0x1780
           SyS_ioctl+0xac/0xc0
      
      Wu found it's due to commit ad6b6704 ("mm: remove SWAP_MLOCK in
      ttu").  Before that, unevictable pages go to cull_mlocked so that we
      can't reach the VM_BUG_ON_PAGE line.
      
      To fix the issue, this patch filters out unevictable LRU pages from the
      reclaim_clean_pages_from_list in CMA.
      
      Link: http://lkml.kernel.org/r/20190524071114.74202-1-minchan@kernel.org
      Fixes: ad6b6704 ("mm: remove SWAP_MLOCK in ttu")
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Reported-by: default avatarWu Fangsuo <fangsuowu@asrmicro.com>
      Debugged-by: default avatarWu Fangsuo <fangsuowu@asrmicro.com>
      Tested-by: default avatarWu Fangsuo <fangsuowu@asrmicro.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Pankaj Suryawanshi <pankaj.suryawanshi@einfochips.com>
      Cc: <stable@vger.kernel.org>	[4.12+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a58f2cef
    • Andrea Arcangeli's avatar
      coredump: fix race condition between collapse_huge_page() and core dumping · 59ea6d06
      Andrea Arcangeli authored
      When fixing the race conditions between the coredump and the mmap_sem
      holders outside the context of the process, we focused on
      mmget_not_zero()/get_task_mm() callers in 04f5866e ("coredump: fix
      race condition between mmget_not_zero()/get_task_mm() and core
      dumping"), but those aren't the only cases where the mmap_sem can be
      taken outside of the context of the process as Michal Hocko noticed
      while backporting that commit to older -stable kernels.
      
      If mmgrab() is called in the context of the process, but then the
      mm_count reference is transferred outside the context of the process,
      that can also be a problem if the mmap_sem has to be taken for writing
      through that mm_count reference.
      
      khugepaged registration calls mmgrab() in the context of the process,
      but the mmap_sem for writing is taken later in the context of the
      khugepaged kernel thread.
      
      collapse_huge_page() after taking the mmap_sem for writing doesn't
      modify any vma, so it's not obvious that it could cause a problem to the
      coredump, but it happens to modify the pmd in a way that breaks an
      invariant that pmd_trans_huge_lock() relies upon.  collapse_huge_page()
      needs the mmap_sem for writing just to block concurrent page faults that
      call pmd_trans_huge_lock().
      
      Specifically the invariant that "!pmd_trans_huge()" cannot become a
      "pmd_trans_huge()" doesn't hold while collapse_huge_page() runs.
      
      The coredump will call __get_user_pages() without mmap_sem for reading,
      which eventually can invoke a lockless page fault which will need a
      functional pmd_trans_huge_lock().
      
      So collapse_huge_page() needs to use mmget_still_valid() to check it's
      not running concurrently with the coredump...  as long as the coredump
      can invoke page faults without holding the mmap_sem for reading.
      
      This has "Fixes: khugepaged" to facilitate backporting, but in my view
      it's more a bug in the coredump code that will eventually have to be
      rewritten to stop invoking page faults without the mmap_sem for reading.
      So the long term plan is still to drop all mmget_still_valid().
      
      Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com
      Fixes: ba76149f ("thp: khugepaged")
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      59ea6d06
    • swkhack's avatar
      mm/mlock.c: change count_mm_mlocked_page_nr return type · 0874bb49
      swkhack authored
      On a 64-bit machine the value of "vma->vm_end - vma->vm_start" may be
      negative when using 32 bit ints and the "count >> PAGE_SHIFT"'s result
      will be wrong.  So change the local variable and return value to
      unsigned long to fix the problem.
      
      Link: http://lkml.kernel.org/r/20190513023701.83056-1-swkhack@gmail.com
      Fixes: 0cf2f6f6 ("mm: mlock: check against vma for actual mlock() size")
      Signed-off-by: default avatarswkhack <swkhack@gmail.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0874bb49
    • Yang Shi's avatar
      mm: mmu_gather: remove __tlb_reset_range() for force flush · 7a30df49
      Yang Shi authored
      A few new fields were added to mmu_gather to make TLB flush smarter for
      huge page by telling what level of page table is changed.
      
      __tlb_reset_range() is used to reset all these page table state to
      unchanged, which is called by TLB flush for parallel mapping changes for
      the same range under non-exclusive lock (i.e.  read mmap_sem).
      
      Before commit dd2283f2 ("mm: mmap: zap pages with read mmap_sem in
      munmap"), the syscalls (e.g.  MADV_DONTNEED, MADV_FREE) which may update
      PTEs in parallel don't remove page tables.  But, the forementioned
      commit may do munmap() under read mmap_sem and free page tables.  This
      may result in program hang on aarch64 reported by Jan Stancek.  The
      problem could be reproduced by his test program with slightly modified
      below.
      
      ---8<---
      
      static int map_size = 4096;
      static int num_iter = 500;
      static long threads_total;
      
      static void *distant_area;
      
      void *map_write_unmap(void *ptr)
      {
      	int *fd = ptr;
      	unsigned char *map_address;
      	int i, j = 0;
      
      	for (i = 0; i < num_iter; i++) {
      		map_address = mmap(distant_area, (size_t) map_size, PROT_WRITE | PROT_READ,
      			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      		if (map_address == MAP_FAILED) {
      			perror("mmap");
      			exit(1);
      		}
      
      		for (j = 0; j < map_size; j++)
      			map_address[j] = 'b';
      
      		if (munmap(map_address, map_size) == -1) {
      			perror("munmap");
      			exit(1);
      		}
      	}
      
      	return NULL;
      }
      
      void *dummy(void *ptr)
      {
      	return NULL;
      }
      
      int main(void)
      {
      	pthread_t thid[2];
      
      	/* hint for mmap in map_write_unmap() */
      	distant_area = mmap(0, DISTANT_MMAP_SIZE, PROT_WRITE | PROT_READ,
      			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
      	munmap(distant_area, (size_t)DISTANT_MMAP_SIZE);
      	distant_area += DISTANT_MMAP_SIZE / 2;
      
      	while (1) {
      		pthread_create(&thid[0], NULL, map_write_unmap, NULL);
      		pthread_create(&thid[1], NULL, dummy, NULL);
      
      		pthread_join(thid[0], NULL);
      		pthread_join(thid[1], NULL);
      	}
      }
      ---8<---
      
      The program may bring in parallel execution like below:
      
              t1                                        t2
      munmap(map_address)
        downgrade_write(&mm->mmap_sem);
        unmap_region()
        tlb_gather_mmu()
          inc_tlb_flush_pending(tlb->mm);
        free_pgtables()
          tlb->freed_tables = 1
          tlb->cleared_pmds = 1
      
                                              pthread_exit()
                                              madvise(thread_stack, 8M, MADV_DONTNEED)
                                                zap_page_range()
                                                  tlb_gather_mmu()
                                                    inc_tlb_flush_pending(tlb->mm);
      
        tlb_finish_mmu()
          if (mm_tlb_flush_nested(tlb->mm))
            __tlb_reset_range()
      
      __tlb_reset_range() would reset freed_tables and cleared_* bits, but this
      may cause inconsistency for munmap() which do free page tables.  Then it
      may result in some architectures, e.g.  aarch64, may not flush TLB
      completely as expected to have stale TLB entries remained.
      
      Use fullmm flush since it yields much better performance on aarch64 and
      non-fullmm doesn't yields significant difference on x86.
      
      The original proposed fix came from Jan Stancek who mainly debugged this
      issue, I just wrapped up everything together.
      
      Jan's testing results:
      
      v5.2-rc2-24-gbec7550c
      --------------------------
               mean     stddev
      real    37.382   2.780
      user     1.420   0.078
      sys     54.658   1.855
      
      v5.2-rc2-24-gbec7550c + "mm: mmu_gather: remove __tlb_reset_range() for force flush"
      ---------------------------------------------------------------------------------------_
               mean     stddev
      real    37.119   2.105
      user     1.548   0.087
      sys     55.698   1.357
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/1558322252-113575-1-git-send-email-yang.shi@linux.alibaba.com
      Fixes: dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
      Signed-off-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: default avatarJan Stancek <jstancek@redhat.com>
      Reported-by: default avatarJan Stancek <jstancek@redhat.com>
      Tested-by: default avatarJan Stancek <jstancek@redhat.com>
      Suggested-by: default avatarWill Deacon <will.deacon@arm.com>
      Tested-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>	[4.20+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7a30df49