Commits · 220387048d859896ccc362c0ebf9bc1e0fa62eb9 · Kirill Smelkov / linux

28 Sep, 2020 1 commit

ARM: Handle no IPI being registered in show_ipi_list() · 22038704

Marc Zyngier authored Sep 25, 2020

As SMP-on-UP is a valid configuration on 32bit ARM, do not assume that
IPIs are populated in show_ipi_list().
Reported-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Reported-by: kernelci.org bot <bot@kernelci.org>
Tested-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

22038704

18 Sep, 2020 2 commits

arm: Move ipi_teardown() to a CONFIG_HOTPLUG_CPU section · ac15a54e

Marc Zyngier authored Sep 18, 2020

ipi_teardown() is only used when CONFIG_HOTPLUG_CPU is enabled.
Move the function to a location guarded by this config option.
Signed-off-by: Marc Zyngier <maz@kernel.org>

ac15a54e

arm64: Fix -Wunused-function warning when !CONFIG_HOTPLUG_CPU · 9d9edb96

YueHaibing authored Sep 18, 2020

If CONFIG_HOTPLUG_CPU is n, gcc warns:

arch/arm64/kernel/smp.c:967:13: warning: ‘ipi_teardown’ defined but not used [-Wunused-function]
 static void ipi_teardown(int cpu)
             ^~~~~~~~~~~~

Use #ifdef guard this.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200918123318.23764-1-yuehaibing@huawei.com

9d9edb96

17 Sep, 2020 13 commits

irqchip/gic: Cleanup Franken-GIC handling · 8594c3b8

Marc Zyngier authored Sep 15, 2020

Introduce a static key identifying Samsung's unique creation, allowing
to replace the indirect call to compute the base addresses with
a simple test on the static key.

Faster, cheaper, negative diffstat.
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

8594c3b8

irqchip/bcm2836: Provide mask/unmask dummy methods for IPIs · c3330399

Marc Zyngier authored Sep 14, 2020

Although it doesn't seem possible to disable individual mailbox
interrupts, we still need to provide some callbacks.

Fixes: 09eb672ce4fb ("irqchip/bcm2836: Configure mailbox interrupts as standard interrupts")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

c3330399

ARM: Remove custom IRQ stat accounting · 5ebf353a

Marc Zyngier authored Jun 23, 2020

Let's switch the arm code to the core accounting, which already
does everything we need.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

5ebf353a

ARM: Kill __smp_cross_call and co · 8aa837cb

Marc Zyngier authored Jun 22, 2020

The old IPI registration interface is now unused on arm, so let's
get rid of it.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

8aa837cb

arm64: Remove custom IRQ stat accounting · a2638815

Marc Zyngier authored Jun 20, 2020

Let's switch the arm64 code to the core accounting, which already
does everything we need.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

a2638815

arm64: Kill __smp_cross_call and co · 5cebfd2d

Marc Zyngier authored May 09, 2020

The old IPI registration interface is now unused on arm64, so let's
get rid of it.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

5cebfd2d

irqchip/armada-370-xp: Configure IPIs as standard interrupts · f02147dd

Marc Zyngier authored Jun 22, 2020

To introduce IPIs as standard interrupts to the Armada 370-XP
driver, let's allocate a completely separate irqdomain and
irqchip combo that lives parallel to the "standard" one.

This effectively should be modelled as a chained interrupt
controller, but the code is in such a state that it is
pretty hard to shoehorn, as it would require the rewrite
of the MSI layer as well.
Signed-off-by: Marc Zyngier <maz@kernel.org>

f02147dd

irqchip/hip04: Configure IPIs as standard interrupts · a2df12c5

Marc Zyngier authored Jun 20, 2020

In order to switch the hip04 driver to provide standard interrupts
for IPIs, rework the way interrupts are allocated, making sure
the irqdomain covers the SGIs as well as the rest of the interrupt
range.

The driver is otherwise so old-school that it creates all interrupts
upfront (duh!), so there is hardly anything else to change, apart
from communicating the IPIs to the arch code.
Signed-off-by: Marc Zyngier <maz@kernel.org>

a2df12c5

irqchip/bcm2836: Configure mailbox interrupts as standard interrupts · 0809ae72

Marc Zyngier authored May 05, 2020

In order to switch the bcm2836 driver to privide standard interrupts
for IPIs, it first needs to stop lying about the way things work.

The mailbox interrupt is actually a multiplexer, with enough
bits to store 32 pending interrupts per CPU. So let's turn it
into a chained irqchip.

Once this is done, we can instanciate the corresponding IPIs,
and pass them to the architecture code.
Signed-off-by: Marc Zyngier <maz@kernel.org>

0809ae72

irqchip/gic-common: Don't enable SGIs by default · 3567c6ca

Marc Zyngier authored May 19, 2020

The architecture code now enables the IPIs as required, so no
need to enable SGIs by default in the GIC code.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

3567c6ca

irqchip/gic: Configure SGIs as standard interrupts · 64a267e9

Marc Zyngier authored Apr 25, 2020

Change the way we deal with GIC SGIs by turning them into proper
IRQs, and calling into the arch code to register the interrupt range
instead of a callback.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

64a267e9

irqchip/gic: Refactor SMP configuration · 7ec46b51

Marc Zyngier authored Apr 25, 2020

As we are about to change quite a lot of the SMP support code,
let's start by moving it around so that it minimizes the amount
of #ifdefery.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

7ec46b51

irqchip/gic-v3: Configure SGIs as standard interrupts · 64b499d8

Marc Zyngier authored Apr 25, 2020

Change the way we deal with GICv3 SGIs by turning them into proper
IRQs, and calling into the arch code to register the interrupt range
instead of a callback.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

64b499d8

13 Sep, 2020 5 commits

irqchip/gic-v3: Describe the SGI range · 70a29c32

Marc Zyngier authored Apr 25, 2020

As we are about to start making use of SGIs in a more conventional
way, let's describe it is the GICv3 list of interrupt types.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

70a29c32

ARM: Allow IPIs to be handled as normal interrupts · 56afcd3d

Marc Zyngier authored Jun 23, 2020

In order to deal with IPIs as normal interrupts, let's add
a new way to register them with the architecture code.

set_smp_ipi_range() takes a range of interrupts, and allows
the arch code to request them as if the were normal interrupts.
A standard handler is then called by the core IRQ code to deal
with the IPI.

This means that we don't need to call irq_enter/irq_exit, and
that we don't need to deal with set_irq_regs either. So let's
move the dispatcher into its own function, and leave handle_IPI()
as a compatibility function.

On the sending side, let's make use of ipi_send_mask, which
already exists for this purpose.

One of the major difference is that we end up, in some cases
(such as when performing IRQ time accounting on the scheduler
IPI), end up with nested irq_enter()/irq_exit() pairs.
Other than the (relatively small) overhead, there should be
no consequences to it (these pairs are designed to nest
correctly, and the accounting shouldn't be off).
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

56afcd3d

arm64: Allow IPIs to be handled as normal interrupts · d3afc7f1

Marc Zyngier authored Apr 25, 2020

In order to deal with IPIs as normal interrupts, let's add
a new way to register them with the architecture code.

set_smp_ipi_range() takes a range of interrupts, and allows
the arch code to request them as if the were normal interrupts.
A standard handler is then called by the core IRQ code to deal
with the IPI.

This means that we don't need to call irq_enter/irq_exit, and
that we don't need to deal with set_irq_regs either. So let's
move the dispatcher into its own function, and leave handle_IPI()
as a compatibility function.

On the sending side, let's make use of ipi_send_mask, which
already exists for this purpose.

One of the major difference is that we end up, in some cases
(such as when performing IRQ time accounting on the scheduler
IPI), end up with nested irq_enter()/irq_exit() pairs.
Other than the (relatively small) overhead, there should be
no consequences to it (these pairs are designed to nest
correctly, and the accounting shouldn't be off).
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

d3afc7f1

genirq: Allow interrupts to be excluded from /proc/interrupts · 83cfac95

Marc Zyngier authored May 19, 2020

A number of architectures implement IPI statistics directly,
duplicating the core kstat_irqs accounting. As we move IPIs to
being actual IRQs, we would end-up with a confusing display
in /proc/interrupts (where the IPIs would appear twice).

In order to solve this, allow interrupts to be flagged as
"hidden", which excludes them from /proc/interrupts.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

83cfac95

genirq: Add fasteoi IPI flow · c5e5ec03

Marc Zyngier authored May 19, 2020

For irqchips using the fasteoi flow, IPIs are a bit special.
They need to be EOI'd early (before calling the handler), as
funny things may happen in the handler (they do not necessarily
behave like a normal interrupt).
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

c5e5ec03

07 Sep, 2020 1 commit
- Linux 5.9-rc4 · f4d51dff
  Linus Torvalds authored Sep 06, 2020
  
  f4d51dff
06 Sep, 2020 4 commits

Merge tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block · a8205e31

Linus Torvalds authored Sep 06, 2020

Pull more io_uring fixes from Jens Axboe:
 "Two followup fixes. One is fixing a regression from this merge window,
  the other is two commits fixing cancelation of deferred requests.

  Both have gone through full testing, and both spawned a few new
  regression test additions to liburing.

   - Don't play games with const, properly store the output iovec and
     assign it as needed.

   - Deferred request cancelation fix (Pavel)"

* tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block:
  io_uring: fix linked deferred ->files cancellation
  io_uring: fix cancel of deferred reqs with ->files
  io_uring: fix explicit async read/write mapping for large segments

a8205e31

Merge tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 2ccdd9f8

Linus Torvalds authored Sep 06, 2020

Pull iommu fixes from Joerg Roedel:

 - three Intel VT-d fixes to fix address handling on 32bit, fix a NULL
   pointer dereference bug and serialize a hardware register access as
   required by the VT-d spec.

 - two patches for AMD IOMMU to force AMD GPUs into translation mode
   when memory encryption is active and disallow using IOMMUv2
   functionality.  This makes the AMDGPU driver work when memory
   encryption is active.

 - two more fixes for AMD IOMMU to fix updating the Interrupt Remapping
   Table Entries.

 - MAINTAINERS file update for the Qualcom IOMMU driver.

* tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Handle 36bit addressing for x86-32
  iommu/amd: Do not use IOMMUv2 functionality when SME is active
  iommu/amd: Do not force direct mapping when SME is active
  iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE
  iommu/amd: Restore IRTE.RemapEn bit after programming IRTE
  iommu/vt-d: Fix NULL pointer dereference in dev_iommu_priv_set()
  iommu/vt-d: Serialize IOMMU GCMD register modifications
  MAINTAINERS: Update QUALCOMM IOMMU after Arm SMMU drivers move

2ccdd9f8

Merge tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 015b3155

Linus Torvalds authored Sep 06, 2020

Pull x86 fixes from Ingo Molnar:

 - more generic entry code ABI fallout

 - debug register handling bugfixes

 - fix vmalloc mappings on 32-bit kernels

 - kprobes instrumentation output fix on 32-bit kernels

 - fix over-eager WARN_ON_ONCE() on !SMAP hardware

 - NUMA debugging fix

 - fix Clang related crash on !RETPOLINE kernels

* tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Unbreak 32bit fast syscall
  x86/debug: Allow a single level of #DB recursion
  x86/entry: Fix AC assertion
  tracing/kprobes, x86/ptrace: Fix regs argument order for i386
  x86, fakenuma: Fix invalid starting node ID
  x86/mm/32: Bring back vmalloc faulting on x86_32
  x86/cmdline: Disable jump tables for cmdline.c

015b3155

Merge tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 68beef57

Linus Torvalds authored Sep 06, 2020

Pull xen updates from Juergen Gross:
 "A small series for fixing a problem with Xen PVH guests when running
  as backends (e.g. as dom0).

  Mapping other guests' memory is now working via ZONE_DEVICE, thus not
  requiring to abuse the memory hotplug functionality for that purpose"

* tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: add helpers to allocate unpopulated memory
  memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC
  xen/balloon: add header guard

68beef57

05 Sep, 2020 14 commits

io_uring: fix linked deferred ->files cancellation · c127a2a1

Pavel Begunkov authored Sep 06, 2020

While looking for ->files in ->defer_list, consider that requests there
may actually be links.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c127a2a1

io_uring: fix cancel of deferred reqs with ->files · b7ddce3c

Pavel Begunkov authored Sep 06, 2020

While trying to cancel requests with ->files, it also should look for
requests in ->defer_list, otherwise it might end up hanging a thread.

Cancel all requests in ->defer_list up to the last request there with
matching ->files, that's needed to follow drain ordering semantics.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b7ddce3c

Merge tags 'auxdisplay-for-linus-v5.9-rc4', 'clang-format-for-linus-v5.9-rc4'... · dd9fb9bb

Linus Torvalds authored Sep 05, 2020

Merge tags 'auxdisplay-for-linus-v5.9-rc4', 'clang-format-for-linus-v5.9-rc4' and 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux

Pull misc fixes from Miguel Ojeda:
 "A trivial patch for auxdisplay:

   - Replace HTTP links with HTTPS ones (Alexander A. Klimov)

  The usual clang-format trivial update:

   - Update with the latest for_each macro list (Miguel Ojeda)

  And Luc requested me to pick a sparse fix on my queue, so here it goes
  along with other two trivial Compiler Attributes ones (also from Luc).

   - sparse: use static inline for __chk_{user,io}_ptr() (Luc Van
     Oostenryck)

   - Compiler Attributes: fix comment concerning GCC 4.6 (Luc Van
     Oostenryck)

   - Compiler Attributes: remove comment about sparse not supporting
     __has_attribute (Luc Van Oostenryck)"

* tag 'auxdisplay-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
  auxdisplay: Replace HTTP links with HTTPS ones

* tag 'clang-format-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
  clang-format: Update with the latest for_each macro list

* tag 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
  sparse: use static inline for __chk_{user,io}_ptr()
  Compiler Attributes: fix comment concerning GCC 4.6
  Compiler Attributes: remove comment about sparse not supporting __has_attribute

dd9fb9bb

Merge tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 70187f77

Linus Torvalds authored Sep 05, 2020

Pull ARC fixes from Vineet Gupta:

 - HSDK-4xd Dev system: perf driver updates for sampling interrupt

 - HSDK* Dev System: Ethernet broken [Evgeniy Didin]

 - HIGHMEM broken (2 memory banks) [Mike Rapoport]

 - show_regs() rewrite once and for all

 - Other minor fixes

* tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [plat-hsdk]: Switch ethernet phy-mode to rgmii-id
  arc: fix memory initialization for systems with two memory banks
  irqchip/eznps: Fix build error for !ARC700 builds
  ARC: show_regs: fix r12 printing and simplify
  ARC: HSDK: wireup perf irq
  ARC: perf: don't bail setup if pct irq missing in device-tree
  ARC: pgalloc.h: delete a duplicated word + other fixes

70187f77

Merge branch 'akpm' (patches from Andrew) · 7514c036

Linus Torvalds authored Sep 05, 2020

Merge misc fixes from Andrew Morton:
 "19 patches.

  Subsystems affected by this patch series: MAINTAINERS, ipc, fork,
  checkpatch, lib, and mm (memcg, slub, pagemap, madvise, migration,
  hugetlb)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  include/linux/log2.h: add missing () around n in roundup_pow_of_two()
  mm/khugepaged.c: fix khugepaged's request size in collapse_file
  mm/hugetlb: fix a race between hugetlb sysctl handlers
  mm/hugetlb: try preferred node first when alloc gigantic page from cma
  mm/migrate: preserve soft dirty in remove_migration_pte()
  mm/migrate: remove unnecessary is_zone_device_page() check
  mm/rmap: fixup copying of soft dirty and uffd ptes
  mm/migrate: fixup setting UFFD_WP flag
  mm: madvise: fix vma user-after-free
  checkpatch: fix the usage of capture group ( ... )
  fork: adjust sysctl_max_threads definition to match prototype
  ipc: adjust proc_ipc_sem_dointvec definition to match prototype
  mm: track page table modifications in __apply_to_page_range()
  MAINTAINERS: IA64: mark Status as Odd Fixes only
  MAINTAINERS: add LLVM maintainers
  MAINTAINERS: update Cavium/Marvell entries
  mm: slub: fix conversion of freelist_corrupted()
  mm: memcg: fix memcg reclaim soft lockup
  memcg: fix use-after-free in uncharge_batch

7514c036

include/linux/log2.h: add missing () around n in roundup_pow_of_two() · 428fc0af

Jason Gunthorpe authored Sep 04, 2020

Otherwise gcc generates warnings if the expression is complicated.

Fixes: 312a0c17 ("[PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a constant")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/0-v1-8a2697e3c003+41165-log_brackets_jgg@nvidia.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>

428fc0af

mm/khugepaged.c: fix khugepaged's request size in collapse_file · e5a59d30

David Howells authored Sep 04, 2020

collapse_file() in khugepaged passes PAGE_SIZE as the number of pages to
be read to page_cache_sync_readahead().  The intent was probably to read
a single page.  Fix it to use the number of pages to the end of the
window instead.

Fixes: 99cb0dbd ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Eric Biggers <ebiggers@google.com>
Link: https://lkml.kernel.org/r/20200903140844.14194-2-willy@infradead.orgSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e5a59d30

mm/hugetlb: fix a race between hugetlb sysctl handlers · 17743798

Muchun Song authored Sep 04, 2020

There is a race between the assignment of `table->data` and write value
to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
the other thread.

  CPU0:                                 CPU1:
                                        proc_sys_write
  hugetlb_sysctl_handler                  proc_sys_call_handler
  hugetlb_sysctl_handler_common             hugetlb_sysctl_handler
    table->data = &tmp;                       hugetlb_sysctl_handler_common
                                                table->data = &tmp;
      proc_doulongvec_minmax
        do_proc_doulongvec_minmax           sysctl_head_finish
          __do_proc_doulongvec_minmax         unuse_table
            i = table->data;
            *i = val;  // corrupt CPU1's stack

Fix this by duplicating the `table`, and only update the duplicate of
it.  And introduce a helper of proc_hugetlb_doulongvec_minmax() to
simplify the code.

The following oops was seen:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor instruction fetch in kernel mode
    #PF: error_code(0x0010) - not-present page
    Code: Bad RIP value.
    ...
    Call Trace:
     ? set_max_huge_pages+0x3da/0x4f0
     ? alloc_pool_huge_page+0x150/0x150
     ? proc_doulongvec_minmax+0x46/0x60
     ? hugetlb_sysctl_handler_common+0x1c7/0x200
     ? nr_hugepages_store+0x20/0x20
     ? copy_fd_bitmaps+0x170/0x170
     ? hugetlb_sysctl_handler+0x1e/0x20
     ? proc_sys_call_handler+0x2f1/0x300
     ? unregister_sysctl_table+0xb0/0xb0
     ? __fd_install+0x78/0x100
     ? proc_sys_write+0x14/0x20
     ? __vfs_write+0x4d/0x90
     ? vfs_write+0xef/0x240
     ? ksys_write+0xc0/0x160
     ? __ia32_sys_read+0x50/0x50
     ? __close_fd+0x129/0x150
     ? __x64_sys_write+0x43/0x50
     ? do_syscall_64+0x6c/0x200
     ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: e5ff2159 ("hugetlb: multiple hstates for multiple page sizes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20200828031146.43035-1-songmuchun@bytedance.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>

17743798

mm/hugetlb: try preferred node first when alloc gigantic page from cma · 953f064a

Li Xinhai authored Sep 04, 2020

Since commit cf11e85f ("mm: hugetlb: optionally allocate gigantic
hugepages using cma"), the gigantic page would be allocated from node
which is not the preferred node, although there are pages available from
that node. The reason is that the nid parameter has been ignored in
alloc_gigantic_page().

Besides, the __GFP_THISNODE also need be checked if user required to
alloc only from the preferred node.

After this patch, the preferred node is tried first before other allowed
nodes, and don't try to allocate from other nodes if __GFP_THISNODE is
specified. If user don't specify the preferred node, the current node
will be used as preferred node, which makes sure consistent behavior of
allocating gigantic and non-gigantic hugetlb page.

Fixes: cf11e85f ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Link: https://lkml.kernel.org/r/20200902025016.697260-1-lixinhai.lxh@gmail.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>

953f064a

mm/migrate: preserve soft dirty in remove_migration_pte() · 3d321bf8

Ralph Campbell authored Sep 04, 2020

The code to remove a migration PTE and replace it with a device private
PTE was not copying the soft dirty bit from the migration entry.  This
could lead to page contents not being marked dirty when faulting the page
back from device private memory.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Link: https://lkml.kernel.org/r/20200831212222.22409-3-rcampbell@nvidia.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3d321bf8

mm/migrate: remove unnecessary is_zone_device_page() check · 6128763f

Ralph Campbell authored Sep 04, 2020

Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()".

I happened to notice this from code inspection after seeing Alistair
Popple's patch ("mm/rmap: Fixup copying of soft dirty and uffd ptes").

This patch (of 2):

The check for is_zone_device_page() and is_device_private_page() is
unnecessary since the latter is sufficient to determine if the page is a
device private page.  Simplify the code for easier reading.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Link: https://lkml.kernel.org/r/20200831212222.22409-1-rcampbell@nvidia.com
Link: https://lkml.kernel.org/r/20200831212222.22409-2-rcampbell@nvidia.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6128763f

mm/rmap: fixup copying of soft dirty and uffd ptes · ad7df764

Alistair Popple authored Sep 04, 2020

During memory migration a pte is temporarily replaced with a migration
swap pte.  Some pte bits from the existing mapping such as the soft-dirty
and uffd write-protect bits are preserved by copying these to the
temporary migration swap pte.

However these bits are not stored at the same location for swap and
non-swap ptes.  Therefore testing these bits requires using the
appropriate helper function for the given pte type.

Unfortunately several code locations were found where the wrong helper
function is being used to test soft_dirty and uffd_wp bits which leads to
them getting incorrectly set or cleared during page-migration.

Fix these by using the correct tests based on pte type.

Fixes: a5430dda ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Fixes: 8c3328f1 ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Fixes: f45ec5ff ("userfaultfd: wp: support swap and page migration")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200825064232.10023-2-alistair@popple.id.auSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ad7df764

mm/migrate: fixup setting UFFD_WP flag · ebdf8321

Alistair Popple authored Sep 04, 2020

Commit f45ec5ff ("userfaultfd: wp: support swap and page migration")
introduced support for tracking the uffd wp bit during page migration.
However the non-swap PTE variant was used to set the flag for zone device
private pages which are a type of swap page.

This leads to corruption of the swap offset if the original PTE has the
uffd_wp flag set.

Fixes: f45ec5ff ("userfaultfd: wp: support swap and page migration")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Link: https://lkml.kernel.org/r/20200825064232.10023-1-alistair@popple.id.auSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ebdf8321

mm: madvise: fix vma user-after-free · 7867fd7c

Yang Shi authored Sep 04, 2020

The syzbot reported the below use-after-free:

  BUG: KASAN: use-after-free in madvise_willneed mm/madvise.c:293 [inline]
  BUG: KASAN: use-after-free in madvise_vma mm/madvise.c:942 [inline]
  BUG: KASAN: use-after-free in do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
  Read of size 8 at addr ffff8880a6163eb0 by task syz-executor.0/9996

  CPU: 0 PID: 9996 Comm: syz-executor.0 Not tainted 5.9.0-rc1-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x18f/0x20d lib/dump_stack.c:118
    print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
    __kasan_report mm/kasan/report.c:513 [inline]
    kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
    madvise_willneed mm/madvise.c:293 [inline]
    madvise_vma mm/madvise.c:942 [inline]
    do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
    do_madvise mm/madvise.c:1169 [inline]
    __do_sys_madvise mm/madvise.c:1171 [inline]
    __se_sys_madvise mm/madvise.c:1169 [inline]
    __x64_sys_madvise+0xd9/0x110 mm/madvise.c:1169
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Allocated by task 9992:
    kmem_cache_alloc+0x138/0x3a0 mm/slab.c:3482
    vm_area_alloc+0x1c/0x110 kernel/fork.c:347
    mmap_region+0x8e5/0x1780 mm/mmap.c:1743
    do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
    vm_mmap_pgoff+0x195/0x200 mm/util.c:506
    ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Freed by task 9992:
    kmem_cache_free.part.0+0x67/0x1f0 mm/slab.c:3693
    remove_vma+0x132/0x170 mm/mmap.c:184
    remove_vma_list mm/mmap.c:2613 [inline]
    __do_munmap+0x743/0x1170 mm/mmap.c:2869
    do_munmap mm/mmap.c:2877 [inline]
    mmap_region+0x257/0x1780 mm/mmap.c:1716
    do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
    vm_mmap_pgoff+0x195/0x200 mm/util.c:506
    ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

It is because vma is accessed after releasing mmap_lock, but someone
else acquired the mmap_lock and the vma is gone.

Releasing mmap_lock after accessing vma should fix the problem.

Fixes: 692fe624 ("mm: Handle MADV_WILLNEED through vfs_fadvise()")
Reported-by: syzbot+b90df26038d1d5d85c97@syzkaller.appspotmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>	[5.4+]
Link: https://lkml.kernel.org/r/20200816141204.162624-1-shy828301@gmail.comSigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7867fd7c