1. 04 Feb, 2020 26 commits
    • arm: mm: add p?d_leaf() definitions · 8a0af66b
      Steven Price authored
      walk_page_range() is going to be allowed to walk page tables other than
      those of user space.  For this it needs to know when it has reached a
      'leaf' entry in the page tables.  This information is provided by the
      p?d_leaf() functions/macros.
      
      For arm pmd_large() already exists and does what we want.  So simply
      provide the generic pmd_leaf() name.
      
      Link: http://lkml.kernel.org/r/20191218162402.45610-4-steven.price@arm.com
      Signed-off-by: Steven Price <steven.price@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Liang, Kan" <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zong Li <zong.li@sifive.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arc: mm: add p?d_leaf() definitions · 4f6b2c08
      Steven Price authored
      walk_page_range() is going to be allowed to walk page tables other than
      those of user space.  For this it needs to know when it has reached a
      'leaf' entry in the page tables.  This information will be provided by the
      p?d_leaf() functions/macros.
      
      For arc, we only have two levels, so only pmd_leaf() is needed.
      
      Link: http://lkml.kernel.org/r/20191218162402.45610-3-steven.price@arm.com
      Signed-off-by: Steven Price <steven.price@arm.com>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Liang, Kan" <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zong Li <zong.li@sifive.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add generic p?d_leaf() macros · 93fab1b2
      Steven Price authored
      Patch series "Generic page walk and ptdump", v17.
      
      Many architectures currently have a debugfs file for dumping the kernel page
      tables.  Currently each architecture has to implement custom functions for
      this because the details of walking the page tables used by the kernel are
      different between architectures.
      
      This series extends the capabilities of walk_page_range() so that it can
      deal with the page tables of the kernel (which have no VMAs and can
      contain larger huge pages than exist for user space).  A generic PTDUMP
      implementation is then implemented making use of the new functionality of
      walk_page_range() and finally arm64 and x86 are switched to using it,
      removing the custom table walkers.
      
      To enable a generic page table walker to walk the unusual mappings of the
      kernel we need to implement a set of functions which let us know when the
      walker has reached the leaf entry.  After a suggestion from Will Deacon
      I've chosen the name p?d_leaf() as this (hopefully) describes the purpose
      (and is a new name so has no historic baggage).  Some architectures have
      p?d_large macros but this is easily confused with "large pages".
      
      This series ends with a generic PTDUMP implementation for arm64 and x86.
      
      Mostly this is a clean up and there should be very little functional
      change.  The exceptions are:
      
      * arm64 PTDUMP debugfs now displays pages which aren't present (patch 22).
      
      * arm64 has the ability to efficiently process KASAN pages (which
        previously only x86 implemented).  This means that the combination of
        KASAN and DEBUG_WX is now useable.
      
      This patch (of 23):
      
      Exposing the pud/pgd levels of the page tables to walk_page_range() means
      we may come across the exotic large mappings that come with large areas of
      contiguous memory (such as the kernel's linear map).
      
      For architectures that don't provide all p?d_leaf() macros, provide
      generic do-nothing defaults that are suitable where there cannot be leaf
      pages at that level.  Further patches will add implementations for
      individual architectures.
      
      The name p?d_leaf() is chosen to minimize the confusion with existing uses
      of "large" pages and "huge" pages which do not necessarily mean that the
      entry is a leaf (for example it may be a set of contiguous entries that
      only take 1 TLB slot).  For the purpose of walking the page tables we
      don't need to know how it will be represented in the TLB, but we do need
      to know for sure if it is a leaf of the tree.
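
      As a rough illustration only, the fallback scheme described above can be
      sketched in userspace C.  The entry types, the TOY_LEAF_BIT flag and the
      toy_* helpers are assumptions made for this sketch and are not the
      kernel's definitions; only the p?d_leaf() names come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy entry types; the real kernel types are architecture-specific. */
typedef struct { uint64_t val; } pmd_t;
typedef struct { uint64_t val; } pud_t;

/* Hypothetical "this entry maps memory directly" bit (an assumption). */
#define TOY_LEAF_BIT 0x1ULL

/* A toy "architecture" that supports leaf entries at the PMD level only. */
#define pmd_leaf(pmd) (((pmd).val & TOY_LEAF_BIT) != 0)

/*
 * Generic fallbacks: when an architecture does not define a macro,
 * no leaf entry can exist at that level, so the answer is always 0.
 */
#ifndef pud_leaf
#define pud_leaf(pud) ((void)(pud), 0)
#endif
#ifndef pmd_leaf
#define pmd_leaf(pmd) ((void)(pmd), 0)
#endif

/* A walker descends to the next level only when the entry is not a leaf. */
static inline bool toy_should_descend_pmd(pmd_t pmd)
{
    return !pmd_leaf(pmd);
}

static inline bool toy_should_descend_pud(pud_t pud)
{
    return !pud_leaf(pud);
}
```

      With these definitions a PMD entry with the toy leaf bit set stops the
      walk, while every PUD entry is treated as a table because the toy
      architecture provides no pud_leaf() of its own.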
      
      Link: http://lkml.kernel.org/r/20191218162402.45610-2-steven.price@arm.com
      Signed-off-by: Steven Price <steven.price@arm.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Liang, Kan" <kan.liang@linux.intel.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Zong Li <zong.li@sifive.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove __krealloc · 1c948715
      Florian Westphal authored
      Since 5.5-rc1 the last user of this function is gone, so remove the
      functionality.
      
      See commit
      2ad9d774 ("netfilter: conntrack: free extension area immediately")
      for details.
      
      Link: http://lkml.kernel.org/r/20191212223442.22141-1-fw@strlen.de
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pinctrl: fix pxa2xx.c build warnings · 9a8c8b43
      Randy Dunlap authored
      Add #include of <linux/pinctrl/machine.h> to fix build
      warnings in pinctrl-pxa2xx.c.  Fixes these warnings:
      
      In file included from ../drivers/pinctrl/pxa/pinctrl-pxa2xx.c:24:0:
      ../drivers/pinctrl/pxa/../pinctrl-utils.h:36:8: warning: `enum pinctrl_map_type' declared inside parameter list [enabled by default]
         enum pinctrl_map_type type);
              ^
      ../drivers/pinctrl/pxa/../pinctrl-utils.h:36:8: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
      
      Link: http://lkml.kernel.org/r/0024542e-cba9-8f13-6c18-32d0050a6007@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/block/null_blk_main.c: fix uninitialized var warnings · 046755a2
      Andrew Morton authored
      With gcc-7.2, many instances of
      
      drivers/block/null_blk_main.c: In function ‘nullb_device_zone_nr_conv_store’:
      drivers/block/null_blk_main.c:291:12: warning: ‘new_value’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        dev->NAME = new_value;      \
                  ^
      drivers/block/null_blk_main.c:279:7: note: ‘new_value’ was declared here
        TYPE new_value;       \
             ^
      
      Presumably notabug, so use uninitialized_var() to suppress them.
      
      Cc: Shaohua Li <shli@fb.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/block/null_blk_main.c: fix layout · ca0a95a6
      Andrew Morton authored
      Each line here overflows 80 cols by exactly one character.  Delete one tab
      per line to fix.
      
      Cc: Shaohua Li <shli@fb.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/msg.c: consolidate all xxxctl_down() functions · 889b3317
      Lu Shuaibing authored
      There is a use of uninitialized memory in msgctl_down() because msqid64 in
      ksys_msgctl() hasn't been initialized.  The local msqid64 is created in
      ksys_msgctl() and then passed into msgctl_down().  Along the way msqid64
      is never initialized before msgctl_down() checks msqid64->msg_qbytes.

      KUMSAN (KernelUninitializedMemorySanitizer, a new error detection tool)
      reports:
      
      ==================================================================
      BUG: KUMSAN: use of uninitialized memory in msgctl_down+0x94/0x300
      Read of size 8 at addr ffff88806bb97eb8 by task syz-executor707/2022
      
      CPU: 0 PID: 2022 Comm: syz-executor707 Not tainted 5.2.0-rc4+ #63
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      Call Trace:
       dump_stack+0x75/0xae
       __kumsan_report+0x17c/0x3e6
       kumsan_report+0xe/0x20
       msgctl_down+0x94/0x300
       ksys_msgctl.constprop.14+0xef/0x260
       do_syscall_64+0x7e/0x1f0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x4400e9
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffd869e0598 EFLAGS: 00000246 ORIG_RAX: 0000000000000047
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004400e9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
      R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000401970
      R13: 0000000000401a00 R14: 0000000000000000 R15: 0000000000000000
      
      The buggy address belongs to the page:
      page:ffffea0001aee5c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
      flags: 0x100000000000000()
      raw: 0100000000000000 0000000000000000 ffffffff01ae0101 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kumsan: bad access detected
      ==================================================================
      
      Syzkaller reproducer:
      msgctl$IPC_RMID(0x0, 0x0)
      
      C reproducer:
      // autogenerated by syzkaller (https://github.com/google/syzkaller)
      
      int main(void)
      {
        syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
        syscall(__NR_msgctl, 0, 0, 0);
        return 0;
      }
      
      [natechancellor@gmail.com: adjust indentation in ksys_msgctl]
        Link: https://github.com/ClangBuiltLinux/linux/issues/829
        Link: http://lkml.kernel.org/r/20191218032932.37479-1-natechancellor@gmail.com
      Link: http://lkml.kernel.org/r/20190613014044.24234-1-shuaibinglu@126.com
      Signed-off-by: Lu Shuaibing <shuaibinglu@126.com>
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: NeilBrown <neilb@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: document and update memory barriers · 8116b54e
      Manfred Spraul authored
      Document and update the memory barriers in ipc/sem.c:
      
      - Add smp_store_release() to wake_up_sem_queue_prepare() and
        document why it is needed.
      
      - Read q->status using READ_ONCE+smp_acquire__after_ctrl_dep()
        as the pair for the barrier inside wake_up_sem_queue_prepare().
      
      - Add comments to all barriers, and mention the rules in the block
        regarding locking.
      
      - Switch to using wake_q_add_safe().
      
      Link: http://lkml.kernel.org/r/20191020123305.14715-6-manfred@colorfullife.com
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: <1vier1@web.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/msg.c: update and document memory barriers · 0d97a82b
      Manfred Spraul authored
      Transfer findings from ipc/mqueue.c:
      
      - A control barrier was missing for the lockless receive case.  So in
        theory, not yet initialized data may have been copied to user space -
        obviously only for architectures where control barriers are not NOP.

      - Use smp_store_release().  In theory, the refcount may have been
        decreased to 0 already when wake_q_add() tries to get a reference.
      
      Link: http://lkml.kernel.org/r/20191020123305.14715-5-manfred@colorfullife.com
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: <1vier1@web.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/mqueue.c: update/document memory barriers · c5b2cbdb
      Manfred Spraul authored
      Update and document memory barriers for mqueue.c:
      
      - ewp->state is read without any locks, thus READ_ONCE is required.
      
      - add smp_acquire__after_ctrl_dep() after the READ_ONCE, we need
        acquire semantics if the value is STATE_READY.
      
      - use wake_q_add_safe()
      
      - document why __set_current_state() may be used:
        Reading task->state cannot happen before the wake_q_add() call,
        which happens while holding info->lock. Thus the spin_unlock()
        is the RELEASE, and the spin_lock() is the ACQUIRE.
      
      For completeness: there is also a 3 CPU scenario, if the to be woken
      up task is already on another wake_q.
      Then:
      - CPU1: spin_unlock() of the task that goes to sleep is the RELEASE
      - CPU2: the spin_lock() of the waker is the ACQUIRE
      - CPU2: smp_mb__before_atomic inside wake_q_add() is the RELEASE
      - CPU3: smp_mb__after_spinlock() inside try_to_wake_up() is the ACQUIRE
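
      The READ_ONCE + acquire-after-control-dependency pairing described above
      can be approximated in portable C11, purely as an analogy.  The kernel
      primitives are not used here; struct toy_waiter and the toy_* helpers
      are hypothetical names, and the C11 fence is only a stand-in for
      smp_acquire__after_ctrl_dep().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { TOY_STATE_NONE = 0, TOY_STATE_READY = 1 };

struct toy_waiter {
    int payload;        /* plain data written by the waker */
    atomic_int state;   /* flag that the sleeper reads without a lock */
};

/* Waker side: write the data, then publish the flag with release semantics. */
static inline void toy_publish(struct toy_waiter *w, int value)
{
    w->payload = value;
    atomic_store_explicit(&w->state, TOY_STATE_READY, memory_order_release);
}

/*
 * Sleeper side: a relaxed load (roughly READ_ONCE), followed by an acquire
 * fence only on the path where READY was observed.  Returns 1 and fills
 * *out when the payload may safely be read, 0 otherwise.
 */
static inline int toy_try_consume(struct toy_waiter *w, int *out)
{
    assert(w != NULL && out != NULL);
    if (atomic_load_explicit(&w->state, memory_order_relaxed) != TOY_STATE_READY)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = w->payload;
    return 1;
}
```

      The fence is placed after the branch on purpose: acquire ordering is
      only needed when the flag was seen as ready, which mirrors the reasoning
      given for STATE_READY above.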
      
      Link: http://lkml.kernel.org/r/20191020123305.14715-4-manfred@colorfullife.com
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: <1vier1@web.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/mqueue.c: remove duplicated code · ed29f171
      Davidlohr Bueso authored
      pipelined_send() and pipelined_receive() are identical, so merge them.
      
      [manfred@colorfullife.com: add changelog]
      Link: http://lkml.kernel.org/r/20191020123305.14715-3-manfred@colorfullife.com
      Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: <1vier1@web.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • smp_mb__{before,after}_atomic(): update Documentation · 39323c64
      Manfred Spraul authored
      When adding the _{acquire|release|relaxed}() variants of some atomic
      operations, it was forgotten to update Documentation/memory-barriers.txt:
      
      smp_mb__{before,after}_atomic() is now intended for all RMW operations
      that do not imply a memory barrier.
      
      1)
      	smp_mb__before_atomic();
      	atomic_add();
      
      2)
      	smp_mb__before_atomic();
      	atomic_xchg_relaxed();
      
      3)
      	smp_mb__before_atomic();
      	atomic_fetch_add_relaxed();
      
      Invalid would be:
      	smp_mb__before_atomic();
      	atomic_set();
      
      In addition, the patch splits the long sentence into multiple shorter
      sentences.
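
      For readers more familiar with userspace atomics, the valid pattern
      above has a rough C11 counterpart: a full fence followed by a relaxed
      read-modify-write.  This is an analogy under the assumption that a
      seq_cst fence is an acceptable stand-in for smp_mb__before_atomic(); the
      toy_* names are hypothetical and nothing here is kernel API.

```c
#include <assert.h>
#include <stdatomic.h>

/* Rough C11 stand-in for smp_mb__before_atomic(): a full fence. */
static inline void toy_mb_before_atomic(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/*
 * A relaxed RMW implies no ordering by itself; the preceding full fence
 * orders it against earlier memory accesses.  Returns the old value.
 */
static inline int toy_ordered_fetch_add(atomic_int *v, int delta)
{
    toy_mb_before_atomic();
    return atomic_fetch_add_explicit(v, delta, memory_order_relaxed);
}
```

      The invalid case in the text corresponds to putting the fence helper in
      front of a plain store, which is not a read-modify-write operation.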
      
      Link: http://lkml.kernel.org/r/20191020123305.14715-2-manfred@colorfullife.com
      Fixes: 654672d4 ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations")
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: Waiman Long <longman@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <1vier1@web.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: drop valid_start/valid_end from test_pages_in_a_zone() · 92917998
      David Hildenbrand authored
      The callers are only interested in the actual zone, they don't care about
      boundaries.  Return the zone instead to simplify.
      
      Link: http://lkml.kernel.org/r/20200110183308.11849-1-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: cleanup __remove_pages() · 52fb87c8
      David Hildenbrand authored
      Let's drop the basically unused section stuff and simplify.
      
      Also, let's use a shorter variant to calculate the number of pages to
      the next section boundary.
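
      One compact way to compute "pages up to the next section boundary" is
      sketched below.  The section size is an assumed value chosen for the
      example and the toy_* helpers are hypothetical; this is not a copy of
      the patched function.

```c
#include <assert.h>

/* Assumed for illustration; the real section size depends on arch and config. */
#define TOY_PAGES_PER_SECTION 32768UL
#define TOY_SECTION_ALIGN_UP(pfn) \
    (((pfn) + TOY_PAGES_PER_SECTION - 1) & ~(TOY_PAGES_PER_SECTION - 1))

static inline unsigned long toy_min(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/* Pages to handle in one step: up to the next section boundary or end_pfn. */
static inline unsigned long toy_pages_this_step(unsigned long pfn,
                                                unsigned long end_pfn)
{
    assert(pfn < end_pfn);
    /* Aligning pfn + 1 also gives a full section when pfn is already aligned. */
    return toy_min(end_pfn - pfn, TOY_SECTION_ALIGN_UP(pfn + 1) - pfn);
}

/* Number of per-section steps needed for [start_pfn, start_pfn + nr_pages). */
static inline unsigned long toy_count_steps(unsigned long start_pfn,
                                            unsigned long nr_pages)
{
    unsigned long end_pfn = start_pfn + nr_pages;
    unsigned long steps = 0;

    for (unsigned long pfn = start_pfn; pfn < end_pfn;
         pfn += toy_pages_this_step(pfn, end_pfn))
        steps++;
    return steps;
}
```

      A range that starts 8 pages before a boundary and covers 16 pages is
      handled in two steps of 8 pages each, one on each side of the boundary.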
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-11-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: drop local variables in shrink_zone_span() · 5d12071c
      David Hildenbrand authored
      Get rid of the unnecessary local variables.
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-10-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: don't check for "all holes" in shrink_zone_span() · 950b68d9
      David Hildenbrand authored
      If we have holes, the holes will automatically get detected and removed
      once we remove the next bigger/smaller section.  The extra checks can go.
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-9-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: we always have a zone in find_(smallest|biggest)_section_pfn · 9b05158f
      David Hildenbrand authored
      With shrink_pgdat_span() out of the way, we now always have a valid zone.
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-8-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone() · d33695b1
      David Hildenbrand authored
      Let's poison the pages similar to when adding new memory in
      sparse_add_section().  Also call remove_pfn_range_from_zone() from
      memunmap_pages(), so we can poison the memmap from there as well.
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-7-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memmap_init: update variable name in memmap_init_zone · 1f8d75c1
      Aneesh Kumar K.V authored
      Patch series "mm/memory_hotplug: Shrink zones before removing memory", v6.
      
      This series fixes the access of uninitialized memmaps when shrinking
      zones/nodes and when removing memory.  Also, it contains all fixes for
      crashes that can be triggered when removing certain namespaces using
      memunmap_pages() - ZONE_DEVICE, reported by Aneesh.
      
      We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be
      more involved (we don't have SECTION_IS_ONLINE as an indicator), and
      shrinking is only of limited use (set_zone_contiguous() cannot detect the
      ZONE_DEVICE as contiguous).
      
      We continue shrinking !ZONE_DEVICE zones; however, I reduced the amount of
      code to a minimum.  Shrinking is especially necessary to keep
      zone->contiguous set where possible, especially, on memory unplug of DIMMs
      at zone boundaries.
      
      --------------------------------------------------------------------------
      
      Zones are now properly shrunk when offlining memory blocks or when
      onlining failed.  This allows zones to be properly shrunk on memory unplug
      even if the separate memory blocks of a DIMM were onlined to different
      zones or re-onlined to a different zone after offlining.
      
      Example:
      
      :/# cat /proc/zoneinfo
      Node 1, zone  Movable
              spanned  0
              present  0
              managed  0
      :/# echo "online_movable" > /sys/devices/system/memory/memory41/state
      :/# echo "online_movable" > /sys/devices/system/memory/memory43/state
      :/# cat /proc/zoneinfo
      Node 1, zone  Movable
              spanned  98304
              present  65536
              managed  65536
      :/# echo 0 > /sys/devices/system/memory/memory43/online
      :/# cat /proc/zoneinfo
      Node 1, zone  Movable
              spanned  32768
              present  32768
              managed  32768
      :/# echo 0 > /sys/devices/system/memory/memory41/online
      :/# cat /proc/zoneinfo
      Node 1, zone  Movable
              spanned  0
              present  0
              managed  0
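
      The numbers in the example follow from simple block arithmetic, which
      the sketch below reproduces.  It assumes 32768 pages per memory block
      (inferred from 65536 present pages for two online blocks) and uses a
      hypothetical per-block online array; it models the counters only, not
      the kernel's zone code.

```c
#include <assert.h>
#include <stddef.h>

/* Pages per memory block, inferred from the example output above. */
#define TOY_PAGES_PER_BLOCK 32768UL

struct toy_zone_stats {
    unsigned long spanned;  /* first online block through last, holes included */
    unsigned long present;  /* pages in online blocks only */
};

/* online[i] != 0 means memory block i is currently online in this zone. */
static inline struct toy_zone_stats toy_compute_stats(const unsigned char *online,
                                                      size_t nr_blocks)
{
    struct toy_zone_stats s = { 0, 0 };
    size_t first = nr_blocks, last = 0, count = 0;

    for (size_t i = 0; i < nr_blocks; i++) {
        if (!online[i])
            continue;
        if (first == nr_blocks)
            first = i;
        last = i;
        count++;
    }
    if (count) {
        s.spanned = (last - first + 1) * TOY_PAGES_PER_BLOCK;
        s.present = count * TOY_PAGES_PER_BLOCK;
    }
    return s;
}
```

      With blocks 41 and 43 online the span covers three blocks (98304 pages)
      while only two are present (65536 pages); offlining block 43 shrinks
      both values to 32768, and offlining block 41 as well leaves the zone
      empty.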
      
      This patch (of 6):
      
      The third argument is actually the number of pages.  Change the variable name
      from size to nr_pages to indicate this better.
      
      No functional change in this patch.
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-3-david@redhat.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f8d75c1
    • David Hildenbrand's avatar
      mm: factor out next_present_section_nr() · 4c605881
      David Hildenbrand authored
      Let's move it to the header and use the shorter variant from
      mm/page_alloc.c (the original one will also check
      "__highest_present_section_nr + 1", which is not necessary).  While at
      it, make the section_nr in next_pfn() const.
      
      In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once we
      exceed __highest_present_section_nr, which doesn't make a difference in
      the caller as it is big enough (>= all sane end_pfn).
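      
      A minimal user-space sketch of the shorter variant described above
      (not the kernel source): the section table, its size, and the
      lower-case variable names are stand-ins made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SECTIONS 16UL  /* hypothetical table size for this sketch */

static bool section_present[NR_SECTIONS];
static unsigned long highest_present_section_nr;

/*
 * Return the next present section after section_nr, or (unsigned long)-1
 * once the highest present section has been passed.  Passing -1 starts
 * the search at section 0, because the unsigned increment wraps to 0.
 */
static unsigned long next_present_section_nr(unsigned long section_nr)
{
    assert(highest_present_section_nr < NR_SECTIONS);

    while (++section_nr <= highest_present_section_nr) {
        if (section_present[section_nr])
            return section_nr;
    }
    return (unsigned long)-1;
}
```

      The loop bound alone stops the search, so no separate check against
      "highest + 1" is needed.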
      
      Link: http://lkml.kernel.org/r/20200113144035.10848-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "Jin, Zhi" <zhi.jin@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4c605881
    • David Hildenbrand's avatar
      mm/page_alloc: fix and rework pfn handling in memmap_init_zone() · 948c436e
      David Hildenbrand authored
      Let's update the pfn manually whenever we continue the loop.  This makes
      the code easier to read but also less error prone (and we can directly fix
      one issue).
      
      When overlap_memmap_init() returns true, pfn is updated to
      "memblock_region_memory_end_pfn(r)".  So it already points at the *next*
      pfn to process.  Incrementing the pfn another time is wrong: we might
      leave one uninitialized.  I spotted this by inspecting the code, so I have
      no idea if this is relevant in practice (with kernelcore=mirror).
      
      Link: http://lkml.kernel.org/r/20200113144035.10848-2-david@redhat.com
      Fixes: a9a9e77f ("mm: move mirrored memory specific code outside of memmap_init_zone")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: "Jin, Zhi" <zhi.jin@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      948c436e
    • David Hildenbrand's avatar
      mm/page_alloc.c: initialize memmap of unavailable memory directly · 4b094b78
      David Hildenbrand authored
      Let's make sure that all memory holes are actually marked PageReserved(),
      that page_to_pfn() produces reliable results, and that these pages are not
      detected as "mmap" pages due to the mapcount.
      
      E.g., booting an x86-64 QEMU guest with 4160 MB:
      
      [    0.010585] Early memory node ranges
      [    0.010586]   node   0: [mem 0x0000000000001000-0x000000000009efff]
      [    0.010588]   node   0: [mem 0x0000000000100000-0x00000000bffdefff]
      [    0.010589]   node   0: [mem 0x0000000100000000-0x0000000143ffffff]
      
      max_pfn is 0x144000.
      
      Before this change:
      
      [root@localhost ~]# ./page-types -r -a 0x144000,
                   flags      page-count       MB  symbolic-flags                     long-symbolic-flags
      0x0000000000000800           16384       64  ___________M_______________________________        mmap
                   total           16384       64
      
      After this change:
      
      [root@localhost ~]# ./page-types -r -a 0x144000,
                   flags      page-count       MB  symbolic-flags                     long-symbolic-flags
      0x0000000100000000           16384       64  ___________________________r_______________        reserved
                   total           16384       64
      
      IOW, especially the unavailable physical memory ("memory hole") in the
      last section would not get properly marked PageReserved() and is indicated
      to be "mmap" memory.
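      
      The 16384 pages (64 MB) reported by page-types follow from rounding
      max_pfn up to the next section boundary.  A small sketch of that
      arithmetic, assuming 4 KiB pages and 128 MiB sections (the usual
      x86-64 values); the helper names are made up for illustration:

```c
#include <assert.h>

/* Assumptions: 4 KiB pages and 128 MiB memory sections. */
#define PAGE_SHIFT        12UL
#define SECTION_SIZE_BITS 27UL
#define PAGES_PER_SECTION (1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))

/* Round a pfn up to the first pfn of the next section boundary. */
static unsigned long section_end_pfn(unsigned long pfn)
{
    return (pfn + PAGES_PER_SECTION - 1) & ~(PAGES_PER_SECTION - 1);
}

/* Number of pfns between max_pfn and the end of its section. */
static unsigned long hole_pages_after(unsigned long max_pfn)
{
    unsigned long end = section_end_pfn(max_pfn);

    assert(end >= max_pfn);  /* guard against wrap-around near ULONG_MAX */
    return end - max_pfn;
}
```

      For max_pfn = 0x144000 the section ends at 0x148000, leaving 0x4000 =
      16384 pfns, i.e. 64 MB of memmap that describes no available memory.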
      
      Drop the trace of that function from include/linux/mm.h - nobody else
      needs it, and rename it accordingly.
      
      Note: The fake zone/node might not be covered by the zone/node span.  This
      is not an urgent issue (for now, we had the same node/zone due to the
      zeroing).  We'll need a clean way to mark memory holes (e.g., using a page
      type PageHole() if possible or a fake ZONE_INVALID) and eventually stop
      marking these memory holes PageReserved().
      
      Link: http://lkml.kernel.org/r/20191211163201.17179-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b094b78
    • David Hildenbrand's avatar
      fs/proc/page.c: allow inspection of last section and fix end detection · abec749f
      David Hildenbrand authored
      If max_pfn does not fall onto a section boundary, it is possible to
      inspect PFNs up to max_pfn, and PFNs above max_pfn; however, max_pfn
      itself can't be inspected.  We can have a valid (and online) memmap at and
      above max_pfn if max_pfn is not aligned to a section boundary.  The whole
      early section has a memmap and is marked online.  Being able to inspect
      the state of these PFNs is valuable for debugging, especially because
      max_pfn can change on memory hotplug and expose these memmaps.
      
      Also, querying page flags via "./page-types -r -a 0x144001,"
      (tools/vm/page-types.c) inside an x86-64 guest with 4160MB under QEMU
      results in an (almost) endless loop in user space, because the end is not
      detected properly when starting after max_pfn.
      
      Instead, let's allow inspecting all pages in the highest section and
      return 0 directly if we try to access pages above that section.
      
      While at it, check the count before adjusting it, to avoid masking user
      errors.
      
      Link: http://lkml.kernel.org/r/20191211163201.17179-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      abec749f
    • David Hildenbrand's avatar
      mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section · e822969c
      David Hildenbrand authored
      Patch series "mm: fix max_pfn not falling on section boundary", v2.
      
      Playing with different memory sizes for an x86-64 guest, I discovered that
      some memmaps (highest section if max_mem does not fall on the section
      boundary) are marked as being valid and online, but contain garbage.  We
      have to properly initialize these memmaps.
      
      Looking at /proc/kpageflags and friends, I found some more issues,
      partially related to this.
      
      This patch (of 3):
      
      If max_pfn is not aligned to a section boundary, we can easily run into
      BUGs.  This can, e.g., be triggered on x86-64 under QEMU by specifying a
      memory size that is not a multiple of 128MB (e.g., 4097MB, but also
      4160MB).  I was told that on real HW, we can easily have this scenario
      (esp., one of the main reasons sub-section hotadd of devmem was added).
      
      The issue is that we have a valid memmap (pfn_valid()) for the whole
      section, and the whole section will be marked "online".
      pfn_to_online_page() will succeed, but the memmap contains garbage.
      
      E.g., doing a "./page-types -r -a 0x144001" when QEMU was started with "-m
      4160M" - (see tools/vm/page-types.c):
      
      [  200.476376] BUG: unable to handle page fault for address: fffffffffffffffe
      [  200.477500] #PF: supervisor read access in kernel mode
      [  200.478334] #PF: error_code(0x0000) - not-present page
      [  200.479076] PGD 59614067 P4D 59614067 PUD 59616067 PMD 0
      [  200.479557] Oops: 0000 [#4] SMP NOPTI
      [  200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G      D W         5.5.0-rc1-next-20191209 #93
      [  200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
      [  200.481648] RIP: 0010:stable_page_flags+0x4d/0x410
      [  200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f
      [  200.483644] RSP: 0018:ffffb139401cbe60 EFLAGS: 00010202
      [  200.484091] RAX: fffffffffffffffe RBX: fffffbeec5100040 RCX: 0000000000000000
      [  200.484697] RDX: 0000000000000001 RSI: ffffffff9535c7cd RDI: 0000000000000246
      [  200.485313] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000
      [  200.485917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000144001
      [  200.486523] R13: 00007ffd6ba55f48 R14: 00007ffd6ba55f40 R15: ffffb139401cbf08
      [  200.487130] FS:  00007f68df717580(0000) GS:ffff9ec77fa00000(0000) knlGS:0000000000000000
      [  200.487804] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  200.488295] CR2: fffffffffffffffe CR3: 0000000135d48000 CR4: 00000000000006f0
      [  200.488897] Call Trace:
      [  200.489115]  kpageflags_read+0xe9/0x140
      [  200.489447]  proc_reg_read+0x3c/0x60
      [  200.489755]  vfs_read+0xc2/0x170
      [  200.490037]  ksys_pread64+0x65/0xa0
      [  200.490352]  do_syscall_64+0x5c/0xa0
      [  200.490665]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      But it can be triggered much more easily via "cat /proc/kpageflags > /dev/null"
      after cold/hot plugging a DIMM to such a system:
      
      [root@localhost ~]# cat /proc/kpageflags > /dev/null
      [  111.517275] BUG: unable to handle page fault for address: fffffffffffffffe
      [  111.517907] #PF: supervisor read access in kernel mode
      [  111.518333] #PF: error_code(0x0000) - not-present page
      [  111.518771] PGD a240e067 P4D a240e067 PUD a2410067 PMD 0
      
      This patch fixes that by at least zero-ing out that memmap (so e.g.,
      page_to_pfn() will not crash).  Commit 907ec5fc ("mm: zero remaining
      unavailable struct pages") tried to fix a similar issue, but forgot to
      consider this special case.
      
      After this patch, there are still problems to solve.  E.g., not all of
      these pages falling into a memory hole will actually get initialized later
      and set PageReserved - they are only zeroed out - but at least the
      immediate crashes are gone.  A follow-up patch will take care of this.
      
      Link: http://lkml.kernel.org/r/20191211163201.17179-2-david@redhat.com
      Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Tested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: <stable@vger.kernel.org>	[4.15+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e822969c
    • Gang He's avatar
      ocfs2: fix oops when writing cloned file · 2d797e9f
      Gang He authored
      Writing a cloned file triggers a kernel oops and the user-space command
      process is also killed by the system.  The bug can be reproduced stably
      via:
      
      1) create a file under ocfs2 file system directory.
      
        journalctl -b > aa.txt
      
      2) create a cloned file for this file.
      
        reflink aa.txt bb.txt
      
      3) write the cloned file with dd command.
      
        dd if=/dev/zero of=bb.txt bs=512 count=1 conv=notrunc
      
      The dd command is killed by the kernel, then you can see the oops message
      via dmesg command.
      
      [  463.875404] BUG: kernel NULL pointer dereference, address: 0000000000000028
      [  463.875413] #PF: supervisor read access in kernel mode
      [  463.875416] #PF: error_code(0x0000) - not-present page
      [  463.875418] PGD 0 P4D 0
      [  463.875425] Oops: 0000 [#1] SMP PTI
      [  463.875431] CPU: 1 PID: 2291 Comm: dd Tainted: G           OE     5.3.16-2-default
      [  463.875433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [  463.875500] RIP: 0010:ocfs2_refcount_cow+0xa4/0x5d0 [ocfs2]
      [  463.875505] Code: 06 89 6c 24 38 89 eb f6 44 24 3c 02 74 be 49 8b 47 28
      [  463.875508] RSP: 0018:ffffa2cb409dfce8 EFLAGS: 00010202
      [  463.875512] RAX: ffff8b1ebdca8000 RBX: 0000000000000001 RCX: ffff8b1eb73a9df0
      [  463.875515] RDX: 0000000000056a01 RSI: 0000000000000000 RDI: 0000000000000000
      [  463.875517] RBP: 0000000000000001 R08: ffff8b1eb73a9de0 R09: 0000000000000000
      [  463.875520] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [  463.875522] R13: ffff8b1eb922f048 R14: 0000000000000000 R15: ffff8b1eb922f048
      [  463.875526] FS:  00007f8f44d15540(0000) GS:ffff8b1ebeb00000(0000) knlGS:0000000000000000
      [  463.875529] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  463.875532] CR2: 0000000000000028 CR3: 000000003c17a000 CR4: 00000000000006e0
      [  463.875546] Call Trace:
      [  463.875596]  ? ocfs2_inode_lock_full_nested+0x18b/0x960 [ocfs2]
      [  463.875648]  ocfs2_file_write_iter+0xaf8/0xc70 [ocfs2]
      [  463.875672]  new_sync_write+0x12d/0x1d0
      [  463.875688]  vfs_write+0xad/0x1a0
      [  463.875697]  ksys_write+0xa1/0xe0
      [  463.875710]  do_syscall_64+0x60/0x1f0
      [  463.875743]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  463.875758] RIP: 0033:0x7f8f4482ed44
      [  463.875762] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00
      [  463.875765] RSP: 002b:00007fff300a79d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  463.875769] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8f4482ed44
      [  463.875771] RDX: 0000000000000200 RSI: 000055f771b5c000 RDI: 0000000000000001
      [  463.875774] RBP: 0000000000000200 R08: 00007f8f44af9c78 R09: 0000000000000003
      [  463.875776] R10: 000000000000089f R11: 0000000000000246 R12: 000055f771b5c000
      [  463.875779] R13: 0000000000000200 R14: 0000000000000000 R15: 000055f771b5c000
      
      This regression problem was introduced by commit e74540b2 ("ocfs2:
      protect extent tree in ocfs2_prepare_inode_for_write()").
      
      Link: http://lkml.kernel.org/r/20200121050153.13290-1-ghe@suse.com
      Fixes: e74540b2 ("ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()").
      Signed-off-by: Gang He <ghe@suse.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2d797e9f
  2. 03 Feb, 2020 8 commits
    • Masahiro Yamada's avatar
      initramfs: do not show compression mode choice if INITRAMFS_SOURCE is empty · d4e9056d
      Masahiro Yamada authored
      Since commit ddd09bcc ("initramfs: make compression options not
      depend on INITRAMFS_SOURCE"), Kconfig asks the compression mode for
      the built-in initramfs regardless of INITRAMFS_SOURCE.
      
      It is technically simpler, but pointless from a UI perspective,
      Linus says [1].
      
      When INITRAMFS_SOURCE is empty, usr/Makefile creates a tiny default
      cpio, which is so small that nobody cares about the compression.
      
      This commit hides the Kconfig choice in that case. The default cpio
      is embedded without compression, which was the original behavior.
      
      [1]: https://lkml.org/lkml/2020/2/1/160
      
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d4e9056d
    • Linus Torvalds's avatar
      Merge tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · ad801428
      Linus Torvalds authored
      Pull more btrfs updates from David Sterba:
       "Fixes that arrived after the merge window freeze, mostly stable
        material.
      
         - fix race in tree-mod-log element tracking
      
         - fix bio flushing inside extent writepages
      
         - fix assertion when in-memory tracking of discarded extents finds an
           empty tree (eg. after adding a new device)
      
         - update logic of temporary read-only block groups to take into
           account overcommit
      
         - fix some fixup worker corner cases:
             - page could not go through proper COW cycle and the dirty status
               is lost due to page migration
             - deadlock if delayed allocation is performed under page lock
      
         - fix send emitting invalid clones within the same file
      
         - fix statfs reporting 0 free space when global block reserve size is
           larger than remaining free space but there is still space for new
           chunks"
      
      * tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: do not zero f_bavail if we have available space
        Btrfs: send, fix emission of invalid clone operations within the same file
        btrfs: do not do delalloc reservation under page lock
        btrfs: drop the -EBUSY case in __extent_writepage_io
        Btrfs: keep pages dirty when using btrfs_writepage_fixup_worker
        btrfs: take overcommit into account in inc_block_group_ro
        btrfs: fix force usage in inc_block_group_ro
        btrfs: Correctly handle empty trees in find_first_clear_extent_bit
        btrfs: flush write bio if we loop in extent_write_cache_pages
        Btrfs: fix race between adding and putting tree mod seq elements and nodes
      ad801428
    • Linus Torvalds's avatar
      Merge tag 'kgdb-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux · e17ac02b
      Linus Torvalds authored
      Pull kgdb updates from Daniel Thompson:
       "Everything for kgdb this time around is either simplifications or
        clean ups.
      
        In particular Douglas Anderson's modifications to the backtrace
        machine in the *last* dev cycle have enabled Doug to tidy up some MIPS
        specific backtrace code and stop sharing certain data structures
        across the kernel. Note that the MIPS folks were on Cc: for the MIPS
        patch and reacted positively (but without an explicit Acked-by).
      
        Doug also got rid of the implicit switching between tasks and register
        sets during some but not all of kdb's backtrace actions (because the
        implicit switching was either confusing for users, pointless or both).
      
        Finally there is a coverity fix and patch to replace open coded
        console traversal with the proper helper function"
      
      * tag 'kgdb-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
        kdb: Use for_each_console() helper
        kdb: remove redundant assignment to pointer bp
        kdb: Get rid of confusing diag msg from "rd" if current task has no regs
        kdb: Gid rid of implicit setting of the current task / regs
        kdb: kdb_current_task shouldn't be exported
        kdb: kdb_current_regs should be private
        MIPS: kdb: Remove old workaround for backtracing on other CPUs
      e17ac02b
    • Linus Torvalds's avatar
      Merge tag 'char-misc-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 754beeec
      Linus Torvalds authored
      Pull char/misc fix from Greg KH:
       "Here is a single patch, that fixes up a commit that came in the
        previous char/misc merge.
      
        It fixes a bug in the hpet driver that everyone keeps tripping over in
        their automated testing. Good thing is, people are catching it. Bad
        thing it wasn't caught by anyone testing before this. Oh well...
      
        This has been in linux-next for a few days with no reported issues"
      
      * tag 'char-misc-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        char: hpet: Fix out-of-bounds read bug
      754beeec
    • Linus Torvalds's avatar
      Merge tag 'backlight-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight · 2367da5b
      Linus Torvalds authored
      Pull backlight updates from Lee Jones:
       "Fix-ups:
         - Remove superfluous code in ams369fg06
         - Convert over to GPIO descriptor (gpiod) in bd6107
      
        Bug Fixes:
         - Fix unsigned comparison to less than zero in qcom-wled"
      
      * tag 'backlight-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
        backlight: qcom-wled: Fix unsigned comparison to zero
        backlight: bd6107: Convert to use GPIO descriptor
        backlight: ams369fg06: Drop GPIO include
      2367da5b
    • Linus Torvalds's avatar
      Merge tag 'mfd-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · af32f3a4
      Linus Torvalds authored
      Pull MFD updates from Lee Jones:
       "New Drivers:
         - Add support for ROHM BD71828 PMICs and GPIOs
         - Add support for Qualcomm Aqstic Audio Codecs WCD9340 and WCD9341
      
        New Device Support:
         - Add support for BD71828 to BD70528 RTC driver
         - Add support for Intel's Jasper Lake to LPSS PCI
      
        New Functionality:
         - Add support for Power Key to ROHM BD71828
         - Add support for Clocks to ROHM BD71828
         - Add support for GPIOs to Dialog DA9062
         - Add support for USB PD Notify to ChromiumOS EC
         - Allow callers to specify args when requesting regmap lookup; syscon
      
        Fix-ups:
         - Improve error handling and sanity checking; atmel-hlcdc, dln2
         - Device Tree support/documentation; bd71828, da9062, xylon,logicvc,
           ab8500, max14577, atmel-usart
         - Match devices using platform IDs; bd7xxxx
         - Refactor BD718x7 regulator component; bd718x7-regulator
         - Use standard interfaces/helpers; syscon, sm501
         - Trivial (whitespace, spelling, etc); ab8500-core, Kconfig
         - Remove unused code; db8500-prcmu, tqmx86
         - Wait until boot has finished before accessing registers;
           madera-core
         - Provide missing register value defaults; cs47l15-tables
         - Allow more time for hardware to reset; madera-core
      
        Bug Fixes:
         - Fix erroneous register values; rohm-bd70528
         - Fix register volatility; axp20x, rn5t618
         - Fix Kconfig dependencies; MFD_MAX77650
         - Fix incorrect compatible string; da9062-core
         - Fix syscon_regmap_lookup_by_phandle_args() stub; syscon"
      
      * tag 'mfd-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (41 commits)
        mfd: syscon: Fix syscon_regmap_lookup_by_phandle_args() dummy
        mfd: wcd934x: Add support to wcd9340/wcd9341 codec
        mfd: syscon: Add arguments support for syscon reference
        mfd: rn5t618: Mark ADC control register volatile
        dt-bindings: atmel-usart: Add microchip,sam9x60-{usart, dbgu}
        dt-bindings: atmel-usart: Remove wildcard
        mfd: cros_ec: Add cros-usbpd-notify subdevice
        mfd: da9062: Fix watchdog compatible string
        mfd: madera: Allow more time for hardware reset
        mfd: cs47l15: Add missing register default
        mfd: madera: Wait for boot done before accessing any other registers
        mfd: Kconfig: Rename Samsung to lowercase
        mfd: tqmx86: remove set but not used variable 'i2c_ien'
        mfd: dbx500-prcmu: Drop DSI pll clock functions
        mfd: dbx500-prcmu: Drop set_display_clocks()
        mfd: max77650: Select REGMAP_IRQ in Kconfig
        mfd: axp20x: Mark AXP20X_VBUS_IPSOUT_MGMT as volatile
        mfd: ab8500: Fix ab8500-clk typo
        mfd: intel-lpss: Add Intel Jasper Lake PCI IDs
        dt-bindings: mfd: max14577: Add reference to max14040_battery.txt descriptions
        ...
      af32f3a4
    • Linus Torvalds's avatar
      Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux · d0fa9250
      Linus Torvalds authored
      Pull Hyper-V updates from Sasha Levin:
      
       - Most of the commits here are work to enable host-initiated
         hibernation support by Dexuan Cui.
      
       - Fix for a warning shown when host sends non-aligned balloon requests
         by Tianyu Lan.
      
      * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        hv_utils: Add the support of hibernation
        hv_utils: Support host-initiated hibernation request
        hv_utils: Support host-initiated restart request
        Tools: hv: Reopen the devices if read() or write() returns errors
        video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.
        Drivers: hv: vmbus: Ignore CHANNELMSG_TL_CONNECT_RESULT(23)
        video: hyperv_fb: Fix hibernation for the deferred IO feature
        Input: hyperv-keyboard: Add the support of hibernation
        hv_balloon: Balloon up according to request page number
      d0fa9250
    • Geert Uytterhoeven's avatar
      mfd: syscon: Fix syscon_regmap_lookup_by_phandle_args() dummy · 5312f321
      Geert Uytterhoeven authored
      If CONFIG_MFD_SYSCON=n:
      
          include/linux/mfd/syscon.h:54:23: warning: ‘syscon_regmap_lookup_by_phandle_args’ defined but not used [-Wunused-function]
      
      Fix this by adding the missing inline keyword.
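      
      The pattern behind the fix, sketched outside the kernel: a stub that
      lives in a header should be "static inline", because a plain "static"
      definition that a translation unit never calls triggers
      -Wunused-function.  All names below are hypothetical, and returning
      NULL is a simplification of whatever error value the real stub
      reports.

```c
#include <assert.h>
#include <stddef.h>

struct regmap;       /* opaque types, only used through pointers */
struct device_node;

/* Leave CONFIG_MFD_SYSCON_SKETCH undefined to build the stub variant. */
#ifdef CONFIG_MFD_SYSCON_SKETCH
struct regmap *syscon_lookup_sketch(struct device_node *np,
                                    const char *property);
#else
/*
 * "static inline" rather than "static": every file that includes the
 * header gets its own copy, and unused copies do not produce a warning.
 */
static inline struct regmap *syscon_lookup_sketch(struct device_node *np,
                                                  const char *property)
{
    (void)np;
    (void)property;
    return NULL;  /* simplified "not available" result for this sketch */
}
#endif

/* Callers compile unchanged whether or not the feature is enabled. */
static inline int probe_sketch(struct device_node *np)
{
    struct regmap *map = syscon_lookup_sketch(np, "syscon");

    return map ? 0 : -1;
}
```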
      
      Fixes: 6a24f567 ("mfd: syscon: Add arguments support for syscon reference")
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      5312f321
  3. 02 Feb, 2020 5 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 46d6b7be
      Linus Torvalds authored
      Pull sparc fix from David Miller:
       "adjtimex regression fix from Arnd"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: fix adjtimex regression
      46d6b7be
    • Linus Torvalds's avatar
      Merge tag 'leds-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds · 545ae665
      Linus Torvalds authored
      Pull LED updates from Pavel Machek:
      
       - New driver for TI TPS6105X
      
       - Add managed API to get a LED from a device driver
      
       - Misc fixes and updates
      
      * tag 'leds-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds: (22 commits)
        leds: lm3692x: Disable chip on brightness 0
        leds: lm3692x: Split out lm3692x_leds_disable
        leds: lm3692x: Move lm3692x_init and rename to lm3692x_leds_enable
        leds: lm3692x: Make sure we don't exceed the maximum LED current
        dt: bindings: lm3692x: Add led-max-microamp property
        leds: lm3692x: Allow to configure over voltage protection
        dt: bindings: lm3692x: Add ti,ovp-microvolt property
        leds: populate the device's of_node
        leds: Add managed API to get a LED from a device driver
        leds: Add of_led_get() and led_put()
        leds: lm3532: add pointer to documentation and fix typo
        leds: lm3532: use extended registration so that LED can be used for backlight
        leds: lm3642: remove warnings for bad strtol, cleanup gotos
        leds: rb532: cleanup whitespace
        ledtrig-pattern: fix email address quoting in MODULE_AUTHOR()
        dt-bindings: mfd: update TI tps6105x chip bindings
        leds: tps6105x: add driver for MFD chip LED mode
        led: max77650: add of_match table
        leds: bd2802: Convert to use GPIO descriptors
        leds: pca963x: Fix open-drain initialization
        ...
      545ae665
    • Linus Torvalds's avatar
      Merge branch 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux · 15f8e733
      Linus Torvalds authored
      Pull pcmcia updates from Dominik Brodowski:
       "This is a series co-developed by Simon Geis and Lukas Panzer to clean
        up the i82092 PCMCIA device driver"
      
      * 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
        PCMCIA/i82092: remove #if 0 block
        PCMCIA/i82092: delete enter/leave macro
        PCMCIA/i82092: include <linux/io.h> instead of <asm/io.h>
        PCMCIA/i82092: shorten the lines with over 80 characters
        PCMCIA/i82092: move assignment out of if condition
        PCMCIA/i82092: change code indentation
        PCMCIA/i82092: insert blank line after declarations
        PCMCIA/i82092: remove braces around single statement blocks
        PCMCIA/i82092: add/remove spaces to improve readability
        PCMCIA/i82092: use dev_<level> instead of printk
      15f8e733
    • Josef Bacik's avatar
      btrfs: do not zero f_bavail if we have available space · d55966c4
      Josef Bacik authored
      Some logic was added a while ago to clear out f_bavail in statfs()
      if we did not have enough free metadata space to satisfy our global
      reserve.  This was incorrect at the time, but it didn't really pose a
      problem for normal file systems because we would often allocate chunks
      if we got this low on free metadata space, and thus wouldn't really hit
      this case unless we were actually full.
      
      Fast forward to today and now we are much better about not allocating
      metadata chunks all of the time.  Couple this with d792b0f1 ("btrfs:
      always reserve our entire size for the global reserve"), which means
      we'll easily have a larger global reserve than our free space, and we
      are now more likely to trip over this while still having plenty of space.
      
      Fix this by skipping this logic if the global rsv's space_info is not
      full.  space_info->full is 0 unless we've attempted to allocate a chunk
      for that space_info and that has failed.  If this happens then the space
      for the global reserve is definitely sacred and we need to report
      b_avail == 0, but before then we can just use our calculated b_avail.
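
      The decision described above can be sketched as a small user-space
      model.  This is only an illustration, not the kernel code: the struct,
      the function name reported_bavail() and its parameters are hypothetical,
      and the real check in btrfs_statfs() involves further conditions that
      are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's space_info. */
struct space_info_model {
	/* Stays false until a chunk allocation for this space_info has failed. */
	bool full;
};

/*
 * Hypothetical helper: decide which "available" value statfs() reports.
 *
 * Zero is reported only when the metadata space_info is marked full AND
 * the free metadata space cannot cover the global reserve.  In every
 * other case the already calculated value is passed through unchanged.
 */
static uint64_t reported_bavail(uint64_t calculated_bavail,
				uint64_t free_metadata,
				uint64_t global_reserve_size,
				const struct space_info_model *meta)
{
	if (meta->full && free_metadata < global_reserve_size)
		return 0;

	return calculated_bavail;
}
```

      With full == false the calculated value is always returned, even when
      free metadata is below the reserve size, which is the case the fix is
      meant to cover.
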
      Reported-by: Martin Steigerwald <martin@lichtvoll.de>
      Fixes: ca8a51b3 ("btrfs: statfs: report zero available if metadata are exhausted")
      CC: stable@vger.kernel.org # 4.5+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Tested-By: Martin Steigerwald <martin@lichtvoll.de>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d55966c4
    • Arnd Bergmann's avatar
      sparc64: fix adjtimex regression · 11648b83
      Arnd Bergmann authored
      Anatoly Pugachev reported that one of the y2038 patches introduced
      a fatal bug through a stupid typo:
      
      [   96.384129] watchdog: BUG: soft lockup - CPU#8 stuck for 22s!
      ...
      [   96.385624]  [0000000000652ca4] handle_mm_fault+0x84/0x320
      [   96.385668]  [0000000000b6f2bc] do_sparc64_fault+0x43c/0x820
      [   96.385720]  [0000000000407754] sparc64_realfault_common+0x10/0x20
      [   96.385769]  [000000000042fa28] __do_sys_sparc_clock_adjtime+0x28/0x80
      [   96.385819]  [00000000004307f0] sys_sparc_clock_adjtime+0x10/0x20
      [   96.385866]  [0000000000406294] linux_sparc_syscall+0x34/0x44
      
      Fix the code to dereference the correct pointer again.
      Reported-by: Anatoly Pugachev <matorola@gmail.com>
      Tested-by: Anatoly Pugachev <matorola@gmail.com>
      Fixes: 251ec1c1 ("y2038: sparc: remove use of struct timex")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      11648b83
  4. 01 Feb, 2020 1 commit