1. 18 Jul, 2022 40 commits
    • mm/mmap: build protect protection_map[] with ARCH_HAS_VM_GET_PAGE_PROT · 09095f74
      Anshuman Khandual authored
      protection_map[] has now been moved inside those platforms that enable
      ARCH_HAS_VM_GET_PAGE_PROT.  Hence the generic protection_map[] array can
      now be build protected with CONFIG_ARCH_HAS_VM_GET_PAGE_PROT instead of
      __P000.
      
      Link: https://lkml.kernel.org/r/20220711070600.2378316-8-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Brian Cain <bcain@quicinc.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: WANG Xuerui <kernel@xen0n.name>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • x86/mm: move protection_map[] inside the platform · 4867fbbd
      Anshuman Khandual authored
      This moves protection_map[] inside the platform and makes it static.
      This also defines a helper function, add_encrypt_protection_map(), that
      can update the protection_map[] array with pgprot_encrypted().
      
      Link: https://lkml.kernel.org/r/20220711070600.2378316-7-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Brian Cain <bcain@quicinc.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: WANG Xuerui <kernel@xen0n.name>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
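
      The helper described in this commit can be pictured with a small
      user-space model.  This is only a sketch: model_pgprot_t,
      MODEL_ENC_BIT and every model_* name are hypothetical stand-ins, and
      the real pgprot_encrypted() is architecture-specific rather than a
      fixed bit as assumed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's pgprot_t. */
typedef struct { unsigned long pgprot; } model_pgprot_t;

#define MODEL_ENC_BIT	(1UL << 5)	/* assumed bit, for illustration only */
#define MODEL_MAP_SIZE	16

/* Private, file-local table, mirroring "inside the platform and static". */
static model_pgprot_t model_protection_map[MODEL_MAP_SIZE];

/* Stand-in for pgprot_encrypted(): set the assumed encryption bit. */
static model_pgprot_t model_pgprot_encrypted(model_pgprot_t prot)
{
	prot.pgprot |= MODEL_ENC_BIT;
	return prot;
}

/* Count how many table entries carry the assumed encryption bit. */
int model_count_encrypted(void)
{
	int n = 0;

	for (size_t i = 0; i < MODEL_MAP_SIZE; i++)
		if (model_protection_map[i].pgprot & MODEL_ENC_BIT)
			n++;
	return n;
}

/* Rewrite every entry in place, as the commit message describes. */
void model_add_encrypt_protection_map(void)
{
	for (size_t i = 0; i < MODEL_MAP_SIZE; i++)
		model_protection_map[i] =
			model_pgprot_encrypted(model_protection_map[i]);
	assert(model_count_encrypted() == MODEL_MAP_SIZE);
}
```

      Because the table is file-local, callers outside the file can only
      change it through the helper, which is the point of keeping the array
      private.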
    • arm64/mm: move protection_map[] inside the platform · 42251045
      Anshuman Khandual authored
      This moves protection_map[] inside the platform and makes it static.
      
      Link: https://lkml.kernel.org/r/20220711070600.2378316-6-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Brian Cain <bcain@quicinc.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: WANG Xuerui <kernel@xen0n.name>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • sparc/mm: move protection_map[] inside the platform · 25740d31
      Anshuman Khandual authored
      This moves protection_map[] inside the platform and, while here, also
      enables ARCH_HAS_VM_GET_PAGE_PROT on 32-bit platforms via
      DECLARE_VM_GET_PAGE_PROT.
      
      Link: https://lkml.kernel.org/r/20220711070600.2378316-5-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Brian Cain <bcain@quicinc.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: WANG Xuerui <kernel@xen0n.name>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • powerpc/mm: move protection_map[] inside the platform · 6eac1eaf
      Anshuman Khandual authored
      This moves protection_map[] inside the platform and, while here, also
      enables ARCH_HAS_VM_GET_PAGE_PROT on 32-bit and nohash 64 (aka book3e/64)
      platforms via DECLARE_VM_GET_PAGE_PROT.
      
      Link: https://lkml.kernel.org/r/20220711070600.2378316-4-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Brian Cain <bcain@quicinc.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: WANG Xuerui <kernel@xen0n.name>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/mmap: define DECLARE_VM_GET_PAGE_PROT · 43957b5d
      Anshuman Khandual authored
      This just converts the generic vm_get_page_prot() implementation into a
      new macro, i.e. DECLARE_VM_GET_PAGE_PROT, which can later be used across
      platforms when enabling them with ARCH_HAS_VM_GET_PAGE_PROT.  This does
      not create any functional change.
      
      Link: https://lkml.kernel.org/r/20220711070600.2378316-3-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Brian Cain <bcain@quicinc.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: WANG Xuerui <kernel@xen0n.name>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
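
      The idea of turning a generic table lookup into a reusable macro can
      be sketched in user space as follows.  All model_* and MODEL_* names
      are hypothetical, the flag values and table contents are assumptions
      made for illustration, and the kernel version additionally exports
      the symbol, which has no user-space equivalent here.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's pgprot_t. */
typedef struct { unsigned long pgprot; } model_pgprot_t;

/* Assumed values for the four permission bits used as the table index. */
#define MODEL_VM_READ	0x1UL
#define MODEL_VM_WRITE	0x2UL
#define MODEL_VM_EXEC	0x4UL
#define MODEL_VM_SHARED	0x8UL

/*
 * Expands to a complete lookup function.  The file that uses the macro
 * must already define a table named model_protection_map.
 */
#define MODEL_DECLARE_VM_GET_PAGE_PROT					\
model_pgprot_t model_vm_get_page_prot(unsigned long vm_flags)		\
{									\
	return model_protection_map[vm_flags &				\
		(MODEL_VM_READ | MODEL_VM_WRITE |			\
		 MODEL_VM_EXEC | MODEL_VM_SHARED)];			\
}

/* A "platform" supplies its private table (made-up values)... */
static const model_pgprot_t model_protection_map[16] = {
	[MODEL_VM_READ]					= { 0x10UL },
	[MODEL_VM_READ | MODEL_VM_WRITE]		= { 0x20UL },
	[MODEL_VM_SHARED | MODEL_VM_READ | MODEL_VM_WRITE] = { 0x30UL },
};

/* ...and then instantiates the generic lookup with one line. */
MODEL_DECLARE_VM_GET_PAGE_PROT

void model_vm_get_page_prot_demo(void)
{
	/* Bits outside the four permission flags are masked off first. */
	assert(model_vm_get_page_prot(MODEL_VM_READ | 0x100UL).pgprot == 0x10UL);
	assert(model_vm_get_page_prot(MODEL_VM_READ | MODEL_VM_WRITE).pgprot == 0x20UL);
}
```

      A platform that needs a custom lookup would simply write its own
      function instead of invoking the macro.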
    • mm/mmap: build protect protection_map[] with __P000 · 84053271
      Anshuman Khandual authored
      Patch series "mm/mmap: Drop __SXXX/__PXXX macros from across platforms",
      v7.
      
      The __SXXX/__PXXX macros are an unnecessary abstraction layer in creating
      the generic protection_map[] array, which is used by vm_get_page_prot().
      This abstraction layer can be avoided if the platforms just define the
      protection_map[] array for all possible vm_flags access permission
      combinations and also export a vm_get_page_prot() implementation.

      This series drops the __SXXX/__PXXX macros from across platforms in the
      tree.  First, it build protects the generic protection_map[] array with
      '#ifdef __P000' and moves it inside the platforms which enable
      ARCH_HAS_VM_GET_PAGE_PROT.  Later, it build protects the same array with
      '#ifdef ARCH_HAS_VM_GET_PAGE_PROT' and moves it inside the remaining
      platforms while enabling ARCH_HAS_VM_GET_PAGE_PROT on them.  It adds a
      new macro, DECLARE_VM_GET_PAGE_PROT, defining the current generic
      vm_get_page_prot(), so that it can be reused on platforms that do not
      require a custom implementation.  Finally, ARCH_HAS_VM_GET_PAGE_PROT can
      just be dropped, as all platforms now define and export
      vm_get_page_prot() by looking up a private and static protection_map[]
      array.  The protection_map[] data type has been changed to 'static const'
      on all platforms that do not change it during boot.
      
      
      This patch (of 26):
      
      Build protect the generic protection_map[] array with __P000, so that it
      can be moved inside all the platforms one after the other.  Otherwise
      there will be build failures during this process.
      CONFIG_ARCH_HAS_VM_GET_PAGE_PROT cannot be used for this purpose as only
      certain platforms enable this config now.
      
      Link: https://lkml.kernel.org/r/20220711070600.2378316-1-anshuman.khandual@arm.com
      Link: https://lkml.kernel.org/r/20220711070600.2378316-2-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Brian Cain <bcain@quicinc.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: WANG Xuerui <kernel@xen0n.name>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: nommu: pass a pointer to virt_to_page() · 9330723c
      Linus Walleij authored
      Functions that work on a pointer to virtual memory such as virt_to_pfn()
      and users of that function such as virt_to_page() are supposed to be
      passed a pointer to virtual memory, ideally a (void *) or other pointer.
      However, since many architectures implement virt_to_pfn() as a macro, this
      function becomes polymorphic and accepts both an (unsigned long) and a
      (void *).
      
      If we instead implement a proper virt_to_pfn(void *addr) function the
      following happens (occurred on arch/arm):
      
        mm/nommu.c: In function 'free_page_series':
        mm/nommu.c:501:50: warning: passing argument 1 of 'virt_to_pfn'
        makes pointer from integer without a cast [-Wint-conversion]
        struct page *page = virt_to_page(from);
      
      Fix this with an explicit cast.
      
      Link: https://lkml.kernel.org/r/20220630084124.691207-6-linus.walleij@linaro.org
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Marco Elver <elver@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
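
      The difference between the macro and function forms discussed in this
      series can be illustrated with a simplified user-space model.  The
      model_* names and the 4 KiB page shift are assumptions, and the model
      ignores the physical-offset arithmetic a real virt_to_pfn() performs.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Macro form: the cast inside accepts integers and pointers alike. */
#define model_virt_to_pfn_macro(kaddr) \
	((unsigned long)((uintptr_t)(kaddr) >> MODEL_PAGE_SHIFT))

/* Function form: the parameter type rejects a bare integer argument. */
static inline unsigned long model_virt_to_pfn(const void *kaddr)
{
	return (unsigned long)((uintptr_t)kaddr >> MODEL_PAGE_SHIFT);
}

void model_virt_to_pfn_demo(void)
{
	unsigned long from = 0x5000UL;

	/* The macro silently takes either type. */
	assert(model_virt_to_pfn_macro(from) == 5);
	assert(model_virt_to_pfn_macro((void *)from) == 5);

	/*
	 * model_virt_to_pfn(from) would draw the -Wint-conversion warning
	 * quoted above; the explicit cast is the fix these patches apply.
	 */
	assert(model_virt_to_pfn((void *)from) == 5);
}
```

      With the function form the compiler checks the argument type at every
      call site, which is why the integer callers had to be fixed first.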
    • mm: gup: pass a pointer to virt_to_page() · 396a400b
      Linus Walleij authored
      Functions that work on a pointer to virtual memory such as virt_to_pfn()
      and users of that function such as virt_to_page() are supposed to be
      passed a pointer to virtual memory, ideally a (void *) or other pointer.
      However, since many architectures implement virt_to_pfn() as a macro, this
      function becomes polymorphic and accepts both an (unsigned long) and a
      (void *).
      
      If we instead implement a proper virt_to_pfn(void *addr) function the
      following happens (occurred on arch/arm):
      
        mm/gup.c: In function '__get_user_pages_locked':
        mm/gup.c:1599:49: warning: passing argument 1 of 'virt_to_pfn'
          makes pointer from integer without a cast [-Wint-conversion]
          pages[i] = virt_to_page(start);
      
      Fix this with an explicit cast.
      
      Link: https://lkml.kernel.org/r/20220630084124.691207-5-linus.walleij@linaro.org
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Marco Elver <elver@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: kfence: pass a pointer to virt_to_page() · 9e7ee421
      Linus Walleij authored
      Functions that work on a pointer to virtual memory such as virt_to_pfn()
      and users of that function such as virt_to_page() are supposed to be
      passed a pointer to virtual memory, ideally a (void *) or other pointer.
      However, since many architectures implement virt_to_pfn() as a macro, this
      function becomes polymorphic and accepts both an (unsigned long) and a
      (void *).
      
      If we instead implement a proper virt_to_pfn(void *addr) function the
      following happens (occurred on arch/arm):
      
      mm/kfence/core.c:558:30: warning: passing argument 1
        of 'virt_to_pfn' makes pointer from integer without a
        cast [-Wint-conversion]
      
      In one case we can refer to __kfence_pool directly (and that is a proper
      (char *) pointer) and in the other call site we use an explicit cast.
      
      Link: https://lkml.kernel.org/r/20220630084124.691207-4-linus.walleij@linaro.org
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/highmem: pass a pointer to virt_to_page() · 259ecb34
      Linus Walleij authored
      Functions that work on a pointer to virtual memory such as virt_to_pfn()
      and users of that function such as virt_to_page() are supposed to be
      passed a pointer to virtual memory, ideally a (void *) or other pointer.
      However, since many architectures implement virt_to_pfn() as a macro, this
      function becomes polymorphic and accepts both an (unsigned long) and a
      (void *).
      
      If we instead implement a proper virt_to_pfn(void *addr) function the
      following happens (occurred on arch/arm):
      
      mm/highmem.c:153:29: warning: passing argument 1 of
        'virt_to_pfn' makes pointer from integer without a
        cast [-Wint-conversion]
      
      We already have a proper void * pointer in the scope of this function
      named "vaddr" so pass that instead.
      
      Link: https://lkml.kernel.org/r/20220630084124.691207-3-linus.walleij@linaro.org
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Marco Elver <elver@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • lib/test_free_pages.c: pass a pointer to virt_to_page() · b3c56f8f
      Linus Walleij authored
      In a recent change to the Arm architecture with the end goal of removing
      highmem we need to convert virt_to_phys() and virt_to_pfn() to static
      inline functions.
      
      This will make them strongly typed.
      
      However, since the virt_to_* helpers have always been implemented as
      macros, they have become polymorphic and accept both (void *) and e.g.
      unsigned long as arguments.
      
      Other functions such as virt_to_page() simply wrap virt_to_pfn() and get
      affected indirectly.
      
      To be able to proceed, patch mm to use (void *) as argument to affected
      functions in all instances.
      
      
      This patch (of 5):
      
      A pointer into virtual memory is represented by a (void *), not a u32, so
      the compiler warns:
      
      lib/test_free_pages.c:20:50: warning: passing argument 1
        of 'virt_to_pfn' makes pointer from integer without a
        cast [-Wint-conversion]
      
      Fix this with an explicit cast.
      
      Link: https://lkml.kernel.org/r/20220630084124.691207-1-linus.walleij@linaro.org
      Link: https://lkml.kernel.org/r/20220630084124.691207-2-linus.walleij@linaro.org
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Marco Elver <elver@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/memcontrol.c: replace cgroup_memory_nokmem with mem_cgroup_kmem_disabled() · 9c94bef9
      Xiang Yang authored
      mem_cgroup_kmem_disabled() checks whether kmem accounting is off.
      Therefore, replace cgroup_memory_nokmem with mem_cgroup_kmem_disabled(),
      as is already done in percpu.c and slab_common.c.
      
      Link: https://lkml.kernel.org/r/20220625061844.226764-1-xiangyang3@huawei.com
      Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
      Acked-by: Souptick Joarder (HPE) <jrdr.linux@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
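
      The pattern behind this cleanup, reading a flag through one accessor
      rather than touching the variable directly, can be sketched as below.
      Every model_* name is hypothetical, and the setter and the charging
      function are invented purely to give the accessor a caller.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* File-private flag, standing in for cgroup_memory_nokmem. */
static bool model_nokmem;

/* Accessor, standing in for mem_cgroup_kmem_disabled(). */
bool model_kmem_disabled(void)
{
	return model_nokmem;
}

/* Hypothetical setter, standing in for boot-time configuration. */
void model_set_nokmem(bool off)
{
	model_nokmem = off;
}

/* Hypothetical caller: decides via the accessor, never the raw flag. */
int model_charge_kmem(unsigned int nr_pages)
{
	assert(nr_pages <= INT_MAX);
	if (model_kmem_disabled())
		return 0;		/* accounting off: nothing charged */
	return (int)nr_pages;		/* accounting on: charge all pages */
}
```

      Using the same accessor everywhere means a later change to how the
      state is stored needs to touch only one function.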
    • mm/page_alloc: replace local_lock with normal spinlock · 01b44456
      Mel Gorman authored
      struct per_cpu_pages is no longer strictly local as PCP lists can be
      drained remotely using a lock for protection.  While the use of local_lock
      works, it goes against the intent of local_lock which is for "pure CPU
      local concurrency control mechanisms and not suited for inter-CPU
      concurrency control" (Documentation/locking/locktypes.rst)
      
      local_lock protects against migration between when the percpu pointer is
      accessed and the pcp->lock acquired.  The lock acquisition is a preemption
      point so in the worst case, a task could migrate to another NUMA node and
      accidentally allocate remote memory.  The main requirement is to pin the
      task to a CPU that is suitable for PREEMPT_RT and !PREEMPT_RT.
      
      Replace local_lock with helpers that pin a task to a CPU, lookup the
      per-cpu structure and acquire the embedded lock.  It's similar to
      local_lock without breaking the intent behind the API.  It is not a
      complete API as only the parts needed for PCP-alloc are implemented but in
      theory, the generic helpers could be promoted to a general API if there
      was demand for an embedded lock within a per-cpu struct with a guarantee
      that the per-cpu structure locked matches the running CPU and cannot use
      get_cpu_var due to RT concerns.  PCP requires these semantics to avoid
      accidentally allocating remote memory.
      
      [mgorman@techsingularity.net: use pcp_spin_trylock_irqsave instead of pcpu_spin_trylock_irqsave]
        Link: https://lkml.kernel.org/r/20220627084645.GA27531@techsingularity.net
      Link: https://lkml.kernel.org/r/20220624125423.6126-8-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
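
      The three steps named in this commit message (pin the task, look up
      the per-cpu structure, acquire the embedded lock) can be modelled in
      user space as follows.  This is a single-threaded sketch under loud
      assumptions: all model_* names are hypothetical, pinning is reduced
      to a counter, the "current CPU" is a plain variable, and a C11
      atomic_flag stands in for the spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

struct model_pcp {
	atomic_flag lock;	/* lock embedded in the per-cpu structure */
	int count;
};

static struct model_pcp model_pcps[MODEL_NR_CPUS];
static int model_current_cpu;	/* stand-in for the running CPU's id */
static int model_pin_depth;	/* stand-in for "task is pinned" state */

static void model_task_pin(void)   { model_pin_depth++; }
static void model_task_unpin(void) { model_pin_depth--; }

void model_pcp_init(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		atomic_flag_clear(&model_pcps[cpu].lock);
		model_pcps[cpu].count = 0;
	}
}

/* Pin first, then look up, then lock: the lookup cannot go stale. */
struct model_pcp *model_pcp_spin_lock(void)
{
	struct model_pcp *pcp;

	model_task_pin();
	pcp = &model_pcps[model_current_cpu];
	while (atomic_flag_test_and_set_explicit(&pcp->lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
	return pcp;
}

/* Trylock variant: undo the pin and return NULL if the lock is held. */
struct model_pcp *model_pcp_spin_trylock(void)
{
	struct model_pcp *pcp;

	model_task_pin();
	pcp = &model_pcps[model_current_cpu];
	if (atomic_flag_test_and_set_explicit(&pcp->lock,
					      memory_order_acquire)) {
		model_task_unpin();
		return NULL;
	}
	return pcp;
}

void model_pcp_spin_unlock(struct model_pcp *pcp)
{
	atomic_flag_clear_explicit(&pcp->lock, memory_order_release);
	model_task_unpin();
}

/* Walk through lock, failed trylock and unlock; returns final pin depth. */
int model_pcp_lock_demo(void)
{
	struct model_pcp *pcp, *again;

	model_pcp_init();
	model_current_cpu = 2;
	pcp = model_pcp_spin_lock();
	assert(pcp == &model_pcps[2]);
	again = model_pcp_spin_trylock();	/* already held: must fail */
	assert(again == NULL);
	assert(model_pin_depth == 1);
	pcp->count++;
	model_pcp_spin_unlock(pcp);
	return model_pin_depth;
}
```

      The ordering is the important part of the sketch: because the pin
      happens before the lookup, the structure that gets locked is the one
      belonging to the CPU the task keeps running on.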
    • mm/page_alloc: remotely drain per-cpu lists · 443c2acc
      Nicolas Saenz Julienne authored
      Some setups, notably NOHZ_FULL CPUs, are too busy to handle the per-cpu
      drain work queued by __drain_all_pages().  So introduce a new mechanism to
      remotely drain the per-cpu lists.  It is made possible by remotely locking
      the new per-cpu spinlocks in 'struct per_cpu_pages'.  A benefit of this
      new scheme is that drain operations are now migration safe.

      There was no observed performance degradation vs. the previous scheme.
      Both netperf and hackbench were run in parallel while triggering the
      __drain_all_pages(NULL, true) code path about 100 times per second.  The
      new scheme performs a bit better (~5%), although the important point here
      is that there are no performance regressions vs. the previous mechanism.
      Per-cpu list draining happens only in slow paths.
      
      Minchan Kim tested an earlier version and reported:
      
      	My workload is not NOHZ CPUs but run apps under heavy memory
      	pressure so they goes to direct reclaim and be stuck on
      	drain_all_pages until work on workqueue run.
      
      	unit: nanosecond
      	max(dur)        avg(dur)                count(dur)
      	166713013       487511.77786438033      1283
      
      	From traces, system encountered the drain_all_pages 1283 times and
      	worst case was 166ms and avg was 487us.
      
      	The other problem was alloc_contig_range in CMA. The PCP draining
      	takes several hundred millisecond sometimes though there is no
      	memory pressure or a few of pages to be migrated out but CPU were
      	fully booked.
      
      	Your patch perfectly removed those wasted time.
      
      Link: https://lkml.kernel.org/r/20220624125423.6126-7-mgorman@techsingularity.net
      Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
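
      Remote draining as described here can be pictured with a small
      user-space model: the draining side takes each CPU's lock itself
      instead of asking that CPU to run any work.  All model_* names are
      hypothetical, a C11 atomic_flag stands in for the per-cpu spinlock,
      and "draining" is reduced to zeroing a counter.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_NR_CPUS 4

struct model_pcp {
	atomic_flag lock;	/* per-cpu lock that remote CPUs may take */
	int count;		/* pages cached on this CPU's lists */
};

static struct model_pcp model_pcps[MODEL_NR_CPUS];

void model_pcps_init(int pages_per_cpu)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		atomic_flag_clear(&model_pcps[cpu].lock);
		model_pcps[cpu].count = pages_per_cpu;
	}
}

/* Drain one CPU's list from any CPU; returns the number of pages freed. */
int model_drain_pages(int cpu)
{
	struct model_pcp *pcp;
	int freed;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	pcp = &model_pcps[cpu];

	while (atomic_flag_test_and_set_explicit(&pcp->lock,
						 memory_order_acquire))
		;	/* spin until the owner or another drainer lets go */
	freed = pcp->count;
	pcp->count = 0;
	atomic_flag_clear_explicit(&pcp->lock, memory_order_release);
	return freed;
}

/* Drain every CPU in turn without involving the target CPUs. */
int model_drain_all_pages(void)
{
	int total = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		total += model_drain_pages(cpu);
	return total;
}
```

      Nothing in the model has to be scheduled on the target CPU, which is
      why a busy NOHZ_FULL CPU would no longer delay the drain.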
    • mm/page_alloc: protect PCP lists with a spinlock · 4b23a68f
      Mel Gorman authored
      Currently the PCP lists are protected by using local_lock_irqsave to
      prevent migration and IRQ reentrancy, but this is inconvenient.  Remote
      draining of the lists is impossible, so a workqueue is required, and every
      task allocation/free must disable then enable interrupts, which is
      expensive.
      
      As preparation for dealing with both of those problems, protect the
      lists with a spinlock.  The IRQ-unsafe version of the lock is used
      because IRQs are already disabled by local_lock_irqsave.  spin_trylock
      is used in combination with local_lock_irqsave() but later will be
      replaced with a spin_trylock_irqsave when the local_lock is removed.
      
      The per_cpu_pages still fits within the same number of cache lines after
      this patch relative to before the series.
      
      struct per_cpu_pages {
              spinlock_t                 lock;                 /*     0     4 */
              int                        count;                /*     4     4 */
              int                        high;                 /*     8     4 */
              int                        batch;                /*    12     4 */
              short int                  free_factor;          /*    16     2 */
              short int                  expire;               /*    18     2 */
      
              /* XXX 4 bytes hole, try to pack */
      
              struct list_head           lists[13];            /*    24   208 */
      
              /* size: 256, cachelines: 4, members: 7 */
              /* sum members: 228, holes: 1, sum holes: 4 */
              /* padding: 24 */
      } __attribute__((__aligned__(64)));
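
The layout above can be reproduced in userspace as a rough check.  This is
only a sketch: `pcp_model`, `spinlock_model_t` and `list_head_model` are
hypothetical stand-ins, the 4-byte lock assumes a kernel built without lock
debugging, and the offsets assume an LP64 target with 64-byte cache lines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types (not the real definitions). */
struct list_head_model { struct list_head_model *next, *prev; };
typedef struct { unsigned int raw; } spinlock_model_t; /* assumed 4 bytes */

#define NR_PCP_LISTS_MODEL 13

struct pcp_model {
	spinlock_model_t lock;        /* offset  0, size 4 */
	int count;                    /* offset  4, size 4 */
	int high;                     /* offset  8, size 4 */
	int batch;                    /* offset 12, size 4 */
	short free_factor;            /* offset 16, size 2 */
	short expire;                 /* offset 18, size 2 */
	/* 4-byte hole: lists[] needs 8-byte alignment on LP64 */
	struct list_head_model lists[NR_PCP_LISTS_MODEL]; /* offset 24, size 208 */
} __attribute__((__aligned__(64)));

_Static_assert(sizeof(void *) == 8, "model assumes an LP64 target");
_Static_assert(offsetof(struct pcp_model, lists) == 24, "hole before lists[]");
_Static_assert(sizeof(struct pcp_model) == 256, "padded to four cache lines");

/* Number of 64-byte cache lines the structure spans. */
static inline size_t pcp_model_cachelines(void)
{
	return sizeof(struct pcp_model) / 64;
}
```

The members sum to 228 bytes plus the 4-byte hole, and the 64-byte alignment
pads the total to 256, matching the pahole comments in the listing.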
      
      There is overhead in the fast path due to acquiring the spinlock even
      though the spinlock is per-cpu and uncontended in the common case.  Page
      Fault Test (PFT) running on a 1-socket machine reported the following
      results.
      
                                           5.19.0-rc3               5.19.0-rc3
                                              vanilla      mm-pcpspinirq-v5r16
      Hmean     faults/sec-1   869275.7381 (   0.00%)   874597.5167 *   0.61%*
      Hmean     faults/sec-3  2370266.6681 (   0.00%)  2379802.0362 *   0.40%*
      Hmean     faults/sec-5  2701099.7019 (   0.00%)  2664889.7003 *  -1.34%*
      Hmean     faults/sec-7  3517170.9157 (   0.00%)  3491122.8242 *  -0.74%*
      Hmean     faults/sec-8  3965729.6187 (   0.00%)  3939727.0243 *  -0.66%*
      
      There is a small hit in the number of faults per second but given that the
      results are more stable, it's borderline noise.
      
      [akpm@linux-foundation.org: add missing local_unlock_irqrestore() on contention path]
      Link: https://lkml.kernel.org/r/20220624125423.6126-6-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4b23a68f
    • Mel Gorman's avatar
      mm/page_alloc: remove mistaken page == NULL check in rmqueue · e2a66c21
      Mel Gorman authored
      If a page allocation fails, the ZONE_BOOSTED_WATERMARK should be tested,
      cleared and kswapd woken whether the allocation attempt was via the PCP or
      directly via the buddy list.
      
      Remove the page == NULL check so the ZONE_BOOSTED_WATERMARK bit is checked
      unconditionally.  As it is unlikely that ZONE_BOOSTED_WATERMARK is set,
      mark the branch accordingly.
      
      Link: https://lkml.kernel.org/r/20220624125423.6126-5-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e2a66c21
    • Mel Gorman's avatar
      mm/page_alloc: split out buddy removal code from rmqueue into separate helper · 589d9973
      Mel Gorman authored
      This is a preparation patch to allow the buddy removal code to be reused in
      a later patch.
      
      No functional change.
      
      Link: https://lkml.kernel.org/r/20220624125423.6126-4-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      589d9973
    • Mel Gorman's avatar
      mm/page_alloc: use only one PCP list for THP-sized allocations · 5d0a661d
      Mel Gorman authored
      The per_cpu_pages is cache-aligned on a standard x86-64 distribution
      configuration but a later patch will add a new field which would push the
      structure into the next cache line.  Use only one list to store THP-sized
      pages on the per-cpu list.  This assumes that the vast majority of
      THP-sized allocations are GFP_MOVABLE but even if it was another type, it
      would not contribute to serious fragmentation that potentially causes a
      later THP allocation failure.  Align per_cpu_pages on the cacheline
      boundary to ensure there is no false cache sharing.
      
      After this patch, the structure sizing is:
      
      struct per_cpu_pages {
              int                        count;                /*     0     4 */
              int                        high;                 /*     4     4 */
              int                        batch;                /*     8     4 */
              short int                  free_factor;          /*    12     2 */
              short int                  expire;               /*    14     2 */
              struct list_head           lists[13];            /*    16   208 */
      
              /* size: 256, cachelines: 4, members: 6 */
              /* padding: 32 */
      } __attribute__((__aligned__(64)));
      
      Link: https://lkml.kernel.org/r/20220624125423.6126-3-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      5d0a661d
    • Mel Gorman's avatar
      mm/page_alloc: add page->buddy_list and page->pcp_list · bf75f200
      Mel Gorman authored
      Patch series "Drain remote per-cpu directly", v5.
      
      Some setups, notably NOHZ_FULL CPUs, may be running realtime or
      latency-sensitive applications that cannot tolerate interference due to
      per-cpu drain work queued by __drain_all_pages().  Introduce a new
      mechanism to remotely drain the per-cpu lists.  It is made possible by
      remotely locking the new per-cpu spinlocks in 'struct per_cpu_pages'.
      This has two advantages: the time to drain is more predictable and other
      unrelated tasks are not interrupted.
      
      This series has the same intent as Nicolas' series "mm/page_alloc: Remote
      per-cpu lists drain support" -- avoid interference of a high priority task
      due to a workqueue item draining per-cpu page lists.  While many workloads
      can tolerate a brief interruption, it may cause a real-time task running
      on a NOHZ_FULL CPU to miss a deadline and at minimum, the draining is
      non-deterministic.
      
      Currently an IRQ-safe local_lock protects the page allocator per-cpu
      lists.  The local_lock on its own prevents migration and the IRQ disabling
      protects from corruption due to an interrupt arriving while a page
      allocation is in progress.
      
      This series adjusts the locking.  A spinlock is added to struct
      per_cpu_pages to protect the list contents while local_lock_irq is
      ultimately replaced by just the spinlock in the final patch.  This allows
      a remote CPU to safely drain the lists.  Follow-on work should allow the
      spin_lock_irqsave to be converted to spin_lock to avoid IRQs being
      disabled/enabled in most
      cases.  The follow-on patch will be one kernel release later as it is
      relatively high risk and it'll make bisections more clear if there are any
      problems.
      
      Patch 1 is a cosmetic patch to clarify when page->lru is storing buddy pages
      	and when it is storing per-cpu pages.
      
      Patch 2 shrinks per_cpu_pages to make room for a spin lock. Strictly speaking
      	this is not necessary but it avoids per_cpu_pages consuming another
      	cache line.
      
      Patch 3 is a preparation patch to avoid code duplication.
      
      Patch 4 is a minor correction.
      
      Patch 5 uses a spin_lock to protect the per_cpu_pages contents while still
      	relying on local_lock to prevent migration, stabilise the pcp
      	lookup and prevent IRQ reentrancy.
      
      Patch 6 remotely drains per-cpu pages directly instead of using a workqueue.
      
      Patch 7 uses a normal spinlock instead of local_lock for remote draining.
      
      
      This patch (of 7):
      
      The page allocator uses page->lru for storing pages on either buddy or PCP
      lists.  Create page->buddy_list and page->pcp_list as a union with
      page->lru.  This is simply to clarify what type of list a page is on in
      the page allocator.
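
As a rough illustration of the aliasing idea, the sketch below places three
differently named list heads in one anonymous union so that they share
storage.  `page_model` and `list_head_model` are hypothetical userspace
stand-ins and say nothing about where the real `struct page` keeps these
fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_head_model { struct list_head_model *next, *prev; };

/* Simplified model of a page descriptor: one storage slot, three names. */
struct page_model {
	unsigned long flags;
	union {
		struct list_head_model lru;        /* generic name */
		struct list_head_model buddy_list; /* page sits on a buddy free list */
		struct list_head_model pcp_list;   /* page sits on a per-cpu list */
	};
};

_Static_assert(offsetof(struct page_model, buddy_list) ==
	       offsetof(struct page_model, lru), "buddy_list aliases lru");
_Static_assert(offsetof(struct page_model, pcp_list) ==
	       offsetof(struct page_model, lru), "pcp_list aliases lru");

/* Returns 1 when all three names refer to the same bytes of the page. */
static inline int page_model_same_storage(const struct page_model *page)
{
	return (const void *)&page->lru == (const void *)&page->buddy_list &&
	       (const void *)&page->lru == (const void *)&page->pcp_list;
}
```

Because the names alias one another, switching a caller from `lru` to
`buddy_list` or `pcp_list` changes only how the code reads, which is
consistent with the "no functional change" note.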
      
      No functional change intended.
      
      [minchan@kernel.org: fix page lru fields in macros]
      Link: https://lkml.kernel.org/r/20220624125423.6126-2-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      bf75f200
    • Mike Kravetz's avatar
      hugetlb: lazy page table copies in fork() · bcd51a3c
      Mike Kravetz authored
      Lazy page table copying at fork time was introduced with commit
      d992895b ("[PATCH] Lazy page table copies in fork()").  At the time,
      hugetlb was very new and did not support page faulting.  As a result, it
      was excluded.  When full page fault support was added for hugetlb, the
      exclusion was not removed.
      
      Simply remove the check that prevents lazy copying of hugetlb page tables
      at fork.  Of course, like other mappings this only applies to shared
      mappings.
      
      Lazy page table copying at fork will be less advantageous for hugetlb
      mappings because:
      - There are fewer page table entries with hugetlb
      - hugetlb pmds can be shared instead of copied
      
      In any case, completely eliminating the copy at fork time should speed
      things up.
      
      Link: https://lkml.kernel.org/r/20220621235620.291305-5-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: James Houghton <jthoughton@google.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      bcd51a3c
    • Mike Kravetz's avatar
      hugetlb: do not update address in huge_pmd_unshare · 4ddb4d91
      Mike Kravetz authored
      As an optimization for loops sequentially processing hugetlb address
      ranges, huge_pmd_unshare would update a passed address if it unshared a
      pmd.  Updating a loop control variable outside the loop like this is
      generally a bad idea.  These loops are now using hugetlb_mask_last_page to
      optimize scanning when non-present ptes are discovered.  The same can be
      done when huge_pmd_unshare returns 1 indicating a pmd was unshared.
      
      Remove address update from huge_pmd_unshare.  Change the passed argument
      type and update all callers.  In loops sequentially processing addresses
      use hugetlb_mask_last_page to update address if pmd is unshared.
      
      [sfr@canb.auug.org.au: fix an unused variable warning/error]
        Link: https://lkml.kernel.org/r/20220622171117.70850960@canb.auug.org.au
      Link: https://lkml.kernel.org/r/20220621235620.291305-4-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: James Houghton <jthoughton@google.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4ddb4d91
    • Baolin Wang's avatar
      arm64/hugetlb: implement arm64 specific hugetlb_mask_last_page · 1bcdb769
      Baolin Wang authored
      HugeTLB address ranges are linearly scanned during fork, unmap and remap
      operations.  When the scan hits a non-present entry, it can skip to the
      end of the range mapped by the page table page, which speeds up linear
      scanning of HugeTLB address ranges.
      
      hugetlb_mask_last_page() was introduced for this purpose: when a
      non-present entry is encountered, it lets the scanning loop advance the
      address to the last huge page mapped by the associated page table
      page[1].
      
      To cover ARM64 specific cont-pte/pmd size HugeTLB, implement an ARM64
      specific hugetlb_mask_last_page().
      
      [1] https://lore.kernel.org/linux-mm/20220527225849.284839-1-mike.kravetz@oracle.com/
      
      [baolin.wang@linux.alibaba.com: fix build]
        Link: https://lkml.kernel.org/r/a14e7b39-6a8a-4609-b4a1-84ac574f5c96@linux.alibaba.com
      Link: https://lkml.kernel.org/r/20220621235620.291305-3-mike.kravetz@oracle.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: James Houghton <jthoughton@google.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1bcdb769
    • Mike Kravetz's avatar
      hugetlb: skip to end of PT page mapping when pte not present · e95a9851
      Mike Kravetz authored
      Patch series "hugetlb: speed up linear address scanning", v2.
      
      At unmap, fork and remap time hugetlb address ranges are linearly scanned.
      We can optimize these scans if the ranges are sparsely populated.
      
      Also, enable page table "Lazy copy" for hugetlb at fork.
      
      NOTE: Architectures not defining CONFIG_ARCH_WANT_GENERAL_HUGETLB need to
      add an arch specific version of hugetlb_mask_last_page() to take advantage of
      sparse address scanning improvements.  Baolin Wang added the routine for
      arm64.  Other architectures which could be optimized are: ia64, mips,
      parisc, powerpc, s390, sh and sparc.
      
      
      This patch (of 4):
      
      HugeTLB address ranges are linearly scanned during fork, unmap and remap
      operations.  If a non-present entry is encountered, the code currently
      continues to the next huge page aligned address.  However, a non-present
      entry implies that the page table page for that entry is not present. 
      Therefore, the linear scan can skip to the end of range mapped by the page
      table page.  This can speed operations on large sparsely populated hugetlb
      mappings.
      
      Create a new routine hugetlb_mask_last_page() that will return an address
      mask.  When the mask is ORed with an address, the result will be the
      address of the last huge page mapped by the associated page table page. 
      Use this mask to update addresses in routines which linearly scan hugetlb
      address ranges when a non-present pte is encountered.
      
      hugetlb_mask_last_page is related to the implementation of huge_pte_offset
      as hugetlb_mask_last_page is called when huge_pte_offset returns NULL. 
      This patch only provides a complete hugetlb_mask_last_page implementation
      when CONFIG_ARCH_WANT_GENERAL_HUGETLB is defined.  Architectures which
      provide their own versions of huge_pte_offset can also provide their own
      version of hugetlb_mask_last_page.
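
The mask arithmetic described above can be sketched in userspace as follows.
The `*_MODEL` constants assume x86-64 style sizes (2 MiB PMD, 1 GiB PUD,
512 GiB P4D) and the function names are hypothetical; the sketch only
illustrates the OR-then-advance idea and is not the kernel implementation.

```c
#include <assert.h>

/* Assumed x86-64 style sizes; other architectures differ. */
#define PMD_SIZE_MODEL (1UL << 21) /* 2 MiB */
#define PUD_SIZE_MODEL (1UL << 30) /* 1 GiB */
#define P4D_SIZE_MODEL (1UL << 39) /* 512 GiB */

/*
 * Mask that, ORed with an address, yields the address of the last huge
 * page mapped by the page table page covering that address.
 */
static unsigned long mask_last_page_model(unsigned long hp_size)
{
	if (hp_size == PUD_SIZE_MODEL)
		return P4D_SIZE_MODEL - PUD_SIZE_MODEL;
	if (hp_size == PMD_SIZE_MODEL)
		return PUD_SIZE_MODEL - PMD_SIZE_MODEL;
	return 0UL; /* unknown size: no skipping, scan page by page */
}

/* Address the scan visits next after finding a non-present entry at addr. */
static unsigned long next_after_hole_model(unsigned long addr,
					   unsigned long hp_size)
{
	assert(hp_size != 0 && addr % hp_size == 0);
	addr |= mask_last_page_model(hp_size); /* jump to last huge page */
	return addr + hp_size;                 /* normal loop increment */
}
```

With 2 MiB huge pages, a hole found at 0x40A00000 moves the scan to
0x80000000, the start of the next 1 GiB region, rather than to the next
2 MiB boundary.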
      
      Link: https://lkml.kernel.org/r/20220621235620.291305-1-mike.kravetz@oracle.com
      Link: https://lkml.kernel.org/r/20220621235620.291305-2-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: James Houghton <jthoughton@google.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e95a9851
    • Kuan-Ying Lee's avatar
      kasan: separate double free case from invalid free · 3de0de75
      Kuan-Ying Lee authored
      Currently, KASAN describes all invalid-free/double-free bugs as
      "double-free or invalid-free".  This is ambiguous.
      
      KASAN should report "double-free" when a double-free is a more likely
      cause (the address points to the start of an object) and report
      "invalid-free" otherwise [1].
      
      [1] https://bugzilla.kernel.org/show_bug.cgi?id=212193
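
The distinction described above can be modelled with a small classifier.
This is a sketch under the stated rule only (a pointer at the start of an
object suggests a double free); `classify_bad_free` and the enum are
hypothetical names and do not reflect how KASAN is actually structured.

```c
#include <assert.h>
#include <stdint.h>

enum bad_free_kind {
	BAD_FREE_DOUBLE,  /* pointer is the object start: likely freed twice */
	BAD_FREE_INVALID, /* pointer is elsewhere: never a valid free target */
};

/*
 * Classify a free that was already found to be bad.  object_start is the
 * start of the object nearest to freed_ptr (assumed known to the caller).
 */
static enum bad_free_kind classify_bad_free(uintptr_t object_start,
					    uintptr_t freed_ptr)
{
	assert(object_start != 0);
	return freed_ptr == object_start ? BAD_FREE_DOUBLE : BAD_FREE_INVALID;
}
```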
      
      Link: https://lkml.kernel.org/r/20220615062219.22618-1-Kuan-Ying.Lee@mediatek.com
      Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Matthias Brugger <matthias.bgg@gmail.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: Yee Lee <yee.lee@mediatek.com>
      Cc: Andrew Yang <andrew.yang@mediatek.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3de0de75
    • Yang Shi's avatar
      doc: proc: fix the description to THPeligible · cb55b838
      Yang Shi authored
      The THPeligible bit shows 1 if and only if the VMA is eligible for
      allocating THP and the THP is also PMD mappable.  Some misaligned file
      VMAs may be eligible for allocating THP but the THP can't be mapped by
      PMD.  Make this more explicit to avoid ambiguity.
      
      Link: https://lkml.kernel.org/r/20220616174840.1202070-8-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zach O'Keefe <zokeefe@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      cb55b838
    • Yang Shi's avatar
      mm: khugepaged: reorg some khugepaged helpers · 1064026b
      Yang Shi authored
      The khugepaged_{enabled|always|req_madv} helpers are no longer used only
      by khugepaged, so move them to huge_mm.h and rename them to
      hugepage_flags_xxx.  Remove khugepaged_req_madv since it has no users.
      
      Also move khugepaged_defrag to khugepaged.c: its only caller is in that
      file, so it doesn't have to be in a header file.
      
      Link: https://lkml.kernel.org/r/20220616174840.1202070-7-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zach O'Keefe <zokeefe@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1064026b
    • Yang Shi's avatar
      mm: thp: kill __transhuge_page_enabled() · 7da4e2cb
      Yang Shi authored
      The page fault path checks THP eligibility with __transhuge_page_enabled(),
      which does much the same thing as hugepage_vma_check(), so use
      hugepage_vma_check() instead.
      
      However, page fault allows DAX and !anon_vma cases, so add a new flag,
      in_pf, to hugepage_vma_check() to make page fault work correctly.
      
      The in_pf flag is also used to skip shmem and file THP for page fault
      since shmem handles THP in its own shmem_fault() and file THP allocation
      on fault is not supported yet.
      
      Also remove hugepage_vma_enabled().  Since hugepage_vma_check() is now its
      only caller, a separate helper function is not necessary.
      
      Link: https://lkml.kernel.org/r/20220616174840.1202070-6-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zach O'Keefe <zokeefe@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      7da4e2cb
    • Yang Shi's avatar
      mm: thp: kill transparent_hugepage_active() · 9fec5168
      Yang Shi authored
      transparent_hugepage_active() was introduced to show the THP eligibility
      bit in smaps in proc, and smaps is its only user.  But it actually does
      much the same check as hugepage_vma_check(), which is used by khugepaged.
      We definitely don't have to maintain two similar checks, so kill
      transparent_hugepage_active().
      
      This patch also fixed the wrong behavior for VM_NO_KHUGEPAGED vmas.
      
      Also move hugepage_vma_check() to huge_memory.c and huge_mm.h since it
      is not only for khugepaged anymore.
      
      [akpm@linux-foundation.org: check vma->vm_mm, per Zach]
      [akpm@linux-foundation.org: add comment to vdso check]
      Link: https://lkml.kernel.org/r/20220616174840.1202070-5-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zach O'Keefe <zokeefe@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      9fec5168
    • Yang Shi's avatar
      mm: khugepaged: better comments for anon vma check in hugepage_vma_revalidate · f707fa49
      Yang Shi authored
      hugepage_vma_revalidate() needs to check whether the vma is still an
      anonymous vma, since the address may be unmapped then remapped to a file
      before khugepaged reacquired the mmap_lock.
      
      The old comment is not very helpful, so elaborate on it with a better one.
      
      Link: https://lkml.kernel.org/r/20220616174840.1202070-4-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zach O'Keefe <zokeefe@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f707fa49
    • Yang Shi's avatar
      mm: thp: consolidate vma size check to transhuge_vma_suitable · 4fa6893f
      Yang Shi authored
      There are a couple of places that check whether the vma size is ok for THP
      or whether the address fits.  They are open coded and duplicated, so use
      transhuge_vma_suitable() to do the job by passing in (vma->vm_end -
      HPAGE_PMD_SIZE).
      
      Move the vma size check into hugepage_vma_check().  This makes
      khugepaged_enter() the same as khugepaged_enter_vma().  There is just
      one caller of khugepaged_enter(), so replace it with khugepaged_enter_vma()
      and remove khugepaged_enter().
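
To see why passing the vma end minus HPAGE_PMD_SIZE works as a size check,
consider the simplified model below.  It assumes a 2 MiB PMD size and covers
only the address-range part of the test; `vma_model` and both functions are
hypothetical, and any file-offset alignment checks the real helper performs
are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_SIZE_MODEL (1UL << 21) /* assumed 2 MiB */
#define HPAGE_PMD_MASK_MODEL (~(HPAGE_PMD_SIZE_MODEL - 1))

/* Hypothetical stand-in for the start/end fields of a vma: [start, end). */
struct vma_model { unsigned long vm_start, vm_end; };

/* True if the PMD-aligned huge page containing addr lies inside the vma. */
static bool vma_suitable_model(const struct vma_model *vma, unsigned long addr)
{
	unsigned long haddr = addr & HPAGE_PMD_MASK_MODEL;

	assert(vma->vm_start < vma->vm_end);
	return haddr >= vma->vm_start &&
	       haddr + HPAGE_PMD_SIZE_MODEL <= vma->vm_end;
}

/* True if the vma can hold at least one aligned huge page anywhere. */
static bool vma_fits_any_hugepage_model(const struct vma_model *vma)
{
	if (vma->vm_end < HPAGE_PMD_SIZE_MODEL)
		return false; /* avoid unsigned underflow below */
	return vma_suitable_model(vma, vma->vm_end - HPAGE_PMD_SIZE_MODEL);
}
```

Rounding the probe address down gives the last aligned huge page that ends
at or before the vma end, so the vma can hold a huge page exactly when that
aligned address is not below the vma start.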
      
      Link: https://lkml.kernel.org/r/20220616174840.1202070-3-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zach O'Keefe <zokeefe@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4fa6893f
    • Yang Shi's avatar
      mm: khugepaged: check THP flag in hugepage_vma_check() · 66137fb3
      Yang Shi authored
      Patch series "Cleanup transhuge_xxx helpers", v5.
      
      This series is the follow-up of the discussion about cleaning up
      transhuge_xxx helpers at
      https://lore.kernel.org/linux-mm/627a71f8-e879-69a5-ceb3-fc8d29d2f7f1@suse.cz/.
      
      THP has a bunch of helpers that do VMA sanity checks for different paths.
      They do similar checks at most callsites and contain a lot of duplicate
      code, and it is confusing which helper should be used under which
      conditions.
      
      This series reorganized and cleaned up the code so that we could
      consolidate all the checks into hugepage_vma_check().
      
      The transhuge_vma_enabled(), transparent_hugepage_active() and
      __transparent_hugepage_enabled() are killed by this series.
      
      
      This patch (of 7):
      
      Currently the THP flag check in hugepage_vma_check() will fall through if
      the flag is NEVER and VM_HUGEPAGE is set.  This is not a problem for now
      since all the callers have the flag checked before or can't be invoked if
      the flag is NEVER.
      
      However, the following patch will call hugepage_vma_check() in more
      places, for example, page fault, so this flag must be checked in
      hugepage_vma_check().
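
A minimal model of the flag logic is sketched below.  It assumes only the
three global modes (never, madvise, always) and two per-VMA hints; the names
are hypothetical and the real check considers many more conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the global THP "enabled" setting. */
enum thp_mode_model { THP_NEVER_MODEL, THP_MADVISE_MODEL, THP_ALWAYS_MODEL };

/* Hypothetical per-VMA hint bits (values are arbitrary for this sketch). */
#define VM_HUGEPAGE_MODEL   0x1UL
#define VM_NOHUGEPAGE_MODEL 0x2UL

static bool thp_allowed_model(enum thp_mode_model mode, unsigned long vm_flags)
{
	assert(mode >= THP_NEVER_MODEL && mode <= THP_ALWAYS_MODEL);

	if (vm_flags & VM_NOHUGEPAGE_MODEL)
		return false;
	/* Checked explicitly so NEVER wins even when VM_HUGEPAGE is set. */
	if (mode == THP_NEVER_MODEL)
		return false;
	if (mode == THP_ALWAYS_MODEL)
		return true;
	/* madvise mode: only VMAs that asked for huge pages qualify. */
	return (vm_flags & VM_HUGEPAGE_MODEL) != 0;
}
```

Without the explicit NEVER test, a VMA marked with the huge page hint would
reach the final return and be treated as eligible, which is the fall-through
the patch closes.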
      
      Link: https://lkml.kernel.org/r/20220616174840.1202070-1-shy828301@gmail.com
      Link: https://lkml.kernel.org/r/20220616174840.1202070-2-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zach O'Keefe <zokeefe@google.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      66137fb3
    • Shiyang Ruan's avatar
      xfs: add dax dedupe support · 13f9e267
      Shiyang Ruan authored
      Introduce xfs_mmaplock_two_inodes_and_break_dax_layout() for dax files
      that are going to be deduped.  After that, call the compare range function
      only when both files are DAX or both are not.
      
      Link: https://lkml.kernel.org/r/20220603053738.1218681-15-ruansy.fnst@fujitsu.com
      Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.wiliams@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Ritesh Harjani <riteshh@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      13f9e267
    • Shiyang Ruan's avatar
      xfs: support CoW in fsdax mode · ea6c49b7
      Shiyang Ruan authored
      In fsdax mode, WRITE and ZERO on a shared extent need CoW performed. 
      After that, newly allocated extents need to be remapped to the file.  So,
      add a CoW identification in ->iomap_begin(), and implement ->iomap_end()
      to do the remapping work.
      
      [akpm@linux-foundation.org: make xfs_dax_fault() static]
      Link: https://lkml.kernel.org/r/20220603053738.1218681-14-ruansy.fnst@fujitsu.com
      Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.wiliams@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Ritesh Harjani <riteshh@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ea6c49b7
    • Shiyang Ruan's avatar
      fsdax: dedup file range to use a compare function · 6f7db389
      Shiyang Ruan authored
      With DAX we cannot use readpage() and friends.  So, create a DAX
      comparison function that is similar to vfs_dedupe_file_range_compare(),
      and introduce dax_remap_file_range_prep() for filesystem use.
      
      Link: https://lkml.kernel.org/r/20220603053738.1218681-13-ruansy.fnst@fujitsu.com
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.wiliams@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Ritesh Harjani <riteshh@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6f7db389
    • Shiyang Ruan's avatar
      fsdax: add dax_iomap_cow_copy() for dax zero · 8dbfc76d
      Shiyang Ruan authored
      Punching a hole in a reflinked file needs dax_iomap_cow_copy() too.
      Otherwise, data in the unaligned area will be incorrect.  So, add the CoW
      operation for the unaligned case in dax_memzero().
      
      Link: https://lkml.kernel.org/r/20220603053738.1218681-12-ruansy.fnst@fujitsu.com
      Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.wiliams@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8dbfc76d
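To see why an unaligned zeroing request on a shared block needs a CoW copy first, here is a small Python model of the commit above. It is a sketch under stated assumptions, not the kernel's dax_memzero(): `zero_within_block` is a hypothetical name, one block is modelled as a `bytes` object, and the freshly allocated block is pre-filled with `0xAA` to stand in for stale contents.

```python
def zero_within_block(shared_block: bytes, offset: int, length: int) -> bytearray:
    """Zero [offset, offset + length) of a shared block via copy-on-write.

    Returns the newly allocated block; the shared block is left untouched.
    """
    size = len(shared_block)
    if offset < 0 or length < 0 or offset + length > size:
        raise ValueError("range must lie inside the block")

    # Newly allocated extent: assume it holds stale data, not zeroes.
    new_block = bytearray(b"\xaa" * size)

    if offset != 0 or length != size:
        # Unaligned case: bytes outside the zeroed range must come from
        # the shared source block, otherwise they would stay stale.
        new_block[:offset] = shared_block[:offset]
        new_block[offset + length:] = shared_block[offset + length:]

    # Zero the requested range itself.
    new_block[offset:offset + length] = bytes(length)
    return new_block
```

Without the head and tail copy, the bytes around the zeroed range in the new block would keep the `0xAA` filler, which is the incorrect-data case the commit message describes.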
    • Shiyang Ruan's avatar
      fsdax: replace mmap entry in case of CoW · e5d6df73
      Shiyang Ruan authored
      Replace the existing entry with the newly allocated one in case of CoW.
      Also, mark the entry as PAGECACHE_TAG_TOWRITE so that writeback marks this
      entry as write-protected.  This helps with snapshots, so that new write
      page faults after a snapshot trigger a CoW.
      
      Link: https://lkml.kernel.org/r/20220603053738.1218681-11-ruansy.fnst@fujitsu.com
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.wiliams@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e5d6df73
    • Shiyang Ruan's avatar
      fsdax: introduce dax_iomap_cow_copy() · ff17b8df
      Shiyang Ruan authored
      If the iomap describes a write operation and iomap is not equal to srcmap
      after iomap_begin, we consider it a CoW operation.
      
      In this case, the destination (iomap->addr) points to a newly allocated
      extent.  The data needs to be copied from srcmap to that extent.  In
      theory, it is better to copy only the head and tail ranges that lie
      outside the non-aligned area instead of copying the whole aligned range.
      But in a DAX page fault, the range is always aligned, so copy the whole
      range in that case.
      
      Link: https://lkml.kernel.org/r/20220603053738.1218681-10-ruansy.fnst@fujitsu.com
      Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.wiliams@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Ritesh Harjani <riteshh@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ff17b8df
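The head/tail copy described in the commit above can be illustrated with a short Python model. This is a sketch, not the kernel's dax_iomap_cow_copy(): the function name `cow_copy` and its parameters are hypothetical, and source and destination are modelled as equally sized buffers that share the same offsets.

```python
def cow_copy(src: bytes, dst: bytearray, pos: int, length: int, align: int) -> None:
    """Copy source data surrounding the write range [pos, pos + length).

    The range is widened to `align` boundaries. If it is already aligned
    (as in a DAX page fault), the whole range is copied; otherwise only
    the head and tail outside the write range are copied, since the
    caller will overwrite the middle anyway.
    """
    if len(src) != len(dst):
        raise ValueError("model assumes equally sized buffers")

    start = pos - (pos % align)               # round pos down
    end = pos + length
    aligned_end = -(-end // align) * align    # round end up
    if aligned_end > len(src):
        raise ValueError("aligned range exceeds the buffers")

    if pos == start and end == aligned_end:
        # Aligned range: copy everything.
        dst[start:aligned_end] = src[start:aligned_end]
        return

    dst[start:pos] = src[start:pos]              # head before the write
    dst[end:aligned_end] = src[end:aligned_end]  # tail after the write
```

For a 4-byte write at offset 5 with 8-byte alignment, bytes 0-4 and 9-15 are taken from the source and bytes 5-8 are left for the write itself.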
    • Shiyang Ruan's avatar
      fsdax: output address in dax_iomap_pfn() and rename it · e28cd3e5
      Shiyang Ruan authored
      Add an address output to dax_iomap_pfn() in order to perform a memcpy() in
      the CoW case.  Since this function now outputs both the address and the
      pfn, rename it to dax_iomap_direct_access().
      
      [ruansy.fnst@fujitsu.com: initialize `rc', per Dan]
        Link: https://lore.kernel.org/linux-fsdevel/Yp8FUZnO64Qvyx5G@kili/
        Link: https://lkml.kernel.org/r/20220607143837.161174-1-ruansy.fnst@fujitsu.com
      Link: https://lkml.kernel.org/r/20220603053738.1218681-9-ruansy.fnst@fujitsu.com
      Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.wiliams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e28cd3e5
    • Shiyang Ruan's avatar
      fsdax: set a CoW flag when associate reflink mappings · 6061b69b
      Shiyang Ruan authored
      Introduce a PAGE_MAPPING_DAX_COW flag to support association with CoW file
      mappings.  In this case, since dax-rmap has already taken responsibility
      for looking up the shared files for a given dax page, page->mapping is no
      longer used for rmap but for marking that this dax page is shared.  To
      make sure disassociation works correctly, use page->index as a refcount,
      and reset page->mapping to its initial state when page->index drops to 0.
      
      With this new flag, the normal case and the CoW case can be
      distinguished, and the warning is kept in the normal case.
      
      Link: https://lkml.kernel.org/r/20220603053738.1218681-8-ruansy.fnst@fujitsu.com
      Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.wiliams@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Ritesh Harjani <riteshh@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6061b69b
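The bookkeeping described in the commit above, with page->mapping acting as a "shared" marker and page->index as a refcount, can be modelled roughly as follows. This is a simplified userspace sketch and not the kernel code: `Page`, `associate`, `disassociate`, and the `DAX_COW` sentinel are hypothetical stand-ins, and raising an exception stands in for the kernel warning in the normal case.

```python
from dataclasses import dataclass

# Sentinel standing in for the PAGE_MAPPING_DAX_COW flag value.
DAX_COW = object()


@dataclass
class Page:
    mapping: object = None  # owning mapping, DAX_COW marker, or None
    index: int = 0          # file index, or refcount when shared


def associate(page: Page, mapping: object, index: int, shared: bool) -> None:
    """Attach a page to a mapping; shared (CoW) pages are refcounted."""
    if shared:
        if page.mapping is not DAX_COW:
            # First sharer: count an existing regular owner, if any.
            page.index = 1 if page.mapping is not None else 0
            page.mapping = DAX_COW
        page.index += 1
        return
    # Normal case: a page must not already belong to a mapping.
    if page.mapping is not None:
        raise RuntimeError("page already associated (normal-case warning)")
    page.mapping = mapping
    page.index = index


def disassociate(page: Page) -> None:
    """Drop one association; reset the page when the last one goes away."""
    if page.mapping is DAX_COW:
        page.index -= 1
        if page.index > 0:
            return  # still shared, keep the CoW marker
    page.mapping = None
    page.index = 0
```

In this model a page shared by two files keeps its marker after the first `disassociate()` call and returns to the initial state only after the second.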