1. 19 Jul, 2019 40 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · b5d72dda
      Linus Torvalds authored
      Pull xen updates from Juergen Gross:
       "Fixes and features:
      
         - A series to introduce a common command line parameter for disabling
           paravirtual extensions when running as a guest in virtualized
           environment
      
         - A fix for int3 handling in Xen pv guests
      
         - Removal of the Xen-specific tmem driver as support of tmem in Xen
           has been dropped (and it was experimental only)
      
         - A security fix for running as Xen dom0 (XSA-300)
      
         - A fix for IRQ handling when offlining cpus in Xen guests
      
         - Some small cleanups"
      
      * tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen: let alloc_xenballooned_pages() fail if not enough memory free
        xen/pv: Fix a boot up hang revealed by int3 self test
        x86/xen: Add "nopv" support for HVM guest
        x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable
        xen: Map "xen_nopv" parameter to "nopv" and mark it obsolete
        x86: Add "nopv" parameter to disable PV extensions
        x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __init
        xen: remove tmem driver
        Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized"
        xen/events: fix binding user event channels to cpus
      b5d72dda
    • Linus Torvalds's avatar
      Merge tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 26473f83
      Linus Torvalds authored
      Pull iomap split/cleanup from Darrick Wong:
       "As promised, here's the second part of the iomap merge for 5.3, in
        which we break up iomap.c into smaller files grouped by functional
        area so that it'll be easier in the long run to maintain cohesiveness
        of code units and to review incoming patches. There are no functional
        changes and fs/iomap.c split cleanly.
      
        Summary:
      
         - Regroup the fs/iomap.c code by major functional area so that we can
           start development for 5.4 from a more stable base"
      
      * tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: move internal declarations into fs/iomap/
        iomap: move the main iteration code into a separate file
        iomap: move the buffered IO code into a separate file
        iomap: move the direct IO code into a separate file
        iomap: move the SEEK_HOLE code into a separate file
        iomap: move the file mapping reporting code into a separate file
        iomap: move the swapfile code into a separate file
        iomap: start moving code to fs/iomap/
      26473f83
    • Linus Torvalds's avatar
      Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 4f5ed131
      Linus Torvalds authored
      Pull misc vfs updates from Al Viro:
       "Assorted stuff"
      
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        perf_event_get(): don't bother with fget_raw()
        vfs: update d_make_root() description
      4f5ed131
    • Linus Torvalds's avatar
      Merge branch 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d2fbf4b6
      Linus Torvalds authored
      Pull adfs updates from Al Viro:
       "More ADFS patches from Russell King"
      
      * 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs/adfs: add time stamp and file type helpers
        fs/adfs: super: limit idlen according to directory type
        fs/adfs: super: fix use-after-free bug
        fs/adfs: super: safely update options on remount
        fs/adfs: super: correct superblock flags
        fs/adfs: clean up indirect disc addresses and fragment IDs
        fs/adfs: clean up error message printing
        fs/adfs: use %pV for error messages
        fs/adfs: use format_version from disc_record
        fs/adfs: add helper to get filesystem size
        fs/adfs: add helper to get discrecord from map
        fs/adfs: correct disc record structure
      d2fbf4b6
    • Linus Torvalds's avatar
      Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 933a90bf
      Linus Torvalds authored
      Pull vfs mount updates from Al Viro:
       "The first part of mount updates.
      
        Convert filesystems to use the new mount API"
      
      * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
        mnt_init(): call shmem_init() unconditionally
        constify ksys_mount() string arguments
        don't bother with registering rootfs
        init_rootfs(): don't bother with init_ramfs_fs()
        vfs: Convert smackfs to use the new mount API
        vfs: Convert selinuxfs to use the new mount API
        vfs: Convert securityfs to use the new mount API
        vfs: Convert apparmorfs to use the new mount API
        vfs: Convert openpromfs to use the new mount API
        vfs: Convert xenfs to use the new mount API
        vfs: Convert gadgetfs to use the new mount API
        vfs: Convert oprofilefs to use the new mount API
        vfs: Convert ibmasmfs to use the new mount API
        vfs: Convert qib_fs/ipathfs to use the new mount API
        vfs: Convert efivarfs to use the new mount API
        vfs: Convert configfs to use the new mount API
        vfs: Convert binfmt_misc to use the new mount API
        convenience helper: get_tree_single()
        convenience helper get_tree_nodev()
        vfs: Kill sget_userns()
        ...
      933a90bf
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5f4fc6d4
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix AF_XDP cq entry leak, from Ilya Maximets.
      
       2) Fix handling of PHY power-down on RTL8411B, from Heiner Kallweit.
      
       3) Add some new PCI IDs to iwlwifi, from Ihab Zhaika.
      
       4) Fix handling of neigh timers wrt. entries added by userspace, from
          Lorenzo Bianconi.
      
       5) Various cases of missing of_node_put(), from Nishka Dasgupta.
      
       6) The new NET_ACT_CT needs to depend upon NF_NAT, from Yue Haibing.
      
       7) Various RDS layer fixes, from Gerd Rausch.
      
       8) Fix some more fallout from TCQ_F_CAN_BYPASS generalization, from
          Cong Wang.
      
       9) Fix FIB source validation checks over loopback, also from Cong Wang.
      
      10) Use promisc for unsupported number of filters, from Justin Chen.
      
      11) Missing sibling route unlink on failure in ipv6, from Ido Schimmel.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
        tcp: fix tcp_set_congestion_control() use from bpf hook
        ag71xx: fix return value check in ag71xx_probe()
        ag71xx: fix error return code in ag71xx_probe()
        usb: qmi_wwan: add D-Link DWM-222 A2 device ID
        bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.
        net: dsa: sja1105: Fix missing unlock on error in sk_buff()
        gve: replace kfree with kvfree
        selftests/bpf: fix test_xdp_noinline on s390
        selftests/bpf: fix "valid read map access into a read-only array 1" on s390
        net/mlx5: Replace kfree with kvfree
        MAINTAINERS: update netsec driver
        ipv6: Unlink sibling route in case of failure
        liquidio: Replace vmalloc + memset with vzalloc
        udp: Fix typo in net/ipv4/udp.c
        net: bcmgenet: use promisc for unsupported filters
        ipv6: rt6_check should return NULL if 'from' is NULL
        tipc: initialize 'validated' field of received packets
        selftests: add a test case for rp_filter
        fib: relax source validation check for loopback packets
        mlxsw: spectrum: Do not process learned records with a dummy FID
        ...
      5f4fc6d4
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 249be851
      Linus Torvalds authored
      Merge yet more updates from Andrew Morton:
       "The rest of MM and a kernel-wide procfs cleanup.
      
        Summary of the more significant patches:
      
         - Patch series "mm/memory_hotplug: Factor out memory block
           devicehandling", v3. David Hildenbrand.
      
           Some spring-cleaning of the memory hotplug code, notably in
           drivers/base/memory.c
      
         - "mm: thp: fix false negative of shmem vma's THP eligibility". Yang
           Shi.
      
           Fix /proc/pid/smaps output for THP pages used in shmem.
      
         - "resource: fix locking in find_next_iomem_res()" + 1. Nadav Amit.
      
           Bugfix and speedup for kernel/resource.c
      
         - Patch series "mm: Further memory block device cleanups", David
           Hildenbrand.
      
           More spring-cleaning of the memory hotplug code.
      
         - Patch series "mm: Sub-section memory hotplug support". Dan
           Williams.
      
           Generalise the memory hotplug code so that pmem can use it more
           completely. Then remove the hacks from the libnvdimm code which
           were there to work around the memory-hotplug code's constraints.
      
         - "proc/sysctl: add shared variables for range check", Matteo Croce.
      
           We have about 250 instances of
      
                int zero;
                ...
                        .extra1 = &zero,
      
           in the tree. This is a tree-wide sweep to make all those private
           "zero"s and "one"s use global variables.
      
           Alas, it isn't practical to make those two global integers const"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits)
        proc/sysctl: add shared variables for range check
        mm: migrate: remove unused mode argument
        mm/sparsemem: cleanup 'section number' data types
        libnvdimm/pfn: stop padding pmem namespaces to section alignment
        libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
        mm/devm_memremap_pages: enable sub-section remap
        mm: document ZONE_DEVICE memory-model implications
        mm/sparsemem: support sub-section hotplug
        mm/sparsemem: prepare for sub-section ranges
        mm: kill is_dev_zone() helper
        mm/hotplug: kill is_dev_zone() usage in __remove_pages()
        mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()
        mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal
        mm/sparsemem: add helpers track active portions of a section at boot
        mm/sparsemem: introduce a SECTION_IS_EARLY flag
        mm/sparsemem: introduce struct mem_section_usage
        drivers/base/memory.c: get rid of find_memory_block_hinted()
        mm/memory_hotplug: move and simplify walk_memory_blocks()
        mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns
        mm: make register_mem_sect_under_node() static
        ...
      249be851
    • Eric Dumazet's avatar
      tcp: fix tcp_set_congestion_control() use from bpf hook · 8d650cde
      Eric Dumazet authored
      Neal reported incorrect use of ns_capable() from bpf hook.
      
      bpf_setsockopt(...TCP_CONGESTION...)
        -> tcp_set_congestion_control()
         -> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)
          -> ns_capable_common()
           -> current_cred()
            -> rcu_dereference_protected(current->cred, 1)
      
      Accessing 'current' in bpf context makes no sense, since packets
      are processed from softirq context.
      
      As Neal stated : The capability check in tcp_set_congestion_control()
      was written assuming a system call context, and then was reused from
      a BPF call site.
      
      The fix is to add a new parameter to tcp_set_congestion_control(),
      so that the ns_capable() call is only performed under the right
      context.
      
      Fixes: 91b5b21c ("bpf: Add support for changing congestion control")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Reported-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8d650cde
    • Wei Yongjun's avatar
      ag71xx: fix return value check in ag71xx_probe() · 269b7c5f
      Wei Yongjun authored
      In case of error, the function of_get_mac_address() returns ERR_PTR()
      and never returns NULL. The NULL test in the return value check should
      be replaced with IS_ERR().
      
      Fixes: d51b6ce4 ("net: ethernet: add ag71xx driver")
      Signed-off-by: default avatarWei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: default avatarOleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      269b7c5f
    • Wei Yongjun's avatar
      ag71xx: fix error return code in ag71xx_probe() · 6f5fa8d2
      Wei Yongjun authored
      Fix to return error code -ENOMEM from the dmam_alloc_coherent() error
      handling case instead of 0, as done elsewhere in this function.
      
      Fixes: d51b6ce4 ("net: ethernet: add ag71xx driver")
      Signed-off-by: default avatarWei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: default avatarOleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6f5fa8d2
    • Matteo Croce's avatar
      proc/sysctl: add shared variables for range check · eec4844f
      Matteo Croce authored
      In the sysctl code the proc_dointvec_minmax() function is often used to
      validate the user supplied value between an allowed range.  This
      function uses the extra1 and extra2 members from struct ctl_table as
      minimum and maximum allowed value.
      
      On sysctl handler declaration, in every source file there are some
      readonly variables containing just an integer which address is assigned
      to the extra1 and extra2 members, so the sysctl range is enforced.
      
      The special values 0, 1 and INT_MAX are very often used as range
      boundary, leading duplication of variables like zero=0, one=1,
      int_max=INT_MAX in different source files:
      
          $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
          248
      
      Add a const int array containing the most commonly used values, some
      macros to refer more easily to the correct array member, and use them
      instead of creating a local one for every object file.
      
      This is the bloat-o-meter output comparing the old and new binary
      compiled with the default Fedora config:
      
          # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
          add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
          Data                                         old     new   delta
          sysctl_vals                                    -      12     +12
          __kstrtab_sysctl_vals                          -      12     +12
          max                                           14      10      -4
          int_max                                       16       -     -16
          one                                           68       -     -68
          zero                                         128      28    -100
          Total: Before=20583249, After=20583085, chg -0.00%
      
      [mcroce@redhat.com: tipc: remove two unused variables]
        Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
      [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
      [arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
        Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
      [akpm@linux-foundation.org: fix fs/eventpoll.c]
      Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.comSigned-off-by: default avatarMatteo Croce <mcroce@redhat.com>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarAaron Tomlin <atomlin@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      eec4844f
    • Keith Busch's avatar
      mm: migrate: remove unused mode argument · 37109694
      Keith Busch authored
      migrate_page_move_mapping() doesn't use the mode argument.  Remove it
      and update callers accordingly.
      
      Link: http://lkml.kernel.org/r/20190508210301.8472-1-keith.busch@intel.comSigned-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarZi Yan <ziy@nvidia.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      37109694
    • Dan Williams's avatar
      mm/sparsemem: cleanup 'section number' data types · 9a845030
      Dan Williams authored
      David points out that there is a mixture of 'int' and 'unsigned long'
      usage for section number data types.  Update the memory hotplug path to
      use 'unsigned long' consistently for section numbers.
      
      [akpm@linux-foundation.org: fix printk format]
      Link: http://lkml.kernel.org/r/156107543656.1329419.11505835211949439815.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reported-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a845030
    • Dan Williams's avatar
      libnvdimm/pfn: stop padding pmem namespaces to section alignment · a3619190
      Dan Williams authored
      Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE
      memory, we no longer need to add padding at pfn/dax device creation
      time.  The kernel will still honor padding established by older kernels.
      
      Link: http://lkml.kernel.org/r/156092356588.979959.6793371748950931916.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reported-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a3619190
    • Dan Williams's avatar
      libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields · 7e3e888d
      Dan Williams authored
      At namespace creation time there is the potential for the "expected to
      be zero" fields of a 'pfn' info-block to be filled with indeterminate
      data.  While the kernel buffer is zeroed on allocation it is immediately
      overwritten by nd_pfn_validate() filling it with the current contents of
      the on-media info-block location.  For fields like, 'flags' and the
      'padding' it potentially means that future implementations can not rely on
      those fields being zero.
      
      In preparation to stop using the 'start_pad' and 'end_trunc' fields for
      section alignment, arrange for fields that are not explicitly
      initialized to be guaranteed zero.  Bump the minor version to indicate
      it is safe to assume the 'padding' and 'flags' are zero.  Otherwise,
      this corruption is expected to benign since all other critical fields
      are explicitly initialized.
      
      Note The cc: stable is about spreading this new policy to as many
      kernels as possible not fixing an issue in those kernels.  It is not
      until the change titled "libnvdimm/pfn: Stop padding pmem namespaces to
      section alignment" where this improper initialization becomes a problem.
      So if someone decides to backport "libnvdimm/pfn: Stop padding pmem
      namespaces to section alignment" (which is not tagged for stable), make
      sure this pre-requisite is flagged.
      
      Link: http://lkml.kernel.org/r/156092356065.979959.6681003754765958296.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: 32ab0a3f ("libnvdimm, pmem: 'struct page' for pmem")
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: <stable@vger.kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7e3e888d
    • Dan Williams's avatar
      mm/devm_memremap_pages: enable sub-section remap · 7cc7867f
      Dan Williams authored
      Teach devm_memremap_pages() about the new sub-section capabilities of
      arch_{add,remove}_memory().  Effectively, just replace all usage of
      align_start, align_end, and align_size with res->start, res->end, and
      resource_size(res).  The existing sanity check will still make sure that
      the two separate remap attempts do not collide within a sub-section (2MB
      on x86).
      
      Link: http://lkml.kernel.org/r/156092355542.979959.10060071713397030576.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7cc7867f
    • Dan Williams's avatar
      mm: document ZONE_DEVICE memory-model implications · a0653406
      Dan Williams authored
      Explain the general mechanisms of 'ZONE_DEVICE' pages and list the users
      of 'devm_memremap_pages()'.
      
      [dan.j.williams@intel.com: update ZONE_DEVICE memory model documentation]
        Link: http://lkml.kernel.org/r/156109575458.1409767.1885676287099277666.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: http://lkml.kernel.org/r/156092354985.979959.15763234410543451710.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reported-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a0653406
    • Dan Williams's avatar
      mm/sparsemem: support sub-section hotplug · ba72b4c8
      Dan Williams authored
      The libnvdimm sub-system has suffered a series of hacks and broken
      workarounds for the memory-hotplug implementation's awkward
      section-aligned (128MB) granularity.
      
      For example the following backtrace is emitted when attempting
      arch_add_memory() with physical address ranges that intersect 'System
      RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:
      
          # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory
          100000000-1ffffffff : System RAM
          200000000-303ffffff : Persistent Memory (legacy)
          304000000-43fffffff : System RAM
          440000000-23ffffffff : Persistent Memory
          2400000000-43bfffffff : Persistent Memory
            2400000000-43bfffffff : namespace2.0
      
          WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60
          [..]
          RIP: 0010:add_pages+0x5c/0x60
          [..]
          Call Trace:
           devm_memremap_pages+0x460/0x6e0
           pmem_attach_disk+0x29e/0x680 [nd_pmem]
           ? nd_dax_probe+0xfc/0x120 [libnvdimm]
           nvdimm_bus_probe+0x66/0x160 [libnvdimm]
      
      It was discovered that the problem goes beyond RAM vs PMEM collisions as
      some platform produce PMEM vs PMEM collisions within a given section.
      The libnvdimm workaround for that case revealed that the libnvdimm
      section-alignment-padding implementation has been broken for a long
      while.
      
      A fix for that long-standing breakage introduces as many problems as it
      solves as it would require a backward-incompatible change to the
      namespace metadata interpretation.  Instead of that dubious route [1],
      address the root problem in the memory-hotplug implementation.
      
      Note that EEXIST is no longer treated as success as that is how
      sparse_add_section() reports subsection collisions, it was also obviated
      by recent changes to perform the request_region() for 'System RAM'
      before arch_add_memory() in the add_memory() sequence.
      
      [1] https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
      
      [osalvador@suse.de: fix deactivate_section for early sections]
        Link: http://lkml.kernel.org/r/20190715081549.32577-2-osalvador@suse.de
      Link: http://lkml.kernel.org/r/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarOscar Salvador <osalvador@suse.de>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ba72b4c8
    • Dan Williams's avatar
      mm/sparsemem: prepare for sub-section ranges · 7ea62160
      Dan Williams authored
      Prepare the memory hot-{add,remove} paths for handling sub-section
      ranges by plumbing the starting page frame and number of pages being
      handled through arch_{add,remove}_memory() to
      sparse_{add,remove}_one_section().
      
      This is simply plumbing, small cleanups, and some identifier renames.
      No intended functional changes.
      
      Link: http://lkml.kernel.org/r/156092353780.979959.9713046515562743194.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7ea62160
    • Dan Williams's avatar
      mm: kill is_dev_zone() helper · 46d945ae
      Dan Williams authored
      Given there are no more usages of is_dev_zone() outside of 'ifdef
      CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.
      
      Link: http://lkml.kernel.org/r/156092353211.979959.1489004866360828964.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Acked-by: default avatarDavid Hildenbrand <david@redhat.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      46d945ae
    • Dan Williams's avatar
      mm/hotplug: kill is_dev_zone() usage in __remove_pages() · 96da4350
      Dan Williams authored
      The zone type check was a leftover from the cleanup that plumbed altmap
      through the memory hotplug path, i.e.  commit da024512 "mm: pass the
      vmem_altmap to arch_remove_memory and __remove_pages".
      
      Link: http://lkml.kernel.org/r/156092352642.979959.6664333788149363039.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      96da4350
    • Dan Williams's avatar
      mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() · e9c0a3f0
      Dan Williams authored
      Allow sub-section sized ranges to be added to the memmap.
      
      populate_section_memmap() takes an explict pfn range rather than
      assuming a full section, and those parameters are plumbed all the way
      through to vmmemap_populate().  There should be no sub-section usage in
      current deployments.  New warnings are added to clarify which memmap
      allocation paths are sub-section capable.
      
      Link: http://lkml.kernel.org/r/156092352058.979959.6551283472062305149.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e9c0a3f0
    • Dan Williams's avatar
      mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal · 49ba3c6b
      Dan Williams authored
      Sub-section hotplug support reduces the unit of operation of hotplug
      from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units
      (PAGES_PER_SUBSECTION).  Teach shrink_{zone,pgdat}_span() to consider
      PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not
      valid_section(), can toggle.
      
      [osalvador@suse.de: fix shrink_{zone,node}_span]
        Link: http://lkml.kernel.org/r/20190717090725.23618-3-osalvador@suse.de
      Link: http://lkml.kernel.org/r/156092351496.979959.12703722803097017492.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarOscar Salvador <osalvador@suse.de>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      49ba3c6b
    • Dan Williams's avatar
      mm/sparsemem: add helpers track active portions of a section at boot · f46edbd1
      Dan Williams authored
      Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
      sub-section active bitmask, each bit representing a PMD_SIZE span of the
      architecture's memory hotplug section size.
      
      The implications of a partially populated section is that pfn_valid()
      needs to go beyond a valid_section() check and either determine that the
      section is an "early section", or read the sub-section active ranges
      from the bitmask.  The expectation is that the bitmask (subsection_map)
      fits in the same cacheline as the valid_section() / early_section()
      data, so the incremental performance overhead to pfn_valid() should be
      negligible.
      
      The rationale for using early_section() to short-ciruit the
      subsection_map check is that there are legacy code paths that use
      pfn_valid() at section granularity before validating the pfn against
      pgdat data.  So, the early_section() check allows those traditional
      assumptions to persist while also permitting subsection_map to tell the
      truth for purposes of populating the unused portions of early sections
      with PMEM and other ZONE_DEVICE mappings.
      
      Link: http://lkml.kernel.org/r/156092350874.979959.18185938451405518285.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reported-by: default avatarQian Cai <cai@lca.pw>
      Tested-by: default avatarJane Chu <jane.chu@oracle.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f46edbd1
    • Dan Williams's avatar
      mm/sparsemem: introduce a SECTION_IS_EARLY flag · 326e1b8f
      Dan Williams authored
      In preparation for sub-section hotplug, track whether a given section
      was created during early memory initialization, or later via memory
      hotplug.  This distinction is needed to maintain the coarse expectation
      that pfn_valid() returns true for any pfn within a given section even if
      that section has pages that are reserved from the page allocator.
      
      For example one of the of goals of subsection hotplug is to support
      cases where the system physical memory layout collides System RAM and
      PMEM within a section.  Several pfn_valid() users expect to just check
      if a section is valid, but they are not careful to check if the given
      pfn is within a "System RAM" boundary and instead expect pgdat
      information to further validate the pfn.
      
      Rather than unwind those paths to make their pfn_valid() queries more
      precise a follow on patch uses the SECTION_IS_EARLY flag to maintain the
      traditional expectation that pfn_valid() returns true for all early
      sections.
      
      Link: https://lore.kernel.org/lkml/1560366952-10660-1-git-send-email-cai@lca.pw/
      Link: http://lkml.kernel.org/r/156092350358.979959.5817209875548072819.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reported-by: default avatarQian Cai <cai@lca.pw>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      326e1b8f
    • Dan Williams's avatar
      mm/sparsemem: introduce struct mem_section_usage · f1eca35a
      Dan Williams authored
      Patch series "mm: Sub-section memory hotplug support", v10.
      
      The memory hotplug section is an arbitrary / convenient unit for memory
      hotplug.  'Section-size' units have bled into the user interface
      ('memblock' sysfs) and can not be changed without breaking existing
      userspace.  The section-size constraint, while mostly benign for typical
      memory hotplug, has and continues to wreak havoc with 'device-memory'
      use cases, persistent memory (pmem) in particular.  Recall that pmem
      uses devm_memremap_pages(), and subsequently arch_add_memory(), to
      allocate a 'struct page' memmap for pmem.  However, it does not use the
      'bottom half' of memory hotplug, i.e.  never marks pmem pages online and
      never exposes the userspace memblock interface for pmem.  This leaves an
      opening to redress the section-size constraint.
      
      To date, the libnvdimm subsystem has attempted to inject padding to
      satisfy the internal constraints of arch_add_memory().  Beyond
      complicating the code, leading to bugs [2], wasting memory, and limiting
      configuration flexibility, the padding hack is broken when the platform
      changes this physical memory alignment of pmem from one boot to the
      next.  Device failure (intermittent or permanent) and physical
      reconfiguration are events that can cause the platform firmware to
      change the physical placement of pmem on a subsequent boot, and device
      failure is an everyday event in a data-center.
      
      It turns out that sections are only a hard requirement of the
      user-facing interface for memory hotplug and with a bit more
      infrastructure sub-section arch_add_memory() support can be added for
      kernel internal usages like devm_memremap_pages().  Here is an analysis
      of the current design assumptions in the current code and how they are
      addressed in the new implementation:
      
      Current design assumptions:
      
       - Sections that describe boot memory (early sections) are never
         unplugged / removed.
      
       - pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y, case devolves to a
         valid_section() check
      
       - __add_pages() and helper routines assume all operations occur in
         PAGES_PER_SECTION units.
      
       - The memblock sysfs interface only comprehends full sections
      
      New design assumptions:
      
       - Sections are instrumented with a sub-section bitmask to track (on
         x86) individual 2MB sub-divisions of a 128MB section.
      
       - Partially populated early sections can be extended with additional
         sub-sections, and those sub-sections can be removed with
         arch_remove_memory(). With this in place we no longer lose usable
         memory capacity to padding.
      
       - pfn_valid() is updated to look deeper than valid_section() to also
         check the active-sub-section mask. This indication is in the same
         cacheline as the valid_section() so the performance impact is
         expected to be negligible. So far the lkp robot has not reported any
         regressions.
      
       - Outside of the core vmemmap population routines which are replaced,
         other helper routines like shrink_{zone,pgdat}_span() are updated to
         handle the smaller granularity. Core memory hotplug routines that
         deal with online memory are not touched.
      
       - The existing memblock sysfs user api guarantees / assumptions are not
         touched since this capability is limited to !online
         !memblock-sysfs-accessible sections.
      
      Meanwhile the issue reports continue to roll in from users that do not
      understand when and how the 128MB constraint will bite them.  The current
      implementation relied on being able to support at least one misaligned
      namespace, but that immediately falls over on any moderately complex
      namespace creation attempt.  Beyond the initial problem of 'System RAM'
      colliding with pmem, and the unsolvable problem of physical alignment
      changes, Linux is now being exposed to platforms that collide pmem ranges
      with other pmem ranges by default [3].  In short, devm_memremap_pages()
      has pushed the venerable section-size constraint past the breaking point,
      and the simplicity of section-aligned arch_add_memory() is no longer
      tenable.
      
      These patches are exposed to the kbuild robot on a subsection-v10 branch
      [4], and a preview of the unit test for this functionality is available
      on the 'subsection-pending' branch of ndctl [5].
      
      [2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
      [3]: https://github.com/pmem/ndctl/issues/76
      [4]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=subsection-v10
      [5]: https://github.com/pmem/ndctl/commit/7c59b4867e1c
      
      This patch (of 13):
      
      Towards enabling memory hotplug to track partial population of a section,
      introduce 'struct mem_section_usage'.
      
      A pointer to a 'struct mem_section_usage' instance replaces the existing
      pointer to a 'pageblock_flags' bitmap.  Effectively it adds one more
      'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to house
      a new 'subsection_map' bitmap.  The new bitmap enables the memory
      hot{plug,remove} implementation to act on incremental sub-divisions of a
      section.
      
      SUBSECTION_SHIFT is defined as global constant instead of per-architecture
      value like SECTION_SIZE_BITS in order to allow cross-arch compatibility of
      subsection users.  Specifically a common subsection size allows for the
      possibility that persistent memory namespace configurations be made
      compatible across architectures.
      
      The primary motivation for this functionality is to support platforms that
      mix "System RAM" and "Persistent Memory" within a single section, or
      multiple PMEM ranges with different mapping lifetimes within a single
      section.  The section restriction for hotplug has caused an ongoing saga
      of hacks and bugs for devm_memremap_pages() users.
      
      Beyond the fixups to teach existing paths how to retrieve the 'usemap'
      from a section, and updates to usemap allocation path, there are no
      expected behavior changes.
      
      Link: http://lkml.kernel.org/r/156092349845.979959.73333291612799019.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Reviewed-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f1eca35a
    • David Hildenbrand's avatar
      drivers/base/memory.c: get rid of find_memory_block_hinted() · dd625285
      David Hildenbrand authored
      No longer needed, let's remove it.  Also, drop the "hint" parameter
      completely from "find_memory_block_by_id", as nobody needs it anymore.
      
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20190620183139.4352-7-david@redhat.com
      [david@redhat.com: handle zero-length walks]
        Link: http://lkml.kernel.org/r/1c2edc22-afd7-2211-c4c7-40e54e5007e8@redhat.com
      Link: http://lkml.kernel.org/r/20190614100114.311-7-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: default avatarQian Cai <cai@lca.pw>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dd625285
    • David Hildenbrand's avatar
      mm/memory_hotplug: move and simplify walk_memory_blocks() · ea884641
      David Hildenbrand authored
      Let's move walk_memory_blocks() to the place where memory block logic
      resides and simplify it.  While at it, add a type for the callback
      function.
      
      Link: http://lkml.kernel.org/r/20190614100114.311-6-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Qian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ea884641
    • David Hildenbrand's avatar
      mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns · fbcf73ce
      David Hildenbrand authored
      walk_memory_range() was once used to iterate over sections.  Now, it
      iterates over memory blocks.  Rename the function, fixup the
      documentation.
      
      Also, pass start+size instead of PFNs, which is what most callers
      already have at hand.  (we'll rework link_mem_sections() most probably
      soon)
      
      Follow-up patches will rework, simplify, and move walk_memory_blocks()
      to drivers/base/memory.c.
      
      Note: walk_memory_blocks() only works correctly right now if the
      start_pfn is aligned to a section start.  This is the case right now,
      but we'll generalize the function in a follow up patch so the semantics
      match the documentation.
      
      [akpm@linux-foundation.org: remove unused variable]
      Link: http://lkml.kernel.org/r/20190614100114.311-5-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Rashmica Gupta <rashmica.g@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fbcf73ce
    • David Hildenbrand's avatar
      mm: make register_mem_sect_under_node() static · 8d595c4c
      David Hildenbrand authored
      It is only used internally.
      
      Link: http://lkml.kernel.org/r/20190614100114.311-4-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8d595c4c
    • David Hildenbrand's avatar
      drivers/base/memory: use "unsigned long" for block ids · 90ec010f
      David Hildenbrand authored
      Block ids are just shifted section numbers, so let's also use "unsigned
      long" for them, too.
      
      Link: http://lkml.kernel.org/r/20190614100114.311-3-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      90ec010f
    • David Hildenbrand's avatar
      mm: section numbers use the type "unsigned long" · 2491f0a2
      David Hildenbrand authored
      Patch series "mm: Further memory block device cleanups", v1.
      
      Some further cleanups around memory block devices.  Especially, clean up
      and simplify walk_memory_range().  Including some other minor cleanups.
      
      This patch (of 6):
      
      We are using a mixture of "int" and "unsigned long".  Let's make this
      consistent by using "unsigned long" everywhere.  We'll do the same with
      memory block ids next.
      
      While at it, turn the "unsigned long i" in removable_show() into an int
      - sections_per_block is an int.
      
      [akpm@linux-foundation.org: s/unsigned long i/unsigned long nr/]
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20190620183139.4352-2-david@redhat.com
      Link: http://lkml.kernel.org/r/20190614100114.311-2-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2491f0a2
    • Nadav Amit's avatar
      resource: avoid unnecessary lookups in find_next_iomem_res() · 75639875
      Nadav Amit authored
      find_next_iomem_res() shows up to be a source for overhead in dax
      benchmarks.
      
      Improve performance by not considering children of the tree if the top
      level does not match.  Since the range of the parents should include the
      range of the children such check is redundant.
      
      Running sysbench on dax (pmem emulation, with write_cache disabled):
      
        sysbench fileio --file-total-size=3G --file-test-mode=rndwr \
         --file-io-mode=mmap --threads=4 --file-fsync-mode=fdatasync run
      
      Provides the following results:
      
      		events (avg/stddev)
      		-------------------
        5.2-rc3:	1247669.0000/16075.39
        w/patch:	1286320.5000/16402.72	(+3%)
      
      Link: http://lkml.kernel.org/r/20190613045903.4922-3-namit@vmware.comSigned-off-by: default avatarNadav Amit <namit@vmware.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      75639875
    • Nadav Amit's avatar
      resource: fix locking in find_next_iomem_res() · 49f17c26
      Nadav Amit authored
      Since resources can be removed, locking should ensure that the resource
      is not removed while accessing it.  However, find_next_iomem_res() does
      not hold the lock while copying the data of the resource.
      
      Keep holding the lock while the data is copied.  While at it, change the
      return value to a more informative value.  It is disregarded by the
      callers.
      
      [akpm@linux-foundation.org: fix find_next_iomem_res() documentation]
      Link: http://lkml.kernel.org/r/20190613045903.4922-2-namit@vmware.com
      Fixes: ff3cc952 ("resource: Add remove_resource interface")
      Signed-off-by: default avatarNadav Amit <namit@vmware.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      49f17c26
    • Yang Shi's avatar
      mm: thp: fix false negative of shmem vma's THP eligibility · c0630669
      Yang Shi authored
      Commit 7635d9cb ("mm, thp, proc: report THP eligibility for each
      vma") introduced THPeligible bit for processes' smaps.  But, when
      checking the eligibility for shmem vma, __transparent_hugepage_enabled()
      is called to override the result from shmem_huge_enabled().  It may
      result in the anonymous vma's THP flag override shmem's.  For example,
      running a simple test which create THP for shmem, but with anonymous THP
      disabled, when reading the process's smaps, it may show:
      
        7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
        Size:               4096 kB
        ...
        [snip]
        ...
        ShmemPmdMapped:     4096 kB
        ...
        [snip]
        ...
        THPeligible:    0
      
      And, /proc/meminfo does show THP allocated and PMD mapped too:
      
        ShmemHugePages:     4096 kB
        ShmemPmdMapped:     4096 kB
      
      This doesn't make too much sense.  The shmem objects should be treated
      separately from anonymous THP.  Calling shmem_huge_enabled() with
      checking MMF_DISABLE_THP sounds good enough.  And, we could skip stack
      and dax vma check since we already checked if the vma is shmem already.
      
      Also check if vma is suitable for THP by calling
      transhuge_vma_suitable().
      
      And minor fix to smaps output format and documentation.
      
      Link: http://lkml.kernel.org/r/1560401041-32207-3-git-send-email-yang.shi@linux.alibaba.com
      Fixes: 7635d9cb ("mm, thp, proc: report THP eligibility for each vma")
      Signed-off-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c0630669
    • Yang Shi's avatar
      mm: thp: make transhuge_vma_suitable available for anonymous THP · 43675e6f
      Yang Shi authored
      transhuge_vma_suitable() was only available for shmem THP, but anonymous
      THP has the same check except pgoff check.  And, it will be used for THP
      eligible check in the later patch, so make it available for all kind of
      THPs.  This also helps reduce code duplication slightly.
      
      Since anonymous THP doesn't have to check pgoff, so make pgoff check
      shmem vma only.
      
      And regroup some functions in include/linux/mm.h to solve compile issue
      since transhuge_vma_suitable() needs call vma_is_anonymous() which was
      defined after huge_mm.h is included.
      
      [akpm@linux-foundation.org: fix typo]
      [yang.shi@linux.alibaba.com: v4]
        Link: http://lkml.kernel.org/r/1563400758-124759-2-git-send-email-yang.shi@linux.alibaba.com
      Link: http://lkml.kernel.org/r/1560401041-32207-2-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      43675e6f
    • Wei Yang's avatar
      mm/sparse.c: set section nid for hot-add memory · 26f26bed
      Wei Yang authored
      In case of NODE_NOT_IN_PAGE_FLAGS is set, we store section's node id in
      section_to_node_table[].  While for hot-add memory, this is missed.
      Without this information, page_to_nid() may not give the right node id.
      
      BTW, current online_pages works because it leverages nid in
      memory_block.  But the granularity of node id should be mem_section
      wide.
      
      Link: http://lkml.kernel.org/r/20190618005537.18878-1-richardw.yang@linux.intel.comSigned-off-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      26f26bed
    • David Hildenbrand's avatar
      mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_section · b9bf8d34
      David Hildenbrand authored
      The parameter is unused, so let's drop it.  Memory removal paths should
      never care about zones.  This is the job of memory offlining and will
      require more refactorings.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-12-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b9bf8d34
    • David Hildenbrand's avatar
      mm/memory_hotplug: make unregister_memory_block_under_nodes() never fail · a31b264c
      David Hildenbrand authored
      We really don't want anything during memory hotunplug to fail.  We
      always pass a valid memory block device, that check can go.  Avoid
      allocating memory and eventually failing.  As we are always called under
      lock, we can use a static piece of memory.  This avoids having to put
      the structure onto the stack, having to guess about the stack size of
      callers.
      
      Patch inspired by a patch from Oscar Salvador.
      
      In the future, there might be no need to iterate over nodes at all.
      mem->nid should tell us exactly what to remove.  Memory block devices
      with mixed nodes (added during boot) should properly fenced off and
      never removed.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-11-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a31b264c
    • David Hildenbrand's avatar
      mm/memory_hotplug: remove memory block devices before arch_remove_memory() · 4c4b7f9b
      David Hildenbrand authored
      Let's factor out removing of memory block devices, which is only
      necessary for memory added via add_memory() and friends that created
      memory block devices.  Remove the devices before calling
      arch_remove_memory().
      
      This finishes factoring out memory block device handling from
      arch_add_memory() and arch_remove_memory().
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-10-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4c4b7f9b