1. 18 Oct, 2020 24 commits
  2. 27 Sep, 2020 9 commits
    • Linus Torvalds's avatar
      Linux 5.9-rc7 · a1b8638b
      Linus Torvalds authored
      a1b8638b
    • Linus Torvalds's avatar
      Merge tag 'kbuild-fixes-v5.9-4' of... · 16bc1d54
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - ignore compiler stubs for PPC to fix builds
      
       - fix the usage of --target mentioned in the LLVM document
      
      * tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        Documentation/llvm: Fix clang target examples
        scripts/kallsyms: skip ppc compiler stub *.long_branch.* / *.plt_branch.*
      16bc1d54
    • Linus Torvalds's avatar
      Merge tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f8818559
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "Two fixes for the x86 interrupt code:
      
         - Unbreak the magic 'search the timer interrupt' logic in IO/APIC
           code which got wreckaged when the core interrupt code made the
           state tracking logic stricter.
      
           That caused the interrupt line to stay masked after switching from
           IO/APIC to PIC delivery mode, which obviously prevents interrupts
           from being delivered.
      
         - Make run_on_irqstack_code() typesafe. The function argument is a
           void pointer which is then cast to 'void (*fun)(void *).
      
           This breaks Control Flow Integrity checking in clang. Use proper
           helper functions for the three variants reuqired"
      
      * tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ioapic: Unbreak check_timer()
        x86/irq: Make run_on_irqstack_cond() typesafe
      f8818559
    • Linus Torvalds's avatar
      Merge tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ba25f057
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "A set of clocksource/clockevents updates:
      
         - Reset the TI/DM timer before enabling it instead of doing it the
           other way round.
      
         - Initialize the reload value for the GX6605s timer correctly so the
           hardware counter starts at 0 again after overrun.
      
         - Make error return value negative in the h8300 timer init function"
      
      * tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/timer-gx6605s: Fixup counter reload
        clocksource/drivers/timer-ti-dm: Do reset before enable
        clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init()
      ba25f057
    • Peter Xu's avatar
      mm/thp: Split huge pmds/puds if they're pinned when fork() · d042035e
      Peter Xu authored
      Pinned pages shouldn't be write-protected when fork() happens, because
      follow up copy-on-write on these pages could cause the pinned pages to
      be replaced by random newly allocated pages.
      
      For huge PMDs, we split the huge pmd if pinning is detected.  So that
      future handling will be done by the PTE level (with our latest changes,
      each of the small pages will be copied).  We can achieve this by let
      copy_huge_pmd() return -EAGAIN for pinned pages, so that we'll
      fallthrough in copy_pmd_range() and finally land the next
      copy_pte_range() call.
      
      Huge PUDs will be even more special - so far it does not support
      anonymous pages.  But it can actually be done the same as the huge PMDs
      even if the split huge PUDs means to erase the PUD entries.  It'll
      guarantee the follow up fault ins will remap the same pages in either
      parent/child later.
      
      This might not be the most efficient way, but it should be easy and
      clean enough.  It should be fine, since we're tackling with a very rare
      case just to make sure userspaces that pinned some thps will still work
      even without MADV_DONTFORK and after they fork()ed.
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d042035e
    • Peter Xu's avatar
      mm: Do early cow for pinned pages during fork() for ptes · 70e806e4
      Peter Xu authored
      This allows copy_pte_range() to do early cow if the pages were pinned on
      the source mm.
      
      Currently we don't have an accurate way to know whether a page is pinned
      or not.  The only thing we have is page_maybe_dma_pinned().  However
      that's good enough for now.  Especially, with the newly added
      mm->has_pinned flag to make sure we won't affect processes that never
      pinned any pages.
      
      It would be easier if we can do GFP_KERNEL allocation within
      copy_one_pte().  Unluckily, we can't because we're with the page table
      locks held for both the parent and child processes.  So the page
      allocation needs to be done outside copy_one_pte().
      
      Some trick is there in copy_present_pte(), majorly the wrprotect trick
      to block concurrent fast-gup.  Comments in the function should explain
      better in place.
      
      Oleg Nesterov reported a (probably harmless) bug during review that we
      didn't reset entry.val properly in copy_pte_range() so that potentially
      there's chance to call add_swap_count_continuation() multiple times on
      the same swp entry.  However that should be harmless since even if it
      happens, the same function (add_swap_count_continuation()) will return
      directly noticing that there're enough space for the swp counter.  So
      instead of a standalone stable patch, it is touched up in this patch
      directly.
      
      Link: https://lore.kernel.org/lkml/20200914143829.GA1424636@nvidia.com/Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      70e806e4
    • Peter Xu's avatar
      mm/fork: Pass new vma pointer into copy_page_range() · 7a4830c3
      Peter Xu authored
      This prepares for the future work to trigger early cow on pinned pages
      during fork().
      
      No functional change intended.
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7a4830c3
    • Peter Xu's avatar
      mm: Introduce mm_struct.has_pinned · 008cfe44
      Peter Xu authored
      (Commit message majorly collected from Jason Gunthorpe)
      
      Reduce the chance of false positive from page_maybe_dma_pinned() by
      keeping track if the mm_struct has ever been used with pin_user_pages().
      This allows cases that might drive up the page ref_count to avoid any
      penalty from handling dma_pinned pages.
      
      Future work is planned, to provide a more sophisticated solution, likely
      to turn it into a real counter.  For now, make it atomic_t but use it as
      a boolean for simplicity.
      Suggested-by: default avatarJason Gunthorpe <jgg@ziepe.ca>
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      008cfe44
    • Thomas Gleixner's avatar
      Merge tag 'timers-v5.9-rc4' of... · a7b6c0fe
      Thomas Gleixner authored
      Merge tag 'timers-v5.9-rc4' of https://git.linaro.org/people/daniel.lezcano/linux into timers/urgent
      
      Pull clocksource/clockevent fixes from Daniel Lezcano:
      
       - Fix wrong signed return value when checking of_iomap in the probe
         function for the h8300 timer (Tianjia Zhang)
      
       - Fix reset sequence when setting up the timer on the dm_timer (Tony
         Lindgren)
      
       - Fix counter reload when the interrupt fires on gx6605s (Guo Ren)
      a7b6c0fe
  3. 26 Sep, 2020 7 commits
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · a1bffa48
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three fixes: one in drivers (lpfc) and two for zoned block devices.
      
        The latter also impinges on the block layer but only to introduce a
        new block API for setting the zone model rather than fiddling with the
        queue directly in the zoned block driver"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: sd: sd_zbc: Fix ZBC disk initialization
        scsi: sd: sd_zbc: Fix handling of host-aware ZBC disks
        scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported
      a1bffa48
    • Linus Torvalds's avatar
      Merge tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block · 692495ba
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Two fixes for regressions in this cycle, and one that goes to 5.8
        stable:
      
         - fix leak of getname() retrieved filename
      
         - remove plug->nowait assignment, fixing a regression with btrfs
      
         - fix for async buffered retry"
      
      * tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block:
        io_uring: ensure async buffered read-retry is setup properly
        io_uring: don't unconditionally set plug->nowait = true
        io_uring: ensure open/openat2 name is cleaned on cancelation
      692495ba
    • Linus Torvalds's avatar
      Merge tag 'block-5.9-2020-09-25' of git://git.kernel.dk/linux-block · 9d2fbaef
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "NVMe pull request from Christoph, and removal of a dead define.
      
         - fix error during controller probe that cause double free irqs
           (Keith Busch)
      
         - FC connection establishment fix (James Smart)
      
         - properly handle completions for invalid tags (Xianting Tian)
      
         - pass the correct nsid to the command effects and supported log
           (Chaitanya Kulkarni)"
      
      * tag 'block-5.9-2020-09-25' of git://git.kernel.dk/linux-block:
        block: remove unused BLK_QC_T_EAGAIN flag
        nvme-core: don't use NVME_NSID_ALL for command effects and supported log
        nvme-fc: fail new connections to a deleted host or remote port
        nvme-pci: fix NULL req in completion handler
        nvme: return errors for hwmon init
      9d2fbaef
    • Linus Torvalds's avatar
      Merge tag 's390-5.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · eeddbe68
      Linus Torvalds authored
      Pull s390 fix from Vasily Gorbik:
       "Fix truncated ZCRYPT_PERDEV_REQCNT ioctl result. Copy entire reqcnt
        list"
      
      * tag 's390-5.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl
      eeddbe68
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 8fb1e910
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "9 patches.
      
        Subsystems affected by this patch series: mm (thp, memcg, gup,
        migration, memory-hotplug), lib, and x86"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: don't rely on system state to detect hot-plug operations
        mm: replace memmap_context by meminit_context
        arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback
        lib/memregion.c: include memregion.h
        lib/string.c: implement stpcpy
        mm/migrate: correct thp migration stats
        mm/gup: fix gup_fast with dynamic page table folding
        mm: memcontrol: fix missing suffix of workingset_restore
        mm, THP, swap: fix allocating cluster for swapfile by mistake
      8fb1e910
    • Minchan Kim's avatar
      mm: validate pmd after splitting · ce268425
      Minchan Kim authored
      syzbot reported the following KASAN splat:
      
        general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
        KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
        CPU: 1 PID: 6826 Comm: syz-executor142 Not tainted 5.9.0-rc4-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:__lock_acquire+0x84/0x2ae0 kernel/locking/lockdep.c:4296
        Code: ff df 8a 04 30 84 c0 0f 85 e3 16 00 00 83 3d 56 58 35 08 00 0f 84 0e 17 00 00 83 3d 25 c7 f5 07 00 74 2c 4c 89 e8 48 c1 e8 03 <80> 3c 30 00 74 12 4c 89 ef e8 3e d1 5a 00 48 be 00 00 00 00 00 fc
        RSP: 0018:ffffc90004b9f850 EFLAGS: 00010006
        Call Trace:
          lock_acquire+0x140/0x6f0 kernel/locking/lockdep.c:5006
          __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
          _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
          spin_lock include/linux/spinlock.h:354 [inline]
          madvise_cold_or_pageout_pte_range+0x52f/0x25c0 mm/madvise.c:389
          walk_pmd_range mm/pagewalk.c:89 [inline]
          walk_pud_range mm/pagewalk.c:160 [inline]
          walk_p4d_range mm/pagewalk.c:193 [inline]
          walk_pgd_range mm/pagewalk.c:229 [inline]
          __walk_page_range+0xe7b/0x1da0 mm/pagewalk.c:331
          walk_page_range+0x2c3/0x5c0 mm/pagewalk.c:427
          madvise_pageout_page_range mm/madvise.c:521 [inline]
          madvise_pageout mm/madvise.c:557 [inline]
          madvise_vma mm/madvise.c:946 [inline]
          do_madvise+0x12d0/0x2090 mm/madvise.c:1145
          __do_sys_madvise mm/madvise.c:1171 [inline]
          __se_sys_madvise mm/madvise.c:1169 [inline]
          __x64_sys_madvise+0x76/0x80 mm/madvise.c:1169
          do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The backing vma was shmem.
      
      In case of split page of file-backed THP, madvise zaps the pmd instead
      of remapping of sub-pages.  So we need to check pmd validity after
      split.
      
      Reported-by: syzbot+ecf80462cb7d5d552bc7@syzkaller.appspotmail.com
      Fixes: 1a4e58cc ("mm: introduce MADV_PAGEOUT")
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ce268425
    • Laurent Dufour's avatar
      mm: don't rely on system state to detect hot-plug operations · f85086f9
      Laurent Dufour authored
      In register_mem_sect_under_node() the system_state's value is checked to
      detect whether the call is made during boot time or during an hot-plug
      operation.  Unfortunately, that check against SYSTEM_BOOTING is wrong
      because regular memory is registered at SYSTEM_SCHEDULING state.  In
      addition, memory hot-plug operation can be triggered at this system
      state by the ACPI [1].  So checking against the system state is not
      enough.
      
      The consequence is that on system with interleaved node's ranges like this:
      
       Early memory node ranges
         node   1: [mem 0x0000000000000000-0x000000011fffffff]
         node   2: [mem 0x0000000120000000-0x000000014fffffff]
         node   1: [mem 0x0000000150000000-0x00000001ffffffff]
         node   0: [mem 0x0000000200000000-0x000000048fffffff]
         node   2: [mem 0x0000000490000000-0x00000007ffffffff]
      
      This can be seen on PowerPC LPAR after multiple memory hot-plug and
      hot-unplug operations are done.  At the next reboot the node's memory
      ranges can be interleaved and since the call to link_mem_sections() is
      made in topology_init() while the system is in the SYSTEM_SCHEDULING
      state, the node's id is not checked, and the sections registered to
      multiple nodes:
      
        $ ls -l /sys/devices/system/memory/memory21/node*
        total 0
        lrwxrwxrwx 1 root root     0 Aug 24 05:27 node1 -> ../../node/node1
        lrwxrwxrwx 1 root root     0 Aug 24 05:27 node2 -> ../../node/node2
      
      In that case, the system is able to boot but if later one of theses
      memory blocks is hot-unplugged and then hot-plugged, the sysfs
      inconsistency is detected and this is triggering a BUG_ON():
      
        kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
        Oops: Exception in kernel mode, sig: 5 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
        CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
        Call Trace:
          add_memory_resource+0x23c/0x340 (unreliable)
          __add_memory+0x5c/0xf0
          dlpar_add_lmb+0x1b4/0x500
          dlpar_memory+0x1f8/0xb80
          handle_dlpar_errorlog+0xc0/0x190
          dlpar_store+0x198/0x4a0
          kobj_attr_store+0x30/0x50
          sysfs_kf_write+0x64/0x90
          kernfs_fop_write+0x1b0/0x290
          vfs_write+0xe8/0x290
          ksys_write+0xdc/0x130
          system_call_exception+0x160/0x270
          system_call_common+0xf0/0x27c
      
      This patch addresses the root cause by not relying on the system_state
      value to detect whether the call is due to a hot-plug operation.  An
      extra parameter is added to link_mem_sections() detailing whether the
      operation is due to a hot-plug operation.
      
      [1] According to Oscar Salvador, using this qemu command line, ACPI
      memory hotplug operations are raised at SYSTEM_SCHEDULING state:
      
        $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \
              -m size=$MEM,slots=255,maxmem=4294967296k  \
              -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \
              -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \
              -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \
              -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \
              -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \
              -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \
              -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \
              -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \
      
      Fixes: 4fbce633 ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()")
      Signed-off-by: default avatarLaurent Dufour <ldufour@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: Scott Cheloha <cheloha@linux.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f85086f9